qemu-devel.nongnu.org archive mirror
* [PULL 00/65] Migration 20231011 patches
@ 2023-10-11  9:20 Juan Quintela
  2023-10-11  9:20 ` [PULL 01/65] migration/qmp: Fix crash on setting tls-authz with null Juan Quintela
                   ` (65 more replies)
  0 siblings, 66 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

The following changes since commit 0ad0d9dcd16f2ea816a413008ac5191ebef45c95:

  Merge tag 'firmware/seabios-20231010-pull-request' of https://gitlab.com/kraxel/qemu into staging (2023-10-10 10:22:16 -0400)

are available in the Git repository at:

  https://gitlab.com/juan.quintela/qemu.git tags/migration-20231011-pull-request

for you to fetch changes up to 5e79a4bf032213fd59aa614781751fe76584f8e8:

  migration: Add migration_rp_wait|kick() (2023-10-11 11:17:05 +0200)

----------------------------------------------------------------
Migration Pull request (20231011 edition)

Hi

In this pull request:

- Markus' RDMA cleanup series
- Recovery fixes from Peter
- Migration capability test helper from Fabiano
- Negative migration testing from Fabiano

Please, pull.

Thanks, Juan.

----------------------------------------------------------------

Fabiano Rosas (5):
  tests/qtest: migration: Expose migrate_set_capability
  tests/qtest: migration: Add migrate_incoming_qmp helper
  tests/qtest: migration: Use migrate_incoming_qmp where appropriate
  migration: Set migration status early in incoming side
  tests/qtest: migration: Add support for negative testing of
    qmp_migrate

Markus Armbruster (53):
  migration/rdma: Clean up qemu_rdma_poll()'s return type
  migration/rdma: Clean up qemu_rdma_data_init()'s return type
  migration/rdma: Clean up rdma_delete_block()'s return type
  migration/rdma: Drop fragile wr_id formatting
  migration/rdma: Consistently use uint64_t for work request IDs
  migration/rdma: Fix unwanted integer truncation
  migration/rdma: Clean up two more harmless signed vs. unsigned issues
  migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  migration/rdma: Put @errp parameter last
  migration/rdma: Eliminate error_propagate()
  migration/rdma: Drop rdma_add_block() error handling
  migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  migration/rdma: Make qemu_rdma_buffer_mergeable() return bool
  migration/rdma: Use bool for two RDMAContext flags
  migration/rdma: Fix or document problematic uses of errno
  migration/rdma: Ditch useless numeric error codes in error messages
  migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  migration/rdma: Fix QEMUFileHooks method return values
  migration/rdma: Fix rdma_getaddrinfo() error checking
  migration/rdma: Return -1 instead of negative errno code
  migration/rdma: Dumb down remaining int error values to -1
  migration/rdma: Replace int error_state by bool errored
  migration/rdma: Drop superfluous assignments to @ret
  migration/rdma: Check negative error values the same way everywhere
  migration/rdma: Plug a memory leak and improve a message
  migration/rdma: Delete inappropriate error_report() in macro ERROR()
  migration/rdma: Retire macro ERROR()
  migration/rdma: Fix error handling around rdma_getaddrinfo()
  migration/rdma: Drop "@errp is clear" guards around error_setg()
  migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  migration/rdma: Convert qemu_rdma_exchange_send() to Error
  migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  migration/rdma: Convert qemu_rdma_write_flush() to Error
  migration/rdma: Convert qemu_rdma_write_one() to Error
  migration/rdma: Convert qemu_rdma_write() to Error
  migration/rdma: Convert qemu_rdma_post_send_control() to Error
  migration/rdma: Convert qemu_rdma_post_recv_control() to Error
  migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  migration/rdma: Silence qemu_rdma_resolve_host()
  migration/rdma: Silence qemu_rdma_connect()
  migration/rdma: Silence qemu_rdma_reg_control()
  migration/rdma: Don't report received completion events as error
  migration/rdma: Silence qemu_rdma_block_for_wrid()
  migration/rdma: Silence qemu_rdma_register_and_get_keys()
  migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings
  migration/rdma: Use error_report() & friends instead of stderr
  migration/rdma: Replace flawed device detail dump by tracing

Peter Xu (7):
  migration/qmp: Fix crash on setting tls-authz with null
  migration: Allow RECOVER->PAUSED convertion for dest qemu
  migration: Display error in query-migrate irrelevant of status
  migration: Introduce migrate_has_error()
  qemufile: Always return a verbose error
  migration: Remember num of ramblocks to sync during recovery
  migration: Add migration_rp_wait|kick()

 qapi/migration.json               |    5 +-
 migration/migration.h             |   16 +
 tests/qtest/libqtest.h            |   28 +
 tests/qtest/migration-helpers.h   |   10 +
 migration/migration.c             |   36 +-
 migration/options.c               |    9 +-
 migration/qemu-file.c             |   15 +-
 migration/ram.c                   |   33 +-
 migration/rdma.c                  | 1023 ++++++++++++++---------------
 migration/savevm.c                |    3 +-
 tests/qtest/libqtest.c            |   33 +
 tests/qtest/migration-helpers.c   |   60 ++
 tests/qtest/migration-test.c      |   39 +-
 tests/qtest/virtio-net-failover.c |   77 +--
 migration/trace-events            |   10 +-
 tests/qtest/meson.build           |    1 +
 16 files changed, 757 insertions(+), 641 deletions(-)

-- 
2.41.0




* [PULL 01/65] migration/qmp: Fix crash on setting tls-authz with null
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
@ 2023-10-11  9:20 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 02/65] tests/qtest: migration: Expose migrate_set_capability Juan Quintela
                   ` (64 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:20 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas, qemu-stable, Daniel P . Berrangé,
	Philippe Mathieu-Daudé

From: Peter Xu <peterx@redhat.com>

QEMU will crash if anyone tries to set tls-authz (which has type
StrOrNull) to a 'null' value.  Fix it the easy way by converting it to
a qstring, just like the other two TLS parameters.
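
For illustration only (not part of this patch; a sketch in qtest terms,
with an assumed QTestState *who), the kind of command that used to
crash QEMU and is now handled like the other TLS parameters:

    /* Setting tls-authz to JSON null crashed QEMU before this fix; with
     * the fix it is rewritten to an empty string, as tls-creds and
     * tls-hostname already were. */
    qtest_qmp_assert_success(who,
                             "{ 'execute': 'migrate-set-parameters',"
                             "  'arguments': { 'tls-authz': null } }");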

Cc: qemu-stable@nongnu.org # v4.0+
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter")
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230905162335.235619-2-peterx@redhat.com>
---
 migration/options.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0..6bbfd4853d 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1408,20 +1408,25 @@ void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
 {
     MigrationParameters tmp;
 
-    /* TODO Rewrite "" to null instead */
+    /* TODO Rewrite "" to null instead for all three tls_* parameters */
     if (params->tls_creds
         && params->tls_creds->type == QTYPE_QNULL) {
         qobject_unref(params->tls_creds->u.n);
         params->tls_creds->type = QTYPE_QSTRING;
         params->tls_creds->u.s = strdup("");
     }
-    /* TODO Rewrite "" to null instead */
     if (params->tls_hostname
         && params->tls_hostname->type == QTYPE_QNULL) {
         qobject_unref(params->tls_hostname->u.n);
         params->tls_hostname->type = QTYPE_QSTRING;
         params->tls_hostname->u.s = strdup("");
     }
+    if (params->tls_authz
+        && params->tls_authz->type == QTYPE_QNULL) {
+        qobject_unref(params->tls_authz->u.n);
+        params->tls_authz->type = QTYPE_QSTRING;
+        params->tls_authz->u.s = strdup("");
+    }
 
     migrate_params_test_apply(params, &tmp);
 
-- 
2.41.0




* [PULL 02/65] tests/qtest: migration: Expose migrate_set_capability
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
  2023-10-11  9:20 ` [PULL 01/65] migration/qmp: Fix crash on setting tls-authz with null Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 03/65] tests/qtest: migration: Add migrate_incoming_qmp helper Juan Quintela
                   ` (63 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Fabiano Rosas <farosas@suse.de>

The following patch will make use of this function from within
migration-helpers.c, so move it there.
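
A minimal usage sketch (mirroring existing callers in migration-test.c;
the QTestState handle 'to' is assumed):

    /* Enable the multifd capability on the destination before starting
     * the incoming migration. */
    migrate_set_capability(to, "multifd", true);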

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230712190742.22294-2-farosas@suse.de>
---
 tests/qtest/migration-helpers.h |  3 +++
 tests/qtest/migration-helpers.c | 11 +++++++++++
 tests/qtest/migration-test.c    | 11 -----------
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 009e250e90..484d7c960f 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -23,6 +23,9 @@ bool migrate_watch_for_resume(QTestState *who, const char *name,
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
+void migrate_set_capability(QTestState *who, const char *capability,
+                            bool value);
+
 QDict *migrate_query(QTestState *who);
 QDict *migrate_query_not_failed(QTestState *who);
 
diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index be00c52d00..2df198c99e 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -70,6 +70,17 @@ void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...)
                              "{ 'execute': 'migrate', 'arguments': %p}", args);
 }
 
+void migrate_set_capability(QTestState *who, const char *capability,
+                            bool value)
+{
+    qtest_qmp_assert_success(who,
+                             "{ 'execute': 'migrate-set-capabilities',"
+                             "'arguments': { "
+                             "'capabilities': [ { "
+                             "'capability': %s, 'state': %i } ] } }",
+                             capability, value);
+}
+
 /*
  * Note: caller is responsible to free the returned object via
  * qobject_unref() after use
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 46f1c275a2..48778565a3 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -605,17 +605,6 @@ static void migrate_cancel(QTestState *who)
     qtest_qmp_assert_success(who, "{ 'execute': 'migrate_cancel' }");
 }
 
-static void migrate_set_capability(QTestState *who, const char *capability,
-                                   bool value)
-{
-    qtest_qmp_assert_success(who,
-                             "{ 'execute': 'migrate-set-capabilities',"
-                             "'arguments': { "
-                             "'capabilities': [ { "
-                             "'capability': %s, 'state': %i } ] } }",
-                             capability, value);
-}
-
 static void migrate_postcopy_start(QTestState *from, QTestState *to)
 {
     qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
-- 
2.41.0




* [PULL 03/65] tests/qtest: migration: Add migrate_incoming_qmp helper
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
  2023-10-11  9:20 ` [PULL 01/65] migration/qmp: Fix crash on setting tls-authz with null Juan Quintela
  2023-10-11  9:21 ` [PULL 02/65] tests/qtest: migration: Expose migrate_set_capability Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 04/65] tests/qtest: migration: Use migrate_incoming_qmp where appropriate Juan Quintela
                   ` (62 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Fabiano Rosas <farosas@suse.de>

File-based migration requires the target to initiate its migration after
the source has finished writing out the data in the file.  Currently
there's no easy way to initiate 'migrate-incoming'; allow this by
introducing the migrate_incoming_qmp helper, analogous to migrate_qmp.

Also make sure migration events are enabled and wait for the incoming
migration to start before returning.  This avoids a race when querying
the migration status too soon after issuing the command.
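
A minimal usage sketch (mirroring how later patches in this series call
it; 'to' is the destination QTestState):

    /* Start the incoming side on an ephemeral TCP port; the helper also
     * enables the 'events' capability and waits for the MIGRATION event
     * with status "setup" before returning. */
    migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}");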

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230712190742.22294-3-farosas@suse.de>
---
 tests/qtest/migration-helpers.h |  4 ++++
 tests/qtest/migration-helpers.c | 29 +++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 484d7c960f..57d295a4fe 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -23,6 +23,10 @@ bool migrate_watch_for_resume(QTestState *who, const char *name,
 G_GNUC_PRINTF(3, 4)
 void migrate_qmp(QTestState *who, const char *uri, const char *fmt, ...);
 
+G_GNUC_PRINTF(3, 4)
+void migrate_incoming_qmp(QTestState *who, const char *uri,
+                          const char *fmt, ...);
+
 void migrate_set_capability(QTestState *who, const char *capability,
                             bool value);
 
diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 2df198c99e..08f5ee1179 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -81,6 +81,35 @@ void migrate_set_capability(QTestState *who, const char *capability,
                              capability, value);
 }
 
+void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
+{
+    va_list ap;
+    QDict *args, *rsp, *data;
+
+    va_start(ap, fmt);
+    args = qdict_from_vjsonf_nofail(fmt, ap);
+    va_end(ap);
+
+    g_assert(!qdict_haskey(args, "uri"));
+    qdict_put_str(args, "uri", uri);
+
+    migrate_set_capability(to, "events", true);
+
+    rsp = qtest_qmp(to, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
+                    args);
+    g_assert(qdict_haskey(rsp, "return"));
+    qobject_unref(rsp);
+
+    rsp = qtest_qmp_eventwait_ref(to, "MIGRATION");
+    g_assert(qdict_haskey(rsp, "data"));
+
+    data = qdict_get_qdict(rsp, "data");
+    g_assert(qdict_haskey(data, "status"));
+    g_assert_cmpstr(qdict_get_str(data, "status"), ==, "setup");
+
+    qobject_unref(rsp);
+}
+
 /*
  * Note: caller is responsible to free the returned object via
  * qobject_unref() after use
-- 
2.41.0




* [PULL 04/65] tests/qtest: migration: Use migrate_incoming_qmp where appropriate
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (2 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 03/65] tests/qtest: migration: Add migrate_incoming_qmp helper Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 05/65] migration: Set migration status early in incoming side Juan Quintela
                   ` (61 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Fabiano Rosas <farosas@suse.de>

Use the new migrate_incoming_qmp helper in the places that currently
open-code calling migrate-incoming.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230712190742.22294-4-farosas@suse.de>
---
 tests/qtest/migration-test.c      | 12 ++---
 tests/qtest/virtio-net-failover.c | 77 ++++---------------------------
 tests/qtest/meson.build           |  1 +
 3 files changed, 14 insertions(+), 76 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 48778565a3..84660b3e3d 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1970,8 +1970,7 @@ static void *test_migrate_fd_start_hook(QTestState *from,
     close(pair[0]);
 
     /* Start incoming migration from the 1st socket */
-    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
-                             "  'arguments': { 'uri': 'fd:fd-mig' }}");
+    migrate_incoming_qmp(to, "fd:fd-mig", "{}");
 
     /* Send the 2nd socket to the target */
     qtest_qmp_fds_assert_success(from, &pair[1], 1,
@@ -2193,8 +2192,7 @@ test_migrate_precopy_tcp_multifd_start_common(QTestState *from,
     migrate_set_capability(to, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
-                             "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+    migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}");
 
     return NULL;
 }
@@ -2447,8 +2445,7 @@ static void test_multifd_tcp_cancel(void)
     migrate_set_capability(to, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
-                             "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+    migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}");
 
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
@@ -2478,8 +2475,7 @@ static void test_multifd_tcp_cancel(void)
     migrate_set_capability(to2, "multifd", true);
 
     /* Start incoming migration from the 1st socket */
-    qtest_qmp_assert_success(to2, "{ 'execute': 'migrate-incoming',"
-                             "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
+    migrate_incoming_qmp(to2, "tcp:127.0.0.1:0", "{}");
 
     g_free(uri);
     uri = migrate_get_socket_address(to2, "socket-address");
diff --git a/tests/qtest/virtio-net-failover.c b/tests/qtest/virtio-net-failover.c
index 4a809590bf..0d40bc1f2d 100644
--- a/tests/qtest/virtio-net-failover.c
+++ b/tests/qtest/virtio-net-failover.c
@@ -11,6 +11,7 @@
 #include "libqtest.h"
 #include "libqos/pci.h"
 #include "libqos/pci-pc.h"
+#include "migration-helpers.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qlist.h"
 #include "qapi/qmp/qjson.h"
@@ -736,26 +737,10 @@ static void test_migrate_out(gconstpointer opaque)
     machine_stop(qts);
 }
 
-static QDict *get_migration_event(QTestState *qts)
-{
-    QDict *resp;
-    QDict *data;
-
-    resp = qtest_qmp_eventwait_ref(qts, "MIGRATION");
-    g_assert(qdict_haskey(resp, "data"));
-
-    data = qdict_get_qdict(resp, "data");
-    g_assert(qdict_haskey(data, "status"));
-    qobject_ref(data);
-    qobject_unref(resp);
-
-    return data;
-}
-
 static void test_migrate_in(gconstpointer opaque)
 {
     QTestState *qts;
-    QDict *resp, *args, *ret;
+    QDict *resp, *ret;
     g_autofree gchar *uri = g_strdup_printf("exec: cat %s", (gchar *)opaque);
 
     qts = machine_start(BASE_MACHINE
@@ -787,18 +772,7 @@ static void test_migrate_in(gconstpointer opaque)
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, false, "primary0", MAC_PRIMARY0);
 
-    args = qdict_from_jsonf_nofail("{}");
-    g_assert_nonnull(args);
-    qdict_put_str(args, "uri", uri);
-
-    resp = qtest_qmp(qts, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
-                     args);
-    g_assert(qdict_haskey(resp, "return"));
-    qobject_unref(resp);
-
-    resp = get_migration_event(qts);
-    g_assert_cmpstr(qdict_get_str(resp, "status"), ==, "setup");
-    qobject_unref(resp);
+    migrate_incoming_qmp(qts, uri, "{}");
 
     resp = get_failover_negociated_event(qts);
     g_assert_cmpstr(qdict_get_str(resp, "device-id"), ==, "standby0");
@@ -888,7 +862,7 @@ static void test_off_migrate_out(gconstpointer opaque)
 static void test_off_migrate_in(gconstpointer opaque)
 {
     QTestState *qts;
-    QDict *resp, *args, *ret;
+    QDict *ret;
     g_autofree gchar *uri = g_strdup_printf("exec: cat %s", (gchar *)opaque);
 
     qts = machine_start(BASE_MACHINE
@@ -920,18 +894,7 @@ static void test_off_migrate_in(gconstpointer opaque)
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, true, "primary0", MAC_PRIMARY0);
 
-    args = qdict_from_jsonf_nofail("{}");
-    g_assert_nonnull(args);
-    qdict_put_str(args, "uri", uri);
-
-    resp = qtest_qmp(qts, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
-                     args);
-    g_assert(qdict_haskey(resp, "return"));
-    qobject_unref(resp);
-
-    resp = get_migration_event(qts);
-    g_assert_cmpstr(qdict_get_str(resp, "status"), ==, "setup");
-    qobject_unref(resp);
+    migrate_incoming_qmp(qts, uri, "{}");
 
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, true, "primary0", MAC_PRIMARY0);
@@ -1026,7 +989,7 @@ static void test_guest_off_migrate_out(gconstpointer opaque)
 static void test_guest_off_migrate_in(gconstpointer opaque)
 {
     QTestState *qts;
-    QDict *resp, *args, *ret;
+    QDict *ret;
     g_autofree gchar *uri = g_strdup_printf("exec: cat %s", (gchar *)opaque);
 
     qts = machine_start(BASE_MACHINE
@@ -1058,18 +1021,7 @@ static void test_guest_off_migrate_in(gconstpointer opaque)
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, false, "primary0", MAC_PRIMARY0);
 
-    args = qdict_from_jsonf_nofail("{}");
-    g_assert_nonnull(args);
-    qdict_put_str(args, "uri", uri);
-
-    resp = qtest_qmp(qts, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
-                     args);
-    g_assert(qdict_haskey(resp, "return"));
-    qobject_unref(resp);
-
-    resp = get_migration_event(qts);
-    g_assert_cmpstr(qdict_get_str(resp, "status"), ==, "setup");
-    qobject_unref(resp);
+    migrate_incoming_qmp(qts, uri, "{}");
 
     check_one_card(qts, true, "standby0", MAC_STANDBY0);
     check_one_card(qts, false, "primary0", MAC_PRIMARY0);
@@ -1728,7 +1680,7 @@ static void test_multi_out(gconstpointer opaque)
 static void test_multi_in(gconstpointer opaque)
 {
     QTestState *qts;
-    QDict *resp, *args, *ret;
+    QDict *resp, *ret;
     g_autofree gchar *uri = g_strdup_printf("exec: cat %s", (gchar *)opaque);
 
     qts = machine_start(BASE_MACHINE
@@ -1794,18 +1746,7 @@ static void test_multi_in(gconstpointer opaque)
     check_one_card(qts, true, "standby1", MAC_STANDBY1);
     check_one_card(qts, false, "primary1", MAC_PRIMARY1);
 
-    args = qdict_from_jsonf_nofail("{}");
-    g_assert_nonnull(args);
-    qdict_put_str(args, "uri", uri);
-
-    resp = qtest_qmp(qts, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
-                     args);
-    g_assert(qdict_haskey(resp, "return"));
-    qobject_unref(resp);
-
-    resp = get_migration_event(qts);
-    g_assert_cmpstr(qdict_get_str(resp, "status"), ==, "setup");
-    qobject_unref(resp);
+    migrate_incoming_qmp(qts, uri, "{}");
 
     resp = get_failover_negociated_event(qts);
     g_assert_cmpstr(qdict_get_str(resp, "device-id"), ==, "standby0");
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 1fba07f4ed..66795cfcd2 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -317,6 +317,7 @@ qtests = {
   'tpm-tis-i2c-test': [io, tpmemu_files, 'qtest_aspeed.c'],
   'tpm-tis-device-swtpm-test': [io, tpmemu_files, 'tpm-tis-util.c'],
   'tpm-tis-device-test': [io, tpmemu_files, 'tpm-tis-util.c'],
+  'virtio-net-failover': files('migration-helpers.c'),
   'vmgenid-test': files('boot-sector.c', 'acpi-utils.c'),
   'netdev-socket': files('netdev-socket.c', '../unit/socket-helpers.c'),
 }
-- 
2.41.0




* [PULL 05/65] migration: Set migration status early in incoming side
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (3 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 04/65] tests/qtest: migration: Use migrate_incoming_qmp where appropriate Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate Juan Quintela
                   ` (60 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Fabiano Rosas <farosas@suse.de>

We are sending a migration event of MIGRATION_STATUS_SETUP at
qemu_start_incoming_migration but never actually setting the state.

This creates a window between qmp_migrate_incoming and
process_incoming_migration_co where the migration status is still
MIGRATION_STATUS_NONE. Calling query-migrate during this time will
return an empty response even though the incoming migration command
has already been issued.

Commit 7cf1fe6d68 ("migration: Add migration events on target side")
added support for the 'events' capability on the incoming side of
migration, but chose to send the SETUP event without setting the
state.  I'm assuming this was a mistake.

This introduces a change in behavior: any QMP client waiting for the
SETUP event will hang unless it has previously enabled the 'events'
capability.  Having the capability enabled is sufficient to continue
receiving the event.
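
A sketch, in qtest terms, of what an affected client now has to do (the
URI is made up; helper names are taken from the test code in this
series):

    /* Enable the 'events' capability first; without it, waiting for the
     * SETUP event would now hang ... */
    migrate_set_capability(to, "events", true);

    /* ... then issue migrate-incoming and wait for the event as before. */
    qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
                             "  'arguments': { 'uri': 'tcp:127.0.0.1:0' }}");
    qtest_qmp_eventwait(to, "MIGRATION");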

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230712190742.22294-5-farosas@suse.de>
---
 migration/migration.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 585d3c8f55..2057e42134 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -431,13 +431,16 @@ void migrate_add_address(SocketAddress *address)
 static void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p = NULL;
+    MigrationIncomingState *mis = migration_incoming_get_current();
 
     /* URI is not suitable for migration? */
     if (!migration_channels_and_uri_compatible(uri, errp)) {
         return;
     }
 
-    qapi_event_send_migration(MIGRATION_STATUS_SETUP);
+    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
+                      MIGRATION_STATUS_SETUP);
+
     if (strstart(uri, "tcp:", &p) ||
         strstart(uri, "unix:", NULL) ||
         strstart(uri, "vsock:", NULL)) {
@@ -531,7 +534,7 @@ process_incoming_migration_co(void *opaque)
 
     mis->largest_page_size = qemu_ram_pagesize_largest();
     postcopy_state_set(POSTCOPY_INCOMING_NONE);
-    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
+    migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP,
                       MIGRATION_STATUS_ACTIVE);
 
     mis->loadvm_co = qemu_coroutine_self();
-- 
2.41.0




* [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (4 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 05/65] migration: Set migration status early in incoming side Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11 13:04   ` Fabiano Rosas
  2023-10-11  9:21 ` [PULL 07/65] migration: Allow RECOVER->PAUSED convertion for dest qemu Juan Quintela
                   ` (59 subsequent siblings)
  65 siblings, 1 reply; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Fabiano Rosas <farosas@suse.de>

There is currently no way to write a test for errors that happen in
qmp_migrate before the migration has started.

Add a version of qmp_migrate that ensures an error happens.  To make
use of it, a test needs to set MigrateCommon.result to
MIG_TEST_QMP_ERROR.
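
A hedged sketch of how a test might use this (the URIs are made up for
illustration; the fields are the MigrateCommon ones used by
test_precopy_common):

    /* Expect the 'migrate' QMP command itself to be rejected before any
     * migration is started. */
    MigrateCommon args = {
        .listen_uri = "defer",
        .connect_uri = "bogus-proto:foo",  /* assumed to be rejected */
        .result = MIG_TEST_QMP_ERROR,
    };

    test_precopy_common(&args);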

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230712190742.22294-6-farosas@suse.de>
---
 tests/qtest/libqtest.h          | 28 ++++++++++++++++++++++++++++
 tests/qtest/migration-helpers.h |  3 +++
 tests/qtest/libqtest.c          | 33 +++++++++++++++++++++++++++++++++
 tests/qtest/migration-helpers.c | 20 ++++++++++++++++++++
 tests/qtest/migration-test.c    | 16 ++++++++++++----
 5 files changed, 96 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index e53e350e3a..5fe3d13466 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -810,6 +810,34 @@ void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
     G_GNUC_PRINTF(4, 0);
 #endif /* !_WIN32 */
 
+/**
+ * qtest_qmp_assert_failure_ref:
+ * @qts: QTestState instance to operate on
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ *
+ * Sends a QMP message to QEMU, asserts that an 'error' key is present in
+ * the response, and returns the response.
+ */
+QDict *qtest_qmp_assert_failure_ref(QTestState *qts, const char *fmt, ...)
+    G_GNUC_PRINTF(2, 3);
+
+/**
+ * qtest_vqmp_assert_failure_ref:
+ * @qts: QTestState instance to operate on
+ * @fmt: QMP message to send to qemu, formatted like
+ * qobject_from_jsonf_nofail().  See parse_interpolation() for what's
+ * supported after '%'.
+ * @args: variable arguments for @fmt
+ *
+ * Sends a QMP message to QEMU, asserts that an 'error' key is present in
+ * the response, and returns the response.
+ */
+QDict *qtest_vqmp_assert_failure_ref(QTestState *qts,
+                                     const char *fmt, va_list args)
+    G_GNUC_PRINTF(2, 0);
+
 /**
  * qtest_qmp_assert_success_ref:
  * @qts: QTestState instance to operate on
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 57d295a4fe..4f51d0f8bc 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -27,6 +27,9 @@ G_GNUC_PRINTF(3, 4)
 void migrate_incoming_qmp(QTestState *who, const char *uri,
                           const char *fmt, ...);
 
+G_GNUC_PRINTF(3, 4)
+void migrate_qmp_fail(QTestState *who, const char *uri, const char *fmt, ...);
+
 void migrate_set_capability(QTestState *who, const char *capability,
                             bool value);
 
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 3f94a4f477..dc7a55634c 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -1259,6 +1259,28 @@ void qtest_memset(QTestState *s, uint64_t addr, uint8_t pattern, size_t size)
     qtest_rsp(s);
 }
 
+QDict *qtest_vqmp_assert_failure_ref(QTestState *qts,
+                                     const char *fmt, va_list args)
+{
+    QDict *response;
+    QDict *ret;
+
+    response = qtest_vqmp(qts, fmt, args);
+
+    g_assert(response);
+    if (!qdict_haskey(response, "error")) {
+        g_autoptr(GString) s = qobject_to_json_pretty(QOBJECT(response), true);
+        g_test_message("%s", s->str);
+    }
+    g_assert(qdict_haskey(response, "error"));
+    g_assert(!qdict_haskey(response, "return"));
+    ret = qdict_get_qdict(response, "error");
+    qobject_ref(ret);
+    qobject_unref(response);
+
+    return ret;
+}
+
 QDict *qtest_vqmp_assert_success_ref(QTestState *qts,
                                      const char *fmt, va_list args)
 {
@@ -1321,6 +1343,17 @@ void qtest_vqmp_fds_assert_success(QTestState *qts, int *fds, size_t nfds,
 }
 #endif /* !_WIN32 */
 
+QDict *qtest_qmp_assert_failure_ref(QTestState *qts, const char *fmt, ...)
+{
+    QDict *response;
+    va_list ap;
+
+    va_start(ap, fmt);
+    response = qtest_vqmp_assert_failure_ref(qts, fmt, ap);
+    va_end(ap);
+    return response;
+}
+
 QDict *qtest_qmp_assert_success_ref(QTestState *qts, const char *fmt, ...)
 {
     QDict *response;
diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 08f5ee1179..0c185db450 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -49,6 +49,26 @@ bool migrate_watch_for_resume(QTestState *who, const char *name,
     return false;
 }
 
+void migrate_qmp_fail(QTestState *who, const char *uri, const char *fmt, ...)
+{
+    va_list ap;
+    QDict *args, *err;
+
+    va_start(ap, fmt);
+    args = qdict_from_vjsonf_nofail(fmt, ap);
+    va_end(ap);
+
+    g_assert(!qdict_haskey(args, "uri"));
+    qdict_put_str(args, "uri", uri);
+
+    err = qtest_qmp_assert_failure_ref(
+        who, "{ 'execute': 'migrate', 'arguments': %p}", args);
+
+    g_assert(qdict_haskey(err, "desc"));
+
+    qobject_unref(err);
+}
+
 /*
  * Send QMP command "migrate".
  * Arguments are built from @fmt... (formatted like
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 84660b3e3d..8eb2053dbb 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -697,6 +697,8 @@ typedef struct {
         MIG_TEST_FAIL,
         /* This test should fail, dest qemu should fail with abnormal status */
         MIG_TEST_FAIL_DEST_QUIT_ERR,
+        /* The QMP command for this migration should fail with an error */
+        MIG_TEST_QMP_ERROR,
     } result;
 
     /*
@@ -1503,6 +1505,7 @@ static void test_precopy_common(MigrateCommon *args)
 {
     QTestState *from, *to;
     void *data_hook = NULL;
+    g_autofree char *connect_uri = NULL;
 
     if (test_migrate_start(&from, &to, args->listen_uri, &args->start)) {
         return;
@@ -1537,13 +1540,17 @@ static void test_precopy_common(MigrateCommon *args)
     }
 
     if (!args->connect_uri) {
-        g_autofree char *local_connect_uri =
-            migrate_get_socket_address(to, "socket-address");
-        migrate_qmp(from, local_connect_uri, "{}");
+        connect_uri = migrate_get_socket_address(to, "socket-address");
     } else {
-        migrate_qmp(from, args->connect_uri, "{}");
+        connect_uri = g_strdup(args->connect_uri);
     }
 
+    if (args->result == MIG_TEST_QMP_ERROR) {
+        migrate_qmp_fail(from, connect_uri, "{}");
+        goto finish;
+    }
+
+    migrate_qmp(from, connect_uri, "{}");
 
     if (args->result != MIG_TEST_SUCCEED) {
         bool allow_active = args->result == MIG_TEST_FAIL;
@@ -1595,6 +1602,7 @@ static void test_precopy_common(MigrateCommon *args)
         wait_for_serial("dest_serial");
     }
 
+finish:
     if (args->finish_hook) {
         args->finish_hook(from, to, data_hook);
     }
-- 
2.41.0




* [PULL 07/65] migration: Allow RECOVER->PAUSED convertion for dest qemu
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (5 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 08/65] migration/rdma: Clean up qemu_rdma_poll()'s return type Juan Quintela
                   ` (58 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Peter Xu <peterx@redhat.com>

There's a bug on the destination side: if a double fault is triggered
on the dest QEMU (a network issue during postcopy-recover), we won't
set PAUSED correctly because we assumed we always came from ACTIVE.

Fix that by always overwriting the state to PAUSED.

We could also check for these two states, but maybe it's overkill.  We
did the same on the src QEMU to unconditionally switch to PAUSED
anyway.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-10-peterx@redhat.com>
---
 migration/savevm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 60eec7c31f..497ce02bd7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2734,7 +2734,8 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
         qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+    /* Current state can be either ACTIVE or RECOVER */
+    migrate_set_state(&mis->state, mis->state,
                       MIGRATION_STATUS_POSTCOPY_PAUSED);
 
     /* Notify the fault thread for the invalidated file handle */
-- 
2.41.0




* [PULL 08/65] migration/rdma: Clean up qemu_rdma_poll()'s return type
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (6 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 07/65] migration: Allow RECOVER->PAUSED convertion for dest qemu Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 09/65] migration/rdma: Clean up qemu_rdma_data_init()'s " Juan Quintela
                   ` (57 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
-1, or @ret, which is int.  Its callers assign the return value to int
variables, then check whether it's negative.  Unclean.

Return int instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-2-armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index cd5e1afe60..e72864d1cf 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1469,8 +1469,8 @@ static uint64_t qemu_rdma_make_wrid(uint64_t wr_id, uint64_t index,
  * (of any kind) has completed.
  * Return the work request ID that completed.
  */
-static uint64_t qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
-                               uint64_t *wr_id_out, uint32_t *byte_len)
+static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
+                          uint64_t *wr_id_out, uint32_t *byte_len)
 {
     int ret;
     struct ibv_wc wc;
-- 
2.41.0




* [PULL 09/65] migration/rdma: Clean up qemu_rdma_data_init()'s return type
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (7 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 08/65] migration/rdma: Clean up qemu_rdma_poll()'s return type Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 10/65] migration/rdma: Clean up rdma_delete_block()'s " Juan Quintela
                   ` (56 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_data_init()'s return type is void *.  It actually returns
RDMAContext *, and all its callers assign the value to an
RDMAContext *.  Unclean.

Return RDMAContext * instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-3-armbru@redhat.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index e72864d1cf..1432fb80ec 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2759,7 +2759,7 @@ static void qemu_rdma_return_path_dest_init(RDMAContext *rdma_return_path,
     rdma_return_path->is_return_path = true;
 }
 
-static void *qemu_rdma_data_init(const char *host_port, Error **errp)
+static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
 {
     RDMAContext *rdma = NULL;
     InetSocketAddress *addr;
-- 
2.41.0




* [PULL 10/65] migration/rdma: Clean up rdma_delete_block()'s return type
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (8 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 09/65] migration/rdma: Clean up qemu_rdma_data_init()'s " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 11/65] migration/rdma: Drop fragile wr_id formatting Juan Quintela
                   ` (55 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

rdma_delete_block() always returns 0, which its only caller ignores.
Return void instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-4-armbru@redhat.com>
---
 migration/rdma.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1432fb80ec..65ed814d88 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -668,7 +668,7 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
  * Note: If used outside of cleanup, the caller must ensure that the destination
  * block structures are also updated
  */
-static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
+static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *old = local->block;
@@ -754,8 +754,6 @@ static int rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
                                 &local->block[x]);
         }
     }
-
-    return 0;
 }
 
 /*
-- 
2.41.0




* [PULL 11/65] migration/rdma: Drop fragile wr_id formatting
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (9 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 10/65] migration/rdma: Clean up rdma_delete_block()'s " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 12/65] migration/rdma: Consistently use uint64_t for work request IDs Juan Quintela
                   ` (54 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

wrid_desc[] uses 4001 pointers to map four integer values to strings.

print_wrid() accesses wrid_desc[] out of bounds when passed a negative
argument.  It returns null for values 2..1999 and 2001..3999.

qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
and pass print_wrid(wr_id) to tracepoints.  Could conceivably crash
trying to format a null string.  I believe access out of bounds is not
possible.

Not worth cleaning up.  Dumb down to show just numeric wr_id.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-5-armbru@redhat.com>
---
 migration/rdma.c       | 32 +++++++-------------------------
 migration/trace-events |  8 ++++----
 2 files changed, 11 insertions(+), 29 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 65ed814d88..8f297c9e46 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -133,13 +133,6 @@ enum {
     RDMA_WRID_RECV_CONTROL = 4000,
 };
 
-static const char *wrid_desc[] = {
-    [RDMA_WRID_NONE] = "NONE",
-    [RDMA_WRID_RDMA_WRITE] = "WRITE RDMA",
-    [RDMA_WRID_SEND_CONTROL] = "CONTROL SEND",
-    [RDMA_WRID_RECV_CONTROL] = "CONTROL RECV",
-};
-
 /*
  * Work request IDs for IB SEND messages only (not RDMA writes).
  * This is used by the migration protocol to transmit
@@ -535,7 +528,6 @@ static void network_to_result(RDMARegisterResult *result)
     result->host_addr = ntohll(result->host_addr);
 };
 
-const char *print_wrid(int wrid);
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
@@ -1362,14 +1354,6 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
     return -1;
 }
 
-const char *print_wrid(int wrid)
-{
-    if (wrid >= RDMA_WRID_RECV_CONTROL) {
-        return wrid_desc[RDMA_WRID_RECV_CONTROL];
-    }
-    return wrid_desc[wrid];
-}
-
 /*
  * Perform a non-optimized memory unregistration after every transfer
  * for demonstration purposes, only if pin-all is not requested.
@@ -1491,15 +1475,15 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     if (wc.status != IBV_WC_SUCCESS) {
         fprintf(stderr, "ibv_poll_cq wc.status=%d %s!\n",
                         wc.status, ibv_wc_status_str(wc.status));
-        fprintf(stderr, "ibv_poll_cq wrid=%s!\n", wrid_desc[wr_id]);
+        fprintf(stderr, "ibv_poll_cq wrid=%" PRIu64 "!\n", wr_id);
 
         return -1;
     }
 
     if (rdma->control_ready_expected &&
         (wr_id >= RDMA_WRID_RECV_CONTROL)) {
-        trace_qemu_rdma_poll_recv(wrid_desc[RDMA_WRID_RECV_CONTROL],
-                  wr_id - RDMA_WRID_RECV_CONTROL, wr_id, rdma->nb_sent);
+        trace_qemu_rdma_poll_recv(wr_id - RDMA_WRID_RECV_CONTROL, wr_id,
+                                  rdma->nb_sent);
         rdma->control_ready_expected = 0;
     }
 
@@ -1510,7 +1494,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
             (wc.wr_id & RDMA_WRID_BLOCK_MASK) >> RDMA_WRID_BLOCK_SHIFT;
         RDMALocalBlock *block = &(rdma->local_ram_blocks.block[index]);
 
-        trace_qemu_rdma_poll_write(print_wrid(wr_id), wr_id, rdma->nb_sent,
+        trace_qemu_rdma_poll_write(wr_id, rdma->nb_sent,
                                    index, chunk, block->local_host_addr,
                                    (void *)(uintptr_t)block->remote_host_addr);
 
@@ -1520,7 +1504,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
             rdma->nb_sent--;
         }
     } else {
-        trace_qemu_rdma_poll_other(print_wrid(wr_id), wr_id, rdma->nb_sent);
+        trace_qemu_rdma_poll_other(wr_id, rdma->nb_sent);
     }
 
     *wr_id_out = wc.wr_id;
@@ -1665,8 +1649,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
             break;
         }
         if (wr_id != wrid_requested) {
-            trace_qemu_rdma_block_for_wrid_miss(print_wrid(wrid_requested),
-                       wrid_requested, print_wrid(wr_id), wr_id);
+            trace_qemu_rdma_block_for_wrid_miss(wrid_requested, wr_id);
         }
     }
 
@@ -1705,8 +1688,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
                 break;
             }
             if (wr_id != wrid_requested) {
-                trace_qemu_rdma_block_for_wrid_miss(print_wrid(wrid_requested),
-                                   wrid_requested, print_wrid(wr_id), wr_id);
+                trace_qemu_rdma_block_for_wrid_miss(wrid_requested, wr_id);
             }
         }
 
diff --git a/migration/trace-events b/migration/trace-events
index 002abe3a4e..19f9ee7c6d 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -208,7 +208,7 @@ qemu_rdma_accept_incoming_migration(void) ""
 qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
-qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
+qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but got %" PRIu64
 qemu_rdma_cleanup_disconnect(void) ""
 qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
@@ -222,9 +222,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
 qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
 qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
 qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
-qemu_rdma_poll_recv(const char *compstr, int64_t comp, int64_t id, int sent) "completion %s #%" PRId64 " received (%" PRId64 ") left %d"
-qemu_rdma_poll_write(const char *compstr, int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %s (%" PRId64 ") left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
-qemu_rdma_poll_other(const char *compstr, int64_t comp, int left) "other completion %s (%" PRId64 ") received left %d"
+qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" PRId64 " received (%" PRId64 ") left %d"
+qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
+qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " received left %d"
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
-- 
2.41.0




* [PULL 12/65] migration/rdma: Consistently use uint64_t for work request IDs
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (10 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 11/65] migration/rdma: Drop fragile wr_id formatting Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 13/65] migration/rdma: Fix unwanted integer truncation Juan Quintela
                   ` (53 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

We use int instead of uint64_t in a few places.  Change them to
uint64_t.

This cleans up a comparison of signed qemu_rdma_block_for_wrid()
parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
actual arguments are non-negative enumeration constants.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-6-armbru@redhat.com>
---
 migration/rdma.c       | 7 ++++---
 migration/trace-events | 8 ++++----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 8f297c9e46..d1e727f30b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1599,13 +1599,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     return rdma->error_state;
 }
 
-static struct ibv_comp_channel *to_channel(RDMAContext *rdma, int wrid)
+static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
 {
     return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_comp_channel :
            rdma->recv_comp_channel;
 }
 
-static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
+static struct ibv_cq *to_cq(RDMAContext *rdma, uint64_t wrid)
 {
     return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_cq : rdma->recv_cq;
 }
@@ -1623,7 +1623,8 @@ static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
  * completions only need to be recorded, but do not actually
  * need further processing.
  */
-static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
+static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
+                                    uint64_t wrid_requested,
                                     uint32_t *byte_len)
 {
     int num_cq_events = 0, ret = 0;
diff --git a/migration/trace-events b/migration/trace-events
index 19f9ee7c6d..6a50994402 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -208,7 +208,7 @@ qemu_rdma_accept_incoming_migration(void) ""
 qemu_rdma_accept_incoming_migration_accepted(void) ""
 qemu_rdma_accept_pin_state(bool pin) "%d"
 qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
-qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but got %" PRIu64
+qemu_rdma_block_for_wrid_miss(uint64_t wcomp, uint64_t req) "A Wanted wrid %" PRIu64 " but got %" PRIu64
 qemu_rdma_cleanup_disconnect(void) ""
 qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
@@ -222,9 +222,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) "Waiting for response %s"
 qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
 qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes already in buffer"
 qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block structures"
-qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" PRId64 " received (%" PRId64 ") left %d"
-qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
-qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " received left %d"
+qemu_rdma_poll_recv(uint64_t comp, int64_t id, int sent) "completion %" PRIu64 " received (%" PRId64 ") left %d"
+qemu_rdma_poll_write(uint64_t comp, int left, uint64_t block, uint64_t chunk, void *local, void *remote) "completions %" PRIu64 " left %d, block %" PRIu64 ", chunk: %" PRIu64 " %p %p"
+qemu_rdma_poll_other(uint64_t comp, int left) "other completion %" PRIu64 " received left %d"
 qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
 qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" PRIu64 " bytes @ %p"
 qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand Paging memory region: %s"
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 13/65] migration/rdma: Fix unwanted integer truncation
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (11 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 12/65] migration/rdma: Consistently use uint64_t for work request IDs Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 14/65] migration/rdma: Clean up two more harmless signed vs. unsigned issues Juan Quintela
                   ` (52 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill()
to an int variable before it adds it to @done / subtracts it from
@want, both size_t.  The value gets truncated when qemu_rdma_fill()
copies more than INT_MAX bytes.  Seems vanishingly unlikely, but needs
fixing all the same.
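
A standalone sketch of the truncation, with hypothetical values (not
the actual QEMU code):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* pretend qemu_rdma_fill() copied this many bytes */
        size_t filled = (size_t)INT_MAX + 1;
        int ret = filled;     /* implementation-defined; typically wraps negative */
        size_t done = 0;

        done += ret;          /* now wildly wrong */
        printf("filled=%zu ret=%d done=%zu\n", filled, ret, done);
        return 0;
    }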

Fixes: 6ddd2d76ca6f (migration: convert RDMA to use QIOChannel interface)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-7-armbru@redhat.com>
---
 migration/rdma.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d1e727f30b..ff8e475f59 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2871,7 +2871,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     RDMAControlHeader head;
     int ret = 0;
     ssize_t i;
-    size_t done = 0;
+    size_t done = 0, len;
 
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmain);
@@ -2892,9 +2892,9 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
          * were given and dish out the bytes until we run
          * out of bytes.
          */
-        ret = qemu_rdma_fill(rdma, data, want, 0);
-        done += ret;
-        want -= ret;
+        len = qemu_rdma_fill(rdma, data, want, 0);
+        done += len;
+        want -= len;
         /* Got what we needed, so go to next iovec */
         if (want == 0) {
             continue;
@@ -2921,9 +2921,9 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         /*
          * SEND was received with new bytes, now try again.
          */
-        ret = qemu_rdma_fill(rdma, data, want, 0);
-        done += ret;
-        want -= ret;
+        len = qemu_rdma_fill(rdma, data, want, 0);
+        done += len;
+        want -= len;
 
         /* Still didn't get enough, so lets just return */
         if (want) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 14/65] migration/rdma: Clean up two more harmless signed vs. unsigned issues
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (12 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 13/65] migration/rdma: Fix unwanted integer truncation Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 15/65] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Juan Quintela
                   ` (51 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_exchange_get_response() compares int parameter @expecting
with uint32_t head->type.  Actual arguments are non-negative
enumeration constants, RDMAControlHeader uint32_t member type, or
qemu_rdma_exchange_recv() int parameter expecting.  Actual arguments
for the latter are non-negative enumeration constants.  Change both
parameters to uint32_t.

In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
counts from 0 up to @niov, which is size_t.  Change @i to size_t.

While there, make qio_channel_rdma_readv() and
qio_channel_rdma_writev() more consistent: change the former's @done
to ssize_t, and delete the latter's useless initialization of @len.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-8-armbru@redhat.com>
---
 migration/rdma.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ff8e475f59..1aac50b0de 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1801,7 +1801,7 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
  * Block and wait for a RECV control channel message to arrive.
  */
 static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
-                RDMAControlHeader *head, int expecting, int idx)
+                RDMAControlHeader *head, uint32_t expecting, int idx)
 {
     uint32_t byte_len;
     int ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RECV_CONTROL + idx,
@@ -1961,7 +1961,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
  * control-channel message.
  */
 static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
-                                int expecting)
+                                   uint32_t expecting)
 {
     RDMAControlHeader ready = {
                                 .len = 0,
@@ -2784,8 +2784,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     RDMAContext *rdma;
     int ret;
     ssize_t done = 0;
-    size_t i;
-    size_t len = 0;
+    size_t i, len;
 
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
@@ -2870,8 +2869,8 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     RDMAContext *rdma;
     RDMAControlHeader head;
     int ret = 0;
-    ssize_t i;
-    size_t done = 0, len;
+    ssize_t done = 0;
+    size_t i, len;
 
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmain);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 15/65] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (13 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 14/65] migration/rdma: Clean up two more harmless signed vs. unsigned issues Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 16/65] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Juan Quintela
                   ` (50 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-9-armbru@redhat.com>
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1aac50b0de..1159f990af 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3075,7 +3075,7 @@ qio_channel_rdma_source_finalize(GSource *source)
     object_unref(OBJECT(ssource->rioc));
 }
 
-GSourceFuncs qio_channel_rdma_source_funcs = {
+static GSourceFuncs qio_channel_rdma_source_funcs = {
     qio_channel_rdma_source_prepare,
     qio_channel_rdma_source_check,
     qio_channel_rdma_source_dispatch,
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 16/65] migration/rdma: Fix qemu_rdma_accept() to return failure on errors
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (14 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 15/65] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 17/65] migration/rdma: Put @errp parameter last Juan Quintela
                   ` (49 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_accept() returns 0 in some cases even when it didn't
complete its job due to errors.  Impact is not obvious.  I figure the
caller will soon fail again with a misleading error message.

Fix it to return -1 on any failure.
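
A standalone sketch of the bug class, using hypothetical stand-in
functions (not the actual QEMU code): @ret still holds 0 from the last
successful call, so a failing path that forgets to reassign it returns
"success".

    #include <stdio.h>

    static int setup_step(void) { return 0; }   /* succeeds */
    static int version_ok(void) { return 0; }   /* pretend the check fails */
    static void cleanup(void) { }

    static int accept_buggy(void)
    {
        int ret = setup_step();
        if (ret < 0) {
            goto err;
        }
        if (!version_ok()) {
            goto err;               /* BUG: ret is still 0 here */
        }
        return 0;
    err:
        cleanup();
        return ret;
    }

    int main(void)
    {
        /* prints 0 even though the version check failed */
        printf("accept_buggy() = %d\n", accept_buggy());
        return 0;
    }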

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-10-armbru@redhat.com>
---
 migration/rdma.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1159f990af..8fd1b314b5 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3360,6 +3360,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     if (cm_event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
         rdma_ack_cm_event(cm_event);
+        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3372,6 +3373,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
         if (rdma_return_path == NULL) {
             rdma_ack_cm_event(cm_event);
+            ret = -1;
             goto err_rdma_dest_wait;
         }
 
@@ -3383,10 +3385,11 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     network_to_caps(&cap);
 
     if (cap.version < 1 || cap.version > RDMA_CONTROL_VERSION_CURRENT) {
-            error_report("Unknown source RDMA version: %d, bailing...",
-                            cap.version);
-            rdma_ack_cm_event(cm_event);
-            goto err_rdma_dest_wait;
+        error_report("Unknown source RDMA version: %d, bailing...",
+                     cap.version);
+        rdma_ack_cm_event(cm_event);
+        ret = -1;
+        goto err_rdma_dest_wait;
     }
 
     /*
@@ -3416,9 +3419,10 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (!rdma->verbs) {
         rdma->verbs = verbs;
     } else if (rdma->verbs != verbs) {
-            error_report("ibv context not matching %p, %p!", rdma->verbs,
-                         verbs);
-            goto err_rdma_dest_wait;
+        error_report("ibv context not matching %p, %p!", rdma->verbs,
+                     verbs);
+        ret = -1;
+        goto err_rdma_dest_wait;
     }
 
     qemu_rdma_dump_id("dest_init", verbs);
@@ -3475,6 +3479,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_accept not event established");
         rdma_ack_cm_event(cm_event);
+        ret = -1;
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 17/65] migration/rdma: Put @errp parameter last
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (15 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 16/65] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 18/65] migration/rdma: Eliminate error_propagate() Juan Quintela
                   ` (48 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

include/qapi/error.h demands:

 * - Functions that use Error to report errors have an Error **errp
 *   parameter.  It should be the last parameter, except for functions
 *   taking variable arguments.

qemu_rdma_connect() does not conform.  Clean it up.
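
A minimal sketch of the convention with a hypothetical function (the
real change is in the diff below):

    #include <stdbool.h>

    typedef struct Error Error;
    typedef struct Foo Foo;

    /* conforming: Error **errp comes last */
    int foo_connect(Foo *f, bool return_path, Error **errp);

    /* non-conforming: errp is not last */
    int foo_connect_bad(Foo *f, Error **errp, bool return_path);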

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-11-armbru@redhat.com>
---
 migration/rdma.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 8fd1b314b5..bc6d8248f2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2552,7 +2552,8 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     }
 }
 
-static int qemu_rdma_connect(RDMAContext *rdma, Error **errp, bool return_path)
+static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
+                             Error **errp)
 {
     RDMACapabilities cap = {
                                 .version = RDMA_CONTROL_VERSION_CURRENT,
@@ -4183,7 +4184,7 @@ void rdma_start_outgoing_migration(void *opaque,
     }
 
     trace_rdma_start_outgoing_migration_after_rdma_source_init();
-    ret = qemu_rdma_connect(rdma, errp, false);
+    ret = qemu_rdma_connect(rdma, false, errp);
 
     if (ret) {
         goto err;
@@ -4204,7 +4205,7 @@ void rdma_start_outgoing_migration(void *opaque,
             goto return_path_err;
         }
 
-        ret = qemu_rdma_connect(rdma_return_path, errp, true);
+        ret = qemu_rdma_connect(rdma_return_path, true, errp);
 
         if (ret) {
             goto return_path_err;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 18/65] migration/rdma: Eliminate error_propagate()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (16 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 17/65] migration/rdma: Put @errp parameter last Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 19/65] migration/rdma: Drop rdma_add_block() error handling Juan Quintela
                   ` (47 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

When all we do with an Error received into a local variable is
propagate it somewhere else, we can just as well receive it there
right away.
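
A minimal sketch of the pattern, assuming QEMU's qapi/error.h API and
a hypothetical callee helper():

    #include "qapi/error.h"

    void helper(Error **errp);    /* hypothetical, may set an error */

    /* before: receive into a local, then propagate */
    static void do_thing_old(Error **errp)
    {
        Error *local_err = NULL;

        helper(&local_err);
        if (local_err) {
            error_propagate(errp, local_err);
        }
    }

    /* after: receive the error where it is wanted in the first place */
    static void do_thing_new(Error **errp)
    {
        helper(errp);
    }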

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-12-armbru@redhat.com>
---
 migration/rdma.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bc6d8248f2..c858d3fbe4 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2465,7 +2465,6 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 {
     int ret, idx;
-    Error *local_err = NULL, **temp = &local_err;
 
     /*
      * Will be validated against destination's actual capabilities
@@ -2473,14 +2472,14 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
      */
     rdma->pin_all = pin_all;
 
-    ret = qemu_rdma_resolve_host(rdma, temp);
+    ret = qemu_rdma_resolve_host(rdma, errp);
     if (ret) {
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error allocating pd and cq! Your mlock()"
+        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
                     " limits may be too low. Please check $ ulimit -a # and "
                     "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
@@ -2488,13 +2487,13 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error allocating qp!");
+        ERROR(errp, "rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_init_ram_blocks(rdma);
     if (ret) {
-        ERROR(temp, "rdma migration: error initializing ram blocks!");
+        ERROR(errp, "rdma migration: error initializing ram blocks!");
         goto err_rdma_source_init;
     }
 
@@ -2509,7 +2508,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret) {
-            ERROR(temp, "rdma migration: error registering %d control!",
+            ERROR(errp, "rdma migration: error registering %d control!",
                                                             idx);
             goto err_rdma_source_init;
         }
@@ -2518,7 +2517,6 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     return 0;
 
 err_rdma_source_init:
-    error_propagate(errp, local_err);
     qemu_rdma_cleanup(rdma);
     return -1;
 }
@@ -4111,7 +4109,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 {
     int ret;
     RDMAContext *rdma;
-    Error *local_err = NULL;
 
     trace_rdma_start_incoming_migration();
 
@@ -4121,13 +4118,12 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
         return;
     }
 
-    rdma = qemu_rdma_data_init(host_port, &local_err);
+    rdma = qemu_rdma_data_init(host_port, errp);
     if (rdma == NULL) {
         goto err;
     }
 
-    ret = qemu_rdma_dest_init(rdma, &local_err);
-
+    ret = qemu_rdma_dest_init(rdma, errp);
     if (ret) {
         goto err;
     }
@@ -4150,7 +4146,6 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 cleanup_rdma:
     qemu_rdma_cleanup(rdma);
 err:
-    error_propagate(errp, local_err);
     if (rdma) {
         g_free(rdma->host);
         g_free(rdma->host_port);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 19/65] migration/rdma: Drop rdma_add_block() error handling
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (17 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 18/65] migration/rdma: Eliminate error_propagate() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 20/65] migration/rdma: Drop qemu_rdma_search_ram_block() " Juan Quintela
                   ` (46 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

rdma_add_block() can't fail.  Return void, and drop the unreachable
error handling.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-13-armbru@redhat.com>
---
 migration/rdma.c | 30 +++++++++---------------------
 1 file changed, 9 insertions(+), 21 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c858d3fbe4..466725dbf0 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -559,9 +559,9 @@ static inline uint8_t *ram_chunk_end(const RDMALocalBlock *rdma_ram_block,
     return result;
 }
 
-static int rdma_add_block(RDMAContext *rdma, const char *block_name,
-                         void *host_addr,
-                         ram_addr_t block_offset, uint64_t length)
+static void rdma_add_block(RDMAContext *rdma, const char *block_name,
+                           void *host_addr,
+                           ram_addr_t block_offset, uint64_t length)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     RDMALocalBlock *block;
@@ -615,8 +615,6 @@ static int rdma_add_block(RDMAContext *rdma, const char *block_name,
                          block->nb_chunks);
 
     local->nb_blocks++;
-
-    return 0;
 }
 
 /*
@@ -630,7 +628,8 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
     void *host_addr = qemu_ram_get_host_addr(rb);
     ram_addr_t block_offset = qemu_ram_get_offset(rb);
     ram_addr_t length = qemu_ram_get_used_length(rb);
-    return rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    rdma_add_block(opaque, block_name, host_addr, block_offset, length);
+    return 0;
 }
 
 /*
@@ -638,7 +637,7 @@ static int qemu_rdma_init_one_block(RAMBlock *rb, void *opaque)
  * identify chunk boundaries inside each RAMBlock and also be referenced
  * during dynamic page registration.
  */
-static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
+static void qemu_rdma_init_ram_blocks(RDMAContext *rdma)
 {
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
     int ret;
@@ -646,14 +645,11 @@ static int qemu_rdma_init_ram_blocks(RDMAContext *rdma)
     assert(rdma->blockmap == NULL);
     memset(local, 0, sizeof *local);
     ret = foreach_not_ignored_block(qemu_rdma_init_one_block, rdma);
-    if (ret) {
-        return ret;
-    }
+    assert(!ret);
     trace_qemu_rdma_init_ram_blocks(local->nb_blocks);
     rdma->dest_blocks = g_new0(RDMADestBlock,
                                rdma->local_ram_blocks.nb_blocks);
     local->init = true;
-    return 0;
 }
 
 /*
@@ -2491,11 +2487,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
         goto err_rdma_source_init;
     }
 
-    ret = qemu_rdma_init_ram_blocks(rdma);
-    if (ret) {
-        ERROR(errp, "rdma migration: error initializing ram blocks!");
-        goto err_rdma_source_init;
-    }
+    qemu_rdma_init_ram_blocks(rdma);
 
     /* Build the hash that maps from offset to RAMBlock */
     rdma->blockmap = g_hash_table_new(g_direct_hash, g_direct_equal);
@@ -3438,11 +3430,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         goto err_rdma_dest_wait;
     }
 
-    ret = qemu_rdma_init_ram_blocks(rdma);
-    if (ret) {
-        error_report("rdma migration: error initializing ram blocks!");
-        goto err_rdma_dest_wait;
-    }
+    qemu_rdma_init_ram_blocks(rdma);
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 20/65] migration/rdma: Drop qemu_rdma_search_ram_block() error handling
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (18 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 19/65] migration/rdma: Drop rdma_add_block() error handling Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 21/65] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool Juan Quintela
                   ` (45 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
unreachable error handling.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-14-armbru@redhat.com>
---
 migration/rdma.c | 24 ++++++++----------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 466725dbf0..b412dad542 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1231,15 +1231,13 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
  *
  * Once the block is found, also identify which 'chunk' within that
  * block that the page belongs to.
- *
- * This search cannot fail or the migration will fail.
  */
-static int qemu_rdma_search_ram_block(RDMAContext *rdma,
-                                      uintptr_t block_offset,
-                                      uint64_t offset,
-                                      uint64_t length,
-                                      uint64_t *block_index,
-                                      uint64_t *chunk_index)
+static void qemu_rdma_search_ram_block(RDMAContext *rdma,
+                                       uintptr_t block_offset,
+                                       uint64_t offset,
+                                       uint64_t length,
+                                       uint64_t *block_index,
+                                       uint64_t *chunk_index)
 {
     uint64_t current_addr = block_offset + offset;
     RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
@@ -1251,8 +1249,6 @@ static int qemu_rdma_search_ram_block(RDMAContext *rdma,
     *block_index = block->index;
     *chunk_index = ram_chunk_index(block->local_host_addr,
                 block->local_host_addr + (current_addr - block->offset));
-
-    return 0;
 }
 
 /*
@@ -2341,12 +2337,8 @@ static int qemu_rdma_write(RDMAContext *rdma,
         rdma->current_length = 0;
         rdma->current_addr = current_addr;
 
-        ret = qemu_rdma_search_ram_block(rdma, block_offset,
-                                         offset, len, &index, &chunk);
-        if (ret) {
-            error_report("ram block search failed");
-            return ret;
-        }
+        qemu_rdma_search_ram_block(rdma, block_offset,
+                                   offset, len, &index, &chunk);
         rdma->current_index = index;
         rdma->current_chunk = chunk;
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 21/65] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (19 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 20/65] migration/rdma: Drop qemu_rdma_search_ram_block() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 22/65] migration/rdma: Use bool for two RDMAContext flags Juan Quintela
                   ` (44 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_buffer_mergeable() is semantically a predicate.  It returns
int 0 or 1.  Return bool instead, and fix the function name's
spelling.
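
Illustrative only, with a hypothetical predicate (not the QEMU code):

    #include <stdbool.h>

    /* before: semantically a predicate, but typed as int */
    static int is_ready_old(int state) { return state == 1 ? 1 : 0; }

    /* after: the type says what the result means */
    static bool is_ready_new(int state) { return state == 1; }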

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-15-armbru@redhat.com>
---
 migration/rdma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index b412dad542..2e62d2cd0a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2264,7 +2264,7 @@ static int qemu_rdma_write_flush(RDMAContext *rdma)
     return 0;
 }
 
-static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
+static inline bool qemu_rdma_buffer_mergeable(RDMAContext *rdma,
                     uint64_t offset, uint64_t len)
 {
     RDMALocalBlock *block;
@@ -2272,11 +2272,11 @@ static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
     uint8_t *chunk_end;
 
     if (rdma->current_index < 0) {
-        return 0;
+        return false;
     }
 
     if (rdma->current_chunk < 0) {
-        return 0;
+        return false;
     }
 
     block = &(rdma->local_ram_blocks.block[rdma->current_index]);
@@ -2284,29 +2284,29 @@ static inline int qemu_rdma_buffer_mergable(RDMAContext *rdma,
     chunk_end = ram_chunk_end(block, rdma->current_chunk);
 
     if (rdma->current_length == 0) {
-        return 0;
+        return false;
     }
 
     /*
      * Only merge into chunk sequentially.
      */
     if (offset != (rdma->current_addr + rdma->current_length)) {
-        return 0;
+        return false;
     }
 
     if (offset < block->offset) {
-        return 0;
+        return false;
     }
 
     if ((offset + len) > (block->offset + block->length)) {
-        return 0;
+        return false;
     }
 
     if ((host_addr + len) > chunk_end) {
-        return 0;
+        return false;
     }
 
-    return 1;
+    return true;
 }
 
 /*
@@ -2329,7 +2329,7 @@ static int qemu_rdma_write(RDMAContext *rdma,
     int ret;
 
     /* If we cannot merge it, we flush the current buffer first. */
-    if (!qemu_rdma_buffer_mergable(rdma, current_addr, len)) {
+    if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(rdma);
         if (ret) {
             return ret;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 22/65] migration/rdma: Use bool for two RDMAContext flags
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (20 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 21/65] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 23/65] migration/rdma: Fix or document problematic uses of errno Juan Quintela
                   ` (43 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

@error_reported and @received_error are flags.  The latter is even
assigned bool true.  Change them from int to bool.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-16-armbru@redhat.com>
---
 migration/rdma.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 2e62d2cd0a..dffca30382 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -91,7 +91,7 @@ static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
             if (!rdma->error_reported) { \
                 error_report("RDMA is in an error state waiting migration" \
                                 " to abort!"); \
-                rdma->error_reported = 1; \
+                rdma->error_reported = true; \
             } \
             return rdma->error_state; \
         } \
@@ -365,8 +365,8 @@ typedef struct RDMAContext {
      * and remember the error state.
      */
     int error_state;
-    int error_reported;
-    int received_error;
+    bool error_reported;
+    bool received_error;
 
     /*
      * Description of ram blocks used throughout the code.
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 23/65] migration/rdma: Fix or document problematic uses of errno
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (21 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 22/65] migration/rdma: Use bool for two RDMAContext flags Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 24/65] migration/rdma: Ditch useless numeric error codes in error messages Juan Quintela
                   ` (42 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

We use errno after calling Libibverbs functions that are not
documented to set errno (manual page does not mention errno), or where
the documentation is unclear ("returns [...] the value of errno on
failure").  While this could be read as "sets errno and returns it",
a glance at the source code[*] kills that hope:

    static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
                                    struct ibv_send_wr **bad_wr)
    {
            return qp->context->ops.post_send(qp, wr, bad_wr);
    }

The callback can be

    static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
                              struct ibv_send_wr **bad)
    {
            /* This version of driver supports RAW QP only.
             * Posting WR is done directly in the application.
             */
            return EOPNOTSUPP;
    }

Neither of them touches errno.
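
A standalone sketch of why reading errno after such a call is wrong,
using a hypothetical stand-in for the Libibverbs function:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* returns an errno value on failure, never touches errno itself */
    static int fake_ibv_call(void)
    {
        return EOPNOTSUPP;
    }

    int main(void)
    {
        errno = 0;
        int rc = fake_ibv_call();

        /* wrong: the callee never set errno */
        printf("errno says:  %s\n", strerror(errno));  /* "Success" */
        /* right: report the returned code */
        printf("return says: %s\n", strerror(rc));
        return 0;
    }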

One of these errno uses is easy to fix, so do that now.  Several more
will go away later in the series; add temporary FIXME comments.
Three will remain; add TODO comments.  TODO, not FIXME, because the
bug might be in Libibverbs documentation.

[*] https://github.com/linux-rdma/rdma-core.git
    commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-17-armbru@redhat.com>
---
 migration/rdma.c | 45 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 39 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index dffca30382..35b0129ae6 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -853,6 +853,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
         for (x = 0; x < num_devices; x++) {
             verbs = ibv_open_device(dev_list[x]);
+            /*
+             * ibv_open_device() is not documented to set errno.  If
+             * it does, it's somebody else's doc bug.  If it doesn't,
+             * the use of errno below is wrong.
+             * TODO Find out whether ibv_open_device() sets errno.
+             */
             if (!verbs) {
                 if (errno == EPERM) {
                     continue;
@@ -1162,11 +1168,7 @@ static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr,
     ret = ibv_advise_mr(pd, advice,
                         IBV_ADVISE_MR_FLAG_FLUSH, &sg_list, 1);
     /* ignore the error */
-    if (ret) {
-        trace_qemu_rdma_advise_mr(name, len, addr, strerror(errno));
-    } else {
-        trace_qemu_rdma_advise_mr(name, len, addr, "successed");
-    }
+    trace_qemu_rdma_advise_mr(name, len, addr, strerror(ret));
 #endif
 }
 
@@ -1183,7 +1185,12 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
                     local->block[i].local_host_addr,
                     local->block[i].length, access
                     );
-
+        /*
+         * ibv_reg_mr() is not documented to set errno.  If it does,
+         * it's somebody else's doc bug.  If it doesn't, the use of
+         * errno below is wrong.
+         * TODO Find out whether ibv_reg_mr() sets errno.
+         */
         if (!local->block[i].mr &&
             errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
                 access |= IBV_ACCESS_ON_DEMAND;
@@ -1291,6 +1298,12 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
         trace_qemu_rdma_register_and_get_keys(len, chunk_start);
 
         block->pmr[chunk] = ibv_reg_mr(rdma->pd, chunk_start, len, access);
+        /*
+         * ibv_reg_mr() is not documented to set errno.  If it does,
+         * it's somebody else's doc bug.  If it doesn't, the use of
+         * errno below is wrong.
+         * TODO Find out whether ibv_reg_mr() sets errno.
+         */
         if (!block->pmr[chunk] &&
             errno == ENOTSUP && rdma_support_odp(rdma->verbs)) {
             access |= IBV_ACCESS_ON_DEMAND;
@@ -1408,6 +1421,11 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         block->remote_keys[chunk] = 0;
 
         if (ret != 0) {
+            /*
+             * FIXME perror() is problematic, because ibv_dereg_mr() is
+             * not documented to set errno.  Will go away later in
+             * this series.
+             */
             perror("unregistration chunk failed");
             return -ret;
         }
@@ -1658,6 +1676,11 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
         if (ret) {
+            /*
+             * FIXME perror() is problematic, because ibv_get_cq_event() is
+             * not documented to set errno.  Will go away later in
+             * this series.
+             */
             perror("ibv_get_cq_event");
             goto err_block_for_wrid;
         }
@@ -2210,6 +2233,11 @@ retry:
         goto retry;
 
     } else if (ret > 0) {
+        /*
+         * FIXME perror() is problematic, because whether
+         * ibv_post_send() sets errno is unclear.  Will go away later
+         * in this series.
+         */
         perror("rdma migration: post rdma write failed");
         return -ret;
     }
@@ -2579,6 +2607,11 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
     }
     if (ret) {
+        /*
+         * FIXME perror() is wrong, because
+         * qemu_get_cm_event_timeout() can fail without setting errno.
+         * Will go away later in this series.
+         */
         perror("rdma_get_cm_event after rdma_connect");
         ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 24/65] migration/rdma: Ditch useless numeric error codes in error messages
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (22 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 23/65] migration/rdma: Fix or document problematic uses of errno Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 25/65] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Juan Quintela
                   ` (41 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Several error messages include numeric error codes returned by failed
functions:

* ibv_poll_cq() returns an unspecified negative value.  Useless.

* rdma_accept and rdma_get_cm_event() return -1.  Useless.

* qemu_rdma_poll() returns either -1 or an unspecified negative
  value.  Useless.

* qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
  qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
  qemu_rdma_write() return a negative value that may or may not be an
  errno value.  While reporting human-readable errno
  information (which a number is not) can be useful, reporting an
  error code that may or may not be an errno value is useless.

Drop these error codes from the error messages.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-18-armbru@redhat.com>
---
 migration/rdma.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 35b0129ae6..c4197c6437 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1476,7 +1476,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     }
 
     if (ret < 0) {
-        error_report("ibv_poll_cq return %d", ret);
+        error_report("ibv_poll_cq failed");
         return ret;
     }
 
@@ -2226,7 +2226,7 @@ retry:
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
             error_report("rdma migration: failed to make "
-                         "room in full send queue! %d", ret);
+                         "room in full send queue!");
             return ret;
         }
 
@@ -2819,7 +2819,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
     ret = qemu_rdma_write_flush(rdma);
     if (ret < 0) {
         rdma->error_state = ret;
-        error_setg(errp, "qemu_rdma_write_flush returned %d", ret);
+        error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
 
@@ -2839,7 +2839,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 
             if (ret < 0) {
                 rdma->error_state = ret;
-                error_setg(errp, "qemu_rdma_exchange_send returned %d", ret);
+                error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
 
@@ -2929,7 +2929,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
 
         if (ret < 0) {
             rdma->error_state = ret;
-            error_setg(errp, "qemu_rdma_exchange_recv returned %d", ret);
+            error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
 
@@ -3271,7 +3271,7 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
      */
     ret = qemu_rdma_write(rdma, block_offset, offset, size);
     if (ret < 0) {
-        error_report("rdma migration: write error! %d", ret);
+        error_report("rdma migration: write error");
         goto err;
     }
 
@@ -3287,7 +3287,7 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
         ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
 
         if (ret < 0) {
-            error_report("rdma migration: polling error! %d", ret);
+            error_report("rdma migration: polling error");
             goto err;
         }
 
@@ -3303,7 +3303,7 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
         ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
 
         if (ret < 0) {
-            error_report("rdma migration: polling error! %d", ret);
+            error_report("rdma migration: polling error");
             goto err;
         }
 
@@ -3478,13 +3478,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
     if (ret) {
-        error_report("rdma_accept returns %d", ret);
+        error_report("rdma_accept failed");
         goto err_rdma_dest_wait;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret) {
-        error_report("rdma_accept get_cm_event failed %d", ret);
+        error_report("rdma_accept get_cm_event failed");
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 25/65] migration/rdma: Fix io_writev(), io_readv() methods to obey contract
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (23 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 24/65] migration/rdma: Ditch useless numeric error codes in error messages Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 26/65] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Juan Quintela
                   ` (40 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

QIOChannelClass methods qio_channel_rdma_readv() and
qio_channel_rdma_writev() violate their method contract when
rdma->error_state is non-zero:

1. They return whatever is in rdma->error_state then.  Only -1 will be
   fine.  -2 will be misinterpreted as "would block".  Anything less
   than -2 isn't defined in the contract.  A positive value would be
   misinterpreted as success, but I believe that's not actually
   possible.

2. They neglect to set an error then.  If something up the call stack
   dereferences the error when failure is returned, it will crash.  If
   it ignores the return value and checks the error instead, it will
   miss the error.
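
For reference, a sketch of the contract as described above (comment
only; QIO_CHANNEL_ERR_BLOCK is QEMU's name for the -2 case):

    /*
     * io_readv()/io_writev() methods must return:
     *   >= 0                        number of bytes transferred
     *   QIO_CHANNEL_ERR_BLOCK (-2)  the channel would block
     *   -1                          error, with *errp set
     * Returning rdma->error_state directly satisfies none of these
     * unless it happens to be -1, and it never sets *errp.
     */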

Crap like this happens when return statements hide in macros,
especially when their uses are far away from the definition.

I elected not to investigate how callers are impacted.

Expand the two bad macro uses, so we can set an error and return -1.
The next commit will then get rid of the macro altogether.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-19-armbru@redhat.com>
---
 migration/rdma.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c4197c6437..18be228e3b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2810,7 +2810,11 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         return -1;
     }
 
-    CHECK_ERROR_STATE();
+    if (rdma->error_state) {
+        error_setg(errp,
+                   "RDMA is in an error state waiting migration to abort!");
+        return -1;
+    }
 
     /*
      * Push out any writes that
@@ -2896,7 +2900,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         return -1;
     }
 
-    CHECK_ERROR_STATE();
+    if (rdma->error_state) {
+        error_setg(errp,
+                   "RDMA is in an error state waiting migration to abort!");
+        return -1;
+    }
 
     for (i = 0; i < niov; i++) {
         size_t want = iov[i].iov_len;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 26/65] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (24 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 25/65] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 27/65] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Juan Quintela
                   ` (39 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Hiding return statements in macros is a bad idea.  Use a function
instead, and open code the return part.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-20-armbru@redhat.com>
---
 migration/rdma.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 18be228e3b..30e2c817f2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -85,18 +85,6 @@
  */
 static uint32_t known_capabilities = RDMA_CAPABILITY_PIN_ALL;
 
-#define CHECK_ERROR_STATE() \
-    do { \
-        if (rdma->error_state) { \
-            if (!rdma->error_reported) { \
-                error_report("RDMA is in an error state waiting migration" \
-                                " to abort!"); \
-                rdma->error_reported = true; \
-            } \
-            return rdma->error_state; \
-        } \
-    } while (0)
-
 /*
  * A work request ID is 64-bits and we split up these bits
  * into 3 parts:
@@ -451,6 +439,16 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
+static int check_error_state(RDMAContext *rdma)
+{
+    if (rdma->error_state && !rdma->error_reported) {
+        error_report("RDMA is in an error state waiting migration"
+                     " to abort!");
+        rdma->error_reported = true;
+    }
+    return rdma->error_state;
+}
+
 static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
 {
     RDMALocalBlock *local_block;
@@ -3268,7 +3266,10 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     qemu_fflush(f);
 
@@ -3574,7 +3575,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     local = &rdma->local_ram_blocks;
     do {
@@ -3878,6 +3882,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
+    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -3889,7 +3894,10 @@ static int qemu_rdma_registration_start(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     trace_qemu_rdma_registration_start(flags);
     qemu_put_be64(f, RAM_SAVE_FLAG_HOOK);
@@ -3920,7 +3928,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
         return -EIO;
     }
 
-    CHECK_ERROR_STATE();
+    ret = check_error_state(rdma);
+    if (ret) {
+        return ret;
+    }
 
     qemu_fflush(f);
     ret = qemu_rdma_drain_cq(rdma);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 27/65] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (25 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 26/65] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 28/65] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Juan Quintela
                   ` (38 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
they find one that works.  If none works, they return the first Error
set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.

qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
ibv_open_device() fails.  If a later address fails differently, we use
that Error instead, or else the generic one.  Harmless enough, but
needs fixing all the same.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-21-armbru@redhat.com>
---
 migration/rdma.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index 30e2c817f2..9c576bdcba 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -861,6 +861,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 if (errno == EPERM) {
                     continue;
                 } else {
+                    error_setg_errno(errp, errno,
+                                     "could not open RDMA device context");
                     return -EINVAL;
                 }
             }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 28/65] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (26 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 27/65] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 29/65] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Juan Quintela
                   ` (37 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_get_cm_event_timeout() neglects to set an error when it fails
because rdma_get_cm_event() fails.  Harmless, as its caller
qemu_rdma_connect() substitutes a generic error then.  Fix it anyway.

qemu_rdma_connect() also sets the generic error when its own call of
rdma_get_cm_event() fails.  Make the error handling more obvious: set
a specific error right after rdma_get_cm_event() fails.  Delete the
generic error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-22-armbru@redhat.com>
---
 migration/rdma.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9c576bdcba..0039295a15 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2555,7 +2555,11 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
         ERROR(errp, "failed to poll cm event, errno=%i", errno);
         return -1;
     } else if (poll_fd.revents & POLLIN) {
-        return rdma_get_cm_event(rdma->channel, cm_event);
+        if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
+            ERROR(errp, "failed to get cm event");
+            return -1;
+        }
+        return 0;
     } else {
         ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
         return -1;
@@ -2605,6 +2609,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
         ret = qemu_get_cm_event_timeout(rdma, &cm_event, 5000, errp);
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
+        if (ret < 0) {
+            ERROR(errp, "failed to get cm event");
+        }
     }
     if (ret) {
         /*
@@ -2613,7 +2620,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
          * Will go away later in this series.
          */
         perror("rdma_get_cm_event after rdma_connect");
-        ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 29/65] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (27 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 28/65] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 30/65] migration/rdma: Fix QEMUFileHooks method return values Juan Quintela
                   ` (36 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_data_init() neglects to set an Error when it fails because
@host_port is null.  Fortunately, no caller passes null, so this is
merely a latent bug.  Drop the flawed code handling a null argument.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-23-armbru@redhat.com>
---
 migration/rdma.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 0039295a15..55eb8222ea 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2767,25 +2767,22 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
     RDMAContext *rdma = NULL;
     InetSocketAddress *addr;
 
-    if (host_port) {
-        rdma = g_new0(RDMAContext, 1);
-        rdma->current_index = -1;
-        rdma->current_chunk = -1;
+    rdma = g_new0(RDMAContext, 1);
+    rdma->current_index = -1;
+    rdma->current_chunk = -1;
 
-        addr = g_new(InetSocketAddress, 1);
-        if (!inet_parse(addr, host_port, NULL)) {
-            rdma->port = atoi(addr->port);
-            rdma->host = g_strdup(addr->host);
-            rdma->host_port = g_strdup(host_port);
-        } else {
-            ERROR(errp, "bad RDMA migration address '%s'", host_port);
-            g_free(rdma);
-            rdma = NULL;
-        }
-
-        qapi_free_InetSocketAddress(addr);
+    addr = g_new(InetSocketAddress, 1);
+    if (!inet_parse(addr, host_port, NULL)) {
+        rdma->port = atoi(addr->port);
+        rdma->host = g_strdup(addr->host);
+        rdma->host_port = g_strdup(host_port);
+    } else {
+        ERROR(errp, "bad RDMA migration address '%s'", host_port);
+        g_free(rdma);
+        rdma = NULL;
     }
 
+    qapi_free_InetSocketAddress(addr);
     return rdma;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 30/65] migration/rdma: Fix QEMUFileHooks method return values
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (28 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 29/65] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 31/65] migration/rdma: Fix rdma_getaddrinfo() error checking Juan Quintela
                   ` (35 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

The QEMUFileHooks methods don't come with a written contract.  Digging
through the code calling them, we find:

* save_page():

  Negative values RAM_SAVE_CONTROL_DELAYED and
  RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
  an unspecified error.

  qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
  believe the latter is always negative.  Nothing stops either of them
  from clashing with the special values, though.  Feels unlikely, but
  fix it anyway to return only the special values and -1 (see the
  caller sketch after this list).

* before_ram_iterate(), after_ram_iterate():

  Negative value means error.  qemu_rdma_registration_start() and
  qemu_rdma_registration_stop() comply as far as I can tell.  Make
  them comply *obviously*, by returning -1 on error.

* hook_ram_load():

  Negative value means error.  rdma_load_hook() already returns -1 on
  error.  Leave it alone.
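
For illustration (not code from the tree), a caller honouring the
save_page() contract spelled out above would handle the hook's return
value roughly like this; how @ret is obtained is left out:

    if (ret == RAM_SAVE_CONTROL_DELAYED) {
        /* the hook queued the transfer; completion is signalled later */
    } else if (ret == RAM_SAVE_CONTROL_NOT_SUPP) {
        /* the hook declined; fall back to the normal page-save path */
    } else if (ret < 0) {
        /* any other negative value is an unspecified error */
    }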

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-24-armbru@redhat.com>
---
 migration/rdma.c | 79 +++++++++++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 42 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 55eb8222ea..974edde6a3 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3268,12 +3268,11 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
     rdma = qatomic_rcu_read(&rioc->rdmaout);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     qemu_fflush(f);
@@ -3329,9 +3328,10 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
     }
 
     return RAM_SAVE_CONTROL_DELAYED;
+
 err:
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 static void rdma_accept_incoming_migration(void *opaque);
@@ -3577,12 +3577,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     rdma = qatomic_rcu_read(&rioc->rdmain);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     local = &rdma->local_ram_blocks;
@@ -3615,7 +3614,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                              (unsigned int)comp->block_idx,
                              rdma->local_ram_blocks.nb_blocks);
                 ret = -EIO;
-                goto out;
+                goto err;
             }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
 
@@ -3627,7 +3626,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
         case RDMA_CONTROL_REGISTER_FINISHED:
             trace_qemu_rdma_registration_handle_finished();
-            goto out;
+            return 0;
 
         case RDMA_CONTROL_RAM_BLOCKS_REQUEST:
             trace_qemu_rdma_registration_handle_ram_blocks();
@@ -3648,7 +3647,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 if (ret) {
                     error_report("rdma migration: error dest "
                                     "registering ram blocks");
-                    goto out;
+                    goto err;
                 }
             }
 
@@ -3687,7 +3686,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("rdma migration: error sending remote info");
-                goto out;
+                goto err;
             }
 
             break;
@@ -3714,7 +3713,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                                  (unsigned int)reg->current_index,
                                  rdma->local_ram_blocks.nb_blocks);
                     ret = -ENOENT;
-                    goto out;
+                    goto err;
                 }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
                 if (block->is_ram_block) {
@@ -3724,7 +3723,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             block->block_name, block->offset,
                             reg->key.current_addr);
                         ret = -ERANGE;
-                        goto out;
+                        goto err;
                     }
                     host_addr = (block->local_host_addr +
                                 (reg->key.current_addr - block->offset));
@@ -3740,7 +3739,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             " chunk: %" PRIx64,
                             block->block_name, reg->key.chunk);
                         ret = -ERANGE;
-                        goto out;
+                        goto err;
                     }
                 }
                 chunk_start = ram_chunk_start(block, chunk);
@@ -3752,7 +3751,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             chunk, chunk_start, chunk_end)) {
                     error_report("cannot get rkey");
                     ret = -EINVAL;
-                    goto out;
+                    goto err;
                 }
                 reg_result->rkey = tmp_rkey;
 
@@ -3769,7 +3768,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("Failed to send control buffer");
-                goto out;
+                goto err;
             }
             break;
         case RDMA_CONTROL_UNREGISTER_REQUEST:
@@ -3792,7 +3791,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 if (ret != 0) {
                     perror("rdma unregistration chunk failed");
                     ret = -ret;
-                    goto out;
+                    goto err;
                 }
 
                 rdma->total_registrations--;
@@ -3805,24 +3804,23 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (ret < 0) {
                 error_report("Failed to send control buffer");
-                goto out;
+                goto err;
             }
             break;
         case RDMA_CONTROL_REGISTER_RESULT:
             error_report("Invalid RESULT message at dest.");
             ret = -EIO;
-            goto out;
+            goto err;
         default:
             error_report("Unknown control message %s", control_desc(head.type));
             ret = -EIO;
-            goto out;
+            goto err;
         }
     } while (1);
-out:
-    if (ret < 0) {
-        rdma->error_state = ret;
-    }
-    return ret;
+
+err:
+    rdma->error_state = ret;
+    return -1;
 }
 
 /* Destination:
@@ -3844,7 +3842,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
     rdma = qatomic_rcu_read(&rioc->rdmain);
 
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
     /* Find the matching RAMBlock in our local list */
@@ -3857,7 +3855,7 @@ rdma_block_notification_handle(QEMUFile *f, const char *name)
 
     if (found == -1) {
         error_report("RAMBlock '%s' not found on destination", name);
-        return -ENOENT;
+        return -1;
     }
 
     rdma->local_ram_blocks.block[curr].src_index = rdma->next_src_index;
@@ -3887,7 +3885,6 @@ static int qemu_rdma_registration_start(QEMUFile *f,
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
-    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -3896,12 +3893,11 @@ static int qemu_rdma_registration_start(QEMUFile *f,
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     trace_qemu_rdma_registration_start(flags);
@@ -3930,12 +3926,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     RCU_READ_LOCK_GUARD();
     rdma = qatomic_rcu_read(&rioc->rdmaout);
     if (!rdma) {
-        return -EIO;
+        return -1;
     }
 
-    ret = check_error_state(rdma);
-    if (ret) {
-        return ret;
+    if (check_error_state(rdma)) {
+        return -1;
     }
 
     qemu_fflush(f);
@@ -3966,7 +3961,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     qemu_rdma_reg_whole_ram_blocks : NULL);
         if (ret < 0) {
             fprintf(stderr, "receiving remote info!");
-            return ret;
+            return -1;
         }
 
         nb_dest_blocks = resp.len / sizeof(RDMADestBlock);
@@ -3989,7 +3984,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     "not identical on both the source and destination.",
                     local->nb_blocks, nb_dest_blocks);
             rdma->error_state = -EINVAL;
-            return -EINVAL;
+            return -1;
         }
 
         qemu_rdma_move_header(rdma, reg_result_idx, &resp);
@@ -4005,7 +4000,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                         local->block[i].length,
                         rdma->dest_blocks[i].length);
                 rdma->error_state = -EINVAL;
-                return -EINVAL;
+                return -1;
             }
             local->block[i].remote_host_addr =
                     rdma->dest_blocks[i].remote_host_addr;
@@ -4025,7 +4020,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     return 0;
 err:
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 static const QEMUFileHooks rdma_read_hooks = {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 31/65] migration/rdma: Fix rdma_getaddrinfo() error checking
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (29 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 30/65] migration/rdma: Fix QEMUFileHooks method return values Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 32/65] migration/rdma: Return -1 instead of negative errno code Juan Quintela
                   ` (34 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

rdma_getaddrinfo() returns 0 on success.  On error, it returns one of
the EAI_ error codes like getaddrinfo() does, or -1 with errno set.
This is broken by design: all POSIX implicitly requires of the EAI_
error codes is that they are non-zero, so they could clash with -1.
Nothing we can do about this design flaw.

Both callers of rdma_getaddrinfo() recognize only negative values as
errors.  This works only because systems elect to make the EAI_ error
codes negative.

Best not to rely on that: change the callers to treat any non-zero
value as failure.  Also change them to return -1 instead of the value
received from rdma_getaddrinfo() on failure, to avoid returning
positive error values.
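
For reference, this is the hunk below in qemu_rdma_resolve_host()
with the reasoning spelled out as a comment (illustration only):

    ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
    /*
     * POSIX only promises the EAI_ codes are non-zero.  If one were
     * positive, "if (ret < 0)" would wrongly fall through to the
     * success path, so treat any non-zero return as failure.
     */
    if (ret) {
        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
        goto err_resolve_get_addr;
    }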

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-25-armbru@redhat.com>
---
 migration/rdma.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 974edde6a3..dd0b073792 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -941,14 +941,14 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
         ERROR(errp, "RDMA hostname has not been set");
-        return -EINVAL;
+        return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
         ERROR(errp, "could not create CM channel");
-        return -EINVAL;
+        return -1;
     }
 
     /* create CM id */
@@ -962,7 +962,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     port_str[15] = '\0';
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
-    if (ret < 0) {
+    if (ret) {
         ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
         goto err_resolve_get_addr;
     }
@@ -1004,7 +1004,6 @@ route:
                 rdma_event_str(cm_event->event));
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
-        ret = -EINVAL;
         goto err_resolve_get_addr;
     }
     rdma_ack_cm_event(cm_event);
@@ -1025,7 +1024,6 @@ route:
         ERROR(errp, "result not equal to event_route_resolved: %s",
                         rdma_event_str(cm_event->event));
         rdma_ack_cm_event(cm_event);
-        ret = -EINVAL;
         goto err_resolve_get_addr;
     }
     rdma_ack_cm_event(cm_event);
@@ -1040,7 +1038,7 @@ err_resolve_get_addr:
 err_resolve_create_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
-    return ret;
+    return -1;
 }
 
 /*
@@ -2695,7 +2693,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     port_str[15] = '\0';
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
-    if (ret < 0) {
+    if (ret) {
         ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
         goto err_dest_init_bind_addr;
     }
@@ -2739,7 +2737,7 @@ err_dest_init_create_listen_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
     rdma->error_state = ret;
-    return ret;
+    return -1;
 
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 32/65] migration/rdma: Return -1 instead of negative errno code
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (30 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 31/65] migration/rdma: Fix rdma_getaddrinfo() error checking Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 33/65] migration/rdma: Dumb down remaining int error values to -1 Juan Quintela
                   ` (33 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Several functions return negative errno codes on failure.  Callers
check for specific codes exactly never.  For some of the functions,
callers couldn't check even if they wanted to, because the functions
also return negative values that aren't errno codes, leaving readers
confused about what the function actually returns.

Clean up and simplify: return -1 instead of negative errno code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-26-armbru@redhat.com>
---
 migration/rdma.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index dd0b073792..bc39b7ab2e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -863,14 +863,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 } else {
                     error_setg_errno(errp, errno,
                                      "could not open RDMA device context");
-                    return -EINVAL;
+                    return -1;
                 }
             }
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
                 ERROR(errp, "Could not query initial IB port");
-                return -EINVAL;
+                return -1;
             }
 
             if (port_attr.link_layer == IBV_LINK_LAYER_INFINIBAND) {
@@ -895,7 +895,7 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                 ERROR(errp, "You only have RoCE / iWARP devices in your systems"
                             " and your management software has specified '[::]'"
                             ", but IPv6 over RoCE / iWARP is not supported in Linux.");
-                return -ENONET;
+                return -1;
             }
         }
 
@@ -911,13 +911,13 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
         ERROR(errp, "Could not query initial IB port");
-        return -EINVAL;
+        return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
         ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
                     "(but patches on linux-rdma in progress)");
-        return -ENONET;
+        return -1;
     }
 
 #endif
@@ -1425,7 +1425,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
              * this series.
              */
             perror("unregistration chunk failed");
-            return -ret;
+            return -1;
         }
         rdma->total_registrations--;
 
@@ -1570,7 +1570,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                     if (ret) {
                         error_report("failed to get cm event while wait "
                                      "completion channel");
-                        return -EPIPE;
+                        return -1;
                     }
 
                     error_report("receive cm event while wait comp channel,"
@@ -1578,7 +1578,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
-                        return -EPIPE;
+                        return -1;
                     }
                     rdma_ack_cm_event(cm_event);
                 }
@@ -1591,18 +1591,18 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                       * I don't trust errno from qemu_poll_ns
                      */
                 error_report("%s: poll failed", __func__);
-                return -EPIPE;
+                return -1;
             }
 
             if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
                 /* Bail out and let the cancellation happen */
-                return -EPIPE;
+                return -1;
             }
         }
     }
 
     if (rdma->received_error) {
-        return -EPIPE;
+        return -1;
     }
     return rdma->error_state;
 }
@@ -1772,7 +1772,7 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
 
     if (ret > 0) {
         error_report("Failed to use post IB SEND for control");
-        return -ret;
+        return -1;
     }
 
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
@@ -1841,15 +1841,15 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         if (head->type == RDMA_CONTROL_ERROR) {
             rdma->received_error = true;
         }
-        return -EIO;
+        return -1;
     }
     if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
         error_report("too long length: %d", head->len);
-        return -EINVAL;
+        return -1;
     }
     if (sizeof(*head) + head->len != byte_len) {
         error_report("Malformed length: %d byte_len %d", head->len, byte_len);
-        return -EINVAL;
+        return -1;
     }
 
     return 0;
@@ -2115,7 +2115,7 @@ retry:
                                 (uint8_t *) &comp, NULL, NULL, NULL);
 
                 if (ret < 0) {
-                    return -EIO;
+                    return -1;
                 }
 
                 /*
@@ -2159,7 +2159,7 @@ retry:
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
                 error_report("cannot get lkey");
-                return -EINVAL;
+                return -1;
             }
 
             reg_result = (RDMARegisterResult *)
@@ -2178,7 +2178,7 @@ retry:
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
                 error_report("cannot get lkey!");
-                return -EINVAL;
+                return -1;
             }
         }
 
@@ -2190,7 +2190,7 @@ retry:
                                                      &sge.lkey, NULL, chunk,
                                                      chunk_start, chunk_end)) {
             error_report("cannot get lkey!");
-            return -EINVAL;
+            return -1;
         }
     }
 
@@ -2237,7 +2237,7 @@ retry:
          * in this series.
          */
         perror("rdma migration: post rdma write failed");
-        return -ret;
+        return -1;
     }
 
     set_bit(chunk, block->transit_bitmap);
@@ -2969,14 +2969,14 @@ static int qemu_rdma_drain_cq(RDMAContext *rdma)
     int ret;
 
     if (qemu_rdma_write_flush(rdma) < 0) {
-        return -EIO;
+        return -1;
     }
 
     while (rdma->nb_sent) {
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
             error_report("rdma migration: complete polling error!");
-            return -EIO;
+            return -1;
         }
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 33/65] migration/rdma: Dumb down remaining int error values to -1
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (31 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 32/65] migration/rdma: Return -1 instead of negative errno code Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 34/65] migration/rdma: Replace int error_state by bool errored Juan Quintela
                   ` (32 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

This is just to make the error value more obvious.  Callers don't
mind.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-27-armbru@redhat.com>
---
 migration/rdma.c | 45 +++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bc39b7ab2e..3c7518d65a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1434,7 +1434,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                 &resp, NULL, NULL);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         trace_qemu_rdma_unregister_waiting_complete(chunk);
@@ -1475,7 +1475,7 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
 
     if (ret < 0) {
         error_report("ibv_poll_cq failed");
-        return ret;
+        return -1;
     }
 
     wr_id = wc.wr_id & RDMA_WRID_TYPE_MASK;
@@ -1604,7 +1604,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     if (rdma->received_error) {
         return -1;
     }
-    return rdma->error_state;
+    return -!!rdma->error_state;
 }
 
 static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
@@ -1649,7 +1649,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
     while (wr_id != wrid_requested) {
         ret = qemu_rdma_poll(rdma, poll_cq, &wr_id_in, byte_len);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         wr_id = wr_id_in & RDMA_WRID_TYPE_MASK;
@@ -1723,7 +1723,7 @@ err_block_for_wrid:
     }
 
     rdma->error_state = ret;
-    return ret;
+    return -1;
 }
 
 /*
@@ -1778,9 +1778,10 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
     if (ret < 0) {
         error_report("rdma migration: send polling control error");
+        return -1;
     }
 
-    return ret;
+    return 0;
 }
 
 /*
@@ -1822,7 +1823,7 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
 
     if (ret < 0) {
         error_report("rdma migration: recv polling control error!");
-        return ret;
+        return -1;
     }
 
     network_to_control((void *) rdma->wr_data[idx].control);
@@ -1902,7 +1903,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               RDMA_CONTROL_READY,
                                               RDMA_WRID_READY);
         if (ret < 0) {
-            return ret;
+            return -1;
         }
     }
 
@@ -1914,7 +1915,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
         if (ret) {
             error_report("rdma migration: error posting"
                     " extra control recv for anticipated result!");
-            return ret;
+            return -1;
         }
     }
 
@@ -1924,7 +1925,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret) {
         error_report("rdma migration: error posting first control recv!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1934,7 +1935,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
 
     if (ret < 0) {
         error_report("Failed to send control buffer!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -1945,7 +1946,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
             trace_qemu_rdma_exchange_send_issue_callback();
             ret = callback(rdma);
             if (ret < 0) {
-                return ret;
+                return -1;
             }
         }
 
@@ -1954,7 +1955,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               resp->type, RDMA_WRID_DATA);
 
         if (ret < 0) {
-            return ret;
+            return -1;
         }
 
         qemu_rdma_move_header(rdma, RDMA_WRID_DATA, resp);
@@ -1990,7 +1991,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
 
     if (ret < 0) {
         error_report("Failed to send control buffer!");
-        return ret;
+        return -1;
     }
 
     /*
@@ -2000,7 +2001,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
                                           expecting, RDMA_WRID_READY);
 
     if (ret < 0) {
-        return ret;
+        return -1;
     }
 
     qemu_rdma_move_header(rdma, RDMA_WRID_READY, head);
@@ -2011,7 +2012,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret) {
         error_report("rdma migration: error posting second control recv!");
-        return ret;
+        return -1;
     }
 
     return 0;
@@ -2084,7 +2085,7 @@ retry:
                     "block %d chunk %" PRIu64
                     " current %" PRIu64 " len %" PRIu64 " %d",
                     current_index, chunk, sge.addr, length, rdma->nb_sent);
-            return ret;
+            return -1;
         }
     }
 
@@ -2151,7 +2152,7 @@ retry:
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
                                     &resp, &reg_result_idx, NULL);
             if (ret < 0) {
-                return ret;
+                return -1;
             }
 
             /* try to overlap this single registration with the one we sent. */
@@ -2225,7 +2226,7 @@ retry:
         if (ret < 0) {
             error_report("rdma migration: failed to make "
                          "room in full send queue!");
-            return ret;
+            return -1;
         }
 
         goto retry;
@@ -2276,7 +2277,7 @@ static int qemu_rdma_write_flush(RDMAContext *rdma)
             rdma->current_index, rdma->current_addr, rdma->current_length);
 
     if (ret < 0) {
-        return ret;
+        return -1;
     }
 
     if (ret == 0) {
@@ -2358,7 +2359,7 @@ static int qemu_rdma_write(RDMAContext *rdma,
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(rdma);
         if (ret) {
-            return ret;
+            return -1;
         }
         rdma->current_length = 0;
         rdma->current_addr = current_addr;
@@ -3524,7 +3525,7 @@ err_rdma_dest_wait:
     rdma->error_state = ret;
     qemu_rdma_cleanup(rdma);
     g_free(rdma_return_path);
-    return ret;
+    return -1;
 }
 
 static int dest_ram_sort_func(const void *a, const void *b)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 34/65] migration/rdma: Replace int error_state by bool errored
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (32 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 33/65] migration/rdma: Dumb down remaining int error values to -1 Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 35/65] migration/rdma: Drop superfluous assignments to @ret Juan Quintela
                   ` (31 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

All we do with the value of RDMAContext member @error_state is test
whether it's zero.  Change to bool and rename to @errored.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-28-armbru@redhat.com>
---
 migration/rdma.c | 66 ++++++++++++++++++++++++------------------------
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 3c7518d65a..a57ec3791a 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -352,7 +352,7 @@ typedef struct RDMAContext {
      * memory registration, then do not attempt any future work
      * and remember the error state.
      */
-    int error_state;
+    bool errored;
     bool error_reported;
     bool received_error;
 
@@ -439,14 +439,14 @@ typedef struct QEMU_PACKED {
     uint64_t chunks;            /* how many sequential chunks to register */
 } RDMARegister;
 
-static int check_error_state(RDMAContext *rdma)
+static bool rdma_errored(RDMAContext *rdma)
 {
-    if (rdma->error_state && !rdma->error_reported) {
+    if (rdma->errored && !rdma->error_reported) {
         error_report("RDMA is in an error state waiting migration"
                      " to abort!");
         rdma->error_reported = true;
     }
-    return rdma->error_state;
+    return rdma->errored;
 }
 
 static void register_to_network(RDMAContext *rdma, RDMARegister *reg)
@@ -1547,7 +1547,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
          * But we need to be able to handle 'cancel' or an error
          * without hanging forever.
          */
-        while (!rdma->error_state  && !rdma->received_error) {
+        while (!rdma->errored && !rdma->received_error) {
             GPollFD pfds[2];
             pfds[0].fd = comp_channel->fd;
             pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
@@ -1604,7 +1604,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
     if (rdma->received_error) {
         return -1;
     }
-    return -!!rdma->error_state;
+    return -rdma->errored;
 }
 
 static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
@@ -1722,7 +1722,7 @@ err_block_for_wrid:
         ibv_ack_cq_events(cq, num_cq_events);
     }
 
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -2386,7 +2386,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
-        if ((rdma->error_state ||
+        if ((rdma->errored ||
              migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
             !rdma->received_error) {
             RDMAControlHeader head = { .len = 0,
@@ -2672,14 +2672,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     if (!rdma->host || !rdma->host[0]) {
         ERROR(errp, "RDMA host is not set!");
-        rdma->error_state = -EINVAL;
+        rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
         ERROR(errp, "could not create rdma event channel");
-        rdma->error_state = -EINVAL;
+        rdma->errored = true;
         return -1;
     }
 
@@ -2737,7 +2737,7 @@ err_dest_init_bind_addr:
 err_dest_init_create_listen_id:
     rdma_destroy_event_channel(rdma->channel);
     rdma->channel = NULL;
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 
 }
@@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
         return -1;
     }
 
-    if (rdma->error_state) {
+    if (rdma->errored) {
         error_setg(errp,
                    "RDMA is in an error state waiting migration to abort!");
         return -1;
@@ -2824,7 +2824,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
      */
     ret = qemu_rdma_write_flush(rdma);
     if (ret < 0) {
-        rdma->error_state = ret;
+        rdma->errored = true;
         error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
@@ -2844,7 +2844,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
 
             if (ret < 0) {
-                rdma->error_state = ret;
+                rdma->errored = true;
                 error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
@@ -2902,7 +2902,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         return -1;
     }
 
-    if (rdma->error_state) {
+    if (rdma->errored) {
         error_setg(errp,
                    "RDMA is in an error state waiting migration to abort!");
         return -1;
@@ -2938,7 +2938,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
 
         if (ret < 0) {
-            rdma->error_state = ret;
+            rdma->errored = true;
             error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
@@ -3212,21 +3212,21 @@ qio_channel_rdma_shutdown(QIOChannel *ioc,
     switch (how) {
     case QIO_CHANNEL_SHUTDOWN_READ:
         if (rdmain) {
-            rdmain->error_state = -1;
+            rdmain->errored = true;
         }
         break;
     case QIO_CHANNEL_SHUTDOWN_WRITE:
         if (rdmaout) {
-            rdmaout->error_state = -1;
+            rdmaout->errored = true;
         }
         break;
     case QIO_CHANNEL_SHUTDOWN_BOTH:
     default:
         if (rdmain) {
-            rdmain->error_state = -1;
+            rdmain->errored = true;
         }
         if (rdmaout) {
-            rdmaout->error_state = -1;
+            rdmaout->errored = true;
         }
         break;
     }
@@ -3270,7 +3270,7 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3329,7 +3329,7 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
     return RAM_SAVE_CONTROL_DELAYED;
 
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -3350,13 +3350,13 @@ static void rdma_cm_poll_handler(void *opaque)
 
     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
-        if (!rdma->error_state &&
+        if (!rdma->errored &&
             migration_incoming_get_current()->state !=
               MIGRATION_STATUS_COMPLETED) {
             error_report("receive cm event, cm event is %d", cm_event->event);
-            rdma->error_state = -EPIPE;
+            rdma->errored = true;
             if (rdma->return_path) {
-                rdma->return_path->error_state = -EPIPE;
+                rdma->return_path->errored = true;
             }
         }
         rdma_ack_cm_event(cm_event);
@@ -3522,7 +3522,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     return 0;
 
 err_rdma_dest_wait:
-    rdma->error_state = ret;
+    rdma->errored = true;
     qemu_rdma_cleanup(rdma);
     g_free(rdma_return_path);
     return -1;
@@ -3579,7 +3579,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3818,7 +3818,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     } while (1);
 
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
@@ -3895,7 +3895,7 @@ static int qemu_rdma_registration_start(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3928,7 +3928,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
         return -1;
     }
 
-    if (check_error_state(rdma)) {
+    if (rdma_errored(rdma)) {
         return -1;
     }
 
@@ -3982,7 +3982,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                     "Your QEMU command line parameters are probably "
                     "not identical on both the source and destination.",
                     local->nb_blocks, nb_dest_blocks);
-            rdma->error_state = -EINVAL;
+            rdma->errored = true;
             return -1;
         }
 
@@ -3998,7 +3998,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                         "vs %" PRIu64, local->block[i].block_name, i,
                         local->block[i].length,
                         rdma->dest_blocks[i].length);
-                rdma->error_state = -EINVAL;
+                rdma->errored = true;
                 return -1;
             }
             local->block[i].remote_host_addr =
@@ -4018,7 +4018,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
 
     return 0;
 err:
-    rdma->error_state = ret;
+    rdma->errored = true;
     return -1;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 35/65] migration/rdma: Drop superfluous assignments to @ret
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (33 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 34/65] migration/rdma: Replace int error_state by bool errored Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 36/65] migration/rdma: Check negative error values the same way everywhere Juan Quintela
                   ` (30 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-29-armbru@redhat.com>
---
 migration/rdma.c | 35 ++++++++++-------------------------
 1 file changed, 10 insertions(+), 25 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a57ec3791a..08a82d5e57 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1530,7 +1530,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                                        struct ibv_comp_channel *comp_channel)
 {
     struct rdma_cm_event *cm_event;
-    int ret = -1;
+    int ret;
 
     /*
      * Coroutine doesn't start until migration_fd_process_incoming()
@@ -1635,7 +1635,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
                                     uint64_t wrid_requested,
                                     uint32_t *byte_len)
 {
-    int num_cq_events = 0, ret = 0;
+    int num_cq_events = 0, ret;
     struct ibv_cq *cq;
     void *cq_ctx;
     uint64_t wr_id = RDMA_WRID_NONE, wr_id_in;
@@ -1685,8 +1685,7 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         num_cq_events++;
 
-        ret = -ibv_req_notify_cq(cq, 0);
-        if (ret) {
+        if (ibv_req_notify_cq(cq, 0)) {
             goto err_block_for_wrid;
         }
 
@@ -1733,7 +1732,7 @@ err_block_for_wrid:
 static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
                                        RDMAControlHeader *head)
 {
-    int ret = 0;
+    int ret;
     RDMAWorkRequestData *wr = &rdma->wr_data[RDMA_WRID_CONTROL];
     struct ibv_send_wr *bad_wr;
     struct ibv_sge sge = {
@@ -1890,7 +1889,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    int *resp_idx,
                                    int (*callback)(RDMAContext *rdma))
 {
-    int ret = 0;
+    int ret;
 
     /*
      * Wait until the dest is ready before attempting to deliver the message
@@ -2890,7 +2889,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
     RDMAContext *rdma;
     RDMAControlHeader head;
-    int ret = 0;
+    int ret;
     ssize_t done = 0;
     size_t i, len;
 
@@ -3379,7 +3378,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     RDMAContext *rdma_return_path = NULL;
     struct rdma_cm_event *cm_event;
     struct ibv_context *verbs;
-    int ret = -EINVAL;
+    int ret;
     int idx;
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
@@ -3389,7 +3388,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     if (cm_event->event != RDMA_CM_EVENT_CONNECT_REQUEST) {
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3402,7 +3400,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         rdma_return_path = qemu_rdma_data_init(rdma->host_port, NULL);
         if (rdma_return_path == NULL) {
             rdma_ack_cm_event(cm_event);
-            ret = -1;
             goto err_rdma_dest_wait;
         }
 
@@ -3417,7 +3414,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
         error_report("Unknown source RDMA version: %d, bailing...",
                      cap.version);
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3450,7 +3446,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     } else if (rdma->verbs != verbs) {
         error_report("ibv context not matching %p, %p!", rdma->verbs,
                      verbs);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3504,7 +3499,6 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_accept not event established");
         rdma_ack_cm_event(cm_event);
-        ret = -1;
         goto err_rdma_dest_wait;
     }
 
@@ -3567,7 +3561,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     static RDMARegisterResult results[RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE];
     RDMALocalBlock *block;
     void *host_addr;
-    int ret = 0;
+    int ret;
     int idx = 0;
     int count = 0;
     int i = 0;
@@ -3596,7 +3590,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
         if (head.repeat > RDMA_CONTROL_MAX_COMMANDS_PER_MESSAGE) {
             error_report("rdma: Too many requests in this message (%d)."
                             "Bailing.", head.repeat);
-            ret = -EIO;
             break;
         }
 
@@ -3612,7 +3605,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 error_report("rdma: 'compress' bad block index %u (vs %d)",
                              (unsigned int)comp->block_idx,
                              rdma->local_ram_blocks.nb_blocks);
-                ret = -EIO;
                 goto err;
             }
             block = &(rdma->local_ram_blocks.block[comp->block_idx]);
@@ -3711,7 +3703,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                     error_report("rdma: 'register' bad block index %u (vs %d)",
                                  (unsigned int)reg->current_index,
                                  rdma->local_ram_blocks.nb_blocks);
-                    ret = -ENOENT;
                     goto err;
                 }
                 block = &(rdma->local_ram_blocks.block[reg->current_index]);
@@ -3721,7 +3712,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             " offset: %" PRIx64 " current_addr: %" PRIx64,
                             block->block_name, block->offset,
                             reg->key.current_addr);
-                        ret = -ERANGE;
                         goto err;
                     }
                     host_addr = (block->local_host_addr +
@@ -3737,7 +3727,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                         error_report("rdma: bad chunk for block %s"
                             " chunk: %" PRIx64,
                             block->block_name, reg->key.chunk);
-                        ret = -ERANGE;
                         goto err;
                     }
                 }
@@ -3749,7 +3738,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                             (uintptr_t)host_addr, NULL, &tmp_rkey,
                             chunk, chunk_start, chunk_end)) {
                     error_report("cannot get rkey");
-                    ret = -EINVAL;
                     goto err;
                 }
                 reg_result->rkey = tmp_rkey;
@@ -3789,7 +3777,6 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
                 if (ret != 0) {
                     perror("rdma unregistration chunk failed");
-                    ret = -ret;
                     goto err;
                 }
 
@@ -3808,11 +3795,9 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             break;
         case RDMA_CONTROL_REGISTER_RESULT:
             error_report("Invalid RESULT message at dest.");
-            ret = -EIO;
             goto err;
         default:
             error_report("Unknown control message %s", control_desc(head.type));
-            ret = -EIO;
             goto err;
         }
     } while (1);
@@ -3916,7 +3901,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
     RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
-    int ret = 0;
+    int ret;
 
     if (migration_in_postcopy()) {
         return 0;
@@ -4190,7 +4175,7 @@ void rdma_start_outgoing_migration(void *opaque,
     MigrationState *s = opaque;
     RDMAContext *rdma_return_path = NULL;
     RDMAContext *rdma;
-    int ret = 0;
+    int ret;
 
     /* Avoid ram_block_discard_disable(), cannot change during migration. */
     if (ram_block_discard_is_required()) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 36/65] migration/rdma: Check negative error values the same way everywhere
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (34 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 35/65] migration/rdma: Drop superfluous assignments to @ret Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 37/65] migration/rdma: Plug a memory leak and improve a message Juan Quintela
                   ` (29 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

When a function returns 0 on success and a negative value on error,
checking for non-zero suffices, but checking for negative is clearer.
So do that.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-30-armbru@redhat.com>
---
 migration/rdma.c | 82 ++++++++++++++++++++++++------------------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 08a82d5e57..b0125b01cf 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -953,7 +953,7 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not create channel id");
         goto err_resolve_create_id;
     }
@@ -974,10 +974,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
         ret = rdma_resolve_addr(rdma->cm_id, NULL, e->ai_dst_addr,
                 RDMA_RESOLVE_TIMEOUT_MS);
-        if (!ret) {
+        if (ret >= 0) {
             if (e->ai_family == AF_INET6) {
                 ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs, errp);
-                if (ret) {
+                if (ret < 0) {
                     continue;
                 }
             }
@@ -994,7 +994,7 @@ route:
     qemu_rdma_dump_gid("source_resolve_addr", rdma->cm_id);
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not perform event_addr_resolved");
         goto err_resolve_get_addr;
     }
@@ -1010,13 +1010,13 @@ route:
 
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not resolve rdma route");
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not perform event_route_resolved");
         goto err_resolve_get_addr;
     }
@@ -1124,7 +1124,7 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
     attr.qp_type = IBV_QPT_RC;
 
     ret = rdma_create_qp(rdma->cm_id, rdma->pd, &attr);
-    if (ret) {
+    if (ret < 0) {
         return -1;
     }
 
@@ -1567,7 +1567,7 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
 
                 if (pfds[1].revents) {
                     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-                    if (ret) {
+                    if (ret < 0) {
                         error_report("failed to get cm event while wait "
                                      "completion channel");
                         return -1;
@@ -1668,12 +1668,12 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
     while (1) {
         ret = qemu_rdma_wait_comp_channel(rdma, ch);
-        if (ret) {
+        if (ret < 0) {
             goto err_block_for_wrid;
         }
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
-        if (ret) {
+        if (ret < 0) {
             /*
              * FIXME perror() is problematic, because ibv_reg_mr() is
              * not documented to set errno.  Will go away later in
@@ -1911,7 +1911,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     if (resp) {
         ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
-        if (ret) {
+        if (ret < 0) {
             error_report("rdma migration: error posting"
                     " extra control recv for anticipated result!");
             return -1;
@@ -1922,7 +1922,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      * Post a WR to replace the one we just consumed for the READY message.
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting first control recv!");
         return -1;
     }
@@ -2009,7 +2009,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      * Post a new RECV work request to replace the one we just consumed.
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting second control recv!");
         return -1;
     }
@@ -2357,7 +2357,7 @@ static int qemu_rdma_write(RDMAContext *rdma,
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
         ret = qemu_rdma_write_flush(rdma);
-        if (ret) {
+        if (ret < 0) {
             return -1;
         }
         rdma->current_length = 0;
@@ -2487,12 +2487,12 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     rdma->pin_all = pin_all;
 
     ret = qemu_rdma_resolve_host(rdma, errp);
-    if (ret) {
+    if (ret < 0) {
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
                     " limits may be too low. Please check $ ulimit -a # and "
                     "search for 'ulimit -l' in the output");
@@ -2500,7 +2500,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
@@ -2517,7 +2517,7 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-        if (ret) {
+        if (ret < 0) {
             ERROR(errp, "rdma migration: error registering %d control!",
                                                             idx);
             goto err_rdma_source_init;
@@ -2591,13 +2591,13 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     caps_to_network(&cap);
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "posting second control recv");
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
-    if (ret) {
+    if (ret < 0) {
         perror("rdma_connect");
         ERROR(errp, "connecting to destination!");
         goto err_rdma_source_connect;
@@ -2611,7 +2611,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
             ERROR(errp, "failed to get cm event");
         }
     }
-    if (ret) {
+    if (ret < 0) {
         /*
          * FIXME perror() is wrong, because
          * qemu_get_cm_event_timeout() can fail without setting errno.
@@ -2684,7 +2684,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "could not create cm_id!");
         goto err_dest_init_create_listen_id;
     }
@@ -2700,7 +2700,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "Error: could not set REUSEADDR option");
         goto err_dest_init_bind_addr;
     }
@@ -2709,12 +2709,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_dest_init_trying(rdma->host, ip);
         ret = rdma_bind_addr(listen_id, e->ai_dst_addr);
-        if (ret) {
+        if (ret < 0) {
             continue;
         }
         if (e->ai_family == AF_INET6) {
             ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs, errp);
-            if (ret) {
+            if (ret < 0) {
                 continue;
             }
         }
@@ -3342,7 +3342,7 @@ static void rdma_cm_poll_handler(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         error_report("get_cm_event failed %d", errno);
         return;
     }
@@ -3382,7 +3382,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     int idx;
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         goto err_rdma_dest_wait;
     }
 
@@ -3452,13 +3452,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     qemu_rdma_dump_id("dest_init", verbs);
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error allocating pd and cq!");
         goto err_rdma_dest_wait;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error allocating qp!");
         goto err_rdma_dest_wait;
     }
@@ -3467,7 +3467,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
-        if (ret) {
+        if (ret < 0) {
             error_report("rdma: error registering %d control", idx);
             goto err_rdma_dest_wait;
         }
@@ -3485,13 +3485,13 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     }
 
     ret = rdma_accept(rdma->cm_id, &conn_param);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma_accept failed");
         goto err_rdma_dest_wait;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma_accept get_cm_event failed");
         goto err_rdma_dest_wait;
     }
@@ -3506,7 +3506,7 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     rdma->connected = true;
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-    if (ret) {
+    if (ret < 0) {
         error_report("rdma migration: error posting second control recv");
         goto err_rdma_dest_wait;
     }
@@ -3635,7 +3635,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
             if (rdma->pin_all) {
                 ret = qemu_rdma_reg_whole_ram_blocks(rdma);
-                if (ret) {
+                if (ret < 0) {
                     error_report("rdma migration: error dest "
                                     "registering ram blocks");
                     goto err;
@@ -4096,7 +4096,7 @@ static void rdma_accept_incoming_migration(void *opaque)
     trace_qemu_rdma_accept_incoming_migration();
     ret = qemu_rdma_accept(rdma);
 
-    if (ret) {
+    if (ret < 0) {
         fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
         return;
     }
@@ -4140,7 +4140,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     }
 
     ret = qemu_rdma_dest_init(rdma, errp);
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
@@ -4148,7 +4148,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
 
     ret = rdma_listen(rdma->listen_id, 5);
 
-    if (ret) {
+    if (ret < 0) {
         ERROR(errp, "listening on socket!");
         goto cleanup_rdma;
     }
@@ -4190,14 +4190,14 @@ void rdma_start_outgoing_migration(void *opaque,
 
     ret = qemu_rdma_source_init(rdma, migrate_rdma_pin_all(), errp);
 
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
     trace_rdma_start_outgoing_migration_after_rdma_source_init();
     ret = qemu_rdma_connect(rdma, false, errp);
 
-    if (ret) {
+    if (ret < 0) {
         goto err;
     }
 
@@ -4212,13 +4212,13 @@ void rdma_start_outgoing_migration(void *opaque,
         ret = qemu_rdma_source_init(rdma_return_path,
                                     migrate_rdma_pin_all(), errp);
 
-        if (ret) {
+        if (ret < 0) {
             goto return_path_err;
         }
 
         ret = qemu_rdma_connect(rdma_return_path, true, errp);
 
-        if (ret) {
+        if (ret < 0) {
             goto return_path_err;
         }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 37/65] migration/rdma: Plug a memory leak and improve a message
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (35 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 36/65] migration/rdma: Check negative error values the same way everywhere Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 38/65] migration/rdma: Delete inappropriate error_report() in macro ERROR() Juan Quintela
                   ` (28 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

When migration capability @rdma-pin-all is true, but the server cannot
honor it, qemu_rdma_connect() calls macro ERROR(), then returns
success.

ERROR() sets an error.  Since qemu_rdma_connect() returns success, its
caller rdma_start_outgoing_migration() duly assumes @errp is still
clear.  The Error object leaks.

ERROR() additionally reports the situation to the user as an error:

    RDMA ERROR: Server cannot support pinning all memory. Will register memory dynamically.

Is this an error or not?  It actually isn't; we disable @rdma-pin-all
and carry on.  "Correcting" the user's configuration decisions that
way feels problematic, but that's a topic for another day.

Replace ERROR() by warn_report().  This plugs the memory leak, and
emits a clearer message to the user.
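
For readers unfamiliar with the leak mechanism, a simplified, purely
illustrative sketch (function names are made up; only the QEMU Error API
calls error_setg() and error_report_err() are real):

    /* Callee sets *errp but still reports success. */
    int connect_sketch(Error **errp)
    {
        error_setg(errp, "server cannot pin all memory");
        return 0;
    }

    void caller_sketch(void)
    {
        Error *err = NULL;

        if (connect_sketch(&err) < 0) {   /* never taken: callee returned 0 */
            error_report_err(err);        /* would have reported and freed it */
        }
        /* err still points to an Error object here; nothing frees it. */
    }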

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-31-armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index b0125b01cf..00e3c430f4 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2637,8 +2637,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
      * and disable them otherwise.
      */
     if (rdma->pin_all && !(cap.flags & RDMA_CAPABILITY_PIN_ALL)) {
-        ERROR(errp, "Server cannot support pinning all memory. "
-                        "Will register memory dynamically.");
+        warn_report("RDMA: Server cannot support pinning all memory. "
+                    "Will register memory dynamically.");
         rdma->pin_all = false;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 38/65] migration/rdma: Delete inappropriate error_report() in macro ERROR()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (36 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 37/65] migration/rdma: Plug a memory leak and improve a message Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 39/65] migration/rdma: Retire " Juan Quintela
                   ` (27 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

Macro ERROR() violates this principle.  Delete the error_report()
there.
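
A purely illustrative sketch of the double report (made-up function
names; error_setg(), error_report() and error_report_err() are the real
QEMU Error API):

    /* Wrong: the callee both sets *errp and reports. */
    int callee_sketch(Error **errp)
    {
        error_setg(errp, "frob failed");
        error_report("frob failed");     /* reporting is the caller's job */
        return -1;
    }

    void caller_sketch(void)
    {
        Error *err = NULL;

        if (callee_sketch(&err) < 0) {
            error_report_err(err);       /* user now sees "frob failed" twice */
        }
    }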

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-32-armbru@redhat.com>
---
 migration/rdma.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 00e3c430f4..6c0e6cda2c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -40,12 +40,8 @@
 #include "options.h"
 #include <poll.h>
 
-/*
- * Print and error on both the Monitor and the Log file.
- */
 #define ERROR(errp, fmt, ...) \
     do { \
-        fprintf(stderr, "RDMA ERROR: " fmt "\n", ## __VA_ARGS__); \
         if (errp && (*(errp) == NULL)) { \
             error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
         } \
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 39/65] migration/rdma: Retire macro ERROR()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (37 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 38/65] migration/rdma: Delete inappropriate error_report() in macro ERROR() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 40/65] migration/rdma: Fix error handling around rdma_getaddrinfo() Juan Quintela
                   ` (26 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

ERROR() has become "error_setg() unless an error has been set
already".  Hiding the conditional in the macro is in the way of
further work.  Replace the macro uses by their expansion, and delete
the macro.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-33-armbru@redhat.com>
---
 migration/rdma.c | 168 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 120 insertions(+), 48 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6c0e6cda2c..4074509f06 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -40,13 +40,6 @@
 #include "options.h"
 #include <poll.h>
 
-#define ERROR(errp, fmt, ...) \
-    do { \
-        if (errp && (*(errp) == NULL)) { \
-            error_setg(errp, "RDMA ERROR: " fmt, ## __VA_ARGS__); \
-        } \
-    } while (0)
-
 #define RDMA_RESOLVE_TIMEOUT_MS 10000
 
 /* Do not merge data if larger than this. */
@@ -865,7 +858,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
-                ERROR(errp, "Could not query initial IB port");
+                if (errp && !*errp) {
+                    error_setg(errp,
+                               "RDMA ERROR: Could not query initial IB port");
+                }
                 return -1;
             }
 
@@ -888,9 +884,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                                 " migrate over the IB fabric until the kernel "
                                 " fixes the bug.\n");
             } else {
-                ERROR(errp, "You only have RoCE / iWARP devices in your systems"
-                            " and your management software has specified '[::]'"
-                            ", but IPv6 over RoCE / iWARP is not supported in Linux.");
+                if (errp && !*errp) {
+                    error_setg(errp, "RDMA ERROR: "
+                               "You only have RoCE / iWARP devices in your systems"
+                               " and your management software has specified '[::]'"
+                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
+                }
                 return -1;
             }
         }
@@ -906,13 +905,18 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
-        ERROR(errp, "Could not query initial IB port");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
+        }
         return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
-        ERROR(errp, "Linux kernel's RoCE / iWARP does not support IPv6 "
-                    "(but patches on linux-rdma in progress)");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "Linux kernel's RoCE / iWARP does not support IPv6 "
+                       "(but patches on linux-rdma in progress)");
+        }
         return -1;
     }
 
@@ -936,21 +940,27 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     struct rdma_addrinfo *e;
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
-        ERROR(errp, "RDMA hostname has not been set");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
+        }
         return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        ERROR(errp, "could not create CM channel");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create CM channel");
+        }
         return -1;
     }
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        ERROR(errp, "could not create channel id");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create channel id");
+        }
         goto err_resolve_create_id;
     }
 
@@ -959,7 +969,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                       rdma->host);
+        }
         goto err_resolve_get_addr;
     }
 
@@ -982,7 +995,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     }
 
     rdma_freeaddrinfo(res);
-    ERROR(errp, "could not resolve address %s", rdma->host);
+    if (errp && !*errp) {
+        error_setg(errp, "RDMA ERROR: could not resolve address %s",
+                   rdma->host);
+    }
     goto err_resolve_get_addr;
 
 route:
@@ -991,13 +1007,18 @@ route:
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        ERROR(errp, "could not perform event_addr_resolved");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
+        }
         goto err_resolve_get_addr;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
-        ERROR(errp, "result not equal to event_addr_resolved %s",
-                rdma_event_str(cm_event->event));
+        if (errp && !*errp) {
+            error_setg(errp,
+                       "RDMA ERROR: result not equal to event_addr_resolved %s",
+                       rdma_event_str(cm_event->event));
+        }
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
@@ -1007,18 +1028,25 @@ route:
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
     if (ret < 0) {
-        ERROR(errp, "could not resolve rdma route");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
+        }
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        ERROR(errp, "could not perform event_route_resolved");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
+        }
         goto err_resolve_get_addr;
     }
     if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
-        ERROR(errp, "result not equal to event_route_resolved: %s",
-                        rdma_event_str(cm_event->event));
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "result not equal to event_route_resolved: %s",
+                       rdma_event_str(cm_event->event));
+        }
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
@@ -2489,15 +2517,20 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret < 0) {
-        ERROR(errp, "rdma migration: error allocating pd and cq! Your mlock()"
-                    " limits may be too low. Please check $ ulimit -a # and "
-                    "search for 'ulimit -l' in the output");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: "
+                       "rdma migration: error allocating pd and cq! Your mlock()"
+                       " limits may be too low. Please check $ ulimit -a # and "
+                       "search for 'ulimit -l' in the output");
+        }
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret < 0) {
-        ERROR(errp, "rdma migration: error allocating qp!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
+        }
         goto err_rdma_source_init;
     }
 
@@ -2514,8 +2547,11 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret < 0) {
-            ERROR(errp, "rdma migration: error registering %d control!",
-                                                            idx);
+            if (errp && !*errp) {
+                error_setg(errp,
+                           "RDMA ERROR: rdma migration: error registering %d control!",
+                           idx);
+            }
             goto err_rdma_source_init;
         }
     }
@@ -2543,19 +2579,29 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     } while (ret < 0 && errno == EINTR);
 
     if (ret == 0) {
-        ERROR(errp, "poll cm event timeout");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: poll cm event timeout");
+        }
         return -1;
     } else if (ret < 0) {
-        ERROR(errp, "failed to poll cm event, errno=%i", errno);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
+                       errno);
+        }
         return -1;
     } else if (poll_fd.revents & POLLIN) {
         if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
-            ERROR(errp, "failed to get cm event");
+            if (errp && !*errp) {
+                error_setg(errp, "RDMA ERROR: failed to get cm event");
+            }
             return -1;
         }
         return 0;
     } else {
-        ERROR(errp, "no POLLIN event, revent=%x", poll_fd.revents);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
+                       poll_fd.revents);
+        }
         return -1;
     }
 }
@@ -2588,14 +2634,18 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        ERROR(errp, "posting second control recv");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: posting second control recv");
+        }
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
         perror("rdma_connect");
-        ERROR(errp, "connecting to destination!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: connecting to destination!");
+        }
         goto err_rdma_source_connect;
     }
 
@@ -2604,7 +2654,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            ERROR(errp, "failed to get cm event");
+            if (errp && !*errp) {
+                error_setg(errp, "RDMA ERROR: failed to get cm event");
+            }
         }
     }
     if (ret < 0) {
@@ -2619,7 +2671,9 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
-        ERROR(errp, "connecting to destination!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: connecting to destination!");
+        }
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
     }
@@ -2666,14 +2720,18 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     }
 
     if (!rdma->host || !rdma->host[0]) {
-        ERROR(errp, "RDMA host is not set!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
+        }
         rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        ERROR(errp, "could not create rdma event channel");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
+        }
         rdma->errored = true;
         return -1;
     }
@@ -2681,7 +2739,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        ERROR(errp, "could not create cm_id!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not create cm_id!");
+        }
         goto err_dest_init_create_listen_id;
     }
 
@@ -2690,14 +2750,19 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        ERROR(errp, "could not rdma_getaddrinfo address %s", rdma->host);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                       rdma->host);
+        }
         goto err_dest_init_bind_addr;
     }
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
     if (ret < 0) {
-        ERROR(errp, "Error: could not set REUSEADDR option");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
+        }
         goto err_dest_init_bind_addr;
     }
     for (e = res; e != NULL; e = e->ai_next) {
@@ -2719,7 +2784,9 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     rdma_freeaddrinfo(res);
     if (!e) {
-        ERROR(errp, "Error: could not rdma_bind_addr!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: Error: could not rdma_bind_addr!");
+        }
         goto err_dest_init_bind_addr;
     }
 
@@ -2771,7 +2838,10 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
         rdma->host = g_strdup(addr->host);
         rdma->host_port = g_strdup(host_port);
     } else {
-        ERROR(errp, "bad RDMA migration address '%s'", host_port);
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
+                       host_port);
+        }
         g_free(rdma);
         rdma = NULL;
     }
@@ -4145,7 +4215,9 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     ret = rdma_listen(rdma->listen_id, 5);
 
     if (ret < 0) {
-        ERROR(errp, "listening on socket!");
+        if (errp && !*errp) {
+            error_setg(errp, "RDMA ERROR: listening on socket!");
+        }
         goto cleanup_rdma;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 40/65] migration/rdma: Fix error handling around rdma_getaddrinfo()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (38 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 39/65] migration/rdma: Retire " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 41/65] migration/rdma: Drop "@errp is clear" guards around error_setg() Juan Quintela
                   ` (25 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over
addresses to find one that works, holding onto the first Error from
qemu_rdma_broken_ipv6_kernel() for use when no address works.  Issues:

1. If @errp was &error_abort or &error_fatal, we'd terminate instead
   of trying the next address.  Can't actually happen, since no caller
   passes these arguments.

2. When @errp is a pointer to a variable containing NULL, and
   qemu_rdma_broken_ipv6_kernel() fails, the variable no longer
   contains NULL.  Subsequent iterations pass it again, violating
   Error usage rules.  Dangerous, as setting an error would then trip
   error_setv()'s assertion.  Works only because
   qemu_rdma_broken_ipv6_kernel() and the code following the loops
   carefully avoids setting a second error.

3. If qemu_rdma_broken_ipv6_kernel() fails, and then a later iteration
   finds a working address, @errp still holds the first error from
   qemu_rdma_broken_ipv6_kernel().  If we then run into another error,
   we report the qemu_rdma_broken_ipv6_kernel() failure instead.

4. If we don't run into another error, we leak the Error object.

Use a local error variable, and propagate to @errp.  This fixes 3,
and also cleans up 1 and partly 2.

Free this error when we have a working address.  This fixes 4.

Pass the local error variable to qemu_rdma_broken_ipv6_kernel() only
until it fails.  Pass null on any later iterations.  This cleans up
the remainder of 2.
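
A minimal sketch of the resulting pattern (made-up names; the
hypothetical try_one() stands in for the per-address attempt, which may
or may not set an error when it fails):

    /* Hypothetical per-address attempt: 0 on success, may set *errp. */
    int try_one(int i, Error **errp);

    int try_all_sketch(int n, Error **errp)
    {
        Error *err = NULL;
        int i;

        /* Try all candidates, saving the first error in @err */
        for (i = 0; i < n; i++) {
            /* only the first failure gets to set @err; later ones get NULL */
            Error **local_errp = err ? NULL : &err;

            if (try_one(i, local_errp) == 0) {
                error_free(err);            /* a later attempt worked: drop it */
                return 0;
            }
        }

        if (err) {
            error_propagate(errp, err);     /* report the first failure */
        } else {
            error_setg(errp, "no candidate could be tried");
        }
        return -1;
    }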

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-34-armbru@redhat.com>
---
 migration/rdma.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 4074509f06..3fb899f963 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -932,6 +932,7 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
  */
 static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 {
+    Error *err = NULL;
     int ret;
     struct rdma_addrinfo *res;
     char port_str[16];
@@ -976,7 +977,10 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
         goto err_resolve_get_addr;
     }
 
+    /* Try all addresses, saving the first error in @err */
     for (e = res; e != NULL; e = e->ai_next) {
+        Error **local_errp = err ? NULL : &err;
+
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_resolve_host_trying(rdma->host, ip);
@@ -985,17 +989,21 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
                 RDMA_RESOLVE_TIMEOUT_MS);
         if (ret >= 0) {
             if (e->ai_family == AF_INET6) {
-                ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs, errp);
+                ret = qemu_rdma_broken_ipv6_kernel(rdma->cm_id->verbs,
+                                                   local_errp);
                 if (ret < 0) {
                     continue;
                 }
             }
+            error_free(err);
             goto route;
         }
     }
 
     rdma_freeaddrinfo(res);
-    if (errp && !*errp) {
+    if (err) {
+        error_propagate(errp, err);
+    } else {
         error_setg(errp, "RDMA ERROR: could not resolve address %s",
                    rdma->host);
     }
@@ -2707,6 +2715,7 @@ err_rdma_source_connect:
 
 static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 {
+    Error *err = NULL;
     int ret, idx;
     struct rdma_cm_id *listen_id;
     char ip[40] = "unknown";
@@ -2765,7 +2774,11 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
         }
         goto err_dest_init_bind_addr;
     }
+
+    /* Try all addresses, saving the first error in @err */
     for (e = res; e != NULL; e = e->ai_next) {
+        Error **local_errp = err ? NULL : &err;
+
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
         trace_qemu_rdma_dest_init_trying(rdma->host, ip);
@@ -2774,17 +2787,21 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
             continue;
         }
         if (e->ai_family == AF_INET6) {
-            ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs, errp);
+            ret = qemu_rdma_broken_ipv6_kernel(listen_id->verbs,
+                                               local_errp);
             if (ret < 0) {
                 continue;
             }
         }
+        error_free(err);
         break;
     }
 
     rdma_freeaddrinfo(res);
     if (!e) {
-        if (errp && !*errp) {
+        if (err) {
+            error_propagate(errp, err);
+        } else {
             error_setg(errp, "RDMA ERROR: Error: could not rdma_bind_addr!");
         }
         goto err_dest_init_bind_addr;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 41/65] migration/rdma: Drop "@errp is clear" guards around error_setg()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (39 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 40/65] migration/rdma: Fix error handling around rdma_getaddrinfo() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 42/65] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Juan Quintela
                   ` (24 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

These guards are all redundant now.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-35-armbru@redhat.com>
---
 migration/rdma.c | 164 +++++++++++++++--------------------------------
 1 file changed, 51 insertions(+), 113 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 3fb899f963..fdb527af39 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -858,10 +858,8 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
             if (ibv_query_port(verbs, 1, &port_attr)) {
                 ibv_close_device(verbs);
-                if (errp && !*errp) {
-                    error_setg(errp,
-                               "RDMA ERROR: Could not query initial IB port");
-                }
+                error_setg(errp,
+                           "RDMA ERROR: Could not query initial IB port");
                 return -1;
             }
 
@@ -884,12 +882,10 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
                                 " migrate over the IB fabric until the kernel "
                                 " fixes the bug.\n");
             } else {
-                if (errp && !*errp) {
-                    error_setg(errp, "RDMA ERROR: "
-                               "You only have RoCE / iWARP devices in your systems"
-                               " and your management software has specified '[::]'"
-                               ", but IPv6 over RoCE / iWARP is not supported in Linux.");
-                }
+                error_setg(errp, "RDMA ERROR: "
+                           "You only have RoCE / iWARP devices in your systems"
+                           " and your management software has specified '[::]'"
+                           ", but IPv6 over RoCE / iWARP is not supported in Linux.");
                 return -1;
             }
         }
@@ -905,18 +901,14 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
     /* IB ports start with 1, not 0 */
     if (ibv_query_port(verbs, 1, &port_attr)) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: Could not query initial IB port");
-        }
+        error_setg(errp, "RDMA ERROR: Could not query initial IB port");
         return -1;
     }
 
     if (port_attr.link_layer == IBV_LINK_LAYER_ETHERNET) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "Linux kernel's RoCE / iWARP does not support IPv6 "
-                       "(but patches on linux-rdma in progress)");
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "Linux kernel's RoCE / iWARP does not support IPv6 "
+                   "(but patches on linux-rdma in progress)");
         return -1;
     }
 
@@ -941,27 +933,21 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
     struct rdma_addrinfo *e;
 
     if (rdma->host == NULL || !strcmp(rdma->host, "")) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
-        }
+        error_setg(errp, "RDMA ERROR: RDMA hostname has not been set");
         return -1;
     }
 
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create CM channel");
-        }
+        error_setg(errp, "RDMA ERROR: could not create CM channel");
         return -1;
     }
 
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &rdma->cm_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create channel id");
-        }
+        error_setg(errp, "RDMA ERROR: could not create channel id");
         goto err_resolve_create_id;
     }
 
@@ -970,10 +956,8 @@ static int qemu_rdma_resolve_host(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
-                       rdma->host);
-        }
+        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                   rdma->host);
         goto err_resolve_get_addr;
     }
 
@@ -1015,18 +999,14 @@ route:
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
-        }
+        error_setg(errp, "RDMA ERROR: could not perform event_addr_resolved");
         goto err_resolve_get_addr;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ADDR_RESOLVED) {
-        if (errp && !*errp) {
-            error_setg(errp,
-                       "RDMA ERROR: result not equal to event_addr_resolved %s",
-                       rdma_event_str(cm_event->event));
-        }
+        error_setg(errp,
+                   "RDMA ERROR: result not equal to event_addr_resolved %s",
+                   rdma_event_str(cm_event->event));
         error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
@@ -1036,25 +1016,19 @@ route:
     /* resolve route */
     ret = rdma_resolve_route(rdma->cm_id, RDMA_RESOLVE_TIMEOUT_MS);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not resolve rdma route");
-        }
+        error_setg(errp, "RDMA ERROR: could not resolve rdma route");
         goto err_resolve_get_addr;
     }
 
     ret = rdma_get_cm_event(rdma->channel, &cm_event);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
-        }
+        error_setg(errp, "RDMA ERROR: could not perform event_route_resolved");
         goto err_resolve_get_addr;
     }
     if (cm_event->event != RDMA_CM_EVENT_ROUTE_RESOLVED) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "result not equal to event_route_resolved: %s",
-                       rdma_event_str(cm_event->event));
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "result not equal to event_route_resolved: %s",
+                   rdma_event_str(cm_event->event));
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
@@ -2525,20 +2499,16 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
 
     ret = qemu_rdma_alloc_pd_cq(rdma);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: "
-                       "rdma migration: error allocating pd and cq! Your mlock()"
-                       " limits may be too low. Please check $ ulimit -a # and "
-                       "search for 'ulimit -l' in the output");
-        }
+        error_setg(errp, "RDMA ERROR: "
+                   "rdma migration: error allocating pd and cq! Your mlock()"
+                   " limits may be too low. Please check $ ulimit -a # and "
+                   "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
     }
 
     ret = qemu_rdma_alloc_qp(rdma);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
-        }
+        error_setg(errp, "RDMA ERROR: rdma migration: error allocating qp!");
         goto err_rdma_source_init;
     }
 
@@ -2555,11 +2525,9 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         ret = qemu_rdma_reg_control(rdma, idx);
         if (ret < 0) {
-            if (errp && !*errp) {
-                error_setg(errp,
-                           "RDMA ERROR: rdma migration: error registering %d control!",
-                           idx);
-            }
+            error_setg(errp,
+                       "RDMA ERROR: rdma migration: error registering %d control!",
+                       idx);
             goto err_rdma_source_init;
         }
     }
@@ -2587,29 +2555,21 @@ static int qemu_get_cm_event_timeout(RDMAContext *rdma,
     } while (ret < 0 && errno == EINTR);
 
     if (ret == 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: poll cm event timeout");
-        }
+        error_setg(errp, "RDMA ERROR: poll cm event timeout");
         return -1;
     } else if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
-                       errno);
-        }
+        error_setg(errp, "RDMA ERROR: failed to poll cm event, errno=%i",
+                   errno);
         return -1;
     } else if (poll_fd.revents & POLLIN) {
         if (rdma_get_cm_event(rdma->channel, cm_event) < 0) {
-            if (errp && !*errp) {
-                error_setg(errp, "RDMA ERROR: failed to get cm event");
-            }
+            error_setg(errp, "RDMA ERROR: failed to get cm event");
             return -1;
         }
         return 0;
     } else {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
-                       poll_fd.revents);
-        }
+        error_setg(errp, "RDMA ERROR: no POLLIN event, revent=%x",
+                   poll_fd.revents);
         return -1;
     }
 }
@@ -2642,18 +2602,14 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: posting second control recv");
-        }
+        error_setg(errp, "RDMA ERROR: posting second control recv");
         goto err_rdma_source_connect;
     }
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
         perror("rdma_connect");
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: connecting to destination!");
-        }
+        error_setg(errp, "RDMA ERROR: connecting to destination!");
         goto err_rdma_source_connect;
     }
 
@@ -2662,9 +2618,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            if (errp && !*errp) {
-                error_setg(errp, "RDMA ERROR: failed to get cm event");
-            }
+            error_setg(errp, "RDMA ERROR: failed to get cm event");
         }
     }
     if (ret < 0) {
@@ -2679,9 +2633,7 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
         error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: connecting to destination!");
-        }
+        error_setg(errp, "RDMA ERROR: connecting to destination!");
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
     }
@@ -2729,18 +2681,14 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     }
 
     if (!rdma->host || !rdma->host[0]) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: RDMA host is not set!");
-        }
+        error_setg(errp, "RDMA ERROR: RDMA host is not set!");
         rdma->errored = true;
         return -1;
     }
     /* create CM channel */
     rdma->channel = rdma_create_event_channel();
     if (!rdma->channel) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create rdma event channel");
-        }
+        error_setg(errp, "RDMA ERROR: could not create rdma event channel");
         rdma->errored = true;
         return -1;
     }
@@ -2748,9 +2696,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     /* create CM id */
     ret = rdma_create_id(rdma->channel, &listen_id, NULL, RDMA_PS_TCP);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not create cm_id!");
-        }
+        error_setg(errp, "RDMA ERROR: could not create cm_id!");
         goto err_dest_init_create_listen_id;
     }
 
@@ -2759,19 +2705,15 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 
     ret = rdma_getaddrinfo(rdma->host, port_str, NULL, &res);
     if (ret) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
-                       rdma->host);
-        }
+        error_setg(errp, "RDMA ERROR: could not rdma_getaddrinfo address %s",
+                   rdma->host);
         goto err_dest_init_bind_addr;
     }
 
     ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
                           &reuse, sizeof reuse);
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
-        }
+        error_setg(errp, "RDMA ERROR: Error: could not set REUSEADDR option");
         goto err_dest_init_bind_addr;
     }
 
@@ -2855,10 +2797,8 @@ static RDMAContext *qemu_rdma_data_init(const char *host_port, Error **errp)
         rdma->host = g_strdup(addr->host);
         rdma->host_port = g_strdup(host_port);
     } else {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
-                       host_port);
-        }
+        error_setg(errp, "RDMA ERROR: bad RDMA migration address '%s'",
+                   host_port);
         g_free(rdma);
         rdma = NULL;
     }
@@ -4232,9 +4172,7 @@ void rdma_start_incoming_migration(const char *host_port, Error **errp)
     ret = rdma_listen(rdma->listen_id, 5);
 
     if (ret < 0) {
-        if (errp && !*errp) {
-            error_setg(errp, "RDMA ERROR: listening on socket!");
-        }
+        error_setg(errp, "RDMA ERROR: listening on socket!");
         goto cleanup_rdma;
     }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 42/65] migration/rdma: Convert qemu_rdma_exchange_recv() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (40 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 41/65] migration/rdma: Drop "@errp is clear" guards around error_setg() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 43/65] migration/rdma: Convert qemu_rdma_exchange_send() " Juan Quintela
                   ` (23 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_readv() violates this principle: it calls
error_report() via qemu_rdma_exchange_recv().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_exchange_recv() to Error.

Necessitates setting an error when qemu_rdma_exchange_get_response()
failed.  Since this error will go away later in this series, simply
use "FIXME temporary error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-36-armbru@redhat.com>
---
 migration/rdma.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index fdb527af39..b13fec6328 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1980,7 +1980,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
  * control-channel message.
  */
 static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
-                                   uint32_t expecting)
+                                   uint32_t expecting, Error **errp)
 {
     RDMAControlHeader ready = {
                                 .len = 0,
@@ -1995,7 +1995,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_send_control(rdma, NULL, &ready);
 
     if (ret < 0) {
-        error_report("Failed to send control buffer!");
+        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -2006,6 +2006,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
                                           expecting, RDMA_WRID_READY);
 
     if (ret < 0) {
+        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
@@ -2016,7 +2017,7 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        error_report("rdma migration: error posting second control recv!");
+        error_setg(errp, "rdma migration: error posting second control recv!");
         return -1;
     }
 
@@ -2957,11 +2958,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
         /* We've got nothing at all, so lets wait for
          * more to arrive
          */
-        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
+        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE,
+                                      errp);
 
         if (ret < 0) {
             rdma->errored = true;
-            error_setg(errp, "qemu_rdma_exchange_recv failed");
             return -1;
         }
 
@@ -3575,6 +3576,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     RDMAControlHeader blocks = { .type = RDMA_CONTROL_RAM_BLOCKS_RESULT,
                                  .repeat = 1 };
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     RDMALocalBlocks *local;
     RDMAControlHeader head;
@@ -3604,9 +3606,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
     do {
         trace_qemu_rdma_registration_handle_wait();
 
-        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE);
+        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_NONE, &err);
 
         if (ret < 0) {
+            error_report_err(err);
             break;
         }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 43/65] migration/rdma: Convert qemu_rdma_exchange_send() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (41 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 42/65] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 44/65] migration/rdma: Convert qemu_rdma_exchange_get_response() " Juan Quintela
                   ` (22 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_exchange_send().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_exchange_send() to Error.

Necessitates setting an error when qemu_rdma_post_recv_control(),
callback(), or qemu_rdma_exchange_get_response() failed.  Since these
errors will go away later in this series, simply use "FIXME temporary
error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-37-armbru@redhat.com>
---
 migration/rdma.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index b13fec6328..7866cf9bb7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -518,7 +518,8 @@ static void network_to_result(RDMARegisterResult *result)
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma));
+                                   int (*callback)(RDMAContext *rdma),
+                                   Error **errp);
 
 static inline uint64_t ram_chunk_index(const uint8_t *start,
                                        const uint8_t *host)
@@ -1376,6 +1377,8 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
  */
 static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
 {
+    Error *err = NULL;
+
     while (rdma->unregistrations[rdma->unregister_current]) {
         int ret;
         uint64_t wr_id = rdma->unregistrations[rdma->unregister_current];
@@ -1438,8 +1441,9 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         reg.key.chunk = chunk;
         register_to_network(rdma, &reg);
         ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                &resp, NULL, NULL);
+                                      &resp, NULL, NULL, &err);
         if (ret < 0) {
+            error_report_err(err);
             return -1;
         }
 
@@ -1893,7 +1897,8 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma))
+                                   int (*callback)(RDMAContext *rdma),
+                                   Error **errp)
 {
     int ret;
 
@@ -1908,6 +1913,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               RDMA_CONTROL_READY,
                                               RDMA_WRID_READY);
         if (ret < 0) {
+            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
     }
@@ -1918,7 +1924,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     if (resp) {
         ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
         if (ret < 0) {
-            error_report("rdma migration: error posting"
+            error_setg(errp, "rdma migration: error posting"
                     " extra control recv for anticipated result!");
             return -1;
         }
@@ -1929,7 +1935,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      */
     ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
     if (ret < 0) {
-        error_report("rdma migration: error posting first control recv!");
+        error_setg(errp, "rdma migration: error posting first control recv!");
         return -1;
     }
 
@@ -1939,7 +1945,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     ret = qemu_rdma_post_send_control(rdma, data, head);
 
     if (ret < 0) {
-        error_report("Failed to send control buffer!");
+        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -1951,6 +1957,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
             trace_qemu_rdma_exchange_send_issue_callback();
             ret = callback(rdma);
             if (ret < 0) {
+                error_setg(errp, "FIXME temporary error message");
                 return -1;
             }
         }
@@ -1960,6 +1967,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                               resp->type, RDMA_WRID_DATA);
 
         if (ret < 0) {
+            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
 
@@ -2034,6 +2042,7 @@ static int qemu_rdma_write_one(RDMAContext *rdma,
                                int current_index, uint64_t current_addr,
                                uint64_t length)
 {
+    Error *err = NULL;
     struct ibv_sge sge;
     struct ibv_send_wr send_wr = { 0 };
     struct ibv_send_wr *bad_wr;
@@ -2119,9 +2128,10 @@ retry:
 
                 compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
-                                (uint8_t *) &comp, NULL, NULL, NULL);
+                                (uint8_t *) &comp, NULL, NULL, NULL, &err);
 
                 if (ret < 0) {
+                    error_report_err(err);
                     return -1;
                 }
 
@@ -2156,8 +2166,9 @@ retry:
 
             register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                    &resp, &reg_result_idx, NULL);
+                                    &resp, &reg_result_idx, NULL, &err);
             if (ret < 0) {
+                error_report_err(err);
                 return -1;
             }
 
@@ -2864,11 +2875,11 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
             head.len = len;
             head.type = RDMA_CONTROL_QEMU_FILE;
 
-            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
+            ret = qemu_rdma_exchange_send(rdma, &head,
+                                          data, NULL, NULL, NULL, errp);
 
             if (ret < 0) {
                 rdma->errored = true;
-                error_setg(errp, "qemu_rdma_exchange_send failed");
                 return -1;
             }
 
@@ -3925,6 +3936,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
                                        uint64_t flags, void *data)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     RDMAControlHeader head = { .len = 0, .repeat = 1 };
     int ret;
@@ -3968,9 +3980,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
          */
         ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
                     &reg_result_idx, rdma->pin_all ?
-                    qemu_rdma_reg_whole_ram_blocks : NULL);
+                    qemu_rdma_reg_whole_ram_blocks : NULL,
+                    &err);
         if (ret < 0) {
-            fprintf(stderr, "receiving remote info!");
+            error_report_err(err);
             return -1;
         }
 
@@ -4021,9 +4034,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
     trace_qemu_rdma_registration_stop(flags);
 
     head.type = RDMA_CONTROL_REGISTER_FINISHED;
-    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
+    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL, &err);
 
     if (ret < 0) {
+        error_report_err(err);
         goto err;
     }
 
-- 
2.41.0




* [PULL 44/65] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (42 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 43/65] migration/rdma: Convert qemu_rdma_exchange_send() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 45/65] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Juan Quintela
                   ` (21 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() and qemu_rdma_exchange_recv() violate this
principle: they call error_report() via
qemu_rdma_exchange_get_response().  I elected not to investigate how
callers handle the error, i.e. precise impact is not known.

Clean this up by converting qemu_rdma_exchange_get_response() to
Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-38-armbru@redhat.com>
---
 migration/rdma.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 7866cf9bb7..87f2e6129e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1824,14 +1824,15 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
  * Block and wait for a RECV control channel message to arrive.
  */
 static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
-                RDMAControlHeader *head, uint32_t expecting, int idx)
+                RDMAControlHeader *head, uint32_t expecting, int idx,
+                Error **errp)
 {
     uint32_t byte_len;
     int ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RECV_CONTROL + idx,
                                        &byte_len);
 
     if (ret < 0) {
-        error_report("rdma migration: recv polling control error!");
+        error_setg(errp, "rdma migration: recv polling control error!");
         return -1;
     }
 
@@ -1844,7 +1845,7 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         trace_qemu_rdma_exchange_get_response_none(control_desc(head->type),
                                              head->type);
     } else if (head->type != expecting || head->type == RDMA_CONTROL_ERROR) {
-        error_report("Was expecting a %s (%d) control message"
+        error_setg(errp, "Was expecting a %s (%d) control message"
                 ", but got: %s (%d), length: %d",
                 control_desc(expecting), expecting,
                 control_desc(head->type), head->type, head->len);
@@ -1854,11 +1855,12 @@ static int qemu_rdma_exchange_get_response(RDMAContext *rdma,
         return -1;
     }
     if (head->len > RDMA_CONTROL_MAX_BUFFER - sizeof(*head)) {
-        error_report("too long length: %d", head->len);
+        error_setg(errp, "too long length: %d", head->len);
         return -1;
     }
     if (sizeof(*head) + head->len != byte_len) {
-        error_report("Malformed length: %d byte_len %d", head->len, byte_len);
+        error_setg(errp, "Malformed length: %d byte_len %d",
+                   head->len, byte_len);
         return -1;
     }
 
@@ -1911,9 +1913,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
 
         ret = qemu_rdma_exchange_get_response(rdma, &resp_ignored,
                                               RDMA_CONTROL_READY,
-                                              RDMA_WRID_READY);
+                                              RDMA_WRID_READY, errp);
         if (ret < 0) {
-            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
     }
@@ -1964,10 +1965,10 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
 
         trace_qemu_rdma_exchange_send_waiting(control_desc(resp->type));
         ret = qemu_rdma_exchange_get_response(rdma, resp,
-                                              resp->type, RDMA_WRID_DATA);
+                                              resp->type, RDMA_WRID_DATA,
+                                              errp);
 
         if (ret < 0) {
-            error_setg(errp, "FIXME temporary error message");
             return -1;
         }
 
@@ -2011,10 +2012,9 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
      * Block and wait for the message.
      */
     ret = qemu_rdma_exchange_get_response(rdma, head,
-                                          expecting, RDMA_WRID_READY);
+                                          expecting, RDMA_WRID_READY, errp);
 
     if (ret < 0) {
-        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
-- 
2.41.0




* [PULL 45/65] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (43 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 44/65] migration/rdma: Convert qemu_rdma_exchange_get_response() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 46/65] migration/rdma: Convert qemu_rdma_write_flush() " Juan Quintela
                   ` (20 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() violates this principle: it calls
error_report() via callback qemu_rdma_reg_whole_ram_blocks().  I
elected not to investigate how callers handle the error, i.e. precise
impact is not known.

Clean this up by converting the callback to Error.
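
Roughly, the callback type gains an Error ** parameter so the callback
can fill the caller's errp itself instead of printing; a sketch with
illustrative names, not the patch itself:

    /* hypothetical callback type threading errp through */
    typedef int (*RDMAExchangeCallback)(RDMAContext *rdma, Error **errp);

    static int exchange_send_sketch(RDMAContext *rdma,
                                    RDMAExchangeCallback callback,
                                    Error **errp)
    {
        if (callback && callback(rdma, errp) < 0) {
            return -1;          /* callback already set *errp */
        }
        /* ... rest of the exchange ... */
        return 0;
    }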

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-39-armbru@redhat.com>
---
 migration/rdma.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 87f2e6129e..189932097e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -518,7 +518,8 @@ static void network_to_result(RDMARegisterResult *result)
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma),
+                                   int (*callback)(RDMAContext *rdma,
+                                                   Error **errp),
                                    Error **errp);
 
 static inline uint64_t ram_chunk_index(const uint8_t *start,
@@ -1177,7 +1178,7 @@ static void qemu_rdma_advise_prefetch_mr(struct ibv_pd *pd, uint64_t addr,
 #endif
 }
 
-static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
+static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, Error **errp)
 {
     int i;
     RDMALocalBlocks *local = &rdma->local_ram_blocks;
@@ -1217,16 +1218,16 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
         }
 
         if (!local->block[i].mr) {
-            perror("Failed to register local dest ram block!");
-            break;
+            error_setg_errno(errp, errno,
+                             "Failed to register local dest ram block!");
+            goto err;
         }
         rdma->total_registrations++;
     }
 
-    if (i >= local->nb_blocks) {
-        return 0;
-    }
+    return 0;
 
+err:
     for (i--; i >= 0; i--) {
         ibv_dereg_mr(local->block[i].mr);
         local->block[i].mr = NULL;
@@ -1899,7 +1900,8 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
 static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
                                    uint8_t *data, RDMAControlHeader *resp,
                                    int *resp_idx,
-                                   int (*callback)(RDMAContext *rdma),
+                                   int (*callback)(RDMAContext *rdma,
+                                                   Error **errp),
                                    Error **errp)
 {
     int ret;
@@ -1956,9 +1958,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     if (resp) {
         if (callback) {
             trace_qemu_rdma_exchange_send_issue_callback();
-            ret = callback(rdma);
+            ret = callback(rdma, errp);
             if (ret < 0) {
-                error_setg(errp, "FIXME temporary error message");
                 return -1;
             }
         }
@@ -3671,10 +3672,9 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             }
 
             if (rdma->pin_all) {
-                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
+                ret = qemu_rdma_reg_whole_ram_blocks(rdma, &err);
                 if (ret < 0) {
-                    error_report("rdma migration: error dest "
-                                    "registering ram blocks");
+                    error_report_err(err);
                     goto err;
                 }
             }
-- 
2.41.0




* [PULL 46/65] migration/rdma: Convert qemu_rdma_write_flush() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (44 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 45/65] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 47/65] migration/rdma: Convert qemu_rdma_write_one() " Juan Quintela
                   ` (19 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_write_flush().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_write_flush() to Error.

Necessitates setting an error when qemu_rdma_write_one() failed.
Since this error will go away later in this series, simply use "FIXME
temporary error message" there.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-40-armbru@redhat.com>
---
 migration/rdma.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 189932097e..1a74c6d955 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2283,7 +2283,7 @@ retry:
  * We support sending out multiple chunks at the same time.
  * Not all of them need to get signaled in the completion queue.
  */
-static int qemu_rdma_write_flush(RDMAContext *rdma)
+static int qemu_rdma_write_flush(RDMAContext *rdma, Error **errp)
 {
     int ret;
 
@@ -2295,6 +2295,7 @@ static int qemu_rdma_write_flush(RDMAContext *rdma)
             rdma->current_index, rdma->current_addr, rdma->current_length);
 
     if (ret < 0) {
+        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
@@ -2368,6 +2369,7 @@ static int qemu_rdma_write(RDMAContext *rdma,
                            uint64_t block_offset, uint64_t offset,
                            uint64_t len)
 {
+    Error *err = NULL;
     uint64_t current_addr = block_offset + offset;
     uint64_t index = rdma->current_index;
     uint64_t chunk = rdma->current_chunk;
@@ -2375,8 +2377,9 @@ static int qemu_rdma_write(RDMAContext *rdma,
 
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
-        ret = qemu_rdma_write_flush(rdma);
+        ret = qemu_rdma_write_flush(rdma, &err);
         if (ret < 0) {
+            error_report_err(err);
             return -1;
         }
         rdma->current_length = 0;
@@ -2393,7 +2396,10 @@ static int qemu_rdma_write(RDMAContext *rdma,
 
     /* flush it if buffer is too large */
     if (rdma->current_length >= RDMA_MERGE_MAX) {
-        return qemu_rdma_write_flush(rdma);
+        if (qemu_rdma_write_flush(rdma, &err) < 0) {
+            error_report_err(err);
+            return -1;
+        }
     }
 
     return 0;
@@ -2857,10 +2863,9 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
      * Push out any writes that
      * we're queued up for VM's ram.
      */
-    ret = qemu_rdma_write_flush(rdma);
+    ret = qemu_rdma_write_flush(rdma, errp);
     if (ret < 0) {
         rdma->errored = true;
-        error_setg(errp, "qemu_rdma_write_flush failed");
         return -1;
     }
 
@@ -3002,9 +3007,11 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
  */
 static int qemu_rdma_drain_cq(RDMAContext *rdma)
 {
+    Error *err = NULL;
     int ret;
 
-    if (qemu_rdma_write_flush(rdma) < 0) {
+    if (qemu_rdma_write_flush(rdma, &err) < 0) {
+        error_report_err(err);
         return -1;
     }
 
-- 
2.41.0




* [PULL 47/65] migration/rdma: Convert qemu_rdma_write_one() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (45 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 46/65] migration/rdma: Convert qemu_rdma_write_flush() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 48/65] migration/rdma: Convert qemu_rdma_write() " Juan Quintela
                   ` (18 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_write_flush() violates this principle: it calls
error_report() via qemu_rdma_write_one().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_write_one() to Error.  Bonus:
resolves a FIXME about problematic use of errno.
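
The errno part works out because error_setg_errno() takes the error
code as an explicit argument, so the positive value returned by
ibv_post_send() can be passed directly instead of trusting that errno
was set. A minimal sketch of that shape (context trimmed):

    ret = ibv_post_send(rdma->qp, &send_wr, &bad_wr);
    if (ret > 0) {
        /* ibv_post_send() returns the error code itself; hand it to
         * error_setg_errno() rather than reading errno */
        error_setg_errno(errp, ret, "rdma migration: post rdma write failed");
        return -1;
    }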

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-41-armbru@redhat.com>
---
 migration/rdma.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 1a74c6d955..369d30c895 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2041,9 +2041,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
  */
 static int qemu_rdma_write_one(RDMAContext *rdma,
                                int current_index, uint64_t current_addr,
-                               uint64_t length)
+                               uint64_t length, Error **errp)
 {
-    Error *err = NULL;
     struct ibv_sge sge;
     struct ibv_send_wr send_wr = { 0 };
     struct ibv_send_wr *bad_wr;
@@ -2097,7 +2096,7 @@ retry:
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
 
         if (ret < 0) {
-            error_report("Failed to Wait for previous write to complete "
+            error_setg(errp, "Failed to Wait for previous write to complete "
                     "block %d chunk %" PRIu64
                     " current %" PRIu64 " len %" PRIu64 " %d",
                     current_index, chunk, sge.addr, length, rdma->nb_sent);
@@ -2129,10 +2128,9 @@ retry:
 
                 compress_to_network(rdma, &comp);
                 ret = qemu_rdma_exchange_send(rdma, &head,
-                                (uint8_t *) &comp, NULL, NULL, NULL, &err);
+                                (uint8_t *) &comp, NULL, NULL, NULL, errp);
 
                 if (ret < 0) {
-                    error_report_err(err);
                     return -1;
                 }
 
@@ -2167,9 +2165,8 @@ retry:
 
             register_to_network(rdma, &reg);
             ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
-                                    &resp, &reg_result_idx, NULL, &err);
+                                    &resp, &reg_result_idx, NULL, errp);
             if (ret < 0) {
-                error_report_err(err);
                 return -1;
             }
 
@@ -2177,7 +2174,7 @@ retry:
             if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
-                error_report("cannot get lkey");
+                error_setg(errp, "cannot get lkey");
                 return -1;
             }
 
@@ -2196,7 +2193,7 @@ retry:
             if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                 &sge.lkey, NULL, chunk,
                                                 chunk_start, chunk_end)) {
-                error_report("cannot get lkey!");
+                error_setg(errp, "cannot get lkey!");
                 return -1;
             }
         }
@@ -2208,7 +2205,7 @@ retry:
         if (qemu_rdma_register_and_get_keys(rdma, block, sge.addr,
                                                      &sge.lkey, NULL, chunk,
                                                      chunk_start, chunk_end)) {
-            error_report("cannot get lkey!");
+            error_setg(errp, "cannot get lkey!");
             return -1;
         }
     }
@@ -2242,7 +2239,7 @@ retry:
         trace_qemu_rdma_write_one_queue_full();
         ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_RDMA_WRITE, NULL);
         if (ret < 0) {
-            error_report("rdma migration: failed to make "
+            error_setg(errp, "rdma migration: failed to make "
                          "room in full send queue!");
             return -1;
         }
@@ -2250,12 +2247,8 @@ retry:
         goto retry;
 
     } else if (ret > 0) {
-        /*
-         * FIXME perror() is problematic, because whether
-         * ibv_post_send() sets errno is unclear.  Will go away later
-         * in this series.
-         */
-        perror("rdma migration: post rdma write failed");
+        error_setg_errno(errp, ret,
+                         "rdma migration: post rdma write failed");
         return -1;
     }
 
@@ -2291,11 +2284,10 @@ static int qemu_rdma_write_flush(RDMAContext *rdma, Error **errp)
         return 0;
     }
 
-    ret = qemu_rdma_write_one(rdma,
-            rdma->current_index, rdma->current_addr, rdma->current_length);
+    ret = qemu_rdma_write_one(rdma, rdma->current_index, rdma->current_addr,
+                              rdma->current_length, errp);
 
     if (ret < 0) {
-        error_setg(errp, "FIXME temporary error message");
         return -1;
     }
 
-- 
2.41.0




* [PULL 48/65] migration/rdma: Convert qemu_rdma_write() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (46 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 47/65] migration/rdma: Convert qemu_rdma_write_one() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 49/65] migration/rdma: Convert qemu_rdma_post_send_control() " Juan Quintela
                   ` (17 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Just for consistency with qemu_rdma_write_one() and
qemu_rdma_write_flush(), and for slightly simpler code.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-42-armbru@redhat.com>
---
 migration/rdma.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 369d30c895..9f45f6a91d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2359,9 +2359,8 @@ static inline bool qemu_rdma_buffer_mergeable(RDMAContext *rdma,
  */
 static int qemu_rdma_write(RDMAContext *rdma,
                            uint64_t block_offset, uint64_t offset,
-                           uint64_t len)
+                           uint64_t len, Error **errp)
 {
-    Error *err = NULL;
     uint64_t current_addr = block_offset + offset;
     uint64_t index = rdma->current_index;
     uint64_t chunk = rdma->current_chunk;
@@ -2369,9 +2368,8 @@ static int qemu_rdma_write(RDMAContext *rdma,
 
     /* If we cannot merge it, we flush the current buffer first. */
     if (!qemu_rdma_buffer_mergeable(rdma, current_addr, len)) {
-        ret = qemu_rdma_write_flush(rdma, &err);
+        ret = qemu_rdma_write_flush(rdma, errp);
         if (ret < 0) {
-            error_report_err(err);
             return -1;
         }
         rdma->current_length = 0;
@@ -2388,10 +2386,7 @@ static int qemu_rdma_write(RDMAContext *rdma,
 
     /* flush it if buffer is too large */
     if (rdma->current_length >= RDMA_MERGE_MAX) {
-        if (qemu_rdma_write_flush(rdma, &err) < 0) {
-            error_report_err(err);
-            return -1;
-        }
+        return qemu_rdma_write_flush(rdma, errp);
     }
 
     return 0;
@@ -3290,6 +3285,7 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
                                ram_addr_t offset, size_t size)
 {
     QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(qemu_file_get_ioc(f));
+    Error *err = NULL;
     RDMAContext *rdma;
     int ret;
 
@@ -3315,9 +3311,9 @@ static int qemu_rdma_save_page(QEMUFile *f, ram_addr_t block_offset,
      * is full, or the page doesn't belong to the current chunk,
      * an actual RDMA write will occur and a new chunk will be formed.
      */
-    ret = qemu_rdma_write(rdma, block_offset, offset, size);
+    ret = qemu_rdma_write(rdma, block_offset, offset, size, &err);
     if (ret < 0) {
-        error_report("rdma migration: write error");
+        error_report_err(err);
         goto err;
     }
 
-- 
2.41.0




* [PULL 49/65] migration/rdma: Convert qemu_rdma_post_send_control() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (47 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 48/65] migration/rdma: Convert qemu_rdma_write() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 50/65] migration/rdma: Convert qemu_rdma_post_recv_control() " Juan Quintela
                   ` (16 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_exchange_send() violates this principle: it calls
error_report() via qemu_rdma_post_send_control().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_post_send_control() to Error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-43-armbru@redhat.com>
---
 migration/rdma.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9f45f6a91d..aeb0a8921e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1741,7 +1741,8 @@ err_block_for_wrid:
  * containing some data and block until the post completes.
  */
 static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
-                                       RDMAControlHeader *head)
+                                       RDMAControlHeader *head,
+                                       Error **errp)
 {
     int ret;
     RDMAWorkRequestData *wr = &rdma->wr_data[RDMA_WRID_CONTROL];
@@ -1781,13 +1782,13 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
     ret = ibv_post_send(rdma->qp, &send_wr, &bad_wr);
 
     if (ret > 0) {
-        error_report("Failed to use post IB SEND for control");
+        error_setg(errp, "Failed to use post IB SEND for control");
         return -1;
     }
 
     ret = qemu_rdma_block_for_wrid(rdma, RDMA_WRID_SEND_CONTROL, NULL);
     if (ret < 0) {
-        error_report("rdma migration: send polling control error");
+        error_setg(errp, "rdma migration: send polling control error");
         return -1;
     }
 
@@ -1945,10 +1946,9 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Deliver the control message that was requested.
      */
-    ret = qemu_rdma_post_send_control(rdma, data, head);
+    ret = qemu_rdma_post_send_control(rdma, data, head, errp);
 
     if (ret < 0) {
-        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -2002,10 +2002,9 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Inform the source that we're ready to receive a message.
      */
-    ret = qemu_rdma_post_send_control(rdma, NULL, &ready);
+    ret = qemu_rdma_post_send_control(rdma, NULL, &ready, errp);
 
     if (ret < 0) {
-        error_setg(errp, "Failed to send control buffer!");
         return -1;
     }
 
@@ -2394,6 +2393,7 @@ static int qemu_rdma_write(RDMAContext *rdma,
 
 static void qemu_rdma_cleanup(RDMAContext *rdma)
 {
+    Error *err = NULL;
     int idx;
 
     if (rdma->cm_id && rdma->connected) {
@@ -2405,7 +2405,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
                                        .repeat = 1,
                                      };
             error_report("Early error. Sending error.");
-            qemu_rdma_post_send_control(rdma, NULL, &head);
+            if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
+                error_report_err(err);
+            }
         }
 
         rdma_disconnect(rdma->cm_id);
@@ -3705,10 +3707,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
 
 
             ret = qemu_rdma_post_send_control(rdma,
-                                        (uint8_t *) rdma->dest_blocks, &blocks);
+                                    (uint8_t *) rdma->dest_blocks, &blocks,
+                                    &err);
 
             if (ret < 0) {
-                error_report("rdma migration: error sending remote info");
+                error_report_err(err);
                 goto err;
             }
 
@@ -3783,10 +3786,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
             }
 
             ret = qemu_rdma_post_send_control(rdma,
-                            (uint8_t *) results, &reg_resp);
+                            (uint8_t *) results, &reg_resp, &err);
 
             if (ret < 0) {
-                error_report("Failed to send control buffer");
+                error_report_err(err);
                 goto err;
             }
             break;
@@ -3818,10 +3821,10 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                                                        reg->key.chunk);
             }
 
-            ret = qemu_rdma_post_send_control(rdma, NULL, &unreg_resp);
+            ret = qemu_rdma_post_send_control(rdma, NULL, &unreg_resp, &err);
 
             if (ret < 0) {
-                error_report("Failed to send control buffer");
+                error_report_err(err);
                 goto err;
             }
             break;
-- 
2.41.0




* [PULL 50/65] migration/rdma: Convert qemu_rdma_post_recv_control() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (48 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 49/65] migration/rdma: Convert qemu_rdma_post_send_control() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 51/65] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Juan Quintela
                   ` (15 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Just for symmetry with qemu_rdma_post_send_control().  Error messages
lose detail I consider of no use to users.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-44-armbru@redhat.com>
---
 migration/rdma.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index aeb0a8921e..41ea2edcda 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1799,7 +1799,8 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, uint8_t *buf,
  * Post a RECV work request in anticipation of some future receipt
  * of data on the control channel.
  */
-static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
+static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx,
+                                       Error **errp)
 {
     struct ibv_recv_wr *bad_wr;
     struct ibv_sge sge = {
@@ -1816,6 +1817,7 @@ static int qemu_rdma_post_recv_control(RDMAContext *rdma, int idx)
 
 
     if (ibv_post_recv(rdma->qp, &recv_wr, &bad_wr)) {
+        error_setg(errp, "error posting control recv");
         return -1;
     }
 
@@ -1926,10 +1928,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
      * If the user is expecting a response, post a WR in anticipation of it.
      */
     if (resp) {
-        ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA);
+        ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_DATA, errp);
         if (ret < 0) {
-            error_setg(errp, "rdma migration: error posting"
-                    " extra control recv for anticipated result!");
             return -1;
         }
     }
@@ -1937,9 +1937,8 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Post a WR to replace the one we just consumed for the READY message.
      */
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "rdma migration: error posting first control recv!");
         return -1;
     }
 
@@ -2023,9 +2022,8 @@ static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader *head,
     /*
      * Post a new RECV work request to replace the one we just consumed.
      */
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "rdma migration: error posting second control recv!");
         return -1;
     }
 
@@ -2608,9 +2606,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     caps_to_network(&cap);
 
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, errp);
     if (ret < 0) {
-        error_setg(errp, "RDMA ERROR: posting second control recv");
         goto err_rdma_source_connect;
     }
 
@@ -3402,6 +3399,7 @@ static void rdma_cm_poll_handler(void *opaque)
 
 static int qemu_rdma_accept(RDMAContext *rdma)
 {
+    Error *err = NULL;
     RDMACapabilities cap;
     struct rdma_conn_param conn_param = {
                                             .responder_resources = 2,
@@ -3538,9 +3536,9 @@ static int qemu_rdma_accept(RDMAContext *rdma)
     rdma_ack_cm_event(cm_event);
     rdma->connected = true;
 
-    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+    ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY, &err);
     if (ret < 0) {
-        error_report("rdma migration: error posting second control recv");
+        error_report_err(err);
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PULL 51/65] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (49 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 50/65] migration/rdma: Convert qemu_rdma_post_recv_control() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 52/65] migration/rdma: Silence qemu_rdma_resolve_host() Juan Quintela
                   ` (14 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init() violates this principle: it calls
error_report() via qemu_rdma_alloc_pd_cq().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.

The conversion loses a piece of advice on one of two failure paths:

    Your mlock() limits may be too low. Please check $ ulimit -a # and search for 'ulimit -l' in the output

Not worth retaining.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-45-armbru@redhat.com>
---
 migration/rdma.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 41ea2edcda..ee9221d5d2 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1052,19 +1052,19 @@ err_resolve_create_id:
 /*
  * Create protection domain and completion queues
  */
-static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
+static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma, Error **errp)
 {
     /* allocate pd */
     rdma->pd = ibv_alloc_pd(rdma->verbs);
     if (!rdma->pd) {
-        error_report("failed to allocate protection domain");
+        error_setg(errp, "failed to allocate protection domain");
         return -1;
     }
 
     /* create receive completion channel */
     rdma->recv_comp_channel = ibv_create_comp_channel(rdma->verbs);
     if (!rdma->recv_comp_channel) {
-        error_report("failed to allocate receive completion channel");
+        error_setg(errp, "failed to allocate receive completion channel");
         goto err_alloc_pd_cq;
     }
 
@@ -1074,21 +1074,21 @@ static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
     rdma->recv_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
                                   NULL, rdma->recv_comp_channel, 0);
     if (!rdma->recv_cq) {
-        error_report("failed to allocate receive completion queue");
+        error_setg(errp, "failed to allocate receive completion queue");
         goto err_alloc_pd_cq;
     }
 
     /* create send completion channel */
     rdma->send_comp_channel = ibv_create_comp_channel(rdma->verbs);
     if (!rdma->send_comp_channel) {
-        error_report("failed to allocate send completion channel");
+        error_setg(errp, "failed to allocate send completion channel");
         goto err_alloc_pd_cq;
     }
 
     rdma->send_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
                                   NULL, rdma->send_comp_channel, 0);
     if (!rdma->send_cq) {
-        error_report("failed to allocate send completion queue");
+        error_setg(errp, "failed to allocate send completion queue");
         goto err_alloc_pd_cq;
     }
 
@@ -2503,12 +2503,8 @@ static int qemu_rdma_source_init(RDMAContext *rdma, bool pin_all, Error **errp)
         goto err_rdma_source_init;
     }
 
-    ret = qemu_rdma_alloc_pd_cq(rdma);
+    ret = qemu_rdma_alloc_pd_cq(rdma, errp);
     if (ret < 0) {
-        error_setg(errp, "RDMA ERROR: "
-                   "rdma migration: error allocating pd and cq! Your mlock()"
-                   " limits may be too low. Please check $ ulimit -a # and "
-                   "search for 'ulimit -l' in the output");
         goto err_rdma_source_init;
     }
 
@@ -3482,9 +3478,9 @@ static int qemu_rdma_accept(RDMAContext *rdma)
 
     qemu_rdma_dump_id("dest_init", verbs);
 
-    ret = qemu_rdma_alloc_pd_cq(rdma);
+    ret = qemu_rdma_alloc_pd_cq(rdma, &err);
     if (ret < 0) {
-        error_report("rdma migration: error allocating pd and cq!");
+        error_report_err(err);
         goto err_rdma_dest_wait;
     }
 
-- 
2.41.0




* [PULL 52/65] migration/rdma: Silence qemu_rdma_resolve_host()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (50 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 51/65] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 53/65] migration/rdma: Silence qemu_rdma_connect() Juan Quintela
                   ` (13 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_resolve_host() violates this principle: it calls
error_report().

Clean this up: drop error_report().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-46-armbru@redhat.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ee9221d5d2..3c7a407d25 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1009,7 +1009,6 @@ route:
         error_setg(errp,
                    "RDMA ERROR: result not equal to event_addr_resolved %s",
                    rdma_event_str(cm_event->event));
-        error_report("rdma_resolve_addr");
         rdma_ack_cm_event(cm_event);
         goto err_resolve_get_addr;
     }
-- 
2.41.0




* [PULL 53/65] migration/rdma: Silence qemu_rdma_connect()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (51 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 52/65] migration/rdma: Silence qemu_rdma_resolve_host() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 54/65] migration/rdma: Silence qemu_rdma_reg_control() Juan Quintela
                   ` (12 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_connect() violates this principle: it calls error_report()
and perror().  I elected not to investigate how callers handle the
error, i.e. precise impact is not known.

Clean this up: replace perror() by changing error_setg() to
error_setg_errno(), and drop error_report().  I believe the callers'
error reports suffice then.  If they don't, we need to convert to
Error instead.

Bonus: resolves a FIXME about problematic use of errno.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-47-armbru@redhat.com>
---
 migration/rdma.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 3c7a407d25..8e1e8c4d47 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2608,8 +2608,8 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
 
     ret = rdma_connect(rdma->cm_id, &conn_param);
     if (ret < 0) {
-        perror("rdma_connect");
-        error_setg(errp, "RDMA ERROR: connecting to destination!");
+        error_setg_errno(errp, errno,
+                         "RDMA ERROR: connecting to destination!");
         goto err_rdma_source_connect;
     }
 
@@ -2618,21 +2618,15 @@ static int qemu_rdma_connect(RDMAContext *rdma, bool return_path,
     } else {
         ret = rdma_get_cm_event(rdma->channel, &cm_event);
         if (ret < 0) {
-            error_setg(errp, "RDMA ERROR: failed to get cm event");
+            error_setg_errno(errp, errno,
+                             "RDMA ERROR: failed to get cm event");
         }
     }
     if (ret < 0) {
-        /*
-         * FIXME perror() is wrong, because
-         * qemu_get_cm_event_timeout() can fail without setting errno.
-         * Will go away later in this series.
-         */
-        perror("rdma_get_cm_event after rdma_connect");
         goto err_rdma_source_connect;
     }
 
     if (cm_event->event != RDMA_CM_EVENT_ESTABLISHED) {
-        error_report("rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect");
         error_setg(errp, "RDMA ERROR: connecting to destination!");
         rdma_ack_cm_event(cm_event);
         goto err_rdma_source_connect;
-- 
2.41.0




* [PULL 54/65] migration/rdma: Silence qemu_rdma_reg_control()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (52 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 53/65] migration/rdma: Silence qemu_rdma_connect() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 55/65] migration/rdma: Don't report received completion events as error Juan Quintela
                   ` (11 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init() and qemu_rdma_accept() violate this principle:
they call error_report() via qemu_rdma_reg_control().  I elected not
to investigate how callers handle the error, i.e. precise impact is
not known.

Clean this up by dropping the error reporting from
qemu_rdma_reg_control().  I believe the callers' error reports
suffice.  If they don't, we need to convert to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-48-armbru@redhat.com>
---
 migration/rdma.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 8e1e8c4d47..6a8ff8fa96 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1360,7 +1360,6 @@ static int qemu_rdma_reg_control(RDMAContext *rdma, int idx)
         rdma->total_registrations++;
         return 0;
     }
-    error_report("qemu_rdma_reg_control failed");
     return -1;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 55/65] migration/rdma: Don't report received completion events as error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (53 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 54/65] migration/rdma: Silence qemu_rdma_reg_control() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 56/65] migration/rdma: Silence qemu_rdma_block_for_wrid() Juan Quintela
                   ` (10 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

When qemu_rdma_wait_comp_channel() receives an event from the
completion channel, it reports an error "receive cm event while wait
comp channel,cm event is T", where T is the numeric event type.
However, the function fails only when T is a disconnect or device
removal.  Events other than these two are not actually an error, and
reporting them as an error is wrong.  If we need to report them to the
user, we should use something else, and what to use depends on why we
need to report them to the user.

For now, report this error only when the function actually fails.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-49-armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6a8ff8fa96..b5bb47108e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1582,11 +1582,11 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                         return -1;
                     }
 
-                    error_report("receive cm event while wait comp channel,"
-                                 "cm event is %d", cm_event->event);
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
+                        error_report("receive cm event while wait comp channel,"
+                                     "cm event is %d", cm_event->event);
                         return -1;
                     }
                     rdma_ack_cm_event(cm_event);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 56/65] migration/rdma: Silence qemu_rdma_block_for_wrid()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (54 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 55/65] migration/rdma: Don't report received completion events as error Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 57/65] migration/rdma: Silence qemu_rdma_register_and_get_keys() Juan Quintela
                   ` (9 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_post_send_control(), qemu_rdma_exchange_get_response(), and
qemu_rdma_write_one() violate this principle: they call
error_report(), fprintf(stderr, ...), and perror() via
qemu_rdma_block_for_wrid(), qemu_rdma_poll(), and
qemu_rdma_wait_comp_channel().  I elected not to investigate how
callers handle the error, i.e. precise impact is not known.

Clean this up by dropping the error reporting from qemu_rdma_poll(),
qemu_rdma_wait_comp_channel(), and qemu_rdma_block_for_wrid().  I
believe the callers' error reports suffice.  If they don't, we need to
convert to Error instead.

Bonus: resolves a FIXME about problematic use of errno.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-50-armbru@redhat.com>
---
 migration/rdma.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index b5bb47108e..459dcb002e 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1483,17 +1483,12 @@ static int qemu_rdma_poll(RDMAContext *rdma, struct ibv_cq *cq,
     }
 
     if (ret < 0) {
-        error_report("ibv_poll_cq failed");
         return -1;
     }
 
     wr_id = wc.wr_id & RDMA_WRID_TYPE_MASK;
 
     if (wc.status != IBV_WC_SUCCESS) {
-        fprintf(stderr, "ibv_poll_cq wc.status=%d %s!\n",
-                        wc.status, ibv_wc_status_str(wc.status));
-        fprintf(stderr, "ibv_poll_cq wrid=%" PRIu64 "!\n", wr_id);
-
         return -1;
     }
 
@@ -1577,16 +1572,12 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
                 if (pfds[1].revents) {
                     ret = rdma_get_cm_event(rdma->channel, &cm_event);
                     if (ret < 0) {
-                        error_report("failed to get cm event while wait "
-                                     "completion channel");
                         return -1;
                     }
 
                     if (cm_event->event == RDMA_CM_EVENT_DISCONNECTED ||
                         cm_event->event == RDMA_CM_EVENT_DEVICE_REMOVAL) {
                         rdma_ack_cm_event(cm_event);
-                        error_report("receive cm event while wait comp channel,"
-                                     "cm event is %d", cm_event->event);
                         return -1;
                     }
                     rdma_ack_cm_event(cm_event);
@@ -1599,7 +1590,6 @@ static int qemu_rdma_wait_comp_channel(RDMAContext *rdma,
             default: /* Error of some type -
                       * I don't trust errno from qemu_poll_ns
                      */
-                error_report("%s: poll failed", __func__);
                 return -1;
             }
 
@@ -1683,12 +1673,6 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
 
         ret = ibv_get_cq_event(ch, &cq, &cq_ctx);
         if (ret < 0) {
-            /*
-             * FIXME perror() is problematic, because ibv_reg_mr() is
-             * not documented to set errno.  Will go away later in
-             * this series.
-             */
-            perror("ibv_get_cq_event");
             goto err_block_for_wrid;
         }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 57/65] migration/rdma: Silence qemu_rdma_register_and_get_keys()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (55 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 56/65] migration/rdma: Silence qemu_rdma_block_for_wrid() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 58/65] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Juan Quintela
                   ` (8 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_write_one() violates this principle: it reports errors to
stderr via qemu_rdma_register_and_get_keys().  I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.

Clean this up: silence qemu_rdma_register_and_get_keys().  I believe
the caller's error reports suffice.  If they don't, we need to convert
to Error instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-51-armbru@redhat.com>
---
 migration/rdma.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 459dcb002e..025523bf70 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1325,15 +1325,6 @@ static int qemu_rdma_register_and_get_keys(RDMAContext *rdma,
         }
     }
     if (!block->pmr[chunk]) {
-        perror("Failed to register chunk!");
-        fprintf(stderr, "Chunk details: block: %d chunk index %d"
-                        " start %" PRIuPTR " end %" PRIuPTR
-                        " host %" PRIuPTR
-                        " local %" PRIuPTR " registrations: %d\n",
-                        block->index, chunk, (uintptr_t)chunk_start,
-                        (uintptr_t)chunk_end, host_addr,
-                        (uintptr_t)block->local_host_addr,
-                        rdma->total_registrations);
         return -1;
     }
     rdma->total_registrations++;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 58/65] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (56 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 57/65] migration/rdma: Silence qemu_rdma_register_and_get_keys() Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 59/65] migration/rdma: Use error_report() & friends instead of stderr Juan Quintela
                   ` (7 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job.  When the caller does, the error is reported twice.  When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.

qemu_rdma_source_init(), qemu_rdma_connect(),
rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
violate this principle: they call error_report() via
qemu_rdma_cleanup().

Moreover, qemu_rdma_cleanup() can't fail.  It is called on error
paths, and on QIOChannel close and finalization.  Are the conditions it
reports really errors?  I doubt it.

Downgrade qemu_rdma_cleanup()'s errors to warnings.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-52-armbru@redhat.com>
---
 migration/rdma.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 025523bf70..9d5b3b76eb 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2375,9 +2375,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
                                        .type = RDMA_CONTROL_ERROR,
                                        .repeat = 1,
                                      };
-            error_report("Early error. Sending error.");
+            warn_report("Early error. Sending error.");
             if (qemu_rdma_post_send_control(rdma, NULL, &head, &err) < 0) {
-                error_report_err(err);
+                warn_report_err(err);
             }
         }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 59/65] migration/rdma: Use error_report() & friends instead of stderr
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (57 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 58/65] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 60/65] migration/rdma: Replace flawed device detail dump by tracing Juan Quintela
                   ` (6 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

error_report() obeys -msg, reports the current error location if any,
and reports to the current monitor if any.  Reporting to stderr
directly with fprintf() or perror() is wrong, because it loses all
this.

Fix the offenders.  Bonus: resolves a FIXME about problematic use of
errno.
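
As a rough before/after sketch (the messages here are invented for
illustration, not taken from the patch):

    /* before: bypasses -msg, the error location and the monitor */
    perror("some operation");
    fprintf(stderr, "some operation failed\n");

    /* after: goes through QEMU's reporting infrastructure */
    error_report("some operation failed: %s", strerror(errno));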

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-53-armbru@redhat.com>
---
 migration/rdma.c | 44 +++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 9d5b3b76eb..ab2ea85c45 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -877,12 +877,12 @@ static int qemu_rdma_broken_ipv6_kernel(struct ibv_context *verbs, Error **errp)
 
         if (roce_found) {
             if (ib_found) {
-                fprintf(stderr, "WARN: migrations may fail:"
-                                " IPv6 over RoCE / iWARP in linux"
-                                " is broken. But since you appear to have a"
-                                " mixed RoCE / IB environment, be sure to only"
-                                " migrate over the IB fabric until the kernel "
-                                " fixes the bug.\n");
+                warn_report("migrations may fail:"
+                            " IPv6 over RoCE / iWARP in linux"
+                            " is broken. But since you appear to have a"
+                            " mixed RoCE / IB environment, be sure to only"
+                            " migrate over the IB fabric until the kernel "
+                            " fixes the bug.");
             } else {
                 error_setg(errp, "RDMA ERROR: "
                            "You only have RoCE / iWARP devices in your systems"
@@ -1418,12 +1418,8 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
         block->remote_keys[chunk] = 0;
 
         if (ret != 0) {
-            /*
-             * FIXME perror() is problematic, bcause ibv_dereg_mr() is
-             * not documented to set errno.  Will go away later in
-             * this series.
-             */
-            perror("unregistration chunk failed");
+            error_report("unregistration chunk failed: %s",
+                         strerror(ret));
             return -1;
         }
         rdma->total_registrations--;
@@ -3772,7 +3768,8 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
                 block->pmr[reg->key.chunk] = NULL;
 
                 if (ret != 0) {
-                    perror("rdma unregistration chunk failed");
+                    error_report("rdma unregistration chunk failed: %s",
+                                 strerror(errno));
                     goto err;
                 }
 
@@ -3961,10 +3958,10 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
          */
 
         if (local->nb_blocks != nb_dest_blocks) {
-            fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) "
-                    "Your QEMU command line parameters are probably "
-                    "not identical on both the source and destination.",
-                    local->nb_blocks, nb_dest_blocks);
+            error_report("ram blocks mismatch (Number of blocks %d vs %d)",
+                         local->nb_blocks, nb_dest_blocks);
+            error_printf("Your QEMU command line parameters are probably "
+                         "not identical on both the source and destination.");
             rdma->errored = true;
             return -1;
         }
@@ -3977,10 +3974,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f,
 
             /* We require that the blocks are in the same order */
             if (rdma->dest_blocks[i].length != local->block[i].length) {
-                fprintf(stderr, "Block %s/%d has a different length %" PRIu64
-                        "vs %" PRIu64, local->block[i].block_name, i,
-                        local->block[i].length,
-                        rdma->dest_blocks[i].length);
+                error_report("Block %s/%d has a different length %" PRIu64
+                             "vs %" PRIu64,
+                             local->block[i].block_name, i,
+                             local->block[i].length,
+                             rdma->dest_blocks[i].length);
                 rdma->errored = true;
                 return -1;
             }
@@ -4096,7 +4094,7 @@ static void rdma_accept_incoming_migration(void *opaque)
     ret = qemu_rdma_accept(rdma);
 
     if (ret < 0) {
-        fprintf(stderr, "RDMA ERROR: Migration initialization failed\n");
+        error_report("RDMA ERROR: Migration initialization failed");
         return;
     }
 
@@ -4108,7 +4106,7 @@ static void rdma_accept_incoming_migration(void *opaque)
 
     f = rdma_new_input(rdma);
     if (f == NULL) {
-        fprintf(stderr, "RDMA ERROR: could not open RDMA for input\n");
+        error_report("RDMA ERROR: could not open RDMA for input");
         qemu_rdma_cleanup(rdma);
         return;
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 60/65] migration/rdma: Replace flawed device detail dump by tracing
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (58 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 59/65] migration/rdma: Use error_report() & friends instead of stderr Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:21 ` [PULL 61/65] migration: Display error in query-migrate irrelevant of status Juan Quintela
                   ` (5 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Markus Armbruster <armbru@redhat.com>

qemu_rdma_dump_id() dumps RDMA device details to stdout.

rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
and qemu_rdma_resolve_host() to show source device details.
rdma_start_incoming_migration() arranges its call via
rdma_accept_incoming_migration() and qemu_rdma_accept() to show
destination device details.

Two issues:

1. rdma_start_outgoing_migration() can run in HMP context.  The
   information should arguably go to the monitor, not stdout.

2. ibv_query_port() failure is reported as an error.  Its callers remain
   unaware of this failure (qemu_rdma_dump_id() can't fail), so
   reporting this to the user as an error is problematic.

Fixable, but the device detail dump is noise, except when
troubleshooting.  Tracing is a better fit.  Similar function
qemu_rdma_dump_gid() was converted to tracing in commit
733252deb8b (Tracify migration/rdma.c).

Convert qemu_rdma_dump_id(), too.

While there, touch up qemu_rdma_dump_gid()'s outdated comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-54-armbru@redhat.com>
---
 migration/rdma.c       | 23 ++++++++---------------
 migration/trace-events |  2 ++
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index ab2ea85c45..f6fc226c9b 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -734,38 +734,31 @@ static void rdma_delete_block(RDMAContext *rdma, RDMALocalBlock *block)
 }
 
 /*
- * Put in the log file which RDMA device was opened and the details
- * associated with that device.
+ * Trace RDMA device open, with device details.
  */
 static void qemu_rdma_dump_id(const char *who, struct ibv_context *verbs)
 {
     struct ibv_port_attr port;
 
     if (ibv_query_port(verbs, 1, &port)) {
-        error_report("Failed to query port information");
+        trace_qemu_rdma_dump_id_failed(who);
         return;
     }
 
-    printf("%s RDMA Device opened: kernel name %s "
-           "uverbs device name %s, "
-           "infiniband_verbs class device path %s, "
-           "infiniband class device path %s, "
-           "transport: (%d) %s\n",
-                who,
+    trace_qemu_rdma_dump_id(who,
                 verbs->device->name,
                 verbs->device->dev_name,
                 verbs->device->dev_path,
                 verbs->device->ibdev_path,
                 port.link_layer,
-                (port.link_layer == IBV_LINK_LAYER_INFINIBAND) ? "Infiniband" :
-                 ((port.link_layer == IBV_LINK_LAYER_ETHERNET)
-                    ? "Ethernet" : "Unknown"));
+                port.link_layer == IBV_LINK_LAYER_INFINIBAND ? "Infiniband"
+                : port.link_layer == IBV_LINK_LAYER_ETHERNET ? "Ethernet"
+                : "Unknown");
 }
 
 /*
- * Put in the log file the RDMA gid addressing information,
- * useful for folks who have trouble understanding the
- * RDMA device hierarchy in the kernel.
+ * Trace RDMA gid addressing information.
+ * Useful for understanding the RDMA device hierarchy in the kernel.
  */
 static void qemu_rdma_dump_gid(const char *who, struct rdma_cm_id *id)
 {
diff --git a/migration/trace-events b/migration/trace-events
index 6a50994402..ee9c8f4d63 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -214,6 +214,8 @@ qemu_rdma_close(void) ""
 qemu_rdma_connect_pin_all_requested(void) ""
 qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
 qemu_rdma_dest_init_trying(const char *host, const char *ip) "%s => %s"
+qemu_rdma_dump_id_failed(const char *who) "%s RDMA Device opened, but can't query port information"
+qemu_rdma_dump_id(const char *who, const char *name, const char *dev_name, const char *dev_path, const char *ibdev_path, int transport, const char *transport_name) "%s RDMA Device opened: kernel name %s uverbs device name %s, infiniband_verbs class device path %s, infiniband class device path %s, transport: (%d) %s"
 qemu_rdma_dump_gid(const char *who, const char *src, const char *dst) "%s Source GID: %s, Dest GID: %s"
 qemu_rdma_exchange_get_response_start(const char *desc) "CONTROL: %s receiving..."
 qemu_rdma_exchange_get_response_none(const char *desc, int type) "Surprise: got %s (%d)"
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 61/65] migration: Display error in query-migrate irrelevant of status
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (59 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 60/65] migration/rdma: Replace flawed device detail dump by tracing Juan Quintela
@ 2023-10-11  9:21 ` Juan Quintela
  2023-10-11  9:22 ` [PULL 62/65] migration: Introduce migrate_has_error() Juan Quintela
                   ` (4 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:21 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Peter Xu <peterx@redhat.com>

Display it as long as it is set, regardless of the FAILED status.  E.g., it
may also be applicable to the PAUSED stage of postcopy, to provide a hint on
what has gone wrong.

The error_mutex seems to be overlooked when referencing the error; take it
here to be safe.

This changes QAPI behavior by also showing the error message in states other
than FAILED, but that is intentional and is not expected to break anyone.
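
A minimal sketch of the resulting pattern (the helper name
get_error_desc() is made up; the locking and the copy follow the hunk
below):

    static char *get_error_desc(MigrationState *s)
    {
        QEMU_LOCK_GUARD(&s->error_mutex);
        return s->error ? g_strdup(error_get_pretty(s->error)) : NULL;
    }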

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-2-peterx@redhat.com>
---
 qapi/migration.json   | 5 ++---
 migration/migration.c | 8 +++++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index d8f3bbd7b0..d7dfaa5db9 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -230,9 +230,8 @@
 #     throttled during auto-converge.  This is only present when
 #     auto-converge has started throttling guest cpus.  (Since 2.7)
 #
-# @error-desc: the human readable error description string, when
-#     @status is 'failed'. Clients should not attempt to parse the
-#     error strings.  (Since 2.7)
+# @error-desc: the human readable error description string. Clients
+#     should not attempt to parse the error strings.  (Since 2.7)
 #
 # @postcopy-blocktime: total time when all vCPU were blocked during
 #     postcopy live migration.  This is only present when the
diff --git a/migration/migration.c b/migration/migration.c
index 2057e42134..57f9e9ed0c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1060,9 +1060,6 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
-        if (s->error) {
-            info->error_desc = g_strdup(error_get_pretty(s->error));
-        }
         break;
     case MIGRATION_STATUS_CANCELLED:
         info->has_status = true;
@@ -1072,6 +1069,11 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     }
     info->status = state;
+
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    if (s->error) {
+        info->error_desc = g_strdup(error_get_pretty(s->error));
+    }
 }
 
 static void fill_destination_migration_info(MigrationInfo *info)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 62/65] migration: Introduce migrate_has_error()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (60 preceding siblings ...)
  2023-10-11  9:21 ` [PULL 61/65] migration: Display error in query-migrate irrelevant of status Juan Quintela
@ 2023-10-11  9:22 ` Juan Quintela
  2023-10-11  9:22 ` [PULL 63/65] qemufile: Always return a verbose error Juan Quintela
                   ` (3 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Peter Xu <peterx@redhat.com>

Introduce a helper to detect whether MigrationState.error is set for
whatever reason.

This is preparation work for any thread (e.g. the source return path thread)
to set errors on MigrationState in a unified way, rather than relying on its
own mechanism for recording errors (mark_source_rp_bad()).
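
A hypothetical usage sketch (the calling context is invented; only
migrate_set_error()/migrate_has_error() come from the migration code):

    MigrationState *s = migrate_get_current();

    if (local_err) {
        migrate_set_error(s, local_err);   /* record it, thread-safe */
    }

    if (migrate_has_error(s)) {
        /* some thread already recorded a failure; stop early */
        return -1;
    }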

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-3-peterx@redhat.com>
---
 migration/migration.h | 1 +
 migration/migration.c | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/migration/migration.h b/migration/migration.h
index 972597f4de..4106a1dc54 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -476,6 +476,7 @@ bool  migration_has_all_channels(void);
 uint64_t migrate_max_downtime(void);
 
 void migrate_set_error(MigrationState *s, const Error *error);
+bool migrate_has_error(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s, Error *error_in);
 
diff --git a/migration/migration.c b/migration/migration.c
index 57f9e9ed0c..409eb3e916 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1234,6 +1234,13 @@ void migrate_set_error(MigrationState *s, const Error *error)
     }
 }
 
+bool migrate_has_error(MigrationState *s)
+{
+    /* The lock is not helpful here, but still follow the rule */
+    QEMU_LOCK_GUARD(&s->error_mutex);
+    return qatomic_read(&s->error);
+}
+
 static void migrate_error_free(MigrationState *s)
 {
     QEMU_LOCK_GUARD(&s->error_mutex);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 63/65] qemufile: Always return a verbose error
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (61 preceding siblings ...)
  2023-10-11  9:22 ` [PULL 62/65] migration: Introduce migrate_has_error() Juan Quintela
@ 2023-10-11  9:22 ` Juan Quintela
  2023-10-11  9:22 ` [PULL 64/65] migration: Remember num of ramblocks to sync during recovery Juan Quintela
                   ` (2 subsequent siblings)
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Peter Xu <peterx@redhat.com>

There are a lot of cases where we only have an errno set in last_error,
without a detailed error description.  When this happens, generate an Error
that describes the errno.

This will be helpful in cases where one relies on the Error*.  E.g.,
migration state only caches Error* in MigrationState.error.  With this,
we'll display correct error messages in e.g. query-migrate when the error
was only set by qemu_file_set_error().
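
A hedged illustration of the effect (a hypothetical caller inside
qemu-file.c; only the helper's behaviour comes from the patch):

    Error *err = NULL;

    qemu_file_set_error(f, -EIO);      /* errno only, no Error object */
    if (qemu_file_get_error_obj(f, &err) < 0) {
        /* err is now set: "Channel error: Input/output error" */
        error_report_err(err);
    }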

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-6-peterx@redhat.com>
---
 migration/qemu-file.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 5e8207dae4..7fb659296f 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -142,15 +142,24 @@ void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
  *
  * Return negative error value if there has been an error on previous
  * operations, return 0 if no error happened.
- * Optional, it returns Error* in errp, but it may be NULL even if return value
- * is not 0.
  *
+ * If errp is specified, a verbose error message will be copied over.
  */
 static int qemu_file_get_error_obj(QEMUFile *f, Error **errp)
 {
+    if (!f->last_error) {
+        return 0;
+    }
+
+    /* There is an error */
     if (errp) {
-        *errp = f->last_error_obj ? error_copy(f->last_error_obj) : NULL;
+        if (f->last_error_obj) {
+            *errp = error_copy(f->last_error_obj);
+        } else {
+            error_setg_errno(errp, -f->last_error, "Channel error");
+        }
     }
+
     return f->last_error;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 64/65] migration: Remember num of ramblocks to sync during recovery
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (62 preceding siblings ...)
  2023-10-11  9:22 ` [PULL 63/65] qemufile: Always return a verbose error Juan Quintela
@ 2023-10-11  9:22 ` Juan Quintela
  2023-10-11  9:22 ` [PULL 65/65] migration: Add migration_rp_wait|kick() Juan Quintela
  2023-10-11 17:04 ` [PULL 00/65] Migration 20231011 patches Stefan Hajnoczi
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Peter Xu <peterx@redhat.com>

Instead of only relying on the count of rp_sem, make the counter part of
RAMState so both threads can use it to synchronize the process.

rp_sem will be further reused in follow up patches, as a way to kick the
main thread, e.g., on recovery failures.
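
A condensed sketch of the synchronization pattern, with both sides
simplified from the hunks below:

    /* migration thread: request one sync per ramblock, then wait */
    qatomic_set(&rs->postcopy_bmap_sync_requested, 0);
    RAMBLOCK_FOREACH_NOT_IGNORED(block) {
        qemu_savevm_send_recv_bitmap(file, block->idstr);
        qatomic_inc(&rs->postcopy_bmap_sync_requested);
    }
    while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
        qemu_sem_wait(&s->rp_state.rp_sem);
    }

    /* return path thread: after reloading one ramblock's bitmap */
    qatomic_dec(&rs->postcopy_bmap_sync_requested);
    qemu_sem_post(&s->rp_state.rp_sem);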

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-7-peterx@redhat.com>
---
 migration/ram.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index e4bfd39f08..6c40d9af0c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -394,6 +394,14 @@ struct RAMState {
     /* Queue of outstanding page requests from the destination */
     QemuMutex src_page_req_mutex;
     QSIMPLEQ_HEAD(, RAMSrcPageRequest) src_page_requests;
+
+    /*
+     * This is only used when postcopy is in recovery phase, to communicate
+     * between the migration thread and the return path thread on dirty
+     * bitmap synchronizations.  This field is unused in other stages of
+     * RAM migration.
+     */
+    unsigned int postcopy_bmap_sync_requested;
 };
 typedef struct RAMState RAMState;
 
@@ -4119,20 +4127,20 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 {
     RAMBlock *block;
     QEMUFile *file = s->to_dst_file;
-    int ramblock_count = 0;
 
     trace_ram_dirty_bitmap_sync_start();
 
+    qatomic_set(&rs->postcopy_bmap_sync_requested, 0);
     RAMBLOCK_FOREACH_NOT_IGNORED(block) {
         qemu_savevm_send_recv_bitmap(file, block->idstr);
         trace_ram_dirty_bitmap_request(block->idstr);
-        ramblock_count++;
+        qatomic_inc(&rs->postcopy_bmap_sync_requested);
     }
 
     trace_ram_dirty_bitmap_sync_wait();
 
     /* Wait until all the ramblocks' dirty bitmap synced */
-    while (ramblock_count--) {
+    while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
         qemu_sem_wait(&s->rp_state.rp_sem);
     }
 
@@ -4159,6 +4167,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
     unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
     uint64_t local_size = DIV_ROUND_UP(nbits, 8);
     uint64_t size, end_mark;
+    RAMState *rs = ram_state;
 
     trace_ram_dirty_bitmap_reload_begin(block->idstr);
 
@@ -4225,6 +4234,8 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
     /* We'll recalculate migration_dirty_pages in ram_state_resume_prepare(). */
     trace_ram_dirty_bitmap_reload_complete(block->idstr);
 
+    qatomic_dec(&rs->postcopy_bmap_sync_requested);
+
     /*
      * We succeeded to sync bitmap for current ramblock. If this is
      * the last one to sync, we need to notify the main send thread.
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PULL 65/65] migration: Add migration_rp_wait|kick()
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (63 preceding siblings ...)
  2023-10-11  9:22 ` [PULL 64/65] migration: Remember num of ramblocks to sync during recovery Juan Quintela
@ 2023-10-11  9:22 ` Juan Quintela
  2023-10-11 17:04 ` [PULL 00/65] Migration 20231011 patches Stefan Hajnoczi
  65 siblings, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11  9:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake,
	Fabiano Rosas

From: Peter Xu <peterx@redhat.com>

These are just simple wrappers around rp_sem for wait() and kick(), making it
clearer how it is used.  They are prepared to be reused for other things
later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-ID: <20231004220240.167175-8-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/migration.h | 15 +++++++++++++++
 migration/migration.c | 14 ++++++++++++--
 migration/ram.c       | 16 +++++++---------
 3 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 4106a1dc54..cd5534337c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -316,6 +316,12 @@ struct MigrationState {
          * be cleared in the rp_thread!
          */
         bool          rp_thread_created;
+        /*
+         * Used to synchronize between migration main thread and return
+         * path thread.  The migration thread can wait() on this sem, while
+         * other threads (e.g., return path thread) can kick it using a
+         * post().
+         */
         QemuSemaphore rp_sem;
         /*
          * We post to this when we got one PONG from dest. So far it's an
@@ -527,4 +533,13 @@ void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
 void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
 
+/* Migration thread waiting for return path thread. */
+void migration_rp_wait(MigrationState *s);
+/*
+ * Kick the migration thread waiting for return path messages.  NOTE: the
+ * name can be slightly confusing (when read as "kick the rp thread"), just
+ * to remember the target is always the migration thread.
+ */
+void migration_rp_kick(MigrationState *s);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 409eb3e916..1c6c81ad49 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1763,6 +1763,16 @@ static void mark_source_rp_bad(MigrationState *s)
     s->rp_state.error = true;
 }
 
+void migration_rp_wait(MigrationState *s)
+{
+    qemu_sem_wait(&s->rp_state.rp_sem);
+}
+
+void migration_rp_kick(MigrationState *s)
+{
+    qemu_sem_post(&s->rp_state.rp_sem);
+}
+
 static struct rp_cmd_args {
     ssize_t     len; /* -1 = variable */
     const char *name;
@@ -1835,7 +1845,7 @@ static int migrate_handle_rp_resume_ack(MigrationState *s, uint32_t value)
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
     /* Notify send thread that time to continue send pages */
-    qemu_sem_post(&s->rp_state.rp_sem);
+    migration_rp_kick(s);
 
     return 0;
 }
@@ -2464,7 +2474,7 @@ static int postcopy_resume_handshake(MigrationState *s)
     qemu_savevm_send_postcopy_resume(s->to_dst_file);
 
     while (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
-        qemu_sem_wait(&s->rp_state.rp_sem);
+        migration_rp_wait(s);
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
diff --git a/migration/ram.c b/migration/ram.c
index 6c40d9af0c..2f5ce4d60b 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4141,7 +4141,7 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
 
     /* Wait until all the ramblocks' dirty bitmap synced */
     while (qatomic_read(&rs->postcopy_bmap_sync_requested)) {
-        qemu_sem_wait(&s->rp_state.rp_sem);
+        migration_rp_wait(s);
     }
 
     trace_ram_dirty_bitmap_sync_complete();
@@ -4149,11 +4149,6 @@ static int ram_dirty_bitmap_sync_all(MigrationState *s, RAMState *rs)
     return 0;
 }
 
-static void ram_dirty_bitmap_reload_notify(MigrationState *s)
-{
-    qemu_sem_post(&s->rp_state.rp_sem);
-}
-
 /*
  * Read the received bitmap, revert it as the initial dirty bitmap.
  * This is only used when the postcopy migration is paused but wants
@@ -4237,10 +4232,13 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
     qatomic_dec(&rs->postcopy_bmap_sync_requested);
 
     /*
-     * We succeeded to sync bitmap for current ramblock. If this is
-     * the last one to sync, we need to notify the main send thread.
+     * We succeeded to sync bitmap for current ramblock. Always kick the
+     * migration thread to check whether all requested bitmaps are
+     * reloaded.  NOTE: it's racy to only kick when requested==0, because
+     * we don't know whether the migration thread may still be increasing
+     * it.
      */
-    ram_dirty_bitmap_reload_notify(s);
+    migration_rp_kick(s);
 
     ret = 0;
 out:
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate
  2023-10-11  9:21 ` [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate Juan Quintela
@ 2023-10-11 13:04   ` Fabiano Rosas
  2023-10-11 14:11     ` Juan Quintela
  2023-10-11 20:30     ` Juan Quintela
  0 siblings, 2 replies; 70+ messages in thread
From: Fabiano Rosas @ 2023-10-11 13:04 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel
  Cc: Laurent Vivier, Peter Xu, Paolo Bonzini, Markus Armbruster,
	Juan Quintela, Thomas Huth, Li Zhijian, Leonardo Bras, Eric Blake

Juan Quintela <quintela@redhat.com> writes:

> From: Fabiano Rosas <farosas@suse.de>
>
> There is currently no way to write a test for errors that happened in
> qmp_migrate before the migration has started.
>
> Add a version of qmp_migrate that ensures an error happens. To make
> use of it a test needs to set MigrateCommon.result as
> MIG_TEST_QMP_ERROR.
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> Message-ID: <20230712190742.22294-6-farosas@suse.de>
> ---

Hi Juan,

What's the plan for the last patch in that series? The one that adds the
actual test:
[PATCH v5 6/6] tests/qtest: migration-test: Add tests for file-based migration
https://lore.kernel.org/r/20230712190742.22294-7-farosas@suse.de

I'm trying to keep track of what's merged and what's not because I have
more patches on top of it to send.

Thanks!


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate
  2023-10-11 13:04   ` Fabiano Rosas
@ 2023-10-11 14:11     ` Juan Quintela
  2023-10-11 20:30     ` Juan Quintela
  1 sibling, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11 14:11 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Paolo Bonzini,
	Markus Armbruster, Thomas Huth, Li Zhijian, Leonardo Bras,
	Eric Blake

Fabiano Rosas <farosas@suse.de> wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
>> From: Fabiano Rosas <farosas@suse.de>
>>
>> There is currently no way to write a test for errors that happened in
>> qmp_migrate before the migration has started.
>>
>> Add a version of qmp_migrate that ensures an error happens. To make
>> use of it a test needs to set MigrateCommon.result as
>> MIG_TEST_QMP_ERROR.
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> Message-ID: <20230712190742.22294-6-farosas@suse.de>
>> ---
>
> Hi Juan,
>
> What's the plan for the last patch in that series? The one that adds the
> actual test:
> [PATCH v5 6/6] tests/qtest: migration-test: Add tests for file-based migration
> https://lore.kernel.org/r/20230712190742.22294-7-farosas@suse.de
>
> I'm trying to keep track of what's merged and what's not because I have
> more patches on top of it to send.

You need to resend the patches that I didn't pick.

If my memory is correct, it didn't apply or make check didn't work without all
patches applied.

Later, Juan.



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PULL 00/65] Migration 20231011 patches
  2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
                   ` (64 preceding siblings ...)
  2023-10-11  9:22 ` [PULL 65/65] migration: Add migration_rp_wait|kick() Juan Quintela
@ 2023-10-11 17:04 ` Stefan Hajnoczi
  65 siblings, 0 replies; 70+ messages in thread
From: Stefan Hajnoczi @ 2023-10-11 17:04 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Paolo Bonzini,
	Markus Armbruster, Juan Quintela, Thomas Huth, Li Zhijian,
	Leonardo Bras, Eric Blake, Fabiano Rosas


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any user-visible changes.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate
  2023-10-11 13:04   ` Fabiano Rosas
  2023-10-11 14:11     ` Juan Quintela
@ 2023-10-11 20:30     ` Juan Quintela
  1 sibling, 0 replies; 70+ messages in thread
From: Juan Quintela @ 2023-10-11 20:30 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Laurent Vivier, Peter Xu, Paolo Bonzini,
	Markus Armbruster, Thomas Huth, Li Zhijian, Leonardo Bras,
	Eric Blake

Fabiano Rosas <farosas@suse.de> wrote:
> Juan Quintela <quintela@redhat.com> writes:
>
>> From: Fabiano Rosas <farosas@suse.de>
>>
>> There is currently no way to write a test for errors that happened in
>> qmp_migrate before the migration has started.
>>
>> Add a version of qmp_migrate that ensures an error happens. To make
>> use of it a test needs to set MigrateCommon.result as
>> MIG_TEST_QMP_ERROR.
>>
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> Message-ID: <20230712190742.22294-6-farosas@suse.de>
>> ---
>
> Hi Juan,
>
> What's the plan for the last patch in that series? The one that adds the
> actual test:
> [PATCH v5 6/6] tests/qtest: migration-test: Add tests for file-based migration
> https://lore.kernel.org/r/20230712190742.22294-7-farosas@suse.de
>
> I'm trying to keep track of what's merged and what's not because I have
> more patches on top of it to send.

Added it back, it appears that now it is working.

If you hear nothing else from me, it should be on next pull request.

Later, Juan.



^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2023-10-11 20:31 UTC | newest]

Thread overview: 70+ messages
2023-10-11  9:20 [PULL 00/65] Migration 20231011 patches Juan Quintela
2023-10-11  9:20 ` [PULL 01/65] migration/qmp: Fix crash on setting tls-authz with null Juan Quintela
2023-10-11  9:21 ` [PULL 02/65] tests/qtest: migration: Expose migrate_set_capability Juan Quintela
2023-10-11  9:21 ` [PULL 03/65] tests/qtest: migration: Add migrate_incoming_qmp helper Juan Quintela
2023-10-11  9:21 ` [PULL 04/65] tests/qtest: migration: Use migrate_incoming_qmp where appropriate Juan Quintela
2023-10-11  9:21 ` [PULL 05/65] migration: Set migration status early in incoming side Juan Quintela
2023-10-11  9:21 ` [PULL 06/65] tests/qtest: migration: Add support for negative testing of qmp_migrate Juan Quintela
2023-10-11 13:04   ` Fabiano Rosas
2023-10-11 14:11     ` Juan Quintela
2023-10-11 20:30     ` Juan Quintela
2023-10-11  9:21 ` [PULL 07/65] migration: Allow RECOVER->PAUSED convertion for dest qemu Juan Quintela
2023-10-11  9:21 ` [PULL 08/65] migration/rdma: Clean up qemu_rdma_poll()'s return type Juan Quintela
2023-10-11  9:21 ` [PULL 09/65] migration/rdma: Clean up qemu_rdma_data_init()'s " Juan Quintela
2023-10-11  9:21 ` [PULL 10/65] migration/rdma: Clean up rdma_delete_block()'s " Juan Quintela
2023-10-11  9:21 ` [PULL 11/65] migration/rdma: Drop fragile wr_id formatting Juan Quintela
2023-10-11  9:21 ` [PULL 12/65] migration/rdma: Consistently use uint64_t for work request IDs Juan Quintela
2023-10-11  9:21 ` [PULL 13/65] migration/rdma: Fix unwanted integer truncation Juan Quintela
2023-10-11  9:21 ` [PULL 14/65] migration/rdma: Clean up two more harmless signed vs. unsigned issues Juan Quintela
2023-10-11  9:21 ` [PULL 15/65] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage Juan Quintela
2023-10-11  9:21 ` [PULL 16/65] migration/rdma: Fix qemu_rdma_accept() to return failure on errors Juan Quintela
2023-10-11  9:21 ` [PULL 17/65] migration/rdma: Put @errp parameter last Juan Quintela
2023-10-11  9:21 ` [PULL 18/65] migration/rdma: Eliminate error_propagate() Juan Quintela
2023-10-11  9:21 ` [PULL 19/65] migration/rdma: Drop rdma_add_block() error handling Juan Quintela
2023-10-11  9:21 ` [PULL 20/65] migration/rdma: Drop qemu_rdma_search_ram_block() " Juan Quintela
2023-10-11  9:21 ` [PULL 21/65] migration/rdma: Make qemu_rdma_buffer_mergeable() return bool Juan Quintela
2023-10-11  9:21 ` [PULL 22/65] migration/rdma: Use bool for two RDMAContext flags Juan Quintela
2023-10-11  9:21 ` [PULL 23/65] migration/rdma: Fix or document problematic uses of errno Juan Quintela
2023-10-11  9:21 ` [PULL 24/65] migration/rdma: Ditch useless numeric error codes in error messages Juan Quintela
2023-10-11  9:21 ` [PULL 25/65] migration/rdma: Fix io_writev(), io_readv() methods to obey contract Juan Quintela
2023-10-11  9:21 ` [PULL 26/65] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE() Juan Quintela
2023-10-11  9:21 ` [PULL 27/65] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error Juan Quintela
2023-10-11  9:21 ` [PULL 28/65] migration/rdma: Fix qemu_get_cm_event_timeout() to always " Juan Quintela
2023-10-11  9:21 ` [PULL 29/65] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port Juan Quintela
2023-10-11  9:21 ` [PULL 30/65] migration/rdma: Fix QEMUFileHooks method return values Juan Quintela
2023-10-11  9:21 ` [PULL 31/65] migration/rdma: Fix rdma_getaddrinfo() error checking Juan Quintela
2023-10-11  9:21 ` [PULL 32/65] migration/rdma: Return -1 instead of negative errno code Juan Quintela
2023-10-11  9:21 ` [PULL 33/65] migration/rdma: Dumb down remaining int error values to -1 Juan Quintela
2023-10-11  9:21 ` [PULL 34/65] migration/rdma: Replace int error_state by bool errored Juan Quintela
2023-10-11  9:21 ` [PULL 35/65] migration/rdma: Drop superfluous assignments to @ret Juan Quintela
2023-10-11  9:21 ` [PULL 36/65] migration/rdma: Check negative error values the same way everywhere Juan Quintela
2023-10-11  9:21 ` [PULL 37/65] migration/rdma: Plug a memory leak and improve a message Juan Quintela
2023-10-11  9:21 ` [PULL 38/65] migration/rdma: Delete inappropriate error_report() in macro ERROR() Juan Quintela
2023-10-11  9:21 ` [PULL 39/65] migration/rdma: Retire " Juan Quintela
2023-10-11  9:21 ` [PULL 40/65] migration/rdma: Fix error handling around rdma_getaddrinfo() Juan Quintela
2023-10-11  9:21 ` [PULL 41/65] migration/rdma: Drop "@errp is clear" guards around error_setg() Juan Quintela
2023-10-11  9:21 ` [PULL 42/65] migration/rdma: Convert qemu_rdma_exchange_recv() to Error Juan Quintela
2023-10-11  9:21 ` [PULL 43/65] migration/rdma: Convert qemu_rdma_exchange_send() " Juan Quintela
2023-10-11  9:21 ` [PULL 44/65] migration/rdma: Convert qemu_rdma_exchange_get_response() " Juan Quintela
2023-10-11  9:21 ` [PULL 45/65] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() " Juan Quintela
2023-10-11  9:21 ` [PULL 46/65] migration/rdma: Convert qemu_rdma_write_flush() " Juan Quintela
2023-10-11  9:21 ` [PULL 47/65] migration/rdma: Convert qemu_rdma_write_one() " Juan Quintela
2023-10-11  9:21 ` [PULL 48/65] migration/rdma: Convert qemu_rdma_write() " Juan Quintela
2023-10-11  9:21 ` [PULL 49/65] migration/rdma: Convert qemu_rdma_post_send_control() " Juan Quintela
2023-10-11  9:21 ` [PULL 50/65] migration/rdma: Convert qemu_rdma_post_recv_control() " Juan Quintela
2023-10-11  9:21 ` [PULL 51/65] migration/rdma: Convert qemu_rdma_alloc_pd_cq() " Juan Quintela
2023-10-11  9:21 ` [PULL 52/65] migration/rdma: Silence qemu_rdma_resolve_host() Juan Quintela
2023-10-11  9:21 ` [PULL 53/65] migration/rdma: Silence qemu_rdma_connect() Juan Quintela
2023-10-11  9:21 ` [PULL 54/65] migration/rdma: Silence qemu_rdma_reg_control() Juan Quintela
2023-10-11  9:21 ` [PULL 55/65] migration/rdma: Don't report received completion events as error Juan Quintela
2023-10-11  9:21 ` [PULL 56/65] migration/rdma: Silence qemu_rdma_block_for_wrid() Juan Quintela
2023-10-11  9:21 ` [PULL 57/65] migration/rdma: Silence qemu_rdma_register_and_get_keys() Juan Quintela
2023-10-11  9:21 ` [PULL 58/65] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings Juan Quintela
2023-10-11  9:21 ` [PULL 59/65] migration/rdma: Use error_report() & friends instead of stderr Juan Quintela
2023-10-11  9:21 ` [PULL 60/65] migration/rdma: Replace flawed device detail dump by tracing Juan Quintela
2023-10-11  9:21 ` [PULL 61/65] migration: Display error in query-migrate irrelevant of status Juan Quintela
2023-10-11  9:22 ` [PULL 62/65] migration: Introduce migrate_has_error() Juan Quintela
2023-10-11  9:22 ` [PULL 63/65] qemufile: Always return a verbose error Juan Quintela
2023-10-11  9:22 ` [PULL 64/65] migration: Remember num of ramblocks to sync during recovery Juan Quintela
2023-10-11  9:22 ` [PULL 65/65] migration: Add migration_rp_wait|kick() Juan Quintela
2023-10-11 17:04 ` [PULL 00/65] Migration 20231011 patches Stefan Hajnoczi
