qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel
@ 2025-02-13 17:59 Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 1/9] migration: Set migration error outside of migrate_cancel Fabiano Rosas
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Daniel P . Berrangé

changes from v2:

Added more if != CANCELLING where it was missing. We have other work
[1] pending around migration states, so I've kept this simple for
correctness only.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1670135398

1- https://lore.kernel.org/r/20250110100707.4805-1-shivam.kumar1@nutanix.com

v2:
https://lore.kernel.org/r/20250211150136.6781-1-farosas@suse.de
v1:
https://lore.kernel.org/r/20241202220137.32584-1-farosas@suse.de

While working on downstream issues with postcopy, I ended up writing a
set of tests for issuing qmp_migrate_cancel() at various points during
the migration. That exposed some bugs, which this series attempts to
fix.

There is also a fix for the issue Daniel found:

https://gitlab.com/qemu-project/qemu/-/issues/2633

I'm also sending the test code. It creates one test per
MIGRATION_STATUS_ state. Each test starts a migration, waits for that
specific state to be reached, issues qmp_migrate_cancel() and checks
that the migration state changes to cancelled (for now only cancelling
migration from the source side).

I was initially worried that this would be too racy, but so far each
test has survived 1000 iterations. I'm thinking it's worth merging,
specially because even after working on this I haven't been able to
clear the questions we have in our todo list [1], so we'll probably
need more work around this area in the future.

1- https://wiki.qemu.org/ToDo/LiveMigration#Migration_cancel_concurrency

Fabiano Rosas (9):
  migration: Set migration error outside of migrate_cancel
  migration: Unify migration_cancel and migrate_fd_cancel
  migration: Change migrate_fd_ to migration_
  migration: Fix hang after error in destination setup phase
  migration: Reject qmp_migrate_cancel after postcopy
  migration: Don't set FAILED state when cancelling
  tests/qtest/migration: Introduce migration_test_add_suffix
  tests/qtest/migration: Add a cancel test
  migration: Update migrate_cancel documentation

 migration/channel.c                    |   9 +-
 migration/migration.c                  |  81 +++++++-----
 migration/migration.h                  |   4 +-
 migration/multifd.c                    |   2 +-
 migration/ram.c                        |   4 +-
 migration/rdma.c                       |   2 +-
 migration/trace-events                 |   4 +-
 qapi/migration.json                    |   4 +-
 tests/qtest/migration/migration-util.c |  24 ++++
 tests/qtest/migration/migration-util.h |   2 +
 tests/qtest/migration/precopy-tests.c  | 176 +++++++++++++++++++++++++
 11 files changed, 267 insertions(+), 45 deletions(-)

-- 
2.35.3



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/9] migration: Set migration error outside of migrate_cancel
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 2/9] migration: Unify migration_cancel and migrate_fd_cancel Fabiano Rosas
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Daniel P . Berrangé

There's no point passing the error into migration cancel only for it
to call migrate_set_error().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 7 ++-----
 migration/migration.h | 2 +-
 migration/ram.c       | 4 +++-
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 396928513a..7728f52aef 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -342,11 +342,8 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
     qemu_bh_schedule(bh);
 }
 
-void migration_cancel(const Error *error)
+void migration_cancel()
 {
-    if (error) {
-        migrate_set_error(current_migration, error);
-    }
     if (migrate_dirty_limit()) {
         qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
     }
@@ -365,7 +362,7 @@ void migration_shutdown(void)
      * Cancel the current migration - that will (eventually)
      * stop the migration using this structure
      */
-    migration_cancel(NULL);
+    migration_cancel();
     object_unref(OBJECT(current_migration));
 
     /*
diff --git a/migration/migration.h b/migration/migration.h
index eaebcc2042..3547420c52 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -530,7 +530,7 @@ void migration_make_urgent_request(void);
 void migration_consume_urgent_request(void);
 bool migration_rate_limit(void);
 void migration_bh_schedule(QEMUBHFunc *cb, void *opaque);
-void migration_cancel(const Error *error);
+void migration_cancel(void);
 
 void migration_populate_vfio_info(MigrationInfo *info);
 void migration_reset_vfio_bytes_transferred(void);
diff --git a/migration/ram.c b/migration/ram.c
index 6f460fd22d..589b6505eb 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4465,8 +4465,10 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
          * Abort and indicate a proper reason.
          */
         error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
-        migration_cancel(err);
+        migrate_set_error(migrate_get_current(), err);
         error_free(err);
+
+        migration_cancel();
     }
 
     switch (ps) {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/9] migration: Unify migration_cancel and migrate_fd_cancel
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 1/9] migration: Set migration error outside of migrate_cancel Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 3/9] migration: Change migrate_fd_ to migration_ Fabiano Rosas
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Daniel P . Berrangé

There's no need for two separate functions and this _fd_ is a historic
artifact that makes little sense nowadays.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c  | 18 +++++++-----------
 migration/trace-events |  2 +-
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7728f52aef..e37842fdd2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -106,7 +106,6 @@ static GSList *migration_blockers[MIG_MODE__MAX];
 
 static bool migration_object_check(MigrationState *ms, Error **errp);
 static bool migration_switchover_start(MigrationState *s, Error **errp);
-static void migrate_fd_cancel(MigrationState *s);
 static bool close_return_path_on_source(MigrationState *s);
 static void migration_completion_end(MigrationState *s);
 static void migrate_hup_delete(MigrationState *s);
@@ -342,14 +341,6 @@ void migration_bh_schedule(QEMUBHFunc *cb, void *opaque)
     qemu_bh_schedule(bh);
 }
 
-void migration_cancel()
-{
-    if (migrate_dirty_limit()) {
-        qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
-    }
-    migrate_fd_cancel(current_migration);
-}
-
 void migration_shutdown(void)
 {
     /*
@@ -1555,12 +1546,17 @@ static void migrate_fd_error(MigrationState *s, const Error *error)
     migrate_set_error(s, error);
 }
 
-static void migrate_fd_cancel(MigrationState *s)
+void migration_cancel(void)
 {
+    MigrationState *s = migrate_get_current();
     int old_state ;
     bool setup = (s->state == MIGRATION_STATUS_SETUP);
 
-    trace_migrate_fd_cancel();
+    trace_migration_cancel();
+
+    if (migrate_dirty_limit()) {
+        qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
+    }
 
     WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
         if (s->rp_state.from_dst_file) {
diff --git a/migration/trace-events b/migration/trace-events
index 12b262f8ee..d22600abe6 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -156,7 +156,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam
 migrate_set_state(const char *new_state) "new state %s"
 migrate_fd_cleanup(void) ""
 migrate_error(const char *error_desc) "error=%s"
-migrate_fd_cancel(void) ""
+migration_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
 migrate_pending_exact(uint64_t size, uint64_t pre, uint64_t post) "exact pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
 migrate_pending_estimate(uint64_t size, uint64_t pre, uint64_t post) "estimate pending size %" PRIu64 " (pre = %" PRIu64 " post=%" PRIu64 ")"
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/9] migration: Change migrate_fd_ to migration_
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 1/9] migration: Set migration error outside of migrate_cancel Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 2/9] migration: Unify migration_cancel and migrate_fd_cancel Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 20:59   ` Peter Xu
  2025-02-18  7:54   ` Zhijian Li (Fujitsu) via
  2025-02-13 17:59 ` [PATCH v3 4/9] migration: Fix hang after error in destination setup phase Fabiano Rosas
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Daniel P . Berrangé, Li Zhijian

Remove all instances of _fd_ from the migration generic code. These
functions have grown over time and the _fd_ part is now just
confusing.

migration_fd_error() -> migration_error() makes it a little
vague. Since it's only used for migration_connect() failures, change
it to migration_connect_set_error().

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/channel.c    |  4 ++--
 migration/migration.c  | 30 +++++++++++++++---------------
 migration/migration.h  |  2 +-
 migration/multifd.c    |  2 +-
 migration/rdma.c       |  2 +-
 migration/trace-events |  2 +-
 6 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/migration/channel.c b/migration/channel.c
index f9de064f3b..24a91ef911 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -74,7 +74,7 @@ void migration_channel_connect(MigrationState *s,
             if (!error) {
                 /* tls_channel_connect will call back to this
                  * function after the TLS handshake,
-                 * so we mustn't call migrate_fd_connect until then
+                 * so we mustn't call migration_connect until then
                  */
 
                 return;
@@ -89,7 +89,7 @@ void migration_channel_connect(MigrationState *s,
             qemu_mutex_unlock(&s->qemu_file_lock);
         }
     }
-    migrate_fd_connect(s, error);
+    migration_connect(s, error);
     error_free(error);
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index e37842fdd2..c39cedef3b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1423,12 +1423,12 @@ static void migration_cleanup_json_writer(MigrationState *s)
     g_clear_pointer(&s->vmdesc, json_writer_free);
 }
 
-static void migrate_fd_cleanup(MigrationState *s)
+static void migration_cleanup(MigrationState *s)
 {
     MigrationEventType type;
     QEMUFile *tmp = NULL;
 
-    trace_migrate_fd_cleanup();
+    trace_migration_cleanup();
 
     migration_cleanup_json_writer(s);
 
@@ -1485,9 +1485,9 @@ static void migrate_fd_cleanup(MigrationState *s)
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
-static void migrate_fd_cleanup_bh(void *opaque)
+static void migration_cleanup_bh(void *opaque)
 {
-    migrate_fd_cleanup(opaque);
+    migration_cleanup(opaque);
 }
 
 void migrate_set_error(MigrationState *s, const Error *error)
@@ -1517,7 +1517,7 @@ static void migrate_error_free(MigrationState *s)
     }
 }
 
-static void migrate_fd_error(MigrationState *s, const Error *error)
+static void migration_connect_set_error(MigrationState *s, const Error *error)
 {
     MigrationStatus current = s->state;
     MigrationStatus next;
@@ -2198,7 +2198,7 @@ void qmp_migrate(const char *uri, bool has_channels,
 
 out:
     if (local_err) {
-        migrate_fd_error(s, local_err);
+        migration_connect_set_error(s, local_err);
         error_propagate(errp, local_err);
     }
 }
@@ -2243,7 +2243,7 @@ static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
         if (!resume_requested) {
             yank_unregister_instance(MIGRATION_YANK_INSTANCE);
         }
-        migrate_fd_error(s, local_err);
+        migration_connect_set_error(s, local_err);
         error_propagate(errp, local_err);
         return;
     }
@@ -3427,7 +3427,7 @@ static void migration_iteration_finish(MigrationState *s)
         break;
     }
 
-    migration_bh_schedule(migrate_fd_cleanup_bh, s);
+    migration_bh_schedule(migration_cleanup_bh, s);
     bql_unlock();
 }
 
@@ -3455,7 +3455,7 @@ static void bg_migration_iteration_finish(MigrationState *s)
         break;
     }
 
-    migration_bh_schedule(migrate_fd_cleanup_bh, s);
+    migration_bh_schedule(migration_cleanup_bh, s);
     bql_unlock();
 }
 
@@ -3837,7 +3837,7 @@ fail_setup:
     return NULL;
 }
 
-void migrate_fd_connect(MigrationState *s, Error *error_in)
+void migration_connect(MigrationState *s, Error *error_in)
 {
     Error *local_err = NULL;
     uint64_t rate_limit;
@@ -3847,24 +3847,24 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
     /*
      * If there's a previous error, free it and prepare for another one.
      * Meanwhile if migration completes successfully, there won't have an error
-     * dumped when calling migrate_fd_cleanup().
+     * dumped when calling migration_cleanup().
      */
     migrate_error_free(s);
 
     s->expected_downtime = migrate_downtime_limit();
     if (error_in) {
-        migrate_fd_error(s, error_in);
+        migration_connect_set_error(s, error_in);
         if (resume) {
             /*
              * Don't do cleanup for resume if channel is invalid, but only dump
              * the error.  We wait for another channel connect from the user.
              * The error_report still gives HMP user a hint on what failed.
-             * It's normally done in migrate_fd_cleanup(), but call it here
+             * It's normally done in migration_cleanup(), but call it here
              * explicitly.
              */
             error_report_err(error_copy(s->error));
         } else {
-            migrate_fd_cleanup(s);
+            migration_cleanup(s);
         }
         return;
     }
@@ -3944,7 +3944,7 @@ fail:
     migrate_set_error(s, local_err);
     migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
     error_report_err(local_err);
-    migrate_fd_cleanup(s);
+    migration_cleanup(s);
 }
 
 static void migration_class_init(ObjectClass *klass, void *data)
diff --git a/migration/migration.h b/migration/migration.h
index 3547420c52..76afef34c4 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -484,7 +484,7 @@ bool  migration_has_all_channels(void);
 void migrate_set_error(MigrationState *s, const Error *error);
 bool migrate_has_error(MigrationState *s);
 
-void migrate_fd_connect(MigrationState *s, Error *error_in);
+void migration_connect(MigrationState *s, Error *error_in);
 
 int migration_call_notifiers(MigrationState *s, MigrationEventType type,
                              Error **errp);
diff --git a/migration/multifd.c b/migration/multifd.c
index ab73d6d984..d7ea8668ad 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -444,7 +444,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
          * channels have no I/O handler callback registered when reaching
          * here, because migration thread will wait for all multifd channel
          * establishments to complete during setup.  Since
-         * migrate_fd_cleanup() will be scheduled in main thread too, all
+         * migration_cleanup() will be scheduled in main thread too, all
          * previous callbacks should guarantee to be completed when
          * reaching here.  See multifd_send_state.channels_created and its
          * usage.  In the future, we could replace this with an assert
diff --git a/migration/rdma.c b/migration/rdma.c
index 855753c671..76fb034923 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -4174,7 +4174,7 @@ void rdma_start_outgoing_migration(void *opaque,
 
     s->to_dst_file = rdma_new_output(rdma);
     s->rdma_migration = true;
-    migrate_fd_connect(s, NULL);
+    migration_connect(s, NULL);
     return;
 return_path_err:
     qemu_rdma_cleanup(rdma);
diff --git a/migration/trace-events b/migration/trace-events
index d22600abe6..58c0f07f5b 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -154,7 +154,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam
 
 # migration.c
 migrate_set_state(const char *new_state) "new state %s"
-migrate_fd_cleanup(void) ""
+migration_cleanup(void) ""
 migrate_error(const char *error_desc) "error=%s"
 migration_cancel(void) ""
 migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 4/9] migration: Fix hang after error in destination setup phase
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
                   ` (2 preceding siblings ...)
  2025-02-13 17:59 ` [PATCH v3 3/9] migration: Change migrate_fd_ to migration_ Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 5/9] migration: Reject qmp_migrate_cancel after postcopy Fabiano Rosas
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Daniel P . Berrangé

If the destination side fails at migration_ioc_process_incoming()
before starting the coroutine, it will report the error but QEMU will
not exit.

Set the migration state to FAILED and exit the process if
exit-on-error allows.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2633
Reported-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/channel.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/migration/channel.c b/migration/channel.c
index 24a91ef911..a547b1fbfe 100644
--- a/migration/channel.c
+++ b/migration/channel.c
@@ -33,6 +33,7 @@
 void migration_channel_process_incoming(QIOChannel *ioc)
 {
     MigrationState *s = migrate_get_current();
+    MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
 
     trace_migration_set_incoming_channel(
@@ -47,6 +48,10 @@ void migration_channel_process_incoming(QIOChannel *ioc)
 
     if (local_err) {
         error_report_err(local_err);
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+        if (mis->exit_on_error) {
+            exit(EXIT_FAILURE);
+        }
     }
 }
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 5/9] migration: Reject qmp_migrate_cancel after postcopy
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
                   ` (3 preceding siblings ...)
  2025-02-13 17:59 ` [PATCH v3 4/9] migration: Fix hang after error in destination setup phase Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 6/9] migration: Don't set FAILED state when cancelling Fabiano Rosas
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Daniel P . Berrangé

After postcopy has started, it's not possible to recover the source
machine in case a migration error occurs because the destination has
already been changing the state of the machine. For that same reason,
it doesn't make sense to try to cancel the migration after postcopy
has started. Reject the cancel command during postcopy.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index c39cedef3b..48c9ad3c96 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2251,7 +2251,18 @@ static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
 
 void qmp_migrate_cancel(Error **errp)
 {
-    migration_cancel(NULL);
+    /*
+     * After postcopy migration has started, the source machine is not
+     * recoverable in case of a migration error. This also means the
+     * cancel command cannot be used as cancel should allow the
+     * machine to continue operation.
+     */
+    if (migration_in_postcopy()) {
+        error_setg(errp, "Postcopy migration in progress, cannot cancel.");
+        return;
+    }
+
+    migration_cancel();
 }
 
 void qmp_migrate_continue(MigrationStatus state, Error **errp)
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 6/9] migration: Don't set FAILED state when cancelling
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
                   ` (4 preceding siblings ...)
  2025-02-13 17:59 ` [PATCH v3 5/9] migration: Reject qmp_migrate_cancel after postcopy Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 21:04   ` Peter Xu
  2025-02-13 17:59 ` [PATCH v3 7/9] tests/qtest/migration: Introduce migration_test_add_suffix Fabiano Rosas
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Daniel P . Berrangé

The expected outcome from qmp_migrate_cancel() is that the source
migration goes to the terminal state
MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
cancelling.

Make sure there is never a state transition from an unspecified state
into FAILED. Code that sets FAILED, should always either make sure
that the old state is not CANCELLING or specify the old state.

Note that the destination is allowed to go into FAILED, so there's no
issue there.

(I don't think this is relevant as a backport because cancelling does
work, it just doesn't show the right state at the end)

Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 48c9ad3c96..c597aa707e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2648,7 +2648,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     if (migrate_postcopy_preempt()) {
         migration_wait_main_channel(ms);
         if (postcopy_preempt_establish_channel(ms)) {
-            migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
+            if (ms->state != MIGRATION_STATUS_CANCELLING) {
+                migrate_set_state(&ms->state, ms->state,
+                                  MIGRATION_STATUS_FAILED);
+            }
             error_setg(errp, "%s: Failed to establish preempt channel",
                        __func__);
             return -1;
@@ -2986,7 +2989,9 @@ fail:
         error_free(local_err);
     }
 
-    migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    if (s->state != MIGRATION_STATUS_CANCELLING) {
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    }
 }
 
 /**
@@ -3009,7 +3014,7 @@ static void bg_migration_completion(MigrationState *s)
         qemu_put_buffer(s->to_dst_file, s->bioc->data, s->bioc->usage);
         qemu_fflush(s->to_dst_file);
     } else if (s->state == MIGRATION_STATUS_CANCELLING) {
-        goto fail;
+        return;
     }
 
     if (qemu_file_get_error(s->to_dst_file)) {
@@ -3953,7 +3958,9 @@ void migration_connect(MigrationState *s, Error *error_in)
 
 fail:
     migrate_set_error(s, local_err);
-    migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    if (s->state != MIGRATION_STATUS_CANCELLING) {
+        migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
+    }
     error_report_err(local_err);
     migration_cleanup(s);
 }
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 7/9] tests/qtest/migration: Introduce migration_test_add_suffix
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
                   ` (5 preceding siblings ...)
  2025-02-13 17:59 ` [PATCH v3 6/9] migration: Don't set FAILED state when cancelling Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 8/9] tests/qtest/migration: Add a cancel test Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 9/9] migration: Update migrate_cancel documentation Fabiano Rosas
  8 siblings, 0 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Daniel P . Berrangé, Laurent Vivier, Paolo Bonzini

Introduce a new migration_test_add_suffix to allow programmatic
creation of tests based on a suffix. Pass the test name into the test
so it can know which variant to run.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration/migration-util.c | 24 ++++++++++++++++++++++++
 tests/qtest/migration/migration-util.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/tests/qtest/migration/migration-util.c b/tests/qtest/migration/migration-util.c
index 6261d80e4a..642cf50c8d 100644
--- a/tests/qtest/migration/migration-util.c
+++ b/tests/qtest/migration/migration-util.c
@@ -236,6 +236,7 @@ char *resolve_machine_version(const char *alias, const char *var1,
 typedef struct {
     char *name;
     void (*func)(void);
+    void (*func_full)(void *);
 } MigrationTest;
 
 static void migration_test_destroy(gpointer data)
@@ -265,6 +266,29 @@ void migration_test_add(const char *path, void (*fn)(void))
                              migration_test_destroy);
 }
 
+static void migration_test_wrapper_full(const void *data)
+{
+    MigrationTest *test = (MigrationTest *)data;
+
+    g_test_message("Running /%s%s", qtest_get_arch(), test->name);
+    test->func_full(test->name);
+}
+
+void migration_test_add_suffix(const char *path, const char *suffix,
+                               void (*fn)(void *))
+{
+    MigrationTest *test = g_new0(MigrationTest, 1);
+
+    g_assert(g_str_has_suffix(path, "/"));
+    g_assert(!g_str_has_prefix(suffix, "/"));
+
+    test->func_full = fn;
+    test->name = g_strconcat(path, suffix, NULL);
+
+    qtest_add_data_func_full(test->name, test, migration_test_wrapper_full,
+                             migration_test_destroy);
+}
+
 #ifdef O_DIRECT
 /*
  * Probe for O_DIRECT support on the filesystem. Since this is used
diff --git a/tests/qtest/migration/migration-util.h b/tests/qtest/migration/migration-util.h
index f5f2e4650e..44815e9c42 100644
--- a/tests/qtest/migration/migration-util.h
+++ b/tests/qtest/migration/migration-util.h
@@ -51,6 +51,8 @@ static inline bool probe_o_direct_support(const char *tmpfs)
 bool ufd_version_check(bool *uffd_feature_thread_id);
 bool kvm_dirty_ring_supported(void);
 void migration_test_add(const char *path, void (*fn)(void));
+void migration_test_add_suffix(const char *path, const char *suffix,
+                               void (*fn)(void *));
 char *migrate_get_connect_uri(QTestState *who);
 void migrate_set_ports(QTestState *to, QList *channel_list);
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 8/9] tests/qtest/migration: Add a cancel test
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
                   ` (6 preceding siblings ...)
  2025-02-13 17:59 ` [PATCH v3 7/9] tests/qtest/migration: Introduce migration_test_add_suffix Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  2025-02-13 17:59 ` [PATCH v3 9/9] migration: Update migrate_cancel documentation Fabiano Rosas
  8 siblings, 0 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Daniel P . Berrangé, Laurent Vivier, Paolo Bonzini

The qmp_migrate_cancel() command is poorly tested and code inspection
reveals that there might be concurrency issues with its usage. Add a
test that runs a migration and calls qmp_migrate_cancel() at specific
moments.

In order to make the test more deterministic, instead of calling
qmp_migrate_cancel() at random moments during migration, do it after
the migration status change events are seen.

The expected result is that qmp_migrate_cancel() on the source ends
migration on the source with the "cancelled" state and ends migration
on the destination with the "failed" state. The only exception is that
a failed migration should continue in the failed state.

Cancelling is not allowed during postcopy (no test is added for this
because it's a trivial check in the code).

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 tests/qtest/migration/precopy-tests.c | 176 ++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)

diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index 162fa69531..ba273d10b9 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -20,6 +20,7 @@
 #include "migration/migration-util.h"
 #include "ppc-util.h"
 #include "qobject/qlist.h"
+#include "qapi-types-migration.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
 #include "qemu/range.h"
@@ -536,6 +537,161 @@ static void test_multifd_tcp_cancel(void)
     migrate_end(from, to2, true);
 }
 
+static void test_cancel_src_after_failed(QTestState *from, QTestState *to,
+                                         const char *uri, const char *phase)
+{
+    /*
+     * No migrate_incoming_qmp() at the start to force source into
+     * failed state during migrate_qmp().
+     */
+
+    wait_for_serial("src_serial");
+    migrate_ensure_converge(from);
+
+    migrate_qmp(from, to, uri, NULL, "{}");
+
+    migration_event_wait(from, phase);
+    migrate_cancel(from);
+
+    /* cancelling will not move the migration out of 'failed' */
+
+    wait_for_migration_status(from, "failed",
+                              (const char * []) { "completed", NULL });
+
+    /*
+     * Not waiting for the destination because it never started
+     * migration.
+     */
+}
+
+static void test_cancel_src_after_cancelled(QTestState *from, QTestState *to,
+                                            const char *uri, const char *phase)
+{
+    migrate_incoming_qmp(to, uri, NULL, "{ 'exit-on-error': false }");
+
+    wait_for_serial("src_serial");
+    migrate_ensure_converge(from);
+
+    migrate_qmp(from, to, uri, NULL, "{}");
+
+    /* To move to cancelled/cancelling */
+    migrate_cancel(from);
+    migration_event_wait(from, phase);
+
+    /* The migrate_cancel under test */
+    migrate_cancel(from);
+
+    wait_for_migration_status(from, "cancelled",
+                              (const char * []) { "completed", NULL });
+
+    wait_for_migration_status(to, "failed",
+                              (const char * []) { "completed", NULL });
+}
+
+static void test_cancel_src_after_complete(QTestState *from, QTestState *to,
+                                           const char *uri, const char *phase)
+{
+    migrate_incoming_qmp(to, uri, NULL, "{ 'exit-on-error': false }");
+
+    wait_for_serial("src_serial");
+    migrate_ensure_converge(from);
+
+    migrate_qmp(from, to, uri, NULL, "{}");
+
+    migration_event_wait(from, phase);
+    migrate_cancel(from);
+
+    /*
+     * qmp_migrate_cancel() exits early if migration is not running
+     * anymore, the status will not change to cancelled.
+     */
+    wait_for_migration_complete(from);
+    wait_for_migration_complete(to);
+}
+
+static void test_cancel_src_after_none(QTestState *from, QTestState *to,
+                                       const char *uri, const char *phase)
+{
+    /*
+     * Test that cancelling without a migration happening does not
+     * affect subsequent migrations
+     */
+    migrate_cancel(to);
+
+    wait_for_serial("src_serial");
+    migrate_cancel(from);
+
+    migrate_incoming_qmp(to, uri, NULL, "{ 'exit-on-error': false }");
+
+    migrate_ensure_converge(from);
+    migrate_qmp(from, to, uri, NULL, "{}");
+
+    wait_for_migration_complete(from);
+    wait_for_migration_complete(to);
+}
+
+static void test_cancel_src_pre_switchover(QTestState *from, QTestState *to,
+                                           const char *uri, const char *phase)
+{
+    migrate_set_capability(from, "pause-before-switchover", true);
+    migrate_set_capability(to, "pause-before-switchover", true);
+
+    migrate_set_capability(from, "multifd", true);
+    migrate_set_capability(to, "multifd", true);
+
+    migrate_incoming_qmp(to, uri, NULL, "{ 'exit-on-error': false }");
+
+    wait_for_serial("src_serial");
+    migrate_ensure_converge(from);
+
+    migrate_qmp(from, to, uri, NULL, "{}");
+
+    migration_event_wait(from, phase);
+    migrate_cancel(from);
+    migration_event_wait(from, "cancelling");
+
+    wait_for_migration_status(from, "cancelled",
+                              (const char * []) { "completed", NULL });
+
+    wait_for_migration_status(to, "failed",
+                              (const char * []) { "completed", NULL });
+}
+
+static void test_cancel_src_after_status(void *opaque)
+{
+    const char *test_path = opaque;
+    g_autofree char *phase = g_path_get_basename(test_path);
+    g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+    QTestState *from, *to;
+    MigrateStart args = {
+        .hide_stderr = true,
+    };
+
+    if (migrate_start(&from, &to, "defer", &args)) {
+        return;
+    }
+
+    if (g_str_equal(phase, "cancelling") ||
+        g_str_equal(phase, "cancelled")) {
+        test_cancel_src_after_cancelled(from, to, uri, phase);
+
+    } else if (g_str_equal(phase, "completed")) {
+        test_cancel_src_after_complete(from, to, uri, phase);
+
+    } else if (g_str_equal(phase, "failed")) {
+        test_cancel_src_after_failed(from, to, uri, phase);
+
+    } else if (g_str_equal(phase, "none")) {
+        test_cancel_src_after_none(from, to, uri, phase);
+
+    } else {
+        /* any state that comes before pre-switchover */
+        test_cancel_src_pre_switchover(from, to, uri, phase);
+    }
+
+    migrate_end(from, to, false);
+}
+
 static void calc_dirty_rate(QTestState *who, uint64_t calc_time)
 {
     qtest_qmp_assert_success(who,
@@ -1018,4 +1174,24 @@ void migration_test_add_precopy(MigrationTestEnv *env)
                                test_vcpu_dirty_limit);
         }
     }
+
+    /* ensure new status don't go unnoticed */
+    assert(MIGRATION_STATUS__MAX == 15);
+
+    for (int i = MIGRATION_STATUS_NONE; i < MIGRATION_STATUS__MAX; i++) {
+        switch (i) {
+        case MIGRATION_STATUS_DEVICE: /* happens too fast */
+        case MIGRATION_STATUS_WAIT_UNPLUG: /* no support in tests */
+        case MIGRATION_STATUS_COLO: /* no support in tests */
+        case MIGRATION_STATUS_POSTCOPY_ACTIVE: /* postcopy can't be cancelled */
+        case MIGRATION_STATUS_POSTCOPY_PAUSED:
+        case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
+        case MIGRATION_STATUS_POSTCOPY_RECOVER:
+            continue;
+        default:
+            migration_test_add_suffix("/migration/cancel/src/after/",
+                                      MigrationStatus_str(i),
+                                      test_cancel_src_after_status);
+        }
+    }
 }
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 9/9] migration: Update migrate_cancel documentation
  2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
                   ` (7 preceding siblings ...)
  2025-02-13 17:59 ` [PATCH v3 8/9] tests/qtest/migration: Add a cancel test Fabiano Rosas
@ 2025-02-13 17:59 ` Fabiano Rosas
  8 siblings, 0 replies; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-13 17:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Daniel P . Berrangé, Markus Armbruster, Eric Blake

Update the migrate_cancel command documentation with a few words about
postcopy and the expected state of the machine after migration.

Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 qapi/migration.json | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 43babd1df4..8b9c53595c 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1524,7 +1524,9 @@
 ##
 # @migrate_cancel:
 #
-# Cancel the current executing migration process.
+# Cancel the currently executing migration process.  Allows a new
+# migration to be started right after.  When postcopy-ram is in use,
+# cancelling is not allowed after the postcopy phase has started.
 #
 # .. note:: This command succeeds even if there is no migration
 #    process running.
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 3/9] migration: Change migrate_fd_ to migration_
  2025-02-13 17:59 ` [PATCH v3 3/9] migration: Change migrate_fd_ to migration_ Fabiano Rosas
@ 2025-02-13 20:59   ` Peter Xu
  2025-02-18  7:54   ` Zhijian Li (Fujitsu) via
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Xu @ 2025-02-13 20:59 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Daniel P . Berrangé, Li Zhijian

On Thu, Feb 13, 2025 at 02:59:21PM -0300, Fabiano Rosas wrote:
> Remove all instances of _fd_ from the migration generic code. These
> functions have grown over time and the _fd_ part is now just
> confusing.
> 
> migration_fd_error() -> migration_error() makes it a little
> vague. Since it's only used for migration_connect() failures, change
> it to migration_connect_set_error().
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 6/9] migration: Don't set FAILED state when cancelling
  2025-02-13 17:59 ` [PATCH v3 6/9] migration: Don't set FAILED state when cancelling Fabiano Rosas
@ 2025-02-13 21:04   ` Peter Xu
  2025-02-14 12:25     ` Fabiano Rosas
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2025-02-13 21:04 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Daniel P . Berrangé

On Thu, Feb 13, 2025 at 02:59:24PM -0300, Fabiano Rosas wrote:
> The expected outcome from qmp_migrate_cancel() is that the source
> migration goes to the terminal state
> MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
> cancelling.
> 
> Make sure there is never a state transition from an unspecified state
> into FAILED. Code that sets FAILED, should always either make sure
> that the old state is not CANCELLING or specify the old state.
> 
> Note that the destination is allowed to go into FAILED, so there's no
> issue there.
> 
> (I don't think this is relevant as a backport because cancelling does
> work, it just doesn't show the right state at the end)
> 
> Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
> Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
> Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
> Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Not like migrate_set_state_failure(MigrationState *s)?  Not a huge deal,
though..

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 6/9] migration: Don't set FAILED state when cancelling
  2025-02-13 21:04   ` Peter Xu
@ 2025-02-14 12:25     ` Fabiano Rosas
  2025-02-14 15:13       ` Peter Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Fabiano Rosas @ 2025-02-14 12:25 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Daniel P . Berrangé

Peter Xu <peterx@redhat.com> writes:

> On Thu, Feb 13, 2025 at 02:59:24PM -0300, Fabiano Rosas wrote:
>> The expected outcome from qmp_migrate_cancel() is that the source
>> migration goes to the terminal state
>> MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
>> cancelling.
>> 
>> Make sure there is never a state transition from an unspecified state
>> into FAILED. Code that sets FAILED, should always either make sure
>> that the old state is not CANCELLING or specify the old state.
>> 
>> Note that the destination is allowed to go into FAILED, so there's no
>> issue there.
>> 
>> (I don't think this is relevant as a backport because cancelling does
>> work, it just doesn't show the right state at the end)
>> 
>> Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
>> Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
>> Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
>> Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Not like migrate_set_state_failure(MigrationState *s)?  Not a huge deal,
> though..

I thought we had agreed over IRC that it was best to hold that until the
other MigrationStatus work happens?

Anyway, looking closer at this, there are places that handle CANCELLING
beforehand (_detect_error) and places that only set FAILED after
specific states (multifd), so a single helper will require more
churn. Let's postpone that please.

>
> Reviewed-by: Peter Xu <peterx@redhat.com>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 6/9] migration: Don't set FAILED state when cancelling
  2025-02-14 12:25     ` Fabiano Rosas
@ 2025-02-14 15:13       ` Peter Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2025-02-14 15:13 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Daniel P . Berrangé

On Fri, Feb 14, 2025 at 09:25:12AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Feb 13, 2025 at 02:59:24PM -0300, Fabiano Rosas wrote:
> >> The expected outcome from qmp_migrate_cancel() is that the source
> >> migration goes to the terminal state
> >> MIGRATION_STATUS_CANCELLED. Anything different from this is a bug when
> >> cancelling.
> >> 
> >> Make sure there is never a state transition from an unspecified state
> >> into FAILED. Code that sets FAILED, should always either make sure
> >> that the old state is not CANCELLING or specify the old state.
> >> 
> >> Note that the destination is allowed to go into FAILED, so there's no
> >> issue there.
> >> 
> >> (I don't think this is relevant as a backport because cancelling does
> >> work, it just doesn't show the right state at the end)
> >> 
> >> Fixes: 3dde8fdbad ("migration: Merge precopy/postcopy on switchover start")
> >> Fixes: d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously")
> >> Fixes: 8518278a6a ("migration: implementation of background snapshot thread")
> >> Fixes: bf78a046b9 ("migration: refactor migrate_fd_connect failures")
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >
> > Not like migrate_set_state_failure(MigrationState *s)?  Not a huge deal,
> > though..
> 
> I thought we had agreed over IRC that it was best to hold that until the
> other MigrationStatus work happens?

If we touched this anyway, IMHO no hurt to add a helper too.

migrate_set_state_failure() can then be renamed to migrate_set_failure(),
take a Error* instead so it might help that effort too.

> 
> Anyway, looking closer at this, there are places that handle CANCELLING
> beforehand (_detect_error) and places that only set FAILED after
> specific states (multifd), so a single helper will require more
> churn. Let's postpone that please.

Sure.  Let's go ahead with this.

> 
> >
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 3/9] migration: Change migrate_fd_ to migration_
  2025-02-13 17:59 ` [PATCH v3 3/9] migration: Change migrate_fd_ to migration_ Fabiano Rosas
  2025-02-13 20:59   ` Peter Xu
@ 2025-02-18  7:54   ` Zhijian Li (Fujitsu) via
  1 sibling, 0 replies; 15+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2025-02-18  7:54 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel@nongnu.org; +Cc: Peter Xu, Daniel P . Berrangé



On 14/02/2025 01:59, Fabiano Rosas wrote:
> Remove all instances of _fd_ from the migration generic code. These
> functions have grown over time and the _fd_ part is now just
> confusing.
> 
> migration_fd_error() -> migration_error() makes it a little
> vague. Since it's only used for migration_connect() failures, change
> it to migration_connect_set_error().
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Tested-by: Li Zhijian <lizhijian@fujitsu.com> # RDMA

> ---
>   migration/channel.c    |  4 ++--
>   migration/migration.c  | 30 +++++++++++++++---------------
>   migration/migration.h  |  2 +-
>   migration/multifd.c    |  2 +-
>   migration/rdma.c       |  2 +-
>   migration/trace-events |  2 +-
>   6 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/migration/channel.c b/migration/channel.c
> index f9de064f3b..24a91ef911 100644
> --- a/migration/channel.c
> +++ b/migration/channel.c
> @@ -74,7 +74,7 @@ void migration_channel_connect(MigrationState *s,
>               if (!error) {
>                   /* tls_channel_connect will call back to this
>                    * function after the TLS handshake,
> -                 * so we mustn't call migrate_fd_connect until then
> +                 * so we mustn't call migration_connect until then
>                    */
>   
>                   return;
> @@ -89,7 +89,7 @@ void migration_channel_connect(MigrationState *s,
>               qemu_mutex_unlock(&s->qemu_file_lock);
>           }
>       }
> -    migrate_fd_connect(s, error);
> +    migration_connect(s, error);
>       error_free(error);
>   }
>   
> diff --git a/migration/migration.c b/migration/migration.c
> index e37842fdd2..c39cedef3b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1423,12 +1423,12 @@ static void migration_cleanup_json_writer(MigrationState *s)
>       g_clear_pointer(&s->vmdesc, json_writer_free);
>   }
>   
> -static void migrate_fd_cleanup(MigrationState *s)
> +static void migration_cleanup(MigrationState *s)
>   {
>       MigrationEventType type;
>       QEMUFile *tmp = NULL;
>   
> -    trace_migrate_fd_cleanup();
> +    trace_migration_cleanup();
>   
>       migration_cleanup_json_writer(s);
>   
> @@ -1485,9 +1485,9 @@ static void migrate_fd_cleanup(MigrationState *s)
>       yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>   }
>   
> -static void migrate_fd_cleanup_bh(void *opaque)
> +static void migration_cleanup_bh(void *opaque)
>   {
> -    migrate_fd_cleanup(opaque);
> +    migration_cleanup(opaque);
>   }
>   
>   void migrate_set_error(MigrationState *s, const Error *error)
> @@ -1517,7 +1517,7 @@ static void migrate_error_free(MigrationState *s)
>       }
>   }
>   
> -static void migrate_fd_error(MigrationState *s, const Error *error)
> +static void migration_connect_set_error(MigrationState *s, const Error *error)
>   {
>       MigrationStatus current = s->state;
>       MigrationStatus next;
> @@ -2198,7 +2198,7 @@ void qmp_migrate(const char *uri, bool has_channels,
>   
>   out:
>       if (local_err) {
> -        migrate_fd_error(s, local_err);
> +        migration_connect_set_error(s, local_err);
>           error_propagate(errp, local_err);
>       }
>   }
> @@ -2243,7 +2243,7 @@ static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
>           if (!resume_requested) {
>               yank_unregister_instance(MIGRATION_YANK_INSTANCE);
>           }
> -        migrate_fd_error(s, local_err);
> +        migration_connect_set_error(s, local_err);
>           error_propagate(errp, local_err);
>           return;
>       }
> @@ -3427,7 +3427,7 @@ static void migration_iteration_finish(MigrationState *s)
>           break;
>       }
>   
> -    migration_bh_schedule(migrate_fd_cleanup_bh, s);
> +    migration_bh_schedule(migration_cleanup_bh, s);
>       bql_unlock();
>   }
>   
> @@ -3455,7 +3455,7 @@ static void bg_migration_iteration_finish(MigrationState *s)
>           break;
>       }
>   
> -    migration_bh_schedule(migrate_fd_cleanup_bh, s);
> +    migration_bh_schedule(migration_cleanup_bh, s);
>       bql_unlock();
>   }
>   
> @@ -3837,7 +3837,7 @@ fail_setup:
>       return NULL;
>   }
>   
> -void migrate_fd_connect(MigrationState *s, Error *error_in)
> +void migration_connect(MigrationState *s, Error *error_in)
>   {
>       Error *local_err = NULL;
>       uint64_t rate_limit;
> @@ -3847,24 +3847,24 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>       /*
>        * If there's a previous error, free it and prepare for another one.
>        * Meanwhile if migration completes successfully, there won't have an error
> -     * dumped when calling migrate_fd_cleanup().
> +     * dumped when calling migration_cleanup().
>        */
>       migrate_error_free(s);
>   
>       s->expected_downtime = migrate_downtime_limit();
>       if (error_in) {
> -        migrate_fd_error(s, error_in);
> +        migration_connect_set_error(s, error_in);
>           if (resume) {
>               /*
>                * Don't do cleanup for resume if channel is invalid, but only dump
>                * the error.  We wait for another channel connect from the user.
>                * The error_report still gives HMP user a hint on what failed.
> -             * It's normally done in migrate_fd_cleanup(), but call it here
> +             * It's normally done in migration_cleanup(), but call it here
>                * explicitly.
>                */
>               error_report_err(error_copy(s->error));
>           } else {
> -            migrate_fd_cleanup(s);
> +            migration_cleanup(s);
>           }
>           return;
>       }
> @@ -3944,7 +3944,7 @@ fail:
>       migrate_set_error(s, local_err);
>       migrate_set_state(&s->state, s->state, MIGRATION_STATUS_FAILED);
>       error_report_err(local_err);
> -    migrate_fd_cleanup(s);
> +    migration_cleanup(s);
>   }
>   
>   static void migration_class_init(ObjectClass *klass, void *data)
> diff --git a/migration/migration.h b/migration/migration.h
> index 3547420c52..76afef34c4 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -484,7 +484,7 @@ bool  migration_has_all_channels(void);
>   void migrate_set_error(MigrationState *s, const Error *error);
>   bool migrate_has_error(MigrationState *s);
>   
> -void migrate_fd_connect(MigrationState *s, Error *error_in);
> +void migration_connect(MigrationState *s, Error *error_in);
>   
>   int migration_call_notifiers(MigrationState *s, MigrationEventType type,
>                                Error **errp);
> diff --git a/migration/multifd.c b/migration/multifd.c
> index ab73d6d984..d7ea8668ad 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -444,7 +444,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
>            * channels have no I/O handler callback registered when reaching
>            * here, because migration thread will wait for all multifd channel
>            * establishments to complete during setup.  Since
> -         * migrate_fd_cleanup() will be scheduled in main thread too, all
> +         * migration_cleanup() will be scheduled in main thread too, all
>            * previous callbacks should guarantee to be completed when
>            * reaching here.  See multifd_send_state.channels_created and its
>            * usage.  In the future, we could replace this with an assert
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 855753c671..76fb034923 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -4174,7 +4174,7 @@ void rdma_start_outgoing_migration(void *opaque,
>   
>       s->to_dst_file = rdma_new_output(rdma);
>       s->rdma_migration = true;
> -    migrate_fd_connect(s, NULL);
> +    migration_connect(s, NULL);
>       return;
>   return_path_err:
>       qemu_rdma_cleanup(rdma);
> diff --git a/migration/trace-events b/migration/trace-events
> index d22600abe6..58c0f07f5b 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -154,7 +154,7 @@ multifd_set_outgoing_channel(void *ioc, const char *ioctype, const char *hostnam
>   
>   # migration.c
>   migrate_set_state(const char *new_state) "new state %s"
> -migrate_fd_cleanup(void) ""
> +migration_cleanup(void) ""
>   migrate_error(const char *error_desc) "error=%s"
>   migration_cancel(void) ""
>   migrate_handle_rp_req_pages(const char *rbname, size_t start, size_t len) "in %s at 0x%zx len 0x%zx"

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-18  7:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-13 17:59 [PATCH v3 0/9] migration: Fix issues during qmp_migrate_cancel Fabiano Rosas
2025-02-13 17:59 ` [PATCH v3 1/9] migration: Set migration error outside of migrate_cancel Fabiano Rosas
2025-02-13 17:59 ` [PATCH v3 2/9] migration: Unify migration_cancel and migrate_fd_cancel Fabiano Rosas
2025-02-13 17:59 ` [PATCH v3 3/9] migration: Change migrate_fd_ to migration_ Fabiano Rosas
2025-02-13 20:59   ` Peter Xu
2025-02-18  7:54   ` Zhijian Li (Fujitsu) via
2025-02-13 17:59 ` [PATCH v3 4/9] migration: Fix hang after error in destination setup phase Fabiano Rosas
2025-02-13 17:59 ` [PATCH v3 5/9] migration: Reject qmp_migrate_cancel after postcopy Fabiano Rosas
2025-02-13 17:59 ` [PATCH v3 6/9] migration: Don't set FAILED state when cancelling Fabiano Rosas
2025-02-13 21:04   ` Peter Xu
2025-02-14 12:25     ` Fabiano Rosas
2025-02-14 15:13       ` Peter Xu
2025-02-13 17:59 ` [PATCH v3 7/9] tests/qtest/migration: Introduce migration_test_add_suffix Fabiano Rosas
2025-02-13 17:59 ` [PATCH v3 8/9] tests/qtest/migration: Add a cancel test Fabiano Rosas
2025-02-13 17:59 ` [PATCH v3 9/9] migration: Update migrate_cancel documentation Fabiano Rosas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).