* [PATCH v2 00/10] migration: New postcopy state, and some cleanups
@ 2024-06-17 18:15 Peter Xu
  2024-06-17 18:15 ` [PATCH v2 01/10] migration/multifd: Avoid the final FLUSH in complete() Peter Xu
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

v2:
- Collect tags
- Patch 3
  - cover all states in migration_postcopy_is_alive()
- Patch 4 (old)
  - English changes [Fabiano]
  - Split the migration_incoming_state_setup() cleanup into a new patch
    [Fabiano]
  - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
  - Keep using explicit state check in migrate_fd_connect() for resume
    [Fabiano]
- New patches
  - New doc update: "migration/docs: Update postcopy recover session for
    SETUP phase"
  - New test case: last four patches

v1: https://lore.kernel.org/r/20240612144228.1179240-1-peterx@redhat.com

The major goal of this patchset is patch 5, which introduces a new postcopy
state so that we send an event on postcopy reconnect failures, which
Libvirt would prefer to have.  There's more information on that issue in
the commit message itself.

Patches 1-2 are cleanups that are not directly related, but which I had
stored up and think are good to have.  I put them together in one thread
to make patch management easier, but I can send them separately if
necessary.

Patch 3 is also a cleanup, but is needed by patch 4 as a dependency.

Patches 4-5 are the core patches.

Patch 6 updates the doc for the new state.

Patches 7-10 add a new test for the new state.

Comments welcome, thanks.

CI: https://gitlab.com/peterx/qemu/-/pipelines/1335604588
    (check-dco & check-patch fail to git-fetch, but that doesn't look related)

Peter Xu (10):
  migration/multifd: Avoid the final FLUSH in complete()
  migration: Rename thread debug names
  migration: Use MigrationStatus instead of int
  migration: Cleanup incoming migration setup state change
  migration/postcopy: Add postcopy-recover-setup phase
  migration/docs: Update postcopy recover session for SETUP phase
  tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure
    tests
  tests/migration-tests: Always enable migration events
  tests/migration-tests: Verify postcopy-recover-setup status
  tests/migration-tests: Cover postcopy failure on reconnect

 docs/devel/migration/postcopy.rst |  31 +++++----
 qapi/migration.json               |   4 ++
 migration/migration.h             |   9 +--
 migration/postcopy-ram.h          |   3 +
 tests/qtest/migration-helpers.h   |   2 +
 migration/colo.c                  |   2 +-
 migration/migration.c             |  98 ++++++++++++++++++--------
 migration/multifd.c               |   6 +-
 migration/postcopy-ram.c          |  10 ++-
 migration/ram.c                   |   4 --
 migration/savevm.c                |   6 +-
 tests/qtest/migration-helpers.c   |  20 ++++++
 tests/qtest/migration-test.c      | 110 ++++++++++++++++++++++++------
 13 files changed, 223 insertions(+), 82 deletions(-)

-- 
2.45.0




* [PATCH v2 01/10] migration/multifd: Avoid the final FLUSH in complete()
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 18:15 ` [PATCH v2 02/10] migration: Rename thread debug names Peter Xu
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

We always do the flush when finishing one round of scan, and during the
complete() phase we scan one more round to make sure no dirty pages
remain.  In that case we shouldn't need an explicit FLUSH at the end of
complete(), because by the time we get there all pages should have been
flushed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Tested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ceea586b06..edec1a2d07 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3300,10 +3300,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
     }
 
-    if (migrate_multifd() && !migrate_multifd_flush_after_each_section() &&
-        !migrate_mapped_ram()) {
-        qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
-    }
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     return qemu_fflush(f);
 }
-- 
2.45.0




* [PATCH v2 02/10] migration: Rename thread debug names
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
  2024-06-17 18:15 ` [PATCH v2 01/10] migration/multifd: Avoid the final FLUSH in complete() Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-19  1:05   ` Zhijian Li (Fujitsu) via
  2024-06-17 18:15 ` [PATCH v2 03/10] migration: Use MigrationStatus instead of int Peter Xu
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

The postcopy thread names on dest QEMU are slightly confusing, for which I
partly have myself to blame in 36f62f11e4 ("migration: Postcopy preemption
preparation on channel creation").  E.g., "fault-fast" reads like a fast
version of "fault-default", but it's actually the fast version of
"postcopy/listen".

Take this chance to rename all the migration threads with proper rules.
Considering we only have 15 chars usable, prefix all threads with "mig/",
and identify src/dst threads properly this time.  So now most thread
names will look like "mig/DIR/xxx", where DIR is "src" or "dst", except
the bg-snapshot thread, which doesn't have a direction.

For multifd threads, name them "mig/{src|dst}/{send|recv}_%d".

We have had the "live_migration" thread for a very long time; now it's
called "mig/src/main".  We may hope to have "mig/dst/main" soon, but not
yet.
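
As a rough illustration of the 15-char budget (not part of the patch), the
sketch below checks whether a name following the scheme above fits.
THREAD_NAME_MAX, thread_name_fits() and multifd_name_fits() are made-up
names for this example only, and the limit is simply the 15 usable chars
mentioned above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 15 usable characters, as stated in the commit message. */
#define THREAD_NAME_MAX 15

/* Hypothetical helper: true if the name fits without truncation. */
static bool thread_name_fits(const char *name)
{
    return strlen(name) <= THREAD_NAME_MAX;
}

/*
 * Hypothetical helper: format "mig/<dir>/<role>_<idx>" into buf and
 * report whether the complete name fits in both buf and the budget.
 */
static bool multifd_name_fits(char *buf, size_t len, const char *dir,
                              const char *role, int idx)
{
    int n = snprintf(buf, len, "mig/%s/%s_%d", dir, role, idx);

    return n >= 0 && (size_t)n < len && n <= THREAD_NAME_MAX;
}
```

Under this scheme "mig/dst/preempt" uses exactly 15 chars, and a multifd
name stays within the budget for channel indexes up to 99; a three-digit
index would need 16.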

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/colo.c         | 2 +-
 migration/migration.c    | 6 +++---
 migration/multifd.c      | 6 +++---
 migration/postcopy-ram.c | 4 ++--
 migration/savevm.c       | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index f96c2ee069..6449490221 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -935,7 +935,7 @@ void coroutine_fn colo_incoming_co(void)
     assert(bql_locked());
     assert(migration_incoming_colo_enabled());
 
-    qemu_thread_create(&th, "COLO incoming", colo_process_incoming_thread,
+    qemu_thread_create(&th, "mig/dst/colo", colo_process_incoming_thread,
                        mis, QEMU_THREAD_JOINABLE);
 
     mis->colo_incoming_co = qemu_coroutine_self();
diff --git a/migration/migration.c b/migration/migration.c
index e1b269624c..d41e00ed4c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2408,7 +2408,7 @@ static int open_return_path_on_source(MigrationState *ms)
 
     trace_open_return_path_on_source();
 
-    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+    qemu_thread_create(&ms->rp_state.rp_thread, "mig/src/rp-thr",
                        source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
     ms->rp_state.rp_thread_created = true;
 
@@ -3747,10 +3747,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
     }
 
     if (migrate_background_snapshot()) {
-        qemu_thread_create(&s->thread, "bg_snapshot",
+        qemu_thread_create(&s->thread, "mig/snapshot",
                 bg_migration_thread, s, QEMU_THREAD_JOINABLE);
     } else {
-        qemu_thread_create(&s->thread, "live_migration",
+        qemu_thread_create(&s->thread, "mig/src/main",
                 migration_thread, s, QEMU_THREAD_JOINABLE);
     }
     s->migration_thread_running = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index f317bff077..7afc0965f6 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1059,7 +1059,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
     args->p = p;
 
     p->tls_thread_created = true;
-    qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker",
+    qemu_thread_create(&p->tls_thread, "mig/src/tls",
                        multifd_tls_handshake_thread, args,
                        QEMU_THREAD_JOINABLE);
     return true;
@@ -1185,7 +1185,7 @@ bool multifd_send_setup(void)
         } else {
             p->iov = g_new0(struct iovec, page_count);
         }
-        p->name = g_strdup_printf("multifdsend_%d", i);
+        p->name = g_strdup_printf("mig/src/send_%d", i);
         p->page_size = qemu_target_page_size();
         p->page_count = page_count;
         p->write_flags = 0;
@@ -1601,7 +1601,7 @@ int multifd_recv_setup(Error **errp)
                 + sizeof(uint64_t) * page_count;
             p->packet = g_malloc0(p->packet_len);
         }
-        p->name = g_strdup_printf("multifdrecv_%d", i);
+        p->name = g_strdup_printf("mig/dst/recv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
         p->zero = g_new0(ram_addr_t, page_count);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3419779548..97701e6bb2 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1238,7 +1238,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
         return -1;
     }
 
-    postcopy_thread_create(mis, &mis->fault_thread, "fault-default",
+    postcopy_thread_create(mis, &mis->fault_thread, "mig/dst/fault",
                            postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
     mis->have_fault_thread = true;
 
@@ -1258,7 +1258,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
          * This thread needs to be created after the temp pages because
          * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
          */
-        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "fault-fast",
+        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "mig/dst/preempt",
                                postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
         mis->preempt_thread_status = PREEMPT_THREAD_CREATED;
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index c621f2359b..e71410d8c1 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2129,7 +2129,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
     }
 
     mis->have_listen_thread = true;
-    postcopy_thread_create(mis, &mis->listen_thread, "postcopy/listen",
+    postcopy_thread_create(mis, &mis->listen_thread, "mig/dst/listen",
                            postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
     trace_loadvm_postcopy_handle_listen("return");
 
-- 
2.45.0




* [PATCH v2 03/10] migration: Use MigrationStatus instead of int
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
  2024-06-17 18:15 ` [PATCH v2 01/10] migration/multifd: Avoid the final FLUSH in complete() Peter Xu
  2024-06-17 18:15 ` [PATCH v2 02/10] migration: Rename thread debug names Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 19:38   ` Fabiano Rosas
  2024-06-17 18:15 ` [PATCH v2 04/10] migration: Cleanup incoming migration setup state change Peter Xu
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

QEMU uses "int" in most cases even when it stores a MigrationStatus.  I
don't know why, so let's try to do that right and see what blows up.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h |  9 +++++----
 migration/migration.c | 24 +++++++-----------------
 2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 6af01362d4..38aa1402d5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -160,7 +160,7 @@ struct MigrationIncomingState {
     /* PostCopyFD's for external userfaultfds & handlers of shared memory */
     GArray   *postcopy_remote_fds;
 
-    int state;
+    MigrationStatus state;
 
     /*
      * The incoming migration coroutine, non-NULL during qemu_loadvm_state().
@@ -301,7 +301,7 @@ struct MigrationState {
     /* params from 'migrate-set-parameters' */
     MigrationParameters parameters;
 
-    int state;
+    MigrationStatus state;
 
     /* State related to return path */
     struct {
@@ -459,7 +459,8 @@ struct MigrationState {
     bool rdma_migration;
 };
 
-void migrate_set_state(int *state, int old_state, int new_state);
+void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
+                       MigrationStatus new_state);
 
 void migration_fd_process_incoming(QEMUFile *f);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
@@ -479,7 +480,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
-bool migration_postcopy_is_alive(int state);
+bool migration_postcopy_is_alive(MigrationStatus state);
 MigrationState *migrate_get_current(void);
 bool migration_has_failed(MigrationState *);
 bool migrate_mode_is_cpr(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index d41e00ed4c..75c9d80e8e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -390,7 +390,7 @@ void migration_incoming_state_destroy(void)
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
-static void migrate_generate_event(int new_state)
+static void migrate_generate_event(MigrationStatus new_state)
 {
     if (migrate_events()) {
         qapi_event_send_migration(new_state);
@@ -1273,8 +1273,6 @@ static void fill_destination_migration_info(MigrationInfo *info)
     }
 
     switch (mis->state) {
-    case MIGRATION_STATUS_NONE:
-        return;
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_CANCELLED:
@@ -1290,6 +1288,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
         info->has_status = true;
         fill_destination_postcopy_migration_info(info);
         break;
+    default:
+        return;
     }
     info->status = mis->state;
 
@@ -1337,7 +1337,8 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 /* shared migration helpers */
 
-void migrate_set_state(int *state, int old_state, int new_state)
+void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
+                       MigrationStatus new_state)
 {
     assert(new_state < MIGRATION_STATUS__MAX);
     if (qatomic_cmpxchg(state, old_state, new_state) == old_state) {
@@ -1544,7 +1545,7 @@ bool migration_in_postcopy(void)
     }
 }
 
-bool migration_postcopy_is_alive(int state)
+bool migration_postcopy_is_alive(MigrationStatus state)
 {
     switch (state) {
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
@@ -1589,20 +1590,9 @@ bool migration_is_idle(void)
     case MIGRATION_STATUS_COMPLETED:
     case MIGRATION_STATUS_FAILED:
         return true;
-    case MIGRATION_STATUS_SETUP:
-    case MIGRATION_STATUS_CANCELLING:
-    case MIGRATION_STATUS_ACTIVE:
-    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
-    case MIGRATION_STATUS_COLO:
-    case MIGRATION_STATUS_PRE_SWITCHOVER:
-    case MIGRATION_STATUS_DEVICE:
-    case MIGRATION_STATUS_WAIT_UNPLUG:
+    default:
         return false;
-    case MIGRATION_STATUS__MAX:
-        g_assert_not_reached();
     }
-
-    return false;
 }
 
 bool migration_is_active(void)
-- 
2.45.0




* [PATCH v2 04/10] migration: Cleanup incoming migration setup state change
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (2 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 03/10] migration: Use MigrationStatus instead of int Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 19:41   ` Fabiano Rosas
  2024-06-17 18:15 ` [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase Peter Xu
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Destination QEMU can set up incoming ports for two purposes: either a fresh
new incoming migration, in which QEMU will switch to SETUP for channel
establishment, or a paused postcopy migration, in which QEMU will stay in
POSTCOPY_PAUSED until kicking off the RECOVER phase.

Currently the state machine works on the dest node for the latter only
because migrate_set_state() implicitly becomes a noop if the current state
check fails.  That isn't clear at all.

Clean it up by providing a helper, migration_incoming_state_setup(), that
does proper checks on the current status.  Postcopy-paused is now checked
explicitly, and then we can bail out for unknown states.
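
To illustrate the implicit noop described above, here is a minimal
standalone model using C11 atomics.  It is only a sketch, not QEMU code:
DemoStatus, demo_set_state() and demo_incoming_state_setup() are
hypothetical stand-ins for MigrationStatus, migrate_set_state() and the
new helper, and error reporting is reduced to a bool.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for MigrationStatus; only the states used here. */
typedef enum {
    ST_NONE,
    ST_SETUP,
    ST_POSTCOPY_PAUSED,
    ST_ACTIVE,
} DemoStatus;

/*
 * Compare-and-swap style state change: the transition only happens if
 * the current state equals old_state, otherwise nothing changes.
 * Returns true if the state was updated.
 */
static bool demo_set_state(_Atomic int *state, DemoStatus old_state,
                           DemoStatus new_state)
{
    int expected = old_state;

    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/*
 * Explicit variant: PAUSED is accepted and left untouched, NONE moves
 * to SETUP, and any other state is rejected instead of silently ignored.
 */
static bool demo_incoming_state_setup(_Atomic int *state)
{
    int current = atomic_load(state);

    if (current == ST_POSTCOPY_PAUSED) {
        return true;
    }
    if (current != ST_NONE) {
        return false;
    }
    demo_set_state(state, ST_NONE, ST_SETUP);
    return true;
}
```

In this model, calling demo_set_state(&st, ST_NONE, ST_SETUP) while st is
ST_POSTCOPY_PAUSED quietly does nothing, which is the behaviour the old
code relied on; the explicit helper makes that case visible and lets
unexpected states fail.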

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 75c9d80e8e..59442181a1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -595,6 +595,29 @@ bool migrate_uri_parse(const char *uri, MigrationChannel **channel,
     return true;
 }
 
+static bool
+migration_incoming_state_setup(MigrationIncomingState *mis, Error **errp)
+{
+    MigrationStatus current = mis->state;
+
+    if (current == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        /*
+         * Incoming postcopy migration will stay in PAUSED state even if
+         * reconnection happened.
+         */
+        return true;
+    }
+
+    if (current != MIGRATION_STATUS_NONE) {
+        error_setg(errp, "Illegal migration incoming state: %s",
+                   MigrationStatus_str(current));
+        return false;
+    }
+
+    migrate_set_state(&mis->state, current, MIGRATION_STATUS_SETUP);
+    return true;
+}
+
 static void qemu_start_incoming_migration(const char *uri, bool has_channels,
                                           MigrationChannelList *channels,
                                           Error **errp)
@@ -633,8 +656,9 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
         return;
     }
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_NONE,
-                      MIGRATION_STATUS_SETUP);
+    if (!migration_incoming_state_setup(mis, errp)) {
+        return;
+    }
 
     if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
         SocketAddress *saddr = &addr->u.socket;
-- 
2.45.0




* [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (3 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 04/10] migration: Cleanup incoming migration setup state change Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 19:45   ` Fabiano Rosas
  2024-06-17 18:15 ` [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase Peter Xu
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

This patch adds a migration state on src called "postcopy-recover-setup".
The new state describes the intermediate step starting from when the src
QEMU receives a postcopy recovery request, until the migration channels
are properly established, but before the recovery process takes place.

The request came from Libvirt, which currently relies on the migration
state events to detect migration state changes.  That works for most of the
migration process, except for postcopy recovery failures at the beginning.

Currently postcopy recovery only has two major states:

  - postcopy-paused: this is the state that both sides of QEMU will be in
    for as long as the migration channel is interrupted.

  - postcopy-recover: this is the state where both sides of QEMU handshake
    with each other, preparing to continue the postcopy that was
    interrupted.

The issue here is that when the recovery port is invalid, the src QEMU
will take the URI/channels, notice the ports are not valid, and silently
stay in the postcopy-paused state, with no event sent to Libvirt.  In this
case, the only thing Libvirt can do is to poll the migration status at a
proper interval, which is less than optimal.

Considering that this is the only case where Libvirt won't get a
notification from QEMU on such events, let's add a postcopy-recover-setup
state to mimic what we have with the "setup" state of a newly initialized
migration, describing the phase of connection establishment.

With that, postcopy recovery now has two paths to go, and either path is
guaranteed to generate an event.  The events will look like this during a
recovery process on src QEMU:

  - Initially, when the recovery is initiated on src, QEMU will go from
    "postcopy-paused" -> "postcopy-recover-setup".  Old QEMUs don't have
    this event.

  - Depending on whether the channel re-establishment succeeds:

    - In the success case, src QEMU will move from "postcopy-recover-setup"
      to "postcopy-recover".  Old QEMUs also have this event.

    - In the failure case, src QEMU will move from "postcopy-recover-setup"
      back to "postcopy-paused".  Old QEMUs don't have this event.

This guarantees that Libvirt will always receive a notification for the
recovery process.

One thing to mention is that the new status is only needed on src QEMU,
not on both sides.  On dest QEMU, the state machine doesn't change, hence
the events don't change either.  It's done this way because dest QEMU may
not have an explicit point of setup start.  E.g., it can happen that dest
QEMU doesn't use the migrate-recover command to set up a new URI/channel,
and the old URI/channels are reused in recovery, in which case the old
ports simply work again after the network routes are fixed up.

Add a new helper postcopy_is_paused() that detects whether postcopy is
still paused, taking RECOVER_SETUP into account too.  While using it on
both src/dst, also make a slight change to always wait for the semaphore
before checking the status, because on both sides a sem_post() is required
for a recovery.
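
The src-side transitions described above can be sketched as a tiny
standalone model (not QEMU code).  SrcState, SrcModel, src_set_state(),
src_try_recover() and src_is_paused() are hypothetical names used only
here, and the "events" counter merely stands in for the migration events
Libvirt would see.

```c
#include <stdbool.h>

/* Simplified subset of the src-side states involved in a recovery. */
typedef enum {
    SRC_POSTCOPY_PAUSED,
    SRC_POSTCOPY_RECOVER_SETUP,
    SRC_POSTCOPY_RECOVER,
} SrcState;

typedef struct {
    SrcState state;
    int events;     /* number of state-change events emitted so far */
} SrcModel;

/* Change state only if we are in old_state; count one event if so. */
static void src_set_state(SrcModel *m, SrcState old_state,
                          SrcState new_state)
{
    if (m->state == old_state) {
        m->state = new_state;
        m->events++;
    }
}

/*
 * Model one resume attempt.  channel_ok says whether the channel could
 * be re-established.  Either way, two events are generated.
 */
static SrcState src_try_recover(SrcModel *m, bool channel_ok)
{
    src_set_state(m, SRC_POSTCOPY_PAUSED, SRC_POSTCOPY_RECOVER_SETUP);

    if (channel_ok) {
        src_set_state(m, SRC_POSTCOPY_RECOVER_SETUP, SRC_POSTCOPY_RECOVER);
    } else {
        /* Never fail a postcopy migration; go back to PAUSED instead. */
        src_set_state(m, SRC_POSTCOPY_RECOVER_SETUP, SRC_POSTCOPY_PAUSED);
    }
    return m->state;
}

/* Both PAUSED and RECOVER_SETUP count as "still paused" for waiters. */
static bool src_is_paused(SrcState s)
{
    return s == SRC_POSTCOPY_PAUSED || s == SRC_POSTCOPY_RECOVER_SETUP;
}
```

In this model a failed attempt ends in the paused state again but still
produces two events, which is the point of the new state: the failure is
no longer silent.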

Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>
Cc: Prasad Pandit <ppandit@redhat.com>
Buglink: https://issues.redhat.com/browse/RHEL-38485
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json      |  4 ++++
 migration/postcopy-ram.h |  3 +++
 migration/migration.c    | 40 ++++++++++++++++++++++++++++++++++------
 migration/postcopy-ram.c |  6 ++++++
 migration/savevm.c       |  4 ++--
 5 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index a351fd3714..565c40b637 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -142,6 +142,9 @@
 #
 # @postcopy-paused: during postcopy but paused.  (since 3.0)
 #
+# @postcopy-recover-setup: setup phase for a postcopy recovery process,
+#     preparing for a recovery phase to start.  (since 9.1)
+#
 # @postcopy-recover: trying to recover from a paused postcopy.  (since
 #     3.0)
 #
@@ -166,6 +169,7 @@
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
             'active', 'postcopy-active', 'postcopy-paused',
+            'postcopy-recover-setup',
             'postcopy-recover', 'completed', 'failed', 'colo',
             'pre-switchover', 'device', 'wait-unplug' ] }
 ##
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ecae941211..a6df1b2811 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_POSTCOPY_RAM_H
 #define QEMU_POSTCOPY_RAM_H
 
+#include "qapi/qapi-types-migration.h"
+
 /* Return true if the host supports everything we need to do postcopy-ram */
 bool postcopy_ram_supported_by_host(MigrationIncomingState *mis,
                                     Error **errp);
@@ -193,5 +195,6 @@ enum PostcopyChannels {
 void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
 void postcopy_preempt_setup(MigrationState *s);
 int postcopy_preempt_establish_channel(MigrationState *s);
+bool postcopy_is_paused(MigrationStatus status);
 
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 59442181a1..fc390115bf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1094,6 +1094,7 @@ bool migration_is_setup_or_active(void)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
     case MIGRATION_STATUS_POSTCOPY_RECOVER:
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
@@ -1116,6 +1117,7 @@ bool migration_is_running(void)
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
     case MIGRATION_STATUS_POSTCOPY_RECOVER:
     case MIGRATION_STATUS_SETUP:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
@@ -1253,6 +1255,7 @@ static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
     case MIGRATION_STATUS_POSTCOPY_RECOVER:
         /* TODO add some postcopy stats */
         populate_time_info(info, s);
@@ -1459,9 +1462,30 @@ static void migrate_error_free(MigrationState *s)
 
 static void migrate_fd_error(MigrationState *s, const Error *error)
 {
+    MigrationStatus current = s->state;
+    MigrationStatus next;
+
     assert(s->to_dst_file == NULL);
-    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                      MIGRATION_STATUS_FAILED);
+
+    switch (current) {
+    case MIGRATION_STATUS_SETUP:
+        next = MIGRATION_STATUS_FAILED;
+        break;
+    case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
+        /* Never fail a postcopy migration; switch back to PAUSED instead */
+        next = MIGRATION_STATUS_POSTCOPY_PAUSED;
+        break;
+    default:
+        /*
+         * This really shouldn't happen. Just be careful to not crash a VM
+         * just for this.  Instead, dump something.
+         */
+        error_report("%s: Illegal migration status (%s) detected",
+                     __func__, MigrationStatus_str(current));
+        return;
+    }
+
+    migrate_set_state(&s->state, current, next);
     migrate_set_error(s, error);
 }
 
@@ -1562,6 +1586,7 @@ bool migration_in_postcopy(void)
     switch (s->state) {
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
     case MIGRATION_STATUS_POSTCOPY_RECOVER:
         return true;
     default:
@@ -1949,6 +1974,9 @@ static bool migrate_prepare(MigrationState *s, bool resume, Error **errp)
             return false;
         }
 
+        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
+
         /* This is a resume, skip init status */
         return true;
     }
@@ -2981,9 +3009,9 @@ static MigThrError postcopy_pause(MigrationState *s)
          * We wait until things fixed up. Then someone will setup the
          * status back for us.
          */
-        while (s->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        do {
             qemu_sem_wait(&s->postcopy_pause_sem);
-        }
+        } while (postcopy_is_paused(s->state));
 
         if (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER) {
             /* Woken up by a recover procedure. Give it a shot */
@@ -3679,7 +3707,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 {
     Error *local_err = NULL;
     uint64_t rate_limit;
-    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
+    bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
     int ret;
 
     /*
@@ -3746,7 +3774,7 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 
     if (resume) {
         /* Wakeup the main migration thread to do the recovery */
-        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+        migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP,
                           MIGRATION_STATUS_POSTCOPY_RECOVER);
         qemu_sem_post(&s->postcopy_pause_sem);
         return;
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 97701e6bb2..1c374b7ea1 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1770,3 +1770,9 @@ void *postcopy_preempt_thread(void *opaque)
 
     return NULL;
 }
+
+bool postcopy_is_paused(MigrationStatus status)
+{
+    return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
+        status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
+}
diff --git a/migration/savevm.c b/migration/savevm.c
index e71410d8c1..deb57833f8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2864,9 +2864,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     error_report("Detected IO failure for postcopy. "
                  "Migration paused.");
 
-    while (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+    do {
         qemu_sem_wait(&mis->postcopy_pause_sem_dst);
-    }
+    } while (postcopy_is_paused(mis->state));
 
     trace_postcopy_pause_incoming_continued();
 
-- 
2.45.0




* [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (4 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 19:47   ` Fabiano Rosas
  2024-06-17 18:15 ` [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests Peter Xu
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Firstly, the "Paused" state was added in the wrong place before.  The state
machine section was describing PostcopyState, rather than MigrationStatus.
Drop the Paused state descriptions.

Then, in the postcopy recover session, add more inline information on the
state machine for MigrationStatus.  Add the new RECOVER_SETUP phase.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 docs/devel/migration/postcopy.rst | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/docs/devel/migration/postcopy.rst b/docs/devel/migration/postcopy.rst
index 6c51e96d79..a15594e11f 100644
--- a/docs/devel/migration/postcopy.rst
+++ b/docs/devel/migration/postcopy.rst
@@ -99,17 +99,6 @@ ADVISE->DISCARD->LISTEN->RUNNING->END
     (although it can't do the cleanup it would do as it
     finishes a normal migration).
 
- - Paused
-
-    Postcopy can run into a paused state (normally on both sides when
-    happens), where all threads will be temporarily halted mostly due to
-    network errors.  When reaching paused state, migration will make sure
-    the qemu binary on both sides maintain the data without corrupting
-    the VM.  To continue the migration, the admin needs to fix the
-    migration channel using the QMP command 'migrate-recover' on the
-    destination node, then resume the migration using QMP command 'migrate'
-    again on source node, with resume=true flag set.
-
  - End
 
     The listen thread can now quit, and perform the cleanup of migration
@@ -221,7 +210,8 @@ paused postcopy migration.
 
 The recovery phase normally contains a few steps:
 
-  - When network issue occurs, both QEMU will go into PAUSED state
+  - When network issue occurs, both QEMU will go into **POSTCOPY_PAUSED**
+    migration state.
 
   - When the network is recovered (or a new network is provided), the admin
     can setup the new channel for migration using QMP command
@@ -229,9 +219,20 @@ The recovery phase normally contains a few steps:
 
   - On source host, the admin can continue the interrupted postcopy
     migration using QMP command 'migrate' with resume=true flag set.
-
-  - After the connection is re-established, QEMU will continue the postcopy
-    migration on both sides.
+    Source QEMU will go into **POSTCOPY_RECOVER_SETUP** state trying to
+    re-establish the channels.
+
+  - When both sides of QEMU successfully reconnects using a new or fixed up
+    channel, they will go into **POSTCOPY_RECOVER** state, some handshake
+    procedure will be needed to properly synchronize the VM states between
+    the two QEMUs to continue the postcopy migration.  For example, there
+    can be pages sent right during the window when the network is
+    interrupted, then the handshake will guarantee pages lost in-flight
+    will be resent again.
+
+  - After a proper handshake synchronization, QEMU will continue the
+    postcopy migration on both sides and go back to **POSTCOPY_ACTIVE**
+    state.  Postcopy migration will continue.
 
 During a paused postcopy migration, the VM can logically still continue
 running, and it will not be impacted from any page access to pages that
-- 
2.45.0
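
The recovery steps documented above amount to a small state machine on the
source side.  The following is a rough sketch of it, not QEMU code: the
SrcState and SrcEvent names are hypothetical, and the transition from RECOVER
back to PAUSED on an interruption is an assumption taken from what the
postcopy failure tests exercise with migrate-pause.

```c
#include <assert.h>

/* Hypothetical mirror of the source-side MigrationStatus values. */
typedef enum {
    SRC_POSTCOPY_ACTIVE,
    SRC_POSTCOPY_PAUSED,
    SRC_POSTCOPY_RECOVER_SETUP,
    SRC_POSTCOPY_RECOVER,
} SrcState;

/* Hypothetical events that drive the transitions. */
typedef enum {
    EV_INTERRUPTED,       /* network failure or migrate-pause */
    EV_RESUME_REQUESTED,  /* 'migrate' with resume=true */
    EV_CHANNEL_OK,        /* channels re-established */
    EV_CHANNEL_FAILED,    /* channel setup failed */
    EV_HANDSHAKE_OK,      /* both sides synchronized */
} SrcEvent;

/* Return the next state; events that do not apply leave it unchanged. */
static SrcState src_next_state(SrcState cur, SrcEvent ev)
{
    switch (cur) {
    case SRC_POSTCOPY_ACTIVE:
        return ev == EV_INTERRUPTED ? SRC_POSTCOPY_PAUSED : cur;
    case SRC_POSTCOPY_PAUSED:
        return ev == EV_RESUME_REQUESTED ? SRC_POSTCOPY_RECOVER_SETUP : cur;
    case SRC_POSTCOPY_RECOVER_SETUP:
        if (ev == EV_CHANNEL_OK) {
            return SRC_POSTCOPY_RECOVER;
        }
        if (ev == EV_CHANNEL_FAILED) {
            return SRC_POSTCOPY_PAUSED;
        }
        return cur;
    case SRC_POSTCOPY_RECOVER:
        if (ev == EV_HANDSHAKE_OK) {
            return SRC_POSTCOPY_ACTIVE;
        }
        if (ev == EV_INTERRUPTED) {
            return SRC_POSTCOPY_PAUSED;
        }
        return cur;
    default:
        assert(0 && "unknown state");
        return cur;
    }
}
```

The point of the extra RECOVER_SETUP state is visible in the third case:
both outcomes of the channel setup change the state, so both can be reported
as a status change.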



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (5 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 19:49   ` Fabiano Rosas
  2024-06-17 18:15 ` [PATCH v2 08/10] tests/migration-tests: Always enable migration events Peter Xu
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Most of them are not needed; we can keep a single ifdef inside
postcopy_recover_fail() so that it covers only the SCM_RIGHTS tricks.
The tests won't run on Windows anyway because has_uffd is always false there.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b7e3406471..13b59d4c10 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1353,9 +1353,9 @@ static void wait_for_postcopy_status(QTestState *one, const char *status)
                                                   "completed", NULL });
 }
 
-#ifndef _WIN32
 static void postcopy_recover_fail(QTestState *from, QTestState *to)
 {
+#ifndef _WIN32
     int ret, pair1[2], pair2[2];
     char c;
 
@@ -1417,8 +1417,8 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to)
     close(pair1[1]);
     close(pair2[0]);
     close(pair2[1]);
+#endif
 }
-#endif /* _WIN32 */
 
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
@@ -1458,7 +1458,6 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
     wait_for_postcopy_status(to, "postcopy-paused");
     wait_for_postcopy_status(from, "postcopy-paused");
 
-#ifndef _WIN32
     if (args->postcopy_recovery_test_fail) {
         /*
          * Test when a wrong socket specified for recover, and then the
@@ -1467,7 +1466,6 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
         postcopy_recover_fail(from, to);
         /* continue with a good recovery */
     }
-#endif /* _WIN32 */
 
     /*
      * Create a new socket to emulate a new channel that is different
@@ -1496,7 +1494,6 @@ static void test_postcopy_recovery(void)
     test_postcopy_recovery_common(&args);
 }
 
-#ifndef _WIN32
 static void test_postcopy_recovery_double_fail(void)
 {
     MigrateCommon args = {
@@ -1505,7 +1502,6 @@ static void test_postcopy_recovery_double_fail(void)
 
     test_postcopy_recovery_common(&args);
 }
-#endif /* _WIN32 */
 
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
@@ -3486,10 +3482,8 @@ int main(int argc, char **argv)
                            test_postcopy_preempt);
         migration_test_add("/migration/postcopy/preempt/recovery/plain",
                            test_postcopy_preempt_recovery);
-#ifndef _WIN32
         migration_test_add("/migration/postcopy/recovery/double-failures",
                            test_postcopy_recovery_double_fail);
-#endif /* _WIN32 */
         if (is_x86) {
             migration_test_add("/migration/postcopy/suspend",
                                test_postcopy_suspend);
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 08/10] tests/migration-tests: Always enable migration events
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (6 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 19:51   ` Fabiano Rosas
  2024-06-17 18:15 ` [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status Peter Xu
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Libvirt should always enable migration events, so it would be nice for qtest
to cover that in all tests too.  Note that this patch only enables the
events; no extra checks are done on them yet.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 13b59d4c10..9ae8892e26 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -841,6 +841,13 @@ static int test_migrate_start(QTestState **from, QTestState **to,
         unlink(shmem_path);
     }
 
+    /*
+     * Always enable migration events.  Libvirt always uses it, let's try
+     * to mimic as closer as that.
+     */
+    migrate_set_capability(*from, "events", true);
+    migrate_set_capability(*to, "events", true);
+
     return 0;
 }
 
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (7 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 08/10] tests/migration-tests: Always enable migration events Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 19:53   ` Fabiano Rosas
  2024-06-17 18:15 ` [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect Peter Xu
  2024-06-17 19:34 ` [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Make sure the postcopy-recover-setup status is present in the postcopy
failure unit test.  Note that it only applies to src QEMU, not dest.

This also introduces the tiny but helpful migration_event_wait() helper.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-helpers.h |  2 ++
 tests/qtest/migration-helpers.c | 20 ++++++++++++++++++++
 tests/qtest/migration-test.c    |  6 ++++++
 3 files changed, 28 insertions(+)

diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 1339835698..356057b4a0 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -55,4 +55,6 @@ char *find_common_machine_version(const char *mtype, const char *var1,
 char *resolve_machine_version(const char *alias, const char *var1,
                               const char *var2);
 void migration_test_add(const char *path, void (*fn)(void));
+void migration_event_wait(QTestState *s, const char *target);
+
 #endif /* MIGRATION_HELPERS_H */
diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index ce6d6615b5..c0e2066270 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -473,3 +473,23 @@ void migration_test_add(const char *path, void (*fn)(void))
     qtest_add_data_func_full(path, test, migration_test_wrapper,
                              migration_test_destroy);
 }
+
+/*
+ * Wait for a "MIGRATION" event.  This is what Libvirt uses to track
+ * migration status changes.
+ */
+void migration_event_wait(QTestState *s, const char *target)
+{
+    QDict *response, *data;
+    const char *status;
+    bool found;
+
+    do {
+        response = qtest_qmp_eventwait_ref(s, "MIGRATION");
+        data = qdict_get_qdict(response, "data");
+        g_assert(data);
+        status = qdict_get_str(data, "status");
+        found = (strcmp(status, target) == 0);
+        qobject_unref(response);
+    } while (!found);
+}
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 9ae8892e26..a16b1a4824 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1402,6 +1402,12 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to)
     migrate_recover(to, "fd:fd-mig");
     migrate_qmp(from, to, "fd:fd-mig", NULL, "{'resume': true}");
 
+    /*
+     * Source QEMU has an extra RECOVER_SETUP phase, dest doesn't have it.
+     * Make sure it appears along the way.
+     */
+    migration_event_wait(from, "postcopy-recover-setup");
+
     /*
      * Make sure both QEMU instances will go into RECOVER stage, then test
      * kicking them out using migrate-pause.
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (8 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status Peter Xu
@ 2024-06-17 18:15 ` Peter Xu
  2024-06-17 20:07   ` Fabiano Rosas
  2024-06-17 19:34 ` [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 18:15 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Make sure there will be an event for postcopy recovery, regardless of
whether the reconnect succeeds, or when the failure happens.

The newly added case fails early in postcopy recovery, so that src does not
even reach the RECOVER stage (in real life the same would apply to dest, but
the test case is slightly more involved due to the dual socketpair setup).

To do that, rename postcopy_recovery_test_fail and turn it from a boolean
into an enum that reflects which stage should fail.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 89 ++++++++++++++++++++++++++++++------
 1 file changed, 74 insertions(+), 15 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index a16b1a4824..3e237a1499 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -70,6 +70,17 @@ static QTestMigrationState dst_state;
 #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
 #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
 
+typedef enum PostcopyRecoveryFailStage {
+    /*
+     * "no failure" must be 0 as it's the default.  OTOH, real failure
+     * cases must be >0 to make sure they trigger by a "if" test.
+     */
+    POSTCOPY_FAIL_NONE = 0,
+    POSTCOPY_FAIL_CHANNEL_ESTABLISH,
+    POSTCOPY_FAIL_RECOVERY,
+    POSTCOPY_FAIL_MAX
+} PostcopyRecoveryFailStage;
+
 #if defined(__linux__)
 #include <sys/syscall.h>
 #include <sys/vfs.h>
@@ -680,7 +691,7 @@ typedef struct {
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
-    bool postcopy_recovery_test_fail;
+    PostcopyRecoveryFailStage postcopy_recovery_fail_stage;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1360,12 +1371,16 @@ static void wait_for_postcopy_status(QTestState *one, const char *status)
                                                   "completed", NULL });
 }
 
-static void postcopy_recover_fail(QTestState *from, QTestState *to)
+static void postcopy_recover_fail(QTestState *from, QTestState *to,
+                                  PostcopyRecoveryFailStage stage)
 {
 #ifndef _WIN32
+    bool fail_early = (stage == POSTCOPY_FAIL_CHANNEL_ESTABLISH);
     int ret, pair1[2], pair2[2];
     char c;
 
+    g_assert(stage > POSTCOPY_FAIL_NONE && stage < POSTCOPY_FAIL_MAX);
+
     /* Create two unrelated socketpairs */
     ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
     g_assert_cmpint(ret, ==, 0);
@@ -1399,6 +1414,14 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to)
     ret = send(pair2[1], &c, 1, 0);
     g_assert_cmpint(ret, ==, 1);
 
+    if (stage == POSTCOPY_FAIL_CHANNEL_ESTABLISH) {
+        /*
+         * This will make src QEMU to fail at an early stage when trying to
+         * resume later, where it shouldn't reach RECOVER stage at all.
+         */
+        close(pair1[1]);
+    }
+
     migrate_recover(to, "fd:fd-mig");
     migrate_qmp(from, to, "fd:fd-mig", NULL, "{'resume': true}");
 
@@ -1408,28 +1431,53 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to)
      */
     migration_event_wait(from, "postcopy-recover-setup");
 
+    if (fail_early) {
+        /*
+         * When fails at reconnection, src QEMU will automatically goes
+         * back to PAUSED state.  Making sure there is an event in this
+         * case: Libvirt relies on this to detect early reconnection
+         * errors.
+         */
+        migration_event_wait(from, "postcopy-paused");
+    } else {
+        /*
+         * We want to test "fail later" at RECOVER stage here.  Make sure
+         * both QEMU instances will go into RECOVER stage first, then test
+         * kicking them out using migrate-pause.
+         *
+         * Explicitly check the RECOVER event on src, that's what Libvirt
+         * relies on, rather than polling.
+         */
+        migration_event_wait(from, "postcopy-recover");
+        wait_for_postcopy_status(from, "postcopy-recover");
+
+        /* Need an explicit kick on src QEMU in this case */
+        migrate_pause(from);
+    }
+
     /*
-     * Make sure both QEMU instances will go into RECOVER stage, then test
-     * kicking them out using migrate-pause.
+     * For all failure cases, we'll reach such states on both sides now.
+     * Check them.
      */
-    wait_for_postcopy_status(from, "postcopy-recover");
+    wait_for_postcopy_status(from, "postcopy-paused");
     wait_for_postcopy_status(to, "postcopy-recover");
 
     /*
-     * This would be issued by the admin upon noticing the hang, we should
-     * make sure we're able to kick this out.
+     * Kick dest QEMU out too. This is normally not needed in reality
+     * because when the channel is shutdown it should also happens on src.
+     * However here we used separate socket pairs so we need to do that
+     * explicitly.
      */
-    migrate_pause(from);
-    wait_for_postcopy_status(from, "postcopy-paused");
-
-    /* Do the same test on dest */
     migrate_pause(to);
     wait_for_postcopy_status(to, "postcopy-paused");
 
     close(pair1[0]);
-    close(pair1[1]);
     close(pair2[0]);
     close(pair2[1]);
+
+    if (stage != POSTCOPY_FAIL_CHANNEL_ESTABLISH) {
+        close(pair1[1]);
+    }
 #endif
 }
 
@@ -1471,12 +1519,12 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
     wait_for_postcopy_status(to, "postcopy-paused");
     wait_for_postcopy_status(from, "postcopy-paused");
 
-    if (args->postcopy_recovery_test_fail) {
+    if (args->postcopy_recovery_fail_stage) {
         /*
          * Test when a wrong socket specified for recover, and then the
          * ability to kick it out, and continue with a correct socket.
          */
-        postcopy_recover_fail(from, to);
+        postcopy_recover_fail(from, to, args->postcopy_recovery_fail_stage);
         /* continue with a good recovery */
     }
 
@@ -1510,7 +1558,16 @@ static void test_postcopy_recovery(void)
 static void test_postcopy_recovery_double_fail(void)
 {
     MigrateCommon args = {
-        .postcopy_recovery_test_fail = true,
+        .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
+static void test_postcopy_recovery_channel_reconnect(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
     };
 
     test_postcopy_recovery_common(&args);
@@ -3497,6 +3554,8 @@ int main(int argc, char **argv)
                            test_postcopy_preempt_recovery);
         migration_test_add("/migration/postcopy/recovery/double-failures",
                            test_postcopy_recovery_double_fail);
+        migration_test_add("/migration/postcopy/recovery/channel-reconnect",
+                           test_postcopy_recovery_channel_reconnect);
         if (is_x86) {
             migration_test_add("/migration/postcopy/suspend",
                                test_postcopy_suspend);
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 00/10] migration: New postcopy state, and some cleanups
  2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
                   ` (9 preceding siblings ...)
  2024-06-17 18:15 ` [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect Peter Xu
@ 2024-06-17 19:34 ` Peter Xu
  2024-06-17 20:12   ` Fabiano Rosas
  10 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 19:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, Jiri Denemark, Bandan Das

Hello,

On Mon, Jun 17, 2024 at 02:15:24PM -0400, Peter Xu wrote:
> v2:
> - Collect tags
> - Patch 3
>   - cover all states in migration_postcopy_is_alive()
> - Patch 4 (old)
>   - English changes [Fabiano]
>   - Split the migration_incoming_state_setup() cleanup into a new patch
>     [Fabiano]
>   - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
>   - Keep using explicit state check in migrate_fd_connect() for resume
>     [Fabiano]
> - New patches
>   - New doc update: "migration/docs: Update postcopy recover session for
>     SETUP phase"
>   - New test case: last four patches

I just found that this won't apply on top of the latest master, and it also
has a trivial conflict with the direct-io work.  Fabiano, I'll wait a few
days for comments, and then resend v3 on top of your direct-io series.

Meanwhile I also plan to squash the fixup below into the last test patch.  It
fixes a spelling error I just found, and also renames the test cases (the
new test is actually also a "double failure" test, just at a different
phase).  Comments on the fixup are welcome even before a repost.

===8<===
From 5b8fbc3a9d9e87ebfef1a3e5592fd196eecd5923 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Mon, 17 Jun 2024 14:40:15 -0400
Subject: [PATCH] fixup! tests/migration-tests: Cover postcopy failure on
 reconnect

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index a4fed4cc6b..fe33b86783 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1474,7 +1474,7 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to,
 
     /*
      * Kick dest QEMU out too. This is normally not needed in reality
-     * because when the channel is shutdown it should also happens on src.
+     * because when the channel is shutdown it should also happen on src.
      * However here we used separate socket pairs so we need to do that
      * explicitly.
      */
@@ -1565,7 +1565,7 @@ static void test_postcopy_recovery(void)
     test_postcopy_recovery_common(&args);
 }
 
-static void test_postcopy_recovery_double_fail(void)
+static void test_postcopy_recovery_fail_handshake(void)
 {
     MigrateCommon args = {
         .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
@@ -1574,7 +1574,7 @@ static void test_postcopy_recovery_double_fail(void)
     test_postcopy_recovery_common(&args);
 }
 
-static void test_postcopy_recovery_channel_reconnect(void)
+static void test_postcopy_recovery_fail_reconnect(void)
 {
     MigrateCommon args = {
         .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
@@ -3759,10 +3759,10 @@ int main(int argc, char **argv)
                            test_postcopy_preempt);
         migration_test_add("/migration/postcopy/preempt/recovery/plain",
                            test_postcopy_preempt_recovery);
-        migration_test_add("/migration/postcopy/recovery/double-failures",
-                           test_postcopy_recovery_double_fail);
-        migration_test_add("/migration/postcopy/recovery/channel-reconnect",
-                           test_postcopy_recovery_channel_reconnect);
+        migration_test_add("/migration/postcopy/recovery/double-failures/handshake",
+                           test_postcopy_recovery_fail_handshake);
+        migration_test_add("/migration/postcopy/recovery/double-failures/reconnect",
+                           test_postcopy_recovery_fail_reconnect);
         if (is_x86) {
             migration_test_add("/migration/postcopy/suspend",
                                test_postcopy_suspend);
-- 
2.45.0


-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 03/10] migration: Use MigrationStatus instead of int
  2024-06-17 18:15 ` [PATCH v2 03/10] migration: Use MigrationStatus instead of int Peter Xu
@ 2024-06-17 19:38   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 19:38 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> QEMU uses "int" in most cases even if it stores MigrationStatus.  I don't
> know why, so let's try to do that right and see what blows up..
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 04/10] migration: Cleanup incoming migration setup state change
  2024-06-17 18:15 ` [PATCH v2 04/10] migration: Cleanup incoming migration setup state change Peter Xu
@ 2024-06-17 19:41   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 19:41 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> Destination QEMU can setup incoming ports for two purposes: either a fresh
> new incoming migration, in which QEMU will switch to SETUP for channel
> establishment, or a paused postcopy migration, in which QEMU will stay in
> POSTCOPY_PAUSED until kicking off the RECOVER phase.
>
> Now the state machine worked on dest node for the latter, only because
> migrate_set_state() implicitly will become a noop if the current state
> check failed.  It wasn't clear at all.
>
> Clean it up by providing a helper migration_incoming_state_setup() doing
> proper checks over current status.  Postcopy-paused will be explicitly
> checked now, and then we can bail out for unknown states.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase
  2024-06-17 18:15 ` [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase Peter Xu
@ 2024-06-17 19:45   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 19:45 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> This patch adds a migration state on src called "postcopy-recover-setup".
> The new state will describe the intermediate step starting from when the
> src QEMU received a postcopy recovery request, until the migration channels
> are properly established, but before the recovery process take place.
>
> The request came from Libvirt where Libvirt currently rely on the migration
> state events to detect migration state changes.  That works for most of the
> migration process but except postcopy recovery failures at the beginning.
>
> Currently postcopy recovery only has two major states:
>
>   - postcopy-paused: this is the state that both sides of QEMU will be in
>     for a long time as long as the migration channel was interrupted.
>
>   - postcopy-recover: this is the state where both sides of QEMU handshake
>     with each other, preparing for a continuation of postcopy which used to
>     be interrupted.
>
> The issue here is when the recovery port is invalid, the src QEMU will take
> the URI/channels, noticing the ports are not valid, and it'll silently keep
> in the postcopy-paused state, with no event sent to Libvirt.  In this case,
> the only thing Libvirt can do is to poll the migration status with a proper
> interval, however that's less optimal.
>
> Considering that this is the only case where Libvirt won't get a
> notification from QEMU on such events, let's add postcopy-recover-setup
> state to mimic what we have with the "setup" state of a newly initialized
> migration, describing the phase of connection establishment.
>
> With that, postcopy recovery will have two paths to go now, and either path
> will guarantee an event generated.  Now the events will look like this
> during a recovery process on src QEMU:
>
>   - Initially when the recovery is initiated on src, QEMU will go from
>     "postcopy-paused" -> "postcopy-recover-setup".  Old QEMUs don't have
>     this event.
>
>   - Depending on whether the channel re-establishment is succeeded:
>
>     - In succeeded case, src QEMU will move from "postcopy-recover-setup"
>       to "postcopy-recover".  Old QEMUs also have this event.
>
>     - In failure case, src QEMU will move from "postcopy-recover-setup" to
>       "postcopy-paused" again.  Old QEMUs don't have this event.
>
> This guarantees that Libvirt will always receive a notification for
> recovery process properly.
>
> One thing to mention is, such new status is only needed on src QEMU not
> both.  On dest QEMU, the state machine doesn't change.  Hence the events
> don't change either.  It's done like so because dest QEMU may not have an
> explicit point of setup start.  E.g., it can happen that when dest QEMUs
> doesn't use migrate-recover command to use a new URI/channel, but the old
> URI/channels can be reused in recovery, in which case the old ports simply
> can work again after the network routes are fixed up.
>
> Add a new helper postcopy_is_paused() detecting whether postcopy is still
> paused, taking RECOVER_SETUP into account too.  When using it on both
> src/dst, a slight change is done altogether to always wait for the
> semaphore before checking the status, because for both sides a sem_post()
> will be required for a recovery.
>
> Cc: Jiri Denemark <jdenemar@redhat.com>
> Cc: Fabiano Rosas <farosas@suse.de>
> Cc: Prasad Pandit <ppandit@redhat.com>
> Buglink: https://issues.redhat.com/browse/RHEL-38485
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase
  2024-06-17 18:15 ` [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase Peter Xu
@ 2024-06-17 19:47   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 19:47 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> Firstly, the "Paused" state was added in the wrong place before. The state
> machine section was describing PostcopyState, rather than MigrationStatus.
> Drop the Paused state descriptions.
>
> Then in the postcopy recover session, add more information on the state
> machine for MigrationStatus in the lines.  Add the new RECOVER_SETUP phase.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>

> ---
>  docs/devel/migration/postcopy.rst | 31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/docs/devel/migration/postcopy.rst b/docs/devel/migration/postcopy.rst
> index 6c51e96d79..a15594e11f 100644
> --- a/docs/devel/migration/postcopy.rst
> +++ b/docs/devel/migration/postcopy.rst
> @@ -99,17 +99,6 @@ ADVISE->DISCARD->LISTEN->RUNNING->END
>      (although it can't do the cleanup it would do as it
>      finishes a normal migration).
>  
> - - Paused
> -
> -    Postcopy can run into a paused state (normally on both sides when
> -    happens), where all threads will be temporarily halted mostly due to
> -    network errors.  When reaching paused state, migration will make sure
> -    the qemu binary on both sides maintain the data without corrupting
> -    the VM.  To continue the migration, the admin needs to fix the
> -    migration channel using the QMP command 'migrate-recover' on the
> -    destination node, then resume the migration using QMP command 'migrate'
> -    again on source node, with resume=true flag set.
> -
>   - End
>  
>      The listen thread can now quit, and perform the cleanup of migration
> @@ -221,7 +210,8 @@ paused postcopy migration.
>  
>  The recovery phase normally contains a few steps:
>  
> -  - When network issue occurs, both QEMU will go into PAUSED state
> +  - When network issue occurs, both QEMU will go into **POSTCOPY_PAUSED**
> +    migration state.
>  
>    - When the network is recovered (or a new network is provided), the admin
>      can setup the new channel for migration using QMP command
> @@ -229,9 +219,20 @@ The recovery phase normally contains a few steps:
>  
>    - On source host, the admin can continue the interrupted postcopy
>      migration using QMP command 'migrate' with resume=true flag set.
> -
> -  - After the connection is re-established, QEMU will continue the postcopy
> -    migration on both sides.
> +    Source QEMU will go into **POSTCOPY_RECOVER_SETUP** state trying to
> +    re-establish the channels.
> +
> +  - When both sides of QEMU successfully reconnects using a new or fixed up

s/reconnects/reconnect

I can touch it up when queueing

> +    channel, they will go into **POSTCOPY_RECOVER** state, some handshake
> +    procedure will be needed to properly synchronize the VM states between
> +    the two QEMUs to continue the postcopy migration.  For example, there
> +    can be pages sent right during the window when the network is
> +    interrupted, then the handshake will guarantee pages lost in-flight
> +    will be resent again.
> +
> +  - After a proper handshake synchronization, QEMU will continue the
> +    postcopy migration on both sides and go back to **POSTCOPY_ACTIVE**
> +    state.  Postcopy migration will continue.
>  
>  During a paused postcopy migration, the VM can logically still continue
>  running, and it will not be impacted from any page access to pages that
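
As a rough illustration of the flow in the doc update above, the source-side
state transitions can be sketched in C as below.  The Sketch* names are
made up for this example and are not QEMU's MigrationStatus definitions.
The fallback to the paused state when reconnect or handshake fails is an
assumption drawn from the cover letter's description of reconnect failures,
not something the quoted doc text spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative names only; QEMU's real enum is MigrationStatus. */
typedef enum {
    SKETCH_POSTCOPY_ACTIVE,
    SKETCH_POSTCOPY_PAUSED,
    SKETCH_POSTCOPY_RECOVER_SETUP,
    SKETCH_POSTCOPY_RECOVER,
} SketchState;

/* Return true if src QEMU may move from 'from' to 'to' in this sketch. */
static bool sketch_transition_ok(SketchState from, SketchState to)
{
    switch (from) {
    case SKETCH_POSTCOPY_ACTIVE:
        /* A network issue pauses the migration. */
        return to == SKETCH_POSTCOPY_PAUSED;
    case SKETCH_POSTCOPY_PAUSED:
        /* 'migrate' with resume=true starts re-establishing channels. */
        return to == SKETCH_POSTCOPY_RECOVER_SETUP;
    case SKETCH_POSTCOPY_RECOVER_SETUP:
        /* Reconnected: handshake.  Assumed: failure goes back to paused. */
        return to == SKETCH_POSTCOPY_RECOVER ||
               to == SKETCH_POSTCOPY_PAUSED;
    case SKETCH_POSTCOPY_RECOVER:
        /* Handshake done: active.  Assumed: failure goes back to paused. */
        return to == SKETCH_POSTCOPY_ACTIVE ||
               to == SKETCH_POSTCOPY_PAUSED;
    }
    return false;
}

/* Walk the successful recovery path listed in the doc update. */
static void sketch_walk_recovery_path(void)
{
    static const SketchState path[] = {
        SKETCH_POSTCOPY_ACTIVE,
        SKETCH_POSTCOPY_PAUSED,
        SKETCH_POSTCOPY_RECOVER_SETUP,
        SKETCH_POSTCOPY_RECOVER,
        SKETCH_POSTCOPY_ACTIVE,
    };

    for (size_t i = 0; i + 1 < sizeof(path) / sizeof(path[0]); i++) {
        assert(sketch_transition_ok(path[i], path[i + 1]));
    }
}
```

Note this models the source only; per the test patch in this series, the
postcopy-recover-setup status applies to src QEMU and not to dest.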



* Re: [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests
  2024-06-17 18:15 ` [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests Peter Xu
@ 2024-06-17 19:49   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 19:49 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> Most of them are not needed; we can stick with one ifdef inside
> postcopy_recover_fail() so as to cover only the scm rights tricks.
> The tests won't run on Windows anyway because has_uffd is always false
> there.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH v2 08/10] tests/migration-tests: Always enable migration events
  2024-06-17 18:15 ` [PATCH v2 08/10] tests/migration-tests: Always enable migration events Peter Xu
@ 2024-06-17 19:51   ` Fabiano Rosas
  2024-06-17 21:23     ` Peter Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 19:51 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> Libvirt should always enable it, so it would be nice for qtest to cover
> that in all tests too.  This patch only enables it; no extra checks are
> done on these events yet.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration-test.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 13b59d4c10..9ae8892e26 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -841,6 +841,13 @@ static int test_migrate_start(QTestState **from, QTestState **to,
>          unlink(shmem_path);
>      }
>  
> +    /*
> +     * Always enable migration events.  Libvirt always uses them, so let's
> +     * mimic that as closely as possible.
> +     */
> +    migrate_set_capability(*from, "events", true);
> +    migrate_set_capability(*to, "events", true);
> +

What do we do with the one at migrate_incoming_qmp()?



* Re: [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status
  2024-06-17 18:15 ` [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status Peter Xu
@ 2024-06-17 19:53   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 19:53 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> Make sure the postcopy-recover-setup status is present in the postcopy
> failure unit test.  Note that it only applies to the src QEMU, not the dest.
>
> This also introduces the tiny but helpful migration_event_wait() helper.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect
  2024-06-17 18:15 ` [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect Peter Xu
@ 2024-06-17 20:07   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 20:07 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, peterx, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> Make sure there will be an event for postcopy recovery, regardless of
> whether the reconnect succeeds, or of when the failure happens.
>
> The newly added case fails early in postcopy recovery, in which case src
> doesn't even reach the RECOVER stage (in real life the same applies to
> dest, but the test case is slightly more involved due to the dual
> socketpair setup).
>
> To do that, rename postcopy_recovery_test_fail so that it names the stage
> at which to fail, instead of being a boolean.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Fabiano Rosas <farosas@suse.de>



* Re: [PATCH v2 00/10] migration: New postcopy state, and some cleanups
  2024-06-17 19:34 ` [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
@ 2024-06-17 20:12   ` Fabiano Rosas
  0 siblings, 0 replies; 24+ messages in thread
From: Fabiano Rosas @ 2024-06-17 20:12 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Eric Blake,
	Prasad Pandit, Jiri Denemark, Bandan Das

Peter Xu <peterx@redhat.com> writes:

> Hello,
>
> On Mon, Jun 17, 2024 at 02:15:24PM -0400, Peter Xu wrote:
>> v2:
>> - Collect tags
>> - Patch 3
>>   - cover all states in migration_postcopy_is_alive()
>> - Patch 4 (old)
>>   - English changes [Fabiano]
>>   - Split the migration_incoming_state_setup() cleanup into a new patch
>>     [Fabiano]
>>   - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
>>   - Keep using explicit state check in migrate_fd_connect() for resume
>>     [Fabiano]
>> - New patches
>>   - New doc update: "migration/docs: Update postcopy recover session for
>>     SETUP phase"
>>   - New test case: last four patches
>
> I just found that this won't apply on top of latest master, and also has a
> trivial conflict against the direct-io stuff.  Fabiano, I'll wait a few
> days for comments, and resend v3 on top of your direct-io stuff.
>
> Meanwhile I also plan to squash the fixup below into the last test patch, to
> fix a spelling error I just found and to rename the test cases (as the new
> test is actually also a "double failure" test, just at a different phase).
> Comments are welcome on that fixup even before a repost.
>
> ===8<===
> From 5b8fbc3a9d9e87ebfef1a3e5592fd196eecd5923 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Mon, 17 Jun 2024 14:40:15 -0400
> Subject: [PATCH] fixup! tests/migration-tests: Cover postcopy failure on
>  reconnect
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration-test.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index a4fed4cc6b..fe33b86783 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1474,7 +1474,7 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to,
>  
>      /*
>       * Kick dest QEMU out too. This is normally not needed in reality
> -     * because when the channel is shutdown it should also happens on src.
> +     * because when the channel is shutdown it should also happen on src.
>       * However here we used separate socket pairs so we need to do that
>       * explicitly.
>       */
> @@ -1565,7 +1565,7 @@ static void test_postcopy_recovery(void)
>      test_postcopy_recovery_common(&args);
>  }
>  
> -static void test_postcopy_recovery_double_fail(void)
> +static void test_postcopy_recovery_fail_handshake(void)
>  {
>      MigrateCommon args = {
>          .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
> @@ -1574,7 +1574,7 @@ static void test_postcopy_recovery_double_fail(void)
>      test_postcopy_recovery_common(&args);
>  }
>  
> -static void test_postcopy_recovery_channel_reconnect(void)
> +static void test_postcopy_recovery_fail_reconnect(void)
>  {
>      MigrateCommon args = {
>          .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
> @@ -3759,10 +3759,10 @@ int main(int argc, char **argv)
>                             test_postcopy_preempt);
>          migration_test_add("/migration/postcopy/preempt/recovery/plain",
>                             test_postcopy_preempt_recovery);
> -        migration_test_add("/migration/postcopy/recovery/double-failures",
> -                           test_postcopy_recovery_double_fail);
> -        migration_test_add("/migration/postcopy/recovery/channel-reconnect",
> -                           test_postcopy_recovery_channel_reconnect);
> +        migration_test_add("/migration/postcopy/recovery/double-failures/handshake",
> +                           test_postcopy_recovery_fail_handshake);
> +        migration_test_add("/migration/postcopy/recovery/double-failures/reconnect",
> +                           test_postcopy_recovery_fail_reconnect);
>          if (is_x86) {
>              migration_test_add("/migration/postcopy/suspend",
>                                 test_postcopy_suspend);
> -- 
> 2.45.0

LGTM
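
For reference, the relation between the two fail stages and the renamed
tests in the fixup can be sketched as below.  The stage constants and the
test paths are taken from the quoted diff; the enum type name, the NONE
member and both helper functions are hypothetical and exist only for this
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Stage names follow the quoted test patch.  The type name and the
 * NONE member are assumptions made for this sketch.
 */
typedef enum {
    POSTCOPY_FAIL_NONE = 0,
    POSTCOPY_FAIL_CHANNEL_ESTABLISH,
    POSTCOPY_FAIL_RECOVERY,
} SketchFailStage;

/*
 * Whether src gets as far as the RECOVER state.  Per the commit message,
 * failing while establishing the channel means src never reaches RECOVER.
 */
static bool sketch_src_reaches_recover(SketchFailStage stage)
{
    return stage != POSTCOPY_FAIL_CHANNEL_ESTABLISH;
}

/* Test path registered for each failing stage after the fixup. */
static const char *sketch_test_path(SketchFailStage stage)
{
    switch (stage) {
    case POSTCOPY_FAIL_RECOVERY:
        return "/migration/postcopy/recovery/double-failures/handshake";
    case POSTCOPY_FAIL_CHANNEL_ESTABLISH:
        return "/migration/postcopy/recovery/double-failures/reconnect";
    case POSTCOPY_FAIL_NONE:
        break;
    }
    return NULL; /* no failure injected; path not covered by this sketch */
}
```

Using a stage rather than a boolean leaves room for further failure points
without adding one flag per case.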



* Re: [PATCH v2 08/10] tests/migration-tests: Always enable migration events
  2024-06-17 19:51   ` Fabiano Rosas
@ 2024-06-17 21:23     ` Peter Xu
  2024-06-19 20:39       ` Peter Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2024-06-17 21:23 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Thomas Huth, Markus Armbruster, Laurent Vivier,
	Eric Blake, Prasad Pandit, Jiri Denemark, Bandan Das

On Mon, Jun 17, 2024 at 04:51:32PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Libvirt should always enable it, so it would be nice for qtest to cover
> > that in all tests too.  This patch only enables it; no extra checks are
> > done on these events yet.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  tests/qtest/migration-test.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index 13b59d4c10..9ae8892e26 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -841,6 +841,13 @@ static int test_migrate_start(QTestState **from, QTestState **to,
> >          unlink(shmem_path);
> >      }
> >  
> > +    /*
> > +     * Always enable migration events.  Libvirt always uses them, so let's
> > +     * mimic that as closely as possible.
> > +     */
> > +    migrate_set_capability(*from, "events", true);
> > +    migrate_set_capability(*to, "events", true);
> > +
> 
> What do we do with the one at migrate_incoming_qmp()?

Hmm missed that..  I'll drop that one in this same patch and rewrite the
commit message.  New version attached:

===8<===
From 443fef4188d544362fc026b46784c15b82624642 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Mon, 17 Jun 2024 10:49:52 -0400
Subject: [PATCH] tests/migration-tests: Always enable migration events

Libvirt should always enable it, so it would be nice for qtest to cover that
in all tests too, on both sides.  migrate_incoming_qmp() used to enable it
only on dst; now we enable it on both, as we'll start to sanity check events
even on the src QEMU.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-helpers.c | 2 --
 tests/qtest/migration-test.c    | 7 +++++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 0ac49ceb54..797b1e8c1c 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -258,8 +258,6 @@ void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
     g_assert(!qdict_haskey(args, "uri"));
     qdict_put_str(args, "uri", uri);
 
-    migrate_set_capability(to, "events", true);
-
     rsp = qtest_qmp(to, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
                     args);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 640713bfd5..c015e801ac 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -851,6 +851,13 @@ static int test_migrate_start(QTestState **from, QTestState **to,
         unlink(shmem_path);
     }
 
+    /*
+     * Always enable migration events.  Libvirt always uses them, so let's
+     * mimic that as closely as possible.
+     */
+    migrate_set_capability(*from, "events", true);
+    migrate_set_capability(*to, "events", true);
+
     return 0;
 }
 
-- 
2.45.0


-- 
Peter Xu
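
For context, enabling the "events" capability corresponds to a QMP
migrate-set-capabilities command.  Below is a minimal C sketch of the
message shape; the function name is hypothetical, and the sketch only
formats the JSON text rather than using the qtest helpers, which are
assumed to send an equivalent command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a QMP migrate-set-capabilities command for one capability.
 * Returns what snprintf() returns: the length that would be written,
 * excluding the trailing NUL, or a negative value on error.
 * The capability name is inserted as-is, so it must not contain
 * characters that need JSON escaping.
 */
static int sketch_format_set_capability(char *buf, size_t len,
                                        const char *capability, bool state)
{
    return snprintf(buf, len,
                    "{ \"execute\": \"migrate-set-capabilities\", "
                    "\"arguments\": { \"capabilities\": [ "
                    "{ \"capability\": \"%s\", \"state\": %s } ] } }",
                    capability, state ? "true" : "false");
}
```

With the capability on, QEMU emits MIGRATION events on status changes,
which is what the later test patches in this series check on the src side.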




* Re: [PATCH v2 02/10] migration: Rename thread debug names
  2024-06-17 18:15 ` [PATCH v2 02/10] migration: Rename thread debug names Peter Xu
@ 2024-06-19  1:05   ` Zhijian Li (Fujitsu) via
  0 siblings, 0 replies; 24+ messages in thread
From: Zhijian Li (Fujitsu) via @ 2024-06-19  1:05 UTC (permalink / raw)
  To: Peter Xu, qemu-devel@nongnu.org
  Cc: Thomas Huth, Markus Armbruster, Laurent Vivier, Fabiano Rosas,
	Eric Blake, Prasad Pandit, Jiri Denemark, Bandan Das



On 18/06/2024 02:15, Peter Xu wrote:
> The postcopy thread names on dest QEMU are slightly confusing, for which I
> partly have myself to blame in 36f62f11e4 ("migration: Postcopy preemption
> preparation on channel creation").  E.g., "fault-fast" reads like a fast
> version of "fault-default", but it's actually the fast version of
> "postcopy/listen".
> 
> Taking this chance, rename all the migration threads with proper rules.
> Considering we only have 15 chars usable, prefix all threads with "mig/",
> meanwhile identify src/dst threads properly this time.  So now most thread
> names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except
> the bg-snapshot thread which doesn't have a direction.
> 
> For multifd threads, make them "mig/{src|dst}/{send|recv}_%d".
> 
> We used to have "live_migration" thread for a very long time, now it's
> called "mig/src/main".  We may hope to have "mig/dst/main" soon but not
> yet.
> 
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> Signed-off-by: Peter Xu <peterx@redhat.com>

Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>


> ---
>   migration/colo.c         | 2 +-
>   migration/migration.c    | 6 +++---
>   migration/multifd.c      | 6 +++---
>   migration/postcopy-ram.c | 4 ++--
>   migration/savevm.c       | 2 +-
>   5 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index f96c2ee069..6449490221 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -935,7 +935,7 @@ void coroutine_fn colo_incoming_co(void)
>       assert(bql_locked());
>       assert(migration_incoming_colo_enabled());
>   
> -    qemu_thread_create(&th, "COLO incoming", colo_process_incoming_thread,
> +    qemu_thread_create(&th, "mig/dst/colo", colo_process_incoming_thread,
>                          mis, QEMU_THREAD_JOINABLE);
>   
>       mis->colo_incoming_co = qemu_coroutine_self();
> diff --git a/migration/migration.c b/migration/migration.c
> index e1b269624c..d41e00ed4c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2408,7 +2408,7 @@ static int open_return_path_on_source(MigrationState *ms)
>   
>       trace_open_return_path_on_source();
>   
> -    qemu_thread_create(&ms->rp_state.rp_thread, "return path",
> +    qemu_thread_create(&ms->rp_state.rp_thread, "mig/src/rp-thr",
>                          source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
>       ms->rp_state.rp_thread_created = true;
>   
> @@ -3747,10 +3747,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
>       }
>   
>       if (migrate_background_snapshot()) {
> -        qemu_thread_create(&s->thread, "bg_snapshot",
> +        qemu_thread_create(&s->thread, "mig/snapshot",
>                   bg_migration_thread, s, QEMU_THREAD_JOINABLE);
>       } else {
> -        qemu_thread_create(&s->thread, "live_migration",
> +        qemu_thread_create(&s->thread, "mig/src/main",
>                   migration_thread, s, QEMU_THREAD_JOINABLE);
>       }
>       s->migration_thread_running = true;
> diff --git a/migration/multifd.c b/migration/multifd.c
> index f317bff077..7afc0965f6 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1059,7 +1059,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
>       args->p = p;
>   
>       p->tls_thread_created = true;
> -    qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker",
> +    qemu_thread_create(&p->tls_thread, "mig/src/tls",
>                          multifd_tls_handshake_thread, args,
>                          QEMU_THREAD_JOINABLE);
>       return true;
> @@ -1185,7 +1185,7 @@ bool multifd_send_setup(void)
>           } else {
>               p->iov = g_new0(struct iovec, page_count);
>           }
> -        p->name = g_strdup_printf("multifdsend_%d", i);
> +        p->name = g_strdup_printf("mig/src/send_%d", i);
>           p->page_size = qemu_target_page_size();
>           p->page_count = page_count;
>           p->write_flags = 0;
> @@ -1601,7 +1601,7 @@ int multifd_recv_setup(Error **errp)
>                   + sizeof(uint64_t) * page_count;
>               p->packet = g_malloc0(p->packet_len);
>           }
> -        p->name = g_strdup_printf("multifdrecv_%d", i);
> +        p->name = g_strdup_printf("mig/dst/recv_%d", i);
>           p->iov = g_new0(struct iovec, page_count);
>           p->normal = g_new0(ram_addr_t, page_count);
>           p->zero = g_new0(ram_addr_t, page_count);
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 3419779548..97701e6bb2 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1238,7 +1238,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>           return -1;
>       }
>   
> -    postcopy_thread_create(mis, &mis->fault_thread, "fault-default",
> +    postcopy_thread_create(mis, &mis->fault_thread, "mig/dst/fault",
>                              postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
>       mis->have_fault_thread = true;
>   
> @@ -1258,7 +1258,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>            * This thread needs to be created after the temp pages because
>            * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
>            */
> -        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "fault-fast",
> +        postcopy_thread_create(mis, &mis->postcopy_prio_thread, "mig/dst/preempt",
>                                  postcopy_preempt_thread, QEMU_THREAD_JOINABLE);
>           mis->preempt_thread_status = PREEMPT_THREAD_CREATED;
>       }
> diff --git a/migration/savevm.c b/migration/savevm.c
> index c621f2359b..e71410d8c1 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2129,7 +2129,7 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>       }
>   
>       mis->have_listen_thread = true;
> -    postcopy_thread_create(mis, &mis->listen_thread, "postcopy/listen",
> +    postcopy_thread_create(mis, &mis->listen_thread, "mig/dst/listen",
>                              postcopy_ram_listen_thread, QEMU_THREAD_DETACHED);
>       trace_loadvm_postcopy_handle_listen("return");
>   
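
The 15-character budget mentioned in the commit message can be checked
mechanically.  Below is a small C sketch; the helper names and the macro
are invented for this example.  The limit itself matches Linux thread
names, which hold 16 bytes including the terminating NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Usable characters in a thread name, excluding the trailing NUL. */
#define SKETCH_THREAD_NAME_MAX 15

/* True if the name fits without truncation. */
static bool sketch_name_fits(const char *name)
{
    return strlen(name) <= SKETCH_THREAD_NAME_MAX;
}

/*
 * Build a multifd-style name, "mig/DIR/ROLE_ID", and report whether it
 * fits.  dir is "src" or "dst", role is "send" or "recv".
 */
static bool sketch_multifd_name_fits(const char *dir, const char *role,
                                     int id)
{
    char buf[64];
    int n = snprintf(buf, sizeof(buf), "mig/%s/%s_%d", dir, role, id);

    return n >= 0 && (size_t)n < sizeof(buf) && sketch_name_fits(buf);
}
```

For instance, "mig/dst/preempt" uses exactly 15 characters, and a name such
as "mig/src/send_%d" stays within the budget for channel ids up to 99; a
three-digit id would take it to 16.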


* Re: [PATCH v2 08/10] tests/migration-tests: Always enable migration events
  2024-06-17 21:23     ` Peter Xu
@ 2024-06-19 20:39       ` Peter Xu
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2024-06-19 20:39 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, Thomas Huth, Markus Armbruster, Laurent Vivier,
	Eric Blake, Prasad Pandit, Jiri Denemark, Bandan Das

On Mon, Jun 17, 2024 at 05:23:24PM -0400, Peter Xu wrote:
> On Mon, Jun 17, 2024 at 04:51:32PM -0300, Fabiano Rosas wrote:
> > Peter Xu <peterx@redhat.com> writes:
> > 
> > > Libvirt should always enable it, so it would be nice for qtest to cover
> > > that in all tests too.  This patch only enables it; no extra checks are
> > > done on these events yet.
> > >
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  tests/qtest/migration-test.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > > index 13b59d4c10..9ae8892e26 100644
> > > --- a/tests/qtest/migration-test.c
> > > +++ b/tests/qtest/migration-test.c
> > > @@ -841,6 +841,13 @@ static int test_migrate_start(QTestState **from, QTestState **to,
> > >          unlink(shmem_path);
> > >      }
> > >  
> > > +    /*
> > > +     * Always enable migration events.  Libvirt always uses them, so let's
> > > +     * mimic that as closely as possible.
> > > +     */
> > > +    migrate_set_capability(*from, "events", true);
> > > +    migrate_set_capability(*to, "events", true);
> > > +
> > 
> > What do we do with the one at migrate_incoming_qmp()?
> 
> Hmm missed that..  I'll drop that one in this same patch and rewrite the
> commit message.  New version attached:
> 
> ===8<===
> From 443fef4188d544362fc026b46784c15b82624642 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Mon, 17 Jun 2024 10:49:52 -0400
> Subject: [PATCH] tests/migration-tests: Always enable migration events
> 
> Libvirt should always enable it, so it would be nice for qtest to cover that
> in all tests too, on both sides.  migrate_incoming_qmp() used to enable it
> only on dst; now we enable it on both, as we'll start to sanity check events
> even on the src QEMU.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration-helpers.c | 2 --
>  tests/qtest/migration-test.c    | 7 +++++++
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index 0ac49ceb54..797b1e8c1c 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -258,8 +258,6 @@ void migrate_incoming_qmp(QTestState *to, const char *uri, const char *fmt, ...)
>      g_assert(!qdict_haskey(args, "uri"));
>      qdict_put_str(args, "uri", uri);
>  
> -    migrate_set_capability(to, "events", true);
> -

Unfortunately this will break virtio-net-failover test... as it uses
migrate_incoming_qmp() without using test_migrate_start().

I'll leave it there for now, perhaps adding a comment.

>      rsp = qtest_qmp(to, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
>                      args);
>  
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 640713bfd5..c015e801ac 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -851,6 +851,13 @@ static int test_migrate_start(QTestState **from, QTestState **to,
>          unlink(shmem_path);
>      }
>  
> +    /*
> +     * Always enable migration events.  Libvirt always uses them, so let's
> +     * mimic that as closely as possible.
> +     */
> +    migrate_set_capability(*from, "events", true);
> +    migrate_set_capability(*to, "events", true);
> +
>      return 0;
>  }
>  
> -- 
> 2.45.0
> 
> 
> -- 
> Peter Xu

-- 
Peter Xu




Thread overview: 24+ messages
2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
2024-06-17 18:15 ` [PATCH v2 01/10] migration/multifd: Avoid the final FLUSH in complete() Peter Xu
2024-06-17 18:15 ` [PATCH v2 02/10] migration: Rename thread debug names Peter Xu
2024-06-19  1:05   ` Zhijian Li (Fujitsu) via
2024-06-17 18:15 ` [PATCH v2 03/10] migration: Use MigrationStatus instead of int Peter Xu
2024-06-17 19:38   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 04/10] migration: Cleanup incoming migration setup state change Peter Xu
2024-06-17 19:41   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase Peter Xu
2024-06-17 19:45   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase Peter Xu
2024-06-17 19:47   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests Peter Xu
2024-06-17 19:49   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 08/10] tests/migration-tests: Always enable migration events Peter Xu
2024-06-17 19:51   ` Fabiano Rosas
2024-06-17 21:23     ` Peter Xu
2024-06-19 20:39       ` Peter Xu
2024-06-17 18:15 ` [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status Peter Xu
2024-06-17 19:53   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect Peter Xu
2024-06-17 20:07   ` Fabiano Rosas
2024-06-17 19:34 ` [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
2024-06-17 20:12   ` Fabiano Rosas
