* [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
@ 2025-02-28 12:17 Prasad Pandit
  2025-02-28 12:17 ` [PATCH v7 1/5] migration/multifd: move macros to multifd header Prasad Pandit
                   ` (5 more replies)
  0 siblings, 6 replies; 37+ messages in thread
From: Prasad Pandit @ 2025-02-28 12:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Hello,

* This series (v7) adds a 'MULTIFD_RECV_SYNC' migration command. It is
  used to notify the destination migration thread to synchronise with
  the Multifd threads. This allows the Multifd ('mig/dst/recv_x')
  threads on the destination to receive all of their data before they
  are shut down.

  This series also updates the channel discovery function and qtests as
  suggested in the previous review comments.
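
  The source-side switch-over sequence then becomes, in outline (the
  actual hunk is in patch 5, with comments added here for illustration):

      if (migrate_multifd()) {
          multifd_send_flush();                       /* push out queued pages */
          multifd_send_sync_main(MULTIFD_SYNC_LOCAL); /* sync sender threads */
          qemu_savevm_send_multifd_recv_sync(s->to_dst_file); /* notify dest */
          multifd_send_shutdown();
      }
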
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                 OK             147.84s   81 subtests passed
===


v6: https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
* This series (v6) shuts down the Multifd threads before starting
  Postcopy migration. It helps to avoid an issue of multifd pages
  arriving late at the destination during the Postcopy phase and
  corrupting the vCPU state. It also reorders the qtest patches and
  makes some refactoring changes suggested in the previous review.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                 OK             161.35s   73 subtests passed
===


v5: https://lore.kernel.org/qemu-devel/20250205122712.229151-1-ppandit@redhat.com/T/#t
* This series (v5) consolidates the setting of migration capabilities
  in one 'set_migration_capabilities()' function, thus simplifying the
  test sources. It passes all migration tests.
===
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                 OK             143.66s   71 subtests passed
===


v4: https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t
* This series (v4) adds more 'multifd+postcopy' qtests, which test
  Precopy migration with the 'postcopy-ram' capability set, and run
  Postcopy migrations with 'multifd' channels enabled.
===
$ ../qtest/migration-test --tap -k -r '/x86_64/migration/multifd+postcopy' | grep -i 'slow test'
# slow test /x86_64/migration/multifd+postcopy/plain executed in 1.29 secs
# slow test /x86_64/migration/multifd+postcopy/recovery/tls/psk executed in 2.48 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/plain executed in 1.49 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/recovery/tls/psk executed in 2.52 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/tls/psk/match executed in 3.62 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/zstd executed in 1.34 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/cancel executed in 2.24 secs
...
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                 OK             148.41s   71 subtests passed
===


v3: https://lore.kernel.org/qemu-devel/20250121131032.1611245-1-ppandit@redhat.com/T/#t
* This series (v3) passes all existing 'tests/qtest/migration/*' tests
  and adds a new one to enable multifd channels with postcopy migration.


v2: https://lore.kernel.org/qemu-devel/20241129122256.96778-1-ppandit@redhat.com/T/#u
* This series (v2) further refactors the 'ram_save_target_page'
  function to make it independent of the multifd & postcopy change.


v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
* This series removes the 4-byte magic value introduced in the
  previous series for the Postcopy channel.


v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
* Currently, Multifd and Postcopy migration cannot be used together;
  QEMU shows the "Postcopy is not yet compatible with multifd" message.

  When migrating guests with large (hundreds of GB) RAM, Multifd
  threads help to accelerate migration, but the inability to use them
  with Postcopy mode delays guest start-up on the destination side.

* This patch series allows enabling both Multifd and Postcopy
  migration together. Precopy and Multifd threads work during the
  initial guest (RAM) transfer. When migration moves to the Postcopy
  phase, the Multifd threads are shut down and the Postcopy threads
  start to request pages from the source side.
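
  As a rough illustration, a hypothetical qtest sketch (not one of the
  tests in this series), built on the helpers the qtests use, would
  enable both features like this:

      static void test_multifd_postcopy_sketch(void)
      {
          QTestState *from, *to;
          MigrateStart args = { };

          if (migrate_start(&from, &to, "tcp:127.0.0.1:0", &args)) {
              return;
          }

          /* enable both capabilities on source and destination */
          migrate_set_capability(from, "multifd", true);
          migrate_set_capability(to, "multifd", true);
          migrate_set_capability(from, "postcopy-ram", true);
          migrate_set_capability(to, "postcopy-ram", true);

          migrate_set_parameter_int(from, "multifd-channels", 4);
          migrate_set_parameter_int(to, "multifd-channels", 4);

          /* ... start migration, then switch to postcopy mid-stream ... */
          qtest_qmp_assert_success(from,
                                   "{ 'execute': 'migrate-start-postcopy' }");
      }
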

* This series introduces a 4-byte magic value to be sent on the
  Postcopy channel. It helps to differentiate the channels and
  properly set up incoming connections on the destination side.
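
  Condensed, the discovery logic (as implemented in patch 2, inside
  migration_ioc_process_incoming()) classifies an incoming channel
  roughly as follows:

      uint32_t magic = 0;

      /* peek, do not consume: the channel's reader still needs the bytes */
      if (migration_channel_read_peek(ioc, (void *)&magic,
                                      sizeof(magic), errp)) {
          return;
      }
      magic = be32_to_cpu(magic);

      if (magic == QEMU_VM_FILE_MAGIC) {
          /* main migration channel */
      } else if (magic == MULTIFD_MAGIC) {
          /* multifd data channel */
      } else {
          /* postcopy preempt channel sends no magic; classified by
           * elimination (see the patch for the full set of checks) */
      }
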


Thank you.
---
Prasad Pandit (5):
  migration/multifd: move macros to multifd header
  migration: enable multifd and postcopy together
  tests/qtest/migration: consolidate set capabilities
  tests/qtest/migration: add postcopy tests with multifd
  migration: add MULTIFD_RECV_SYNC migration command

 migration/migration.c                     | 136 +++++++++++++---------
 migration/multifd-nocomp.c                |  28 +++--
 migration/multifd.c                       |  17 +--
 migration/multifd.h                       |   6 +
 migration/options.c                       |   5 -
 migration/ram.c                           |   7 +-
 migration/savevm.c                        |  13 +++
 migration/savevm.h                        |   1 +
 tests/qtest/migration/compression-tests.c |  37 +++++-
 tests/qtest/migration/cpr-tests.c         |   6 +-
 tests/qtest/migration/file-tests.c        |  58 +++++----
 tests/qtest/migration/framework.c         |  76 ++++++++----
 tests/qtest/migration/framework.h         |   9 +-
 tests/qtest/migration/misc-tests.c        |   4 +-
 tests/qtest/migration/postcopy-tests.c    |  35 +++++-
 tests/qtest/migration/precopy-tests.c     |  48 +++++---
 tests/qtest/migration/tls-tests.c         |  69 ++++++++++-
 17 files changed, 388 insertions(+), 167 deletions(-)

-- 
2.48.1




* [PATCH v7 1/5] migration/multifd: move macros to multifd header
  2025-02-28 12:17 [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
@ 2025-02-28 12:17 ` Prasad Pandit
  2025-02-28 12:17 ` [PATCH v7 2/5] migration: enable multifd and postcopy together Prasad Pandit
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Prasad Pandit @ 2025-02-28 12:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Move MULTIFD_ macros to the header file so that
they are accessible from other source files.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 migration/multifd.c | 5 -----
 migration/multifd.h | 5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

v6: no change
- https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t

diff --git a/migration/multifd.c b/migration/multifd.c
index 215ad0414a..30e514785f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -33,11 +33,6 @@
 #include "io/channel-socket.h"
 #include "yank_functions.h"
 
-/* Multiple fd's */
-
-#define MULTIFD_MAGIC 0x11223344U
-#define MULTIFD_VERSION 1
-
 typedef struct {
     uint32_t magic;
     uint32_t version;
diff --git a/migration/multifd.h b/migration/multifd.h
index cf408ff721..bff867ca6b 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -49,6 +49,11 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
 bool multifd_recv(void);
 MultiFDRecvData *multifd_get_recv_data(void);
 
+/* Multiple fd's */
+
+#define MULTIFD_MAGIC 0x11223344U
+#define MULTIFD_VERSION 1
+
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
 
-- 
2.48.1




* [PATCH v7 2/5] migration: enable multifd and postcopy together
  2025-02-28 12:17 [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
  2025-02-28 12:17 ` [PATCH v7 1/5] migration/multifd: move macros to multifd header Prasad Pandit
@ 2025-02-28 12:17 ` Prasad Pandit
  2025-02-28 12:17 ` [PATCH v7 3/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Prasad Pandit @ 2025-02-28 12:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Enable Multifd and Postcopy migration together.
The migration_ioc_process_incoming() routine
checks the magic value sent on each channel and
helps to properly set up the multifd and
postcopy channels.

The Precopy and Multifd threads work during the
initial guest RAM transfer. When migration moves
to the Postcopy phase, the multifd threads are
shut down and the Postcopy threads on the
destination request/pull data from the source side.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 migration/migration.c      | 133 +++++++++++++++++++++----------------
 migration/multifd-nocomp.c |   3 +-
 migration/multifd.c        |  11 ++-
 migration/options.c        |   5 --
 migration/ram.c            |   7 +-
 5 files changed, 89 insertions(+), 70 deletions(-)

v6: update channel discovery part in migration_ioc_process_incoming()
- https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t

diff --git a/migration/migration.c b/migration/migration.c
index c597aa707e..5db9e18272 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -95,6 +95,9 @@ enum mig_rp_message_type {
     MIG_RP_MSG_MAX
 };
 
+/* Migration channel types */
+enum { CH_MAIN, CH_MULTIFD, CH_POSTCOPY };
+
 /* When we add fault tolerance, we could have several
    migrations at once.  For now we don't need to add
    dynamic creation of migration */
@@ -947,28 +950,19 @@ void migration_fd_process_incoming(QEMUFile *f)
     migration_incoming_process();
 }
 
-/*
- * Returns true when we want to start a new incoming migration process,
- * false otherwise.
- */
-static bool migration_should_start_incoming(bool main_channel)
+static bool migration_has_main_and_multifd_channels(void)
 {
-    /* Multifd doesn't start unless all channels are established */
-    if (migrate_multifd()) {
-        return migration_has_all_channels();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    if (!mis->from_src_file) {
+        /* main channel not established */
+        return false;
     }
 
-    /* Preempt channel only starts when the main channel is created */
-    if (migrate_postcopy_preempt()) {
-        return main_channel;
+    if (migrate_multifd() && !multifd_recv_all_channels_created()) {
+        return false;
     }
 
-    /*
-     * For all the rest types of migration, we should only reach here when
-     * it's the main channel that's being created, and we should always
-     * proceed with this channel.
-     */
-    assert(main_channel);
+    /* main channel and all multifd channels are established */
     return true;
 }
 
@@ -977,59 +971,84 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
     QEMUFile *f;
-    bool default_channel = true;
+    uint8_t channel;
     uint32_t channel_magic = 0;
     int ret = 0;
 
-    if (migrate_multifd() && !migrate_mapped_ram() &&
-        !migrate_postcopy_ram() &&
-        qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
-        /*
-         * With multiple channels, it is possible that we receive channels
-         * out of order on destination side, causing incorrect mapping of
-         * source channels on destination side. Check channel MAGIC to
-         * decide type of channel. Please note this is best effort, postcopy
-         * preempt channel does not send any magic number so avoid it for
-         * postcopy live migration. Also tls live migration already does
-         * tls handshake while initializing main channel so with tls this
-         * issue is not possible.
-         */
-        ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
-                                          sizeof(channel_magic), errp);
+    if (!migration_has_main_and_multifd_channels()) {
+        if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
+            /*
+             * With multiple channels, it is possible that we receive channels
+             * out of order on destination side, causing incorrect mapping of
+             * source channels on destination side. Check channel MAGIC to
+             * decide type of channel. Please note this is best effort,
+             * postcopy preempt channel does not send any magic number so
+             * avoid it for postcopy live migration. Also tls live migration
+             * already does tls handshake while initializing main channel so
+             * with tls this issue is not possible.
+             */
+            ret = migration_channel_read_peek(ioc, (void *)&channel_magic,
+                                              sizeof(channel_magic), errp);
+            if (ret != 0) {
+                return;
+            }
 
-        if (ret != 0) {
+            channel_magic = be32_to_cpu(channel_magic);
+            if (channel_magic == QEMU_VM_FILE_MAGIC) {
+                channel = CH_MAIN;
+            } else if (channel_magic == MULTIFD_MAGIC) {
+                channel = CH_MULTIFD;
+            } else if (!mis->from_src_file &&
+                        mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+                /* reconnect main channel for postcopy recovery */
+                channel = CH_MAIN;
+            } else {
+                error_setg(errp, "unknown channel magic: %u", channel_magic);
+                return;
+            }
+        } else if (mis->from_src_file && migrate_multifd()) {
+            /*
+             * Non-peekable channels like tls/file are processed as
+             * multifd channels when multifd is enabled.
+             */
+            channel = CH_MULTIFD;
+        } else if (!mis->from_src_file) {
+            channel = CH_MAIN;
+        } else {
+            error_setg(errp, "non-peekable channel used without multifd");
             return;
         }
-
-        default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC));
+    } else if (mis->from_src_file) {
+        channel = CH_POSTCOPY;
     } else {
-        default_channel = !mis->from_src_file;
+        channel = CH_MAIN;
     }
 
     if (multifd_recv_setup(errp) != 0) {
         return;
     }
 
-    if (default_channel) {
+    if (channel == CH_MAIN) {
         f = qemu_file_new_input(ioc);
         migration_incoming_setup(f);
-    } else {
+    } else if (channel == CH_MULTIFD) {
         /* Multiple connections */
-        assert(migration_needs_multiple_sockets());
         if (migrate_multifd()) {
             multifd_recv_new_channel(ioc, &local_err);
-        } else {
-            assert(migrate_postcopy_preempt());
-            f = qemu_file_new_input(ioc);
-            postcopy_preempt_new_channel(mis, f);
         }
         if (local_err) {
             error_propagate(errp, local_err);
             return;
         }
+    } else if (channel == CH_POSTCOPY) {
+        assert(migrate_postcopy_preempt());
+        assert(!mis->postcopy_qemufile_dst);
+        f = qemu_file_new_input(ioc);
+        postcopy_preempt_new_channel(mis, f);
+        return;
     }
 
-    if (migration_should_start_incoming(default_channel)) {
+    if (migration_has_main_and_multifd_channels()) {
         /* If it's a recovery, we're done */
         if (postcopy_try_recover()) {
             return;
@@ -1046,20 +1065,15 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
  */
 bool migration_has_all_channels(void)
 {
+    if (!migration_has_main_and_multifd_channels()) {
+        return false;
+    }
+
     MigrationIncomingState *mis = migration_incoming_get_current();
-
-    if (!mis->from_src_file) {
+    if (migrate_postcopy_preempt() && !mis->postcopy_qemufile_dst) {
         return false;
     }
 
-    if (migrate_multifd()) {
-        return multifd_recv_all_channels_created();
-    }
-
-    if (migrate_postcopy_preempt()) {
-        return mis->postcopy_qemufile_dst != NULL;
-    }
-
     return true;
 }
 
@@ -1470,6 +1484,8 @@ static void migration_cleanup(MigrationState *s)
 
     assert(!migration_is_active());
 
+    file_cleanup_outgoing_migration();
+    socket_cleanup_outgoing_migration();
     if (s->state == MIGRATION_STATUS_CANCELLING) {
         migrate_set_state(&s->state, MIGRATION_STATUS_CANCELLING,
                           MIGRATION_STATUS_CANCELLED);
@@ -3382,8 +3398,11 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     }
 
     /* Still a significant amount to transfer */
-    if (!in_postcopy && must_precopy <= s->threshold_size && can_switchover &&
-        qatomic_read(&s->start_postcopy)) {
+    if (!in_postcopy && must_precopy <= s->threshold_size
+        && can_switchover && qatomic_read(&s->start_postcopy)) {
+        if (migrate_multifd()) {
+            multifd_send_shutdown();
+        }
         if (postcopy_start(s, &local_err)) {
             migrate_set_error(s, local_err);
             error_report_err(local_err);
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 1325dba97c..d0edec7cd1 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -16,6 +16,7 @@
 #include "file.h"
 #include "multifd.h"
 #include "options.h"
+#include "migration.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "qemu/error-report.h"
@@ -391,7 +392,7 @@ int multifd_ram_flush_and_sync(QEMUFile *f)
     MultiFDSyncReq req;
     int ret;
 
-    if (!migrate_multifd()) {
+    if (!migrate_multifd() || migration_in_postcopy()) {
         return 0;
     }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 30e514785f..2bd8604ca1 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -467,8 +467,6 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 
 static void multifd_send_cleanup_state(void)
 {
-    file_cleanup_outgoing_migration();
-    socket_cleanup_outgoing_migration();
     qemu_sem_destroy(&multifd_send_state->channels_created);
     qemu_sem_destroy(&multifd_send_state->channels_ready);
     g_free(multifd_send_state->params);
@@ -481,7 +479,7 @@ void multifd_send_shutdown(void)
 {
     int i;
 
-    if (!migrate_multifd()) {
+    if (!migrate_multifd() || !multifd_send_state) {
         return;
     }
 
@@ -1231,6 +1229,13 @@ static void *multifd_recv_thread(void *opaque)
         }
 
         if (has_data) {
+            /*
+             * multifd thread should not be active and receive data
+             * when migration is in the Postcopy phase. Two threads
+             * writing the same memory area could easily corrupt
+             * the guest state.
+             */
+            assert(!migration_in_postcopy());
             ret = multifd_recv_state->ops->recv(p, &local_err);
             if (ret != 0) {
                 break;
diff --git a/migration/options.c b/migration/options.c
index bb259d192a..5d56591f9e 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -482,11 +482,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
             error_setg(errp, "Postcopy is not compatible with ignore-shared");
             return false;
         }
-
-        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
-            error_setg(errp, "Postcopy is not yet compatible with multifd");
-            return false;
-        }
     }
 
     if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
diff --git a/migration/ram.c b/migration/ram.c
index 589b6505eb..2d8e593c90 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1297,7 +1297,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
         pss->page = 0;
         pss->block = QLIST_NEXT_RCU(pss->block, next);
         if (!pss->block) {
-            if (multifd_ram_sync_per_round()) {
+            if (multifd_ram_sync_per_round() && !migration_in_postcopy()) {
                 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
                 int ret = multifd_ram_flush_and_sync(f);
                 if (ret < 0) {
@@ -1971,9 +1971,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         }
     }
 
-    if (migrate_multifd()) {
-        RAMBlock *block = pss->block;
-        return ram_save_multifd_page(block, offset);
+    if (migrate_multifd() && !migration_in_postcopy()) {
+        return ram_save_multifd_page(pss->block, offset);
     }
 
     if (control_save_page(pss, offset, &res)) {
-- 
2.48.1




* [PATCH v7 3/5] tests/qtest/migration: consolidate set capabilities
  2025-02-28 12:17 [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
  2025-02-28 12:17 ` [PATCH v7 1/5] migration/multifd: move macros to multifd header Prasad Pandit
  2025-02-28 12:17 ` [PATCH v7 2/5] migration: enable multifd and postcopy together Prasad Pandit
@ 2025-02-28 12:17 ` Prasad Pandit
  2025-02-28 12:17 ` [PATCH v7 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Prasad Pandit @ 2025-02-28 12:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Migration capabilities are set in multiple '.start_hook'
functions for various tests. Instead, consolidate setting
capabilities in the 'migrate_start_set_capabilities()'
function, which is called from 'migrate_start()'. Besides
simplifying the capability setting, this helps to
declutter the qtest sources.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 tests/qtest/migration/compression-tests.c | 22 +++++--
 tests/qtest/migration/cpr-tests.c         |  6 +-
 tests/qtest/migration/file-tests.c        | 58 ++++++++---------
 tests/qtest/migration/framework.c         | 76 ++++++++++++++++-------
 tests/qtest/migration/framework.h         |  9 ++-
 tests/qtest/migration/misc-tests.c        |  4 +-
 tests/qtest/migration/postcopy-tests.c    |  8 ++-
 tests/qtest/migration/precopy-tests.c     | 29 +++++----
 tests/qtest/migration/tls-tests.c         | 23 ++++++-
 9 files changed, 151 insertions(+), 84 deletions(-)

v6: Move .caps initialization to the .start object.
    Set default multifd threads to 4.
- https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
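
With this in place, a test declares the capabilities it needs up front
instead of toggling them in a start hook; the pattern used throughout
the conversions below is:

    MigrateCommon args = {
        .listen_uri = "defer",
        .start = {
            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
        },
        .start_hook = migrate_hook_start_precopy_tcp_multifd,
    };
    test_precopy_common(&args);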

diff --git a/tests/qtest/migration/compression-tests.c b/tests/qtest/migration/compression-tests.c
index 8b58401b84..41e79f031b 100644
--- a/tests/qtest/migration/compression-tests.c
+++ b/tests/qtest/migration/compression-tests.c
@@ -35,6 +35,9 @@ static void test_multifd_tcp_zstd(void)
 {
     MigrateCommon args = {
         .listen_uri = "defer",
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd,
     };
     test_precopy_common(&args);
@@ -56,6 +59,9 @@ static void test_multifd_tcp_qatzip(void)
 {
     MigrateCommon args = {
         .listen_uri = "defer",
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         .start_hook = migrate_hook_start_precopy_tcp_multifd_qatzip,
     };
     test_precopy_common(&args);
@@ -74,6 +80,9 @@ static void test_multifd_tcp_qpl(void)
 {
     MigrateCommon args = {
         .listen_uri = "defer",
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         .start_hook = migrate_hook_start_precopy_tcp_multifd_qpl,
     };
     test_precopy_common(&args);
@@ -92,6 +101,9 @@ static void test_multifd_tcp_uadk(void)
 {
     MigrateCommon args = {
         .listen_uri = "defer",
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         .start_hook = migrate_hook_start_precopy_tcp_multifd_uadk,
     };
     test_precopy_common(&args);
@@ -103,10 +115,6 @@ migrate_hook_start_xbzrle(QTestState *from,
                           QTestState *to)
 {
     migrate_set_parameter_int(from, "xbzrle-cache-size", 33554432);
-
-    migrate_set_capability(from, "xbzrle", true);
-    migrate_set_capability(to, "xbzrle", true);
-
     return NULL;
 }
 
@@ -118,6 +126,9 @@ static void test_precopy_unix_xbzrle(void)
         .listen_uri = uri,
         .start_hook = migrate_hook_start_xbzrle,
         .iterations = 2,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_XBZRLE] = true,
+        },
         /*
          * XBZRLE needs pages to be modified when doing the 2nd+ round
          * iteration to have real data pushed to the stream.
@@ -146,6 +157,9 @@ static void test_multifd_tcp_zlib(void)
 {
     MigrateCommon args = {
         .listen_uri = "defer",
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         .start_hook = migrate_hook_start_precopy_tcp_multifd_zlib,
     };
     test_precopy_common(&args);
diff --git a/tests/qtest/migration/cpr-tests.c b/tests/qtest/migration/cpr-tests.c
index 4758841824..5536e14610 100644
--- a/tests/qtest/migration/cpr-tests.c
+++ b/tests/qtest/migration/cpr-tests.c
@@ -24,9 +24,6 @@ static void *migrate_hook_start_mode_reboot(QTestState *from, QTestState *to)
     migrate_set_parameter_str(from, "mode", "cpr-reboot");
     migrate_set_parameter_str(to, "mode", "cpr-reboot");
 
-    migrate_set_capability(from, "x-ignore-shared", true);
-    migrate_set_capability(to, "x-ignore-shared", true);
-
     return NULL;
 }
 
@@ -39,6 +36,9 @@ static void test_mode_reboot(void)
         .connect_uri = uri,
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_mode_reboot,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_X_IGNORE_SHARED] = true,
+        },
     };
 
     test_file_common(&args, true);
diff --git a/tests/qtest/migration/file-tests.c b/tests/qtest/migration/file-tests.c
index f260e2871d..4d78ce0855 100644
--- a/tests/qtest/migration/file-tests.c
+++ b/tests/qtest/migration/file-tests.c
@@ -107,15 +107,6 @@ static void test_precopy_file_offset_bad(void)
     test_file_common(&args, false);
 }
 
-static void *migrate_hook_start_mapped_ram(QTestState *from,
-                                           QTestState *to)
-{
-    migrate_set_capability(from, "mapped-ram", true);
-    migrate_set_capability(to, "mapped-ram", true);
-
-    return NULL;
-}
-
 static void test_precopy_file_mapped_ram_live(void)
 {
     g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
@@ -123,7 +114,9 @@ static void test_precopy_file_mapped_ram_live(void)
     MigrateCommon args = {
         .connect_uri = uri,
         .listen_uri = "defer",
-        .start_hook = migrate_hook_start_mapped_ram,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
+        },
     };
 
     test_file_common(&args, false);
@@ -136,26 +129,14 @@ static void test_precopy_file_mapped_ram(void)
     MigrateCommon args = {
         .connect_uri = uri,
         .listen_uri = "defer",
-        .start_hook = migrate_hook_start_mapped_ram,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
+        },
     };
 
     test_file_common(&args, true);
 }
 
-static void *migrate_hook_start_multifd_mapped_ram(QTestState *from,
-                                                   QTestState *to)
-{
-    migrate_hook_start_mapped_ram(from, to);
-
-    migrate_set_parameter_int(from, "multifd-channels", 4);
-    migrate_set_parameter_int(to, "multifd-channels", 4);
-
-    migrate_set_capability(from, "multifd", true);
-    migrate_set_capability(to, "multifd", true);
-
-    return NULL;
-}
-
 static void test_multifd_file_mapped_ram_live(void)
 {
     g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
@@ -163,7 +144,10 @@ static void test_multifd_file_mapped_ram_live(void)
     MigrateCommon args = {
         .connect_uri = uri,
         .listen_uri = "defer",
-        .start_hook = migrate_hook_start_multifd_mapped_ram,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+            .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
+        },
     };
 
     test_file_common(&args, false);
@@ -176,7 +160,10 @@ static void test_multifd_file_mapped_ram(void)
     MigrateCommon args = {
         .connect_uri = uri,
         .listen_uri = "defer",
-        .start_hook = migrate_hook_start_multifd_mapped_ram,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+            .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
+        },
     };
 
     test_file_common(&args, true);
@@ -185,8 +172,6 @@ static void test_multifd_file_mapped_ram(void)
 static void *migrate_hook_start_multifd_mapped_ram_dio(QTestState *from,
                                                        QTestState *to)
 {
-    migrate_hook_start_multifd_mapped_ram(from, to);
-
     migrate_set_parameter_bool(from, "direct-io", true);
     migrate_set_parameter_bool(to, "direct-io", true);
 
@@ -201,6 +186,10 @@ static void test_multifd_file_mapped_ram_dio(void)
         .connect_uri = uri,
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_mapped_ram_dio,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
     };
 
     if (!probe_o_direct_support(tmpfs)) {
@@ -246,7 +235,6 @@ static void *migrate_hook_start_multifd_mapped_ram_fdset_dio(QTestState *from,
     fdset_add_fds(from, file, O_WRONLY, 2, true);
     fdset_add_fds(to, file, O_RDONLY, 2, true);
 
-    migrate_hook_start_multifd_mapped_ram(from, to);
     migrate_set_parameter_bool(from, "direct-io", true);
     migrate_set_parameter_bool(to, "direct-io", true);
 
@@ -261,8 +249,6 @@ static void *migrate_hook_start_multifd_mapped_ram_fdset(QTestState *from,
     fdset_add_fds(from, file, O_WRONLY, 2, false);
     fdset_add_fds(to, file, O_RDONLY, 2, false);
 
-    migrate_hook_start_multifd_mapped_ram(from, to);
-
     return NULL;
 }
 
@@ -275,6 +261,10 @@ static void test_multifd_file_mapped_ram_fdset(void)
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_mapped_ram_fdset,
         .end_hook = migrate_hook_end_multifd_mapped_ram_fdset,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
     };
 
     test_file_common(&args, true);
@@ -289,6 +279,10 @@ static void test_multifd_file_mapped_ram_fdset_dio(void)
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_mapped_ram_fdset_dio,
         .end_hook = migrate_hook_end_multifd_mapped_ram_fdset,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true,
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
     };
 
     if (!probe_o_direct_support(tmpfs)) {
diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c
index 10e1d04b58..be6c245843 100644
--- a/tests/qtest/migration/framework.c
+++ b/tests/qtest/migration/framework.c
@@ -30,6 +30,7 @@
 #define QEMU_VM_FILE_MAGIC 0x5145564d
 #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
 #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
+#define MULTIFD_TEST_CHANNELS 4
 
 unsigned start_address;
 unsigned end_address;
@@ -207,6 +208,52 @@ static QList *migrate_start_get_qmp_capabilities(const MigrateStart *args)
     return capabilities;
 }
 
+static void migrate_start_set_capabilities(QTestState *from, QTestState *to,
+                                           MigrateStart *args)
+{
+    /*
+     * MigrationCapability_lookup and MIGRATION_CAPABILITY_ constants
+     * are from qapi-types-migration.h.
+     */
+    for (uint8_t i = 0; i < MIGRATION_CAPABILITY__MAX; i++)
+    {
+        if (!args->caps[i]) {
+            continue;
+        }
+        if (from) {
+            migrate_set_capability(from,
+                            MigrationCapability_lookup.array[i], true);
+        }
+        if (to) {
+            migrate_set_capability(to,
+                            MigrationCapability_lookup.array[i], true);
+        }
+    }
+
+    /*
+     * Always enable migration events.  Libvirt always uses it, let's try
+     * to mimic as closer as that.
+     */
+    migrate_set_capability(from, "events", true);
+    if (!args->defer_target_connect) {
+        migrate_set_capability(to, "events", true);
+    }
+
+    /*
+     * Default number of channels should be fine for most
+     * tests. Individual tests can override by calling
+     * migrate_set_parameter() directly.
+     */
+    if (args->caps[MIGRATION_CAPABILITY_MULTIFD]) {
+        migrate_set_parameter_int(from, "multifd-channels",
+                                  MULTIFD_TEST_CHANNELS);
+        migrate_set_parameter_int(to, "multifd-channels",
+                                  MULTIFD_TEST_CHANNELS);
+    }
+
+    return;
+}
+
 int migrate_start(QTestState **from, QTestState **to, const char *uri,
                   MigrateStart *args)
 {
@@ -379,14 +426,7 @@ int migrate_start(QTestState **from, QTestState **to, const char *uri,
         unlink(shmem_path);
     }
 
-    /*
-     * Always enable migration events.  Libvirt always uses it, let's try
-     * to mimic as closer as that.
-     */
-    migrate_set_capability(*from, "events", true);
-    if (!args->defer_target_connect) {
-        migrate_set_capability(*to, "events", true);
-    }
+    migrate_start_set_capabilities(*from, *to, args);
 
     return 0;
 }
@@ -432,6 +472,10 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 {
     QTestState *from, *to;
 
+    /* set postcopy capabilities */
+    args->start.caps[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME] = true;
+    args->start.caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true;
+
     if (migrate_start(&from, &to, "defer", &args->start)) {
         return -1;
     }
@@ -440,17 +484,7 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
         args->postcopy_data = args->start_hook(from, to);
     }
 
-    migrate_set_capability(from, "postcopy-ram", true);
-    migrate_set_capability(to, "postcopy-ram", true);
-    migrate_set_capability(to, "postcopy-blocktime", true);
-
-    if (args->postcopy_preempt) {
-        migrate_set_capability(from, "postcopy-preempt", true);
-        migrate_set_capability(to, "postcopy-preempt", true);
-    }
-
     migrate_ensure_non_converge(from);
-
     migrate_prepare_for_dirty_mem(from);
     qtest_qmp_assert_success(to, "{ 'execute': 'migrate-incoming',"
                              "  'arguments': { "
@@ -948,15 +982,9 @@ void *migrate_hook_start_precopy_tcp_multifd_common(QTestState *from,
                                                     QTestState *to,
                                                     const char *method)
 {
-    migrate_set_parameter_int(from, "multifd-channels", 16);
-    migrate_set_parameter_int(to, "multifd-channels", 16);
-
     migrate_set_parameter_str(from, "multifd-compression", method);
     migrate_set_parameter_str(to, "multifd-compression", method);
 
-    migrate_set_capability(from, "multifd", true);
-    migrate_set_capability(to, "multifd", true);
-
     /* Start incoming migration from the 1st socket */
     migrate_incoming_qmp(to, "tcp:127.0.0.1:0", NULL, "{}");
 
diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h
index e4a11870f6..01e425e64e 100644
--- a/tests/qtest/migration/framework.h
+++ b/tests/qtest/migration/framework.h
@@ -12,6 +12,7 @@
 #define TEST_FRAMEWORK_H
 
 #include "libqtest.h"
+#include <qapi/qapi-types-migration.h>
 
 #define FILE_TEST_FILENAME "migfile"
 #define FILE_TEST_OFFSET 0x1000
@@ -120,6 +121,13 @@ typedef struct {
 
     /* Do not connect to target monitor and qtest sockets in qtest_init */
     bool defer_target_connect;
+
+    /*
+     * Migration capabilities to be set in both source and
+     * destination. For unilateral capabilities, use
+     * migration_set_capabilities().
+     */
+    bool caps[MIGRATION_CAPABILITY__MAX];
 } MigrateStart;
 
 typedef enum PostcopyRecoveryFailStage {
@@ -207,7 +215,6 @@ typedef struct {
 
     /* Postcopy specific fields */
     void *postcopy_data;
-    bool postcopy_preempt;
     PostcopyRecoveryFailStage postcopy_recovery_fail_stage;
 } MigrateCommon;
 
diff --git a/tests/qtest/migration/misc-tests.c b/tests/qtest/migration/misc-tests.c
index 2e612d9e38..54995256d8 100644
--- a/tests/qtest/migration/misc-tests.c
+++ b/tests/qtest/migration/misc-tests.c
@@ -98,6 +98,7 @@ static void test_ignore_shared(void)
     QTestState *from, *to;
     MigrateStart args = {
         .use_shmem = true,
+        .caps[MIGRATION_CAPABILITY_X_IGNORE_SHARED] = true,
     };
 
     if (migrate_start(&from, &to, uri, &args)) {
@@ -107,9 +108,6 @@ static void test_ignore_shared(void)
     migrate_ensure_non_converge(from);
     migrate_prepare_for_dirty_mem(from);
 
-    migrate_set_capability(from, "x-ignore-shared", true);
-    migrate_set_capability(to, "x-ignore-shared", true);
-
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
diff --git a/tests/qtest/migration/postcopy-tests.c b/tests/qtest/migration/postcopy-tests.c
index 982457bed1..483e3ff99f 100644
--- a/tests/qtest/migration/postcopy-tests.c
+++ b/tests/qtest/migration/postcopy-tests.c
@@ -39,7 +39,9 @@ static void test_postcopy_suspend(void)
 static void test_postcopy_preempt(void)
 {
     MigrateCommon args = {
-        .postcopy_preempt = true,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+        },
     };
 
     test_postcopy_common(&args);
@@ -73,7 +75,9 @@ static void test_postcopy_recovery_fail_reconnect(void)
 static void test_postcopy_preempt_recovery(void)
 {
     MigrateCommon args = {
-        .postcopy_preempt = true,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+        },
     };
 
     test_postcopy_recovery_common(&args);
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index ba273d10b9..f8404793b8 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -108,23 +108,14 @@ static void test_precopy_tcp_plain(void)
     test_precopy_common(&args);
 }
 
-static void *migrate_hook_start_switchover_ack(QTestState *from, QTestState *to)
-{
-
-    migrate_set_capability(from, "return-path", true);
-    migrate_set_capability(to, "return-path", true);
-
-    migrate_set_capability(from, "switchover-ack", true);
-    migrate_set_capability(to, "switchover-ack", true);
-
-    return NULL;
-}
-
 static void test_precopy_tcp_switchover_ack(void)
 {
     MigrateCommon args = {
         .listen_uri = "tcp:127.0.0.1:0",
-        .start_hook = migrate_hook_start_switchover_ack,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_RETURN_PATH] = true,
+            .caps[MIGRATION_CAPABILITY_SWITCHOVER_ACK] = true,
+        },
         /*
          * Source VM must be running in order to consider the switchover ACK
          * when deciding to do switchover or not.
@@ -393,6 +384,9 @@ static void test_multifd_tcp_uri_none(void)
     MigrateCommon args = {
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_precopy_tcp_multifd,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         /*
          * Multifd is more complicated than most of the features, it
          * directly takes guest page buffers when sending, make sure
@@ -408,6 +402,9 @@ static void test_multifd_tcp_zero_page_legacy(void)
     MigrateCommon args = {
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_precopy_tcp_multifd_zero_page_legacy,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         /*
          * Multifd is more complicated than most of the features, it
          * directly takes guest page buffers when sending, make sure
@@ -423,6 +420,9 @@ static void test_multifd_tcp_no_zero_page(void)
     MigrateCommon args = {
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_precopy_tcp_multifd_no_zero_page,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         /*
          * Multifd is more complicated than most of the features, it
          * directly takes guest page buffers when sending, make sure
@@ -439,6 +439,9 @@ static void test_multifd_tcp_channels_none(void)
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_precopy_tcp_multifd,
         .live = true,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
         .connect_channels = ("[ { 'channel-type': 'main',"
                              "    'addr': { 'transport': 'socket',"
                              "              'type': 'inet',"
diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c
index 2cb4a44bcd..72f44defbb 100644
--- a/tests/qtest/migration/tls-tests.c
+++ b/tests/qtest/migration/tls-tests.c
@@ -375,9 +375,11 @@ static void test_postcopy_tls_psk(void)
 static void test_postcopy_preempt_tls_psk(void)
 {
     MigrateCommon args = {
-        .postcopy_preempt = true,
         .start_hook = migrate_hook_start_tls_psk_match,
         .end_hook = migrate_hook_end_tls_psk,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+        },
     };
 
     test_postcopy_common(&args);
@@ -397,9 +399,11 @@ static void test_postcopy_recovery_tls_psk(void)
 static void test_postcopy_preempt_all(void)
 {
     MigrateCommon args = {
-        .postcopy_preempt = true,
         .start_hook = migrate_hook_start_tls_psk_match,
         .end_hook = migrate_hook_end_tls_psk,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+        },
     };
 
     test_postcopy_recovery_common(&args);
@@ -631,6 +635,9 @@ static void test_multifd_tcp_tls_psk_match(void)
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match,
         .end_hook = migrate_hook_end_tls_psk,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
     };
     test_precopy_common(&args);
 }
@@ -640,6 +647,7 @@ static void test_multifd_tcp_tls_psk_mismatch(void)
     MigrateCommon args = {
         .start = {
             .hide_stderr = true,
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
         },
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_tcp_tls_psk_mismatch,
@@ -656,6 +664,9 @@ static void test_multifd_tcp_tls_x509_default_host(void)
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_tls_x509_default_host,
         .end_hook = migrate_hook_end_tls_x509,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
     };
     test_precopy_common(&args);
 }
@@ -666,6 +677,9 @@ static void test_multifd_tcp_tls_x509_override_host(void)
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_tls_x509_override_host,
         .end_hook = migrate_hook_end_tls_x509,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
     };
     test_precopy_common(&args);
 }
@@ -688,6 +702,7 @@ static void test_multifd_tcp_tls_x509_mismatch_host(void)
     MigrateCommon args = {
         .start = {
             .hide_stderr = true,
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
         },
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_tls_x509_mismatch_host,
@@ -703,6 +718,9 @@ static void test_multifd_tcp_tls_x509_allow_anon_client(void)
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_tls_x509_allow_anon_client,
         .end_hook = migrate_hook_end_tls_x509,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
     };
     test_precopy_common(&args);
 }
@@ -712,6 +730,7 @@ static void test_multifd_tcp_tls_x509_reject_anon_client(void)
     MigrateCommon args = {
         .start = {
             .hide_stderr = true,
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
         },
         .listen_uri = "defer",
         .start_hook = migrate_hook_start_multifd_tls_x509_reject_anon_client,
-- 
2.48.1




* [PATCH v7 4/5] tests/qtest/migration: add postcopy tests with multifd
  2025-02-28 12:17 [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
                   ` (2 preceding siblings ...)
  2025-02-28 12:17 ` [PATCH v7 3/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit
@ 2025-02-28 12:17 ` Prasad Pandit
  2025-02-28 15:11   ` Fabiano Rosas
  2025-02-28 12:17 ` [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command Prasad Pandit
  2025-02-28 14:53 ` [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Fabiano Rosas
  5 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-02-28 12:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Add new qtests to run postcopy migration with multifd
channels enabled.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 tests/qtest/migration/compression-tests.c | 15 ++++++++
 tests/qtest/migration/postcopy-tests.c    | 27 +++++++++++++
 tests/qtest/migration/precopy-tests.c     | 19 ++++++++++
 tests/qtest/migration/tls-tests.c         | 46 +++++++++++++++++++++++
 4 files changed, 107 insertions(+)

v6: Move .caps initialisations to .start object.
- https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t

diff --git a/tests/qtest/migration/compression-tests.c b/tests/qtest/migration/compression-tests.c
index 41e79f031b..592e85d69d 100644
--- a/tests/qtest/migration/compression-tests.c
+++ b/tests/qtest/migration/compression-tests.c
@@ -42,6 +42,19 @@ static void test_multifd_tcp_zstd(void)
     };
     test_precopy_common(&args);
 }
+
+static void test_multifd_postcopy_tcp_zstd(void)
+{
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start = {
+            .caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true,
+        },
+        .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd,
+    };
+
+    test_precopy_common(&args);
+}
 #endif /* CONFIG_ZSTD */
 
 #ifdef CONFIG_QATZIP
@@ -184,6 +197,8 @@ void migration_test_add_compression(MigrationTestEnv *env)
 #ifdef CONFIG_ZSTD
     migration_test_add("/migration/multifd/tcp/plain/zstd",
                        test_multifd_tcp_zstd);
+    migration_test_add("/migration/multifd+postcopy/tcp/plain/zstd",
+                       test_multifd_postcopy_tcp_zstd);
 #endif
 
 #ifdef CONFIG_QATZIP
diff --git a/tests/qtest/migration/postcopy-tests.c b/tests/qtest/migration/postcopy-tests.c
index 483e3ff99f..eb637f94f7 100644
--- a/tests/qtest/migration/postcopy-tests.c
+++ b/tests/qtest/migration/postcopy-tests.c
@@ -94,6 +94,29 @@ static void migration_test_add_postcopy_smoke(MigrationTestEnv *env)
     }
 }
 
+static void test_multifd_postcopy(void)
+{
+    MigrateCommon args = {
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
+    };
+
+    test_postcopy_common(&args);
+}
+
+static void test_multifd_postcopy_preempt(void)
+{
+    MigrateCommon args = {
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+            .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+        },
+    };
+
+    test_postcopy_common(&args);
+}
+
 void migration_test_add_postcopy(MigrationTestEnv *env)
 {
     migration_test_add_postcopy_smoke(env);
@@ -114,6 +137,10 @@ void migration_test_add_postcopy(MigrationTestEnv *env)
             "/migration/postcopy/recovery/double-failures/reconnect",
             test_postcopy_recovery_fail_reconnect);
 
+        migration_test_add("/migration/postcopy/multifd/plain",
+                           test_multifd_postcopy);
+        migration_test_add("/migration/postcopy/multifd/preempt/plain",
+                           test_multifd_postcopy_preempt);
         if (env->is_x86) {
             migration_test_add("/migration/postcopy/suspend",
                                test_postcopy_suspend);
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index f8404793b8..b2b0db8076 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -34,6 +34,7 @@
 #define DIRTYLIMIT_TOLERANCE_RANGE  25  /* MB/s */
 
 static char *tmpfs;
+static bool postcopy_ram = false;
 
 static void test_precopy_unix_plain(void)
 {
@@ -476,6 +477,11 @@ static void test_multifd_tcp_cancel(void)
     migrate_ensure_non_converge(from);
     migrate_prepare_for_dirty_mem(from);
 
+    if (postcopy_ram) {
+        migrate_set_capability(from, "postcopy-ram", true);
+        migrate_set_capability(to, "postcopy-ram", true);
+    }
+
     migrate_set_parameter_int(from, "multifd-channels", 16);
     migrate_set_parameter_int(to, "multifd-channels", 16);
 
@@ -517,6 +523,10 @@ static void test_multifd_tcp_cancel(void)
         return;
     }
 
+    if (postcopy_ram) {
+        migrate_set_capability(to2, "postcopy-ram", true);
+    }
+
     migrate_set_parameter_int(to2, "multifd-channels", 16);
 
     migrate_set_capability(to2, "multifd", true);
@@ -540,6 +550,13 @@ static void test_multifd_tcp_cancel(void)
     migrate_end(from, to2, true);
 }
 
+static void test_multifd_postcopy_tcp_cancel(void)
+{
+    postcopy_ram = true;
+    test_multifd_tcp_cancel();
+    postcopy_ram = false;
+}
+
 static void test_cancel_src_after_failed(QTestState *from, QTestState *to,
                                          const char *uri, const char *phase)
 {
@@ -1127,6 +1144,8 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
                        test_multifd_tcp_uri_none);
     migration_test_add("/migration/multifd/tcp/plain/cancel",
                        test_multifd_tcp_cancel);
+    migration_test_add("/migration/multifd+postcopy/tcp/plain/cancel",
+                       test_multifd_postcopy_tcp_cancel);
 }
 
 void migration_test_add_precopy(MigrationTestEnv *env)
diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c
index 72f44defbb..81a2faf4c0 100644
--- a/tests/qtest/migration/tls-tests.c
+++ b/tests/qtest/migration/tls-tests.c
@@ -395,6 +395,19 @@ static void test_postcopy_recovery_tls_psk(void)
     test_postcopy_recovery_common(&args);
 }
 
+static void test_multifd_postcopy_recovery_tls_psk(void)
+{
+    MigrateCommon args = {
+        .start_hook = migrate_hook_start_tls_psk_match,
+        .end_hook = migrate_hook_end_tls_psk,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
 /* This contains preempt+recovery+tls test altogether */
 static void test_postcopy_preempt_all(void)
 {
@@ -409,6 +422,19 @@ static void test_postcopy_preempt_all(void)
     test_postcopy_recovery_common(&args);
 }
 
+static void test_multifd_postcopy_preempt_recovery_tls_psk(void)
+{
+    MigrateCommon args = {
+        .start_hook = migrate_hook_start_tls_psk_match,
+        .end_hook = migrate_hook_end_tls_psk,
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
 static void test_precopy_unix_tls_psk(void)
 {
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -657,6 +683,20 @@ static void test_multifd_tcp_tls_psk_mismatch(void)
     test_precopy_common(&args);
 }
 
+static void test_multifd_postcopy_tcp_tls_psk_match(void)
+{
+    MigrateCommon args = {
+        .start = {
+            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+        },
+        .listen_uri = "defer",
+        .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match,
+        .end_hook = migrate_hook_end_tls_psk,
+    };
+
+    test_precopy_common(&args);
+}
+
 #ifdef CONFIG_TASN1
 static void test_multifd_tcp_tls_x509_default_host(void)
 {
@@ -774,6 +814,10 @@ void migration_test_add_tls(MigrationTestEnv *env)
                            test_postcopy_preempt_tls_psk);
         migration_test_add("/migration/postcopy/preempt/recovery/tls/psk",
                            test_postcopy_preempt_all);
+        migration_test_add("/migration/postcopy/multifd/recovery/tls/psk",
+                           test_multifd_postcopy_recovery_tls_psk);
+        migration_test_add("/migration/postcopy/multifd/preempt/recovery/tls/psk",
+                           test_multifd_postcopy_preempt_recovery_tls_psk);
     }
 #ifdef CONFIG_TASN1
     migration_test_add("/migration/precopy/unix/tls/x509/default-host",
@@ -805,6 +849,8 @@ void migration_test_add_tls(MigrationTestEnv *env)
                        test_multifd_tcp_tls_psk_match);
     migration_test_add("/migration/multifd/tcp/tls/psk/mismatch",
                        test_multifd_tcp_tls_psk_mismatch);
+    migration_test_add("/migration/multifd+postcopy/tcp/tls/psk/match",
+                       test_multifd_postcopy_tcp_tls_psk_match);
 #ifdef CONFIG_TASN1
     migration_test_add("/migration/multifd/tcp/tls/x509/default-host",
                        test_multifd_tcp_tls_x509_default_host);
-- 
2.48.1




* [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-02-28 12:17 [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
                   ` (3 preceding siblings ...)
  2025-02-28 12:17 ` [PATCH v7 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
@ 2025-02-28 12:17 ` Prasad Pandit
  2025-02-28 13:42   ` Peter Xu
  2025-02-28 14:53 ` [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Fabiano Rosas
  5 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-02-28 12:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

When Multifd and Postcopy migration are enabled together,
the Multifd threads are shut down before the Postcopy
phase starts.

But before this shutdown, we need to flush the Multifd
channels on the source side and notify the destination
migration thread to synchronise with the Multifd 'recv_x'
threads, so that all of the Multifd data is received and
processed on the destination side before the Multifd
threads are shut down.

This synchronisation happens when the main migration
thread waits for all of the Multifd threads to receive
their data.

Add a 'MULTIFD_RECV_SYNC' migration command to notify the
destination side to synchronise with the Multifd threads.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 migration/migration.c      |  3 +++
 migration/multifd-nocomp.c | 21 +++++++++++++++------
 migration/multifd.c        |  1 +
 migration/multifd.h        |  1 +
 migration/savevm.c         | 13 +++++++++++++
 migration/savevm.h         |  1 +
 6 files changed, 34 insertions(+), 6 deletions(-)

v6: New patch, not from earlier versions.
- https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
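
A minimal sketch of the destination-side handling in
loadvm_process_command(), assuming it reuses the existing
multifd_recv_sync_main() (which blocks until every multifd recv
thread posts its sem_sync, as the multifd.c hunk below arranges):

    /* hypothetical wiring inside loadvm_process_command() */
    case MIG_CMD_MULTIFD_RECV_SYNC:
        /* wait for all multifd recv_x threads to finish receiving */
        multifd_recv_sync_main();
        return 0;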

diff --git a/migration/migration.c b/migration/migration.c
index 5db9e18272..65fc4f5eed 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3401,6 +3401,9 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     if (!in_postcopy && must_precopy <= s->threshold_size
         && can_switchover && qatomic_read(&s->start_postcopy)) {
         if (migrate_multifd()) {
+            multifd_send_flush();
+            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
+            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
             multifd_send_shutdown();
         }
         if (postcopy_start(s, &local_err)) {
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index d0edec7cd1..bbe07e4f7e 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -334,7 +334,7 @@ retry:
      * After flush, always retry.
      */
     if (pages->block != block || multifd_queue_full(pages)) {
-        if (!multifd_send(&multifd_ram_send)) {
+        if (multifd_send_flush() < 0) {
             return false;
         }
         goto retry;
@@ -387,6 +387,18 @@ bool multifd_ram_sync_per_round(void)
     return !migrate_multifd_flush_after_each_section();
 }
 
+int multifd_send_flush(void)
+{
+    if (!multifd_payload_empty(multifd_ram_send)) {
+        if (!multifd_send(&multifd_ram_send)) {
+            error_report("%s: multifd_send fail", __func__);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 int multifd_ram_flush_and_sync(QEMUFile *f)
 {
     MultiFDSyncReq req;
@@ -396,11 +408,8 @@ int multifd_ram_flush_and_sync(QEMUFile *f)
         return 0;
     }
 
-    if (!multifd_payload_empty(multifd_ram_send)) {
-        if (!multifd_send(&multifd_ram_send)) {
-            error_report("%s: multifd_send fail", __func__);
-            return -1;
-        }
+    if ((ret = multifd_send_flush()) < 0) {
+        return ret;
     }
 
     /* File migrations only need to sync with threads */
diff --git a/migration/multifd.c b/migration/multifd.c
index 2bd8604ca1..8928ca2611 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1265,6 +1265,7 @@ static void *multifd_recv_thread(void *opaque)
 
     rcu_unregister_thread();
     trace_multifd_recv_thread_end(p->id, p->packets_recved);
+    qemu_sem_post(&multifd_recv_state->sem_sync);
 
     return NULL;
 }
diff --git a/migration/multifd.h b/migration/multifd.h
index bff867ca6b..242b923633 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -361,6 +361,7 @@ static inline uint32_t multifd_ram_page_count(void)
 
 void multifd_ram_save_setup(void);
 void multifd_ram_save_cleanup(void);
+int multifd_send_flush(void);
 int multifd_ram_flush_and_sync(QEMUFile *f);
 bool multifd_ram_sync_per_round(void);
 bool multifd_ram_sync_per_section(void);
diff --git a/migration/savevm.c b/migration/savevm.c
index 4046faf009..0b71e988ba 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -37,6 +37,7 @@
 #include "migration/register.h"
 #include "migration/global_state.h"
 #include "migration/channel-block.h"
+#include "multifd.h"
 #include "ram.h"
 #include "qemu-file.h"
 #include "savevm.h"
@@ -90,6 +91,7 @@ enum qemu_vm_cmd {
     MIG_CMD_ENABLE_COLO,       /* Enable COLO */
     MIG_CMD_POSTCOPY_RESUME,   /* resume postcopy on dest */
     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
+    MIG_CMD_MULTIFD_RECV_SYNC, /* Sync multifd recv_x and main threads */
     MIG_CMD_MAX
 };
 
@@ -109,6 +111,7 @@ static struct mig_cmd_args {
     [MIG_CMD_POSTCOPY_RESUME]  = { .len =  0, .name = "POSTCOPY_RESUME" },
     [MIG_CMD_PACKAGED]         = { .len =  4, .name = "PACKAGED" },
     [MIG_CMD_RECV_BITMAP]      = { .len = -1, .name = "RECV_BITMAP" },
+    [MIG_CMD_MULTIFD_RECV_SYNC] = { .len = 0, .name = "MULTIFD_RECV_SYNC" },
     [MIG_CMD_MAX]              = { .len = -1, .name = "MAX" },
 };
 
@@ -1201,6 +1204,12 @@ void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name)
     qemu_savevm_command_send(f, MIG_CMD_RECV_BITMAP, len + 1, (uint8_t *)buf);
 }
 
+void qemu_savevm_send_multifd_recv_sync(QEMUFile *f)
+{
+    /* TBD: trace_savevm_send_multifd_recv_sync(); */
+    qemu_savevm_command_send(f, MIG_CMD_MULTIFD_RECV_SYNC, 0, NULL);
+}
+
 bool qemu_savevm_state_blocked(Error **errp)
 {
     SaveStateEntry *se;
@@ -2479,6 +2488,10 @@ static int loadvm_process_command(QEMUFile *f)
     case MIG_CMD_RECV_BITMAP:
         return loadvm_handle_recv_bitmap(mis, len);
 
+    case MIG_CMD_MULTIFD_RECV_SYNC:
+        multifd_recv_sync_main();
+        break;
+
     case MIG_CMD_ENABLE_COLO:
         return loadvm_process_enable_colo(mis);
     }
diff --git a/migration/savevm.h b/migration/savevm.h
index 7957460062..91ae703925 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -53,6 +53,7 @@ void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
 void qemu_savevm_send_postcopy_resume(QEMUFile *f);
 void qemu_savevm_send_recv_bitmap(QEMUFile *f, char *block_name);
+void qemu_savevm_send_multifd_recv_sync(QEMUFile *f);
 
 void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len,
-- 
2.48.1




* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-02-28 12:17 ` [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command Prasad Pandit
@ 2025-02-28 13:42   ` Peter Xu
  2025-03-03 11:43     ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-02-28 13:42 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Fri, Feb 28, 2025 at 05:47:49PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> When Multifd and Postcopy migration are enabled together, the
> Multifd threads are shut down before the Postcopy phase starts.
> 
> But before this shutdown, we need to flush the Multifd channels
> on the source side and notify the destination migration thread
> to synchronise with the Multifd 'recv_x' threads, so that all
> the Multifd data is received and processed on the destination
> side before the Multifd threads are shut down.
> 
> This synchronisation happens when the main migration thread
> waits for all the Multifd threads to receive their data.
> 
> Add a 'MULTIFD_RECV_SYNC' migration command to notify the
> destination side to synchronise with the Multifd threads.
> 
> Suggested-by: Peter Xu <peterx@redhat.com>

Thanks for trying to grant me the credit, but I suggest not having a new
message at all.  We should be able to do multifd's flush and sync before
the VM is stopped in postcopy_start(), so we can drop this line.

And whatever we come up with needs to go with patch 2, because patch 2
is currently broken on its own.

> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
> ---
>  migration/migration.c      |  3 +++
>  migration/multifd-nocomp.c | 21 +++++++++++++++------
>  migration/multifd.c        |  1 +
>  migration/multifd.h        |  1 +
>  migration/savevm.c         | 13 +++++++++++++
>  migration/savevm.h         |  1 +
>  6 files changed, 34 insertions(+), 6 deletions(-)
> 
> v6: New patch, not from earlier versions.
> - https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 5db9e18272..65fc4f5eed 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3401,6 +3401,9 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>      if (!in_postcopy && must_precopy <= s->threshold_size
>          && can_switchover && qatomic_read(&s->start_postcopy)) {
>          if (migrate_multifd()) {
> +            multifd_send_flush();
> +            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
> +            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);

And this is not what I meant.

So again, there's more than one way to make sure no multifd pages will
arrive during postcopy.

What I actually think is easiest is to do the flush and sync once in
postcopy_start(), as mentioned above.  I think that'll serialize with the
postcopy messages later on the main channel, making sure all channels are
flushed before moving on to postcopy work.

If you want to stick with shutting down the channels, I'm OK.  But then we
at least need one message to say "the recv side finished joining all recv
threads".  That needs to be an ACK sent from dest to src, not src to dest.
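
A rough sketch of that first option (assuming multifd_ram_flush_and_sync()
can be reused as-is; "ms" is the MigrationState argument of
postcopy_start(), and error handling is elided):

    /* In postcopy_start(), before stopping the VM: flush any queued
     * multifd pages and emit the flush/sync message on the main
     * channel, so it serializes ahead of the postcopy commands sent
     * later on that same channel.
     */
    if (migrate_multifd()) {
        if (multifd_ram_flush_and_sync(ms->to_dst_file) < 0) {
            return -1;
        }
    }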

>              multifd_send_shutdown();
>          }
>          if (postcopy_start(s, &local_err)) {
> [...]

-- 
Peter Xu




* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-02-28 12:17 [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
                   ` (4 preceding siblings ...)
  2025-02-28 12:17 ` [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command Prasad Pandit
@ 2025-02-28 14:53 ` Fabiano Rosas
  2025-03-03 10:47   ` Prasad Pandit
  5 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2025-02-28 14:53 UTC (permalink / raw)
  To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit

Prasad Pandit <ppandit@redhat.com> writes:

> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Hello,
>
> * This series (v7) adds 'MULTIFD_RECV_SYNC' migration command. It is used
>   to notify the destination migration thread to synchronise with the Multifd
>   threads. This allows Multifd ('mig/dst/recv_x') threads on the destination
>   to receive all their data, before they are shutdown.
>
>   This series also updates the channel discovery function and qtests as
>   suggested in the previous review comments.

You forgot to split patch 2. We cannot have a commit that revamps the
channel discovery logic and enables a new feature at the same
time. Changing the channel discovery affects all the migration
use-cases; that change cannot be smuggled in along with multifd+postcopy
enablement.

Similarly, the multifd+postcopy enablement is a new feature that needs
to be tested and reasoned about in isolation; it cannot bring along a
bunch of previously existing code that was shuffled around. We need to
be able to understand clearly what is done _in preparation for_ the
feature and what is done _as part of_ the feature.

Not to mention bisect and backporting. Many people will be looking at
this code in the future without any knowledge of migration, but as part
of a bisect session or when searching for missing backports in the
distros.

I also suggested moving that logic into channel.c. The code is now
well-contained enough that we don't need to be reading it every time
someone goes over the migration flow; it becomes just a helper
function.

> ===
> 67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                 OK             147.84s   81 subtests passed

I see postcopy/multifd/plain hanging from time to time. Probably due to
the changes in patch 5. Please take a look.

> ===
> [...]



* Re: [PATCH v7 4/5] tests/qtest/migration: add postcopy tests with multifd
  2025-02-28 12:17 ` [PATCH v7 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
@ 2025-02-28 15:11   ` Fabiano Rosas
  2025-03-03  9:33     ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2025-02-28 15:11 UTC (permalink / raw)
  To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit

Prasad Pandit <ppandit@redhat.com> writes:

> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Add new qtests to run postcopy migration with multifd
> channels enabled.
>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
> ---
>  tests/qtest/migration/compression-tests.c | 15 ++++++++
>  tests/qtest/migration/postcopy-tests.c    | 27 +++++++++++++
>  tests/qtest/migration/precopy-tests.c     | 19 ++++++++++
>  tests/qtest/migration/tls-tests.c         | 46 +++++++++++++++++++++++
>  4 files changed, 107 insertions(+)
>
> v6: Move .caps initialisations to .start object.
> - https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
>
> [...]
> diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c
> [...]
> @@ -657,6 +683,20 @@ static void test_multifd_tcp_tls_psk_mismatch(void)
>      test_precopy_common(&args);
>  }
>  
> +static void test_multifd_postcopy_tcp_tls_psk_match(void)
> +{
> +    MigrateCommon args = {
> +        .start = {
> +            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,

Missing POSTCOPY_RAM here, no?

> +        },
> +        .listen_uri = "defer",
> +        .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match,
> +        .end_hook = migrate_hook_end_tls_psk,
> +    };
> +
> +    test_precopy_common(&args);
> +}
> [...]


* Re: [PATCH v7 4/5] tests/qtest/migration: add postcopy tests with multifd
  2025-02-28 15:11   ` Fabiano Rosas
@ 2025-03-03  9:33     ` Prasad Pandit
  0 siblings, 0 replies; 37+ messages in thread
From: Prasad Pandit @ 2025-03-03  9:33 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit

On Fri, 28 Feb 2025 at 20:41, Fabiano Rosas <farosas@suse.de> wrote:
> > +static void test_multifd_postcopy_tcp_tls_psk_match(void)
> > +{
> > +    MigrateCommon args = {
> > +        .start = {
> > +            .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
>
> Missing POSTCOPY_RAM here, no?

Yes, will fix it.

Thank you.
---
  - Prasad




* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-02-28 14:53 ` [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Fabiano Rosas
@ 2025-03-03 10:47   ` Prasad Pandit
  2025-03-03 14:12     ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-03 10:47 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit

Hello Fabiano,

On Fri, 28 Feb 2025 at 20:23, Fabiano Rosas <farosas@suse.de> wrote:
> You forgot to split patch 2. We cannot have a commit that revamps the
> channel discovery logic and enables a new feature at the same
> time. Changing the channel discovery affects all the migration
> use-cases, that change cannot be smuggled along with multifd+postcopy
> enablement.

===
>> Continue with this patch and fix the stuff I mentioned. You can ignore
>> the first two paragraphs of that reply.
>>
>> https://lore.kernel.org/r/87y0y4tf5q.fsf@suse.de
>>
>> I still think we need to test that preempt + multifd scenario, but it
>> should be easy to write a test for that once the series is in more of a
>> final shape.
===

* I thought that the major overhaul of the channel discovery part, by
moving it to channel.c, was to be done subsequently, no? As we
discussed, we need to touch the channel discovery parts in
'migration_ioc_process_incoming' because we need to identify the
Postcopy channel when its connection comes in. So Patch-2 makes the
minimal changes that are _required_ to enable the two (multifd &
postcopy) features together.

* Whereas moving the channel discovery parts out to channel.c could
be done separately, with its own reasoning: we are moving it
out to channel.c because it fits better there.

> Similarly, the multifd+postcopy enablement is a new feature that needs
> to be tested and reasoned upon in isolation, it cannot bring along a
> bunch of previously existing code that was shuffled around. We need to
> be able to understand clearly what is done _in preparation for_ the
> feature and what is done _as part of_ the feature.
...
> Not to mention bisect and backporting. Many people will be looking at
> this code in the future without any knowledge of migration, but as part
> of a bisect section or when searching for missing backports in the
> distros.

* I think we (you, me, Peter) are all looking at things differently.
    - In my view Patch-2 is the minimal change _required_ to enable
multifd & postcopy. In your view we are _revamping_ the channel discovery
parts while _sneaking_ in a feature of enabling multifd & postcopy
together.
    - In my view Patch-5 in this series is an isolated change, because
it adds a new migration command to allow the multifd threads to sync from
the source side. But Peter thinks that without that 'flush and sync'
Patch-2 is incomplete, so we should merge it back there.

* I've also shared my view about not trying to do everything in this
one series. I don't know how we move forward from here. I'm also
not sure what the final _acceptable_ patch-set should look like. I
thought a tested, working patch-set was a good start. At this point I
think I'll just follow your lead.

> I also suggested to move that logic into channel.c. The code is now
> well-contained enough that we don't need to be reading it every time
> someone is going over the migration flow, it becomes just a helper
> function.

* How exactly do we want to divide patch-2 to move channel discovery
parts to channel.c?
===
     if (!migration_has_main_and_multifd_channels()) {
     }
     ...
     } else {
        channel = CH_MAIN;
    }
===
Do we move this entire if-else-else block from
migration_ioc_process_incoming() to channel.c?
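
For instance, something like the sketch below (helper name and placement
are hypothetical; the if/else chain itself would move over mostly
verbatim):

    /* migration/channel.c: classify a freshly accepted incoming
     * connection, so that migration_ioc_process_incoming() only
     * consumes the answer.
     */
    uint8_t migration_channel_discover(QIOChannel *ioc)
    {
        uint8_t channel = CH_MAIN;

        if (!migration_has_main_and_multifd_channels()) {
            /* peek the packet header magic on 'ioc' to tell the
             * main and multifd channels apart */
        } else {
            /* main and multifd channels are all connected already,
             * so this connection must be the postcopy one */
            channel = CH_POSTCOPY;
        }
        return channel;
    }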

> > ===
> > 67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test                 OK             147.84s   81 subtests passed
>
> I see postcopy/multifd/plain hanging from time to time. Probably due to
> the changes in patch 5. Please take a look.

* Okay. Does it hang indefinitely or for brief moments?

Thank you.
---
  - Prasad




* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-02-28 13:42   ` Peter Xu
@ 2025-03-03 11:43     ` Prasad Pandit
  2025-03-03 14:50       ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-03 11:43 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hello Peter,

On Fri, 28 Feb 2025 at 19:13, Peter Xu <peterx@redhat.com> wrote:
> We should be able to do multifd's flush and sync before VM
> stopped in postcopy_start()..
>
> What I actually think the easiest is to do flush and sync once in
> postcopy_start() as mentioned above.  I think that'll serialize with
> postcopy messages later on the main channel, making sure all channels are
> flushed before moving to postcopy work.

* Is there some specific benefit to calling
'multifd_ram_flush_and_sync()' from, or before, postcopy_start()? Maybe
I am missing it.

* I'll try to explain why adding a migration command makes sense:

   - I did call 'multifd_ram_flush_and_sync()' before calling
postcopy_start() in migration_iteration_run(). After flushing the
'multifd_send' queue, all it does is send the
'RAM_SAVE_FLAG_MULTIFD_FLUSH' flag on the main channel. On the
destination side the 'qemu_loadvm_state_main()' function does not
understand that message, because it looks for a 'section_type'; when
it is not able to identify the section, it leads to an error (see the
dispatch sketch after this list):

          "Expected vmdescription section, but got %d", section_type(=0)"

   - i.e. multifd_ram_flush_and_sync() is not reusable by itself. To
make it work, we need to add a (RAM) section header to the message,
because then it'll be routed to ram_load_precopy(), which calls
multifd_recv_sync_main().

   - But sending a lone RAM section header from
migration_iteration_run() OR even from postcopy_start() does not seem
right. That is out of place, because both migration_iteration_run()
and postcopy_start() have nothing to do with RAM sections.

   - I think 'flush' and 'sync' ought to be small individual
functions/operations that are section agnostic. We should be able to
call 'flush' and 'sync' from anywhere in the code without
side-effects. So tying 'flush' and 'sync' to RAM sections does not
seem right, because we need to be able to call 'flush' and 'sync' from
other places too, like before calling postcopy_start() OR maybe some
other code parts.

   - Another observation is: when multifd and postcopy are enabled
together and the guest VM is writing to its RAM,
multifd_ram_flush_and_sync() is called only during setup, not after
that.
=====
2025-03-02 18:13:26.425+0000: initiating migration
2025-03-02T18:13:26.508809Z qemu-system-x86_64:
multifd_ram_flush_and_sync: called
2025-03-02 18:15:20.070+0000: shutting down, reason=migrated

2025-03-03 11:26:41.340+0000: initiating migration
2025-03-03T11:26:41.415625Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-03 11:30:22.766+0000: shutting down, reason=migrated
=====

   - If we untie the 'flush and sync' from RAM sections and make it a
general migration command, we shall be able to call it from anywhere
else.

> If you want to stick with shutdown channels, I'm OK.  But then we at least
> need one message to say "the recv side finished joining all recv threads".
> That needs to be an ACK sent from dest to src, not src to dest.

* But earlier we discussed that 'flush and sync' is enough for that, no?
Because 'multifd_ram_flush_and_sync' only notifies the destination to
sync the multifd_recv threads. It does not receive any acknowledgement
back.

* And the multifd_recv_sync_main() function on the destination blocks the
'main' thread until all multifd_recv threads (mig/dst/recv_x) have
exited; only then does it proceed to accept the incoming new postcopy
connection.


Thank you.
---
  - Prasad




* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-03-03 10:47   ` Prasad Pandit
@ 2025-03-03 14:12     ` Peter Xu
  2025-03-04  9:47       ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-03-03 14:12 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit

On Mon, Mar 03, 2025 at 04:17:53PM +0530, Prasad Pandit wrote:
> * I think we (you, me, Peter) are all looking at things differently.
>     - In my view Patch-2 is the minimal change _required_  to enable
> multifd & postcopy. In your view we are _revamping_ channel discovery
> parts while _sneaking_ in a feature of enabling multifd & postcopy
> together.
>     - In my view Patch-5 in this series is an isolated change because
> it adds a new migration command to allow multifd threads sync from
> source side. But Peter thinks without that 'flush and sync' Patch-2 is
> incomplete, so we should merge it back there.

Just to mention, my suggestion does not conflict with splitting patch 2, as
long as you keep every patch complete on its own.

Patch 5 needs to be squashed into either patch 2 or a patch split out of
patch 2, because the current patch 2 (or any way of splitting it into
smaller ones, whichever of them enables the feature) is buggy.

Thanks,

-- 
Peter Xu




* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-03 11:43     ` Prasad Pandit
@ 2025-03-03 14:50       ` Peter Xu
  2025-03-04  8:10         ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-03-03 14:50 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Mon, Mar 03, 2025 at 05:13:56PM +0530, Prasad Pandit wrote:
> Hello Peter,
> 
> On Fri, 28 Feb 2025 at 19:13, Peter Xu <peterx@redhat.com> wrote:
> > We should be able to do multifd's flush and sync before VM
> > stopped in postcopy_start()..
> >
> > What I actually think the easiest is to do flush and sync once in
> > postcopy_start() as mentioned above.  I think that'll serialize with
> > postcopy messages later on the main channel, making sure all channels are
> > flushed before moving to postcopy work.
> 
> * Is there some specific benefit to calling
> 'multifd_ram_flush_and_sync()' from OR before postcopy_start()? Maybe
> that is missing on me.
> 
> * I'll try to explain why adding a migration command makes sense:
> 
>    - I did call 'multifd_ram_flush_and_sync()' before calling
> postcopy_start() in migration_iteration_run(). After flushing the
> 'multifd_send' queue, all it does is send
> 'RAM_SAVE_FLAG_MULTIFD_FLUSH' flag on the main channel. On the
> destination side the 'qemu_loadvm_state_main()' function does not
> understand that message, because it looks for 'section_type'; And when
> it is not able to identify the section, it leads to an error.
> 
>           "Expected vmdescription section, but got %d", section_type(=0)"
> 
>    - ie. multifd_ram_flush_and_sync() is not reusable by itself. To
> make it work, we need to add a (RAM) section header to the message.
> Because then it'll go to ram_load_precopy() and call ->
> multifd_recv_sync_main().
> 
>    - But sending a lone RAM section header from
> migration_iteration_run() OR even in postcopy_start() does not seem
> right. That is out-of-place, because both migration_iteration_run()
> and postcopy_start() have nothing to do with RAM sections.
> 
>    - I think  'flush' and 'sync' ought to be small individual
> functions/operations that are section agnostic. We should be able to
> call 'flush' and 'sync' from anywhere in the code without
> side-effects. So tying 'flush' and 'sync' with RAM sections does not
> seem right, because we need to be able to call 'flush' and 'sync' from
> other parts too, like before calling postcopy_start() OR maybe some
> other code parts.

We need the header.  Maybe the easiest for now is one more hook, like
qemu_savevm_state_complete_precopy_prepare(), and to only use it in RAM
for now.

> 
>    - Another observation is: when multifd and postcopy are enabled
> together and the guest VM is writing to its RAM,
> multifd_ram_flush_and_sync() is called only during setup, not after
> that.
> =====
> 2025-03-02 18:13:26.425+0000: initiating migration
> 2025-03-02T18:13:26.508809Z qemu-system-x86_64:
> multifd_ram_flush_and_sync: called
> 2025-03-02 18:15:20.070+0000: shutting down, reason=migrated
> 
> 2025-03-03 11:26:41.340+0000: initiating migration
> 2025-03-03T11:26:41.415625Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-03 11:30:22.766+0000: shutting down, reason=migrated
> =====
> 
>    - If we untie the 'flush and sync' from RAM sections and make it a
> general migration command, we shall be able to call it from anywhere
> else.
> 
> > If you want to stick with shutdown channels, I'm OK.  But then we at least
> > need one message to say "the recv side finished joining all recv threads".
> > That needs to be an ACK sent from dest to src, not src to dest.
> 
> * But earlier we discussed 'flush and sync' is enough for that, no?

Yes it's ok I think, but this patch didn't do that.

+            multifd_send_flush();
+            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
+            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);

I don't think it sent RAM_SAVE_FLAG_MULTIFD_FLUSH.  IIUC you need the
complete multifd_ram_flush_and_sync(), and the new message is not needed.

> Because 'multifd_ram_flush_and_sync' only notifies the destination to
> sync multifd_recv threads. It does not receive any acknowledgement
> back.

If my understanding is correct, we don't need an ACK.  It will make sure
that when postcopy starts, all pages are flushed and consumed on the dest.

Instead of me preparing the patch and the whole commit message, please
take your time and think about it, justify it, and if you also think it
works, put the explanation into the commit message and then we can go
with it.

> 
> * And multifd_recv_sync_main() function on the destination blocks the
> 'main' thread until all multfd_recv_threads (mig/dst/recv_x) have
> exited, only then it proceeds to accept the incoming new postcopy
> connection.

I don't think it makes sure the threads have exited; AFAIU it does not
join the threads, but syncs with them on the per-thread sync messages.

Thanks,


-- 
Peter Xu




* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-03 14:50       ` Peter Xu
@ 2025-03-04  8:10         ` Prasad Pandit
  2025-03-04 14:35           ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-04  8:10 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hello Peter,

On Mon, 3 Mar 2025 at 20:20, Peter Xu <peterx@redhat.com> wrote:
> We need the header.

* We need a section type, which is sent by qemu_savevm_command_send()
as 'QEMU_VM_COMMAND'.
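
For reference, the shape of that helper (from migration/savevm.c,
abridged):

    /* Send a command on the main channel: a QEMU_VM_COMMAND section
     * type byte, then the 16-bit command id and its payload.
     */
    static void qemu_savevm_command_send(QEMUFile *f,
                                         enum qemu_vm_cmd command,
                                         uint16_t len, uint8_t *data)
    {
        qemu_put_byte(f, QEMU_VM_COMMAND);
        qemu_put_be16(f, (uint16_t)command);
        qemu_put_be16(f, len);
        qemu_put_buffer(f, data, len);
        qemu_fflush(f);
    }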

>  Maybe the easiest as of now is one more hook like
> qemu_savevm_state_complete_precopy_prepare(), and only use it in RAM as of
> now.

* What will this helper do?

> > * But earlier we discussed 'flush and sync' is enough for that, no?
>
> Yes it's ok I think, but this patch didn't do that.
>
> +            multifd_send_flush();
> +            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
> +            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
>
> I don't think it sent RAM_SAVE_FLAG_MULTIFD_FLUSH.  IIUC you need the
> complete multifd_ram_flush_and_sync(), and the new message not needed.

* If we look at multifd_ram_flush_and_sync(), it does:
     1. multifd_send()                     <= this patch does it via multifd_send_flush()
     2. multifd_send_sync_main()           <= this patch also calls it above
     3. send RAM_SAVE_FLAG_MULTIFD_FLUSH   <= this patch sends MIG_CMD_MULTIFD_RECV_SYNC
* What is missing?

> Instead of I prepare the patch and whole commit message, please take your
> time and think about it, justify it, and if you also think it works put
> explanation into commit message and then we can go with it.

* The commit message does explain the flush and sync, and how the
migration command helps. What else do we need to add?

> > * And multifd_recv_sync_main() function on the destination blocks the
> > 'main' thread until all multfd_recv_threads (mig/dst/recv_x) have
> > exited, only then it proceeds to accept the incoming new postcopy
> > connection.
>
> I don't think it makes sure threads have exited,

* 'multifd_recv_sync_main()' blocks the main thread on the
'multifd_recv_state->sem_sync' semaphore, which is posted when the
multifd_recv threads exit due to the shutdown message; i.e. the 'main'
thread unblocks when all 'mig/dst/recv_x' threads have exited.
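
The pattern in question, heavily simplified from migration/multifd.c (the
real function does more bookkeeping around the semaphores):

    void multifd_recv_sync_main(void)
    {
        int i;

        /* One post per channel: normally a MULTIFD_FLAG_SYNC packet;
         * with this patch, also a recv thread exiting.  The main
         * thread blocks until every mig/dst/recv_x has checked in.
         */
        for (i = 0; i < migrate_multifd_channels(); i++) {
            qemu_sem_wait(&multifd_recv_state->sem_sync);
        }
    }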

Thank you.
---
  - Prasad




* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-03-03 14:12     ` Peter Xu
@ 2025-03-04  9:47       ` Prasad Pandit
  2025-03-04 14:42         ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-04  9:47 UTC (permalink / raw)
  To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit

Hi,

On Mon, 3 Mar 2025 at 19:42, Peter Xu <peterx@redhat.com> wrote:
> On Mon, Mar 03, 2025 at 04:17:53PM +0530, Prasad Pandit wrote:
> > * I think we (you, me, Peter) are all looking at things differently.
> >     - In my view Patch-2 is the minimal change _required_  to enable
> > multifd & postcopy. In your view we are _revamping_ channel discovery
> > parts while _sneaking_ in a feature of enabling multifd & postcopy
> > together.
> >     - In my view Patch-5 in this series is an isolated change because
> > it adds a new migration command to allow multifd threads sync from
> > source side. But Peter thinks without that 'flush and sync' Patch-2 is
> > incomplete, so we should merge it back there.
>
> Just to mention, my suggestion does not conflict with splitting patch 2, as
> long as you keep every patch complete on its own.
>
> Patch 5 needs to be squashed to either patch 2 or a split patch out of
> patch 2, because current patch 2 (or any possible way to split it into
> smaller ones, then one of them which enables the feature) is buggy.

* I'll try to segregate different parts, then we can discuss how to
split them across patches:

Terminology:
    _required_  => without it, migration shall not work at all.
    _essential_ => without it, there may be issues.

1. Enable Multifd and Postcopy together
    - It _requires_ removal of the Multifd capability check in
migration/options.c
    - It _requires_ identification of the CH_POSTCOPY connection when it arrives
        - so enum { CH_MAIN, CH_MULTIFD, CH_POSTCOPY } is defined
        - To identify channels, related changes are made to the
channel discovery part (if .. else conditionals) in the
migration_ioc_process_incoming() function.
        - These changes help to establish all channel connections.
After that, the migration proceeds as usual until it's time to start
the Postcopy phase.
    - When the time comes to start the Postcopy phase, we shut down the
multifd channels.
        - It _requires_ calling multifd_send_shutdown()
        - It _requires_ moving the
file_cleanup_outgoing/socket_cleanup_outgoing calls to the
migration_cleanup() function.
    - When the Postcopy phase starts, we don't want the
ram_save_target_page() function to call the ram_save_multifd_page()
function, because migrate_multifd() is still true.
        - It _requires_ adding the !migration_in_postcopy() checks.

* The above changes are the minimum _required_ to enable multifd and
postcopy together, while also ensuring that migration continues to work
when those options are not enabled together. With these changes, guest
migration across two machines works without any observed failures.

2. The multifd_ram_flush_and_sync() call/command and the
assert(!migration_in_postcopy()) call in multifd_recv_thread()
    - These calls help to ensure that no multifd data is left behind
when the Postcopy phase starts.
    - And that multifd_recv threads shall not be active when
Postcopy is running.
    - They protect the guests from potential state corruption.

* It is up to us to decide whether (2) is _required_ OR _essential_
for the feature. Individual opinions can vary here.

3. Revamp of the channel discovery parts by moving those bits to
channel.c or other places.
    - This entails moving the channel discovery parts from the
migration_ioc_process_incoming() function to somewhere else, because it
makes more sense there, and maybe it reduces complexity and
makes the sources easier to understand.

* It is up to us to decide whether (3) is _required_  OR  _essential_
for the feature. Individual opinions can vary here.

* IMHO (1) is _required_, (2) is _essential_, and (3) is neither
_required_ nor _essential_ for the 'enable multifd and postcopy
together' feature. (3) is a completely unrelated change to this
feature.

Since it is an individual opinion, we all can think differently here
and that is perfectly fine. Once we have some consensus, we can decide
how to split or merge patches and move forward.

Hope it helps. Thank you.
---
  - Prasad




* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-04  8:10         ` Prasad Pandit
@ 2025-03-04 14:35           ` Peter Xu
  2025-03-05 11:21             ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-03-04 14:35 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Tue, Mar 04, 2025 at 01:40:02PM +0530, Prasad Pandit wrote:
> Hello Peter,
> 
> On Mon, 3 Mar 2025 at 20:20, Peter Xu <peterx@redhat.com> wrote:
> > We need the header.
> 
> * We need a section type, which is sent by qemu_savevm_command_send()
> as 'QEMU_VM_COMMAND'.

I think we need the header; the RAM code is a module.

> 
> >  Maybe the easiest as of now is one more hook like
> > qemu_savevm_state_complete_precopy_prepare(), and only use it in RAM as of
> > now.
> 
> * What will this helper do?

Do something similar to qemu_savevm_state_complete_precopy_iterable(),
but do whatever a vmstate handler wants, so it'll come with a header.
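
Something along these lines, perhaps (entirely hypothetical: neither the
hook nor the save_prepare op exists yet; modeled on
qemu_savevm_state_complete_precopy_iterable() in savevm.c):

    void qemu_savevm_state_complete_precopy_prepare(QEMUFile *f)
    {
        SaveStateEntry *se;

        QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
            if (!se->ops || !se->ops->save_prepare) { /* hypothetical op */
                continue;
            }
            /* the section header is what the bare flush/sync lacked */
            save_section_header(f, se, QEMU_VM_SECTION_PART);
            se->ops->save_prepare(f, se->opaque);
            save_section_footer(f, se);
        }
    }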

> 
> > > * But earlier we discussed 'flush and sync' is enough for that, no?
> >
> > Yes it's ok I think, but this patch didn't do that.
> >
> > +            multifd_send_flush();
> > +            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
> > +            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
> >
> > I don't think it sent RAM_SAVE_FLAG_MULTIFD_FLUSH.  IIUC you need the
> > complete multifd_ram_flush_and_sync(), and the new message not needed.
> 
> * If we look at multifd_ram_flush_and_sync(), it does:
>      1. multifd_send()                       <= this patch does it via
> multifd_send_flush()
>      2. multifd_send_sync_main()    <= this patch also calls it above

MULTIFD_SYNC_LOCAL will not invoke MULTIFD_FLAG_SYNC, which we need.
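
For reference, the distinction (MultiFDSyncReq from migration/multifd.h,
abridged; comments are my summary):

    typedef enum {
        MULTIFD_SYNC_NONE = 0,
        /* Sync the sender threads locally only; no packet flag is set */
        MULTIFD_SYNC_LOCAL,
        /* Also mark the packets with MULTIFD_FLAG_SYNC, so that the
         * destination side syncs its recv threads too */
        MULTIFD_SYNC_ALL,
    } MultiFDSyncReq;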

>      3. send RAM_SAVE_FLAG_MULTIFD_FLUSH  <= this patch sends
> MIG_CMD_MULTIFD_RECV_SYNC

IMO we shouldn't make a global cmd for multifd.

> 
> * What is missing?
> 
> > Instead of I prepare the patch and whole commit message, please take your
> > time and think about it, justify it, and if you also think it works put
> > explanation into commit message and then we can go with it.
> 
> * The commit message does explain about flush and sync and how the
> migration command helps. What else do we need to add?

Please consider adding details like "we need message AAA on channel BBB to
serialize with CCC".  I'm not asking for that as a requirement to merge,
but my understanding is that this is what's missing, and that's why none
of the versions so far can make sure of it in code.  Maybe that'll help
you understand how it is serialized.

> 
> > > * And multifd_recv_sync_main() function on the destination blocks the
> > > 'main' thread until all multfd_recv_threads (mig/dst/recv_x) have
> > > exited, only then it proceeds to accept the incoming new postcopy
> > > connection.
> >
> > I don't think it makes sure threads have exited,
> 
> * 'multifd_recv_sync_main()' blocks the main thread on
> 'multifd_recv_state->sem_sync' semaphore. It is increased when
> multifd_recv threads exit due to the shutdown message. ie. the 'main'
> thread unblocks when all 'mig/dst/recv_x' threads have exited.

So is it your intention to not send MULTIFD_FLAG_SYNC above?

In all cases, I still think that's not the right way to do it.

Thanks,

-- 
Peter Xu




* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-03-04  9:47       ` Prasad Pandit
@ 2025-03-04 14:42         ` Peter Xu
  2025-03-05  7:41           ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-03-04 14:42 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit

On Tue, Mar 04, 2025 at 03:17:14PM +0530, Prasad Pandit wrote:
> Hi,
> 
> On Mon, 3 Mar 2025 at 19:42, Peter Xu <peterx@redhat.com> wrote:
> > On Mon, Mar 03, 2025 at 04:17:53PM +0530, Prasad Pandit wrote:
> > > * I think we (you, me, Peter) are all looking at things differently.
> > >     - In my view Patch-2 is the minimal change _required_  to enable
> > > multifd & postcopy. In your view we are _revamping_ channel discovery
> > > parts while _sneaking_ in a feature of enabling multifd & postcopy
> > > together.
> > >     - In my view Patch-5 in this series is an isolated change because
> > > it adds a new migration command to allow multifd threads sync from
> > > source side. But Peter thinks without that 'flush and sync' Patch-2 is
> > > incomplete, so we should merge it back there.
> >
> > Just to mention, my suggestion does not conflict with splitting patch 2, as
> > long as you keep every patch complete on its own.
> >
> > Patch 5 needs to be squashed to either patch 2 or a split patch out of
> > patch 2, because current patch 2 (or any possible way to split it into
> > smaller ones, then one of them which enables the feature) is buggy.
> 
> * I'll try to segregate different parts, then we can discuss how to
> split them across patches:
> 
> Terminology:
>     _requires_  => that without which migration shall not work at all.
>     _essential_ => that without which there may be issues.
> 
> 1. Enable Multifd and Postcopy together
>     - It _requires_ removal of the Multifd capability check in
> migration/options.c
>     - It _requires_ identification of the CH_POSTCOPY connection when it arrives
>         - so enum { CH_MAIN, CH_MULTIFD, CH_POSTCOPY } is defined
>         - To identify channels, related changes are made to the
> channel discovery part (if .. else conditionals) in
> migration_ioc_process_incoming() function.
>         - These changes help to establish all channel connections.
> After that, the migration proceeds as usual until it's time to start
> the Postcopy phase.
>     - When the time comes to start the Postcopy phase, we shut down the
> multifd channels.
>         - It _requires_ calling multifd_send_shutdown()
>         - It _requires_ moving the
> file_cleanup_outgoing/socket_cleanup_outgoing calls to the
> migration_cleanup() function.
>     - When the Postcopy phase starts, we don't want the
> ram_save_target_page() function to call ram_save_multifd_page(),
> because migrate_multifd() is still true.
>         - It _requires_ adding the !migration_in_postcopy() checks
> (a minimal sketch follows below).
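A minimal sketch of that guard, with migrate_multifd() and
migration_in_postcopy() stubbed out; this mirrors the shape of the check, not
the actual migration/ram.c code:
===
#include <stdbool.h>
#include <stdio.h>

/* Stubs standing in for the real QEMU helpers. */
static bool migrate_multifd(void)        { return true; }
static bool migration_in_postcopy(void)  { return true; }

static void ram_save_multifd_page(void)  { printf("multifd path\n"); }
static void ram_save_page_legacy(void)   { printf("legacy path\n"); }

static void ram_save_target_page(void)
{
    /* Multifd must not be used once the Postcopy phase has started,
     * even though migrate_multifd() is still true at that point. */
    if (migrate_multifd() && !migration_in_postcopy()) {
        ram_save_multifd_page();
        return;
    }
    ram_save_page_legacy();
}

int main(void)
{
    ram_save_target_page();   /* prints "legacy path" while in postcopy */
    return 0;
}
===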

IIUC Fabiano is not asking you to drop them, but to split them.  A split
still "requires" them to be present, as long as they come before the
enablement patch.

For example, if you want you can put the channel changes into a separate
patch, but without enabling the feature.  That single patch (after applied)
should keep migration working as before.

> 
> * The above changes are the minimum _required_ to enable multifd and postcopy
> together, while also ensuring that migration continues to work when
> those options are not enabled together. With these changes, guest
> migration across two machines works without any observed failures.
> 
> 2. The multifd_ram_flush_and_sync() call/command and the
> assert(!migration_in_postcopy()) call in multifd_recv_thread()
>     - These calls help to ensure that no multifd data is left behind
> when the Postcopy phase starts.
>     - And that multifd_recv threads shall not be active when
> Postcopy is running (see the sketch below).
>     - They protect the guest from potential state corruption.
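A sketch of that destination-side assertion, again with stubbed helpers
rather than the real multifd_recv_thread():
===
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stubs standing in for the real QEMU helpers. */
static bool migration_in_postcopy(void) { return false; }
static bool recv_one_packet(void)       { return false; } /* no more data */

static void *multifd_recv_thread_sketch(void *opaque)
{
    (void)opaque;
    for (;;) {
        /* Multifd pages must never arrive once Postcopy is running;
         * abort here rather than corrupt guest memory. */
        assert(!migration_in_postcopy());
        if (!recv_one_packet()) {
            break;  /* shutdown message received */
        }
    }
    return NULL;
}

int main(void)
{
    multifd_recv_thread_sketch(NULL);
    return 0;
}
===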
> 
> * It is up to us to decide whether (2) is _required_ OR _essential_
> for the feature. Individual opinions can vary here.
> 
> 3. Revamp of the channel discovery parts by moving those bits to
> connection.c or other places.
>     - This entails moving the channel discovery parts from the
> migration_ioc_process_incoming() function to somewhere else, because it
> makes more sense there and it may reduce complexity and make the
> sources easier to understand.
> 
> * It is up to us to decide whether (3) is _required_  OR  _essential_
> for the feature. Individual opinions can vary here.
> 
> * IMHO (1) is _required_,  (2) is _essential_,  and (3) is neither
> _required_ nor _essential_ for the - Enable multifd and postcopy
> together - feature. (3) is a completely unrelated change to this
> feature.
> 
> Since it is an individual opinion, we all can think differently here
> and that is perfectly fine. Once we have some consensus, we can decide
> how to split or merge patches and move forward.
> 
> Hope it helps. Thank you.
> ---
>   - Prasad
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-03-04 14:42         ` Peter Xu
@ 2025-03-05  7:41           ` Prasad Pandit
  2025-03-05 13:56             ` Fabiano Rosas
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-05  7:41 UTC (permalink / raw)
  To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit

Hi,

On Tue, 4 Mar 2025 at 20:12, Peter Xu <peterx@redhat.com> wrote:
> IIUC Fabiano is not asking you to drop them, but to split them.  A split
> still "requires" them to be present, as long as they come before the
> enablement patch.

* Yes, same here; Even I am not suggesting to drop anything. Fabiano
mentioned the following about the changes being done:

    1.  _in preparation for_ the feature  (multifd+postcopy enablement)
    2.  _as part of_ the feature (multifd+postcopy enablement)
    3. Concerns about git bisect and backporting of patches, i.e. not
missing patches when backporting.

* I am thinking:
    1. The _required_ changes are the _as part of_  the feature changes.
    2. The refactoring of ram_save_target_page and moving of
MULTIFD_MAGIC/VERSION macros to multifd.h patch can be termed as _in
preparation for_ the feature. Of these
        - Refactoring of 'ram_save_target_page' function patch is
already pulled upstream.
            -> https://gitlab.com/qemu-project/qemu/-/commit/bc38dc2f5f350310724fd7d4f0a09f8c3a4811fa
        - Similarly, the patch moving the MULTIFD_ macros can be pulled; it
has not changed for many revisions.
    3. The _essential_ changes are further refinement of the feature
(multifd+postcopy enablement).

* IMO, we can split patches as:
     - One patch for -> _required_ OR _as part of_ the feature changes
     - One patch for -> MULTIFD_ macros as _in preparation for_ the
feature change
     - One patch for -> _essential_  changes : flush and sync related
     - Two patches for qtest cases related changes.
This division also seems helpful for git bisect and backporting
related concerns.

* Currently we are divided over what constitutes:  _required_  OR  _as
part of_ the feature changes.
    - Hence the request for individual opinions towards that.

* If we want to merge _required_/_as part of_ the feature  and
_essential_  changes together in one patch - I'm okay.
* If we want to split the _required_ / _as part of_ the feature patch
further into two, we need to define exactly how we do that division. -
[???]

* I have shared my thoughts above, I won't hold on to it unduly.  We
need to find a way to move forward. I'm willing to go with either way
we decide.

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-04 14:35           ` Peter Xu
@ 2025-03-05 11:21             ` Prasad Pandit
  2025-03-05 12:54               ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-05 11:21 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hi,

On Tue, 4 Mar 2025 at 20:05, Peter Xu <peterx@redhat.com> wrote:
> I think we need the header, the ram is a module.
> Do something similar to qemu_savevm_state_complete_precopy_iterable(), but
> do whatever a vmstate handler wants, so it'll be with a header.

* I don't fully see yet how this shall work.

> Please consider adding details like "we need message AAA on BBB channel to
> serialize with CCC" and details.  Not asking that as required to merge, but
> my understanding is that that's what is missing and that's why none of yet
> versions can make sure of it in code.  Maybe that'll help you to understand
> how that was serialized.

* Okay, will try.

>
> MULTIFD_SYNC_LOCAL will not invoke MULTIFD_FLAG_SYNC, which we need.
...
> So is it your intention to not send MULTIFD_FLAG_SYNC above?
> In all cases, I still think that's not the right way to do.

* It makes little difference; MULTIFD_FLAG_SYNC is also used to
post the 'multifd_recv_state->sem_sync' semaphore on the destination
side, which then unblocks the 'main' thread waiting on it.
===
diff --git a/migration/migration.c b/migration/migration.c
index 65fc4f5eed..d8c4ea0ad1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3402,7 +3402,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
         && can_switchover && qatomic_read(&s->start_postcopy)) {
         if (migrate_multifd()) {
             multifd_send_flush();
-            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
+            multifd_send_sync_main(MULTIFD_SYNC_ALL);
             qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
             multifd_send_shutdown();
         }
diff --git a/migration/multifd.c b/migration/multifd.c
index 8928ca2611..2b5bc2d478 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1265,7 +1265,7 @@ static void *multifd_recv_thread(void *opaque)

     rcu_unregister_thread();
     trace_multifd_recv_thread_end(p->id, p->packets_recved);
-    qemu_sem_post(&multifd_recv_state->sem_sync);
+//  qemu_sem_post(&multifd_recv_state->sem_sync);

     return NULL;
 }
===
host-1] 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
             OK             159.46s   79 subtests passed
host-2] 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
             OK             164.55s   79 subtests passed
===

* I tried the above patch and it also works the same. I'll use this, no issues.

Thank you.
---
  - Prasad



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-05 11:21             ` Prasad Pandit
@ 2025-03-05 12:54               ` Peter Xu
  2025-03-07 11:45                 ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-03-05 12:54 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Wed, Mar 05, 2025 at 04:51:00PM +0530, Prasad Pandit wrote:
> Hi,
> 
> On Tue, 4 Mar 2025 at 20:05, Peter Xu <peterx@redhat.com> wrote:
> > I think we need the header, the ram is a module.
> > Do something similar to qemu_savevm_state_complete_precopy_iterable(), but
> > do whatever a vmstate handler wants, so it'll be with a header.
> 
> * I don't fully see yet how this shall work.

Another option is to add another event for precopy_notifier_list; it could be
PRECOPY_NOTIFY_SWITCH_POSTCOPY.  Then make RAM register to it, send the
header itself, and do the flush and sync.
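A standalone sketch of that idea: a tiny notifier list with the proposed
SWITCH_POSTCOPY event that the RAM module subscribes to, so it can emit its
own header and do the flush/sync itself.  The event name and every helper
below are hypothetical; QEMU's precopy_add_notifier() machinery differs in
its details:
===
#include <stdio.h>

enum PrecopyNotifyReason {
    PRECOPY_NOTIFY_SETUP,
    PRECOPY_NOTIFY_SWITCH_POSTCOPY,   /* the proposed new event */
};

typedef void (*NotifyFn)(enum PrecopyNotifyReason reason);

#define MAX_NOTIFIERS 8
static NotifyFn notifiers[MAX_NOTIFIERS];
static int num_notifiers;

static void precopy_add_notifier(NotifyFn fn)
{
    notifiers[num_notifiers++] = fn;
}

static void precopy_notify(enum PrecopyNotifyReason reason)
{
    for (int i = 0; i < num_notifiers; i++) {
        notifiers[i](reason);
    }
}

/* The RAM module's callback: on switch-to-postcopy, send its own section
 * header and flush/sync the multifd channels. */
static void ram_switch_postcopy_notify(enum PrecopyNotifyReason reason)
{
    if (reason == PRECOPY_NOTIFY_SWITCH_POSTCOPY) {
        printf("ram: send header, multifd flush and sync\n");
    }
}

int main(void)
{
    precopy_add_notifier(ram_switch_postcopy_notify);
    /* Would be invoked from postcopy_start() before the VM is stopped. */
    precopy_notify(PRECOPY_NOTIFY_SWITCH_POSTCOPY);
    return 0;
}
===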

Let me know if you want me to write the patches.  That's almost the only
thing left to make it clearer.

> 
> > Please consider adding details like "we need message AAA on BBB channel to
> > serialize with CCC" and details.  Not asking that as required to merge, but
> > my understanding is that that's what is missing and that's why none of yet
> > versions can make sure of it in code.  Maybe that'll help you to understand
> > how that was serialized.
> 
> * Okay, will try.
> 
> >
> > MULTIFD_SYNC_LOCAL will not invoke MULTIFD_FLAG_SYNC, which we need.
> ...
> > So is it your intention to not send MULTIFD_FLAG_SYNC above?
> > In all cases, I still think that's not the right way to do.
> 
> * It makes little difference; MULTIFD_FLAG_SYNC is also used to
> post the 'multifd_recv_state->sem_sync' semaphore on the destination
> side, which then unblocks the 'main' thread waiting on it.

AFAIU that's the whole difference and the whole point of doing it this way.

> ===
> diff --git a/migration/migration.c b/migration/migration.c
> index 65fc4f5eed..d8c4ea0ad1 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3402,7 +3402,7 @@ static MigIterateState
> migration_iteration_run(MigrationState *s)
>          && can_switchover && qatomic_read(&s->start_postcopy)) {
>          if (migrate_multifd()) {
>              multifd_send_flush();
> -            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
> +            multifd_send_sync_main(MULTIFD_SYNC_ALL);
>              qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
>              multifd_send_shutdown();
>          }
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 8928ca2611..2b5bc2d478 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1265,7 +1265,7 @@ static void *multifd_recv_thread(void *opaque)
> 
>      rcu_unregister_thread();
>      trace_multifd_recv_thread_end(p->id, p->packets_recved);
> -    qemu_sem_post(&multifd_recv_state->sem_sync);
> +//  qemu_sem_post(&multifd_recv_state->sem_sync);
> 
>      return NULL;
>  }
> ===
> host-1] 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
>              OK             159.46s   79 subtests passed
> host-2] 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
>              OK             164.55s   79 subtests passed
> ===
> 
> * I tried the above patch and it also works the same. I'll use this, no issues.

We can't introduce a migration global cmd just to work the same as
RAM_SAVE_FLAG_MULTIFD_FLUSH, which is a sub-cmd for a module.

> 
> Thank you.
> ---
>   - Prasad
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-03-05  7:41           ` Prasad Pandit
@ 2025-03-05 13:56             ` Fabiano Rosas
  2025-03-06  7:51               ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2025-03-05 13:56 UTC (permalink / raw)
  To: Prasad Pandit, Peter Xu; +Cc: qemu-devel, berrange, Prasad Pandit

Prasad Pandit <ppandit@redhat.com> writes:

> Hi,
>
> On Tue, 4 Mar 2025 at 20:12, Peter Xu <peterx@redhat.com> wrote:
>> IIUC Fabiano is not asking you to drop them, but to split them.  A split
>> still "requires" them to be present, as long as they come before the
>> enablement patch.
>
> * Yes, same here; Even I am not suggesting to drop anything. Fabiano
> mentioned the following about the changes being done:
>
>     1.  _in preparation for_ the feature  (multifd+postcopy enablement)
>     2.  _as part of_ the feature (multifd+postcopy enablement)

The "channel discovery" part can be considered as a separate patch
because in order to integrate it with the new feature you had to
introduce a new concept of enum CH_* which is not a trivially verifiable
change. That change has the potential to break other use-cases of
migration and although the tests are passing we must be cautious about
it. Simply having that part as a patch is enough separation that an
issue there won't be conflated with the multifd+postcopy feature in
general.

I also called it "in preparation" for the feature because the channel
discovery part is clearly a preexisting issue, in the sense that had we
had proper magic numbers in all channels or some other mechanism to
differentiate channels, we wouldn't need this patch. That's a cleanup
you're doing beforehand in order to make the subsequent patches simpler.

Changing that code makes sense regardless of whether there is a new
feature being introduced.

Note that none of this is out of the ordinary; you'll find such
discussions in any thread in this community. It may feel arbitrary to
you because it's tacit knowledge we've gathered over the years.

>     3. Concerns about git bisect and backporting of patches, not
> missing patches when backporting.

Yes, whenever you can frame a code change as a pure refactoring, you
should do that. If, say, you had to change 100 lines in order to
implement your feature, but 90 of those lines are just preserving the
existing behavior, that is a good candidate to be a separate patch.

Reviewers will be more inclined to approve the patch, since reading code
and making sure it doesn't change behavior is easier than also
accounting for the new behavior simultaneously.

Maintainers moving patches around, merging, testing, etc will have an
easier time when solving git conflicts because the separate code is more
easily compared with the old one.

Backporters have similar issues with git, but also face questions about
the safety of bringing a patch as a prerequisite into older QEMU
versions. Imagine someone wants to enable multifd+postcopy in QEMU 9.2;
it'll be easier for them if the code is more clearly divided into
patches. Another scenario could be if the changes to the channel
discovery actually fix a bug that we're currently not aware of. That
would show up in a bisect session and we wouldn't want to apply a patch
that fixes a bug if the same patch also enables a feature.

>
> * I am thinking:
>     1. The _required_ changes are the _as part of_  the feature changes.
>     2. The refactoring of ram_save_target_page and moving of
> MULTIFD_MAGIC/VERSION macros to multifd.h patch can be termed as _in
> preparation for_ the feature. Of these
>         - Refactoring of 'ram_save_target_page' function patch is
> already pulled upstream.
>             -> https://gitlab.com/qemu-project/qemu/-/commit/bc38dc2f5f350310724fd7d4f0a09f8c3a4811fa
>         - Similarly moving of MULTIFD_ macros patch can be pulled. It
> has not changed for many revisions.
>     3. The _essential_ changes are further refinement of the feature
> (multifd+postcopy enablement).
>
> * IMO, we can split patches as:
>      - One patch for -> _required_ OR _as part of_ the feature changes
>      - One patch for -> MULTIFD_ macros as _in preparation for_ the
> feature change
>      - One patch for -> _essential_  changes : flush and sync related
>      - Two patches for qtest cases related changes.
> This division also seems helpful for git bisect and backporting
> related concerns.
>
> * Currently we are divided over what constitutes:  _required_  OR  _as
> part of_ the feature changes.
>     - Hence the request for individual opinions towards that.
>
> * If we want to merge _required_/_as part of_ the feature  and
> _essential_  changes together in one patch - I'm okay.
> * If we want to split the _required_ / _as part of_ the feature patch
> further into two, we need to define exactly how we do that division. -
> [???]
>

We need an extra patch that reads:

 migration: Refactor channel discovery mechanism
 
 The various logical migration channels don't have a standardized way of
 advertising themselves and their connections may be seen out of order
 by the migration destination. When a new connection arrives, the
 incoming migration currently makes use of heuristics to determine which
 channel it belongs to.
 
 The next few patches will need to change how the multifd and postcopy
 capabilities interact and that affects the channel discovery heuristic.
 
 Refactor the channel discovery heuristic to make it less opaque and
 simplify the subsequent patches.
 
 <some description of the new code which might be pertinent>
 ---

You'd move all of the channel discovery code into this patch. Some of it
will be unreachable because multifd is not yet allowed with postcopy,
but that's fine. You can mention it in the commit message.

About moving the code out of migration.c, it was a suggestion that
you're free to push back on. Ideally, doing the work would be faster than
arguing against it on the mailing list. But that's fine.

About the hang in the test. It doesn't reproduce often, but once it
does, it hangs forever (although I haven't waited that long).

> * I have shared my thoughts above, I won't hold on to it unduly.  We
> need to find a way to move forward. I'm willing to go with either way
> we decide.
>
> Thank you.
> ---
>   - Prasad


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-03-05 13:56             ` Fabiano Rosas
@ 2025-03-06  7:51               ` Prasad Pandit
  2025-03-06 13:48                 ` Fabiano Rosas
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-06  7:51 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Peter Xu, qemu-devel, berrange, Prasad Pandit

Hello Fabiano,

On Wed, 5 Mar 2025 at 19:26, Fabiano Rosas <farosas@suse.de> wrote:
> Note that none of this is out of the ordinary, you'll find such
> discussions in any thread on this community. It may feel arbitrary to
> you because that's tacit knowledge we gathered along the years.

* I understand. I don't find it arbitrary.

> We need an extra patch that reads:
>
>  migration: Refactor channel discovery mechanism
>
>  The various logical migration channels don't have a standardized way of
>  advertising themselves and their connections may be seen out of order
>  by the migration destination. When a new connection arrives, the
>  incoming migration currently makes use of heuristics to determine which
>  channel it belongs to.
>
>  The next few patches will need to change how the multifd and postcopy
>  capabilities interact and that affects the channel discovery heuristic.
>
>  Refactor the channel discovery heuristic to make it less opaque and
>  simplify the subsequent patches.
>
>  <some description of the new code which might be pertinent>
>  ---
>
> You'd move all of the channel discovery code into this patch. Some of it
> will be unreachable because multifd is not yet allowed with postcopy,
> but that's fine. You can mention it in the commit message.

Please see:
    -> https://privatebin.net/?dad6f052dd986f9f#FULnfrCV29NkQpvsQyvWuU4HdYjDwFbUPbDtvLro7mwi

* Does this division look okay?

> About moving the code out of migration.c, it was a suggestion that
> you're free to push back on. Ideally, doing the work would be faster than
> arguing against it on the mailing list. But that's fine.

* Same here, I'm not against moving that code part to connection.c OR
doing the work. My suggestion has been to do that movement in another
series and not try to do everything in this one series.

> About the hang in the test. It doesn't reproduce often, but once it
> does, it hangs forever (although I haven't waited that long).

* Okay, I'm not seeing it or able to reproduce it across 3 different
machines. One is my laptop and the other 2 are servers wherein I'm
testing migrations of guests with 64G/128G of RAM and guest dirtying
memory to the tune of 68M/128M/256M bytes. I'll keep an eye on it if I
find something.

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 0/5] Allow to enable multifd and postcopy migration together
  2025-03-06  7:51               ` Prasad Pandit
@ 2025-03-06 13:48                 ` Fabiano Rosas
  0 siblings, 0 replies; 37+ messages in thread
From: Fabiano Rosas @ 2025-03-06 13:48 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: Peter Xu, qemu-devel, berrange, Prasad Pandit

Prasad Pandit <ppandit@redhat.com> writes:

> Hello Fabiano,
>
> On Wed, 5 Mar 2025 at 19:26, Fabiano Rosas <farosas@suse.de> wrote:
>> Note that none of this is out of the ordinary, you'll find such
>> discussions in any thread on this community. It may feel arbitrary to
>> you because that's tacit knowledge we gathered along the years.
>
> * I understand. I don't find it arbitrary.
>
>> We need an extra patch that reads:
>>
>>  migration: Refactor channel discovery mechanism
>>
>>  The various logical migration channels don't have a standardized way of
>>  advertising themselves and their connections may be seen out of order
>>  by the migration destination. When a new connection arrives, the
>>  incoming migration currently makes use of heuristics to determine which
>>  channel it belongs to.
>>
>>  The next few patches will need to change how the multifd and postcopy
>>  capabilities interact and that affects the channel discovery heuristic.
>>
>>  Refactor the channel discovery heuristic to make it less opaque and
>>  simplify the subsequent patches.
>>
>>  <some description of the new code which might be pertinent>
>>  ---
>>
>> You'd move all of the channel discovery code into this patch. Some of it
>> will be unreachable because multifd is not yet allowed with postcopy,
>> but that's fine. You can mention it in the commit message.
>
> Please see:
>     -> https://privatebin.net/?dad6f052dd986f9f#FULnfrCV29NkQpvsQyvWuU4HdYjDwFbUPbDtvLro7mwi
>
> * Does this division look okay?
>

Yes.

>> About moving the code out of migration.c, it was a suggestion that
>> you're free to push back on. Ideally, doing the work would be faster than
>> arguing against it on the mailing list. But that's fine.
>
> * Same here, I'm not against moving that code part to connection.c OR
> doing the work. My suggestion has been to do that movement in another
> series and not try to do everything in this one series.
>
>> About the hang in the test. It doesn't reproduce often, but once it
>> does, it hangs forever (although I haven't waited that long).
>
> * Okay, I'm not seeing it or able to reproduce it across 3 different
> machines. One is my laptop and the other 2 are servers wherein I'm
> testing migrations of guests with 64G/128G of RAM and guest dirtying
> memory to the tune of 68M/128M/256M bytes. I'll keep an eye on it if I
> find something.

Usually a loaded (or slow) machine is needed to reproduce multifd
synchronization issues. Sometimes running the test in a loop in parallel
with some other workload helps to uncover them. The CI also tends to
have slower machines that hit these problems.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-05 12:54               ` Peter Xu
@ 2025-03-07 11:45                 ` Prasad Pandit
  2025-03-07 22:48                   ` Peter Xu
  2025-03-07 22:51                   ` Peter Xu
  0 siblings, 2 replies; 37+ messages in thread
From: Prasad Pandit @ 2025-03-07 11:45 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hello Peter,

On Wed, 5 Mar 2025 at 18:24, Peter Xu <peterx@redhat.com> wrote:
> > On Tue, 4 Mar 2025 at 20:05, Peter Xu <peterx@redhat.com> wrote:
> > > I think we need the header, the ram is a module.
> > > Do something similar to qemu_savevm_state_complete_precopy_iterable(), but
> > > do whatever a vmstate handler wants, so it'll be with a header.
> >
===
diff --git a/migration/migration.c b/migration/migration.c
index 65fc4f5eed..da2c49c303 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3401,9 +3401,10 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     if (!in_postcopy && must_precopy <= s->threshold_size
         && can_switchover && qatomic_read(&s->start_postcopy)) {
         if (migrate_multifd()) {
-            multifd_send_flush();
-            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
-            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
+/*          multifd_send_flush();
+ *          multifd_send_sync_main(MULTIFD_SYNC_ALL);
+ *          qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
+ */
+            qemu_savevm_state_complete_multifd(s->to_dst_file);
             multifd_send_shutdown();
         }
         if (postcopy_start(s, &local_err)) {
diff --git a/migration/savevm.c b/migration/savevm.c
index 0b71e988ba..c2b181b0cc 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1602,6 +1602,37 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
     return qemu_fflush(f);
 }

+int qemu_savevm_state_complete_multifd(QEMUFile *f)
+{
+    int ret;
+    SaveStateEntry *se;
+    int64_t start_ts_each, end_ts_each;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (strcmp(se->idstr, "ram")) {
+            continue;
+        }
+
+        start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+        trace_savevm_section_start(se->idstr, se->section_id);
+
+        save_section_header(f, se, QEMU_VM_SECTION_END);
+
+        ret = se->ops->save_live_complete_precopy(f, se->opaque);
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+        save_section_footer(f, se);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return -1;
+        }
+        end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+        trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id,
+                                    end_ts_each - start_ts_each);
+    }
+
+    return ret;
+}
+
 /* Give an estimate of the amount left to be transferred,
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
diff --git a/migration/savevm.h b/migration/savevm.h
index 91ae703925..e3789984a1 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -70,5 +70,6 @@ int qemu_load_device_state(QEMUFile *f);
 int qemu_loadvm_approve_switchover(void);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy);
+int qemu_savevm_state_complete_multifd(QEMUFile *f);

 #endif
====

63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
     OK             164.02s   79 subtests passed
63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
     OK             164.99s   79 subtests passed
====

* Does the above patch look okay?

====
Guest not dirtying RAM:
===================
2025-03-07 10:57:28.740+0000: initiating migration
2025-03-07T10:57:28.823166Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T10:58:04.450758Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T10:58:04.711523Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07 10:58:05.078+0000: shutting down, reason=migrated

2025-03-07 10:59:44.322+0000: initiating migration
2025-03-07T10:59:44.394714Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T11:00:20.198360Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T11:00:20.279135Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07 11:00:20.571+0000: shutting down, reason=migrated
====

Guest dirtying RAM: $ dd if=/dev/urandom of=/tmp/random.bin bs=256M count=32 ...
================
2025-03-07 11:04:00.281+0000: initiating migration
2025-03-07T11:04:00.359426Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T11:05:51.406988Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T11:06:56.899920Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07 11:08:02.376+0000: shutting down, reason=migrated

2025-03-07 11:09:19.295+0000: initiating migration
2025-03-07T11:09:19.372012Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T11:11:24.217610Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07T11:12:35.342709Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called
2025-03-07 11:13:48.947+0000: shutting down, reason=migrated
====

* When a guest is running some workload (dirtying memory), it seems to
take a little more time before switching to the Postcopy phase.

> Let me know if you want me to write the patches.  That's almost the only
> thing left to make it clearer..

* If the above patch is not okay,  it'll help if you share your
version of it. Meanwhile I'll split the patch-2 and re-arrange other
patches.

Thank you.
---
  - Prasad



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-07 11:45                 ` Prasad Pandit
@ 2025-03-07 22:48                   ` Peter Xu
  2025-03-10  7:36                     ` Prasad Pandit
  2025-03-13 12:43                     ` Prasad Pandit
  2025-03-07 22:51                   ` Peter Xu
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Xu @ 2025-03-07 22:48 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Fri, Mar 07, 2025 at 05:15:03PM +0530, Prasad Pandit wrote:
> Hello Peter,
> 
> On Wed, 5 Mar 2025 at 18:24, Peter Xu <peterx@redhat.com> wrote:
> > > On Tue, 4 Mar 2025 at 20:05, Peter Xu <peterx@redhat.com> wrote:
> > > > I think we need the header, the ram is a module.
> > > > Do something similar to qemu_savevm_state_complete_precopy_iterable(), but do

[1]

> > > > whatever a vmstate handler wants, so it'll be with a header.
> > >
> ===
> diff --git a/migration/migration.c b/migration/migration.c
> index 65fc4f5eed..da2c49c303 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3401,9 +3401,10 @@ static MigIterateState
> migration_iteration_run(MigrationState *s)
>      if (!in_postcopy && must_precopy <= s->threshold_size
>          && can_switchover && qatomic_read(&s->start_postcopy)) {
>          if (migrate_multifd()) {
> -            multifd_send_flush();
> -            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
> -            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
> +/*          multifd_send_flush();
> + *          multifd_send_sync_main(MULTIFD_SYNC_ALL);
> + *          qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
> + */
> +            qemu_savevm_state_complete_multifd(s->to_dst_file);
>              multifd_send_shutdown();
>          }

Please move all of them to the entry of postcopy_start().

>          if (postcopy_start(s, &local_err)) {
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 0b71e988ba..c2b181b0cc 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1602,6 +1602,37 @@ int qemu_savevm_state_complete_precopy(QEMUFile
> *f, bool iterable_only)
>      return qemu_fflush(f);
>  }
> 
> +int qemu_savevm_state_complete_multifd(QEMUFile *f)

I still like the name I provided because it's more generic, above [1].

> +{
> +    int ret;
> +    SaveStateEntry *se;
> +    int64_t start_ts_each, end_ts_each;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (strcmp(se->idstr, "ram")) {
> +            continue;
> +        }
> +
> +        start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
> +        trace_savevm_section_start(se->idstr, se->section_id);
> +
> +        save_section_header(f, se, QEMU_VM_SECTION_END);

Maybe it should be SECTION_PART.   So we provide all iterators a chance to
do something right before switching to postcopy, but before the VM stops.  RAM
can register for it.

> +
> +        ret = se->ops->save_live_complete_precopy(f, se->opaque);

Maybe we don't want to run the whole iteration but only flush.  I think for
simplicity we can make a new hook.

> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
> +        save_section_footer(f, se);
> +        if (ret < 0) {
> +            qemu_file_set_error(f, ret);
> +            return -1;
> +        }
> +        end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
> +        trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id,
> +                                    end_ts_each - start_ts_each);

We do not need to account this in the downtime stats; the VM is still running
at this point.

> +    }
> +
> +    return ret;
> +}
> +
>  /* Give an estimate of the amount left to be transferred,
>   * the result is split into the amount for units that can and
>   * for units that can't do postcopy.
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 91ae703925..e3789984a1 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -70,5 +70,6 @@ int qemu_load_device_state(QEMUFile *f);
>  int qemu_loadvm_approve_switchover(void);
>  int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>          bool in_postcopy);
> +int qemu_savevm_state_complete_multifd(QEMUFile *f);
> 
>  #endif
> ====
> 
> 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
>      OK             164.02s   79 subtests passed
> 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
>      OK             164.99s   79 subtests passed
> ====
> 
> * Does the above patch look okay?
> 
> ====
> Guest not dirtying RAM:
> ===================
> 2025-03-07 10:57:28.740+0000: initiating migration
> 2025-03-07T10:57:28.823166Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T10:58:04.450758Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T10:58:04.711523Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07 10:58:05.078+0000: shutting down, reason=migrated
> 
> 2025-03-07 10:59:44.322+0000: initiating migration
> 2025-03-07T10:59:44.394714Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T11:00:20.198360Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T11:00:20.279135Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07 11:00:20.571+0000: shutting down, reason=migrated
> ====
> 
> Guest dirtying RAM: $ dd if=/dev/urandom of=/tmp/random.bin bs=256M count=32 ...
> ================
> 2025-03-07 11:04:00.281+0000: initiating migration
> 2025-03-07T11:04:00.359426Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T11:05:51.406988Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T11:06:56.899920Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07 11:08:02.376+0000: shutting down, reason=migrated
> 
> 2025-03-07 11:09:19.295+0000: initiating migration
> 2025-03-07T11:09:19.372012Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T11:11:24.217610Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07T11:12:35.342709Z qemu-system-x86_64: info:
> multifd_ram_flush_and_sync: called
> 2025-03-07 11:13:48.947+0000: shutting down, reason=migrated
> ====
> 
> * When a guest is running some workload (dirtying memory), it seems to
> take a little more time before switching to the Postcopy phase.
> 
> > Let me know if you want me to write the patches.  That's almost the only
> > thing left to make it clearer..
> 
> * If the above patch is not okay,  it'll help if you share your
> version of it. Meanwhile I'll split the patch-2 and re-arrange other
> patches.

Please see above, if that doesn't help you come up with a final version,
I'll write it.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-07 11:45                 ` Prasad Pandit
  2025-03-07 22:48                   ` Peter Xu
@ 2025-03-07 22:51                   ` Peter Xu
  2025-03-10 14:38                     ` Fabiano Rosas
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-03-07 22:51 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Fri, Mar 07, 2025 at 05:15:03PM +0530, Prasad Pandit wrote:
> diff --git a/migration/migration.c b/migration/migration.c
> index 65fc4f5eed..da2c49c303 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3401,9 +3401,10 @@ static MigIterateState
> migration_iteration_run(MigrationState *s)
>      if (!in_postcopy && must_precopy <= s->threshold_size
>          && can_switchover && qatomic_read(&s->start_postcopy)) {
>          if (migrate_multifd()) {
> -            multifd_send_flush();
> -            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
> -            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
> +/*          multifd_send_flush();
> + *          multifd_send_sync_main(MULTIFD_SYNC_ALL);
> + *          qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
> + */
> +            qemu_savevm_state_complete_multifd(s->to_dst_file);
>              multifd_send_shutdown();

Forgot to mention one thing:

If you do flush and sync, IMHO we can keep the threads there and remove
this shutdown, as long as we are sure it'll be properly shutdown when
cleanup.

With the assertion in dest threads, I think it should be OK.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-07 22:48                   ` Peter Xu
@ 2025-03-10  7:36                     ` Prasad Pandit
  2025-03-13 12:43                     ` Prasad Pandit
  1 sibling, 0 replies; 37+ messages in thread
From: Prasad Pandit @ 2025-03-10  7:36 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hi,

On Sat, 8 Mar 2025 at 04:18, Peter Xu <peterx@redhat.com> wrote:
> Please move all of them to the entry of postcopy_start().

* Okay.

> [1]
>
> > +int qemu_savevm_state_complete_multifd(QEMUFile *f)
> I still like the name I provided because it's more generic, above [1].

===
>  Maybe the easiest as of now is one more hook like
> qemu_savevm_state_complete_precopy_prepare(),
===

* ie. use  ->  'qemu_savevm_state_complete_precopy_prepare'  name?
Should it be  _postcopy_prepare?

>> +        if (strcmp(se->idstr, "ram")) {
>> +            continue;
>> +        }
>> +        save_section_header(f, se, QEMU_VM_SECTION_END);
>
> Maybe it should be SECTION_PART.

* Will try with SECTION_PART.

>  So we provide all iterators a chance to
> do something right before switching to postcopy, but before the VM stops.  RAM
> can register for it.

* ...all iterators a chance to do something? Above function only
processes "ram" entries -> strcmp(se->idstr, "ram").

* ...RAM can register?

> > +
> > +        ret = se->ops->save_live_complete_precopy(f, se->opaque);
>
> Maybe we don't want to run the whole iteration but only flush.  I think for
> simplicity we can make a new hook.

* ..new hook? I tried calling multifd_ram_flush_and_sync() in place of
->save_live_complete_precopy(),  but it crashes QEMU.

> > +        end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
> > +        trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id,
> > +                                    end_ts_each - start_ts_each);
>
> We do not need to account
> If you do flush and sync, IMHO we can keep the threads there and remove
> this shutdown, as long as we are sure it'll be properly shutdown when
> cleanup. With the assertion in dest threads, I think it should be OK.

* Okay.

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-07 22:51                   ` Peter Xu
@ 2025-03-10 14:38                     ` Fabiano Rosas
  2025-03-10 17:08                       ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2025-03-10 14:38 UTC (permalink / raw)
  To: Peter Xu, Prasad Pandit; +Cc: qemu-devel, berrange, Prasad Pandit

Peter Xu <peterx@redhat.com> writes:

> On Fri, Mar 07, 2025 at 05:15:03PM +0530, Prasad Pandit wrote:
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 65fc4f5eed..da2c49c303 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3401,9 +3401,10 @@ static MigIterateState
>> migration_iteration_run(MigrationState *s)
>>      if (!in_postcopy && must_precopy <= s->threshold_size
>>          && can_switchover && qatomic_read(&s->start_postcopy)) {
>>          if (migrate_multifd()) {
>> -            multifd_send_flush();
>> -            multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
>> -            qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
>> +/*          multifd_send_flush();
>> + *          multifd_send_sync_main(MULTIFD_SYNC_ALL);
>> + *          qemu_savevm_send_multifd_recv_sync(s->to_dst_file);
>> + */
>> +            qemu_savevm_state_complete_multifd(s->to_dst_file);
>>              multifd_send_shutdown();
>
> Forgot to mention one thing:
>
> If you do flush and sync, IMHO we can keep the threads there and remove
> this shutdown, as long as we are sure it'll be properly shutdown when
> cleanup.
>
> With the assertion in dest threads, I think it should be OK.

Good point. Shutdown at random places makes it difficult to protect
against cleanup races.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-10 14:38                     ` Fabiano Rosas
@ 2025-03-10 17:08                       ` Prasad Pandit
  2025-03-10 19:58                         ` Fabiano Rosas
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-10 17:08 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Peter Xu, qemu-devel, berrange, Prasad Pandit

On Mon, 10 Mar 2025 at 20:09, Fabiano Rosas <farosas@suse.de> wrote:
> Good point. Shutdown at random places makes it difficult to protect
> against cleanup races.

* Just to understand, how and where do these races occur?

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-10 17:08                       ` Prasad Pandit
@ 2025-03-10 19:58                         ` Fabiano Rosas
  2025-03-11 10:01                           ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Fabiano Rosas @ 2025-03-10 19:58 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: Peter Xu, qemu-devel, berrange, Prasad Pandit

Prasad Pandit <ppandit@redhat.com> writes:

> On Mon, 10 Mar 2025 at 20:09, Fabiano Rosas <farosas@suse.de> wrote:
>> Good point. Shutdown at random places makes it difficult to protect
>> against cleanup races.
>
> * Just to understand, how and where do these races occur?
>

They occur when cleanup code is allowed to run when resources have not
yet been allocated or while the resources are still being accessed.

Having the shutdown routine at a single point when it's clear everything
else is ready for shutdown helps not only to avoid those issues, but
also when investigating them. Otherwise you'll have the same code
running at (potentially) completely different points in time and one of
those times the system might be in an unexpected state.
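A minimal sketch of the hazard and the fix: freeing shared state while a
worker thread may still touch it is a use-after-free waiting on timing, while
shutting down at one well-ordered point (join first, then free) avoids it.
Illustrative only, not QEMU code:
===
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *buf; } RecvState;

static void *worker(void *opaque)
{
    RecvState *s = opaque;
    memset(s->buf, 0, 4096);     /* still using the resource */
    return NULL;
}

int main(void)
{
    RecvState s = { .buf = malloc(4096) };
    pthread_t th;

    pthread_create(&th, NULL, worker, &s);

    /* Racy variant: calling free(s.buf) here could run while the worker
     * is still writing -- a use-after-free depending on timing. */

    /* Single, well-ordered shutdown point: wait for the worker first,
     * then release the resources it was using. */
    pthread_join(th, NULL);
    free(s.buf);
    return 0;
}
===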


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-10 19:58                         ` Fabiano Rosas
@ 2025-03-11 10:01                           ` Prasad Pandit
  2025-03-11 12:44                             ` Fabiano Rosas
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-11 10:01 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: Peter Xu, qemu-devel, berrange, Prasad Pandit

Hi,

On Tue, 11 Mar 2025 at 01:28, Fabiano Rosas <farosas@suse.de> wrote:
> They occur when cleanup code is allowed to run when resources have not
> yet been allocated or while the resources are still being accessed.
>
> Having the shutdown routine at a single point when it's clear everything
> else is ready for shutdown helps not only to avoid those issues, but
> also when investigating them. Otherwise you'll have the same code
> running at (potentially) completely different points in time and one of
> those times the system might be in an unexpected state.

* I see. It's not clear when this would happen in a production deployment.
===
     if (migrate_multifd()) {
          multifd_send_shutdown();  <= [1]
     }

     postcopy_start(...)  <= [2]
===

* There was another scenario regarding multifd shutdown: the EOF or
shutdown message sent via [1] on each multifd connection reaches the
destination _later_ than the Postcopy phase start via [2]. And this
may result in the multifd_recv threads on the destination overwriting
memory pages written by the postcopy thread, thus corrupting the guest
state.

* Do we have any bugs/issues where these above things happened? To see
the real circumstances under which it happened?

* There seems to be some disconnect between the kind of scenarios we
are considering and the minimal requirements for live migrations: a
stable network with real good bandwidth. If we test live migration
with guest dirtying RAM to the tune of 64M/128M/256M/512M bytes, that
assumes/implies that the network bandwidth is much more than 512Mbps,
no?

Thank you.
---
  - Prasad



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-11 10:01                           ` Prasad Pandit
@ 2025-03-11 12:44                             ` Fabiano Rosas
  0 siblings, 0 replies; 37+ messages in thread
From: Fabiano Rosas @ 2025-03-11 12:44 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: Peter Xu, qemu-devel, berrange, Prasad Pandit

Prasad Pandit <ppandit@redhat.com> writes:

> Hi,
>
> On Tue, 11 Mar 2025 at 01:28, Fabiano Rosas <farosas@suse.de> wrote:
>> They occur when cleanup code is allowed to run when resources have not
>> yet been allocated or while the resources are still being accessed.
>>
>> Having the shutdown routine at a single point when it's clear everything
>> else is ready for shutdown helps not only to avoid those issues, but
>> also when investigating them. Otherwise you'll have the same code
>> running at (potentially) completely different points in time and one of
>> those times the system might be in an unexpected state.
>
> * I see. It's not clear when this would happen in a production deployment.
> ===
>      if (migrate_multifd()) {
>           multifd_send_shutdown();  <= [1]
>      }
>
>      postcopy_start(...)  <= [2]
> ===
>
> * There was another scenario regarding multifd shutdown as: the EOF or
> shutdown message sent via [1] on each multifd connection reaches the
> destination _later_ than the Postcopy phase start via [2]. And this
> may result in multifd_recv_threads on the destination overwriting
> memory pages written by postcopy thread, thus corrupting the guest
> state.

Isn't that the point? To add a sync for this which would allow the
shutdown to not be added?

>
> * Do we have any bugs/issues where these above things happened? To see
> the real circumstances under which it happened?
>

We do. They don't come with a description of the circumstances. You're
lucky if you get a coredump. You can peruse `git log migration/multifd`,
I'd say most of the work in the recent years has been solving
concurrency issues.

> * There seems to be some disconnect between the kind of scenarios we
> are considering and the minimal requirements for live migrations: a
> stable network with real good bandwidth.

There's no such requirement. Besides, the topic is not failed migrations
due to lack of resources. We're talking about correctness issues that
are hard to spot. Those should always be fixed when found, independently
of what the production environment is expected to be.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-07 22:48                   ` Peter Xu
  2025-03-10  7:36                     ` Prasad Pandit
@ 2025-03-13 12:43                     ` Prasad Pandit
  2025-03-13 20:08                       ` Peter Xu
  1 sibling, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-13 12:43 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hello Peter,

On Sat, 8 Mar 2025 at 04:18, Peter Xu <peterx@redhat.com> wrote:
> [1]
> Please move all of them to the entry of postcopy_start().
> I still like the name I provided because it's more generic, above [1].
> Maybe it should be SECTION_PART.   So we provide all iterators a chance to
> do something right before switching to postcopy, but before the VM stops.  RAM
> can register for it.
> Maybe we don't want to run the whole iteration but only flush.  I think for
> simplicity we can make a new hook.
> We do not need to account
> Please see above, if that doesn't help you come up with a final version,
> I'll write it.

===
diff --git a/migration/migration.c b/migration/migration.c
index f97bb2777f..9ef8e8ee1d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2700,6 +2700,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     QIOChannelBuffer *bioc;
     QEMUFile *fb;

+    if (migrate_multifd()) {
+        qemu_savevm_state_postcopy_prepare(ms->to_dst_file);
+    }
+
     /*
      * Now we're 100% sure to switch to postcopy, so JSON writer won't be
      * useful anymore.  Free the resources early if it is there.  Clearing

diff --git a/migration/ram.c b/migration/ram.c
index 6fd88cbf2a..13a8c63660 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1143,6 +1143,15 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
     return len;
 }

+int ram_save_zero_page(QEMUFile *f, void *opaque)
+{
+    RAMState *rs = *(RAMState **)opaque;
+    PageSearchStatus *pss = &rs->pss[RAM_CHANNEL_PRECOPY];
+    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
+    return save_zero_page(rs, pss, offset);
+}
+
 /*
  * @pages: the number of pages written by the control path,
  *        < 0 - error
diff --git a/migration/ram.h b/migration/ram.h
index 921c39a2c5..425c9ad9fd 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -122,4 +122,5 @@ void ram_write_tracking_prepare(void);
 int ram_write_tracking_start(void);
 void ram_write_tracking_stop(void);

+int ram_save_zero_page(QEMUFile *f, void *opaque);
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index ce158c3512..9bb0f03942 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1676,6 +1676,36 @@ int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
     return qemu_fflush(f);
 }

+int qemu_savevm_state_postcopy_prepare(QEMUFile *f)
+{
+    int ret = 0;
+    SaveStateEntry *se;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (strcmp(se->idstr, "ram")) {
+            continue;
+        }
+
+        save_section_header(f, se, QEMU_VM_SECTION_PART);
+
+        ram_save_zero_page(f, se->opaque);
+        if ((ret = multifd_ram_flush_and_sync(f)) < 0) {
+            return ret;
+        }
+        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+        save_section_footer(f, se);
+        if (ret < 0) {
+            qemu_file_set_error(f, ret);
+            return -1;
+        }
+    }
+
+    return ret;
+}
+
+
 /* Give an estimate of the amount left to be transferred,
  * the result is split into the amount for units that can and
  * for units that can't do postcopy.
diff --git a/migration/savevm.h b/migration/savevm.h
index 138c39a7f9..7d4ff25ca3 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -74,4 +74,5 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 bool qemu_loadvm_load_state_buffer(const char *idstr, uint32_t instance_id,
                                    char *buf, size_t len, Error **errp);

+int qemu_savevm_state_postcopy_prepare(QEMUFile *f);
 #endif
===

host-1] 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
             OK             139.87s   79 subtests passed
host-2] 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test
             OK             137.52s   79 subtests passed

2025-03-13 12:11:01.653+0000: initiating migration
2025-03-13T12:11:01.690247Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called: 1, 0
2025-03-13T12:12:05.342194Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called: 1, 0
2025-03-13 12:12:54.688+0000: shutting down, reason=migrated

2025-03-13 12:21:10.967+0000: initiating migration
2025-03-13T12:21:10.994878Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called: 1, 0
2025-03-13T12:22:15.955266Z qemu-system-x86_64: info:
multifd_ram_flush_and_sync: called: 1, 0
2025-03-13 12:23:02.107+0000: shutting down, reason=migrated
===

* Does this patch look okay?

Thank you.
---
  - Prasad



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-13 12:43                     ` Prasad Pandit
@ 2025-03-13 20:08                       ` Peter Xu
  2025-03-17 12:30                         ` Prasad Pandit
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Xu @ 2025-03-13 20:08 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Thu, Mar 13, 2025 at 06:13:24PM +0530, Prasad Pandit wrote:
> +int qemu_savevm_state_postcopy_prepare(QEMUFile *f)
> +{
> +    int ret = 0;
> +    SaveStateEntry *se;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (strcmp(se->idstr, "ram")) {
> +            continue;
> +        }
> +
> +        save_section_header(f, se, QEMU_VM_SECTION_PART);
> +
> +        ram_save_zero_page(f, se->opaque);

I'll stop requesting a why here... but I think this is another example that
even if all the tests pass it may not be correct.

> +        if ((ret = multifd_ram_flush_and_sync(f)) < 0) {
> +            return ret;
> +        }
> +        qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
> +
> +        trace_savevm_section_end(se->idstr, se->section_id, ret);
> +        save_section_footer(f, se);
> +        if (ret < 0) {
> +            qemu_file_set_error(f, ret);
> +            return -1;
> +        }
> +    }
> +
> +    return ret;
> +}

[...]

> * Does this patch look okay?

I've written the relevant patches below, please review and take them if you
agree with the changes.

Thanks,

===8<===

From f9343dfc777ef04168443e86a1fa3922296ea563 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 13 Mar 2025 15:34:10 -0400
Subject: [PATCH 1/2] migration: Add save_postcopy_prepare() savevm handler

Add a savevm handler so a module can opt in to sending extra sections right
before postcopy starts, and before the VM is stopped.

RAM will start to use this new savevm handler in the next patch to do flush
and sync for multifd pages.

Note that we choose to do it before the VM is stopped because the only
current potential user is not sensitive to VM status, and doing it while
the VM still runs avoids enlarging the postcopy downtime.

It is still a bit unfortunate that we need to introduce such a new savevm
handler for a single use case; however, it is so far the cleanest option.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/migration/register.h | 15 +++++++++++++++
 migration/savevm.h           |  1 +
 migration/migration.c        |  4 ++++
 migration/savevm.c           | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 53 insertions(+)

diff --git a/include/migration/register.h b/include/migration/register.h
index c041ce32f2..b79dc81b8d 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -189,6 +189,21 @@ typedef struct SaveVMHandlers {
 
     /* This runs outside the BQL!  */
 
+    /**
+     * @save_postcopy_prepare
+     *
+     * This hook will be invoked on the source side right before switching
+     * to postcopy (before the VM is stopped).
+     *
+     * @f:      QEMUFile where to send the data
+     * @opaque: Data pointer passed to register_savevm_live()
+     * @errp:   Error** used to report error message
+     *
+     * Returns: true if succeeded, false if an error occurred.  When false is
+     * returned, @errp must be set.
+     */
+    bool (*save_postcopy_prepare)(QEMUFile *f, void *opaque, Error **errp);
+
     /**
      * @state_pending_estimate
      *
diff --git a/migration/savevm.h b/migration/savevm.h
index 138c39a7f9..2d5e9c7166 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -45,6 +45,7 @@ void qemu_savevm_state_pending_exact(uint64_t *must_precopy,
 void qemu_savevm_state_pending_estimate(uint64_t *must_precopy,
                                         uint64_t *can_postcopy);
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy);
+bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
 int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
diff --git a/migration/migration.c b/migration/migration.c
index d46e776e24..212f6b4145 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2707,6 +2707,10 @@ static int postcopy_start(MigrationState *ms, Error **errp)
         }
     }
 
+    if (!qemu_savevm_state_postcopy_prepare(ms->to_dst_file, errp)) {
+        return -1;
+    }
+
     trace_postcopy_start();
     bql_lock();
     trace_postcopy_start_set_run();
diff --git a/migration/savevm.c b/migration/savevm.c
index ce158c3512..23ef4c7dc9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1523,6 +1523,39 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     qemu_fflush(f);
 }
 
+bool qemu_savevm_state_postcopy_prepare(QEMUFile *f, Error **errp)
+{
+    SaveStateEntry *se;
+    bool ret;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (!se->ops || !se->ops->save_postcopy_prepare) {
+            continue;
+        }
+
+        if (se->ops->is_active) {
+            if (!se->ops->is_active(se->opaque)) {
+                continue;
+            }
+        }
+
+        trace_savevm_section_start(se->idstr, se->section_id);
+
+        save_section_header(f, se, QEMU_VM_SECTION_PART);
+        ret = se->ops->save_postcopy_prepare(f, se->opaque, errp);
+        save_section_footer(f, se);
+
+        trace_savevm_section_end(se->idstr, se->section_id, ret);
+
+        if (!ret) {
+            assert(*errp);
+            return false;
+        }
+    }
+
+    return true;
+}
+
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 {
     int64_t start_ts_each, end_ts_each;
-- 
2.47.0
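
As a usage illustration (hypothetical foo_* names, not part of the patch;
RAM's real implementation follows in the next patch), a handler opts in
like this:

    static bool foo_save_postcopy_prepare(QEMUFile *f, void *opaque,
                                          Error **errp)
    {
        FooState *s = opaque;

        /* Send whatever must reach the destination before postcopy
         * begins.  foo_flush_pending() is a made-up helper used only
         * for this illustration. */
        if (!foo_flush_pending(s, f)) {
            error_setg(errp, "foo: flush before postcopy failed");
            return false;   /* caller asserts *errp is set on failure */
        }

        return true;
    }

    static SaveVMHandlers foo_handlers = {
        .save_postcopy_prepare = foo_save_postcopy_prepare,
    };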


From 299e1cdd9b28802f361ed012673825685e30f965 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Thu, 13 Mar 2025 15:56:01 -0400
Subject: [PATCH 2/2] migration/ram: Implement save_postcopy_prepare()

Implement save_postcopy_prepare(), preparing for the enablement of both
multifd and postcopy.

Please see the code comment for the rationale.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/ram.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 424df6d9f1..119e7d3ac2 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -4420,6 +4420,42 @@ static int ram_resume_prepare(MigrationState *s, void *opaque)
     return 0;
 }
 
+static bool ram_save_postcopy_prepare(QEMUFile *f, void *opaque, Error **errp)
+{
+    int ret;
+
+    if (migrate_multifd()) {
+        /*
+         * When multifd is enabled, the source QEMU needs to make sure
+         * all the pages queued before postcopy starts have been flushed.
+         *
+         * Meanwhile, the load of these pages must happen before switching
+         * to postcopy.  That is because loading guest pages in the multifd
+         * recv threads is (so far) not atomic, so the load cannot overlap
+         * with vCPUs running on the destination side.
+         *
+         * This flush and sync guarantees those pages are loaded _before_
+         * postcopy starts on the destination.  The rationale is that this
+         * happens before the VM stops (and before the source QEMU sends
+         * the rest of the postcopy messages).  So by the time the
+         * destination QEMU receives the postcopy messages, it must have
+         * received the sync message on the main channel (either
+         * RAM_SAVE_FLAG_MULTIFD_FLUSH or RAM_SAVE_FLAG_EOS), and that
+         * message guarantees that all guest pages previously queued in
+         * the multifd channels have been completely loaded.
+         */
+        ret = multifd_ram_flush_and_sync(f);
+        if (ret < 0) {
+            error_setg(errp, "%s: multifd flush and sync failed", __func__);
+            return false;
+        }
+    }
+
+    qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
+
+    return true;
+}
+
 void postcopy_preempt_shutdown_file(MigrationState *s)
 {
     qemu_put_be64(s->postcopy_qemufile_src, RAM_SAVE_FLAG_EOS);
@@ -4439,6 +4475,7 @@ static SaveVMHandlers savevm_ram_handlers = {
     .load_setup = ram_load_setup,
     .load_cleanup = ram_load_cleanup,
     .resume_prepare = ram_resume_prepare,
+    .save_postcopy_prepare = ram_save_postcopy_prepare,
 };
 
 static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
-- 
2.47.0
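
To make the ordering argument in the comment concrete, the destination-side
picture is roughly this (a sketch; stream contents simplified):

    main channel:      ... RAM data ... [sync flag] | postcopy commands ...
                                            |
                                            v
                       multifd_recv_sync_main() blocks until every multifd
                       recv thread has finished writing its queued pages

Because the main channel is consumed strictly in order, the postcopy
commands can only be processed after the sync returns, i.e. after all
multifd pages have landed.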


-- 
Peter Xu




* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-13 20:08                       ` Peter Xu
@ 2025-03-17 12:30                         ` Prasad Pandit
  2025-03-17 15:00                           ` Peter Xu
  0 siblings, 1 reply; 37+ messages in thread
From: Prasad Pandit @ 2025-03-17 12:30 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hi,

On Fri, 14 Mar 2025 at 01:40, Peter Xu <peterx@redhat.com> wrote:
> +        save_section_header(f, se, QEMU_VM_SECTION_PART);
> +        ram_save_zero_page(f, se->opaque);
> I'll stop requesting a why here...

* Earlier in this thread you mentioned 'We need a header'. I took that
to mean a 'RAM page' header, not save_section_header(). A section type
(QEMU_VM_COMMAND) was also sent by qemu_savevm_command_send().
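
* For reference, save_section_header() puts roughly the following on the
wire -- a simplified sketch from my reading of migration/savevm.c, not a
verbatim copy:

    static void save_section_header(QEMUFile *f, SaveStateEntry *se,
                                    uint8_t section_type)
    {
        qemu_put_byte(f, section_type);      /* e.g. QEMU_VM_SECTION_PART */
        qemu_put_be32(f, se->section_id);

        if (section_type == QEMU_VM_SECTION_FULL ||
            section_type == QEMU_VM_SECTION_START) {
            /* idstr length + bytes, then instance id and version id */
            size_t len = strlen(se->idstr);

            qemu_put_byte(f, len);
            qemu_put_buffer(f, (uint8_t *)se->idstr, len);
            qemu_put_be32(f, se->instance_id);
            qemu_put_be32(f, se->version_id);
        }
    }

So a QEMU_VM_SECTION_PART header is just the type byte plus the section id,
which is what the destination uses to route data to se->ops->load_state().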

> but I think this is another example that even if all the tests pass it may not be correct.

* This is also an example of how communication is hard.

> [... both patches quoted in full; trimmed, see the previous message ...]

* I understand the infrastructural changes: they allow taking
section-specific action before postcopy starts. It's not clear how
tying the flush and sync to a RAM section helps, because on the
destination side the 'section' is only used to call
se->ops->load_state() -> ram_load() -> ram_load_precopy() ->
multifd_recv_sync_main().

* To confirm:
    - The benefit of this approach is that the 'flush and sync' works
via the vmstate_load() -> se->ops->load_state() -> ram_load() ->
ram_load_precopy() sequence (see the sketch below)?
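
* A simplified sketch of that destination-side dispatch, from my reading
of ram_load_precopy() in migration/ram.c (not a verbatim copy):

    case RAM_SAVE_FLAG_MULTIFD_FLUSH:
        /* Sent by multifd_ram_flush_and_sync(): wait until every multifd
         * recv thread has consumed and written out its queued pages. */
        multifd_recv_sync_main();
        break;

    case RAM_SAVE_FLAG_EOS:
        /* End of section; with the legacy setting, the sync piggybacks
         * on EOS instead of the dedicated flush flag. */
        if (migrate_multifd() &&
            migrate_multifd_flush_after_each_section()) {
            multifd_recv_sync_main();
        }
        break;

Because these flags travel inside a "ram" section, the generic loadvm loop
routes them through se->ops->load_state(), which seems to be why the flush
and sync has to be wrapped in a RAM section rather than sent as a command.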

* Thank you for the patches, I'll send a revised patchset including them.

Thank you.
---
  - Prasad




* Re: [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command
  2025-03-17 12:30                         ` Prasad Pandit
@ 2025-03-17 15:00                           ` Peter Xu
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Xu @ 2025-03-17 15:00 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Mon, Mar 17, 2025 at 06:00:14PM +0530, Prasad Pandit wrote:
> Hi,
> 
> On Fri, 14 Mar 2025 at 01:40, Peter Xu <peterx@redhat.com> wrote:
> > +        save_section_header(f, se, QEMU_VM_SECTION_PART);
> > +        ram_save_zero_page(f, se->opaque);
> > I'll stop requesting a why here...
> 
> * Earlier in this thread you mentioned 'We need a header'. I took that
> to mean a 'RAM page' header, not save_section_header(). A section type

The question is not about the header, it's about the zero page, and why we
send this zero page out of the blue.

IIUC it can corrupt a page if it uses whatever was cached in the PSS from
the previous round, and if that page used to be non-zero.
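
To make the hazard concrete (a hypothetical sequence; ram_save_zero_page()
is the function added by the patch under discussion):

    /* pss->block / pss->page still point at whatever page the previous
     * precopy iteration stopped on. */
    ram_save_zero_page(f, se->opaque);
    /* If that cached page is in fact non-zero, the destination zero-fills
     * it when it sees the zero-page record -> silent guest memory
     * corruption. */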

> (QEMU_VM_COMMAND) was also sent by qemu_savevm_command_send().

But that should be reserved for generic VM operations.  We have a few
outliers, but those are very special cases.  For this case we'd better not
add another outlier, because it is neither special nor necessary.

Thanks,

-- 
Peter Xu




end of thread

Thread overview: 37+ messages
2025-02-28 12:17 [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
2025-02-28 12:17 ` [PATCH v7 1/5] migration/multifd: move macros to multifd header Prasad Pandit
2025-02-28 12:17 ` [PATCH v7 2/5] migration: enable multifd and postcopy together Prasad Pandit
2025-02-28 12:17 ` [PATCH v7 3/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit
2025-02-28 12:17 ` [PATCH v7 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
2025-02-28 15:11   ` Fabiano Rosas
2025-03-03  9:33     ` Prasad Pandit
2025-02-28 12:17 ` [PATCH v7 5/5] migration: add MULTIFD_RECV_SYNC migration command Prasad Pandit
2025-02-28 13:42   ` Peter Xu
2025-03-03 11:43     ` Prasad Pandit
2025-03-03 14:50       ` Peter Xu
2025-03-04  8:10         ` Prasad Pandit
2025-03-04 14:35           ` Peter Xu
2025-03-05 11:21             ` Prasad Pandit
2025-03-05 12:54               ` Peter Xu
2025-03-07 11:45                 ` Prasad Pandit
2025-03-07 22:48                   ` Peter Xu
2025-03-10  7:36                     ` Prasad Pandit
2025-03-13 12:43                     ` Prasad Pandit
2025-03-13 20:08                       ` Peter Xu
2025-03-17 12:30                         ` Prasad Pandit
2025-03-17 15:00                           ` Peter Xu
2025-03-07 22:51                   ` Peter Xu
2025-03-10 14:38                     ` Fabiano Rosas
2025-03-10 17:08                       ` Prasad Pandit
2025-03-10 19:58                         ` Fabiano Rosas
2025-03-11 10:01                           ` Prasad Pandit
2025-03-11 12:44                             ` Fabiano Rosas
2025-02-28 14:53 ` [PATCH v7 0/5] Allow to enable multifd and postcopy migration together Fabiano Rosas
2025-03-03 10:47   ` Prasad Pandit
2025-03-03 14:12     ` Peter Xu
2025-03-04  9:47       ` Prasad Pandit
2025-03-04 14:42         ` Peter Xu
2025-03-05  7:41           ` Prasad Pandit
2025-03-05 13:56             ` Fabiano Rosas
2025-03-06  7:51               ` Prasad Pandit
2025-03-06 13:48                 ` Fabiano Rosas
