* [PATCH v2 0/3] Allow to enable multifd and postcopy migration together
@ 2024-11-29 12:22 Prasad Pandit
  2024-11-29 12:22 ` [PATCH v2 1/3] migration/multifd: move macros to multifd header Prasad Pandit
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Prasad Pandit @ 2024-11-29 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Hello,

* Currently Multifd and Postcopy migration cannot be used together;
  QEMU shows the "Postcopy is not yet compatible with multifd" error message.

  When migrating guests with large amounts of RAM (hundreds of GB),
  Multifd threads help to accelerate migration, but the inability to
  use them with Postcopy mode delays guest start-up on the destination side.

* This patch series allows both Multifd and Postcopy migration to be
  enabled together. Precopy and Multifd threads handle the initial
  guest (RAM) transfer. When migration moves to the Postcopy phase,
  the Multifd threads are restrained and the Postcopy threads start
  to request pages from the source side. A combined view of the
  resulting send path is sketched below, after the version links.

* This series removes the 4-byte magic value introduced for the
  Postcopy channel in the previous revision. The refactoring of the
  'ram_save_target_page' function is now made independent of the
  multifd & postcopy change.

  v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
  v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
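
* For quick reference, below is roughly what ram_save_target_page()
  looks like once patches 2/3 and 3/3 are applied. It is reconstructed
  from the diffs in this series, so treat it as an illustration rather
  than the authoritative code:

    static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
    {
        ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
        int res;

        /* Zero page detection stays on the main migration thread when
         * multifd is disabled or legacy zero-page detection is in use. */
        if (!migrate_multifd()
            || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
            if (save_zero_page(rs, pss, offset)) {
                return 1;
            }
        }

        /* Multifd workers carry pages only until the Postcopy phase begins. */
        if (migrate_multifd() && !migration_in_postcopy()) {
            return ram_save_multifd_page(pss->block, offset);
        }

        if (control_save_page(pss, offset, &res)) {
            return res;
        }

        return ram_save_page(rs, pss);
    }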


Thank you.
---
Prasad Pandit (3):
  migration/multifd: move macros to multifd header
  migration: refactor ram_save_target_page functions
  migration: enable multifd and postcopy together

 migration/migration.c      | 90 +++++++++++++++++++++++---------------
 migration/multifd-nocomp.c |  3 +-
 migration/multifd.c        |  5 ---
 migration/multifd.h        |  5 +++
 migration/options.c        |  5 ---
 migration/ram.c            | 73 +++++++++----------------------
 6 files changed, 82 insertions(+), 99 deletions(-)

-- 
2.47.1




* [PATCH v2 1/3] migration/multifd: move macros to multifd header
  2024-11-29 12:22 [PATCH v2 0/3] Allow to enable multifd and postcopy migration together Prasad Pandit
@ 2024-11-29 12:22 ` Prasad Pandit
  2024-12-09 21:07   ` Fabiano Rosas
  2024-11-29 12:22 ` [PATCH v2 2/3] migration: refactor ram_save_target_page functions Prasad Pandit
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Prasad Pandit @ 2024-11-29 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Move MULTIFD_ macros to the header file so that
they are accessible from other source files.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 migration/multifd.c | 5 -----
 migration/multifd.h | 5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 498e71fd10..f8914276c4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -33,11 +33,6 @@
 #include "io/channel-socket.h"
 #include "yank_functions.h"
 
-/* Multiple fd's */
-
-#define MULTIFD_MAGIC 0x11223344U
-#define MULTIFD_VERSION 1
-
 typedef struct {
     uint32_t magic;
     uint32_t version;
diff --git a/migration/multifd.h b/migration/multifd.h
index 50d58c0c9c..519dffeb9e 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -33,6 +33,11 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
 bool multifd_recv(void);
 MultiFDRecvData *multifd_get_recv_data(void);
 
+/* Multiple fd's */
+
+#define MULTIFD_MAGIC 0x11223344U
+#define MULTIFD_VERSION 1
+
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
 
-- 
2.47.1




* [PATCH v2 2/3] migration: refactor ram_save_target_page functions
  2024-11-29 12:22 [PATCH v2 0/3] Allow to enable multifd and postcopy migration together Prasad Pandit
  2024-11-29 12:22 ` [PATCH v2 1/3] migration/multifd: move macros to multifd header Prasad Pandit
@ 2024-11-29 12:22 ` Prasad Pandit
  2024-11-29 12:22 ` [PATCH v2 3/3] migration: enable multifd and postcopy together Prasad Pandit
  2024-11-29 16:45 ` [PATCH v2 0/3] Allow to enable multifd and postcopy migration together Peter Xu
  3 siblings, 0 replies; 8+ messages in thread
From: Prasad Pandit @ 2024-11-29 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Refactor the ram_save_target_page legacy and multifd
functions into one. Besides simplifying the code, this
leaves the 'migration_ops' object unused, so it is
removed as well.

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 migration/ram.c | 67 +++++++++++++------------------------------------
 1 file changed, 17 insertions(+), 50 deletions(-)

v2: Make refactoring change independent of the multifd & postcopy
    change.

v1: Further refactor ram_save_target_page() function to conflate
    save_zero_page() calls.
 - https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u

v0:
 - https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u

diff --git a/migration/ram.c b/migration/ram.c
index 05ff9eb328..5f97dda5f1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -467,13 +467,6 @@ void ram_transferred_add(uint64_t bytes)
     }
 }
 
-struct MigrationOps {
-    int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss);
-};
-typedef struct MigrationOps MigrationOps;
-
-MigrationOps *migration_ops;
-
 static int ram_save_host_page_urgent(PageSearchStatus *pss);
 
 /* NOTE: page is the PFN not real ram_addr_t. */
@@ -1986,55 +1979,36 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len,
 }
 
 /**
- * ram_save_target_page_legacy: save one target page
- *
- * Returns the number of pages written
+ * ram_save_target_page: save one target page to the precopy thread
+ * OR to multifd workers.
  *
  * @rs: current RAM state
  * @pss: data about the page we want to send
  */
-static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
+static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
 {
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;
 
+    if (!migrate_multifd()
+        || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
+        if (save_zero_page(rs, pss, offset)) {
+            return 1;
+        }
+    }
+
+    if (migrate_multifd()) {
+        RAMBlock *block = pss->block;
+        return ram_save_multifd_page(block, offset);
+    }
+
     if (control_save_page(pss, offset, &res)) {
         return res;
     }
 
-    if (save_zero_page(rs, pss, offset)) {
-        return 1;
-    }
-
     return ram_save_page(rs, pss);
 }
 
-/**
- * ram_save_target_page_multifd: send one target page to multifd workers
- *
- * Returns 1 if the page was queued, -1 otherwise.
- *
- * @rs: current RAM state
- * @pss: data about the page we want to send
- */
-static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
-{
-    RAMBlock *block = pss->block;
-    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
-
-    /*
-     * While using multifd live migration, we still need to handle zero
-     * page checking on the migration main thread.
-     */
-    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
-        if (save_zero_page(rs, pss, offset)) {
-            return 1;
-        }
-    }
-
-    return ram_save_multifd_page(block, offset);
-}
-
 /* Should be called before sending a host page */
 static void pss_host_page_prepare(PageSearchStatus *pss)
 {
@@ -2121,7 +2095,7 @@ static int ram_save_host_page_urgent(PageSearchStatus *pss)
 
         if (page_dirty) {
             /* Be strict to return code; it must be 1, or what else? */
-            if (migration_ops->ram_save_target_page(rs, pss) != 1) {
+            if (ram_save_target_page(rs, pss) != 1) {
                 error_report_once("%s: ram_save_target_page failed", __func__);
                 ret = -1;
                 goto out;
@@ -2190,7 +2164,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
             if (preempt_active) {
                 qemu_mutex_unlock(&rs->bitmap_mutex);
             }
-            tmppages = migration_ops->ram_save_target_page(rs, pss);
+            tmppages = ram_save_target_page(rs, pss);
             if (tmppages >= 0) {
                 pages += tmppages;
                 /*
@@ -2388,8 +2362,6 @@ static void ram_save_cleanup(void *opaque)
     xbzrle_cleanup();
     multifd_ram_save_cleanup();
     ram_state_cleanup(rsp);
-    g_free(migration_ops);
-    migration_ops = NULL;
 }
 
 static void ram_state_reset(RAMState *rs)
@@ -3055,13 +3027,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
         return ret;
     }
 
-    migration_ops = g_malloc0(sizeof(MigrationOps));
-
     if (migrate_multifd()) {
         multifd_ram_save_setup();
-        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
-    } else {
-        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
     }
 
     bql_unlock();
-- 
2.47.1




* [PATCH v2 3/3] migration: enable multifd and postcopy together
  2024-11-29 12:22 [PATCH v2 0/3] Allow to enable multifd and postcopy migration together Prasad Pandit
  2024-11-29 12:22 ` [PATCH v2 1/3] migration/multifd: move macros to multifd header Prasad Pandit
  2024-11-29 12:22 ` [PATCH v2 2/3] migration: refactor ram_save_target_page functions Prasad Pandit
@ 2024-11-29 12:22 ` Prasad Pandit
  2024-11-29 16:45 ` [PATCH v2 0/3] Allow to enable multifd and postcopy migration together Peter Xu
  3 siblings, 0 replies; 8+ messages in thread
From: Prasad Pandit @ 2024-11-29 12:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit

From: Prasad Pandit <pjp@fedoraproject.org>

Enable Multifd and Postcopy migration together.
The migration_ioc_process_incoming() routine
checks the magic value sent on each channel and
uses it to properly set up the multifd and
postcopy channels.

The Precopy and Multifd threads work during the
initial guest RAM transfer. When migration moves
to the Postcopy phase, the Multifd threads are
restrained and the Postcopy threads on the
destination request pages from the source side.
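
Condensed from the diff below (an illustration, not the exact hunk),
each incoming connection is classified before deciding whether the
incoming migration can start:

    /* Peek at the first 4 bytes while the main/multifd channels are
     * still expected; a connection arriving after they are all in
     * place is treated as the postcopy channel. */
    if (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)) {
        channel = CH_DEFAULT;          /* main migration stream */
    } else if (channel_magic == cpu_to_be32(MULTIFD_MAGIC)) {
        channel = CH_MULTIFD;          /* one of the multifd channels */
    } else {
        error_report("%s: could not identify channel, unknown magic: %u",
                     __func__, channel_magic);
        return;
    }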

Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 migration/migration.c      | 90 +++++++++++++++++++++++---------------
 migration/multifd-nocomp.c |  3 +-
 migration/options.c        |  5 ---
 migration/ram.c            |  8 ++--
 4 files changed, 61 insertions(+), 45 deletions(-)

v2: Merge the earlier options.c patch into this one. Also add the
    !migration_in_postcopy() check in this patch, to keep the
    refactoring change separate from this one.

v1: Avoid using 4-bytes magic value for the Postcopy channel.
    Flush and synchronise Multifd thread before postcopy_start().
 - https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u

v0:
 - https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u

diff --git a/migration/migration.c b/migration/migration.c
index 8c5bd0a75c..fa0a6d3b7d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -92,6 +92,9 @@ enum mig_rp_message_type {
     MIG_RP_MSG_MAX
 };
 
+/* Migration channel types */
+enum { CH_DEFAULT, CH_MULTIFD, CH_POSTCOPY };
+
 /* When we add fault tolerance, we could have several
    migrations at once.  For now we don't need to add
    dynamic creation of migration */
@@ -921,26 +924,32 @@ void migration_fd_process_incoming(QEMUFile *f)
 /*
  * Returns true when we want to start a new incoming migration process,
  * false otherwise.
+ *
+ * All the required channels must be in place before a new incoming
+ * migration process starts.
+ *  - Multifd enabled:
+ *    The main channel and the multifd channels are required.
+ *  - Multifd/Postcopy disabled:
+ *    The main channel is required.
+ *  - Postcopy enabled:
+ *    We don't want to start a new incoming migration when
+ *    the postcopy channel is created. Because it is created
+ *    towards the end of the precopy migration.
  */
-static bool migration_should_start_incoming(bool main_channel)
+static bool migration_should_start_incoming(uint8_t channel)
 {
-    /* Multifd doesn't start unless all channels are established */
-    if (migrate_multifd()) {
-        return migration_has_all_channels();
-    }
+    bool ret = false;
+
+    if (channel != CH_POSTCOPY) {
+        MigrationIncomingState *mis = migration_incoming_get_current();
+        ret = mis->from_src_file ? true : false;
 
-    /* Preempt channel only starts when the main channel is created */
-    if (migrate_postcopy_preempt()) {
-        return main_channel;
+        if (ret && migrate_multifd()) {
+            ret = multifd_recv_all_channels_created();
+        }
     }
 
-    /*
-     * For all the rest types of migration, we should only reach here when
-     * it's the main channel that's being created, and we should always
-     * proceed with this channel.
-     */
-    assert(main_channel);
-    return true;
+    return ret;
 }
 
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
@@ -948,13 +957,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
     MigrationIncomingState *mis = migration_incoming_get_current();
     Error *local_err = NULL;
     QEMUFile *f;
-    bool default_channel = true;
     uint32_t channel_magic = 0;
+    uint8_t channel = CH_DEFAULT;
     int ret = 0;
 
-    if (migrate_multifd() && !migrate_mapped_ram() &&
-        !migrate_postcopy_ram() &&
-        qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
+    if (!migration_should_start_incoming(channel)
+        && qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) {
         /*
          * With multiple channels, it is possible that we receive channels
          * out of order on destination side, causing incorrect mapping of
@@ -972,35 +980,46 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
             return;
         }
 
-        default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC));
+        if (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)) {
+            channel = CH_DEFAULT;
+        } else if (channel_magic == cpu_to_be32(MULTIFD_MAGIC)) {
+            channel = CH_MULTIFD;
+        } else {
+            error_report("%s: could not identify channel, unknown magic: %u",
+                            __func__, channel_magic);
+            return;
+        }
+
     } else {
-        default_channel = !mis->from_src_file;
+        channel = CH_POSTCOPY;
     }
 
     if (multifd_recv_setup(errp) != 0) {
         return;
     }
 
-    if (default_channel) {
+    if (channel == CH_DEFAULT) {
         f = qemu_file_new_input(ioc);
         migration_incoming_setup(f);
-    } else {
+    } else if (channel == CH_MULTIFD) {
         /* Multiple connections */
-        assert(migration_needs_multiple_sockets());
         if (migrate_multifd()) {
             multifd_recv_new_channel(ioc, &local_err);
-        } else {
+        }
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    } else if (channel == CH_POSTCOPY) {
+        if (migrate_postcopy()) {
             assert(migrate_postcopy_preempt());
+            assert(!mis->postcopy_qemufile_dst);
             f = qemu_file_new_input(ioc);
             postcopy_preempt_new_channel(mis, f);
         }
-        if (local_err) {
-            error_propagate(errp, local_err);
-            return;
-        }
     }
 
-    if (migration_should_start_incoming(default_channel)) {
+    if (migration_should_start_incoming(channel)) {
         /* If it's a recovery, we're done */
         if (postcopy_try_recover()) {
             return;
@@ -1017,21 +1036,22 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp)
  */
 bool migration_has_all_channels(void)
 {
+    bool ret = false;
     MigrationIncomingState *mis = migration_incoming_get_current();
 
     if (!mis->from_src_file) {
-        return false;
+        return ret;
     }
 
     if (migrate_multifd()) {
-        return multifd_recv_all_channels_created();
+        ret = multifd_recv_all_channels_created();
     }
 
-    if (migrate_postcopy_preempt()) {
-        return mis->postcopy_qemufile_dst != NULL;
+    if (ret && migrate_postcopy_preempt()) {
+        ret = mis->postcopy_qemufile_dst != NULL;
     }
 
-    return true;
+    return ret;
 }
 
 int migrate_send_rp_switchover_ack(MigrationIncomingState *mis)
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 55191152f9..e92821e8f6 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -14,6 +14,7 @@
 #include "exec/ramblock.h"
 #include "exec/target_page.h"
 #include "file.h"
+#include "migration.h"
 #include "multifd.h"
 #include "options.h"
 #include "qapi/error.h"
@@ -345,7 +346,7 @@ retry:
 
 int multifd_ram_flush_and_sync(void)
 {
-    if (!migrate_multifd()) {
+    if (!migrate_multifd() || migration_in_postcopy()) {
         return 0;
     }
 
diff --git a/migration/options.c b/migration/options.c
index ad8d6989a8..c498558a85 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -479,11 +479,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
             error_setg(errp, "Postcopy is not compatible with ignore-shared");
             return false;
         }
-
-        if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
-            error_setg(errp, "Postcopy is not yet compatible with multifd");
-            return false;
-        }
     }
 
     if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
diff --git a/migration/ram.c b/migration/ram.c
index 5f97dda5f1..7444fa5280 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1316,9 +1316,9 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
         pss->page = 0;
         pss->block = QLIST_NEXT_RCU(pss->block, next);
         if (!pss->block) {
-            if (migrate_multifd() &&
-                (!migrate_multifd_flush_after_each_section() ||
-                 migrate_mapped_ram())) {
+            if (migrate_multifd() && !migration_in_postcopy()
+                && (!migrate_multifd_flush_after_each_section()
+                    || migrate_mapped_ram())) {
                 QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
                 int ret = multifd_ram_flush_and_sync();
                 if (ret < 0) {
@@ -1997,7 +1997,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
         }
     }
 
-    if (migrate_multifd()) {
+    if (migrate_multifd() && !migration_in_postcopy()) {
         RAMBlock *block = pss->block;
         return ram_save_multifd_page(block, offset);
     }
-- 
2.47.1




* Re: [PATCH v2 0/3] Allow to enable multifd and postcopy migration together
  2024-11-29 12:22 [PATCH v2 0/3] Allow to enable multifd and postcopy migration together Prasad Pandit
                   ` (2 preceding siblings ...)
  2024-11-29 12:22 ` [PATCH v2 3/3] migration: enable multifd and postcopy together Prasad Pandit
@ 2024-11-29 16:45 ` Peter Xu
  2024-12-02  7:07   ` Prasad Pandit
  3 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2024-11-29 16:45 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Fri, Nov 29, 2024 at 05:52:53PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> Hello,
> 
> * Currently Multifd and Postcopy migration cannot be used together;
>   QEMU shows the "Postcopy is not yet compatible with multifd" error message.
>
>   When migrating guests with large amounts of RAM (hundreds of GB),
>   Multifd threads help to accelerate migration, but the inability to
>   use them with Postcopy mode delays guest start-up on the destination side.
>
> * This patch series allows both Multifd and Postcopy migration to be
>   enabled together. Precopy and Multifd threads handle the initial
>   guest (RAM) transfer. When migration moves to the Postcopy phase,
>   the Multifd threads are restrained and the Postcopy threads start
>   to request pages from the source side.
>
> * This series removes the 4-byte magic value introduced for the
>   Postcopy channel in the previous revision. The refactoring of the
>   'ram_save_target_page' function is now made independent of the
>   multifd & postcopy change.
> 
>   v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
>   v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
> 
> 
> Thank you.
> ---
> Prasad Pandit (3):
>   migration/multifd: move macros to multifd header
>   migration: refactor ram_save_target_page functions
>   migration: enable multifd and postcopy together

Prasad,

I saw that there's still discussion in the previous version, while this
cover letter doesn't mention why it was ignored.  In particular, what
Fabiano commented about simplifying the flush condition check makes
sense to me and should be addressed.

Please cherish the feedback and time reviewers contribute, and let's finish
the discussion first before rushing out a new version.  I'll try to join
the discussion there later too.

Meanwhile, before I read into any details, I found that all the tests I
requested are still missing.  Would you please consider adding them?

My previous comment regarding tests is here:

https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/

I listed exactly the minimum set of tests that I think we should have.

In general, any migration new feature must have both doc updates and test
coverage as long as applicable.

Multifd still has its doc missing, which is unfortunate.  We could have one
doc prior to this work, but it can also be done later.

OTOH on testing: this is not a new feature alone, but it's more dangerous
than a new feature due to what I mentioned before, that postcopy needs
atomicity on page movements, and multifd is completely against that from
design POV due to NIC drivers being able to modify guest pages directly.

It means multifd+postcopy bugs will be extremely hard to debug if we don't
put it right first.  So please be serious on the test coverage on this
work.

Thanks,

-- 
Peter Xu




* Re: [PATCH v2 0/3] Allow to enable multifd and postcopy migration together
  2024-11-29 16:45 ` [PATCH v2 0/3] Allow to enable multifd and postcopy migration together Peter Xu
@ 2024-12-02  7:07   ` Prasad Pandit
  2024-12-05 22:41     ` Peter Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Prasad Pandit @ 2024-12-02  7:07 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

Hello Peter,

On Fri, 29 Nov 2024 at 22:16, Peter Xu <peterx@redhat.com> wrote:
> I saw that there's still discussion in the previous version, while this
> cover letter doesn't mention why it was ignored.  In particular, what
> Fabiano commented about simplifying the flush condition check makes
> sense to me and should be addressed.

* It is not ignored. Simplifying flush conditionals makes sense to me
too, that is why in the 'v0' version of this series I had added the
!migration_in_postcopy() check to the migrate_multifd() function,
right? I tried to discuss in the 'v1' thread if there's another way to
simplify conditionals. Not sure if you've followed all comments in the
thread.

* Secondly, as you mention above, I also thought Fabiano is pointing
at the complexity of the 'if' conditionals and thus I replied that his
proposed patch does not seem to solve for that complexity. But in his
subsequent reply Fabiano mentions that it is not just about
conditionals, but larger complexity of how and when multifd threads do
flush and sync amongst them.

* IMHO, simplifying that larger complexity of how multifd threads do
flush and sync can be done independently, outside the scope of this
patch series, which is about enabling multifd and postcopy together.

> Meanwhile, before I read into any details, I found that all the tests I
> requested are still missing.  Would you please consider adding them?
>
> My previous comment regarding tests is here:
> https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/
>
> I listed exactly the minimum set of tests that I think we should have.
...
> In general, any migration new feature must have both doc updates and test
> coverage as long as applicable.
>
> Multifd still has its doc missing, which is unfortunate.  We could have one
> doc prior to this work, but it can also be done later.
>
> OTOH on testing: this is not a new feature alone, but it's more dangerous
> than a new feature due to what I mentioned before, that postcopy needs
> atomicity on page movements, and multifd is completely against that from
> design POV due to NIC drivers being able to modify guest pages directly.
>
> It means multifd+postcopy bugs will be extremely hard to debug if we don't
> put it right first.  So please be serious on the test coverage on this
> work.

* I'm yet to get to the test cases. The revised series (v1 and v2) were
posted to share the patch updates suggested in the previous reviews.
Test cases are a separate effort from the source patches. If we want to
hold this patch series till the test cases and documentation are in
place, that is okay; I'll work on that next. A rough sketch of such a
test is below.
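
Something along these lines, assuming the existing helpers in
tests/qtest/migration-test.c (the hook signature and struct fields are
from memory and may need adjusting):

    static void *migrate_multifd_postcopy_start(QTestState *from,
                                                QTestState *to)
    {
        migrate_set_parameter_int(from, "multifd-channels", 4);
        migrate_set_parameter_int(to, "multifd-channels", 4);
        migrate_set_capability(from, "multifd", true);
        migrate_set_capability(to, "multifd", true);
        return NULL;
    }

    static void test_multifd_postcopy(void)
    {
        MigrateCommon args = {
            .start_hook = migrate_multifd_postcopy_start,
        };

        /* test_postcopy_common() turns on the postcopy-ram capability
         * and switches to postcopy partway through, so with the hook
         * above it exercises both the multifd precopy path and the
         * postcopy page-request path. */
        test_postcopy_common(&args);
    }

It would be registered with migration_test_add(), next to the existing
postcopy tests.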

Thank you.
---
  - Prasad




* Re: [PATCH v2 0/3] Allow to enable multifd and postcopy migration together
  2024-12-02  7:07   ` Prasad Pandit
@ 2024-12-05 22:41     ` Peter Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2024-12-05 22:41 UTC (permalink / raw)
  To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit

On Mon, Dec 02, 2024 at 12:37:19PM +0530, Prasad Pandit wrote:
> Hello Peter,
> 
> On Fri, 29 Nov 2024 at 22:16, Peter Xu <peterx@redhat.com> wrote:
> > I saw that there's still discussion in the previous version, while this
> > cover letter doesn't mention why it was ignored.  In particular, what
> > Fabiano commented about simplifying the flush condition check makes
> > sense to me and should be addressed.
> 
> * It is not ignored. Simplifying flush conditionals makes sense to me
> too, that is why in the 'v0' version of this series I had added the
> !migration_in_postcopy() check to the migrate_multifd() function,
> right?

As explained, that addition was wrong, because migrate_multifd() should
always return the user option only.  Again, you can add another helper.

> I tried to discuss in the 'v1' thread if there's another way to
> simplify conditionals. Not sure if you've followed all comments in the
> thread.

I'll post a version to clean it up; either we go with Fabiano's, or mine,
or a 3rd option.  We shouldn't pile up more condition checks there.  It's
growing into something unmaintainable.

> 
> * Secondly, as you mention above, I also thought Fabiano is pointing
> at the complexity of the 'if' conditionals and thus I replied that his
> proposed patch does not seem to solve for that complexity. But in his
> subsequent reply Fabiano mentions that it is not just about
> conditionals, but larger complexity of how and when multifd threads do
> flush and sync amongst them.

Yes, they're relevant, but I think we can clean up the whole thing and it's
not that complicated, IMHO.  We'll see.

> 
> * IMHO, simplifying that larger complexity of how multifd threads do
> flush and sync can be done independently, outside the scope of this
> patch series, which is about enabling multifd and postcopy together.

I assume you're working on the test cases; I hope this won't block you from
continuing your work on this series.

As mentioned above, I think we need to clean this up before moving on,
unfortunately.  And I hope things will have settled by the time you have
the test cases ready.  I appreciate you adding the test cases for
multifd+postcopy.  That's very important.  Before that you can keep your
patch as-is, and leave that part for us to figure out.  Feel free to chime
in anytime as well.

> 
> > Meanwhile, before I read into any details, I found that all the tests I
> > requested are still missing.  Would you please consider adding them?
> >
> > My previous comment regarding tests is here:
> > https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/
> >
> > I listed exactly the minimum set of tests that I think we should have.
> ...
> > In general, any migration new feature must have both doc updates and test
> > coverage as long as applicable.
> >
> > Multifd still has its doc missing, which is unfortunate.  We could have one
> > doc prior to this work, but it can also be done later.
> >
> > OTOH on testing: this is not a new feature alone, but it's more dangerous
> > than a new feature due to what I mentioned before, that postcopy needs
> > atomicity on page movements, and multifd is completely against that from
> > design POV due to NIC drivers being able to modify guest pages directly.
> >
> > It means multifd+postcopy bugs will be extremely hard to debug if we don't
> > put it right first.  So please be serious on the test coverage on this
> > work.
> 
> * I'm yet to get to the test cases. The revised series (v1 and v2) were
> posted to share the patch updates suggested in the previous reviews.
> Test cases are a separate effort from the source patches. If we want to
> hold this patch series till the test cases and documentation are in
> place, that is okay; I'll work on that next.

So we talked about this in our meeting, but still, just to keep it on record
so whoever works on migration can reference it: we do require test cases,
and they're not a separate effort.  We require both test cases and docs to
be in place before merging a feature, unless there's a good reason not to.

E.g. multifd doesn't yet have a doc, so a doc is not required for this work
yet; test cases, however, are.

Another outlier is VFIO+multifd, which cannot easily add a test case because
CI normally doesn't have the hardware environment available.  However, docs
should still apply there and be required, at least from the migration POV.

Thanks,

-- 
Peter Xu




* Re: [PATCH v2 1/3] migration/multifd: move macros to multifd header
  2024-11-29 12:22 ` [PATCH v2 1/3] migration/multifd: move macros to multifd header Prasad Pandit
@ 2024-12-09 21:07   ` Fabiano Rosas
  0 siblings, 0 replies; 8+ messages in thread
From: Fabiano Rosas @ 2024-12-09 21:07 UTC (permalink / raw)
  To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit

Prasad Pandit <ppandit@redhat.com> writes:

> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Move MULTIFD_ macros to the header file so that
> they are accessible from other source files.
>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>

Reviewed-by: Fabiano Rosas <farosas@suse.de>


