[RFC PATCH v2 0/9] migration/multifd: Remove multifd_send

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
@ 2024-07-22 17:59 Fabiano Rosas
  2024-07-22 17:59 ` [RFC PATCH v2 1/9] migration/multifd: Reduce access to p->pages Fabiano Rosas
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

Hi,

In this v2 I took Peter's suggestion of keeping the channels' pointers
and moving only the extra slot. The major changes are in patches 5 and
9. Patch 3 introduces the structure:

typedef enum {
    MULTIFD_PAYLOAD_NONE,
    MULTIFD_PAYLOAD_RAM,
} MultiFDPayloadType;

struct MultiFDSendData {
    MultiFDPayloadType type;
    union {
        MultiFDPages_t ram;
    } u;
};

I added a NONE type so we can use it to tell when the channel has
finished sending a packet, since we'll need to switch types between
clients anyway. This avoids having to introduce a 'size', or 'free'
variable.

WHAT'S MISSING:

- The support for calling multifd_send() concurrently. Maciej has this
  in his series so I didn't touch it.

- A way of adding methods for the new payload type. Currently, the
  compression methods are somewhat coupled with ram migration, so I'm
  not sure how to proceed.

- Moving all the multifd ram code into multifd-ram.c after this^ is
  sorted out.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1381005020

v1:
https://lore.kernel.org/r/20240620212111.29319-1-farosas@suse.de

First of all, apologies for the roughness of the series. I'm off for
the next couple of weeks and wanted to put something together early
for your consideration.

This series is a refactoring (based on an earlier, off-list
attempt[0]), aimed to remove the usage of the MultiFDPages_t type in
the multifd core. If we're going to add support for more data types to
multifd, we first need to clean that up.

This time around this work was prompted by Maciej's series[1]. I see
you're having to add a bunch of is_device_state checks to work around
the rigidity of the code.

Aside from the VFIO work, there is also the intent (coming back from
Juan's ideas) to make multifd the default code path for migration,
which will have to include the vmstate migration and anything else we
put on the stream via QEMUFile.

I have long since been bothered by having 'pages' sprinkled all over
the code, so I might be coming at this with a bit of a narrow focus,
but I believe in order to support more types of payloads in multifd,
we need to first allow the scheduling at multifd_send_pages() to be
independent of MultiFDPages_t. So here it is. Let me know what you
think.

(as I said, I'll be off for a couple of weeks, so feel free to
incorporate any of this code if it's useful. Or to ignore it
completely).

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1340992028

0- https://github.com/farosas/qemu/commits/multifd-packet-cleanups/
1- https://lore.kernel.org/r/cover.1718717584.git.maciej.szmigiero@oracle.com

Fabiano Rosas (9):
  migration/multifd: Reduce access to p->pages
  migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov
  migration/multifd: Introduce MultiFDSendData
  migration/multifd: Make MultiFDPages_t:offset a flexible array member
  migration/multifd: Replace p->pages with an union pointer
  migration/multifd: Move pages accounting into
    multifd_send_zero_page_detect()
  migration/multifd: Isolate ram pages packet data
  migration/multifd: Don't send ram data during SYNC
  migration/multifd: Replace multifd_send_state->pages with client data

 migration/file.c              |   3 +-
 migration/file.h              |   2 +-
 migration/multifd-qpl.c       |  10 +-
 migration/multifd-uadk.c      |   9 +-
 migration/multifd-zero-page.c |   9 +-
 migration/multifd-zlib.c      |   4 +-
 migration/multifd-zstd.c      |   4 +-
 migration/multifd.c           | 239 +++++++++++++++++++++-------------
 migration/multifd.h           |  37 ++++--
 migration/ram.c               |   1 +
 10 files changed, 201 insertions(+), 117 deletions(-)

base-commit: a7ddb48bd1363c8bcdf42776d320289c42191f01
-- 
2.35.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 1/9] migration/multifd: Reduce access to p->pages
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 17:59 ` [RFC PATCH v2 2/9] migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov Fabiano Rosas
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

I'm about to replace the p->pages pointer with an opaque pointer, so
do a cleanup now to reduce direct accesses to p->page, which makes the
next diffs cleaner.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 migration/multifd-qpl.c  |  8 +++++---
 migration/multifd-uadk.c |  9 +++++----
 migration/multifd-zlib.c |  2 +-
 migration/multifd-zstd.c |  2 +-
 migration/multifd.c      | 13 +++++++------
 5 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index 9265098ee7..f8c84c52cf 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -404,13 +404,14 @@ retry:
 static void multifd_qpl_compress_pages_slow_path(MultiFDSendParams *p)
 {
     QplData *qpl = p->compress_data;
+    MultiFDPages_t *pages = p->pages;
     uint32_t size = p->page_size;
     qpl_job *job = qpl->sw_job;
     uint8_t *zbuf = qpl->zbuf;
     uint8_t *buf;
 
-    for (int i = 0; i < p->pages->normal_num; i++) {
-        buf = p->pages->block->host + p->pages->offset[i];
+    for (int i = 0; i < pages->normal_num; i++) {
+        buf = pages->block->host + pages->offset[i];
         multifd_qpl_prepare_comp_job(job, buf, zbuf, size);
         if (qpl_execute_job(job) == QPL_STS_OK) {
             multifd_qpl_fill_packet(i, p, zbuf, job->total_out);
@@ -498,6 +499,7 @@ static void multifd_qpl_compress_pages(MultiFDSendParams *p)
 static int multifd_qpl_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     QplData *qpl = p->compress_data;
+    MultiFDPages_t *pages = p->pages;
     uint32_t len = 0;
 
     if (!multifd_send_prepare_common(p)) {
@@ -505,7 +507,7 @@ static int multifd_qpl_send_prepare(MultiFDSendParams *p, Error **errp)
     }
 
     /* The first IOV is used to store the compressed page lengths */
-    len = p->pages->normal_num * sizeof(uint32_t);
+    len = pages->normal_num * sizeof(uint32_t);
     multifd_qpl_fill_iov(p, (uint8_t *) qpl->zlen, len);
     if (qpl->hw_avail) {
         multifd_qpl_compress_pages(p);
diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index d12353fb21..b8ba3cd9c1 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -174,19 +174,20 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, Error **errp)
     uint32_t hdr_size;
     uint8_t *buf = uadk_data->buf;
     int ret = 0;
+    MultiFDPages_t *pages = p->pages;
 
     if (!multifd_send_prepare_common(p)) {
         goto out;
     }
 
-    hdr_size = p->pages->normal_num * sizeof(uint32_t);
+    hdr_size = pages->normal_num * sizeof(uint32_t);
     /* prepare the header that stores the lengths of all compressed data */
     prepare_next_iov(p, uadk_data->buf_hdr, hdr_size);
 
-    for (int i = 0; i < p->pages->normal_num; i++) {
+    for (int i = 0; i < pages->normal_num; i++) {
         struct wd_comp_req creq = {
             .op_type = WD_DIR_COMPRESS,
-            .src     = p->pages->block->host + p->pages->offset[i],
+            .src     = pages->block->host + pages->offset[i],
             .src_len = p->page_size,
             .dst     = buf,
             /* Set dst_len to double the src in case compressed out >= page_size */
@@ -214,7 +215,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, Error **errp)
          */
         if (!uadk_data->handle || creq.dst_len >= p->page_size) {
             uadk_data->buf_hdr[i] = cpu_to_be32(p->page_size);
-            prepare_next_iov(p, p->pages->block->host + p->pages->offset[i],
+            prepare_next_iov(p, pages->block->host + pages->offset[i],
                              p->page_size);
             buf += p->page_size;
         }
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 2ced69487e..65f8aba5c8 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -147,7 +147,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
          * with compression. zlib does not guarantee that this is safe,
          * therefore copy the page before calling deflate().
          */
-        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
+        memcpy(z->buf, pages->block->host + pages->offset[i], p->page_size);
         zs->avail_in = p->page_size;
         zs->next_in = z->buf;
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index ca17b7e310..cb6075a9a5 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -138,7 +138,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
         if (i == pages->normal_num - 1) {
             flush = ZSTD_e_flush;
         }
-        z->in.src = p->pages->block->host + pages->offset[i];
+        z->in.src = pages->block->host + pages->offset[i];
         z->in.size = p->page_size;
         z->in.pos = 0;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 0b4cbaddfe..c95fce2222 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -114,11 +114,11 @@ static void multifd_set_file_bitmap(MultiFDSendParams *p)
 
     assert(pages->block);
 
-    for (int i = 0; i < p->pages->normal_num; i++) {
+    for (int i = 0; i < pages->normal_num; i++) {
         ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], true);
     }
 
-    for (int i = p->pages->normal_num; i < p->pages->num; i++) {
+    for (int i = pages->normal_num; i < pages->num; i++) {
         ramblock_set_file_bmap_atomic(pages->block, pages->offset[i], false);
     }
 }
@@ -417,7 +417,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
     int i;
 
     packet->flags = cpu_to_be32(p->flags);
-    packet->pages_alloc = cpu_to_be32(p->pages->allocated);
+    packet->pages_alloc = cpu_to_be32(pages->allocated);
     packet->normal_pages = cpu_to_be32(pages->normal_num);
     packet->zero_pages = cpu_to_be32(zero_num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
@@ -953,7 +953,7 @@ static void *multifd_send_thread(void *opaque)
 
             if (migrate_mapped_ram()) {
                 ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num,
-                                              p->pages->block, &local_err);
+                                              pages->block, &local_err);
             } else {
                 ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
                                                   NULL, 0, p->write_flags,
@@ -969,7 +969,7 @@ static void *multifd_send_thread(void *opaque)
             stat64_add(&mig_stats.normal_pages, pages->normal_num);
             stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
 
-            multifd_pages_reset(p->pages);
+            multifd_pages_reset(pages);
             p->next_packet_size = 0;
 
             /*
@@ -1684,9 +1684,10 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 
 bool multifd_send_prepare_common(MultiFDSendParams *p)
 {
+    MultiFDPages_t *pages = p->pages;
     multifd_send_zero_page_detect(p);
 
-    if (!p->pages->normal_num) {
+    if (!pages->normal_num) {
         p->next_packet_size = 0;
         return false;
     }
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 2/9] migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
  2024-07-22 17:59 ` [RFC PATCH v2 1/9] migration/multifd: Reduce access to p->pages Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 19:21   ` Peter Xu
  2024-07-22 17:59 ` [RFC PATCH v2 3/9] migration/multifd: Introduce MultiFDSendData Fabiano Rosas
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

We want to stop dereferencing 'pages' so it can be replaced by an
opaque pointer in the next patches.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/file.c    | 3 ++-
 migration/file.h    | 2 +-
 migration/multifd.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index db870f2cf0..7a8c9e3957 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -198,12 +198,13 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
 }
 
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
-                            int niov, RAMBlock *block, Error **errp)
+                            int niov, MultiFDPages_t *pages, Error **errp)
 {
     ssize_t ret = 0;
     int i, slice_idx, slice_num;
     uintptr_t base, next, offset;
     size_t len;
+    RAMBlock *block = pages->block;
 
     slice_idx = 0;
     slice_num = 1;
diff --git a/migration/file.h b/migration/file.h
index 9f71e87f74..1a1115f7f1 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -21,6 +21,6 @@ int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
 void file_cleanup_outgoing_migration(void);
 bool file_send_channel_create(gpointer opaque, Error **errp);
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
-                            int niov, RAMBlock *block, Error **errp);
+                            int niov, MultiFDPages_t *pages, Error **errp);
 int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
 #endif
diff --git a/migration/multifd.c b/migration/multifd.c
index c95fce2222..20a767157e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -953,7 +953,7 @@ static void *multifd_send_thread(void *opaque)
 
             if (migrate_mapped_ram()) {
                 ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num,
-                                              pages->block, &local_err);
+                                              pages, &local_err);
             } else {
                 ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
                                                   NULL, 0, p->write_flags,
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 2/9] migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov
  2024-07-22 17:59 ` [RFC PATCH v2 2/9] migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov Fabiano Rosas
@ 2024-07-22 19:21   ` Peter Xu
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Xu @ 2024-07-22 19:21 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:07PM -0300, Fabiano Rosas wrote:
> We want to stop dereferencing 'pages' so it can be replaced by an
> opaque pointer in the next patches.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 3/9] migration/multifd: Introduce MultiFDSendData
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
  2024-07-22 17:59 ` [RFC PATCH v2 1/9] migration/multifd: Reduce access to p->pages Fabiano Rosas
  2024-07-22 17:59 ` [RFC PATCH v2 2/9] migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 19:22   ` Peter Xu
  2024-07-22 17:59 ` [RFC PATCH v2 4/9] migration/multifd: Make MultiFDPages_t:offset a flexible array member Fabiano Rosas
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

Add a new data structure to replace p->pages in the multifd
channel. This new structure will hide the multifd payload type behind
an union, so we don't need to add a new field to the channel each time
we want to handle a different data type.

This also allow us to keep multifd_send_pages() as is, without needing
to complicate the pointer switching.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/migration/multifd.h b/migration/multifd.h
index 0ecd6f47d7..c7b1ebe099 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -16,6 +16,7 @@
 #include "ram.h"
 
 typedef struct MultiFDRecvData MultiFDRecvData;
+typedef struct MultiFDSendData MultiFDSendData;
 
 bool multifd_send_setup(void);
 void multifd_send_shutdown(void);
@@ -89,6 +90,29 @@ struct MultiFDRecvData {
     off_t file_offset;
 };
 
+typedef enum {
+    MULTIFD_PAYLOAD_NONE,
+    MULTIFD_PAYLOAD_RAM,
+} MultiFDPayloadType;
+
+struct MultiFDSendData {
+    MultiFDPayloadType type;
+    union {
+        MultiFDPages_t ram;
+    } u;
+};
+
+static inline bool multifd_payload_empty(MultiFDSendData *data)
+{
+    return data->type == MULTIFD_PAYLOAD_NONE;
+}
+
+static inline void multifd_set_payload_type(MultiFDSendData *data,
+                                            MultiFDPayloadType type)
+{
+    data->type = type;
+}
+
 typedef struct {
     /* Fields are only written at creating/deletion time */
     /* No lock required for them, they are read only */
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 3/9] migration/multifd: Introduce MultiFDSendData
  2024-07-22 17:59 ` [RFC PATCH v2 3/9] migration/multifd: Introduce MultiFDSendData Fabiano Rosas
@ 2024-07-22 19:22   ` Peter Xu
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Xu @ 2024-07-22 19:22 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:08PM -0300, Fabiano Rosas wrote:
> Add a new data structure to replace p->pages in the multifd
> channel. This new structure will hide the multifd payload type behind
> an union, so we don't need to add a new field to the channel each time
> we want to handle a different data type.
> 
> This also allow us to keep multifd_send_pages() as is, without needing
> to complicate the pointer switching.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 4/9] migration/multifd: Make MultiFDPages_t:offset a flexible array member
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
                   ` (2 preceding siblings ...)
  2024-07-22 17:59 ` [RFC PATCH v2 3/9] migration/multifd: Introduce MultiFDSendData Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 19:20   ` Peter Xu
  2024-07-22 17:59 ` [RFC PATCH v2 5/9] migration/multifd: Replace p->pages with an union pointer Fabiano Rosas
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

We're about to use MultiFDPages_t from inside the MultiFDSendData
payload union, which means we cannot have pointers to allocated data
inside the pages structure, otherwise we'd lose the reference to that
memory once another payload type touches the union. Move the offset
array into the end of the structure and turn it into a flexible array
member, so it is allocated along with the rest of MultiFDSendData in
the next patches.

Note that the ramblock pointer is still fine because the storage for
it is not owned by the migration code.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 21 ++++++---------------
 migration/multifd.h |  4 ++--
 2 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 20a767157e..440319b361 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -389,22 +389,10 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
     return msg.id;
 }
 
-static MultiFDPages_t *multifd_pages_init(uint32_t n)
-{
-    MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
-
-    pages->allocated = n;
-    pages->offset = g_new0(ram_addr_t, n);
-
-    return pages;
-}
-
 static void multifd_pages_clear(MultiFDPages_t *pages)
 {
     multifd_pages_reset(pages);
     pages->allocated = 0;
-    g_free(pages->offset);
-    pages->offset = NULL;
     g_free(pages);
 }
 
@@ -1169,7 +1157,9 @@ bool multifd_send_setup(void)
     thread_count = migrate_multifd_channels();
     multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
     multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
-    multifd_send_state->pages = multifd_pages_init(page_count);
+    multifd_send_state->pages = g_malloc0(sizeof(MultiFDPages_t) +
+                                          page_count * sizeof(ram_addr_t));
+    multifd_send_state->pages->allocated = page_count;
     qemu_sem_init(&multifd_send_state->channels_created, 0);
     qemu_sem_init(&multifd_send_state->channels_ready, 0);
     qatomic_set(&multifd_send_state->exiting, 0);
@@ -1181,8 +1171,9 @@ bool multifd_send_setup(void)
         qemu_sem_init(&p->sem, 0);
         qemu_sem_init(&p->sem_sync, 0);
         p->id = i;
-        p->pages = multifd_pages_init(page_count);
-
+        p->pages = g_malloc0(sizeof(MultiFDPages_t) +
+                             page_count * sizeof(ram_addr_t));
+        p->pages->allocated = page_count;
         if (use_packets) {
             p->packet_len = sizeof(MultiFDPacket_t)
                           + sizeof(uint64_t) * page_count;
diff --git a/migration/multifd.h b/migration/multifd.h
index c7b1ebe099..12d4247e23 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -78,9 +78,9 @@ typedef struct {
     uint32_t normal_num;
     /* number of allocated pages */
     uint32_t allocated;
+    RAMBlock *block;
     /* offset of each page */
-    ram_addr_t *offset;
-    RAMBlock *block;
+    ram_addr_t offset[];
 } MultiFDPages_t;
 
 struct MultiFDRecvData {
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/9] migration/multifd: Make MultiFDPages_t:offset a flexible array member
  2024-07-22 17:59 ` [RFC PATCH v2 4/9] migration/multifd: Make MultiFDPages_t:offset a flexible array member Fabiano Rosas
@ 2024-07-22 19:20   ` Peter Xu
  2024-07-22 20:36     ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 19:20 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:09PM -0300, Fabiano Rosas wrote:
> We're about to use MultiFDPages_t from inside the MultiFDSendData
> payload union, which means we cannot have pointers to allocated data
> inside the pages structure, otherwise we'd lose the reference to that
> memory once another payload type touches the union. Move the offset
> array into the end of the structure and turn it into a flexible array
> member, so it is allocated along with the rest of MultiFDSendData in
> the next patches.
> 
> Note that the ramblock pointer is still fine because the storage for
> it is not owned by the migration code.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/multifd.c | 21 ++++++---------------
>  migration/multifd.h |  4 ++--
>  2 files changed, 8 insertions(+), 17 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 20a767157e..440319b361 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -389,22 +389,10 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
>      return msg.id;
>  }
>  
> -static MultiFDPages_t *multifd_pages_init(uint32_t n)
> -{
> -    MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
> -
> -    pages->allocated = n;
> -    pages->offset = g_new0(ram_addr_t, n);
> -
> -    return pages;
> -}

Considering this is the tricky object to allocate here, shall we keep the
function just for readability (and already dedups below two callers)?  With
it, someone else will notice g_new0() stops working for MultiFDPages_t.
Some verbose comment would be nice too.

Maybe we can also move multifd_ram_payload_size() from the next patch to
here.

> -
>  static void multifd_pages_clear(MultiFDPages_t *pages)
>  {
>      multifd_pages_reset(pages);
>      pages->allocated = 0;
> -    g_free(pages->offset);
> -    pages->offset = NULL;
>      g_free(pages);
>  }
>  
> @@ -1169,7 +1157,9 @@ bool multifd_send_setup(void)
>      thread_count = migrate_multifd_channels();
>      multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
>      multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
> -    multifd_send_state->pages = multifd_pages_init(page_count);
> +    multifd_send_state->pages = g_malloc0(sizeof(MultiFDPages_t) +
> +                                          page_count * sizeof(ram_addr_t));
> +    multifd_send_state->pages->allocated = page_count;
>      qemu_sem_init(&multifd_send_state->channels_created, 0);
>      qemu_sem_init(&multifd_send_state->channels_ready, 0);
>      qatomic_set(&multifd_send_state->exiting, 0);
> @@ -1181,8 +1171,9 @@ bool multifd_send_setup(void)
>          qemu_sem_init(&p->sem, 0);
>          qemu_sem_init(&p->sem_sync, 0);
>          p->id = i;
> -        p->pages = multifd_pages_init(page_count);
> -
> +        p->pages = g_malloc0(sizeof(MultiFDPages_t) +
> +                             page_count * sizeof(ram_addr_t));
> +        p->pages->allocated = page_count;
>          if (use_packets) {
>              p->packet_len = sizeof(MultiFDPacket_t)
>                            + sizeof(uint64_t) * page_count;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index c7b1ebe099..12d4247e23 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -78,9 +78,9 @@ typedef struct {
>      uint32_t normal_num;
>      /* number of allocated pages */
>      uint32_t allocated;
> +    RAMBlock *block;
>      /* offset of each page */
> -    ram_addr_t *offset;
> -    RAMBlock *block;
> +    ram_addr_t offset[];
>  } MultiFDPages_t;
>  
>  struct MultiFDRecvData {
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 4/9] migration/multifd: Make MultiFDPages_t:offset a flexible array member
  2024-07-22 19:20   ` Peter Xu
@ 2024-07-22 20:36     ` Fabiano Rosas
  0 siblings, 0 replies; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 20:36 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 02:59:09PM -0300, Fabiano Rosas wrote:
>> We're about to use MultiFDPages_t from inside the MultiFDSendData
>> payload union, which means we cannot have pointers to allocated data
>> inside the pages structure, otherwise we'd lose the reference to that
>> memory once another payload type touches the union. Move the offset
>> array into the end of the structure and turn it into a flexible array
>> member, so it is allocated along with the rest of MultiFDSendData in
>> the next patches.
>> 
>> Note that the ramblock pointer is still fine because the storage for
>> it is not owned by the migration code.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/multifd.c | 21 ++++++---------------
>>  migration/multifd.h |  4 ++--
>>  2 files changed, 8 insertions(+), 17 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 20a767157e..440319b361 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -389,22 +389,10 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
>>      return msg.id;
>>  }
>>  
>> -static MultiFDPages_t *multifd_pages_init(uint32_t n)
>> -{
>> -    MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
>> -
>> -    pages->allocated = n;
>> -    pages->offset = g_new0(ram_addr_t, n);
>> -
>> -    return pages;
>> -}
>
> Considering this is the tricky object to allocate here, shall we keep the
> function just for readability (and already dedups below two callers)?  With
> it, someone else will notice g_new0() stops working for MultiFDPages_t.
> Some verbose comment would be nice too.
>
> Maybe we can also move multifd_ram_payload_size() from the next patch to
> here.

I guess so, I was trying to keep this diff small so it's easier to
grasp.

>
>> -
>>  static void multifd_pages_clear(MultiFDPages_t *pages)
>>  {
>>      multifd_pages_reset(pages);
>>      pages->allocated = 0;
>> -    g_free(pages->offset);
>> -    pages->offset = NULL;
>>      g_free(pages);
>>  }
>>  
>> @@ -1169,7 +1157,9 @@ bool multifd_send_setup(void)
>>      thread_count = migrate_multifd_channels();
>>      multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
>>      multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
>> -    multifd_send_state->pages = multifd_pages_init(page_count);
>> +    multifd_send_state->pages = g_malloc0(sizeof(MultiFDPages_t) +
>> +                                          page_count * sizeof(ram_addr_t));
>> +    multifd_send_state->pages->allocated = page_count;
>>      qemu_sem_init(&multifd_send_state->channels_created, 0);
>>      qemu_sem_init(&multifd_send_state->channels_ready, 0);
>>      qatomic_set(&multifd_send_state->exiting, 0);
>> @@ -1181,8 +1171,9 @@ bool multifd_send_setup(void)
>>          qemu_sem_init(&p->sem, 0);
>>          qemu_sem_init(&p->sem_sync, 0);
>>          p->id = i;
>> -        p->pages = multifd_pages_init(page_count);
>> -
>> +        p->pages = g_malloc0(sizeof(MultiFDPages_t) +
>> +                             page_count * sizeof(ram_addr_t));
>> +        p->pages->allocated = page_count;
>>          if (use_packets) {
>>              p->packet_len = sizeof(MultiFDPacket_t)
>>                            + sizeof(uint64_t) * page_count;
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index c7b1ebe099..12d4247e23 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -78,9 +78,9 @@ typedef struct {
>>      uint32_t normal_num;
>>      /* number of allocated pages */
>>      uint32_t allocated;
>> +    RAMBlock *block;
>>      /* offset of each page */
>> -    ram_addr_t *offset;
>> -    RAMBlock *block;
>> +    ram_addr_t offset[];
>>  } MultiFDPages_t;
>>  
>>  struct MultiFDRecvData {
>> -- 
>> 2.35.3
>> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 5/9] migration/multifd: Replace p->pages with an union pointer
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
                   ` (3 preceding siblings ...)
  2024-07-22 17:59 ` [RFC PATCH v2 4/9] migration/multifd: Make MultiFDPages_t:offset a flexible array member Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 17:59 ` [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect() Fabiano Rosas
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

We want multifd to be able to handle more types of data than just ram
pages. To start decoupling multifd from pages, replace p->pages
(MultiFDPages_t) with the new type MultiFDSendData that hides the
client payload inside an union.

The general idea here is to isolate functions that *need* to handle
MultiFDPages_t and move them in the future to multifd-ram.c, while
multifd.c will stay with only the core functions that handle
MultiFDSendData/MultiFDRecvData.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-qpl.c       |  6 +--
 migration/multifd-uadk.c      |  2 +-
 migration/multifd-zero-page.c |  2 +-
 migration/multifd-zlib.c      |  2 +-
 migration/multifd-zstd.c      |  2 +-
 migration/multifd.c           | 84 +++++++++++++++++++++--------------
 migration/multifd.h           |  7 +--
 7 files changed, 58 insertions(+), 47 deletions(-)

diff --git a/migration/multifd-qpl.c b/migration/multifd-qpl.c
index f8c84c52cf..17ca16824d 100644
--- a/migration/multifd-qpl.c
+++ b/migration/multifd-qpl.c
@@ -404,7 +404,7 @@ retry:
 static void multifd_qpl_compress_pages_slow_path(MultiFDSendParams *p)
 {
     QplData *qpl = p->compress_data;
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     uint32_t size = p->page_size;
     qpl_job *job = qpl->sw_job;
     uint8_t *zbuf = qpl->zbuf;
@@ -435,7 +435,7 @@ static void multifd_qpl_compress_pages_slow_path(MultiFDSendParams *p)
 static void multifd_qpl_compress_pages(MultiFDSendParams *p)
 {
     QplData *qpl = p->compress_data;
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     uint32_t size = p->page_size;
     QplHwJob *hw_job;
     uint8_t *buf;
@@ -499,7 +499,7 @@ static void multifd_qpl_compress_pages(MultiFDSendParams *p)
 static int multifd_qpl_send_prepare(MultiFDSendParams *p, Error **errp)
 {
     QplData *qpl = p->compress_data;
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     uint32_t len = 0;
 
     if (!multifd_send_prepare_common(p)) {
diff --git a/migration/multifd-uadk.c b/migration/multifd-uadk.c
index b8ba3cd9c1..a561988d2f 100644
--- a/migration/multifd-uadk.c
+++ b/migration/multifd-uadk.c
@@ -174,7 +174,7 @@ static int multifd_uadk_send_prepare(MultiFDSendParams *p, Error **errp)
     uint32_t hdr_size;
     uint8_t *buf = uadk_data->buf;
     int ret = 0;
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
 
     if (!multifd_send_prepare_common(p)) {
         goto out;
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index e1b8370f88..efc0424f74 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -46,7 +46,7 @@ static void swap_page_offset(ram_addr_t *pages_offset, int a, int b)
  */
 void multifd_send_zero_page_detect(MultiFDSendParams *p)
 {
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     RAMBlock *rb = pages->block;
     int i = 0;
     int j = pages->num - 1;
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 65f8aba5c8..cec552e9b4 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -123,7 +123,7 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
  */
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     struct zlib_data *z = p->compress_data;
     z_stream *zs = &z->zs;
     uint32_t out_size = 0;
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index cb6075a9a5..ff72c6d90a 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -119,7 +119,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
  */
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     struct zstd_data *z = p->compress_data;
     int ret;
     uint32_t i;
diff --git a/migration/multifd.c b/migration/multifd.c
index 440319b361..f64b053e44 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -49,8 +49,7 @@ typedef struct {
 
 struct {
     MultiFDSendParams *params;
-    /* array of pages to sent */
-    MultiFDPages_t *pages;
+    MultiFDSendData *data;
     /*
      * Global number of generated multifd packets.
      *
@@ -98,6 +97,27 @@ struct {
     MultiFDMethods *ops;
 } *multifd_recv_state;
 
+static size_t multifd_ram_payload_size(void)
+{
+    uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+
+    return sizeof(MultiFDPages_t) + n * sizeof(ram_addr_t);
+}
+
+static MultiFDSendData *multifd_send_data_alloc(void)
+{
+    size_t max_payload_size;
+
+    /*
+     * MultiFDPages_t carries extra data as a flexible array
+     * member. When adding new payload types, change this to use the
+     * larger between multifd_ram_payload_size() and the new type(s).
+     */
+    max_payload_size = multifd_ram_payload_size();
+
+    return g_malloc0(sizeof(MultiFDPayloadType) + max_payload_size);
+}
+
 static bool multifd_use_packets(void)
 {
     return !migrate_mapped_ram();
@@ -110,7 +130,7 @@ void multifd_send_channel_created(void)
 
 static void multifd_set_file_bitmap(MultiFDSendParams *p)
 {
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
 
     assert(pages->block);
 
@@ -164,7 +184,7 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
 
 static void multifd_send_prepare_iovs(MultiFDSendParams *p)
 {
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
 
     for (int i = 0; i < pages->normal_num; i++) {
         p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
@@ -389,17 +409,10 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
     return msg.id;
 }
 
-static void multifd_pages_clear(MultiFDPages_t *pages)
-{
-    multifd_pages_reset(pages);
-    pages->allocated = 0;
-    g_free(pages);
-}
-
 void multifd_send_fill_packet(MultiFDSendParams *p)
 {
     MultiFDPacket_t *packet = p->packet;
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     uint64_t packet_num;
     uint32_t zero_num = pages->num - pages->normal_num;
     int i;
@@ -579,7 +592,7 @@ static bool multifd_send_pages(void)
     int i;
     static int next_channel;
     MultiFDSendParams *p = NULL; /* make happy gcc */
-    MultiFDPages_t *pages = multifd_send_state->pages;
+    MultiFDSendData *tmp;
 
     if (multifd_send_should_exit()) {
         return false;
@@ -614,11 +627,14 @@ static bool multifd_send_pages(void)
      * qatomic_store_release() in multifd_send_thread().
      */
     smp_mb_acquire();
-    assert(!p->pages->num);
-    multifd_send_state->pages = p->pages;
-    p->pages = pages;
+
+    assert(!p->data->u.ram.num);
+
+    tmp = multifd_send_state->data;
+    multifd_send_state->data = p->data;
+    p->data = tmp;
     /*
-     * Making sure p->pages is setup before marking pending_job=true. Pairs
+     * Making sure p->data is setup before marking pending_job=true. Pairs
      * with the qatomic_load_acquire() in multifd_send_thread().
      */
     qatomic_store_release(&p->pending_job, true);
@@ -648,7 +664,7 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
     MultiFDPages_t *pages;
 
 retry:
-    pages = multifd_send_state->pages;
+    pages = &multifd_send_state->data->u.ram;
 
     /* If the queue is empty, we can already enqueue now */
     if (multifd_queue_empty(pages)) {
@@ -778,8 +794,8 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
     qemu_sem_destroy(&p->sem_sync);
     g_free(p->name);
     p->name = NULL;
-    multifd_pages_clear(p->pages);
-    p->pages = NULL;
+    g_free(p->data);
+    p->data = NULL;
     p->packet_len = 0;
     g_free(p->packet);
     p->packet = NULL;
@@ -796,8 +812,8 @@ static void multifd_send_cleanup_state(void)
     qemu_sem_destroy(&multifd_send_state->channels_ready);
     g_free(multifd_send_state->params);
     multifd_send_state->params = NULL;
-    multifd_pages_clear(multifd_send_state->pages);
-    multifd_send_state->pages = NULL;
+    g_free(multifd_send_state->data);
+    multifd_send_state->data = NULL;
     g_free(multifd_send_state);
     multifd_send_state = NULL;
 }
@@ -846,11 +862,13 @@ int multifd_send_sync_main(void)
 {
     int i;
     bool flush_zero_copy;
+    MultiFDPages_t *pages;
 
     if (!migrate_multifd()) {
         return 0;
     }
-    if (multifd_send_state->pages->num) {
+    pages = &multifd_send_state->data->u.ram;
+    if (pages->num) {
         if (!multifd_send_pages()) {
             error_report("%s: multifd_send_pages fail", __func__);
             return -1;
@@ -925,11 +943,11 @@ static void *multifd_send_thread(void *opaque)
         }
 
         /*
-         * Read pending_job flag before p->pages.  Pairs with the
+         * Read pending_job flag before p->data.  Pairs with the
          * qatomic_store_release() in multifd_send_pages().
          */
         if (qatomic_load_acquire(&p->pending_job)) {
-            MultiFDPages_t *pages = p->pages;
+            MultiFDPages_t *pages = &p->data->u.ram;
 
             p->iovs_num = 0;
             assert(pages->num);
@@ -941,7 +959,7 @@ static void *multifd_send_thread(void *opaque)
 
             if (migrate_mapped_ram()) {
                 ret = file_write_ramblock_iov(p->c, p->iov, p->iovs_num,
-                                              pages, &local_err);
+                                              &p->data->u.ram, &local_err);
             } else {
                 ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
                                                   NULL, 0, p->write_flags,
@@ -961,7 +979,7 @@ static void *multifd_send_thread(void *opaque)
             p->next_packet_size = 0;
 
             /*
-             * Making sure p->pages is published before saying "we're
+             * Making sure p->data is published before saying "we're
              * free".  Pairs with the smp_mb_acquire() in
              * multifd_send_pages().
              */
@@ -1157,9 +1175,8 @@ bool multifd_send_setup(void)
     thread_count = migrate_multifd_channels();
     multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
     multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
-    multifd_send_state->pages = g_malloc0(sizeof(MultiFDPages_t) +
-                                          page_count * sizeof(ram_addr_t));
-    multifd_send_state->pages->allocated = page_count;
+    multifd_send_state->data = multifd_send_data_alloc();
+    multifd_send_state->data->u.ram.allocated = page_count;
     qemu_sem_init(&multifd_send_state->channels_created, 0);
     qemu_sem_init(&multifd_send_state->channels_ready, 0);
     qatomic_set(&multifd_send_state->exiting, 0);
@@ -1171,9 +1188,8 @@ bool multifd_send_setup(void)
         qemu_sem_init(&p->sem, 0);
         qemu_sem_init(&p->sem_sync, 0);
         p->id = i;
-        p->pages = g_malloc0(sizeof(MultiFDPages_t) +
-                             page_count * sizeof(ram_addr_t));
-        p->pages->allocated = page_count;
+        p->data = multifd_send_data_alloc();
+        p->data->u.ram.allocated = page_count;
         if (use_packets) {
             p->packet_len = sizeof(MultiFDPacket_t)
                           + sizeof(uint64_t) * page_count;
@@ -1675,7 +1691,7 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 
 bool multifd_send_prepare_common(MultiFDSendParams *p)
 {
-    MultiFDPages_t *pages = p->pages;
+    MultiFDPages_t *pages = &p->data->u.ram;
     multifd_send_zero_page_detect(p);
 
     if (!pages->normal_num) {
diff --git a/migration/multifd.h b/migration/multifd.h
index 12d4247e23..c9c01579a0 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -155,12 +155,7 @@ typedef struct {
      */
     bool pending_job;
     bool pending_sync;
-    /* array of pages to sent.
-     * The owner of 'pages' depends of 'pending_job' value:
-     * pending_job == 0 -> migration_thread can use it.
-     * pending_job != 0 -> multifd_channel can use it.
-     */
-    MultiFDPages_t *pages;
+    MultiFDSendData *data;
 
     /* thread local variables. No locking required */
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect()
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
                   ` (4 preceding siblings ...)
  2024-07-22 17:59 ` [RFC PATCH v2 5/9] migration/multifd: Replace p->pages with an union pointer Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 19:29   ` Peter Xu
  2024-07-22 17:59 ` [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data Fabiano Rosas
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

All references to pages are being removed from the multifd worker
threads in order to allow multifd to deal with different payload
types.

multifd_send_zero_page_detect() is called by all multifd migration
paths that deal with pages and is the last spot where zero pages and
normal page amounts are adjusted. Move the pages accounting into that
function.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd-zero-page.c | 7 ++++++-
 migration/multifd.c           | 2 --
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index efc0424f74..899e8864c6 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -14,6 +14,7 @@
 #include "qemu/cutils.h"
 #include "exec/ramblock.h"
 #include "migration.h"
+#include "migration-stats.h"
 #include "multifd.h"
 #include "options.h"
 #include "ram.h"
@@ -53,7 +54,7 @@ void multifd_send_zero_page_detect(MultiFDSendParams *p)
 
     if (!multifd_zero_page_enabled()) {
         pages->normal_num = pages->num;
-        return;
+        goto out;
     }
 
     /*
@@ -74,6 +75,10 @@ void multifd_send_zero_page_detect(MultiFDSendParams *p)
     }
 
     pages->normal_num = i;
+
+out:
+    stat64_add(&mig_stats.normal_pages, pages->normal_num);
+    stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
 }
 
 void multifd_recv_zero_page_process(MultiFDRecvParams *p)
diff --git a/migration/multifd.c b/migration/multifd.c
index f64b053e44..fcdb12e04f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -972,8 +972,6 @@ static void *multifd_send_thread(void *opaque)
 
             stat64_add(&mig_stats.multifd_bytes,
                        p->next_packet_size + p->packet_len);
-            stat64_add(&mig_stats.normal_pages, pages->normal_num);
-            stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
 
             multifd_pages_reset(pages);
             p->next_packet_size = 0;
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect()
  2024-07-22 17:59 ` [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect() Fabiano Rosas
@ 2024-07-22 19:29   ` Peter Xu
  2024-07-22 20:07     ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 19:29 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:11PM -0300, Fabiano Rosas wrote:
> All references to pages are being removed from the multifd worker
> threads in order to allow multifd to deal with different payload
> types.
> 
> multifd_send_zero_page_detect() is called by all multifd migration
> paths that deal with pages and is the last spot where zero pages and
> normal page amounts are adjusted. Move the pages accounting into that
> function.

True, but it's a bit hackish to update (especially, normal) page counters
in a zero page detect function.

I understand you want to move pages out of the thread function, that's
fair. How about put it in your new multifd_ram_fill_packet()?

> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/multifd-zero-page.c | 7 ++++++-
>  migration/multifd.c           | 2 --
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index efc0424f74..899e8864c6 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -14,6 +14,7 @@
>  #include "qemu/cutils.h"
>  #include "exec/ramblock.h"
>  #include "migration.h"
> +#include "migration-stats.h"
>  #include "multifd.h"
>  #include "options.h"
>  #include "ram.h"
> @@ -53,7 +54,7 @@ void multifd_send_zero_page_detect(MultiFDSendParams *p)
>  
>      if (!multifd_zero_page_enabled()) {
>          pages->normal_num = pages->num;
> -        return;
> +        goto out;
>      }
>  
>      /*
> @@ -74,6 +75,10 @@ void multifd_send_zero_page_detect(MultiFDSendParams *p)
>      }
>  
>      pages->normal_num = i;
> +
> +out:
> +    stat64_add(&mig_stats.normal_pages, pages->normal_num);
> +    stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
>  }
>  
>  void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> diff --git a/migration/multifd.c b/migration/multifd.c
> index f64b053e44..fcdb12e04f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -972,8 +972,6 @@ static void *multifd_send_thread(void *opaque)
>  
>              stat64_add(&mig_stats.multifd_bytes,
>                         p->next_packet_size + p->packet_len);
> -            stat64_add(&mig_stats.normal_pages, pages->normal_num);
> -            stat64_add(&mig_stats.zero_pages, pages->num - pages->normal_num);
>  
>              multifd_pages_reset(pages);
>              p->next_packet_size = 0;
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect()
  2024-07-22 19:29   ` Peter Xu
@ 2024-07-22 20:07     ` Fabiano Rosas
  2024-07-22 20:38       ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 20:07 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 02:59:11PM -0300, Fabiano Rosas wrote:
>> All references to pages are being removed from the multifd worker
>> threads in order to allow multifd to deal with different payload
>> types.
>> 
>> multifd_send_zero_page_detect() is called by all multifd migration
>> paths that deal with pages and is the last spot where zero pages and
>> normal page amounts are adjusted. Move the pages accounting into that
>> function.
>
> True, but it's a bit hackish to update (especially, normal) page counters
> in a zero page detect function.

Hm, that's the one place in the code that actually sets
normal_num. Seems adequate to me.

> I understand you want to move pages out of the thread function, that's
> fair. How about put it in your new multifd_ram_fill_packet()?
>

That one is skipped when mapped-ram is in use. I could move it to
nocomp_send_prepare() after the zero_page_detect. It seems we're moving
towards changing nocomp -> ram at some point anyway. Would that be
better? It would duplicate the call due to the compression code.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect()
  2024-07-22 20:07     ` Fabiano Rosas
@ 2024-07-22 20:38       ` Peter Xu
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Xu @ 2024-07-22 20:38 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 05:07:57PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jul 22, 2024 at 02:59:11PM -0300, Fabiano Rosas wrote:
> >> All references to pages are being removed from the multifd worker
> >> threads in order to allow multifd to deal with different payload
> >> types.
> >> 
> >> multifd_send_zero_page_detect() is called by all multifd migration
> >> paths that deal with pages and is the last spot where zero pages and
> >> normal page amounts are adjusted. Move the pages accounting into that
> >> function.
> >
> > True, but it's a bit hackish to update (especially, normal) page counters
> > in a zero page detect function.
> 
> Hm, that's the one place in the code that actually sets
> normal_num. Seems adequate to me.

Fair point.

> 
> > I understand you want to move pages out of the thread function, that's
> > fair. How about put it in your new multifd_ram_fill_packet()?
> >
> 
> That one is skipped when mapped-ram is in use. I could move it to
> nocomp_send_prepare() after the zero_page_detect. It seems we're moving
> towards changing nocomp -> ram at some point anyway. Would that be
> better? It would duplicate the call due to the compression code.

Maybe it's simply that the helper itself (multifd_send_zero_page_detect)
needs a better name (perhaps multifd_send_ram_setup()?). I'm ok we leave
this one as-is:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
                   ` (5 preceding siblings ...)
  2024-07-22 17:59 ` [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect() Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 19:37   ` Peter Xu
  2024-07-22 17:59 ` [RFC PATCH v2 8/9] migration/multifd: Don't send ram data during SYNC Fabiano Rosas
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

While we cannot yet disentangle the multifd packet from page data, we
can make the code a bit cleaner by setting the page-related fields in
a separate function.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 97 ++++++++++++++++++++++++++++-----------------
 1 file changed, 61 insertions(+), 36 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index fcdb12e04f..d25b8658b2 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -409,65 +409,62 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
     return msg.id;
 }
 
-void multifd_send_fill_packet(MultiFDSendParams *p)
+static void multifd_ram_fill_packet(MultiFDSendParams *p)
 {
     MultiFDPacket_t *packet = p->packet;
     MultiFDPages_t *pages = &p->data->u.ram;
-    uint64_t packet_num;
     uint32_t zero_num = pages->num - pages->normal_num;
-    int i;
 
-    packet->flags = cpu_to_be32(p->flags);
     packet->pages_alloc = cpu_to_be32(pages->allocated);
     packet->normal_pages = cpu_to_be32(pages->normal_num);
     packet->zero_pages = cpu_to_be32(zero_num);
-    packet->next_packet_size = cpu_to_be32(p->next_packet_size);
-
-    packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
-    packet->packet_num = cpu_to_be64(packet_num);
 
     if (pages->block) {
         strncpy(packet->ramblock, pages->block->idstr, 256);
     }
 
-    for (i = 0; i < pages->num; i++) {
+    for (int i = 0; i < pages->num; i++) {
         /* there are architectures where ram_addr_t is 32 bit */
         uint64_t temp = pages->offset[i];
 
         packet->offset[i] = cpu_to_be64(temp);
     }
 
-    p->packets_sent++;
     p->total_normal_pages += pages->normal_num;
     p->total_zero_pages += zero_num;
+}
 
-    trace_multifd_send(p->id, packet_num, pages->normal_num, zero_num,
+void multifd_send_fill_packet(MultiFDSendParams *p)
+{
+    MultiFDPacket_t *packet = p->packet;
+    uint64_t packet_num;
+
+    memset(packet, 0, p->packet_len);
+
+    packet->magic = cpu_to_be32(MULTIFD_MAGIC);
+    packet->version = cpu_to_be32(MULTIFD_VERSION);
+
+    packet->flags = cpu_to_be32(p->flags);
+    packet->next_packet_size = cpu_to_be32(p->next_packet_size);
+
+    packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
+    packet->packet_num = cpu_to_be64(packet_num);
+
+    p->packets_sent++;
+
+    multifd_ram_fill_packet(p);
+
+    trace_multifd_send(p->id, packet_num,
+                       be32_to_cpu(packet->normal_pages),
+                       be32_to_cpu(packet->zero_pages),
                        p->flags, p->next_packet_size);
 }
 
-static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
+static int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
     MultiFDPacket_t *packet = p->packet;
     int i;
 
-    packet->magic = be32_to_cpu(packet->magic);
-    if (packet->magic != MULTIFD_MAGIC) {
-        error_setg(errp, "multifd: received packet "
-                   "magic %x and expected magic %x",
-                   packet->magic, MULTIFD_MAGIC);
-        return -1;
-    }
-
-    packet->version = be32_to_cpu(packet->version);
-    if (packet->version != MULTIFD_VERSION) {
-        error_setg(errp, "multifd: received packet "
-                   "version %u and expected version %u",
-                   packet->version, MULTIFD_VERSION);
-        return -1;
-    }
-
-    p->flags = be32_to_cpu(packet->flags);
-
     packet->pages_alloc = be32_to_cpu(packet->pages_alloc);
     /*
      * If we received a packet that is 100 times bigger than expected
@@ -496,15 +493,9 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         return -1;
     }
 
-    p->next_packet_size = be32_to_cpu(packet->next_packet_size);
-    p->packet_num = be64_to_cpu(packet->packet_num);
-    p->packets_recved++;
     p->total_normal_pages += p->normal_num;
     p->total_zero_pages += p->zero_num;
 
-    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
-                       p->flags, p->next_packet_size);
-
     if (p->normal_num == 0 && p->zero_num == 0) {
         return 0;
     }
@@ -546,6 +537,40 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     return 0;
 }
 
+static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
+{
+    MultiFDPacket_t *packet = p->packet;
+    int ret = 0;
+
+    packet->magic = be32_to_cpu(packet->magic);
+    if (packet->magic != MULTIFD_MAGIC) {
+        error_setg(errp, "multifd: received packet "
+                   "magic %x and expected magic %x",
+                   packet->magic, MULTIFD_MAGIC);
+        return -1;
+    }
+
+    packet->version = be32_to_cpu(packet->version);
+    if (packet->version != MULTIFD_VERSION) {
+        error_setg(errp, "multifd: received packet "
+                   "version %u and expected version %u",
+                   packet->version, MULTIFD_VERSION);
+        return -1;
+    }
+
+    p->flags = be32_to_cpu(packet->flags);
+    p->next_packet_size = be32_to_cpu(packet->next_packet_size);
+    p->packet_num = be64_to_cpu(packet->packet_num);
+    p->packets_recved++;
+
+    ret = multifd_ram_unfill_packet(p, errp);
+
+    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
+                       p->flags, p->next_packet_size);
+
+    return ret;
+}
+
 static bool multifd_send_should_exit(void)
 {
     return qatomic_read(&multifd_send_state->exiting);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data
  2024-07-22 17:59 ` [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data Fabiano Rosas
@ 2024-07-22 19:37   ` Peter Xu
  2024-07-22 20:34     ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 19:37 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:12PM -0300, Fabiano Rosas wrote:
> While we cannot yet disentangle the multifd packet from page data, we
> can make the code a bit cleaner by setting the page-related fields in
> a separate function.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/multifd.c | 97 ++++++++++++++++++++++++++++-----------------
>  1 file changed, 61 insertions(+), 36 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index fcdb12e04f..d25b8658b2 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -409,65 +409,62 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
>      return msg.id;
>  }
>  
> -void multifd_send_fill_packet(MultiFDSendParams *p)
> +static void multifd_ram_fill_packet(MultiFDSendParams *p)
>  {
>      MultiFDPacket_t *packet = p->packet;
>      MultiFDPages_t *pages = &p->data->u.ram;
> -    uint64_t packet_num;
>      uint32_t zero_num = pages->num - pages->normal_num;
> -    int i;
>  
> -    packet->flags = cpu_to_be32(p->flags);
>      packet->pages_alloc = cpu_to_be32(pages->allocated);
>      packet->normal_pages = cpu_to_be32(pages->normal_num);
>      packet->zero_pages = cpu_to_be32(zero_num);
> -    packet->next_packet_size = cpu_to_be32(p->next_packet_size);

Definitely good intention, but I had a feeling that this will need to be
reorganized again when Maciej reworks on top, due to the fact that
next_packet_size will be ram-private field, simply because it's defined
after pages_alloc and normal_pages...

E.g., see:

https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com

Where the new MultiFDPacketHdr_t cannot include next_packet_size (even if
VFIO will need that too..).

typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
} __attribute__((packed)) MultiFDPacketHdr_t;

So _maybe_ it's easier we drop this patch and leave that part to Maciej to
identify which is common and which is arm/vfio specific.  No strong
opinions here.

> -
> -    packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
> -    packet->packet_num = cpu_to_be64(packet_num);
>  
>      if (pages->block) {
>          strncpy(packet->ramblock, pages->block->idstr, 256);
>      }
>  
> -    for (i = 0; i < pages->num; i++) {
> +    for (int i = 0; i < pages->num; i++) {
>          /* there are architectures where ram_addr_t is 32 bit */
>          uint64_t temp = pages->offset[i];
>  
>          packet->offset[i] = cpu_to_be64(temp);
>      }
>  
> -    p->packets_sent++;
>      p->total_normal_pages += pages->normal_num;
>      p->total_zero_pages += zero_num;
> +}
>  
> -    trace_multifd_send(p->id, packet_num, pages->normal_num, zero_num,
> +void multifd_send_fill_packet(MultiFDSendParams *p)
> +{
> +    MultiFDPacket_t *packet = p->packet;
> +    uint64_t packet_num;
> +
> +    memset(packet, 0, p->packet_len);
> +
> +    packet->magic = cpu_to_be32(MULTIFD_MAGIC);
> +    packet->version = cpu_to_be32(MULTIFD_VERSION);
> +
> +    packet->flags = cpu_to_be32(p->flags);
> +    packet->next_packet_size = cpu_to_be32(p->next_packet_size);
> +
> +    packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
> +    packet->packet_num = cpu_to_be64(packet_num);
> +
> +    p->packets_sent++;
> +
> +    multifd_ram_fill_packet(p);
> +
> +    trace_multifd_send(p->id, packet_num,
> +                       be32_to_cpu(packet->normal_pages),
> +                       be32_to_cpu(packet->zero_pages),
>                         p->flags, p->next_packet_size);
>  }
>  
> -static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> +static int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>  {
>      MultiFDPacket_t *packet = p->packet;
>      int i;
>  
> -    packet->magic = be32_to_cpu(packet->magic);
> -    if (packet->magic != MULTIFD_MAGIC) {
> -        error_setg(errp, "multifd: received packet "
> -                   "magic %x and expected magic %x",
> -                   packet->magic, MULTIFD_MAGIC);
> -        return -1;
> -    }
> -
> -    packet->version = be32_to_cpu(packet->version);
> -    if (packet->version != MULTIFD_VERSION) {
> -        error_setg(errp, "multifd: received packet "
> -                   "version %u and expected version %u",
> -                   packet->version, MULTIFD_VERSION);
> -        return -1;
> -    }
> -
> -    p->flags = be32_to_cpu(packet->flags);
> -
>      packet->pages_alloc = be32_to_cpu(packet->pages_alloc);
>      /*
>       * If we received a packet that is 100 times bigger than expected
> @@ -496,15 +493,9 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          return -1;
>      }
>  
> -    p->next_packet_size = be32_to_cpu(packet->next_packet_size);
> -    p->packet_num = be64_to_cpu(packet->packet_num);
> -    p->packets_recved++;
>      p->total_normal_pages += p->normal_num;
>      p->total_zero_pages += p->zero_num;
>  
> -    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
> -                       p->flags, p->next_packet_size);
> -
>      if (p->normal_num == 0 && p->zero_num == 0) {
>          return 0;
>      }
> @@ -546,6 +537,40 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>      return 0;
>  }
>  
> +static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> +{
> +    MultiFDPacket_t *packet = p->packet;
> +    int ret = 0;
> +
> +    packet->magic = be32_to_cpu(packet->magic);
> +    if (packet->magic != MULTIFD_MAGIC) {
> +        error_setg(errp, "multifd: received packet "
> +                   "magic %x and expected magic %x",
> +                   packet->magic, MULTIFD_MAGIC);
> +        return -1;
> +    }
> +
> +    packet->version = be32_to_cpu(packet->version);
> +    if (packet->version != MULTIFD_VERSION) {
> +        error_setg(errp, "multifd: received packet "
> +                   "version %u and expected version %u",
> +                   packet->version, MULTIFD_VERSION);
> +        return -1;
> +    }
> +
> +    p->flags = be32_to_cpu(packet->flags);
> +    p->next_packet_size = be32_to_cpu(packet->next_packet_size);
> +    p->packet_num = be64_to_cpu(packet->packet_num);
> +    p->packets_recved++;
> +
> +    ret = multifd_ram_unfill_packet(p, errp);
> +
> +    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
> +                       p->flags, p->next_packet_size);
> +
> +    return ret;
> +}
> +
>  static bool multifd_send_should_exit(void)
>  {
>      return qatomic_read(&multifd_send_state->exiting);
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data
  2024-07-22 19:37   ` Peter Xu
@ 2024-07-22 20:34     ` Fabiano Rosas
  2024-07-22 21:06       ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 20:34 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 02:59:12PM -0300, Fabiano Rosas wrote:
>> While we cannot yet disentangle the multifd packet from page data, we
>> can make the code a bit cleaner by setting the page-related fields in
>> a separate function.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/multifd.c | 97 ++++++++++++++++++++++++++++-----------------
>>  1 file changed, 61 insertions(+), 36 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index fcdb12e04f..d25b8658b2 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -409,65 +409,62 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
>>      return msg.id;
>>  }
>>  
>> -void multifd_send_fill_packet(MultiFDSendParams *p)
>> +static void multifd_ram_fill_packet(MultiFDSendParams *p)
>>  {
>>      MultiFDPacket_t *packet = p->packet;
>>      MultiFDPages_t *pages = &p->data->u.ram;
>> -    uint64_t packet_num;
>>      uint32_t zero_num = pages->num - pages->normal_num;
>> -    int i;
>>  
>> -    packet->flags = cpu_to_be32(p->flags);
>>      packet->pages_alloc = cpu_to_be32(pages->allocated);
>>      packet->normal_pages = cpu_to_be32(pages->normal_num);
>>      packet->zero_pages = cpu_to_be32(zero_num);
>> -    packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>
> Definitely good intention, but I had a feeling that this will need to be
> reorganized again when Maciej reworks on top, due to the fact that
> next_packet_size will be ram-private field, simply because it's defined
> after pages_alloc and normal_pages...
>
> E.g., see:
>
> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
>
> Where the new MultiFDPacketHdr_t cannot include next_packet_size (even if
> VFIO will need that too..).

Isn't it just a matter of setting next_packet_size in the other path as
well?

@Maciej, can you comment?

>
> typedef struct {
>     uint32_t magic;
>     uint32_t version;
>     uint32_t flags;
> } __attribute__((packed)) MultiFDPacketHdr_t;
>
> So _maybe_ it's easier we drop this patch and leave that part to Maciej to
> identify which is common and which is arm/vfio specific.  No strong
> opinions here.
>

I could drop it if that's preferrable. However, patch 8/9 is absolutely
necessary so we can remove this madness of having to clear
MultiFDPages_t fields at the multifd_send_thread() top level. It took me
a whole day to figure that one out and that bug has been manifesting
ever since I started working on multifd.

I'm not sure how we'll do that without this patch. Maybe it's better I
fix in this one whatever you guys think needs fixing.

>> -
>> -    packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
>> -    packet->packet_num = cpu_to_be64(packet_num);
>>  
>>      if (pages->block) {
>>          strncpy(packet->ramblock, pages->block->idstr, 256);
>>      }
>>  
>> -    for (i = 0; i < pages->num; i++) {
>> +    for (int i = 0; i < pages->num; i++) {
>>          /* there are architectures where ram_addr_t is 32 bit */
>>          uint64_t temp = pages->offset[i];
>>  
>>          packet->offset[i] = cpu_to_be64(temp);
>>      }
>>  
>> -    p->packets_sent++;
>>      p->total_normal_pages += pages->normal_num;
>>      p->total_zero_pages += zero_num;
>> +}
>>  
>> -    trace_multifd_send(p->id, packet_num, pages->normal_num, zero_num,
>> +void multifd_send_fill_packet(MultiFDSendParams *p)
>> +{
>> +    MultiFDPacket_t *packet = p->packet;
>> +    uint64_t packet_num;
>> +
>> +    memset(packet, 0, p->packet_len);
>> +
>> +    packet->magic = cpu_to_be32(MULTIFD_MAGIC);
>> +    packet->version = cpu_to_be32(MULTIFD_VERSION);
>> +
>> +    packet->flags = cpu_to_be32(p->flags);
>> +    packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>> +
>> +    packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
>> +    packet->packet_num = cpu_to_be64(packet_num);
>> +
>> +    p->packets_sent++;
>> +
>> +    multifd_ram_fill_packet(p);
>> +
>> +    trace_multifd_send(p->id, packet_num,
>> +                       be32_to_cpu(packet->normal_pages),
>> +                       be32_to_cpu(packet->zero_pages),
>>                         p->flags, p->next_packet_size);
>>  }
>>  
>> -static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> +static int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>  {
>>      MultiFDPacket_t *packet = p->packet;
>>      int i;
>>  
>> -    packet->magic = be32_to_cpu(packet->magic);
>> -    if (packet->magic != MULTIFD_MAGIC) {
>> -        error_setg(errp, "multifd: received packet "
>> -                   "magic %x and expected magic %x",
>> -                   packet->magic, MULTIFD_MAGIC);
>> -        return -1;
>> -    }
>> -
>> -    packet->version = be32_to_cpu(packet->version);
>> -    if (packet->version != MULTIFD_VERSION) {
>> -        error_setg(errp, "multifd: received packet "
>> -                   "version %u and expected version %u",
>> -                   packet->version, MULTIFD_VERSION);
>> -        return -1;
>> -    }
>> -
>> -    p->flags = be32_to_cpu(packet->flags);
>> -
>>      packet->pages_alloc = be32_to_cpu(packet->pages_alloc);
>>      /*
>>       * If we received a packet that is 100 times bigger than expected
>> @@ -496,15 +493,9 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>          return -1;
>>      }
>>  
>> -    p->next_packet_size = be32_to_cpu(packet->next_packet_size);
>> -    p->packet_num = be64_to_cpu(packet->packet_num);
>> -    p->packets_recved++;
>>      p->total_normal_pages += p->normal_num;
>>      p->total_zero_pages += p->zero_num;
>>  
>> -    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
>> -                       p->flags, p->next_packet_size);
>> -
>>      if (p->normal_num == 0 && p->zero_num == 0) {
>>          return 0;
>>      }
>> @@ -546,6 +537,40 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>      return 0;
>>  }
>>  
>> +static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> +{
>> +    MultiFDPacket_t *packet = p->packet;
>> +    int ret = 0;
>> +
>> +    packet->magic = be32_to_cpu(packet->magic);
>> +    if (packet->magic != MULTIFD_MAGIC) {
>> +        error_setg(errp, "multifd: received packet "
>> +                   "magic %x and expected magic %x",
>> +                   packet->magic, MULTIFD_MAGIC);
>> +        return -1;
>> +    }
>> +
>> +    packet->version = be32_to_cpu(packet->version);
>> +    if (packet->version != MULTIFD_VERSION) {
>> +        error_setg(errp, "multifd: received packet "
>> +                   "version %u and expected version %u",
>> +                   packet->version, MULTIFD_VERSION);
>> +        return -1;
>> +    }
>> +
>> +    p->flags = be32_to_cpu(packet->flags);
>> +    p->next_packet_size = be32_to_cpu(packet->next_packet_size);
>> +    p->packet_num = be64_to_cpu(packet->packet_num);
>> +    p->packets_recved++;
>> +
>> +    ret = multifd_ram_unfill_packet(p, errp);
>> +
>> +    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
>> +                       p->flags, p->next_packet_size);
>> +
>> +    return ret;
>> +}
>> +
>>  static bool multifd_send_should_exit(void)
>>  {
>>      return qatomic_read(&multifd_send_state->exiting);
>> -- 
>> 2.35.3
>> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data
  2024-07-22 20:34     ` Fabiano Rosas
@ 2024-07-22 21:06       ` Peter Xu
  2024-07-24  9:30         ` Maciej S. Szmigiero
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 21:06 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 05:34:44PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jul 22, 2024 at 02:59:12PM -0300, Fabiano Rosas wrote:
> >> While we cannot yet disentangle the multifd packet from page data, we
> >> can make the code a bit cleaner by setting the page-related fields in
> >> a separate function.
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  migration/multifd.c | 97 ++++++++++++++++++++++++++++-----------------
> >>  1 file changed, 61 insertions(+), 36 deletions(-)
> >> 
> >> diff --git a/migration/multifd.c b/migration/multifd.c
> >> index fcdb12e04f..d25b8658b2 100644
> >> --- a/migration/multifd.c
> >> +++ b/migration/multifd.c
> >> @@ -409,65 +409,62 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
> >>      return msg.id;
> >>  }
> >>  
> >> -void multifd_send_fill_packet(MultiFDSendParams *p)
> >> +static void multifd_ram_fill_packet(MultiFDSendParams *p)
> >>  {
> >>      MultiFDPacket_t *packet = p->packet;
> >>      MultiFDPages_t *pages = &p->data->u.ram;
> >> -    uint64_t packet_num;
> >>      uint32_t zero_num = pages->num - pages->normal_num;
> >> -    int i;
> >>  
> >> -    packet->flags = cpu_to_be32(p->flags);
> >>      packet->pages_alloc = cpu_to_be32(pages->allocated);
> >>      packet->normal_pages = cpu_to_be32(pages->normal_num);
> >>      packet->zero_pages = cpu_to_be32(zero_num);
> >> -    packet->next_packet_size = cpu_to_be32(p->next_packet_size);
> >
> > Definitely good intention, but I had a feeling that this will need to be
> > reorganized again when Maciej reworks on top, due to the fact that
> > next_packet_size will be ram-private field, simply because it's defined
> > after pages_alloc and normal_pages...
> >
> > E.g., see:
> >
> > https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
> >
> > Where the new MultiFDPacketHdr_t cannot include next_packet_size (even if
> > VFIO will need that too..).
> 
> Isn't it just a matter of setting next_packet_size in the other path as
> well?
> 
> @Maciej, can you comment?
> 
> >
> > typedef struct {
> >     uint32_t magic;
> >     uint32_t version;
> >     uint32_t flags;
> > } __attribute__((packed)) MultiFDPacketHdr_t;
> >
> > So _maybe_ it's easier we drop this patch and leave that part to Maciej to
> > identify which is common and which is arm/vfio specific.  No strong
> > opinions here.
> >
> 
> I could drop it if that's preferrable. However, patch 8/9 is absolutely
> necessary so we can remove this madness of having to clear
> MultiFDPages_t fields at the multifd_send_thread() top level. It took me
> a whole day to figure that one out and that bug has been manifesting
> ever since I started working on multifd.
> 
> I'm not sure how we'll do that without this patch. Maybe it's better I
> fix in this one whatever you guys think needs fixing.

You can set next_packet_size only in ram path, or you can keep this patch
as is and let Maciej rework it on top.

I'm fine either way.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data
  2024-07-22 21:06       ` Peter Xu
@ 2024-07-24  9:30         ` Maciej S. Szmigiero
  0 siblings, 0 replies; 35+ messages in thread
From: Maciej S. Szmigiero @ 2024-07-24  9:30 UTC (permalink / raw)
  To: Peter Xu, Fabiano Rosas; +Cc: qemu-devel

On 22.07.2024 23:06, Peter Xu wrote:
> On Mon, Jul 22, 2024 at 05:34:44PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>>> On Mon, Jul 22, 2024 at 02:59:12PM -0300, Fabiano Rosas wrote:
>>>> While we cannot yet disentangle the multifd packet from page data, we
>>>> can make the code a bit cleaner by setting the page-related fields in
>>>> a separate function.
>>>>
>>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>> ---
>>>>   migration/multifd.c | 97 ++++++++++++++++++++++++++++-----------------
>>>>   1 file changed, 61 insertions(+), 36 deletions(-)
>>>>
>>>> diff --git a/migration/multifd.c b/migration/multifd.c
>>>> index fcdb12e04f..d25b8658b2 100644
>>>> --- a/migration/multifd.c
>>>> +++ b/migration/multifd.c
>>>> @@ -409,65 +409,62 @@ static int multifd_recv_initial_packet(QIOChannel *c, Error **errp)
>>>>       return msg.id;
>>>>   }
>>>>   
>>>> -void multifd_send_fill_packet(MultiFDSendParams *p)
>>>> +static void multifd_ram_fill_packet(MultiFDSendParams *p)
>>>>   {
>>>>       MultiFDPacket_t *packet = p->packet;
>>>>       MultiFDPages_t *pages = &p->data->u.ram;
>>>> -    uint64_t packet_num;
>>>>       uint32_t zero_num = pages->num - pages->normal_num;
>>>> -    int i;
>>>>   
>>>> -    packet->flags = cpu_to_be32(p->flags);
>>>>       packet->pages_alloc = cpu_to_be32(pages->allocated);
>>>>       packet->normal_pages = cpu_to_be32(pages->normal_num);
>>>>       packet->zero_pages = cpu_to_be32(zero_num);
>>>> -    packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>>>
>>> Definitely good intention, but I had a feeling that this will need to be
>>> reorganized again when Maciej reworks on top, due to the fact that
>>> next_packet_size will be ram-private field, simply because it's defined
>>> after pages_alloc and normal_pages...
>>>
>>> E.g., see:
>>>
>>> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
>>>
>>> Where the new MultiFDPacketHdr_t cannot include next_packet_size (even if
>>> VFIO will need that too..).
>>
>> Isn't it just a matter of setting next_packet_size in the other path as
>> well?
>>
>> @Maciej, can you comment?
>>
>>>
>>> typedef struct {
>>>      uint32_t magic;
>>>      uint32_t version;
>>>      uint32_t flags;
>>> } __attribute__((packed)) MultiFDPacketHdr_t;
>>>
>>> So _maybe_ it's easier we drop this patch and leave that part to Maciej to
>>> identify which is common and which is arm/vfio specific.  No strong
>>> opinions here.
>>>
>>
>> I could drop it if that's preferrable. However, patch 8/9 is absolutely
>> necessary so we can remove this madness of having to clear
>> MultiFDPages_t fields at the multifd_send_thread() top level. It took me
>> a whole day to figure that one out and that bug has been manifesting
>> ever since I started working on multifd.
>>
>> I'm not sure how we'll do that without this patch. Maybe it's better I
>> fix in this one whatever you guys think needs fixing.
> 
> You can set next_packet_size only in ram path, or you can keep this patch
> as is and let Maciej rework it on top.
> 
> I'm fine either way.
> 

I need to split out the packet header parsing ("unfilling") anyway from
RAM / device state parsing (to detect the packet type before reading the
rest of it to the appropriate data structure) so either way is fine with
me too.

I think if it's easier for Fabiano to work on top of this patch then he
should just keep it.

Thanks,
Maciej



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 8/9] migration/multifd: Don't send ram data during SYNC
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
                   ` (6 preceding siblings ...)
  2024-07-22 17:59 ` [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 21:03   ` Peter Xu
  2024-07-22 17:59 ` [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data Fabiano Rosas
  2024-07-22 19:58 ` [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Peter Xu
  9 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

Skip saving and loading any ram data in the packet in the case of a
SYNC. This fixes a shortcoming of the current code which requires a
reset of the MultiFDPages_t fields right after the previous
pending_job finishes, otherwise the very next job might be a SYNC and
multifd_send_fill_packet() will put the stale values in the packet.

By not calling multifd_ram_fill_packet(), we can stop resetting
MultiFDPages_t in the multifd core and leave that to the client code.

Actually moving the reset function is not yet done because
pages->num==0 is used by the client code to determine whether the
MultiFDPages_t needs to be flushed. The subsequent patches will
replace that with a generic flag that is not dependent on
MultiFDPages_t.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index d25b8658b2..4394ca6ade 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -438,6 +438,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 {
     MultiFDPacket_t *packet = p->packet;
     uint64_t packet_num;
+    bool sync_packet = p->flags & MULTIFD_FLAG_SYNC;
 
     memset(packet, 0, p->packet_len);
 
@@ -452,7 +453,9 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 
     p->packets_sent++;
 
-    multifd_ram_fill_packet(p);
+    if (!sync_packet) {
+        multifd_ram_fill_packet(p);
+    }
 
     trace_multifd_send(p->id, packet_num,
                        be32_to_cpu(packet->normal_pages),
@@ -563,7 +566,12 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     p->packet_num = be64_to_cpu(packet->packet_num);
     p->packets_recved++;
 
-    ret = multifd_ram_unfill_packet(p, errp);
+    if (p->flags & MULTIFD_FLAG_SYNC) {
+        p->normal_num = 0;
+        p->zero_num = 0;
+    } else {
+        ret = multifd_ram_unfill_packet(p, errp);
+    }
 
     trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
                        p->flags, p->next_packet_size);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 8/9] migration/multifd: Don't send ram data during SYNC
  2024-07-22 17:59 ` [RFC PATCH v2 8/9] migration/multifd: Don't send ram data during SYNC Fabiano Rosas
@ 2024-07-22 21:03   ` Peter Xu
  2024-07-22 21:24     ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 21:03 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:13PM -0300, Fabiano Rosas wrote:
> Skip saving and loading any ram data in the packet in the case of a
> SYNC. This fixes a shortcoming of the current code which requires a
> reset of the MultiFDPages_t fields right after the previous
> pending_job finishes, otherwise the very next job might be a SYNC and
> multifd_send_fill_packet() will put the stale values in the packet.
> 
> By not calling multifd_ram_fill_packet(), we can stop resetting
> MultiFDPages_t in the multifd core and leave that to the client code.
> 
> Actually moving the reset function is not yet done because
> pages->num==0 is used by the client code to determine whether the
> MultiFDPages_t needs to be flushed. The subsequent patches will
> replace that with a generic flag that is not dependent on
> MultiFDPages_t.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/multifd.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index d25b8658b2..4394ca6ade 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -438,6 +438,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>  {
>      MultiFDPacket_t *packet = p->packet;
>      uint64_t packet_num;
> +    bool sync_packet = p->flags & MULTIFD_FLAG_SYNC;
>  
>      memset(packet, 0, p->packet_len);
>  
> @@ -452,7 +453,9 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>  
>      p->packets_sent++;
>  
> -    multifd_ram_fill_packet(p);
> +    if (!sync_packet) {
> +        multifd_ram_fill_packet(p);
> +    }
>  
>      trace_multifd_send(p->id, packet_num,
>                         be32_to_cpu(packet->normal_pages),
> @@ -563,7 +566,12 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>      p->packet_num = be64_to_cpu(packet->packet_num);
>      p->packets_recved++;
>  
> -    ret = multifd_ram_unfill_packet(p, errp);
> +    if (p->flags & MULTIFD_FLAG_SYNC) {
> +        p->normal_num = 0;
> +        p->zero_num = 0;

Instead of this, I wonder whether we shouldn't touch those fields at all,
but:

diff --git a/migration/multifd.c b/migration/multifd.c
index 0a85951d58..55abd9a1ef 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1547,7 +1547,9 @@ static void *multifd_recv_thread(void *opaque)
             flags = p->flags;
             /* recv methods don't know how to handle the SYNC flag */
             p->flags &= ~MULTIFD_FLAG_SYNC;
-            has_data = p->normal_num || p->zero_num;
+
+            if (!(flags & MULTIFD_FLAG_SYNC))
+                has_data = p->normal_num || p->zero_num;
             qemu_mutex_unlock(&p->mutex);
         } else {
             /*

> +    } else {
> +        ret = multifd_ram_unfill_packet(p, errp);
> +    }
>  
>      trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
>                         p->flags, p->next_packet_size);
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 8/9] migration/multifd: Don't send ram data during SYNC
  2024-07-22 21:03   ` Peter Xu
@ 2024-07-22 21:24     ` Fabiano Rosas
  0 siblings, 0 replies; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 21:24 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 02:59:13PM -0300, Fabiano Rosas wrote:
>> Skip saving and loading any ram data in the packet in the case of a
>> SYNC. This fixes a shortcoming of the current code which requires a
>> reset of the MultiFDPages_t fields right after the previous
>> pending_job finishes, otherwise the very next job might be a SYNC and
>> multifd_send_fill_packet() will put the stale values in the packet.
>> 
>> By not calling multifd_ram_fill_packet(), we can stop resetting
>> MultiFDPages_t in the multifd core and leave that to the client code.
>> 
>> Actually moving the reset function is not yet done because
>> pages->num==0 is used by the client code to determine whether the
>> MultiFDPages_t needs to be flushed. The subsequent patches will
>> replace that with a generic flag that is not dependent on
>> MultiFDPages_t.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/multifd.c | 12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index d25b8658b2..4394ca6ade 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -438,6 +438,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>>  {
>>      MultiFDPacket_t *packet = p->packet;
>>      uint64_t packet_num;
>> +    bool sync_packet = p->flags & MULTIFD_FLAG_SYNC;
>>  
>>      memset(packet, 0, p->packet_len);
>>  
>> @@ -452,7 +453,9 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>>  
>>      p->packets_sent++;
>>  
>> -    multifd_ram_fill_packet(p);
>> +    if (!sync_packet) {
>> +        multifd_ram_fill_packet(p);
>> +    }
>>  
>>      trace_multifd_send(p->id, packet_num,
>>                         be32_to_cpu(packet->normal_pages),
>> @@ -563,7 +566,12 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>      p->packet_num = be64_to_cpu(packet->packet_num);
>>      p->packets_recved++;
>>  
>> -    ret = multifd_ram_unfill_packet(p, errp);
>> +    if (p->flags & MULTIFD_FLAG_SYNC) {
>> +        p->normal_num = 0;
>> +        p->zero_num = 0;
>
> Instead of this, I wonder whether we shouldn't touch those fields at all,
> but:
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 0a85951d58..55abd9a1ef 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -1547,7 +1547,9 @@ static void *multifd_recv_thread(void *opaque)
>              flags = p->flags;
>              /* recv methods don't know how to handle the SYNC flag */
>              p->flags &= ~MULTIFD_FLAG_SYNC;
> -            has_data = p->normal_num || p->zero_num;
> +
> +            if (!(flags & MULTIFD_FLAG_SYNC))
> +                has_data = p->normal_num || p->zero_num;
>              qemu_mutex_unlock(&p->mutex);
>          } else {
>              /*

Good idea.

>
>> +    } else {
>> +        ret = multifd_ram_unfill_packet(p, errp);
>> +    }
>>  
>>      trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
>>                         p->flags, p->next_packet_size);
>> -- 
>> 2.35.3
>> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
                   ` (7 preceding siblings ...)
  2024-07-22 17:59 ` [RFC PATCH v2 8/9] migration/multifd: Don't send ram data during SYNC Fabiano Rosas
@ 2024-07-22 17:59 ` Fabiano Rosas
  2024-07-22 19:55   ` Peter Xu
  2024-07-22 19:58 ` [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Peter Xu
  9 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 17:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Peter Xu, Maciej S . Szmigiero

Multifd currently has a simple scheduling mechanism that distributes
work to the various channels by keeping storage space within each
channel and an extra space that is given to the client. Each time the
client fills the space with data and calls into multifd, that space is
given to the next idle channel and a free storage space is taken from
the channel and given to client for the next iteration.

This means we always need (#multifd_channels + 1) memory slots to
operate multifd.

This is fine, except that the presence of this one extra memory slot
doesn't allow different types of payloads to be processed at the same
time in different channels, i.e. the data type of
multifd_send_state->pages needs to be the same as p->pages.

For each new data type different from MultiFDPage_t that is to be
handled, this logic would need to be duplicated by adding new fields
to multifd_send_state, to the channels and to multifd_send_pages().

Fix this situation by moving the extra slot into the client and using
only the generic type MultiFDSendData in the multifd core.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 58 ++++++++++++++++++++++++++++-----------------
 migration/multifd.h |  2 ++
 migration/ram.c     |  1 +
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 4394ca6ade..0a85951d58 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -49,7 +49,6 @@ typedef struct {
 
 struct {
     MultiFDSendParams *params;
-    MultiFDSendData *data;
     /*
      * Global number of generated multifd packets.
      *
@@ -97,6 +96,8 @@ struct {
     MultiFDMethods *ops;
 } *multifd_recv_state;
 
+/* TODO: move these to multifd-ram.c */
+static MultiFDSendData *multifd_ram_send;
 static size_t multifd_ram_payload_size(void)
 {
     uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
@@ -118,6 +119,14 @@ static MultiFDSendData *multifd_send_data_alloc(void)
     return g_malloc0(sizeof(MultiFDPayloadType) + max_payload_size);
 }
 
+void multifd_ram_save_setup(void)
+{
+    uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+
+    multifd_ram_send = multifd_send_data_alloc();
+    multifd_ram_send->u.ram.allocated = n;
+}
+
 static bool multifd_use_packets(void)
 {
     return !migrate_mapped_ram();
@@ -620,7 +629,7 @@ static void multifd_send_kick_main(MultiFDSendParams *p)
  *
  * Returns true if succeed, false otherwise.
  */
-static bool multifd_send_pages(void)
+static bool multifd_send(MultiFDSendData **send_data)
 {
     int i;
     static int next_channel;
@@ -661,11 +670,16 @@ static bool multifd_send_pages(void)
      */
     smp_mb_acquire();
 
-    assert(!p->data->u.ram.num);
+    assert(multifd_payload_empty(p->data));
 
-    tmp = multifd_send_state->data;
-    multifd_send_state->data = p->data;
+    /*
+     * Swap the pointers. The channel gets the client data for
+     * transferring and the client gets back an unused data slot.
+     */
+    tmp = *send_data;
+    *send_data = p->data;
     p->data = tmp;
+
     /*
      * Making sure p->data is setup before marking pending_job=true. Pairs
      * with the qatomic_load_acquire() in multifd_send_thread().
@@ -697,7 +711,12 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
     MultiFDPages_t *pages;
 
 retry:
-    pages = &multifd_send_state->data->u.ram;
+    pages = &multifd_ram_send->u.ram;
+
+    if (multifd_payload_empty(multifd_ram_send)) {
+        multifd_pages_reset(pages);
+        multifd_set_payload_type(multifd_ram_send, MULTIFD_PAYLOAD_RAM);
+    }
 
     /* If the queue is empty, we can already enqueue now */
     if (multifd_queue_empty(pages)) {
@@ -715,7 +734,7 @@ retry:
      * After flush, always retry.
      */
     if (pages->block != block || multifd_queue_full(pages)) {
-        if (!multifd_send_pages()) {
+        if (!multifd_send(&multifd_ram_send)) {
             return false;
         }
         goto retry;
@@ -833,6 +852,8 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
     g_free(p->packet);
     p->packet = NULL;
     multifd_send_state->ops->send_cleanup(p, errp);
+    g_free(p->data);
+    p->data = NULL;
 
     return *errp == NULL;
 }
@@ -845,8 +866,6 @@ static void multifd_send_cleanup_state(void)
     qemu_sem_destroy(&multifd_send_state->channels_ready);
     g_free(multifd_send_state->params);
     multifd_send_state->params = NULL;
-    g_free(multifd_send_state->data);
-    multifd_send_state->data = NULL;
     g_free(multifd_send_state);
     multifd_send_state = NULL;
 }
@@ -895,15 +914,14 @@ int multifd_send_sync_main(void)
 {
     int i;
     bool flush_zero_copy;
-    MultiFDPages_t *pages;
 
     if (!migrate_multifd()) {
         return 0;
     }
-    pages = &multifd_send_state->data->u.ram;
-    if (pages->num) {
-        if (!multifd_send_pages()) {
-            error_report("%s: multifd_send_pages fail", __func__);
+
+    if (!multifd_payload_empty(multifd_ram_send)) {
+        if (!multifd_send(&multifd_ram_send)) {
+            error_report("%s: multifd_send fail", __func__);
             return -1;
         }
     }
@@ -977,13 +995,11 @@ static void *multifd_send_thread(void *opaque)
 
         /*
          * Read pending_job flag before p->data.  Pairs with the
-         * qatomic_store_release() in multifd_send_pages().
+         * qatomic_store_release() in multifd_send().
          */
         if (qatomic_load_acquire(&p->pending_job)) {
-            MultiFDPages_t *pages = &p->data->u.ram;
-
             p->iovs_num = 0;
-            assert(pages->num);
+            assert(!multifd_payload_empty(p->data));
 
             ret = multifd_send_state->ops->send_prepare(p, &local_err);
             if (ret != 0) {
@@ -1006,13 +1022,13 @@ static void *multifd_send_thread(void *opaque)
             stat64_add(&mig_stats.multifd_bytes,
                        p->next_packet_size + p->packet_len);
 
-            multifd_pages_reset(pages);
             p->next_packet_size = 0;
+            multifd_set_payload_type(p->data, MULTIFD_PAYLOAD_NONE);
 
             /*
              * Making sure p->data is published before saying "we're
              * free".  Pairs with the smp_mb_acquire() in
-             * multifd_send_pages().
+             * multifd_send().
              */
             qatomic_store_release(&p->pending_job, false);
         } else {
@@ -1206,8 +1222,6 @@ bool multifd_send_setup(void)
     thread_count = migrate_multifd_channels();
     multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
     multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
-    multifd_send_state->data = multifd_send_data_alloc();
-    multifd_send_state->data->u.ram.allocated = page_count;
     qemu_sem_init(&multifd_send_state->channels_created, 0);
     qemu_sem_init(&multifd_send_state->channels_ready, 0);
     qatomic_set(&multifd_send_state->exiting, 0);
diff --git a/migration/multifd.h b/migration/multifd.h
index c9c01579a0..04c000f435 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -113,6 +113,8 @@ static inline void multifd_set_payload_type(MultiFDSendData *data,
     data->type = type;
 }
 
+void multifd_ram_save_setup(void);
+
 typedef struct {
     /* Fields are only written at creating/deletion time */
     /* No lock required for them, they are read only */
diff --git a/migration/ram.c b/migration/ram.c
index edec1a2d07..2b90396b3c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3058,6 +3058,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
     migration_ops = g_malloc0(sizeof(MigrationOps));
 
     if (migrate_multifd()) {
+        multifd_ram_save_setup();
         migration_ops->ram_save_target_page = ram_save_target_page_multifd;
     } else {
         migration_ops->ram_save_target_page = ram_save_target_page_legacy;
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data
  2024-07-22 17:59 ` [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data Fabiano Rosas
@ 2024-07-22 19:55   ` Peter Xu
  2024-07-22 20:26     ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 19:55 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:14PM -0300, Fabiano Rosas wrote:
> Multifd currently has a simple scheduling mechanism that distributes
> work to the various channels by keeping storage space within each
> channel and an extra space that is given to the client. Each time the
> client fills the space with data and calls into multifd, that space is
> given to the next idle channel and a free storage space is taken from
> the channel and given to client for the next iteration.
> 
> This means we always need (#multifd_channels + 1) memory slots to
> operate multifd.
> 
> This is fine, except that the presence of this one extra memory slot
> doesn't allow different types of payloads to be processed at the same
> time in different channels, i.e. the data type of
> multifd_send_state->pages needs to be the same as p->pages.
> 
> For each new data type different from MultiFDPage_t that is to be
> handled, this logic would need to be duplicated by adding new fields
> to multifd_send_state, to the channels and to multifd_send_pages().
> 
> Fix this situation by moving the extra slot into the client and using
> only the generic type MultiFDSendData in the multifd core.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/multifd.c | 58 ++++++++++++++++++++++++++++-----------------
>  migration/multifd.h |  2 ++
>  migration/ram.c     |  1 +
>  3 files changed, 39 insertions(+), 22 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 4394ca6ade..0a85951d58 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -49,7 +49,6 @@ typedef struct {
>  
>  struct {
>      MultiFDSendParams *params;
> -    MultiFDSendData *data;
>      /*
>       * Global number of generated multifd packets.
>       *
> @@ -97,6 +96,8 @@ struct {
>      MultiFDMethods *ops;
>  } *multifd_recv_state;
>  
> +/* TODO: move these to multifd-ram.c */
> +static MultiFDSendData *multifd_ram_send;
>  static size_t multifd_ram_payload_size(void)
>  {
>      uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> @@ -118,6 +119,14 @@ static MultiFDSendData *multifd_send_data_alloc(void)
>      return g_malloc0(sizeof(MultiFDPayloadType) + max_payload_size);
>  }
>  
> +void multifd_ram_save_setup(void)
> +{
> +    uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> +
> +    multifd_ram_send = multifd_send_data_alloc();
> +    multifd_ram_send->u.ram.allocated = n;

IIUC this line won't help, as the type is still NONE.. We may need to reset
this in multifd_pages_reset() even if it's a constant to RAM code.

Side note: looks like multifd_ram_send is leaked across the patch.

> +}
> +
>  static bool multifd_use_packets(void)
>  {
>      return !migrate_mapped_ram();
> @@ -620,7 +629,7 @@ static void multifd_send_kick_main(MultiFDSendParams *p)
>   *
>   * Returns true if succeed, false otherwise.
>   */
> -static bool multifd_send_pages(void)
> +static bool multifd_send(MultiFDSendData **send_data)
>  {
>      int i;
>      static int next_channel;
> @@ -661,11 +670,16 @@ static bool multifd_send_pages(void)
>       */
>      smp_mb_acquire();
>  
> -    assert(!p->data->u.ram.num);
> +    assert(multifd_payload_empty(p->data));
>  
> -    tmp = multifd_send_state->data;
> -    multifd_send_state->data = p->data;
> +    /*
> +     * Swap the pointers. The channel gets the client data for
> +     * transferring and the client gets back an unused data slot.
> +     */
> +    tmp = *send_data;
> +    *send_data = p->data;
>      p->data = tmp;
> +
>      /*
>       * Making sure p->data is setup before marking pending_job=true. Pairs
>       * with the qatomic_load_acquire() in multifd_send_thread().
> @@ -697,7 +711,12 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
>      MultiFDPages_t *pages;
>  
>  retry:
> -    pages = &multifd_send_state->data->u.ram;
> +    pages = &multifd_ram_send->u.ram;
> +
> +    if (multifd_payload_empty(multifd_ram_send)) {
> +        multifd_pages_reset(pages);
> +        multifd_set_payload_type(multifd_ram_send, MULTIFD_PAYLOAD_RAM);
> +    }
>  
>      /* If the queue is empty, we can already enqueue now */
>      if (multifd_queue_empty(pages)) {
> @@ -715,7 +734,7 @@ retry:
>       * After flush, always retry.
>       */
>      if (pages->block != block || multifd_queue_full(pages)) {
> -        if (!multifd_send_pages()) {
> +        if (!multifd_send(&multifd_ram_send)) {
>              return false;
>          }
>          goto retry;
> @@ -833,6 +852,8 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
>      g_free(p->packet);
>      p->packet = NULL;
>      multifd_send_state->ops->send_cleanup(p, errp);
> +    g_free(p->data);
> +    p->data = NULL;

These two lines look superfluous.

>  
>      return *errp == NULL;
>  }
> @@ -845,8 +866,6 @@ static void multifd_send_cleanup_state(void)
>      qemu_sem_destroy(&multifd_send_state->channels_ready);
>      g_free(multifd_send_state->params);
>      multifd_send_state->params = NULL;
> -    g_free(multifd_send_state->data);
> -    multifd_send_state->data = NULL;
>      g_free(multifd_send_state);
>      multifd_send_state = NULL;
>  }
> @@ -895,15 +914,14 @@ int multifd_send_sync_main(void)
>  {
>      int i;
>      bool flush_zero_copy;
> -    MultiFDPages_t *pages;
>  
>      if (!migrate_multifd()) {
>          return 0;
>      }
> -    pages = &multifd_send_state->data->u.ram;
> -    if (pages->num) {
> -        if (!multifd_send_pages()) {
> -            error_report("%s: multifd_send_pages fail", __func__);
> +
> +    if (!multifd_payload_empty(multifd_ram_send)) {
> +        if (!multifd_send(&multifd_ram_send)) {
> +            error_report("%s: multifd_send fail", __func__);
>              return -1;
>          }
>      }
> @@ -977,13 +995,11 @@ static void *multifd_send_thread(void *opaque)
>  
>          /*
>           * Read pending_job flag before p->data.  Pairs with the
> -         * qatomic_store_release() in multifd_send_pages().
> +         * qatomic_store_release() in multifd_send().
>           */
>          if (qatomic_load_acquire(&p->pending_job)) {
> -            MultiFDPages_t *pages = &p->data->u.ram;
> -
>              p->iovs_num = 0;
> -            assert(pages->num);
> +            assert(!multifd_payload_empty(p->data));
>  
>              ret = multifd_send_state->ops->send_prepare(p, &local_err);
>              if (ret != 0) {
> @@ -1006,13 +1022,13 @@ static void *multifd_send_thread(void *opaque)
>              stat64_add(&mig_stats.multifd_bytes,
>                         p->next_packet_size + p->packet_len);
>  
> -            multifd_pages_reset(pages);
>              p->next_packet_size = 0;
> +            multifd_set_payload_type(p->data, MULTIFD_PAYLOAD_NONE);
>  
>              /*
>               * Making sure p->data is published before saying "we're
>               * free".  Pairs with the smp_mb_acquire() in
> -             * multifd_send_pages().
> +             * multifd_send().
>               */
>              qatomic_store_release(&p->pending_job, false);
>          } else {
> @@ -1206,8 +1222,6 @@ bool multifd_send_setup(void)
>      thread_count = migrate_multifd_channels();
>      multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
>      multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
> -    multifd_send_state->data = multifd_send_data_alloc();
> -    multifd_send_state->data->u.ram.allocated = page_count;
>      qemu_sem_init(&multifd_send_state->channels_created, 0);
>      qemu_sem_init(&multifd_send_state->channels_ready, 0);
>      qatomic_set(&multifd_send_state->exiting, 0);
> diff --git a/migration/multifd.h b/migration/multifd.h
> index c9c01579a0..04c000f435 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -113,6 +113,8 @@ static inline void multifd_set_payload_type(MultiFDSendData *data,
>      data->type = type;
>  }
>  
> +void multifd_ram_save_setup(void);
> +
>  typedef struct {
>      /* Fields are only written at creating/deletion time */
>      /* No lock required for them, they are read only */
> diff --git a/migration/ram.c b/migration/ram.c
> index edec1a2d07..2b90396b3c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3058,6 +3058,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
>      migration_ops = g_malloc0(sizeof(MigrationOps));
>  
>      if (migrate_multifd()) {
> +        multifd_ram_save_setup();
>          migration_ops->ram_save_target_page = ram_save_target_page_multifd;
>      } else {
>          migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data
  2024-07-22 19:55   ` Peter Xu
@ 2024-07-22 20:26     ` Fabiano Rosas
  2024-07-22 20:41       ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 20:26 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 02:59:14PM -0300, Fabiano Rosas wrote:
>> Multifd currently has a simple scheduling mechanism that distributes
>> work to the various channels by keeping storage space within each
>> channel and an extra space that is given to the client. Each time the
>> client fills the space with data and calls into multifd, that space is
>> given to the next idle channel and a free storage space is taken from
>> the channel and given to client for the next iteration.
>> 
>> This means we always need (#multifd_channels + 1) memory slots to
>> operate multifd.
>> 
>> This is fine, except that the presence of this one extra memory slot
>> doesn't allow different types of payloads to be processed at the same
>> time in different channels, i.e. the data type of
>> multifd_send_state->pages needs to be the same as p->pages.
>> 
>> For each new data type different from MultiFDPage_t that is to be
>> handled, this logic would need to be duplicated by adding new fields
>> to multifd_send_state, to the channels and to multifd_send_pages().
>> 
>> Fix this situation by moving the extra slot into the client and using
>> only the generic type MultiFDSendData in the multifd core.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/multifd.c | 58 ++++++++++++++++++++++++++++-----------------
>>  migration/multifd.h |  2 ++
>>  migration/ram.c     |  1 +
>>  3 files changed, 39 insertions(+), 22 deletions(-)
>> 
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index 4394ca6ade..0a85951d58 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -49,7 +49,6 @@ typedef struct {
>>  
>>  struct {
>>      MultiFDSendParams *params;
>> -    MultiFDSendData *data;
>>      /*
>>       * Global number of generated multifd packets.
>>       *
>> @@ -97,6 +96,8 @@ struct {
>>      MultiFDMethods *ops;
>>  } *multifd_recv_state;
>>  
>> +/* TODO: move these to multifd-ram.c */
>> +static MultiFDSendData *multifd_ram_send;
>>  static size_t multifd_ram_payload_size(void)
>>  {
>>      uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
>> @@ -118,6 +119,14 @@ static MultiFDSendData *multifd_send_data_alloc(void)
>>      return g_malloc0(sizeof(MultiFDPayloadType) + max_payload_size);
>>  }
>>  
>> +void multifd_ram_save_setup(void)
>> +{
>> +    uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
>> +
>> +    multifd_ram_send = multifd_send_data_alloc();
>> +    multifd_ram_send->u.ram.allocated = n;
>
> IIUC this line won't help, as the type is still NONE.. We may need to reset
> this in multifd_pages_reset() even if it's a constant to RAM code.

I could maybe just hardcode it in the packet. No point setting this
every time.

>
> Side note: looks like multifd_ram_send is leaked across the patch.
>

Hm I'll look into it, thanks.

>> +}
>> +
>>  static bool multifd_use_packets(void)
>>  {
>>      return !migrate_mapped_ram();
>> @@ -620,7 +629,7 @@ static void multifd_send_kick_main(MultiFDSendParams *p)
>>   *
>>   * Returns true if succeed, false otherwise.
>>   */
>> -static bool multifd_send_pages(void)
>> +static bool multifd_send(MultiFDSendData **send_data)
>>  {
>>      int i;
>>      static int next_channel;
>> @@ -661,11 +670,16 @@ static bool multifd_send_pages(void)
>>       */
>>      smp_mb_acquire();
>>  
>> -    assert(!p->data->u.ram.num);
>> +    assert(multifd_payload_empty(p->data));
>>  
>> -    tmp = multifd_send_state->data;
>> -    multifd_send_state->data = p->data;
>> +    /*
>> +     * Swap the pointers. The channel gets the client data for
>> +     * transferring and the client gets back an unused data slot.
>> +     */
>> +    tmp = *send_data;
>> +    *send_data = p->data;
>>      p->data = tmp;
>> +
>>      /*
>>       * Making sure p->data is setup before marking pending_job=true. Pairs
>>       * with the qatomic_load_acquire() in multifd_send_thread().
>> @@ -697,7 +711,12 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
>>      MultiFDPages_t *pages;
>>  
>>  retry:
>> -    pages = &multifd_send_state->data->u.ram;
>> +    pages = &multifd_ram_send->u.ram;
>> +
>> +    if (multifd_payload_empty(multifd_ram_send)) {
>> +        multifd_pages_reset(pages);
>> +        multifd_set_payload_type(multifd_ram_send, MULTIFD_PAYLOAD_RAM);
>> +    }
>>  
>>      /* If the queue is empty, we can already enqueue now */
>>      if (multifd_queue_empty(pages)) {
>> @@ -715,7 +734,7 @@ retry:
>>       * After flush, always retry.
>>       */
>>      if (pages->block != block || multifd_queue_full(pages)) {
>> -        if (!multifd_send_pages()) {
>> +        if (!multifd_send(&multifd_ram_send)) {
>>              return false;
>>          }
>>          goto retry;
>> @@ -833,6 +852,8 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
>>      g_free(p->packet);
>>      p->packet = NULL;
>>      multifd_send_state->ops->send_cleanup(p, errp);
>> +    g_free(p->data);
>> +    p->data = NULL;
>
> These two lines look superfluous.
>

Rebase mistake, sorry.

>>  
>>      return *errp == NULL;
>>  }
>> @@ -845,8 +866,6 @@ static void multifd_send_cleanup_state(void)
>>      qemu_sem_destroy(&multifd_send_state->channels_ready);
>>      g_free(multifd_send_state->params);
>>      multifd_send_state->params = NULL;
>> -    g_free(multifd_send_state->data);
>> -    multifd_send_state->data = NULL;
>>      g_free(multifd_send_state);
>>      multifd_send_state = NULL;
>>  }
>> @@ -895,15 +914,14 @@ int multifd_send_sync_main(void)
>>  {
>>      int i;
>>      bool flush_zero_copy;
>> -    MultiFDPages_t *pages;
>>  
>>      if (!migrate_multifd()) {
>>          return 0;
>>      }
>> -    pages = &multifd_send_state->data->u.ram;
>> -    if (pages->num) {
>> -        if (!multifd_send_pages()) {
>> -            error_report("%s: multifd_send_pages fail", __func__);
>> +
>> +    if (!multifd_payload_empty(multifd_ram_send)) {
>> +        if (!multifd_send(&multifd_ram_send)) {
>> +            error_report("%s: multifd_send fail", __func__);
>>              return -1;
>>          }
>>      }
>> @@ -977,13 +995,11 @@ static void *multifd_send_thread(void *opaque)
>>  
>>          /*
>>           * Read pending_job flag before p->data.  Pairs with the
>> -         * qatomic_store_release() in multifd_send_pages().
>> +         * qatomic_store_release() in multifd_send().
>>           */
>>          if (qatomic_load_acquire(&p->pending_job)) {
>> -            MultiFDPages_t *pages = &p->data->u.ram;
>> -
>>              p->iovs_num = 0;
>> -            assert(pages->num);
>> +            assert(!multifd_payload_empty(p->data));
>>  
>>              ret = multifd_send_state->ops->send_prepare(p, &local_err);
>>              if (ret != 0) {
>> @@ -1006,13 +1022,13 @@ static void *multifd_send_thread(void *opaque)
>>              stat64_add(&mig_stats.multifd_bytes,
>>                         p->next_packet_size + p->packet_len);
>>  
>> -            multifd_pages_reset(pages);
>>              p->next_packet_size = 0;
>> +            multifd_set_payload_type(p->data, MULTIFD_PAYLOAD_NONE);
>>  
>>              /*
>>               * Making sure p->data is published before saying "we're
>>               * free".  Pairs with the smp_mb_acquire() in
>> -             * multifd_send_pages().
>> +             * multifd_send().
>>               */
>>              qatomic_store_release(&p->pending_job, false);
>>          } else {
>> @@ -1206,8 +1222,6 @@ bool multifd_send_setup(void)
>>      thread_count = migrate_multifd_channels();
>>      multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
>>      multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
>> -    multifd_send_state->data = multifd_send_data_alloc();
>> -    multifd_send_state->data->u.ram.allocated = page_count;
>>      qemu_sem_init(&multifd_send_state->channels_created, 0);
>>      qemu_sem_init(&multifd_send_state->channels_ready, 0);
>>      qatomic_set(&multifd_send_state->exiting, 0);
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index c9c01579a0..04c000f435 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -113,6 +113,8 @@ static inline void multifd_set_payload_type(MultiFDSendData *data,
>>      data->type = type;
>>  }
>>  
>> +void multifd_ram_save_setup(void);
>> +
>>  typedef struct {
>>      /* Fields are only written at creating/deletion time */
>>      /* No lock required for them, they are read only */
>> diff --git a/migration/ram.c b/migration/ram.c
>> index edec1a2d07..2b90396b3c 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -3058,6 +3058,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
>>      migration_ops = g_malloc0(sizeof(MigrationOps));
>>  
>>      if (migrate_multifd()) {
>> +        multifd_ram_save_setup();
>>          migration_ops->ram_save_target_page = ram_save_target_page_multifd;
>>      } else {
>>          migration_ops->ram_save_target_page = ram_save_target_page_legacy;
>> -- 
>> 2.35.3
>> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data
  2024-07-22 20:26     ` Fabiano Rosas
@ 2024-07-22 20:41       ` Peter Xu
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Xu @ 2024-07-22 20:41 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 05:26:49PM -0300, Fabiano Rosas wrote:
> >> +void multifd_ram_save_setup(void)
> >> +{
> >> +    uint32_t n = MULTIFD_PACKET_SIZE / qemu_target_page_size();
> >> +
> >> +    multifd_ram_send = multifd_send_data_alloc();
> >> +    multifd_ram_send->u.ram.allocated = n;
> >
> > IIUC this line won't help, as the type is still NONE.. We may need to reset
> > this in multifd_pages_reset() even if it's a constant to RAM code.
> 
> I could maybe just hardcode it in the packet. No point setting this
> every time.

Yeah sounds ok.  As long as this keeps working after VFIO can reuse this
object at some point (and overwrite "allocate"), then it looks fine by me.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
                   ` (8 preceding siblings ...)
  2024-07-22 17:59 ` [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data Fabiano Rosas
@ 2024-07-22 19:58 ` Peter Xu
  2024-07-22 20:21   ` Fabiano Rosas
  9 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 19:58 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
> Hi,
> 
> In this v2 I took Peter's suggestion of keeping the channels' pointers
> and moving only the extra slot. The major changes are in patches 5 and
> 9. Patch 3 introduces the structure:
> 
> typedef enum {
>     MULTIFD_PAYLOAD_NONE,
>     MULTIFD_PAYLOAD_RAM,
> } MultiFDPayloadType;
> 
> struct MultiFDSendData {
>     MultiFDPayloadType type;
>     union {
>         MultiFDPages_t ram;
>     } u;
> };
> 
> I added a NONE type so we can use it to tell when the channel has
> finished sending a packet, since we'll need to switch types between
> clients anyway. This avoids having to introduce a 'size', or 'free'
> variable.

This at least looks better to me, thanks.

> 
> WHAT'S MISSING:
> 
> - The support for calling multifd_send() concurrently. Maciej has this
>   in his series so I didn't touch it.
> 
> - A way of adding methods for the new payload type. Currently, the
>   compression methods are somewhat coupled with ram migration, so I'm
>   not sure how to proceed.

What is this one?  Why compression methods need new payload?  Aren't they
ram-typed?

> 
> - Moving all the multifd ram code into multifd-ram.c after this^ is
>   sorted out.
> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1381005020
> 
> v1:
> https://lore.kernel.org/r/20240620212111.29319-1-farosas@suse.de
> 
> First of all, apologies for the roughness of the series. I'm off for
> the next couple of weeks and wanted to put something together early
> for your consideration.

PS: I assume this is old content, or I'll envy you on how frequent you can
take days off!..

> 
> This series is a refactoring (based on an earlier, off-list
> attempt[0]), aimed to remove the usage of the MultiFDPages_t type in
> the multifd core. If we're going to add support for more data types to
> multifd, we first need to clean that up.
> 
> This time around this work was prompted by Maciej's series[1]. I see
> you're having to add a bunch of is_device_state checks to work around
> the rigidity of the code.
> 
> Aside from the VFIO work, there is also the intent (coming back from
> Juan's ideas) to make multifd the default code path for migration,
> which will have to include the vmstate migration and anything else we
> put on the stream via QEMUFile.
> 
> I have long since been bothered by having 'pages' sprinkled all over
> the code, so I might be coming at this with a bit of a narrow focus,
> but I believe in order to support more types of payloads in multifd,
> we need to first allow the scheduling at multifd_send_pages() to be
> independent of MultiFDPages_t. So here it is. Let me know what you
> think.
> 
> (as I said, I'll be off for a couple of weeks, so feel free to
> incorporate any of this code if it's useful. Or to ignore it
> completely).
> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1340992028
> 
> 0- https://github.com/farosas/qemu/commits/multifd-packet-cleanups/
> 1- https://lore.kernel.org/r/cover.1718717584.git.maciej.szmigiero@oracle.com
> 
> Fabiano Rosas (9):
>   migration/multifd: Reduce access to p->pages
>   migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov
>   migration/multifd: Introduce MultiFDSendData
>   migration/multifd: Make MultiFDPages_t:offset a flexible array member
>   migration/multifd: Replace p->pages with an union pointer
>   migration/multifd: Move pages accounting into
>     multifd_send_zero_page_detect()
>   migration/multifd: Isolate ram pages packet data
>   migration/multifd: Don't send ram data during SYNC
>   migration/multifd: Replace multifd_send_state->pages with client data
> 
>  migration/file.c              |   3 +-
>  migration/file.h              |   2 +-
>  migration/multifd-qpl.c       |  10 +-
>  migration/multifd-uadk.c      |   9 +-
>  migration/multifd-zero-page.c |   9 +-
>  migration/multifd-zlib.c      |   4 +-
>  migration/multifd-zstd.c      |   4 +-
>  migration/multifd.c           | 239 +++++++++++++++++++++-------------
>  migration/multifd.h           |  37 ++++--
>  migration/ram.c               |   1 +
>  10 files changed, 201 insertions(+), 117 deletions(-)
> 
> 
> base-commit: a7ddb48bd1363c8bcdf42776d320289c42191f01
> -- 
> 2.35.3
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-22 19:58 ` [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Peter Xu
@ 2024-07-22 20:21   ` Fabiano Rosas
  2024-07-22 20:54     ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 20:21 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
>> Hi,
>> 
>> In this v2 I took Peter's suggestion of keeping the channels' pointers
>> and moving only the extra slot. The major changes are in patches 5 and
>> 9. Patch 3 introduces the structure:
>> 
>> typedef enum {
>>     MULTIFD_PAYLOAD_NONE,
>>     MULTIFD_PAYLOAD_RAM,
>> } MultiFDPayloadType;
>> 
>> struct MultiFDSendData {
>>     MultiFDPayloadType type;
>>     union {
>>         MultiFDPages_t ram;
>>     } u;
>> };
>> 
>> I added a NONE type so we can use it to tell when the channel has
>> finished sending a packet, since we'll need to switch types between
>> clients anyway. This avoids having to introduce a 'size', or 'free'
>> variable.
>
> This at least looks better to me, thanks.
>
>> 
>> WHAT'S MISSING:
>> 
>> - The support for calling multifd_send() concurrently. Maciej has this
>>   in his series so I didn't touch it.
>> 
>> - A way of adding methods for the new payload type. Currently, the
>>   compression methods are somewhat coupled with ram migration, so I'm
>>   not sure how to proceed.
>
> What is this one?  Why compression methods need new payload?  Aren't they
> ram-typed?

The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
either nocomp, or the compression-specific methods
(e.g. zlib_send_prepare).

How do we add methods for the upcoming new payload types? I don't expect
us to continue using nocomp and then do "if (ram)... else if
(device_state) ..." inside of them. I would expect us to rename
s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
(e.g. vfio_send_prepare, vmstate_send_prepare, etc).

multifd_nocomp_ops -> multifd_ram_ops // rename
multifd_zlib_ops   // existing
multifd_device_ops // new

The challenge here is that the current framework is nocomp
vs. compression. It needs to become ram + compression vs. other types.

>
>> 
>> - Moving all the multifd ram code into multifd-ram.c after this^ is
>>   sorted out.
>> 
>> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1381005020
>> 
>> v1:

(1)

>> https://lore.kernel.org/r/20240620212111.29319-1-farosas@suse.de
>> 
>> First of all, apologies for the roughness of the series. I'm off for
>> the next couple of weeks and wanted to put something together early
>> for your consideration.
>
> PS: I assume this is old content, or I'll envy you on how frequent you can
> take days off!..

That't the v1 cover letter (1). I don't like to post v2 with "here's
what changed" without having the previous cover-letter at hand for
referencing.

>
>> 
>> This series is a refactoring (based on an earlier, off-list
>> attempt[0]), aimed to remove the usage of the MultiFDPages_t type in
>> the multifd core. If we're going to add support for more data types to
>> multifd, we first need to clean that up.
>> 
>> This time around this work was prompted by Maciej's series[1]. I see
>> you're having to add a bunch of is_device_state checks to work around
>> the rigidity of the code.
>> 
>> Aside from the VFIO work, there is also the intent (coming back from
>> Juan's ideas) to make multifd the default code path for migration,
>> which will have to include the vmstate migration and anything else we
>> put on the stream via QEMUFile.
>> 
>> I have long since been bothered by having 'pages' sprinkled all over
>> the code, so I might be coming at this with a bit of a narrow focus,
>> but I believe in order to support more types of payloads in multifd,
>> we need to first allow the scheduling at multifd_send_pages() to be
>> independent of MultiFDPages_t. So here it is. Let me know what you
>> think.
>> 
>> (as I said, I'll be off for a couple of weeks, so feel free to
>> incorporate any of this code if it's useful. Or to ignore it
>> completely).
>> 
>> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1340992028
>> 
>> 0- https://github.com/farosas/qemu/commits/multifd-packet-cleanups/
>> 1- https://lore.kernel.org/r/cover.1718717584.git.maciej.szmigiero@oracle.com
>> 
>> Fabiano Rosas (9):
>>   migration/multifd: Reduce access to p->pages
>>   migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov
>>   migration/multifd: Introduce MultiFDSendData
>>   migration/multifd: Make MultiFDPages_t:offset a flexible array member
>>   migration/multifd: Replace p->pages with an union pointer
>>   migration/multifd: Move pages accounting into
>>     multifd_send_zero_page_detect()
>>   migration/multifd: Isolate ram pages packet data
>>   migration/multifd: Don't send ram data during SYNC
>>   migration/multifd: Replace multifd_send_state->pages with client data
>> 
>>  migration/file.c              |   3 +-
>>  migration/file.h              |   2 +-
>>  migration/multifd-qpl.c       |  10 +-
>>  migration/multifd-uadk.c      |   9 +-
>>  migration/multifd-zero-page.c |   9 +-
>>  migration/multifd-zlib.c      |   4 +-
>>  migration/multifd-zstd.c      |   4 +-
>>  migration/multifd.c           | 239 +++++++++++++++++++++-------------
>>  migration/multifd.h           |  37 ++++--
>>  migration/ram.c               |   1 +
>>  10 files changed, 201 insertions(+), 117 deletions(-)
>> 
>> 
>> base-commit: a7ddb48bd1363c8bcdf42776d320289c42191f01
>> -- 
>> 2.35.3
>> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-22 20:21   ` Fabiano Rosas
@ 2024-07-22 20:54     ` Peter Xu
  2024-07-22 21:20       ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 20:54 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
> >> Hi,
> >> 
> >> In this v2 I took Peter's suggestion of keeping the channels' pointers
> >> and moving only the extra slot. The major changes are in patches 5 and
> >> 9. Patch 3 introduces the structure:
> >> 
> >> typedef enum {
> >>     MULTIFD_PAYLOAD_NONE,
> >>     MULTIFD_PAYLOAD_RAM,
> >> } MultiFDPayloadType;
> >> 
> >> struct MultiFDSendData {
> >>     MultiFDPayloadType type;
> >>     union {
> >>         MultiFDPages_t ram;
> >>     } u;
> >> };
> >> 
> >> I added a NONE type so we can use it to tell when the channel has
> >> finished sending a packet, since we'll need to switch types between
> >> clients anyway. This avoids having to introduce a 'size', or 'free'
> >> variable.
> >
> > This at least looks better to me, thanks.
> >
> >> 
> >> WHAT'S MISSING:
> >> 
> >> - The support for calling multifd_send() concurrently. Maciej has this
> >>   in his series so I didn't touch it.
> >> 
> >> - A way of adding methods for the new payload type. Currently, the
> >>   compression methods are somewhat coupled with ram migration, so I'm
> >>   not sure how to proceed.
> >
> > What is this one?  Why compression methods need new payload?  Aren't they
> > ram-typed?
> 
> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
> either nocomp, or the compression-specific methods
> (e.g. zlib_send_prepare).
> 
> How do we add methods for the upcoming new payload types? I don't expect
> us to continue using nocomp and then do "if (ram)... else if
> (device_state) ..." inside of them. I would expect us to rename
> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
> 
> multifd_nocomp_ops -> multifd_ram_ops // rename
> multifd_zlib_ops   // existing
> multifd_device_ops // new
> 
> The challenge here is that the current framework is nocomp
> vs. compression. It needs to become ram + compression vs. other types.

IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
that device state will need, and so far it's only (referring Maciej's
code):

static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
                                            Error **errp)
{
    multifd_send_prepare_header_device_state(p);

    assert(!(p->flags & MULTIFD_FLAG_SYNC));

    p->next_packet_size = p->device_state->buf_len;
    if (p->next_packet_size > 0) {
        p->iov[p->iovs_num].iov_base = p->device_state->buf;
        p->iov[p->iovs_num].iov_len = p->next_packet_size;
        p->iovs_num++;
    }

    p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;

    multifd_send_fill_packet_device_state(p);

    return 0;
}

None of other multifd_ops are used.

I think we can directly invoke this part of device state code in
multifd_send_thread() for now.  So far I think it should be ok.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-22 20:54     ` Peter Xu
@ 2024-07-22 21:20       ` Fabiano Rosas
  2024-07-22 23:01         ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-22 21:20 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
>> >> Hi,
>> >> 
>> >> In this v2 I took Peter's suggestion of keeping the channels' pointers
>> >> and moving only the extra slot. The major changes are in patches 5 and
>> >> 9. Patch 3 introduces the structure:
>> >> 
>> >> typedef enum {
>> >>     MULTIFD_PAYLOAD_NONE,
>> >>     MULTIFD_PAYLOAD_RAM,
>> >> } MultiFDPayloadType;
>> >> 
>> >> struct MultiFDSendData {
>> >>     MultiFDPayloadType type;
>> >>     union {
>> >>         MultiFDPages_t ram;
>> >>     } u;
>> >> };
>> >> 
>> >> I added a NONE type so we can use it to tell when the channel has
>> >> finished sending a packet, since we'll need to switch types between
>> >> clients anyway. This avoids having to introduce a 'size', or 'free'
>> >> variable.
>> >
>> > This at least looks better to me, thanks.
>> >
>> >> 
>> >> WHAT'S MISSING:
>> >> 
>> >> - The support for calling multifd_send() concurrently. Maciej has this
>> >>   in his series so I didn't touch it.
>> >> 
>> >> - A way of adding methods for the new payload type. Currently, the
>> >>   compression methods are somewhat coupled with ram migration, so I'm
>> >>   not sure how to proceed.
>> >
>> > What is this one?  Why compression methods need new payload?  Aren't they
>> > ram-typed?
>> 
>> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
>> either nocomp, or the compression-specific methods
>> (e.g. zlib_send_prepare).
>> 
>> How do we add methods for the upcoming new payload types? I don't expect
>> us to continue using nocomp and then do "if (ram)... else if
>> (device_state) ..." inside of them. I would expect us to rename
>> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
>> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
>> 
>> multifd_nocomp_ops -> multifd_ram_ops // rename
>> multifd_zlib_ops   // existing
>> multifd_device_ops // new
>> 
>> The challenge here is that the current framework is nocomp
>> vs. compression. It needs to become ram + compression vs. other types.
>
> IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
> that device state will need, and so far it's only (referring Maciej's
> code):
>
> static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
>                                             Error **errp)
> {
>     multifd_send_prepare_header_device_state(p);
>
>     assert(!(p->flags & MULTIFD_FLAG_SYNC));
>
>     p->next_packet_size = p->device_state->buf_len;
>     if (p->next_packet_size > 0) {
>         p->iov[p->iovs_num].iov_base = p->device_state->buf;
>         p->iov[p->iovs_num].iov_len = p->next_packet_size;
>         p->iovs_num++;
>     }
>
>     p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
>
>     multifd_send_fill_packet_device_state(p);
>
>     return 0;
> }
>
> None of other multifd_ops are used.

There's also a conditional around device_state when calling
->recv(). That could seems like it could go to a hook.

https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com

>
> I think we can directly invoke this part of device state code in
> multifd_send_thread() for now.  So far I think it should be ok.

It's not just that. There's also a check for "if (ram)" at every call to
multifd_ops to avoid calling the ram code when doing the device
migration. It would be way easier to just set noop functions for those.

static MultiFDMethods multifd_devstate_ops = {
    .send_setup = noop_send_setup,
    .send_cleanup = noop_send_cleanup,
    .send_prepare = devstate_send_prepare,
    .recv_setup = noop_recv_setup,
    .recv_cleanup = noop_recv_cleanup,
    .recv = devstate_recv
};

I'm not saying this needs to be done in this series though. But I do
think that's the correct design choice for the long term.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-22 21:20       ` Fabiano Rosas
@ 2024-07-22 23:01         ` Peter Xu
  2024-07-23 17:48           ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-22 23:01 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Mon, Jul 22, 2024 at 06:20:28PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
> >> >> Hi,
> >> >> 
> >> >> In this v2 I took Peter's suggestion of keeping the channels' pointers
> >> >> and moving only the extra slot. The major changes are in patches 5 and
> >> >> 9. Patch 3 introduces the structure:
> >> >> 
> >> >> typedef enum {
> >> >>     MULTIFD_PAYLOAD_NONE,
> >> >>     MULTIFD_PAYLOAD_RAM,
> >> >> } MultiFDPayloadType;
> >> >> 
> >> >> struct MultiFDSendData {
> >> >>     MultiFDPayloadType type;
> >> >>     union {
> >> >>         MultiFDPages_t ram;
> >> >>     } u;
> >> >> };
> >> >> 
> >> >> I added a NONE type so we can use it to tell when the channel has
> >> >> finished sending a packet, since we'll need to switch types between
> >> >> clients anyway. This avoids having to introduce a 'size', or 'free'
> >> >> variable.
> >> >
> >> > This at least looks better to me, thanks.
> >> >
> >> >> 
> >> >> WHAT'S MISSING:
> >> >> 
> >> >> - The support for calling multifd_send() concurrently. Maciej has this
> >> >>   in his series so I didn't touch it.
> >> >> 
> >> >> - A way of adding methods for the new payload type. Currently, the
> >> >>   compression methods are somewhat coupled with ram migration, so I'm
> >> >>   not sure how to proceed.
> >> >
> >> > What is this one?  Why compression methods need new payload?  Aren't they
> >> > ram-typed?
> >> 
> >> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
> >> either nocomp, or the compression-specific methods
> >> (e.g. zlib_send_prepare).
> >> 
> >> How do we add methods for the upcoming new payload types? I don't expect
> >> us to continue using nocomp and then do "if (ram)... else if
> >> (device_state) ..." inside of them. I would expect us to rename
> >> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
> >> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
> >> 
> >> multifd_nocomp_ops -> multifd_ram_ops // rename
> >> multifd_zlib_ops   // existing
> >> multifd_device_ops // new
> >> 
> >> The challenge here is that the current framework is nocomp
> >> vs. compression. It needs to become ram + compression vs. other types.
> >
> > IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
> > that device state will need, and so far it's only (referring Maciej's
> > code):
> >
> > static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
> >                                             Error **errp)
> > {
> >     multifd_send_prepare_header_device_state(p);
> >
> >     assert(!(p->flags & MULTIFD_FLAG_SYNC));
> >
> >     p->next_packet_size = p->device_state->buf_len;
> >     if (p->next_packet_size > 0) {
> >         p->iov[p->iovs_num].iov_base = p->device_state->buf;
> >         p->iov[p->iovs_num].iov_len = p->next_packet_size;
> >         p->iovs_num++;
> >     }
> >
> >     p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
> >
> >     multifd_send_fill_packet_device_state(p);
> >
> >     return 0;
> > }
> >
> > None of other multifd_ops are used.
> 
> There's also a conditional around device_state when calling
> ->recv(). That could seems like it could go to a hook.
> 
> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com

Actually that's exactly what I think is right.. it looks to me now that we
could bypass anything in MultifdOps (including recv()) but let device state
be a parallel layer of MultifdOps itself, leaving MultifdOps only for
compressors.

And yeah, I still remember you just renamed it from recv_pages() to
recv()..  it's just that now when think it again it looks like cleaner to
make it only about pages..

> 
> >
> > I think we can directly invoke this part of device state code in
> > multifd_send_thread() for now.  So far I think it should be ok.
> 
> It's not just that. There's also a check for "if (ram)" at every call to
> multifd_ops to avoid calling the ram code when doing the device
> migration. It would be way easier to just set noop functions for those.
> 
> static MultiFDMethods multifd_devstate_ops = {
>     .send_setup = noop_send_setup,
>     .send_cleanup = noop_send_cleanup,
>     .send_prepare = devstate_send_prepare,
>     .recv_setup = noop_recv_setup,
>     .recv_cleanup = noop_recv_cleanup,
>     .recv = devstate_recv
> };
> 
> I'm not saying this needs to be done in this series though. But I do
> think that's the correct design choice for the long term.

Yes it should be separate.

And what I meant is we don't need all these noops, but recv() keeps being
ignored just like above, then for sender side, right now it's:

            ret = multifd_send_state->ops->send_prepare(p, &local_err);
            if (migrate_mapped_ram()) {
                file_write_ramblock_iov();
            } else {
                ret = qio_channel_writev_full_all();
            }

VFIO can process device state in parallel, so:

    if (ram) {
        ret = multifd_send_state->ops->send_prepare(p, &local_err);
        if (migrate_mapped_ram()) {
                file_write_ramblock_iov();
        } else {
                qio_channel_writev_full_all();
        }
    } else {
        // device state handling
        multifd_send_device_prepare(...);
        ...
        qio_channel_writev_full_all();
    }

Then MultifdOps doesn't apply to device states.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-22 23:01         ` Peter Xu
@ 2024-07-23 17:48           ` Fabiano Rosas
  2024-07-23 18:20             ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-23 17:48 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Mon, Jul 22, 2024 at 06:20:28PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
>> >> >> Hi,
>> >> >> 
>> >> >> In this v2 I took Peter's suggestion of keeping the channels' pointers
>> >> >> and moving only the extra slot. The major changes are in patches 5 and
>> >> >> 9. Patch 3 introduces the structure:
>> >> >> 
>> >> >> typedef enum {
>> >> >>     MULTIFD_PAYLOAD_NONE,
>> >> >>     MULTIFD_PAYLOAD_RAM,
>> >> >> } MultiFDPayloadType;
>> >> >> 
>> >> >> struct MultiFDSendData {
>> >> >>     MultiFDPayloadType type;
>> >> >>     union {
>> >> >>         MultiFDPages_t ram;
>> >> >>     } u;
>> >> >> };
>> >> >> 
>> >> >> I added a NONE type so we can use it to tell when the channel has
>> >> >> finished sending a packet, since we'll need to switch types between
>> >> >> clients anyway. This avoids having to introduce a 'size', or 'free'
>> >> >> variable.
>> >> >
>> >> > This at least looks better to me, thanks.
>> >> >
>> >> >> 
>> >> >> WHAT'S MISSING:
>> >> >> 
>> >> >> - The support for calling multifd_send() concurrently. Maciej has this
>> >> >>   in his series so I didn't touch it.
>> >> >> 
>> >> >> - A way of adding methods for the new payload type. Currently, the
>> >> >>   compression methods are somewhat coupled with ram migration, so I'm
>> >> >>   not sure how to proceed.
>> >> >
>> >> > What is this one?  Why compression methods need new payload?  Aren't they
>> >> > ram-typed?
>> >> 
>> >> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
>> >> either nocomp, or the compression-specific methods
>> >> (e.g. zlib_send_prepare).
>> >> 
>> >> How do we add methods for the upcoming new payload types? I don't expect
>> >> us to continue using nocomp and then do "if (ram)... else if
>> >> (device_state) ..." inside of them. I would expect us to rename
>> >> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
>> >> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
>> >> 
>> >> multifd_nocomp_ops -> multifd_ram_ops // rename
>> >> multifd_zlib_ops   // existing
>> >> multifd_device_ops // new
>> >> 
>> >> The challenge here is that the current framework is nocomp
>> >> vs. compression. It needs to become ram + compression vs. other types.
>> >
>> > IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
>> > that device state will need, and so far it's only (referring Maciej's
>> > code):
>> >
>> > static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
>> >                                             Error **errp)
>> > {
>> >     multifd_send_prepare_header_device_state(p);
>> >
>> >     assert(!(p->flags & MULTIFD_FLAG_SYNC));
>> >
>> >     p->next_packet_size = p->device_state->buf_len;
>> >     if (p->next_packet_size > 0) {
>> >         p->iov[p->iovs_num].iov_base = p->device_state->buf;
>> >         p->iov[p->iovs_num].iov_len = p->next_packet_size;
>> >         p->iovs_num++;
>> >     }
>> >
>> >     p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
>> >
>> >     multifd_send_fill_packet_device_state(p);
>> >
>> >     return 0;
>> > }
>> >
>> > None of other multifd_ops are used.
>> 
>> There's also a conditional around device_state when calling
>> ->recv(). That could seems like it could go to a hook.
>> 
>> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
>
> Actually that's exactly what I think is right.. it looks to me now that we
> could bypass anything in MultifdOps (including recv()) but let device state
> be a parallel layer of MultifdOps itself, leaving MultifdOps only for
> compressors.
>
> And yeah, I still remember you just renamed it from recv_pages() to
> recv()..  it's just that now when think it again it looks like cleaner to
> make it only about pages..
>
>> 
>> >
>> > I think we can directly invoke this part of device state code in
>> > multifd_send_thread() for now.  So far I think it should be ok.
>> 
>> It's not just that. There's also a check for "if (ram)" at every call to
>> multifd_ops to avoid calling the ram code when doing the device
>> migration. It would be way easier to just set noop functions for those.
>> 
>> static MultiFDMethods multifd_devstate_ops = {
>>     .send_setup = noop_send_setup,
>>     .send_cleanup = noop_send_cleanup,
>>     .send_prepare = devstate_send_prepare,
>>     .recv_setup = noop_recv_setup,
>>     .recv_cleanup = noop_recv_cleanup,
>>     .recv = devstate_recv
>> };
>> 
>> I'm not saying this needs to be done in this series though. But I do
>> think that's the correct design choice for the long term.
>
> Yes it should be separate.
>
> And what I meant is we don't need all these noops, but recv() keeps being
> ignored just like above, then for sender side, right now it's:
>
>             ret = multifd_send_state->ops->send_prepare(p, &local_err);
>             if (migrate_mapped_ram()) {
>                 file_write_ramblock_iov();
>             } else {
>                 ret = qio_channel_writev_full_all();
>             }
>
> VFIO can process device state in parallel, so:
>
>     if (ram) {
>         ret = multifd_send_state->ops->send_prepare(p, &local_err);
>         if (migrate_mapped_ram()) {
>                 file_write_ramblock_iov();
>         } else {
>                 qio_channel_writev_full_all();
>         }
>     } else {
>         // device state handling
>         multifd_send_device_prepare(...);
>         ...
>         qio_channel_writev_full_all();
>     }
>
> Then MultifdOps doesn't apply to device states.

To avoid getting into bikeshed territory, I think we should postpone
this discussion until after Maciej's series is merged, so we can speak
more concretely about the implications. It's easy enough to go from your
suggestion to mine than the other way around, so let's leave at that.

I had it already written, so more of my reasoning below, if you're
interested.
======

We already have the send/recv threads structured in relation to what we
do inside the hooks. You're just defining a function that's not a hook,
but it has the same signature and responsibilities and needs to be
called at the same moment.

I think the dissonance here is that you don't see the multifd thread
code and the payloads (ram, device) as separate layers. Payload-specific
code should not be at top level. Otherwise, it breaks any semblance of
proper layering:

- payload code will have access to MultiFD*Params, which has a bunch of
  control variables for the loop, the semaphores, etc. that should not
  be touched;

- payload code ends up influencing the flow of the thread
  function. E.g. when zero_copy_send used to dictate whether we'd have
  separate IO for the packet or not.

- temporary variables needed by the payload code will have to be
  declared inside the thread funcion, which makes tempting to use them
  across payload types and also in the thread code itself;

- it creates doubt as to whether new changes go inside the hooks, in the
  if/else or outside of it;

Think about how easy it has has been to review and merge the various
compression features we had. It doesn't matter how much they mess up
inside the hooks, it will never cause the dreaded "Memory content
inconsistency at ..." error from check_guest_ram(). At least not in a
way that affects other people. Now compare that with for instance the
zero-page work, or even mapped-ram, that required a bunch of changes to
the multifd control flow itself (e.g. all of the sync changes w/
mapped-ram).


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-23 17:48           ` Fabiano Rosas
@ 2024-07-23 18:20             ` Peter Xu
  2024-07-23 20:50               ` Fabiano Rosas
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Xu @ 2024-07-23 18:20 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Tue, Jul 23, 2024 at 02:48:48PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Mon, Jul 22, 2024 at 06:20:28PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
> >> >> Peter Xu <peterx@redhat.com> writes:
> >> >> 
> >> >> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
> >> >> >> Hi,
> >> >> >> 
> >> >> >> In this v2 I took Peter's suggestion of keeping the channels' pointers
> >> >> >> and moving only the extra slot. The major changes are in patches 5 and
> >> >> >> 9. Patch 3 introduces the structure:
> >> >> >> 
> >> >> >> typedef enum {
> >> >> >>     MULTIFD_PAYLOAD_NONE,
> >> >> >>     MULTIFD_PAYLOAD_RAM,
> >> >> >> } MultiFDPayloadType;
> >> >> >> 
> >> >> >> struct MultiFDSendData {
> >> >> >>     MultiFDPayloadType type;
> >> >> >>     union {
> >> >> >>         MultiFDPages_t ram;
> >> >> >>     } u;
> >> >> >> };
> >> >> >> 
> >> >> >> I added a NONE type so we can use it to tell when the channel has
> >> >> >> finished sending a packet, since we'll need to switch types between
> >> >> >> clients anyway. This avoids having to introduce a 'size', or 'free'
> >> >> >> variable.
> >> >> >
> >> >> > This at least looks better to me, thanks.
> >> >> >
> >> >> >> 
> >> >> >> WHAT'S MISSING:
> >> >> >> 
> >> >> >> - The support for calling multifd_send() concurrently. Maciej has this
> >> >> >>   in his series so I didn't touch it.
> >> >> >> 
> >> >> >> - A way of adding methods for the new payload type. Currently, the
> >> >> >>   compression methods are somewhat coupled with ram migration, so I'm
> >> >> >>   not sure how to proceed.
> >> >> >
> >> >> > What is this one?  Why compression methods need new payload?  Aren't they
> >> >> > ram-typed?
> >> >> 
> >> >> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
> >> >> either nocomp, or the compression-specific methods
> >> >> (e.g. zlib_send_prepare).
> >> >> 
> >> >> How do we add methods for the upcoming new payload types? I don't expect
> >> >> us to continue using nocomp and then do "if (ram)... else if
> >> >> (device_state) ..." inside of them. I would expect us to rename
> >> >> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
> >> >> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
> >> >> 
> >> >> multifd_nocomp_ops -> multifd_ram_ops // rename
> >> >> multifd_zlib_ops   // existing
> >> >> multifd_device_ops // new
> >> >> 
> >> >> The challenge here is that the current framework is nocomp
> >> >> vs. compression. It needs to become ram + compression vs. other types.
> >> >
> >> > IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
> >> > that device state will need, and so far it's only (referring Maciej's
> >> > code):
> >> >
> >> > static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
> >> >                                             Error **errp)
> >> > {
> >> >     multifd_send_prepare_header_device_state(p);
> >> >
> >> >     assert(!(p->flags & MULTIFD_FLAG_SYNC));
> >> >
> >> >     p->next_packet_size = p->device_state->buf_len;
> >> >     if (p->next_packet_size > 0) {
> >> >         p->iov[p->iovs_num].iov_base = p->device_state->buf;
> >> >         p->iov[p->iovs_num].iov_len = p->next_packet_size;
> >> >         p->iovs_num++;
> >> >     }
> >> >
> >> >     p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
> >> >
> >> >     multifd_send_fill_packet_device_state(p);
> >> >
> >> >     return 0;
> >> > }
> >> >
> >> > None of other multifd_ops are used.
> >> 
> >> There's also a conditional around device_state when calling
> >> ->recv(). That could seems like it could go to a hook.
> >> 
> >> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
> >
> > Actually that's exactly what I think is right.. it looks to me now that we
> > could bypass anything in MultifdOps (including recv()) but let device state
> > be a parallel layer of MultifdOps itself, leaving MultifdOps only for
> > compressors.
> >
> > And yeah, I still remember you just renamed it from recv_pages() to
> > recv()..  it's just that now when think it again it looks like cleaner to
> > make it only about pages..
> >
> >> 
> >> >
> >> > I think we can directly invoke this part of device state code in
> >> > multifd_send_thread() for now.  So far I think it should be ok.
> >> 
> >> It's not just that. There's also a check for "if (ram)" at every call to
> >> multifd_ops to avoid calling the ram code when doing the device
> >> migration. It would be way easier to just set noop functions for those.
> >> 
> >> static MultiFDMethods multifd_devstate_ops = {
> >>     .send_setup = noop_send_setup,
> >>     .send_cleanup = noop_send_cleanup,
> >>     .send_prepare = devstate_send_prepare,
> >>     .recv_setup = noop_recv_setup,
> >>     .recv_cleanup = noop_recv_cleanup,
> >>     .recv = devstate_recv
> >> };
> >> 
> >> I'm not saying this needs to be done in this series though. But I do
> >> think that's the correct design choice for the long term.
> >
> > Yes it should be separate.
> >
> > And what I meant is we don't need all these noops, but recv() keeps being
> > ignored just like above, then for sender side, right now it's:
> >
> >             ret = multifd_send_state->ops->send_prepare(p, &local_err);
> >             if (migrate_mapped_ram()) {
> >                 file_write_ramblock_iov();
> >             } else {
> >                 ret = qio_channel_writev_full_all();
> >             }
> >
> > VFIO can process device state in parallel, so:
> >
> >     if (ram) {
> >         ret = multifd_send_state->ops->send_prepare(p, &local_err);
> >         if (migrate_mapped_ram()) {
> >                 file_write_ramblock_iov();
> >         } else {
> >                 qio_channel_writev_full_all();
> >         }
> >     } else {
> >         // device state handling
> >         multifd_send_device_prepare(...);
> >         ...
> >         qio_channel_writev_full_all();
> >     }
> >
> > Then MultifdOps doesn't apply to device states.
> 
> To avoid getting into bikeshed territory, I think we should postpone
> this discussion until after Maciej's series is merged, so we can speak
> more concretely about the implications. It's easy enough to go from your
> suggestion to mine than the other way around, so let's leave at that.
> 
> I had it already written, so more of my reasoning below, if you're
> interested.

I never thought this is bikeshedding.. What we're discussing now is exactly
what should appear in Maciej's code, am I right?  I thought we should
figure it out before it's merged, if that's the case..

And whose suggestion isn't that important, IMO.  We simply try to discuss
this technically and reach a consensus.. no matter who proposed what.

> ======
> 
> We already have the send/recv threads structured in relation to what we
> do inside the hooks. You're just defining a function that's not a hook,
> but it has the same signature and responsibilities and needs to be
> called at the same moment.
> 
> I think the dissonance here is that you don't see the multifd thread
> code and the payloads (ram, device) as separate layers. Payload-specific
> code should not be at top level. Otherwise, it breaks any semblance of
> proper layering:
> 
> - payload code will have access to MultiFD*Params, which has a bunch of
>   control variables for the loop, the semaphores, etc. that should not
>   be touched;
> 
> - payload code ends up influencing the flow of the thread
>   function. E.g. when zero_copy_send used to dictate whether we'd have
>   separate IO for the packet or not.
> 
> - temporary variables needed by the payload code will have to be
>   declared inside the thread funcion, which makes tempting to use them
>   across payload types and also in the thread code itself;
> 
> - it creates doubt as to whether new changes go inside the hooks, in the
>   if/else or outside of it;
> 
> Think about how easy it has has been to review and merge the various
> compression features we had. It doesn't matter how much they mess up
> inside the hooks, it will never cause the dreaded "Memory content
> inconsistency at ..." error from check_guest_ram(). At least not in a
> way that affects other people. Now compare that with for instance the
> zero-page work, or even mapped-ram, that required a bunch of changes to
> the multifd control flow itself (e.g. all of the sync changes w/
> mapped-ram).

I think there's one issue where we only support one MultiFDMethods as of
now to be active, while the "clients" of multifd can be >1 from payload
POV.  It means I'm not sure how VFIO can provide a MultiFDMethods if it
will overwrite what should be there to define how to process RAM..

Then, we should logically allow VFIO migration to happen with RAM being
compressed with ZSTD/ZLIB/whatever, right?  The question is which
MultiFDMethods we should assign if they're the same layer in this case..

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-23 18:20             ` Peter Xu
@ 2024-07-23 20:50               ` Fabiano Rosas
  2024-07-23 21:11                 ` Peter Xu
  0 siblings, 1 reply; 35+ messages in thread
From: Fabiano Rosas @ 2024-07-23 20:50 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Maciej S . Szmigiero

Peter Xu <peterx@redhat.com> writes:

> On Tue, Jul 23, 2024 at 02:48:48PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Mon, Jul 22, 2024 at 06:20:28PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > On Mon, Jul 22, 2024 at 05:21:48PM -0300, Fabiano Rosas wrote:
>> >> >> Peter Xu <peterx@redhat.com> writes:
>> >> >> 
>> >> >> > On Mon, Jul 22, 2024 at 02:59:05PM -0300, Fabiano Rosas wrote:
>> >> >> >> Hi,
>> >> >> >> 
>> >> >> >> In this v2 I took Peter's suggestion of keeping the channels' pointers
>> >> >> >> and moving only the extra slot. The major changes are in patches 5 and
>> >> >> >> 9. Patch 3 introduces the structure:
>> >> >> >> 
>> >> >> >> typedef enum {
>> >> >> >>     MULTIFD_PAYLOAD_NONE,
>> >> >> >>     MULTIFD_PAYLOAD_RAM,
>> >> >> >> } MultiFDPayloadType;
>> >> >> >> 
>> >> >> >> struct MultiFDSendData {
>> >> >> >>     MultiFDPayloadType type;
>> >> >> >>     union {
>> >> >> >>         MultiFDPages_t ram;
>> >> >> >>     } u;
>> >> >> >> };
>> >> >> >> 
>> >> >> >> I added a NONE type so we can use it to tell when the channel has
>> >> >> >> finished sending a packet, since we'll need to switch types between
>> >> >> >> clients anyway. This avoids having to introduce a 'size', or 'free'
>> >> >> >> variable.
>> >> >> >
>> >> >> > This at least looks better to me, thanks.
>> >> >> >
>> >> >> >> 
>> >> >> >> WHAT'S MISSING:
>> >> >> >> 
>> >> >> >> - The support for calling multifd_send() concurrently. Maciej has this
>> >> >> >>   in his series so I didn't touch it.
>> >> >> >> 
>> >> >> >> - A way of adding methods for the new payload type. Currently, the
>> >> >> >>   compression methods are somewhat coupled with ram migration, so I'm
>> >> >> >>   not sure how to proceed.
>> >> >> >
>> >> >> > What is this one?  Why compression methods need new payload?  Aren't they
>> >> >> > ram-typed?
>> >> >> 
>> >> >> The data we transport is MultiFDPages_t, yes, but the MultiFDMethods are
>> >> >> either nocomp, or the compression-specific methods
>> >> >> (e.g. zlib_send_prepare).
>> >> >> 
>> >> >> How do we add methods for the upcoming new payload types? I don't expect
>> >> >> us to continue using nocomp and then do "if (ram)... else if
>> >> >> (device_state) ..." inside of them. I would expect us to rename
>> >> >> s/nocomp/ram/ and add a new set of MultiFDMethods for the new data type
>> >> >> (e.g. vfio_send_prepare, vmstate_send_prepare, etc).
>> >> >> 
>> >> >> multifd_nocomp_ops -> multifd_ram_ops // rename
>> >> >> multifd_zlib_ops   // existing
>> >> >> multifd_device_ops // new
>> >> >> 
>> >> >> The challenge here is that the current framework is nocomp
>> >> >> vs. compression. It needs to become ram + compression vs. other types.
>> >> >
>> >> > IMHO we can keep multifd_ops[] only for RAM.  There's only send_prepare()
>> >> > that device state will need, and so far it's only (referring Maciej's
>> >> > code):
>> >> >
>> >> > static int nocomp_send_prepare_device_state(MultiFDSendParams *p,
>> >> >                                             Error **errp)
>> >> > {
>> >> >     multifd_send_prepare_header_device_state(p);
>> >> >
>> >> >     assert(!(p->flags & MULTIFD_FLAG_SYNC));
>> >> >
>> >> >     p->next_packet_size = p->device_state->buf_len;
>> >> >     if (p->next_packet_size > 0) {
>> >> >         p->iov[p->iovs_num].iov_base = p->device_state->buf;
>> >> >         p->iov[p->iovs_num].iov_len = p->next_packet_size;
>> >> >         p->iovs_num++;
>> >> >     }
>> >> >
>> >> >     p->flags |= MULTIFD_FLAG_NOCOMP | MULTIFD_FLAG_DEVICE_STATE;
>> >> >
>> >> >     multifd_send_fill_packet_device_state(p);
>> >> >
>> >> >     return 0;
>> >> > }
>> >> >
>> >> > None of other multifd_ops are used.
>> >> 
>> >> There's also a conditional around device_state when calling
>> >> ->recv(). That could seems like it could go to a hook.
>> >> 
>> >> https://lore.kernel.org/r/41dedaf2c9abebb5e45f88c052daa26320715a92.1718717584.git.maciej.szmigiero@oracle.com
>> >
>> > Actually that's exactly what I think is right.. it looks to me now that we
>> > could bypass anything in MultifdOps (including recv()) but let device state
>> > be a parallel layer of MultifdOps itself, leaving MultifdOps only for
>> > compressors.
>> >
>> > And yeah, I still remember you just renamed it from recv_pages() to
>> > recv()..  it's just that now when think it again it looks like cleaner to
>> > make it only about pages..
>> >
>> >> 
>> >> >
>> >> > I think we can directly invoke this part of device state code in
>> >> > multifd_send_thread() for now.  So far I think it should be ok.
>> >> 
>> >> It's not just that. There's also a check for "if (ram)" at every call to
>> >> multifd_ops to avoid calling the ram code when doing the device
>> >> migration. It would be way easier to just set noop functions for those.
>> >> 
>> >> static MultiFDMethods multifd_devstate_ops = {
>> >>     .send_setup = noop_send_setup,
>> >>     .send_cleanup = noop_send_cleanup,
>> >>     .send_prepare = devstate_send_prepare,
>> >>     .recv_setup = noop_recv_setup,
>> >>     .recv_cleanup = noop_recv_cleanup,
>> >>     .recv = devstate_recv
>> >> };
>> >> 
>> >> I'm not saying this needs to be done in this series though. But I do
>> >> think that's the correct design choice for the long term.
>> >
>> > Yes it should be separate.
>> >
>> > And what I meant is we don't need all these noops, but recv() keeps being
>> > ignored just like above, then for sender side, right now it's:
>> >
>> >             ret = multifd_send_state->ops->send_prepare(p, &local_err);
>> >             if (migrate_mapped_ram()) {
>> >                 file_write_ramblock_iov();
>> >             } else {
>> >                 ret = qio_channel_writev_full_all();
>> >             }
>> >
>> > VFIO can process device state in parallel, so:
>> >
>> >     if (ram) {
>> >         ret = multifd_send_state->ops->send_prepare(p, &local_err);
>> >         if (migrate_mapped_ram()) {
>> >                 file_write_ramblock_iov();
>> >         } else {
>> >                 qio_channel_writev_full_all();
>> >         }
>> >     } else {
>> >         // device state handling
>> >         multifd_send_device_prepare(...);
>> >         ...
>> >         qio_channel_writev_full_all();
>> >     }
>> >
>> > Then MultifdOps doesn't apply to device states.
>> 
>> To avoid getting into bikeshed territory, I think we should postpone
>> this discussion until after Maciej's series is merged, so we can speak
>> more concretely about the implications. It's easy enough to go from your
>> suggestion to mine than the other way around, so let's leave at that.
>> 
>> I had it already written, so more of my reasoning below, if you're
>> interested.
>
> I never thought this is bikeshedding.. What we're discussing now is exactly
> what should appear in Maciej's code, am I right?  I thought we should
> figure it out before it's merged, if that's the case..

Yeah, but it should be functionally the same, so it shouldn't matter
whether we merge a solution now or leave it until after his series.

>
> And whose suggestion isn't that important, IMO.  We simply try to discuss
> this technically and reach a consensus.. no matter who proposed what.

Right, I'm just using a shorthand to avoid having to describe the
proposals every time. What I mean is it's easier to switch if/else with
function pointers than the other way around because adding the hooks
will also require some changes to the MultiFDMethods structure.

>
>> ======
>> 
>> We already have the send/recv threads structured in relation to what we
>> do inside the hooks. You're just defining a function that's not a hook,
>> but it has the same signature and responsibilities and needs to be
>> called at the same moment.
>> 
>> I think the dissonance here is that you don't see the multifd thread
>> code and the payloads (ram, device) as separate layers. Payload-specific
>> code should not be at top level. Otherwise, it breaks any semblance of
>> proper layering:
>> 
>> - payload code will have access to MultiFD*Params, which has a bunch of
>>   control variables for the loop, the semaphores, etc. that should not
>>   be touched;
>> 
>> - payload code ends up influencing the flow of the thread
>>   function. E.g. when zero_copy_send used to dictate whether we'd have
>>   separate IO for the packet or not.
>> 
>> - temporary variables needed by the payload code will have to be
>>   declared inside the thread funcion, which makes tempting to use them
>>   across payload types and also in the thread code itself;
>> 
>> - it creates doubt as to whether new changes go inside the hooks, in the
>>   if/else or outside of it;
>> 
>> Think about how easy it has has been to review and merge the various
>> compression features we had. It doesn't matter how much they mess up
>> inside the hooks, it will never cause the dreaded "Memory content
>> inconsistency at ..." error from check_guest_ram(). At least not in a
>> way that affects other people. Now compare that with for instance the
>> zero-page work, or even mapped-ram, that required a bunch of changes to
>> the multifd control flow itself (e.g. all of the sync changes w/
>> mapped-ram).
>
> I think there's one issue where we only support one MultiFDMethods as of
> now to be active, while the "clients" of multifd can be >1 from payload
> POV.  It means I'm not sure how VFIO can provide a MultiFDMethods if it
> will overwrite what should be there to define how to process RAM..
>
> Then, we should logically allow VFIO migration to happen with RAM being
> compressed with ZSTD/ZLIB/whatever, right?  The question is which
> MultiFDMethods we should assign if they're the same layer in this case..

The natural thing would be to put the hooks inside the data
type. Something like this:

struct MultiFDRecvData {
    MultiFDMethods *ops;  <---
    void *opaque;
    size_t size;
    /* for preadv */
    off_t file_offset;
};

struct MultiFDSendData {
    MultiFDPayloadType type;
    MultiFDMethods *ops;   <---
    union {
        MultiFDPages_t ram;
    } u;
};

multifd_ram_save_setup()
{
    multifd_ram_send = multifd_send_data_alloc();
    multifd_register_ops(multifd_ram_send, &multifd_ram_ops);
}

void multifd_register_ops(MultiFDSendData *data, MultiFDMethods *ops)
{
    if (data->type == RAM) {
        method = migrate_multifd_compression();
        assert(0 < method && method < MULTIFD_COMPRESSION__MAX);
        ops = multifd_ops[method];
    }

    data->ops = ops;
}

I'm just not sure whether we have some compiler cleverness optimizing
these function pointer accesses due to multifd_send_state being static
and multifd_send_state->ops being unchanged throughout the
migration. But AFAICS, the proposal above is almost the same thing as we
already have.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages
  2024-07-23 20:50               ` Fabiano Rosas
@ 2024-07-23 21:11                 ` Peter Xu
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Xu @ 2024-07-23 21:11 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, Maciej S . Szmigiero

On Tue, Jul 23, 2024 at 05:50:24PM -0300, Fabiano Rosas wrote:
> The natural thing would be to put the hooks inside the data
> type. Something like this:
> 
> struct MultiFDRecvData {
>     MultiFDMethods *ops;  <---
>     void *opaque;
>     size_t size;
>     /* for preadv */
>     off_t file_offset;
> };
> 
> struct MultiFDSendData {
>     MultiFDPayloadType type;
>     MultiFDMethods *ops;   <---
>     union {
>         MultiFDPages_t ram;
>     } u;
> };
> 
> multifd_ram_save_setup()
> {
>     multifd_ram_send = multifd_send_data_alloc();
>     multifd_register_ops(multifd_ram_send, &multifd_ram_ops);
> }
> 
> void multifd_register_ops(MultiFDSendData *data, MultiFDMethods *ops)
> {
>     if (data->type == RAM) {
>         method = migrate_multifd_compression();
>         assert(0 < method && method < MULTIFD_COMPRESSION__MAX);
>         ops = multifd_ops[method];
>     }
> 
>     data->ops = ops;
> }
> 
> I'm just not sure whether we have some compiler cleverness optimizing
> these function pointer accesses due to multifd_send_state being static
> and multifd_send_state->ops being unchanged throughout the
> migration. But AFAICS, the proposal above is almost the same thing as we
> already have.

Right, this looks as pretty when you can put the ops inside, and iff the
perf goes as well. E.g., you'll need to register the Ops for each batch
too, besides the pointer jumps.  You'll also need to check the hooks one by
one, even if we know most of them will be noop at least for VFIO.

IMHO it's a matter of whether the Ops is useful to VFIO / other devices
first in the forseeable future.  My current gut feeling is they're mostly
not usable.. while supporting device state with an opaque buffer to send /
recv, plus a pretty static device state protocol seems relatively easy to
do.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2024-07-24  9:31 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-22 17:59 [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Fabiano Rosas
2024-07-22 17:59 ` [RFC PATCH v2 1/9] migration/multifd: Reduce access to p->pages Fabiano Rosas
2024-07-22 17:59 ` [RFC PATCH v2 2/9] migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iov Fabiano Rosas
2024-07-22 19:21   ` Peter Xu
2024-07-22 17:59 ` [RFC PATCH v2 3/9] migration/multifd: Introduce MultiFDSendData Fabiano Rosas
2024-07-22 19:22   ` Peter Xu
2024-07-22 17:59 ` [RFC PATCH v2 4/9] migration/multifd: Make MultiFDPages_t:offset a flexible array member Fabiano Rosas
2024-07-22 19:20   ` Peter Xu
2024-07-22 20:36     ` Fabiano Rosas
2024-07-22 17:59 ` [RFC PATCH v2 5/9] migration/multifd: Replace p->pages with an union pointer Fabiano Rosas
2024-07-22 17:59 ` [RFC PATCH v2 6/9] migration/multifd: Move pages accounting into multifd_send_zero_page_detect() Fabiano Rosas
2024-07-22 19:29   ` Peter Xu
2024-07-22 20:07     ` Fabiano Rosas
2024-07-22 20:38       ` Peter Xu
2024-07-22 17:59 ` [RFC PATCH v2 7/9] migration/multifd: Isolate ram pages packet data Fabiano Rosas
2024-07-22 19:37   ` Peter Xu
2024-07-22 20:34     ` Fabiano Rosas
2024-07-22 21:06       ` Peter Xu
2024-07-24  9:30         ` Maciej S. Szmigiero
2024-07-22 17:59 ` [RFC PATCH v2 8/9] migration/multifd: Don't send ram data during SYNC Fabiano Rosas
2024-07-22 21:03   ` Peter Xu
2024-07-22 21:24     ` Fabiano Rosas
2024-07-22 17:59 ` [RFC PATCH v2 9/9] migration/multifd: Replace multifd_send_state->pages with client data Fabiano Rosas
2024-07-22 19:55   ` Peter Xu
2024-07-22 20:26     ` Fabiano Rosas
2024-07-22 20:41       ` Peter Xu
2024-07-22 19:58 ` [RFC PATCH v2 0/9] migration/multifd: Remove multifd_send_state->pages Peter Xu
2024-07-22 20:21   ` Fabiano Rosas
2024-07-22 20:54     ` Peter Xu
2024-07-22 21:20       ` Fabiano Rosas
2024-07-22 23:01         ` Peter Xu
2024-07-23 17:48           ` Fabiano Rosas
2024-07-23 18:20             ` Peter Xu
2024-07-23 20:50               ` Fabiano Rosas
2024-07-23 21:11                 ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).