* [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush
@ 2024-12-06 22:47 Peter Xu
2024-12-06 22:47 ` [PATCH v3 1/7] migration/multifd: Further remove the SYNC on complete Peter Xu
` (7 more replies)
0 siblings, 8 replies; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
CI: https://gitlab.com/peterx/qemu/-/pipelines/1577280033
(note: it's a pipeline of two patchsets, to save CI credits and time)
v1: https://lore.kernel.org/r/20241205185303.897010-1-peterx@redhat.com
v2: https://lore.kernel.org/r/20241206005834.1050905-1-peterx@redhat.com
v3 changelog:
- R-bs collected
- Update commit message of patch 1 [Fabiano]
- English updates [Fabiano]
- Update comment for MULTIFD_SYNC_ALL [Fabiano]
- In multifd_send_sync_main(), assert on req type [Fabiano]
- Some more comments and cleanup for RAM_SAVE_FLAG_* movement [Fabiano]
- Update the last document patch [Fabiano]
This series provides some changes that may be helpful for either VFIO or
postcopy integration on top of multifd.
For VFIO, only patches 1 & 2 are relevant.
For postcopy, the relevant patches are 3-7, but they need to be based on
1+2 because of a context dependency.
All these patches can be seen as cleanups / slight optimizations on top of
the master branch, with or without the VFIO/postcopy work.
Besides CI, qtests, and some real-world multifd tests to verify that the
sync events all happen correctly, I made sure to cover the 7.2 machine type
(which uses the legacy sync) so it still works as before - syncs are more
frequent there, but everything keeps working smoothly so far.
Thanks,
Peter Xu (7):
migration/multifd: Further remove the SYNC on complete
migration/multifd: Allow to sync with sender threads only
migration/ram: Move RAM_SAVE_FLAG* into ram.h
migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages
migration/multifd: Remove sync processing on postcopy
migration/multifd: Cleanup src flushes on condition check
migration/multifd: Document the reason to sync for save_setup()
migration/multifd.h | 27 ++++++++++--
migration/ram.h | 28 ++++++++++++
migration/rdma.h | 7 ---
migration/multifd-nocomp.c | 74 ++++++++++++++++++++++++++++++-
migration/multifd.c | 17 +++++---
migration/ram.c | 89 +++++++++++++++++---------------------
6 files changed, 173 insertions(+), 69 deletions(-)
--
2.47.0
* [PATCH v3 1/7] migration/multifd: Further remove the SYNC on complete
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
@ 2024-12-06 22:47 ` Peter Xu
2024-12-06 22:47 ` [PATCH v3 2/7] migration/multifd: Allow to sync with sender threads only Peter Xu
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at
ram_save_complete(), because the sync on the destination side is not
needed due to the last iteration of find_dirty_block() having already
done it.
However, that commit overlooked that multifd_ram_flush_and_sync() on the
source side is also not needed at ram_save_complete(), for the same
reason.
Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the
multifd_ram_flush_and_sync() means that currently the recv threads will
hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the
destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is
received.
Luckily, multifd still works fine, because the recv-side cleanup code
(mostly multifd_recv_sync_main()) is smart enough to kick the recv threads
out even if they are stuck at SYNC. And since this is the completion phase
of migration, nothing else will be sent after the SYNCs.
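To illustrate the hang described above, here is a minimal standalone toy
model (plain POSIX threads and semaphores; this is not QEMU code, just a
sketch of the wait/kick pattern):

    /* Toy model of the recv-side sync handshake; not QEMU code. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <unistd.h>

    static sem_t sem_sync;   /* stands in for the per-channel sem_sync */

    static void *recv_thread(void *arg)
    {
        /* Saw MULTIFD_FLAG_SYNC: wait until the main channel syncs us */
        printf("recv thread: got SYNC, waiting for the main channel\n");
        sem_wait(&sem_sync);
        printf("recv thread: released\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t thread;

        sem_init(&sem_sync, 0, 0);
        pthread_create(&thread, NULL, recv_thread, NULL);
        sleep(1);
        /*
         * Without this post (i.e. no RAM_SAVE_FLAG_MULTIFD_FLUSH seen on
         * the main channel), the recv thread stays blocked; in QEMU it is
         * the cleanup path that eventually kicks it out.
         */
        sem_post(&sem_sync);
        pthread_join(thread, NULL);
        return 0;
    }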
This needs to be fixed because in the future VFIO will have data to push
after ram_save_complete(), and we don't want the recv threads to be stuck
on the MULTIFD_FLAG_SYNC message.
Remove the unnecessary (and buggy) invocation of
multifd_ram_flush_and_sync().
For very old binaries (multifd_flush_after_each_section==true), the
flush_and_sync is still needed, because each EOS received on the
destination will enforce an all-channel sync once.
Stable branches do not need this patch, as I cannot think of a real bug
that would go wrong there, so no Fixes tag is attached, to make it clear
that a backport is not needed.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 05ff9eb328..7284c34bd8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
}
}
- ret = multifd_ram_flush_and_sync();
- if (ret < 0) {
- return ret;
+ if (migrate_multifd() &&
+ migrate_multifd_flush_after_each_section()) {
+ /*
+ * Only the old dest QEMU will need this sync, because each EOS
+ * will require one SYNC message on each channel.
+ */
+ ret = multifd_ram_flush_and_sync();
+ if (ret < 0) {
+ return ret;
+ }
}
if (migrate_mapped_ram()) {
--
2.47.0
* [PATCH v3 2/7] migration/multifd: Allow to sync with sender threads only
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
2024-12-06 22:47 ` [PATCH v3 1/7] migration/multifd: Further remove the SYNC on complete Peter Xu
@ 2024-12-06 22:47 ` Peter Xu
2024-12-09 20:52 ` Fabiano Rosas
2024-12-06 22:47 ` [PATCH v3 3/7] migration/ram: Move RAM_SAVE_FLAG* into ram.h Peter Xu
` (5 subsequent siblings)
7 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
Teach multifd_send_sync_main() to sync with threads only.
We already have such a request: when mapped-ram is enabled with multifd.
In that case, no SYNC messages will be pushed to the stream when multifd
syncs the sender threads, because there are no destination threads waiting
for them. The whole point of that sync is to make sure all sender threads
have finished their jobs.
So fundamentally we have requests to do the sync in two different ways:
- Either to sync the threads only,
- Or to sync the threads but also with the destination side.
Mapped-ram did it already because of the use_packets check in the sync
handler of the sender thread. It works.
However, this may stop working when e.g. VFIO starts to reuse multifd
channels to push device state. In that case VFIO has a similar request for
a "thread-only sync", but we can't rely on a flag check, because such sync
requests can still come from RAM, which needs the on-wire notifications.
Pave the way for that by allowing multifd_send_sync_main() to specify what
kind of sync the caller needs. We can use it for mapped-ram already.
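As a usage sketch (the device-state caller below is hypothetical; only the
RAM path exists in this series):

    /* RAM flush: sync the threads, and the destination unless mapped-ram */
    static int ram_flush_example(void)
    {
        MultiFDSyncReq req = migrate_mapped_ram() ?
                             MULTIFD_SYNC_LOCAL : MULTIFD_SYNC_ALL;

        return multifd_send_sync_main(req);
    }

    /* Hypothetical device state flush (e.g. VFIO): sender threads only */
    static int device_state_flush_example(void)
    {
        return multifd_send_sync_main(MULTIFD_SYNC_LOCAL);
    }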
No functional change intended.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/multifd.h | 23 ++++++++++++++++++++---
migration/multifd-nocomp.c | 7 ++++++-
migration/multifd.c | 17 +++++++++++------
3 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/migration/multifd.h b/migration/multifd.h
index 50d58c0c9c..6493512305 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -19,6 +19,22 @@
typedef struct MultiFDRecvData MultiFDRecvData;
typedef struct MultiFDSendData MultiFDSendData;
+typedef enum {
+ /* No sync request */
+ MULTIFD_SYNC_NONE = 0,
+ /* Sync locally on the sender threads without pushing messages */
+ MULTIFD_SYNC_LOCAL,
+ /*
+ * Sync not only on the sender threads, but also push MULTIFD_FLAG_SYNC
+ * message to the wire for each iochannel (which is for a remote sync).
+ *
+ * When remote sync is used, need to be paired with a follow up
+ * RAM_SAVE_FLAG_EOS / RAM_SAVE_FLAG_MULTIFD_FLUSH message on the main
+ * channel.
+ */
+ MULTIFD_SYNC_ALL,
+} MultiFDSyncReq;
+
bool multifd_send_setup(void);
void multifd_send_shutdown(void);
void multifd_send_channel_created(void);
@@ -28,7 +44,7 @@ void multifd_recv_shutdown(void);
bool multifd_recv_all_channels_created(void);
void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
void multifd_recv_sync_main(void);
-int multifd_send_sync_main(void);
+int multifd_send_sync_main(MultiFDSyncReq req);
bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
bool multifd_recv(void);
MultiFDRecvData *multifd_get_recv_data(void);
@@ -143,7 +159,7 @@ typedef struct {
/* multifd flags for each packet */
uint32_t flags;
/*
- * The sender thread has work to do if either of below boolean is set.
+ * The sender thread has work to do if either of below field is set.
*
* @pending_job: a job is pending
* @pending_sync: a sync request is pending
@@ -152,7 +168,8 @@ typedef struct {
* cleared by the multifd sender threads.
*/
bool pending_job;
- bool pending_sync;
+ MultiFDSyncReq pending_sync;
+
MultiFDSendData *data;
/* thread local variables. No locking required */
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 55191152f9..219f9e58ef 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -345,6 +345,8 @@ retry:
int multifd_ram_flush_and_sync(void)
{
+ MultiFDSyncReq req;
+
if (!migrate_multifd()) {
return 0;
}
@@ -356,7 +358,10 @@ int multifd_ram_flush_and_sync(void)
}
}
- return multifd_send_sync_main();
+ /* File migrations only need to sync with threads */
+ req = migrate_mapped_ram() ? MULTIFD_SYNC_LOCAL : MULTIFD_SYNC_ALL;
+
+ return multifd_send_sync_main(req);
}
bool multifd_send_prepare_common(MultiFDSendParams *p)
diff --git a/migration/multifd.c b/migration/multifd.c
index 498e71fd10..7ecc3964ee 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -523,11 +523,13 @@ static int multifd_zero_copy_flush(QIOChannel *c)
return ret;
}
-int multifd_send_sync_main(void)
+int multifd_send_sync_main(MultiFDSyncReq req)
{
int i;
bool flush_zero_copy;
+ assert(req != MULTIFD_SYNC_NONE);
+
flush_zero_copy = migrate_zero_copy_send();
for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -543,8 +545,8 @@ int multifd_send_sync_main(void)
* We should be the only user so far, so not possible to be set by
* others concurrently.
*/
- assert(qatomic_read(&p->pending_sync) == false);
- qatomic_set(&p->pending_sync, true);
+ assert(qatomic_read(&p->pending_sync) == MULTIFD_SYNC_NONE);
+ qatomic_set(&p->pending_sync, req);
qemu_sem_post(&p->sem);
}
for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -635,14 +637,17 @@ static void *multifd_send_thread(void *opaque)
*/
qatomic_store_release(&p->pending_job, false);
} else {
+ MultiFDSyncReq req = qatomic_read(&p->pending_sync);
+
/*
* If not a normal job, must be a sync request. Note that
* pending_sync is a standalone flag (unlike pending_job), so
* it doesn't require explicit memory barriers.
*/
- assert(qatomic_read(&p->pending_sync));
+ assert(req != MULTIFD_SYNC_NONE);
- if (use_packets) {
+ /* Only push the SYNC message if it involves a remote sync */
+ if (req == MULTIFD_SYNC_ALL) {
p->flags = MULTIFD_FLAG_SYNC;
multifd_send_fill_packet(p);
ret = qio_channel_write_all(p->c, (void *)p->packet,
@@ -654,7 +659,7 @@ static void *multifd_send_thread(void *opaque)
stat64_add(&mig_stats.multifd_bytes, p->packet_len);
}
- qatomic_set(&p->pending_sync, false);
+ qatomic_set(&p->pending_sync, MULTIFD_SYNC_NONE);
qemu_sem_post(&p->sem_sync);
}
}
--
2.47.0
* [PATCH v3 3/7] migration/ram: Move RAM_SAVE_FLAG* into ram.h
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
2024-12-06 22:47 ` [PATCH v3 1/7] migration/multifd: Further remove the SYNC on complete Peter Xu
2024-12-06 22:47 ` [PATCH v3 2/7] migration/multifd: Allow to sync with sender threads only Peter Xu
@ 2024-12-06 22:47 ` Peter Xu
2024-12-06 22:47 ` [PATCH v3 4/7] migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages Peter Xu
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
Firstly, we're going to use the multifd flag soon in multifd code, so
keeping it private to ram.c isn't going to work.
Secondly, we have a separate RDMA flag dangling around, which is definitely
not obvious. There's one comment that helps, but not by much.
Put all the RAM save flags together, so nothing will get overlooked.
Add a section explaining why we can't use bits over 0x200.
Remove RAM_SAVE_FLAG_FULL since it's no longer used in QEMU, as the comment
already explained.
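For illustration, a self-contained toy model of the encoding that the new
comment describes (not the QEMU stream code; the real load path splits the
value using TARGET_PAGE_MASK, but the idea is the same):

    #include <assert.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RAM_PAGE_SIZE   0x1000ULL           /* assume 4K target pages */
    #define RAM_PAGE_MASK   (~(RAM_PAGE_SIZE - 1))

    #define RAM_SAVE_FLAG_PAGE      0x008
    #define RAM_SAVE_FLAG_CONTINUE  0x020

    int main(void)
    {
        uint64_t offset = 42 * RAM_PAGE_SIZE;   /* always page aligned */
        uint64_t header = offset | RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_CONTINUE;

        /* The load side splits the same value back into address and flags */
        uint64_t flags = header & ~RAM_PAGE_MASK;
        uint64_t addr  = header & RAM_PAGE_MASK;

        assert(addr == offset);
        assert(flags == (RAM_SAVE_FLAG_PAGE | RAM_SAVE_FLAG_CONTINUE));

        /*
         * Any flag at or above the smallest supported page size (1K, 0x400)
         * would overlap real address bits, hence the 0x200 ceiling.
         */
        printf("addr=0x%" PRIx64 " flags=0x%" PRIx64 "\n", addr, flags);
        return 0;
    }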
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.h | 28 ++++++++++++++++++++++++++++
migration/rdma.h | 7 -------
migration/ram.c | 21 ---------------------
3 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/migration/ram.h b/migration/ram.h
index 0d1981f888..921c39a2c5 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -33,6 +33,34 @@
#include "exec/cpu-common.h"
#include "io/channel.h"
+/*
+ * RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it
+ * worked for pages that were filled with the same char. We switched
+ * it to only search for the zero value. And to avoid confusion with
+ * RAM_SAVE_FLAG_COMPRESS_PAGE just rename it.
+ *
+ * RAM_SAVE_FLAG_FULL (0x01) was obsoleted in 2009.
+ *
+ * RAM_SAVE_FLAG_COMPRESS_PAGE (0x100) was removed in QEMU 9.1.
+ *
+ * RAM_SAVE_FLAG_HOOK is only used in RDMA. Whenever this is found in the
+ * data stream, the flags will be passed to rdma functions in the
+ * incoming-migration side.
+ *
+ * We can't use any flag that is bigger than 0x200, because the flags are
+ * always assumed to be encoded in a ramblock address offset, which is
+ * multiple of PAGE_SIZE. Here it means QEMU supports migration with any
+ * architecture that has PAGE_SIZE>=1K (0x400).
+ */
+#define RAM_SAVE_FLAG_ZERO 0x002
+#define RAM_SAVE_FLAG_MEM_SIZE 0x004
+#define RAM_SAVE_FLAG_PAGE 0x008
+#define RAM_SAVE_FLAG_EOS 0x010
+#define RAM_SAVE_FLAG_CONTINUE 0x020
+#define RAM_SAVE_FLAG_XBZRLE 0x040
+#define RAM_SAVE_FLAG_HOOK 0x080
+#define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200
+
extern XBZRLECacheStats xbzrle_counters;
/* Should be holding either ram_list.mutex, or the RCU lock. */
diff --git a/migration/rdma.h b/migration/rdma.h
index a8d27f33b8..f55f28bbed 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -33,13 +33,6 @@ void rdma_start_incoming_migration(InetSocketAddress *host_port, Error **errp);
#define RAM_CONTROL_ROUND 1
#define RAM_CONTROL_FINISH 3
-/*
- * Whenever this is found in the data stream, the flags
- * will be passed to rdma functions in the incoming-migration
- * side.
- */
-#define RAM_SAVE_FLAG_HOOK 0x80
-
#define RAM_SAVE_CONTROL_NOT_SUPP -1000
#define RAM_SAVE_CONTROL_DELAYED -2000
diff --git a/migration/ram.c b/migration/ram.c
index 7284c34bd8..44010ff325 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -71,27 +71,6 @@
/***********************************************************/
/* ram save/restore */
-/*
- * RAM_SAVE_FLAG_ZERO used to be named RAM_SAVE_FLAG_COMPRESS, it
- * worked for pages that were filled with the same char. We switched
- * it to only search for the zero value. And to avoid confusion with
- * RAM_SAVE_FLAG_COMPRESS_PAGE just rename it.
- *
- * RAM_SAVE_FLAG_FULL was obsoleted in 2009.
- *
- * RAM_SAVE_FLAG_COMPRESS_PAGE (0x100) was removed in QEMU 9.1.
- */
-#define RAM_SAVE_FLAG_FULL 0x01
-#define RAM_SAVE_FLAG_ZERO 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS 0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-#define RAM_SAVE_FLAG_XBZRLE 0x40
-/* 0x80 is reserved in rdma.h for RAM_SAVE_FLAG_HOOK */
-#define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200
-/* We can't use any flag that is bigger than 0x200 */
-
/*
* mapped-ram migration supports O_DIRECT, so we need to make sure the
* userspace buffer, the IO operation size and the file offset are
--
2.47.0
* [PATCH v3 4/7] migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
` (2 preceding siblings ...)
2024-12-06 22:47 ` [PATCH v3 3/7] migration/ram: Move RAM_SAVE_FLAG* into ram.h Peter Xu
@ 2024-12-06 22:47 ` Peter Xu
2024-12-06 22:47 ` [PATCH v3 5/7] migration/multifd: Remove sync processing on postcopy Peter Xu
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
The RAM_SAVE_FLAG_MULTIFD_FLUSH message should always be correlated with a
sync request on the src. Unify the sending of this message into one place,
and only send it when necessary.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/multifd.h | 2 +-
migration/multifd-nocomp.c | 27 +++++++++++++++++++++++++--
migration/ram.c | 18 ++++--------------
3 files changed, 30 insertions(+), 17 deletions(-)
diff --git a/migration/multifd.h b/migration/multifd.h
index 6493512305..0fef431f6b 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -354,7 +354,7 @@ static inline uint32_t multifd_ram_page_count(void)
void multifd_ram_save_setup(void);
void multifd_ram_save_cleanup(void);
-int multifd_ram_flush_and_sync(void);
+int multifd_ram_flush_and_sync(QEMUFile *f);
size_t multifd_ram_payload_size(void);
void multifd_ram_fill_packet(MultiFDSendParams *p);
int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp);
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 219f9e58ef..58372db0f4 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -20,6 +20,7 @@
#include "qemu/cutils.h"
#include "qemu/error-report.h"
#include "trace.h"
+#include "qemu-file.h"
static MultiFDSendData *multifd_ram_send;
@@ -343,9 +344,10 @@ retry:
return true;
}
-int multifd_ram_flush_and_sync(void)
+int multifd_ram_flush_and_sync(QEMUFile *f)
{
MultiFDSyncReq req;
+ int ret;
if (!migrate_multifd()) {
return 0;
@@ -361,7 +363,28 @@ int multifd_ram_flush_and_sync(void)
/* File migrations only need to sync with threads */
req = migrate_mapped_ram() ? MULTIFD_SYNC_LOCAL : MULTIFD_SYNC_ALL;
- return multifd_send_sync_main(req);
+ ret = multifd_send_sync_main(req);
+ if (ret) {
+ return ret;
+ }
+
+ /* If we don't need to sync with remote at all, nothing else to do */
+ if (req == MULTIFD_SYNC_LOCAL) {
+ return 0;
+ }
+
+ /*
+ * Old QEMUs don't understand RAM_SAVE_FLAG_MULTIFD_FLUSH, it relies
+ * on RAM_SAVE_FLAG_EOS instead.
+ */
+ if (migrate_multifd_flush_after_each_section()) {
+ return 0;
+ }
+
+ qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
+ qemu_fflush(f);
+
+ return 0;
}
bool multifd_send_prepare_common(MultiFDSendParams *p)
diff --git a/migration/ram.c b/migration/ram.c
index 44010ff325..90811aabd4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1306,15 +1306,10 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
(!migrate_multifd_flush_after_each_section() ||
migrate_mapped_ram())) {
QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
- int ret = multifd_ram_flush_and_sync();
+ int ret = multifd_ram_flush_and_sync(f);
if (ret < 0) {
return ret;
}
-
- if (!migrate_mapped_ram()) {
- qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
- qemu_fflush(f);
- }
}
/* Hit the end of the list */
@@ -3044,18 +3039,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
}
bql_unlock();
- ret = multifd_ram_flush_and_sync();
+ ret = multifd_ram_flush_and_sync(f);
bql_lock();
if (ret < 0) {
error_setg(errp, "%s: multifd synchronization failed", __func__);
return ret;
}
- if (migrate_multifd() && !migrate_multifd_flush_after_each_section()
- && !migrate_mapped_ram()) {
- qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
- }
-
qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
ret = qemu_fflush(f);
if (ret < 0) {
@@ -3190,7 +3180,7 @@ out:
if (ret >= 0 && migration_is_running()) {
if (migrate_multifd() && migrate_multifd_flush_after_each_section() &&
!migrate_mapped_ram()) {
- ret = multifd_ram_flush_and_sync();
+ ret = multifd_ram_flush_and_sync(f);
if (ret < 0) {
return ret;
}
@@ -3268,7 +3258,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
* Only the old dest QEMU will need this sync, because each EOS
* will require one SYNC message on each channel.
*/
- ret = multifd_ram_flush_and_sync();
+ ret = multifd_ram_flush_and_sync(f);
if (ret < 0) {
return ret;
}
--
2.47.0
* [PATCH v3 5/7] migration/multifd: Remove sync processing on postcopy
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
` (3 preceding siblings ...)
2024-12-06 22:47 ` [PATCH v3 4/7] migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages Peter Xu
@ 2024-12-06 22:47 ` Peter Xu
2024-12-06 22:47 ` [PATCH v3 6/7] migration/multifd: Cleanup src flushes on condition check Peter Xu
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
Multifd never worked with postcopy, at least not so far.
Remove the sync processing there, because it's confusing and those messages
should never appear in postcopy. Now if RAM_SAVE_FLAG_MULTIFD_FLUSH is
observed, we fail hard instead of trying to invoke multifd code.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 90811aabd4..154ff5abd4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3772,15 +3772,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
TARGET_PAGE_SIZE);
}
break;
- case RAM_SAVE_FLAG_MULTIFD_FLUSH:
- multifd_recv_sync_main();
- break;
case RAM_SAVE_FLAG_EOS:
- /* normal exit */
- if (migrate_multifd() &&
- migrate_multifd_flush_after_each_section()) {
- multifd_recv_sync_main();
- }
break;
default:
error_report("Unknown combination of migration flags: 0x%x"
--
2.47.0
* [PATCH v3 6/7] migration/multifd: Cleanup src flushes on condition check
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
` (4 preceding siblings ...)
2024-12-06 22:47 ` [PATCH v3 5/7] migration/multifd: Remove sync processing on postcopy Peter Xu
@ 2024-12-06 22:47 ` Peter Xu
2024-12-09 20:55 ` Fabiano Rosas
2024-12-06 22:47 ` [PATCH v3 7/7] migration/multifd: Document the reason to sync for save_setup() Peter Xu
2024-12-17 15:26 ` [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Fabiano Rosas
7 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
The src flush condition check is overcomplicated, and it will get even more
out of control once postcopy is involved.
In general, we have two modes to do the sync: the legacy way and the modern
way. Legacy uses a per-section flush, modern uses a per-round flush.
Mapped-ram always uses the modern way, which is per-round.
Introduce two helpers, which can greatly simplify the code, and hopefully
make it readable again.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/multifd.h | 2 ++
migration/multifd-nocomp.c | 42 ++++++++++++++++++++++++++++++++++++++
migration/ram.c | 10 +++------
3 files changed, 47 insertions(+), 7 deletions(-)
diff --git a/migration/multifd.h b/migration/multifd.h
index 0fef431f6b..bd785b9873 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -355,6 +355,8 @@ static inline uint32_t multifd_ram_page_count(void)
void multifd_ram_save_setup(void);
void multifd_ram_save_cleanup(void);
int multifd_ram_flush_and_sync(QEMUFile *f);
+bool multifd_ram_sync_per_round(void);
+bool multifd_ram_sync_per_section(void);
size_t multifd_ram_payload_size(void);
void multifd_ram_fill_packet(MultiFDSendParams *p);
int multifd_ram_unfill_packet(MultiFDRecvParams *p, Error **errp);
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 58372db0f4..c1f686c0ce 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -344,6 +344,48 @@ retry:
return true;
}
+/*
+ * We have two modes for multifd flushes:
+ *
+ * - Per-section mode: this is the legacy way to flush, it requires one
+ * MULTIFD_FLAG_SYNC message for each RAM_SAVE_FLAG_EOS.
+ *
+ * - Per-round mode: this is the modern way to flush, it requires one
+ * MULTIFD_FLAG_SYNC message only for each round of RAM scan. Normally
+ * it's paired with a new RAM_SAVE_FLAG_MULTIFD_FLUSH message in network
+ * based migrations.
+ *
+ * One thing to mention: mapped-ram always uses the modern way to sync.
+ */
+
+/* Do we need a per-section multifd flush (legacy way)? */
+bool multifd_ram_sync_per_section(void)
+{
+ if (!migrate_multifd()) {
+ return false;
+ }
+
+ if (migrate_mapped_ram()) {
+ return false;
+ }
+
+ return migrate_multifd_flush_after_each_section();
+}
+
+/* Do we need a per-round multifd flush (modern way)? */
+bool multifd_ram_sync_per_round(void)
+{
+ if (!migrate_multifd()) {
+ return false;
+ }
+
+ if (migrate_mapped_ram()) {
+ return true;
+ }
+
+ return !migrate_multifd_flush_after_each_section();
+}
+
int multifd_ram_flush_and_sync(QEMUFile *f)
{
MultiFDSyncReq req;
diff --git a/migration/ram.c b/migration/ram.c
index 154ff5abd4..5d4bdefe69 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1302,9 +1302,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
pss->page = 0;
pss->block = QLIST_NEXT_RCU(pss->block, next);
if (!pss->block) {
- if (migrate_multifd() &&
- (!migrate_multifd_flush_after_each_section() ||
- migrate_mapped_ram())) {
+ if (multifd_ram_sync_per_round()) {
QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
int ret = multifd_ram_flush_and_sync(f);
if (ret < 0) {
@@ -3178,8 +3176,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
out:
if (ret >= 0 && migration_is_running()) {
- if (migrate_multifd() && migrate_multifd_flush_after_each_section() &&
- !migrate_mapped_ram()) {
+ if (multifd_ram_sync_per_section()) {
ret = multifd_ram_flush_and_sync(f);
if (ret < 0) {
return ret;
@@ -3252,8 +3249,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
}
}
- if (migrate_multifd() &&
- migrate_multifd_flush_after_each_section()) {
+ if (multifd_ram_sync_per_section()) {
/*
* Only the old dest QEMU will need this sync, because each EOS
* will require one SYNC message on each channel.
--
2.47.0
* [PATCH v3 7/7] migration/multifd: Document the reason to sync for save_setup()
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
` (5 preceding siblings ...)
2024-12-06 22:47 ` [PATCH v3 6/7] migration/multifd: Cleanup src flushes on condition check Peter Xu
@ 2024-12-06 22:47 ` Peter Xu
2024-12-09 20:56 ` Fabiano Rosas
2024-12-17 15:26 ` [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Fabiano Rosas
7 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2024-12-06 22:47 UTC (permalink / raw)
To: qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon, Fabiano Rosas
It's not straightforward to see why the src QEMU needs to sync multifd
during the setup() phase. After all, there are no pages queued at that
point.
For old QEMUs, there's a solid reason: EOS requires it to work. But it's
unclear why it is still needed on new QEMUs, which do not take the EOS
message as a sync request.
One will only figure that out when the sync is conditionally removed; in
fact, the author did try it out. Logically we could still avoid doing this
on new machine types, however that would need a separate compat field,
which is overkill for saving a trivial amount of overhead in the setup()
phase.
Let's instead document it completely, to avoid someone else trying this
again and redoing the debugging, and to avoid anyone being confused about
why it ever existed.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/migration/ram.c b/migration/ram.c
index 5d4bdefe69..e5c590b259 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3036,6 +3036,31 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
migration_ops->ram_save_target_page = ram_save_target_page_legacy;
}
+ /*
+ * This operation is unfortunate..
+ *
+ * For legacy QEMUs using per-section sync
+ * =======================================
+ *
+ * This must exist because the EOS below requires the SYNC messages
+ * per-channel to work.
+ *
+ * For modern QEMUs using per-round sync
+ * =====================================
+ *
+ * Logically such sync is not needed, and recv threads should not run
+ * until setup ready (using things like channels_ready on src). Then
+ * we should be all fine.
+ *
+ * However even if we add channels_ready to recv side in new QEMUs, old
+ * QEMU won't have them so this sync will still be needed to make sure
+ * multifd recv threads won't start processing guest pages early before
+ * ram_load_setup() is properly done.
+ *
+ * Let's stick with this. Fortunately the overhead is low to sync
+ * during setup because the VM is running, so at least it's not
+ * accounted as part of downtime.
+ */
bql_unlock();
ret = multifd_ram_flush_and_sync(f);
bql_lock();
--
2.47.0
* Re: [PATCH v3 2/7] migration/multifd: Allow to sync with sender threads only
2024-12-06 22:47 ` [PATCH v3 2/7] migration/multifd: Allow to sync with sender threads only Peter Xu
@ 2024-12-09 20:52 ` Fabiano Rosas
0 siblings, 0 replies; 12+ messages in thread
From: Fabiano Rosas @ 2024-12-09 20:52 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon
Peter Xu <peterx@redhat.com> writes:
> [...]
Reviewed-by: Fabiano Rosas <farosas@suse.de>
* Re: [PATCH v3 6/7] migration/multifd: Cleanup src flushes on condition check
2024-12-06 22:47 ` [PATCH v3 6/7] migration/multifd: Cleanup src flushes on condition check Peter Xu
@ 2024-12-09 20:55 ` Fabiano Rosas
0 siblings, 0 replies; 12+ messages in thread
From: Fabiano Rosas @ 2024-12-09 20:55 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon
Peter Xu <peterx@redhat.com> writes:
> [...]
Reviewed-by: Fabiano Rosas <farosas@suse.de>
* Re: [PATCH v3 7/7] migration/multifd: Document the reason to sync for save_setup()
2024-12-06 22:47 ` [PATCH v3 7/7] migration/multifd: Document the reason to sync for save_setup() Peter Xu
@ 2024-12-09 20:56 ` Fabiano Rosas
0 siblings, 0 replies; 12+ messages in thread
From: Fabiano Rosas @ 2024-12-09 20:56 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon
Peter Xu <peterx@redhat.com> writes:
> [...]
Reviewed-by: Fabiano Rosas <farosas@suse.de>
* Re: [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush
2024-12-06 22:47 [PATCH v3 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
` (6 preceding siblings ...)
2024-12-06 22:47 ` [PATCH v3 7/7] migration/multifd: Document the reason to sync for save_setup() Peter Xu
@ 2024-12-17 15:26 ` Fabiano Rosas
7 siblings, 0 replies; 12+ messages in thread
From: Fabiano Rosas @ 2024-12-17 15:26 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Prasad Pandit, Maciej S . Szmigiero, Cédric Le Goater,
Alex Williamson, peterx, Avihai Horon
Peter Xu <peterx@redhat.com> writes:
> [...]
Queued, thanks!