* [PATCH v1 0/4] Allow to enable multifd and postcopy migration together
@ 2024-11-26 11:57 Prasad Pandit
2024-11-26 11:57 ` [PATCH v1 1/4] migration/multifd: move macros to multifd header Prasad Pandit
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Prasad Pandit @ 2024-11-26 11:57 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit
From: Prasad Pandit <pjp@fedoraproject.org>
Hello,
* Currently Multifd and Postcopy migration can not be used together.
QEMU shows "Postcopy is not yet compatible with multifd" message.
When migrating guests with large (100's GB) RAM, Multifd threads
help to accelerate migration, but inability to use it with the
Postcopy mode delays guest start up on the destination side.
* This patch series allows to enable both Multifd and Postcopy
migration together. Precopy and Multifd threads work during
the initial guest (RAM) transfer. When migration moves to the
Postcopy phase, Multifd threads are restrained and the Postcopy
threads start to request pages from the source side.
* This series removes magic value (4-bytes) introduced in the
previous series for the Postcopy channel.
v0 -> https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
Thank you.
---
Prasad Pandit (4):
migration/multifd: move macros to multifd header
migration: remove multifd check with postcopy
migration: refactor ram_save_target_page functions
migration: enable multifd and postcopy together
migration/migration.c | 95 ++++++++++++++++++++++++--------------
migration/multifd-nocomp.c | 3 +-
migration/multifd.c | 5 --
migration/multifd.h | 5 ++
migration/options.c | 5 --
migration/ram.c | 74 +++++++++--------------------
6 files changed, 89 insertions(+), 98 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH v1 1/4] migration/multifd: move macros to multifd header 2024-11-26 11:57 [PATCH v1 0/4] Allow to enable multifd and postcopy migration together Prasad Pandit @ 2024-11-26 11:57 ` Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 2/4] migration: remove multifd check with postcopy Prasad Pandit ` (2 subsequent siblings) 3 siblings, 0 replies; 18+ messages in thread From: Prasad Pandit @ 2024-11-26 11:57 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Move MULTIFD_ macros to the header file so that they are accessible from other source files. Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- migration/multifd.c | 5 ----- migration/multifd.h | 5 +++++ 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/migration/multifd.c b/migration/multifd.c index 498e71fd10..f8914276c4 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -33,11 +33,6 @@ #include "io/channel-socket.h" #include "yank_functions.h" -/* Multiple fd's */ - -#define MULTIFD_MAGIC 0x11223344U -#define MULTIFD_VERSION 1 - typedef struct { uint32_t magic; uint32_t version; diff --git a/migration/multifd.h b/migration/multifd.h index 50d58c0c9c..519dffeb9e 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -33,6 +33,11 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset); bool multifd_recv(void); MultiFDRecvData *multifd_get_recv_data(void); +/* Multiple fd's */ + +#define MULTIFD_MAGIC 0x11223344U +#define MULTIFD_VERSION 1 + /* Multifd Compression flags */ #define MULTIFD_FLAG_SYNC (1 << 0) -- 2.47.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v1 2/4] migration: remove multifd check with postcopy 2024-11-26 11:57 [PATCH v1 0/4] Allow to enable multifd and postcopy migration together Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 1/4] migration/multifd: move macros to multifd header Prasad Pandit @ 2024-11-26 11:57 ` Prasad Pandit 2024-11-26 21:12 ` Fabiano Rosas 2024-11-26 11:57 ` [PATCH v1 3/4] migration: refactor ram_save_target_page functions Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 4/4] migration: enable multifd and postcopy together Prasad Pandit 3 siblings, 1 reply; 18+ messages in thread From: Prasad Pandit @ 2024-11-26 11:57 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Remove multifd capability check with Postcopy mode. This helps to enable both multifd and postcopy together. Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- migration/options.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/migration/options.c b/migration/options.c index ad8d6989a8..c498558a85 100644 --- a/migration/options.c +++ b/migration/options.c @@ -479,11 +479,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp) error_setg(errp, "Postcopy is not compatible with ignore-shared"); return false; } - - if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) { - error_setg(errp, "Postcopy is not yet compatible with multifd"); - return false; - } } if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) { -- 2.47.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v1 2/4] migration: remove multifd check with postcopy 2024-11-26 11:57 ` [PATCH v1 2/4] migration: remove multifd check with postcopy Prasad Pandit @ 2024-11-26 21:12 ` Fabiano Rosas 2024-11-27 11:43 ` Prasad Pandit 0 siblings, 1 reply; 18+ messages in thread From: Fabiano Rosas @ 2024-11-26 21:12 UTC (permalink / raw) To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit Prasad Pandit <ppandit@redhat.com> writes: > From: Prasad Pandit <pjp@fedoraproject.org> > > Remove multifd capability check with Postcopy mode. > This helps to enable both multifd and postcopy together. > > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> > --- > migration/options.c | 5 ----- > 1 file changed, 5 deletions(-) > > diff --git a/migration/options.c b/migration/options.c > index ad8d6989a8..c498558a85 100644 > --- a/migration/options.c > +++ b/migration/options.c > @@ -479,11 +479,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp) > error_setg(errp, "Postcopy is not compatible with ignore-shared"); > return false; > } > - > - if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) { > - error_setg(errp, "Postcopy is not yet compatible with multifd"); > - return false; > - } This should be squashed into patch 4. We don't want to enable what doesn't work yet. > } > > if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) { ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 2/4] migration: remove multifd check with postcopy 2024-11-26 21:12 ` Fabiano Rosas @ 2024-11-27 11:43 ` Prasad Pandit 0 siblings, 0 replies; 18+ messages in thread From: Prasad Pandit @ 2024-11-27 11:43 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit On Wed, 27 Nov 2024 at 02:43, Fabiano Rosas <farosas@suse.de> wrote: > This should be squashed into patch 4. We don't want to enable what > doesn't work yet. * Okay. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-26 11:57 [PATCH v1 0/4] Allow to enable multifd and postcopy migration together Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 1/4] migration/multifd: move macros to multifd header Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 2/4] migration: remove multifd check with postcopy Prasad Pandit @ 2024-11-26 11:57 ` Prasad Pandit 2024-11-26 21:18 ` Fabiano Rosas 2024-11-26 11:57 ` [PATCH v1 4/4] migration: enable multifd and postcopy together Prasad Pandit 3 siblings, 1 reply; 18+ messages in thread From: Prasad Pandit @ 2024-11-26 11:57 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Refactor ram_save_target_page legacy and multifd functions into one. Other than simplifying it, it frees 'migration_ops' object from usage, so it is expunged. When both Multifd and Postcopy modes are enabled, to avoid errors, the Multifd threads are active until migration reaches the Postcopy mode. This is done by checking the state returned by migration_in_postcopy(). Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- migration/multifd-nocomp.c | 3 +- migration/ram.c | 74 ++++++++++++-------------------------- 2 files changed, 24 insertions(+), 53 deletions(-) v1: Further refactor ram_save_target_page() function to conflate save_zero_page() calls. Add migration_in_postcopy() call to check migration state instead of combining it with migrate_multifd(). v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c index 55191152f9..e92821e8f6 100644 --- a/migration/multifd-nocomp.c +++ b/migration/multifd-nocomp.c @@ -14,6 +14,7 @@ #include "exec/ramblock.h" #include "exec/target_page.h" #include "file.h" +#include "migration.h" #include "multifd.h" #include "options.h" #include "qapi/error.h" @@ -345,7 +346,7 @@ retry: int multifd_ram_flush_and_sync(void) { - if (!migrate_multifd()) { + if (!migrate_multifd() || migration_in_postcopy()) { return 0; } diff --git a/migration/ram.c b/migration/ram.c index 05ff9eb328..9d1ec6209c 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -467,13 +467,6 @@ void ram_transferred_add(uint64_t bytes) } } -struct MigrationOps { - int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss); -}; -typedef struct MigrationOps MigrationOps; - -MigrationOps *migration_ops; - static int ram_save_host_page_urgent(PageSearchStatus *pss); /* NOTE: page is the PFN not real ram_addr_t. */ @@ -1323,9 +1316,9 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); if (!pss->block) { - if (migrate_multifd() && - (!migrate_multifd_flush_after_each_section() || - migrate_mapped_ram())) { + if (migrate_multifd() && !migration_in_postcopy() + && (!migrate_multifd_flush_after_each_section() + || migrate_mapped_ram())) { QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; int ret = multifd_ram_flush_and_sync(); if (ret < 0) { @@ -1986,55 +1979,39 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, } /** - * ram_save_target_page_legacy: save one target page + * ram_save_target_page: save one target page to the precopy thread + * OR to multifd workers. * - * Returns the number of pages written + * Multifd mode: returns 1 if the page was queued, -1 otherwise. + * Non-multifd mode: returns the number of pages written. * * @rs: current RAM state * @pss: data about the page we want to send */ -static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) +static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) { ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; int res; + if (!migrate_multifd() + || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { + if (save_zero_page(rs, pss, offset)) { + return 1; + } + } + + if (migrate_multifd() && !migration_in_postcopy()) { + RAMBlock *block = pss->block; + return ram_save_multifd_page(block, offset); + } + if (control_save_page(pss, offset, &res)) { return res; } - if (save_zero_page(rs, pss, offset)) { - return 1; - } - return ram_save_page(rs, pss); } -/** - * ram_save_target_page_multifd: send one target page to multifd workers - * - * Returns 1 if the page was queued, -1 otherwise. - * - * @rs: current RAM state - * @pss: data about the page we want to send - */ -static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss) -{ - RAMBlock *block = pss->block; - ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; - - /* - * While using multifd live migration, we still need to handle zero - * page checking on the migration main thread. - */ - if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { - if (save_zero_page(rs, pss, offset)) { - return 1; - } - } - - return ram_save_multifd_page(block, offset); -} - /* Should be called before sending a host page */ static void pss_host_page_prepare(PageSearchStatus *pss) { @@ -2121,7 +2098,7 @@ static int ram_save_host_page_urgent(PageSearchStatus *pss) if (page_dirty) { /* Be strict to return code; it must be 1, or what else? */ - if (migration_ops->ram_save_target_page(rs, pss) != 1) { + if (ram_save_target_page(rs, pss) != 1) { error_report_once("%s: ram_save_target_page failed", __func__); ret = -1; goto out; @@ -2190,7 +2167,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss) if (preempt_active) { qemu_mutex_unlock(&rs->bitmap_mutex); } - tmppages = migration_ops->ram_save_target_page(rs, pss); + tmppages = ram_save_target_page(rs, pss); if (tmppages >= 0) { pages += tmppages; /* @@ -2388,8 +2365,6 @@ static void ram_save_cleanup(void *opaque) xbzrle_cleanup(); multifd_ram_save_cleanup(); ram_state_cleanup(rsp); - g_free(migration_ops); - migration_ops = NULL; } static void ram_state_reset(RAMState *rs) @@ -3055,13 +3030,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp) return ret; } - migration_ops = g_malloc0(sizeof(MigrationOps)); - if (migrate_multifd()) { multifd_ram_save_setup(); - migration_ops->ram_save_target_page = ram_save_target_page_multifd; - } else { - migration_ops->ram_save_target_page = ram_save_target_page_legacy; } bql_unlock(); -- 2.47.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-26 11:57 ` [PATCH v1 3/4] migration: refactor ram_save_target_page functions Prasad Pandit @ 2024-11-26 21:18 ` Fabiano Rosas 2024-11-27 11:42 ` Prasad Pandit 0 siblings, 1 reply; 18+ messages in thread From: Fabiano Rosas @ 2024-11-26 21:18 UTC (permalink / raw) To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit Prasad Pandit <ppandit@redhat.com> writes: > From: Prasad Pandit <pjp@fedoraproject.org> > > Refactor ram_save_target_page legacy and multifd > functions into one. Other than simplifying it, > it frees 'migration_ops' object from usage, so it > is expunged. > > When both Multifd and Postcopy modes are enabled, > to avoid errors, the Multifd threads are active until > migration reaches the Postcopy mode. This is done by > checking the state returned by migration_in_postcopy(). This patch should be just the actual refactoring on top of master, with no mention to postcopy at all. > > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> > --- > migration/multifd-nocomp.c | 3 +- > migration/ram.c | 74 ++++++++++++-------------------------- > 2 files changed, 24 insertions(+), 53 deletions(-) > > v1: Further refactor ram_save_target_page() function to conflate > save_zero_page() calls. > > Add migration_in_postcopy() call to check migration state > instead of combining it with migrate_multifd(). > > v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u > > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c > index 55191152f9..e92821e8f6 100644 > --- a/migration/multifd-nocomp.c > +++ b/migration/multifd-nocomp.c > @@ -14,6 +14,7 @@ > #include "exec/ramblock.h" > #include "exec/target_page.h" > #include "file.h" > +#include "migration.h" > #include "multifd.h" > #include "options.h" > #include "qapi/error.h" > @@ -345,7 +346,7 @@ retry: > > int multifd_ram_flush_and_sync(void) > { > - if (!migrate_multifd()) { > + if (!migrate_multifd() || migration_in_postcopy()) { > return 0; > } > > diff --git a/migration/ram.c b/migration/ram.c > index 05ff9eb328..9d1ec6209c 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -467,13 +467,6 @@ void ram_transferred_add(uint64_t bytes) > } > } > > -struct MigrationOps { > - int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss); > -}; > -typedef struct MigrationOps MigrationOps; > - > -MigrationOps *migration_ops; > - > static int ram_save_host_page_urgent(PageSearchStatus *pss); > > /* NOTE: page is the PFN not real ram_addr_t. */ > @@ -1323,9 +1316,9 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) > pss->page = 0; > pss->block = QLIST_NEXT_RCU(pss->block, next); > if (!pss->block) { > - if (migrate_multifd() && > - (!migrate_multifd_flush_after_each_section() || > - migrate_mapped_ram())) { > + if (migrate_multifd() && !migration_in_postcopy() > + && (!migrate_multifd_flush_after_each_section() > + || migrate_mapped_ram())) { This is getting out of hand. We can't keep growing this condition every time something new comes up. Any ideas? > QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; > int ret = multifd_ram_flush_and_sync(); > if (ret < 0) { > @@ -1986,55 +1979,39 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, > } > > /** > - * ram_save_target_page_legacy: save one target page > + * ram_save_target_page: save one target page to the precopy thread > + * OR to multifd workers. > * > - * Returns the number of pages written > + * Multifd mode: returns 1 if the page was queued, -1 otherwise. > + * Non-multifd mode: returns the number of pages written. Yes, although I wonder if we should keep documenting this when we only call this function for a single page and it always returns at most 1. > * > * @rs: current RAM state > * @pss: data about the page we want to send > */ > -static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) > +static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) > { > ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; > int res; > > + if (!migrate_multifd() > + || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { > + if (save_zero_page(rs, pss, offset)) { > + return 1; > + } > + } > + > + if (migrate_multifd() && !migration_in_postcopy()) { > + RAMBlock *block = pss->block; > + return ram_save_multifd_page(block, offset); > + } > + > if (control_save_page(pss, offset, &res)) { > return res; > } > > - if (save_zero_page(rs, pss, offset)) { > - return 1; > - } > - > return ram_save_page(rs, pss); > } > > -/** > - * ram_save_target_page_multifd: send one target page to multifd workers > - * > - * Returns 1 if the page was queued, -1 otherwise. > - * > - * @rs: current RAM state > - * @pss: data about the page we want to send > - */ > -static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss) > -{ > - RAMBlock *block = pss->block; > - ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; > - > - /* > - * While using multifd live migration, we still need to handle zero > - * page checking on the migration main thread. > - */ > - if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { > - if (save_zero_page(rs, pss, offset)) { > - return 1; > - } > - } > - > - return ram_save_multifd_page(block, offset); > -} > - > /* Should be called before sending a host page */ > static void pss_host_page_prepare(PageSearchStatus *pss) > { > @@ -2121,7 +2098,7 @@ static int ram_save_host_page_urgent(PageSearchStatus *pss) > > if (page_dirty) { > /* Be strict to return code; it must be 1, or what else? */ ... this^ comment, for instance. > - if (migration_ops->ram_save_target_page(rs, pss) != 1) { > + if (ram_save_target_page(rs, pss) != 1) { > error_report_once("%s: ram_save_target_page failed", __func__); > ret = -1; > goto out; > @@ -2190,7 +2167,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss) > if (preempt_active) { > qemu_mutex_unlock(&rs->bitmap_mutex); > } > - tmppages = migration_ops->ram_save_target_page(rs, pss); > + tmppages = ram_save_target_page(rs, pss); > if (tmppages >= 0) { > pages += tmppages; > /* > @@ -2388,8 +2365,6 @@ static void ram_save_cleanup(void *opaque) > xbzrle_cleanup(); > multifd_ram_save_cleanup(); > ram_state_cleanup(rsp); > - g_free(migration_ops); > - migration_ops = NULL; > } > > static void ram_state_reset(RAMState *rs) > @@ -3055,13 +3030,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp) > return ret; > } > > - migration_ops = g_malloc0(sizeof(MigrationOps)); > - > if (migrate_multifd()) { > multifd_ram_save_setup(); > - migration_ops->ram_save_target_page = ram_save_target_page_multifd; > - } else { > - migration_ops->ram_save_target_page = ram_save_target_page_legacy; > } > > bql_unlock(); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-26 21:18 ` Fabiano Rosas @ 2024-11-27 11:42 ` Prasad Pandit 2024-11-27 12:19 ` Fabiano Rosas 2024-11-27 14:12 ` Fabiano Rosas 0 siblings, 2 replies; 18+ messages in thread From: Prasad Pandit @ 2024-11-27 11:42 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit On Wed, 27 Nov 2024 at 02:49, Fabiano Rosas <farosas@suse.de> wrote: > This patch should be just the actual refactoring on top of master, with > no mention to postcopy at all. * Okay. We'll have to ensure that it is merged before multifd+postcopy change. > > + if (migrate_multifd() && !migration_in_postcopy() > > + && (!migrate_multifd_flush_after_each_section() > > + || migrate_mapped_ram())) { > > This is getting out of hand. We can't keep growing this condition every > time something new comes up. Any ideas? * In 'v0' this series, !migration_in_postcopy() was added to migrate_multifd(), which worked, but was not okay. * Another could be to define a new helper/macro to include above 3-4 checks. Because migrate_multifd(), migrate_multifd_flush_after_each_section() and migrate_mapped_ram() appear together at multiple points. But strangely the equation is not the same. Sometimes migrate_mapped_ram() is 'true' and sometimes it is 'false', so a combined helper may not work. It is all to accommodate different workings of multifd IIUC, if it and precopy used the same protocol/stream, maybe such conditionals would go away automatically. > Yes, although I wonder if we should keep documenting this when we only > call this function for a single page and it always returns at most 1. > > if (page_dirty) { > > /* Be strict to return code; it must be 1, or what else? */ > > ... this^ comment, for instance. > * Okay, can remove them. So to confirm: this refactoring patch should be a separate one by itself? And then in the following multifd+postcopy series we just add !migration_in_postcopy() check to it? Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-27 11:42 ` Prasad Pandit @ 2024-11-27 12:19 ` Fabiano Rosas 2024-11-28 10:43 ` Prasad Pandit 2024-11-27 14:12 ` Fabiano Rosas 1 sibling, 1 reply; 18+ messages in thread From: Fabiano Rosas @ 2024-11-27 12:19 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, peterx, berrange, Prasad Pandit Prasad Pandit <ppandit@redhat.com> writes: > On Wed, 27 Nov 2024 at 02:49, Fabiano Rosas <farosas@suse.de> wrote: >> This patch should be just the actual refactoring on top of master, with >> no mention to postcopy at all. > > * Okay. We'll have to ensure that it is merged before multifd+postcopy change. That's ok, just put it at the start of the series. > >> > + if (migrate_multifd() && !migration_in_postcopy() >> > + && (!migrate_multifd_flush_after_each_section() >> > + || migrate_mapped_ram())) { >> >> This is getting out of hand. We can't keep growing this condition every >> time something new comes up. Any ideas? > > * In 'v0' this series, !migration_in_postcopy() was added to > migrate_multifd(), which worked, but was not okay. > > * Another could be to define a new helper/macro to include above 3-4 > checks. Because migrate_multifd(), > migrate_multifd_flush_after_each_section() and migrate_mapped_ram() > appear together at multiple points. But strangely the equation is not > the same. Sometimes migrate_mapped_ram() is 'true' and sometimes it is > 'false', so a combined helper may not work. It is all to accommodate > different workings of multifd IIUC, if it and precopy used the same > protocol/stream, maybe such conditionals would go away automatically. > >> Yes, although I wonder if we should keep documenting this when we only >> call this function for a single page and it always returns at most 1. >> > if (page_dirty) { >> > /* Be strict to return code; it must be 1, or what else? */ >> >> ... this^ comment, for instance. >> > > * Okay, can remove them. > > So to confirm: this refactoring patch should be a separate one by > itself? And then in the following multifd+postcopy series we just add > !migration_in_postcopy() check to it? It doesn't need to be a single patch submission, it could be a patch at the start of this series. It's just that it needs to logically do only one thing, which is to move the code around without adding new bits that don't exist in current master (a strict definition of refactoring). The multifd+postcopy changes come in a subsequent patch so it's clear that one patch was just shuffling code around while the rest is part of the feature enablement. > > Thank you. > --- > - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-27 12:19 ` Fabiano Rosas @ 2024-11-28 10:43 ` Prasad Pandit 0 siblings, 0 replies; 18+ messages in thread From: Prasad Pandit @ 2024-11-28 10:43 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit On Wed, 27 Nov 2024 at 17:49, Fabiano Rosas <farosas@suse.de> wrote: > > On Wed, 27 Nov 2024 at 02:49, Fabiano Rosas <farosas@suse.de> wrote: > >> This patch should be just the actual refactoring on top of master, with > >> no mention to postcopy at all. ... > It doesn't need to be a single patch submission, it could be a patch at > the start of this series. It's just that it needs to logically do only > one thing, which is to move the code around without adding new bits that > don't exist in current master (a strict definition of refactoring). The > multifd+postcopy changes come in a subsequent patch so it's clear that > one patch was just shuffling code around while the rest is part of the > feature enablement. * Okay, will do. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-27 11:42 ` Prasad Pandit 2024-11-27 12:19 ` Fabiano Rosas @ 2024-11-27 14:12 ` Fabiano Rosas 2024-11-28 10:18 ` Prasad Pandit 2024-12-05 22:26 ` Peter Xu 1 sibling, 2 replies; 18+ messages in thread From: Fabiano Rosas @ 2024-11-27 14:12 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, peterx, berrange, Prasad Pandit Prasad Pandit <ppandit@redhat.com> writes: > On Wed, 27 Nov 2024 at 02:49, Fabiano Rosas <farosas@suse.de> wrote: >> This patch should be just the actual refactoring on top of master, with >> no mention to postcopy at all. > > * Okay. We'll have to ensure that it is merged before multifd+postcopy change. > >> > + if (migrate_multifd() && !migration_in_postcopy() >> > + && (!migrate_multifd_flush_after_each_section() >> > + || migrate_mapped_ram())) { >> >> This is getting out of hand. We can't keep growing this condition every >> time something new comes up. Any ideas? > > * In 'v0' this series, !migration_in_postcopy() was added to > migrate_multifd(), which worked, but was not okay. > > * Another could be to define a new helper/macro to include above 3-4 > checks. Because migrate_multifd(), > migrate_multifd_flush_after_each_section() and migrate_mapped_ram() > appear together at multiple points. But strangely the equation is not > the same. Sometimes migrate_mapped_ram() is 'true' and sometimes it is > 'false', so a combined helper may not work. It is all to accommodate > different workings of multifd IIUC, if it and precopy used the same > protocol/stream, maybe such conditionals would go away automatically. Maybe this would improve the situation. Peter, what do you think? -->8-- From e9110360eb0efddf6945f37c518e3cc38d12b600 Mon Sep 17 00:00:00 2001 From: Fabiano Rosas <farosas@suse.de> Date: Wed, 27 Nov 2024 11:03:04 -0300 Subject: [PATCH] migration: Rationalize multifd flushes from ram code We currently have a mess of conditionals to achieve the correct combination of multifd local flushes, where we sync the local (send/recv) multifd threads between themselves, and multifd remote flushes, where we put a flag on the stream to inform the recv side to perform a local flush. On top of that we also have the multifd_flush_after_each_section property, which is there to provide backward compatibility from when we used to flush after every vmstate section. And lastly, there's the mapped-ram feature which always wants the non-backward compatible behavior and also does not support extraneous flags on the stream (such as the MULTIFD_FLUSH flag). Move the conditionals into a separate function that encapsulates and explains the whole situation. Signed-off-by: Fabiano Rosas <farosas@suse.de> --- migration/ram.c | 198 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 157 insertions(+), 41 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 05ff9eb328..caaaae6fdc 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1277,6 +1277,149 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss) return pages; } +enum RamMultifdFlushSpots { + FLUSH_SAVE_SETUP, + FLUSH_SAVE_ITER, + FLUSH_DIRTY_BLOCK, + FLUSH_SAVE_COMPLETE, + + FLUSH_LOAD_POSTCOPY_EOS, + FLUSH_LOAD_POSTCOPY_FLUSH, + FLUSH_LOAD_PRECOPY_EOS, + FLUSH_LOAD_PRECOPY_FLUSH, +}; + +static int ram_multifd_flush(QEMUFile *f, enum RamMultifdFlushSpots spot) +{ + int ret; + bool always_flush, do_local_flush, do_remote_flush; + bool mapped_ram = migrate_mapped_ram(); + + if (!migrate_multifd()) { + return 0; + } + + /* + * For backward compatibility, whether to flush multifd after each + * section is sent. This is mutually exclusive with a + * RAM_SAVE_FLAG_MULTIFD_FLUSH on the stream + */ + always_flush = migrate_multifd_flush_after_each_section(); + + /* + * Save side flushes + */ + + do_local_flush = false; + + switch (spot) { + case FLUSH_SAVE_SETUP: + assert(!bql_locked()); + do_local_flush = true; + break; + + case FLUSH_SAVE_ITER: + /* + * This flush is not necessary, only do for backward + * compatibility. Mapped-ram assumes the new scheme. + */ + do_local_flush = always_flush && !mapped_ram; + break; + + case FLUSH_DIRTY_BLOCK: + /* + * This is the flush that's actually required, always do it + * unless we're emulating the old behavior. + */ + do_local_flush = !always_flush || mapped_ram; + break; + + case FLUSH_SAVE_COMPLETE: + do_local_flush = true; + break; + + default: + break; + } + + if (do_local_flush) { + ret = multifd_ram_flush_and_sync(); + if (ret < 0) { + return ret; + } + } + + /* + * There's never a remote flush with mapped-ram because any flags + * put on the stream (aside from RAM_SAVE_FLAG_MEM_SIZE and + * RAM_SAVE_FLAG_EOS) break mapped-ram's assumption that ram pages + * can be read contiguously from the stream. + * + * On the recv side, there's no local flush, even at EOS because + * mapped-ram does its own flush after loading the ramblock. + */ + if (mapped_ram) { + return 0; + } + + do_remote_flush = false; + + /* Save side remote flush */ + switch (spot) { + case FLUSH_SAVE_SETUP: + do_remote_flush = !always_flush; + break; + + case FLUSH_SAVE_ITER: + do_remote_flush = false; + break; + + case FLUSH_DIRTY_BLOCK: + do_remote_flush = do_local_flush; + break; + + case FLUSH_SAVE_COMPLETE: + do_remote_flush = false; + break; + + default: + break; + } + + /* Put a flag on the stream to trigger a remote flush */ + if (do_remote_flush) { + qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); + qemu_fflush(f); + } + + /* + * Load side flushes. + */ + + do_local_flush = false; + + switch (spot) { + case FLUSH_LOAD_PRECOPY_EOS: + case FLUSH_LOAD_POSTCOPY_EOS: + do_local_flush = always_flush; + break; + + case FLUSH_LOAD_PRECOPY_FLUSH: + case FLUSH_LOAD_POSTCOPY_FLUSH: + do_local_flush = true; + break; + + default: + break; + } + + if (do_local_flush) { + multifd_recv_sync_main(); + } + + return 0; +} + static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset) { if (!multifd_queue_page(block, offset)) { @@ -1323,19 +1466,10 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); if (!pss->block) { - if (migrate_multifd() && - (!migrate_multifd_flush_after_each_section() || - migrate_mapped_ram())) { - QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; - int ret = multifd_ram_flush_and_sync(); - if (ret < 0) { - return ret; - } - - if (!migrate_mapped_ram()) { - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); - qemu_fflush(f); - } + int ret = ram_multifd_flush(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel, + FLUSH_DIRTY_BLOCK); + if (ret < 0) { + return ret; } /* Hit the end of the list */ @@ -3065,18 +3199,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp) } bql_unlock(); - ret = multifd_ram_flush_and_sync(); + ret = ram_multifd_flush(f, FLUSH_SAVE_SETUP); bql_lock(); if (ret < 0) { error_setg(errp, "%s: multifd synchronization failed", __func__); return ret; } - if (migrate_multifd() && !migrate_multifd_flush_after_each_section() - && !migrate_mapped_ram()) { - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); - } - qemu_put_be64(f, RAM_SAVE_FLAG_EOS); ret = qemu_fflush(f); if (ret < 0) { @@ -3209,12 +3338,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) out: if (ret >= 0 && migration_is_running()) { - if (migrate_multifd() && migrate_multifd_flush_after_each_section() && - !migrate_mapped_ram()) { - ret = multifd_ram_flush_and_sync(); - if (ret < 0) { - return ret; - } + + ret = ram_multifd_flush(f, FLUSH_SAVE_ITER); + if (ret < 0) { + return ret; } qemu_put_be64(f, RAM_SAVE_FLAG_EOS); @@ -3283,7 +3410,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) } } - ret = multifd_ram_flush_and_sync(); + ret = ram_multifd_flush(f, FLUSH_SAVE_COMPLETE); if (ret < 0) { return ret; } @@ -3797,14 +3924,11 @@ int ram_load_postcopy(QEMUFile *f, int channel) } break; case RAM_SAVE_FLAG_MULTIFD_FLUSH: - multifd_recv_sync_main(); + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_FLUSH); break; case RAM_SAVE_FLAG_EOS: /* normal exit */ - if (migrate_multifd() && - migrate_multifd_flush_after_each_section()) { - multifd_recv_sync_main(); - } + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_EOS); break; default: error_report("Unknown combination of migration flags: 0x%x" @@ -4237,19 +4361,11 @@ static int ram_load_precopy(QEMUFile *f) } break; case RAM_SAVE_FLAG_MULTIFD_FLUSH: - multifd_recv_sync_main(); + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_FLUSH); break; case RAM_SAVE_FLAG_EOS: /* normal exit */ - if (migrate_multifd() && - migrate_multifd_flush_after_each_section() && - /* - * Mapped-ram migration flushes once and for all after - * parsing ramblocks. Always ignore EOS for it. - */ - !migrate_mapped_ram()) { - multifd_recv_sync_main(); - } + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_EOS); break; case RAM_SAVE_FLAG_HOOK: ret = rdma_registration_handle(f); -- 2.35.3 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-27 14:12 ` Fabiano Rosas @ 2024-11-28 10:18 ` Prasad Pandit 2024-11-28 13:19 ` Fabiano Rosas 2024-12-05 22:26 ` Peter Xu 1 sibling, 1 reply; 18+ messages in thread From: Prasad Pandit @ 2024-11-28 10:18 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit On Wed, 27 Nov 2024 at 19:42, Fabiano Rosas <farosas@suse.de> wrote: > From e9110360eb0efddf6945f37c518e3cc38d12b600 Mon Sep 17 00:00:00 2001 > From: Fabiano Rosas <farosas@suse.de> > Date: Wed, 27 Nov 2024 11:03:04 -0300 > Subject: [PATCH] migration: Rationalize multifd flushes from ram code > > We currently have a mess of conditionals to achieve the correct > combination of multifd local flushes, where we sync the local > (send/recv) multifd threads between themselves, and multifd remote > flushes, where we put a flag on the stream to inform the recv side to > perform a local flush. > > On top of that we also have the multifd_flush_after_each_section > property, which is there to provide backward compatibility from when > we used to flush after every vmstate section. > > And lastly, there's the mapped-ram feature which always wants the > non-backward compatible behavior and also does not support extraneous > flags on the stream (such as the MULTIFD_FLUSH flag). > > Move the conditionals into a separate function that encapsulates and > explains the whole situation. > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > --- > migration/ram.c | 198 ++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 157 insertions(+), 41 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c > index 05ff9eb328..caaaae6fdc 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1277,6 +1277,149 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss) > return pages; > } > > +enum RamMultifdFlushSpots { > + FLUSH_SAVE_SETUP, > + FLUSH_SAVE_ITER, > + FLUSH_DIRTY_BLOCK, > + FLUSH_SAVE_COMPLETE, > + > + FLUSH_LOAD_POSTCOPY_EOS, > + FLUSH_LOAD_POSTCOPY_FLUSH, > + FLUSH_LOAD_PRECOPY_EOS, > + FLUSH_LOAD_PRECOPY_FLUSH, > +}; > + > +static int ram_multifd_flush(QEMUFile *f, enum RamMultifdFlushSpots spot) > +{ > + int ret; > + bool always_flush, do_local_flush, do_remote_flush; > + bool mapped_ram = migrate_mapped_ram(); > + > + if (!migrate_multifd()) { > + return 0; > + } > + > + /* > + * For backward compatibility, whether to flush multifd after each > + * section is sent. This is mutually exclusive with a > + * RAM_SAVE_FLAG_MULTIFD_FLUSH on the stream > + */ > + always_flush = migrate_multifd_flush_after_each_section(); > + > + /* > + * Save side flushes > + */ > + > + do_local_flush = false; > + > + switch (spot) { > + case FLUSH_SAVE_SETUP: > + assert(!bql_locked()); > + do_local_flush = true; > + break; > + > + case FLUSH_SAVE_ITER: > + /* > + * This flush is not necessary, only do for backward > + * compatibility. Mapped-ram assumes the new scheme. > + */ > + do_local_flush = always_flush && !mapped_ram; > + break; > + > + case FLUSH_DIRTY_BLOCK: > + /* > + * This is the flush that's actually required, always do it > + * unless we're emulating the old behavior. > + */ > + do_local_flush = !always_flush || mapped_ram; > + break; > + > + case FLUSH_SAVE_COMPLETE: > + do_local_flush = true; > + break; > + > + default: > + break; > + } > + > + if (do_local_flush) { > + ret = multifd_ram_flush_and_sync(); > + if (ret < 0) { > + return ret; > + } > + } > + > + /* > + * There's never a remote flush with mapped-ram because any flags > + * put on the stream (aside from RAM_SAVE_FLAG_MEM_SIZE and > + * RAM_SAVE_FLAG_EOS) break mapped-ram's assumption that ram pages > + * can be read contiguously from the stream. > + * > + * On the recv side, there's no local flush, even at EOS because > + * mapped-ram does its own flush after loading the ramblock. > + */ > + if (mapped_ram) { > + return 0; > + } > + > + do_remote_flush = false; > + > + /* Save side remote flush */ > + switch (spot) { > + case FLUSH_SAVE_SETUP: > + do_remote_flush = !always_flush; > + break; > + > + case FLUSH_SAVE_ITER: > + do_remote_flush = false; > + break; > + > + case FLUSH_DIRTY_BLOCK: > + do_remote_flush = do_local_flush; > + break; > + > + case FLUSH_SAVE_COMPLETE: > + do_remote_flush = false; > + break; > + > + default: > + break; > + } > + > + /* Put a flag on the stream to trigger a remote flush */ > + if (do_remote_flush) { > + qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); > + qemu_fflush(f); > + } > + > + /* > + * Load side flushes. > + */ > + > + do_local_flush = false; > + > + switch (spot) { > + case FLUSH_LOAD_PRECOPY_EOS: > + case FLUSH_LOAD_POSTCOPY_EOS: > + do_local_flush = always_flush; > + break; > + > + case FLUSH_LOAD_PRECOPY_FLUSH: > + case FLUSH_LOAD_POSTCOPY_FLUSH: > + do_local_flush = true; > + break; > + > + default: > + break; > + } > + > + if (do_local_flush) { > + multifd_recv_sync_main(); > + } > + > + return 0; > +} > + > static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset) > { > if (!multifd_queue_page(block, offset)) { > @@ -1323,19 +1466,10 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) > pss->page = 0; > pss->block = QLIST_NEXT_RCU(pss->block, next); > if (!pss->block) { > - if (migrate_multifd() && > - (!migrate_multifd_flush_after_each_section() || > - migrate_mapped_ram())) { > - QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; > - int ret = multifd_ram_flush_and_sync(); > - if (ret < 0) { > - return ret; > - } > - > - if (!migrate_mapped_ram()) { > - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); > - qemu_fflush(f); > - } > + int ret = ram_multifd_flush(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel, > + FLUSH_DIRTY_BLOCK); > + if (ret < 0) { > + return ret; > } > > /* Hit the end of the list */ > @@ -3065,18 +3199,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp) > } > > bql_unlock(); > - ret = multifd_ram_flush_and_sync(); > + ret = ram_multifd_flush(f, FLUSH_SAVE_SETUP); > bql_lock(); > if (ret < 0) { > error_setg(errp, "%s: multifd synchronization failed", __func__); > return ret; > } > > - if (migrate_multifd() && !migrate_multifd_flush_after_each_section() > - && !migrate_mapped_ram()) { > - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); > - } > - > qemu_put_be64(f, RAM_SAVE_FLAG_EOS); > ret = qemu_fflush(f); > if (ret < 0) { > @@ -3209,12 +3338,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) > > out: > if (ret >= 0 && migration_is_running()) { > - if (migrate_multifd() && migrate_multifd_flush_after_each_section() && > - !migrate_mapped_ram()) { > - ret = multifd_ram_flush_and_sync(); > - if (ret < 0) { > - return ret; > - } > + > + ret = ram_multifd_flush(f, FLUSH_SAVE_ITER); > + if (ret < 0) { > + return ret; > } > > qemu_put_be64(f, RAM_SAVE_FLAG_EOS); > @@ -3283,7 +3410,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > } > } > > - ret = multifd_ram_flush_and_sync(); > + ret = ram_multifd_flush(f, FLUSH_SAVE_COMPLETE); > if (ret < 0) { > return ret; > } > @@ -3797,14 +3924,11 @@ int ram_load_postcopy(QEMUFile *f, int channel) > } > break; > case RAM_SAVE_FLAG_MULTIFD_FLUSH: > - multifd_recv_sync_main(); > + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_FLUSH); > break; > case RAM_SAVE_FLAG_EOS: > /* normal exit */ > - if (migrate_multifd() && > - migrate_multifd_flush_after_each_section()) { > - multifd_recv_sync_main(); > - } > + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_EOS); > break; > default: > error_report("Unknown combination of migration flags: 0x%x" > @@ -4237,19 +4361,11 @@ static int ram_load_precopy(QEMUFile *f) > } > break; > case RAM_SAVE_FLAG_MULTIFD_FLUSH: > - multifd_recv_sync_main(); > + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_FLUSH); > break; > case RAM_SAVE_FLAG_EOS: > /* normal exit */ > - if (migrate_multifd() && > - migrate_multifd_flush_after_each_section() && > - /* > - * Mapped-ram migration flushes once and for all after > - * parsing ramblocks. Always ignore EOS for it. > - */ > - !migrate_mapped_ram()) { > - multifd_recv_sync_main(); > - } > + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_EOS); > break; > case RAM_SAVE_FLAG_HOOK: > ret = rdma_registration_handle(f); > -- > 2.35.3 > * This does not seem to solve for complexity. When reading/following code, it is easier to see 3-4 conditions and work them to check if the full expression is 'true' or 'false', that is not doable here. * fflush(1) is just flushing buffered content into (or out of) the stream IIUC, why do we have to tie it to a specific spot? At any time it is going to do the same thing: flush available data to (or out of) the stream. * Could we separate out send side fflush(1) from the receive side fflush(1) operations? Writing a flag in the stream on the send side to trigger fflush(1) on the receive side is weird; Data stream need not say when to fflush(1). Let the sender and receiver decide when they want to fflush(1) and fsync(1) data, no? Maybe that'll help to reduce/solve the complexity of long conditionals? ie. if we are able to fflush(1) and fsync(1) without any condition? Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-28 10:18 ` Prasad Pandit @ 2024-11-28 13:19 ` Fabiano Rosas 2024-12-02 9:44 ` Prasad Pandit 0 siblings, 1 reply; 18+ messages in thread From: Fabiano Rosas @ 2024-11-28 13:19 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, peterx, berrange, Prasad Pandit Prasad Pandit <ppandit@redhat.com> writes: > On Wed, 27 Nov 2024 at 19:42, Fabiano Rosas <farosas@suse.de> wrote: >> From e9110360eb0efddf6945f37c518e3cc38d12b600 Mon Sep 17 00:00:00 2001 >> From: Fabiano Rosas <farosas@suse.de> >> Date: Wed, 27 Nov 2024 11:03:04 -0300 >> Subject: [PATCH] migration: Rationalize multifd flushes from ram code >> >> We currently have a mess of conditionals to achieve the correct >> combination of multifd local flushes, where we sync the local >> (send/recv) multifd threads between themselves, and multifd remote >> flushes, where we put a flag on the stream to inform the recv side to >> perform a local flush. >> >> On top of that we also have the multifd_flush_after_each_section >> property, which is there to provide backward compatibility from when >> we used to flush after every vmstate section. >> >> And lastly, there's the mapped-ram feature which always wants the >> non-backward compatible behavior and also does not support extraneous >> flags on the stream (such as the MULTIFD_FLUSH flag). >> >> Move the conditionals into a separate function that encapsulates and >> explains the whole situation. >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de> >> --- >> migration/ram.c | 198 ++++++++++++++++++++++++++++++++++++++---------- >> 1 file changed, 157 insertions(+), 41 deletions(-) >> >> diff --git a/migration/ram.c b/migration/ram.c >> index 05ff9eb328..caaaae6fdc 100644 >> --- a/migration/ram.c >> +++ b/migration/ram.c >> @@ -1277,6 +1277,149 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss) >> return pages; >> } >> >> +enum RamMultifdFlushSpots { >> + FLUSH_SAVE_SETUP, >> + FLUSH_SAVE_ITER, >> + FLUSH_DIRTY_BLOCK, >> + FLUSH_SAVE_COMPLETE, >> + >> + FLUSH_LOAD_POSTCOPY_EOS, >> + FLUSH_LOAD_POSTCOPY_FLUSH, >> + FLUSH_LOAD_PRECOPY_EOS, >> + FLUSH_LOAD_PRECOPY_FLUSH, >> +}; >> + >> +static int ram_multifd_flush(QEMUFile *f, enum RamMultifdFlushSpots spot) >> +{ >> + int ret; >> + bool always_flush, do_local_flush, do_remote_flush; >> + bool mapped_ram = migrate_mapped_ram(); >> + >> + if (!migrate_multifd()) { >> + return 0; >> + } >> + >> + /* >> + * For backward compatibility, whether to flush multifd after each >> + * section is sent. This is mutually exclusive with a >> + * RAM_SAVE_FLAG_MULTIFD_FLUSH on the stream >> + */ >> + always_flush = migrate_multifd_flush_after_each_section(); >> + >> + /* >> + * Save side flushes >> + */ >> + >> + do_local_flush = false; >> + >> + switch (spot) { >> + case FLUSH_SAVE_SETUP: >> + assert(!bql_locked()); >> + do_local_flush = true; >> + break; >> + >> + case FLUSH_SAVE_ITER: >> + /* >> + * This flush is not necessary, only do for backward >> + * compatibility. Mapped-ram assumes the new scheme. >> + */ >> + do_local_flush = always_flush && !mapped_ram; >> + break; >> + >> + case FLUSH_DIRTY_BLOCK: >> + /* >> + * This is the flush that's actually required, always do it >> + * unless we're emulating the old behavior. >> + */ >> + do_local_flush = !always_flush || mapped_ram; >> + break; >> + >> + case FLUSH_SAVE_COMPLETE: >> + do_local_flush = true; >> + break; >> + >> + default: >> + break; >> + } >> + >> + if (do_local_flush) { >> + ret = multifd_ram_flush_and_sync(); >> + if (ret < 0) { >> + return ret; >> + } >> + } >> + >> + /* >> + * There's never a remote flush with mapped-ram because any flags >> + * put on the stream (aside from RAM_SAVE_FLAG_MEM_SIZE and >> + * RAM_SAVE_FLAG_EOS) break mapped-ram's assumption that ram pages >> + * can be read contiguously from the stream. >> + * >> + * On the recv side, there's no local flush, even at EOS because >> + * mapped-ram does its own flush after loading the ramblock. >> + */ >> + if (mapped_ram) { >> + return 0; >> + } >> + >> + do_remote_flush = false; >> + >> + /* Save side remote flush */ >> + switch (spot) { >> + case FLUSH_SAVE_SETUP: >> + do_remote_flush = !always_flush; >> + break; >> + >> + case FLUSH_SAVE_ITER: >> + do_remote_flush = false; >> + break; >> + >> + case FLUSH_DIRTY_BLOCK: >> + do_remote_flush = do_local_flush; >> + break; >> + >> + case FLUSH_SAVE_COMPLETE: >> + do_remote_flush = false; >> + break; >> + >> + default: >> + break; >> + } >> + >> + /* Put a flag on the stream to trigger a remote flush */ >> + if (do_remote_flush) { >> + qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); >> + qemu_fflush(f); >> + } >> + >> + /* >> + * Load side flushes. >> + */ >> + >> + do_local_flush = false; >> + >> + switch (spot) { >> + case FLUSH_LOAD_PRECOPY_EOS: >> + case FLUSH_LOAD_POSTCOPY_EOS: >> + do_local_flush = always_flush; >> + break; >> + >> + case FLUSH_LOAD_PRECOPY_FLUSH: >> + case FLUSH_LOAD_POSTCOPY_FLUSH: >> + do_local_flush = true; >> + break; >> + >> + default: >> + break; >> + } >> + >> + if (do_local_flush) { >> + multifd_recv_sync_main(); >> + } >> + >> + return 0; >> +} >> + >> static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset) >> { >> if (!multifd_queue_page(block, offset)) { >> @@ -1323,19 +1466,10 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) >> pss->page = 0; >> pss->block = QLIST_NEXT_RCU(pss->block, next); >> if (!pss->block) { >> - if (migrate_multifd() && >> - (!migrate_multifd_flush_after_each_section() || >> - migrate_mapped_ram())) { >> - QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; >> - int ret = multifd_ram_flush_and_sync(); >> - if (ret < 0) { >> - return ret; >> - } >> - >> - if (!migrate_mapped_ram()) { >> - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); >> - qemu_fflush(f); >> - } >> + int ret = ram_multifd_flush(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel, >> + FLUSH_DIRTY_BLOCK); >> + if (ret < 0) { >> + return ret; >> } >> >> /* Hit the end of the list */ >> @@ -3065,18 +3199,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp) >> } >> >> bql_unlock(); >> - ret = multifd_ram_flush_and_sync(); >> + ret = ram_multifd_flush(f, FLUSH_SAVE_SETUP); >> bql_lock(); >> if (ret < 0) { >> error_setg(errp, "%s: multifd synchronization failed", __func__); >> return ret; >> } >> >> - if (migrate_multifd() && !migrate_multifd_flush_after_each_section() >> - && !migrate_mapped_ram()) { >> - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); >> - } >> - >> qemu_put_be64(f, RAM_SAVE_FLAG_EOS); >> ret = qemu_fflush(f); >> if (ret < 0) { >> @@ -3209,12 +3338,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) >> >> out: >> if (ret >= 0 && migration_is_running()) { >> - if (migrate_multifd() && migrate_multifd_flush_after_each_section() && >> - !migrate_mapped_ram()) { >> - ret = multifd_ram_flush_and_sync(); >> - if (ret < 0) { >> - return ret; >> - } >> + >> + ret = ram_multifd_flush(f, FLUSH_SAVE_ITER); >> + if (ret < 0) { >> + return ret; >> } >> >> qemu_put_be64(f, RAM_SAVE_FLAG_EOS); >> @@ -3283,7 +3410,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) >> } >> } >> >> - ret = multifd_ram_flush_and_sync(); >> + ret = ram_multifd_flush(f, FLUSH_SAVE_COMPLETE); >> if (ret < 0) { >> return ret; >> } >> @@ -3797,14 +3924,11 @@ int ram_load_postcopy(QEMUFile *f, int channel) >> } >> break; >> case RAM_SAVE_FLAG_MULTIFD_FLUSH: >> - multifd_recv_sync_main(); >> + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_FLUSH); >> break; >> case RAM_SAVE_FLAG_EOS: >> /* normal exit */ >> - if (migrate_multifd() && >> - migrate_multifd_flush_after_each_section()) { >> - multifd_recv_sync_main(); >> - } >> + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_EOS); >> break; >> default: >> error_report("Unknown combination of migration flags: 0x%x" >> @@ -4237,19 +4361,11 @@ static int ram_load_precopy(QEMUFile *f) >> } >> break; >> case RAM_SAVE_FLAG_MULTIFD_FLUSH: >> - multifd_recv_sync_main(); >> + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_FLUSH); >> break; >> case RAM_SAVE_FLAG_EOS: >> /* normal exit */ >> - if (migrate_multifd() && >> - migrate_multifd_flush_after_each_section() && >> - /* >> - * Mapped-ram migration flushes once and for all after >> - * parsing ramblocks. Always ignore EOS for it. >> - */ >> - !migrate_mapped_ram()) { >> - multifd_recv_sync_main(); >> - } >> + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_EOS); >> break; >> case RAM_SAVE_FLAG_HOOK: >> ret = rdma_registration_handle(f); >> -- >> 2.35.3 >> > > * This does not seem to solve for complexity. When reading/following > code, it is easier to see 3-4 conditions and work them to check if the > full expression is 'true' or 'false', that is not doable here. You miss the point that people also need to understand the code, not only tell whether it results in true or false. These syncs are all related and we'd like to be able to reason about them in a single place, not in several little pieces all over the place. When a new feature comes up, like it did when mapped-ram was introduced, we don't want to go around having to squint at conditionals to know whether it applies to the new case or not. Also, ram.c is not the place for open-coded multifd code. The same mapped-ram example applies: having to add if (migrate_mapped_ram()) throughout the ram code was a pain and we had some iterations of flipping the logic until we got it right. > > * fflush(1) is just flushing buffered content into (or out of) the > stream IIUC, why do we have to tie it to a specific spot? At any time > it is going to do the same thing: flush available data to (or out of) > the stream. > There is no fflush() going on here. This code is about the multifd flush, which sends the remaining data from MultiFDPages_t and the multifd sync, which synchronizes the multifd threads. That qemu_fflush is just to make sure the destination sees flag on the stream. > * Could we separate out send side fflush(1) from the receive side > fflush(1) operations? Writing a flag in the stream on the send side to > trigger fflush(1) on the receive side is weird; Data stream need not > say when to fflush(1). Let the sender and receiver decide when they > want to fflush(1) and fsync(1) data, no? Maybe that'll help to > reduce/solve the complexity of long conditionals? ie. if we are able > to fflush(1) and fsync(1) without any condition? There is no flush on the receive side. The RAM_SAVE_FLAG_MULTIFD_FLUSH flag is there to indicate to the destination that at that point in the stream the source has done a flush + sync operation and the destination should sync it's threads as well. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-28 13:19 ` Fabiano Rosas @ 2024-12-02 9:44 ` Prasad Pandit 2024-12-02 14:12 ` Fabiano Rosas 0 siblings, 1 reply; 18+ messages in thread From: Prasad Pandit @ 2024-12-02 9:44 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit Hello Fabiano, On Thu, 28 Nov 2024 at 18:50, Fabiano Rosas <farosas@suse.de> wrote: >>> We currently have a mess of conditionals to achieve the correct >>> combination of multifd local flushes, where we sync the local >>> (send/recv) multifd threads between themselves, and multifd remote >>> flushes, where we put a flag on the stream to inform the recv side to >>> perform a local flush. ... > >> + if (do_local_flush) { > >> + ret = multifd_ram_flush_and_sync(); > >> + if (ret < 0) { > >> + return ret; > >> + } > >> + } > >> + ... > >> + /* Put a flag on the stream to trigger a remote flush */ > >> + if (do_remote_flush) { > >> + qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); > >> + qemu_fflush(f); > >> + } > >> + > >> + if (do_local_flush) { > >> + multifd_recv_sync_main(); > >> + } ... >These syncs are all related and we'd like to be able to reason about > them in a single place, not in several little pieces all over the place. > When a new feature comes up, like it did when mapped-ram was introduced, > we don't want to go around having to squint at conditionals to know whether > it applies to the new case or not. > > Also, ram.c is not the place for open-coded multifd code. The same > mapped-ram example applies: having to add if (migrate_mapped_ram()) > throughout the ram code was a pain and we had some iterations of > flipping the logic until we got it right. * I see. If it is the larger complexity of how multifd threads do flush/sync, not just about long conditionals with 3-4 sub-expressions, I think this can be done separately, instead of as part of this patch series. > There is no fflush() going on here. This code is about the multifd > flush, which sends the remaining data from MultiFDPages_t and the > multifd sync, which synchronizes the multifd threads. That qemu_fflush > is just to make sure the destination sees flag on the stream. * Yes, there is no fflush(3) call. I mentioned fflush(3) as indicative of the operation performed. ie. the description above reads the same as what fflush(3) does to streams. "...fflush() forces a write of all user-space buffered data for the given output or update stream via the stream's underlying write function." In the multifd case we are sending remaining data from MultiFDPages_t buffers onto the respective channels IIUC. The multifd_send_sync_main() function sets the 'p->pending_sync' field and when it is set miltifd_send_thread() function calls ret = qio_channel_write_all(p->c, (void *)p->packet, p->packet_len, &local_err); multifd_send_sync_main() also has 'flush_zero_copy', but that only happens when using --zerocopy option is used -> $ virsh migrate --zerocopy ... > There is no flush on the receive side. The RAM_SAVE_FLAG_MULTIFD_FLUSH > flag is there to indicate to the destination that at that point in the > stream the source has done a flush + sync operation and the destination > should sync it's threads as well. * The comment around where 'RAM_SAVE_FLAG_MULTIFD_FLUSH' gets written above, says -> "...trigger remote flush." * We seem to use terms 'flush' and 'sync' quite freely and interchangeably. ie. variables (ex: do_local_flush) and constants are named with _FLUSH and functions and fields are named as _sync_main() and &p->pending_sync. if (do_local_flush) { multifd_send/_recv_sync_main(); <= do the 'flush' and 'sync' mean the same thing here? } Even in multifd_ram_flush_and_sync() routine, it is named with _flush_ and eventually multifd_send() sets the '&p->pending_job' variable to true. There is no field in MultiFDSendParams structure named with 'flush'. It defines 'pending_sync' and 'sem_sync'. * Such free usage of these terms is bound to create confusion. Because while reading code the reader may relate flush with fflush(3) and sync with fsync(2) calls/operations. It will help if we use these terms more judiciously. * Coming back to the issue of simplifying multifd threads 'flush or sync' operation: 1. I think it can be done separately, outside the scope of this patch series. 2. Must we tie the flush/sync operations with specific spots? Isn't there any other way, where we call multifd_*_sync() unconditionally, without any side-effects? As I see it, we have those conditions, because of the side-effects. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-12-02 9:44 ` Prasad Pandit @ 2024-12-02 14:12 ` Fabiano Rosas 2024-12-03 5:40 ` Prasad Pandit 0 siblings, 1 reply; 18+ messages in thread From: Fabiano Rosas @ 2024-12-02 14:12 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, peterx, berrange, Prasad Pandit Prasad Pandit <ppandit@redhat.com> writes: > Hello Fabiano, > > On Thu, 28 Nov 2024 at 18:50, Fabiano Rosas <farosas@suse.de> wrote: >>>> We currently have a mess of conditionals to achieve the correct >>>> combination of multifd local flushes, where we sync the local >>>> (send/recv) multifd threads between themselves, and multifd remote >>>> flushes, where we put a flag on the stream to inform the recv side to >>>> perform a local flush. > ... >> >> + if (do_local_flush) { >> >> + ret = multifd_ram_flush_and_sync(); >> >> + if (ret < 0) { >> >> + return ret; >> >> + } >> >> + } >> >> + > ... >> >> + /* Put a flag on the stream to trigger a remote flush */ >> >> + if (do_remote_flush) { >> >> + qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); >> >> + qemu_fflush(f); >> >> + } >> >> + >> >> + if (do_local_flush) { >> >> + multifd_recv_sync_main(); >> >> + } > ... >>These syncs are all related and we'd like to be able to reason about >> them in a single place, not in several little pieces all over the place. >> When a new feature comes up, like it did when mapped-ram was introduced, >> we don't want to go around having to squint at conditionals to know whether >> it applies to the new case or not. >> >> Also, ram.c is not the place for open-coded multifd code. The same >> mapped-ram example applies: having to add if (migrate_mapped_ram()) >> throughout the ram code was a pain and we had some iterations of >> flipping the logic until we got it right. > > * I see. If it is the larger complexity of how multifd threads do > flush/sync, not just about long conditionals with 3-4 sub-expressions, > I think this can be done separately, instead of as part of this patch > series. > >> There is no fflush() going on here. This code is about the multifd >> flush, which sends the remaining data from MultiFDPages_t and the >> multifd sync, which synchronizes the multifd threads. That qemu_fflush >> is just to make sure the destination sees flag on the stream. > > * Yes, there is no fflush(3) call. I mentioned fflush(3) as indicative > of the operation performed. ie. the description above reads the same > as what fflush(3) does to streams. > > "...fflush() forces a write of all user-space buffered data for > the given output or update stream via the stream's underlying write > function." > > In the multifd case we are sending remaining data from MultiFDPages_t > buffers onto the respective channels IIUC. The > multifd_send_sync_main() function sets the 'p->pending_sync' field and > when it is set miltifd_send_thread() function calls > > ret = qio_channel_write_all(p->c, (void *)p->packet, > p->packet_len, &local_err); > > multifd_send_sync_main() also has 'flush_zero_copy', but that only > happens when using --zerocopy option is used -> $ virsh migrate > --zerocopy ... > >> There is no flush on the receive side. The RAM_SAVE_FLAG_MULTIFD_FLUSH >> flag is there to indicate to the destination that at that point in the >> stream the source has done a flush + sync operation and the destination >> should sync it's threads as well. > > * The comment around where 'RAM_SAVE_FLAG_MULTIFD_FLUSH' gets written > above, says -> "...trigger remote flush." > > * We seem to use terms 'flush' and 'sync' quite freely and > interchangeably. ie. variables (ex: do_local_flush) and constants are > named with _FLUSH and functions and fields are named as _sync_main() > and &p->pending_sync. > > if (do_local_flush) { > multifd_send/_recv_sync_main(); <= do the 'flush' and > 'sync' mean the same thing here? > } No, that patch is indeed inconsistent in the terminology, good point. > > Even in multifd_ram_flush_and_sync() routine, it is named with _flush_ > and eventually multifd_send() sets the '&p->pending_job' variable to > true. There is no field in MultiFDSendParams structure named with > 'flush'. It defines 'pending_sync' and 'sem_sync'. Yes, the flush in flush_and_sync is tied with multifd_send(), that's what does the flush. This is more a consequence of the implementation, take a look at multifd_queue_page(). > > * Such free usage of these terms is bound to create confusion. Because > while reading code the reader may relate flush with fflush(3) and sync > with fsync(2) calls/operations. It will help if we use these terms > more judiciously. Well, flush and sync are not reserved terms, we can use them however we see fit. As long as it's consistent, of course. Note however, that there is some intended overlap with system library terminology in the implementation of QEMUFile. In qemu-file.c qemu_fflush() is indeed reminiscent of fflush(). I think it's been agreed in the past that this is not a good way of designing an interface and we should eventually move away from that. > > * Coming back to the issue of simplifying multifd threads 'flush or > sync' operation: > 1. I think it can be done separately, outside the scope of this patch series. Yes, that's fine. > 2. Must we tie the flush/sync operations with specific spots? Isn't > there any other way, where we call multifd_*_sync() unconditionally, > without any side-effects? As I see it, we have those conditions, > because of the side-effects. Perhaps we could move the sync up to the vmstate level and do it unconditionally. > > Thank you. > --- > - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-12-02 14:12 ` Fabiano Rosas @ 2024-12-03 5:40 ` Prasad Pandit 0 siblings, 0 replies; 18+ messages in thread From: Prasad Pandit @ 2024-12-03 5:40 UTC (permalink / raw) To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit Hello Fabiano, On Mon, 2 Dec 2024 at 19:42, Fabiano Rosas <farosas@suse.de> wrote: > > ...multifd_send/_recv_sync_main(); <= do the 'flush' and > > 'sync' mean the same thing here? > > No, that patch is indeed inconsistent in the terminology, good point. > Well, flush and sync are not reserved terms, we can use them however we > see fit. As long as it's consistent, of course. > * It'll help to define what 'flush' does and what 'sync' does in the Multifd context and how they differ from each other. Maybe here or code comments or both places or somewhere. Knowing their intended meaning will help while reading code, reviewing patches and also avoid confusion. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v1 3/4] migration: refactor ram_save_target_page functions 2024-11-27 14:12 ` Fabiano Rosas 2024-11-28 10:18 ` Prasad Pandit @ 2024-12-05 22:26 ` Peter Xu 1 sibling, 0 replies; 18+ messages in thread From: Peter Xu @ 2024-12-05 22:26 UTC (permalink / raw) To: Fabiano Rosas; +Cc: Prasad Pandit, qemu-devel, berrange, Prasad Pandit On Wed, Nov 27, 2024 at 11:12:09AM -0300, Fabiano Rosas wrote: > Prasad Pandit <ppandit@redhat.com> writes: > > > On Wed, 27 Nov 2024 at 02:49, Fabiano Rosas <farosas@suse.de> wrote: > >> This patch should be just the actual refactoring on top of master, with > >> no mention to postcopy at all. > > > > * Okay. We'll have to ensure that it is merged before multifd+postcopy change. > > > >> > + if (migrate_multifd() && !migration_in_postcopy() > >> > + && (!migrate_multifd_flush_after_each_section() > >> > + || migrate_mapped_ram())) { > >> > >> This is getting out of hand. We can't keep growing this condition every > >> time something new comes up. Any ideas? > > > > * In 'v0' this series, !migration_in_postcopy() was added to > > migrate_multifd(), which worked, but was not okay. > > > > * Another could be to define a new helper/macro to include above 3-4 > > checks. Because migrate_multifd(), > > migrate_multifd_flush_after_each_section() and migrate_mapped_ram() > > appear together at multiple points. But strangely the equation is not > > the same. Sometimes migrate_mapped_ram() is 'true' and sometimes it is > > 'false', so a combined helper may not work. It is all to accommodate > > different workings of multifd IIUC, if it and precopy used the same > > protocol/stream, maybe such conditionals would go away automatically. > > Maybe this would improve the situation. Peter, what do you think? > > -->8-- > From e9110360eb0efddf6945f37c518e3cc38d12b600 Mon Sep 17 00:00:00 2001 > From: Fabiano Rosas <farosas@suse.de> > Date: Wed, 27 Nov 2024 11:03:04 -0300 > Subject: [PATCH] migration: Rationalize multifd flushes from ram code > > We currently have a mess of conditionals to achieve the correct > combination of multifd local flushes, where we sync the local > (send/recv) multifd threads between themselves, and multifd remote > flushes, where we put a flag on the stream to inform the recv side to > perform a local flush. > > On top of that we also have the multifd_flush_after_each_section > property, which is there to provide backward compatibility from when > we used to flush after every vmstate section. > > And lastly, there's the mapped-ram feature which always wants the > non-backward compatible behavior and also does not support extraneous > flags on the stream (such as the MULTIFD_FLUSH flag). > > Move the conditionals into a separate function that encapsulates and > explains the whole situation. > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > --- > migration/ram.c | 198 ++++++++++++++++++++++++++++++++++++++---------- > 1 file changed, 157 insertions(+), 41 deletions(-) > > diff --git a/migration/ram.c b/migration/ram.c > index 05ff9eb328..caaaae6fdc 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1277,6 +1277,149 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss) > return pages; > } > > +enum RamMultifdFlushSpots { > + FLUSH_SAVE_SETUP, > + FLUSH_SAVE_ITER, > + FLUSH_DIRTY_BLOCK, > + FLUSH_SAVE_COMPLETE, > + > + FLUSH_LOAD_POSTCOPY_EOS, > + FLUSH_LOAD_POSTCOPY_FLUSH, > + FLUSH_LOAD_PRECOPY_EOS, > + FLUSH_LOAD_PRECOPY_FLUSH, POSTCOPY ones could be confusing, because we don't support them at all. We should remove them actually from ram_load_postcopy().. > +}; > + > +static int ram_multifd_flush(QEMUFile *f, enum RamMultifdFlushSpots spot) > +{ > + int ret; > + bool always_flush, do_local_flush, do_remote_flush; > + bool mapped_ram = migrate_mapped_ram(); > + > + if (!migrate_multifd()) { > + return 0; > + } > + > + /* > + * For backward compatibility, whether to flush multifd after each > + * section is sent. This is mutually exclusive with a > + * RAM_SAVE_FLAG_MULTIFD_FLUSH on the stream > + */ > + always_flush = migrate_multifd_flush_after_each_section(); > + > + /* > + * Save side flushes > + */ > + > + do_local_flush = false; > + > + switch (spot) { > + case FLUSH_SAVE_SETUP: > + assert(!bql_locked()); > + do_local_flush = true; > + break; > + > + case FLUSH_SAVE_ITER: > + /* > + * This flush is not necessary, only do for backward > + * compatibility. Mapped-ram assumes the new scheme. > + */ > + do_local_flush = always_flush && !mapped_ram; > + break; > + > + case FLUSH_DIRTY_BLOCK: > + /* > + * This is the flush that's actually required, always do it > + * unless we're emulating the old behavior. > + */ > + do_local_flush = !always_flush || mapped_ram; > + break; > + > + case FLUSH_SAVE_COMPLETE: > + do_local_flush = true; > + break; > + > + default: > + break; > + } > + > + if (do_local_flush) { > + ret = multifd_ram_flush_and_sync(); > + if (ret < 0) { > + return ret; > + } > + } > + > + /* > + * There's never a remote flush with mapped-ram because any flags > + * put on the stream (aside from RAM_SAVE_FLAG_MEM_SIZE and > + * RAM_SAVE_FLAG_EOS) break mapped-ram's assumption that ram pages > + * can be read contiguously from the stream. > + * > + * On the recv side, there's no local flush, even at EOS because > + * mapped-ram does its own flush after loading the ramblock. > + */ > + if (mapped_ram) { > + return 0; > + } > + > + do_remote_flush = false; > + > + /* Save side remote flush */ > + switch (spot) { > + case FLUSH_SAVE_SETUP: > + do_remote_flush = !always_flush; > + break; > + > + case FLUSH_SAVE_ITER: > + do_remote_flush = false; > + break; > + > + case FLUSH_DIRTY_BLOCK: > + do_remote_flush = do_local_flush; > + break; > + > + case FLUSH_SAVE_COMPLETE: > + do_remote_flush = false; > + break; > + > + default: > + break; > + } > + > + /* Put a flag on the stream to trigger a remote flush */ > + if (do_remote_flush) { > + qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); > + qemu_fflush(f); > + } Unify RAM_SAVE_FLAG_MULTIFD_FLUSH is definitely great. > + > + /* > + * Load side flushes. > + */ > + > + do_local_flush = false; > + > + switch (spot) { > + case FLUSH_LOAD_PRECOPY_EOS: > + case FLUSH_LOAD_POSTCOPY_EOS: > + do_local_flush = always_flush; > + break; > + > + case FLUSH_LOAD_PRECOPY_FLUSH: > + case FLUSH_LOAD_POSTCOPY_FLUSH: > + do_local_flush = true; > + break; > + > + default: > + break; > + } > + > + if (do_local_flush) { > + multifd_recv_sync_main(); > + } Having send/recv call the same function can be prone to errors, IMHO. > + > + return 0; This is a lenghy function! It solves one problem where the caller is clean, however it didn't solve the other problem on when we want to know whether a sync is done, it's harder to follow to me. :( > +} > + > static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset) > { > if (!multifd_queue_page(block, offset)) { > @@ -1323,19 +1466,10 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) > pss->page = 0; > pss->block = QLIST_NEXT_RCU(pss->block, next); > if (!pss->block) { > - if (migrate_multifd() && > - (!migrate_multifd_flush_after_each_section() || > - migrate_mapped_ram())) { > - QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; > - int ret = multifd_ram_flush_and_sync(); > - if (ret < 0) { > - return ret; > - } > - > - if (!migrate_mapped_ram()) { > - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); > - qemu_fflush(f); > - } > + int ret = ram_multifd_flush(rs->pss[RAM_CHANNEL_PRECOPY].pss_channel, > + FLUSH_DIRTY_BLOCK); > + if (ret < 0) { > + return ret; > } > > /* Hit the end of the list */ > @@ -3065,18 +3199,13 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp) > } > > bql_unlock(); > - ret = multifd_ram_flush_and_sync(); > + ret = ram_multifd_flush(f, FLUSH_SAVE_SETUP); Not relevant to this patch. I believe this flush_and_sync() is also not needed for modern QEMUs. It was there also for the same reason as complete(): EOS requires that on old ones.. > bql_lock(); > if (ret < 0) { > error_setg(errp, "%s: multifd synchronization failed", __func__); > return ret; > } > > - if (migrate_multifd() && !migrate_multifd_flush_after_each_section() > - && !migrate_mapped_ram()) { > - qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH); > - } > - > qemu_put_be64(f, RAM_SAVE_FLAG_EOS); > ret = qemu_fflush(f); > if (ret < 0) { > @@ -3209,12 +3338,10 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) > > out: > if (ret >= 0 && migration_is_running()) { > - if (migrate_multifd() && migrate_multifd_flush_after_each_section() && > - !migrate_mapped_ram()) { > - ret = multifd_ram_flush_and_sync(); > - if (ret < 0) { > - return ret; > - } > + > + ret = ram_multifd_flush(f, FLUSH_SAVE_ITER); > + if (ret < 0) { > + return ret; > } > > qemu_put_be64(f, RAM_SAVE_FLAG_EOS); > @@ -3283,7 +3410,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) > } > } > > - ret = multifd_ram_flush_and_sync(); > + ret = ram_multifd_flush(f, FLUSH_SAVE_COMPLETE); > if (ret < 0) { > return ret; > } > @@ -3797,14 +3924,11 @@ int ram_load_postcopy(QEMUFile *f, int channel) > } > break; > case RAM_SAVE_FLAG_MULTIFD_FLUSH: > - multifd_recv_sync_main(); > + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_FLUSH); > break; > case RAM_SAVE_FLAG_EOS: > /* normal exit */ > - if (migrate_multifd() && > - migrate_multifd_flush_after_each_section()) { > - multifd_recv_sync_main(); > - } > + ram_multifd_flush(f, FLUSH_LOAD_POSTCOPY_EOS); > break; > default: > error_report("Unknown combination of migration flags: 0x%x" > @@ -4237,19 +4361,11 @@ static int ram_load_precopy(QEMUFile *f) > } > break; > case RAM_SAVE_FLAG_MULTIFD_FLUSH: > - multifd_recv_sync_main(); > + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_FLUSH); > break; > case RAM_SAVE_FLAG_EOS: > /* normal exit */ > - if (migrate_multifd() && > - migrate_multifd_flush_after_each_section() && > - /* > - * Mapped-ram migration flushes once and for all after > - * parsing ramblocks. Always ignore EOS for it. > - */ > - !migrate_mapped_ram()) { > - multifd_recv_sync_main(); > - } > + ram_multifd_flush(f, FLUSH_LOAD_PRECOPY_EOS); > break; > case RAM_SAVE_FLAG_HOOK: > ret = rdma_registration_handle(f); > -- > 2.35.3 > I do come up with a plan to clean this up. I gave it a shot, it's less code change than this one, and I hope it's acceptable and easier to read. But I'm ready if you want to say no to it. Let's discuss. Meanwhile since the VFIO patch would be relevant (on complete()), I'll need to resend that two patches series, but add my version of cleaning this on top. Give me a few hours to test it, then I'll resend that one with this altogether. Let's see how you think about it. -- Peter Xu ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v1 4/4] migration: enable multifd and postcopy together 2024-11-26 11:57 [PATCH v1 0/4] Allow to enable multifd and postcopy migration together Prasad Pandit ` (2 preceding siblings ...) 2024-11-26 11:57 ` [PATCH v1 3/4] migration: refactor ram_save_target_page functions Prasad Pandit @ 2024-11-26 11:57 ` Prasad Pandit 3 siblings, 0 replies; 18+ messages in thread From: Prasad Pandit @ 2024-11-26 11:57 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Enable Multifd and Postcopy migration together. The migration_ioc_process_incoming() routine checks magic value sent on each channel and helps to properly setup multifd and postcopy channels. Idea is to take advantage of the multifd threads to accelerate transfer of large guest RAM to the destination and switch to postcopy mode sooner. The Precopy and Multifd threads work during the initial guest RAM transfer. When migration moves to the Postcopy phase, the multifd threads are restrained and Postcopy threads on the destination request/pull data from the source side. Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- migration/migration.c | 95 +++++++++++++++++++++++++++---------------- 1 file changed, 60 insertions(+), 35 deletions(-) v1: Avoid using 4-bytes magic value for the Postcopy channel. Flush and synchronise Multifd thread before postcopy_start(). v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u diff --git a/migration/migration.c b/migration/migration.c index 8c5bd0a75c..eee7078106 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -92,6 +92,9 @@ enum mig_rp_message_type { MIG_RP_MSG_MAX }; +/* Migration channel types */ +enum { CH_DEFAULT, CH_MULTIFD, CH_POSTCOPY }; + /* When we add fault tolerance, we could have several migrations at once. For now we don't need to add dynamic creation of migration */ @@ -921,26 +924,32 @@ void migration_fd_process_incoming(QEMUFile *f) /* * Returns true when we want to start a new incoming migration process, * false otherwise. + * + * All the required channels must be in place before a new incoming + * migration process starts. + * - Multifd enabled: + * The main channel and the multifd channels are required. + * - Multifd/Postcopy disabled: + * The main channel is required. + * - Postcopy enabled: + * We don't want to start a new incoming migration when + * the postcopy channel is created. Because it is created + * towards the end of the precopy migration. */ -static bool migration_should_start_incoming(bool main_channel) +static bool migration_should_start_incoming(uint8_t channel) { - /* Multifd doesn't start unless all channels are established */ - if (migrate_multifd()) { - return migration_has_all_channels(); - } + bool ret = false; + + if (channel != CH_POSTCOPY) { + MigrationIncomingState *mis = migration_incoming_get_current(); + ret = mis->from_src_file ? true : false; - /* Preempt channel only starts when the main channel is created */ - if (migrate_postcopy_preempt()) { - return main_channel; + if (ret && migrate_multifd()) { + ret = multifd_recv_all_channels_created(); + } } - /* - * For all the rest types of migration, we should only reach here when - * it's the main channel that's being created, and we should always - * proceed with this channel. - */ - assert(main_channel); - return true; + return ret; } void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) @@ -948,13 +957,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) MigrationIncomingState *mis = migration_incoming_get_current(); Error *local_err = NULL; QEMUFile *f; - bool default_channel = true; uint32_t channel_magic = 0; + uint8_t channel = CH_DEFAULT; int ret = 0; - if (migrate_multifd() && !migrate_mapped_ram() && - !migrate_postcopy_ram() && - qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { + if (!migration_should_start_incoming(channel) + && qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { /* * With multiple channels, it is possible that we receive channels * out of order on destination side, causing incorrect mapping of @@ -972,35 +980,46 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) return; } - default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)); + if (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)) { + channel = CH_DEFAULT; + } else if (channel_magic == cpu_to_be32(MULTIFD_MAGIC)) { + channel = CH_MULTIFD; + } else { + error_report("%s: could not identify channel, unknown magic: %u", + __func__, channel_magic); + return; + } + } else { - default_channel = !mis->from_src_file; + channel = CH_POSTCOPY; } if (multifd_recv_setup(errp) != 0) { return; } - if (default_channel) { + if (channel == CH_DEFAULT) { f = qemu_file_new_input(ioc); migration_incoming_setup(f); - } else { + } else if (channel == CH_MULTIFD) { /* Multiple connections */ - assert(migration_needs_multiple_sockets()); if (migrate_multifd()) { multifd_recv_new_channel(ioc, &local_err); - } else { + } + if (local_err) { + error_propagate(errp, local_err); + return; + } + } else if (channel == CH_POSTCOPY) { + if (migrate_postcopy()) { assert(migrate_postcopy_preempt()); + assert(!mis->postcopy_qemufile_dst); f = qemu_file_new_input(ioc); postcopy_preempt_new_channel(mis, f); } - if (local_err) { - error_propagate(errp, local_err); - return; - } } - if (migration_should_start_incoming(default_channel)) { + if (migration_should_start_incoming(channel)) { /* If it's a recovery, we're done */ if (postcopy_try_recover()) { return; @@ -1017,21 +1036,22 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) */ bool migration_has_all_channels(void) { + bool ret = false; MigrationIncomingState *mis = migration_incoming_get_current(); if (!mis->from_src_file) { - return false; + return ret; } if (migrate_multifd()) { - return multifd_recv_all_channels_created(); + ret = multifd_recv_all_channels_created(); } - if (migrate_postcopy_preempt()) { - return mis->postcopy_qemufile_dst != NULL; + if (ret && migrate_postcopy_preempt()) { + ret = mis->postcopy_qemufile_dst != NULL; } - return true; + return ret; } int migrate_send_rp_switchover_ack(MigrationIncomingState *mis) @@ -3239,6 +3259,11 @@ static MigIterateState migration_iteration_run(MigrationState *s) /* Still a significant amount to transfer */ if (!in_postcopy && must_precopy <= s->threshold_size && can_switchover && qatomic_read(&s->start_postcopy)) { + + if (migrate_multifd()) { + multifd_send_sync_main(); + } + if (postcopy_start(s, &local_err)) { migrate_set_error(s, local_err); error_report_err(local_err); -- 2.47.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-12-05 22:27 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-26 11:57 [PATCH v1 0/4] Allow to enable multifd and postcopy migration together Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 1/4] migration/multifd: move macros to multifd header Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 2/4] migration: remove multifd check with postcopy Prasad Pandit 2024-11-26 21:12 ` Fabiano Rosas 2024-11-27 11:43 ` Prasad Pandit 2024-11-26 11:57 ` [PATCH v1 3/4] migration: refactor ram_save_target_page functions Prasad Pandit 2024-11-26 21:18 ` Fabiano Rosas 2024-11-27 11:42 ` Prasad Pandit 2024-11-27 12:19 ` Fabiano Rosas 2024-11-28 10:43 ` Prasad Pandit 2024-11-27 14:12 ` Fabiano Rosas 2024-11-28 10:18 ` Prasad Pandit 2024-11-28 13:19 ` Fabiano Rosas 2024-12-02 9:44 ` Prasad Pandit 2024-12-02 14:12 ` Fabiano Rosas 2024-12-03 5:40 ` Prasad Pandit 2024-12-05 22:26 ` Peter Xu 2024-11-26 11:57 ` [PATCH v1 4/4] migration: enable multifd and postcopy together Prasad Pandit
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.