* [PATCH v5 0/5] Allow to enable multifd and postcopy migration together @ 2025-02-05 12:27 Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 1/5] migration/multifd: move macros to multifd header Prasad Pandit ` (4 more replies) 0 siblings, 5 replies; 23+ messages in thread From: Prasad Pandit @ 2025-02-05 12:27 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Hello, * This series (v5) consolidates migration capabilities setting in one 'set_migration_capabilities()' function, thus simplifying test sources. It passes all migration tests. === 66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 143.66s 71 subtests passed === v4: https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t * This series (v4) adds more 'multifd+postcopy' qtests which test Precopy migration with 'postcopy-ram' attribute set. And run Postcopy migrations with 'multifd' channels enabled. === $ ../qtest/migration-test --tap -k -r '/x86_64/migration/multifd+postcopy' | grep -i 'slow test' # slow test /x86_64/migration/multifd+postcopy/plain executed in 1.29 secs # slow test /x86_64/migration/multifd+postcopy/recovery/tls/psk executed in 2.48 secs # slow test /x86_64/migration/multifd+postcopy/preempt/plain executed in 1.49 secs # slow test /x86_64/migration/multifd+postcopy/preempt/recovery/tls/psk executed in 2.52 secs # slow test /x86_64/migration/multifd+postcopy/tcp/tls/psk/match executed in 3.62 secs # slow test /x86_64/migration/multifd+postcopy/tcp/plain/zstd executed in 1.34 secs # slow test /x86_64/migration/multifd+postcopy/tcp/plain/cancel executed in 2.24 secs ... 66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 148.41s 71 subtests passed === v3: https://lore.kernel.org/qemu-devel/20250121131032.1611245-1-ppandit@redhat.com/T/#t * This series (v3) passes all existing 'tests/qtest/migration/*' tests and adds a new one to enable multifd channels with postcopy migration. v2: https://lore.kernel.org/qemu-devel/20241129122256.96778-1-ppandit@redhat.com/T/#u * This series (v2) further refactors the 'ram_save_target_page' function to make it independent of the multifd & postcopy change. v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u * This series removes magic value (4-bytes) introduced in the previous series for the Postcopy channel. v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u * Currently Multifd and Postcopy migration can not be used together. QEMU shows "Postcopy is not yet compatible with multifd" message. When migrating guests with large (100's GB) RAM, Multifd threads help to accelerate migration, but inability to use it with the Postcopy mode delays guest start up on the destination side. * This patch series allows to enable both Multifd and Postcopy migration together. Precopy and Multifd threads work during the initial guest (RAM) transfer. When migration moves to the Postcopy phase, Multifd threads are restrained and the Postcopy threads start to request pages from the source side. * This series introduces magic value (4-bytes) to be sent on the Postcopy channel. It helps to differentiate channels and properly setup incoming connections on the destination side. Thank you. --- Prasad Pandit (5): migration/multifd: move macros to multifd header migration: refactor ram_save_target_page functions migration: enable multifd and postcopy together tests/qtest/migration: add postcopy tests with multifd tests/qtest/migration: consolidate set capabilities migration/migration.c | 106 ++++++++++++++-------- migration/multifd-nocomp.c | 3 +- migration/multifd.c | 5 - migration/multifd.h | 5 + migration/options.c | 5 - migration/ram.c | 69 ++++---------- tests/qtest/migration/compression-tests.c | 18 +++- tests/qtest/migration/cpr-tests.c | 4 +- tests/qtest/migration/file-tests.c | 44 +++------ tests/qtest/migration/framework.c | 58 ++++++++++-- tests/qtest/migration/framework.h | 4 +- tests/qtest/migration/postcopy-tests.c | 27 +++++- tests/qtest/migration/precopy-tests.c | 38 +++++--- tests/qtest/migration/tls-tests.c | 45 ++++++++- 14 files changed, 262 insertions(+), 169 deletions(-) -- 2.48.1 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v5 1/5] migration/multifd: move macros to multifd header 2025-02-05 12:27 [PATCH v5 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit @ 2025-02-05 12:27 ` Prasad Pandit 2025-02-06 22:41 ` Peter Xu 2025-02-05 12:27 ` [PATCH v5 2/5] migration: refactor ram_save_target_page functions Prasad Pandit ` (3 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-05 12:27 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Move MULTIFD_ macros to the header file so that they are accessible from other source files. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- migration/multifd.c | 5 ----- migration/multifd.h | 5 +++++ 2 files changed, 5 insertions(+), 5 deletions(-) v4: no change - https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t diff --git a/migration/multifd.c b/migration/multifd.c index ab73d6d984..97ce831775 100644 --- a/migration/multifd.c +++ b/migration/multifd.c @@ -33,11 +33,6 @@ #include "io/channel-socket.h" #include "yank_functions.h" -/* Multiple fd's */ - -#define MULTIFD_MAGIC 0x11223344U -#define MULTIFD_VERSION 1 - typedef struct { uint32_t magic; uint32_t version; diff --git a/migration/multifd.h b/migration/multifd.h index bd785b9873..10149af654 100644 --- a/migration/multifd.h +++ b/migration/multifd.h @@ -49,6 +49,11 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset); bool multifd_recv(void); MultiFDRecvData *multifd_get_recv_data(void); +/* Multiple fd's */ + +#define MULTIFD_MAGIC 0x11223344U +#define MULTIFD_VERSION 1 + /* Multifd Compression flags */ #define MULTIFD_FLAG_SYNC (1 << 0) -- 2.48.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v5 1/5] migration/multifd: move macros to multifd header 2025-02-05 12:27 ` [PATCH v5 1/5] migration/multifd: move macros to multifd header Prasad Pandit @ 2025-02-06 22:41 ` Peter Xu 0 siblings, 0 replies; 23+ messages in thread From: Peter Xu @ 2025-02-06 22:41 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Wed, Feb 05, 2025 at 05:57:08PM +0530, Prasad Pandit wrote: > From: Prasad Pandit <pjp@fedoraproject.org> > > Move MULTIFD_ macros to the header file so that > they are accessible from other source files. > > Reviewed-by: Fabiano Rosas <farosas@suse.de> > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Peter Xu <peterx@redhat.com> -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v5 2/5] migration: refactor ram_save_target_page functions 2025-02-05 12:27 [PATCH v5 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 1/5] migration/multifd: move macros to multifd header Prasad Pandit @ 2025-02-05 12:27 ` Prasad Pandit 2025-02-06 22:43 ` Peter Xu 2025-02-05 12:27 ` [PATCH v5 3/5] migration: enable multifd and postcopy together Prasad Pandit ` (2 subsequent siblings) 4 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-05 12:27 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Refactor ram_save_target_page legacy and multifd functions into one. Other than simplifying it, it frees 'migration_ops' object from usage, so it is expunged. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- migration/ram.c | 67 +++++++++++++------------------------------------ 1 file changed, 17 insertions(+), 50 deletions(-) v4: no change - https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t diff --git a/migration/ram.c b/migration/ram.c index ce28328141..f2326788de 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -446,13 +446,6 @@ void ram_transferred_add(uint64_t bytes) } } -struct MigrationOps { - int (*ram_save_target_page)(RAMState *rs, PageSearchStatus *pss); -}; -typedef struct MigrationOps MigrationOps; - -MigrationOps *migration_ops; - static int ram_save_host_page_urgent(PageSearchStatus *pss); /* NOTE: page is the PFN not real ram_addr_t. */ @@ -1958,55 +1951,36 @@ int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len, } /** - * ram_save_target_page_legacy: save one target page - * - * Returns the number of pages written + * ram_save_target_page: save one target page to the precopy thread + * OR to multifd workers. * * @rs: current RAM state * @pss: data about the page we want to send */ -static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) +static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) { ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; int res; + if (!migrate_multifd() + || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { + if (save_zero_page(rs, pss, offset)) { + return 1; + } + } + + if (migrate_multifd()) { + RAMBlock *block = pss->block; + return ram_save_multifd_page(block, offset); + } + if (control_save_page(pss, offset, &res)) { return res; } - if (save_zero_page(rs, pss, offset)) { - return 1; - } - return ram_save_page(rs, pss); } -/** - * ram_save_target_page_multifd: send one target page to multifd workers - * - * Returns 1 if the page was queued, -1 otherwise. - * - * @rs: current RAM state - * @pss: data about the page we want to send - */ -static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss) -{ - RAMBlock *block = pss->block; - ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; - - /* - * While using multifd live migration, we still need to handle zero - * page checking on the migration main thread. - */ - if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { - if (save_zero_page(rs, pss, offset)) { - return 1; - } - } - - return ram_save_multifd_page(block, offset); -} - /* Should be called before sending a host page */ static void pss_host_page_prepare(PageSearchStatus *pss) { @@ -2093,7 +2067,7 @@ static int ram_save_host_page_urgent(PageSearchStatus *pss) if (page_dirty) { /* Be strict to return code; it must be 1, or what else? */ - if (migration_ops->ram_save_target_page(rs, pss) != 1) { + if (ram_save_target_page(rs, pss) != 1) { error_report_once("%s: ram_save_target_page failed", __func__); ret = -1; goto out; @@ -2162,7 +2136,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss) if (preempt_active) { qemu_mutex_unlock(&rs->bitmap_mutex); } - tmppages = migration_ops->ram_save_target_page(rs, pss); + tmppages = ram_save_target_page(rs, pss); if (tmppages >= 0) { pages += tmppages; /* @@ -2360,8 +2334,6 @@ static void ram_save_cleanup(void *opaque) xbzrle_cleanup(); multifd_ram_save_cleanup(); ram_state_cleanup(rsp); - g_free(migration_ops); - migration_ops = NULL; } static void ram_state_reset(RAMState *rs) @@ -3027,13 +2999,8 @@ static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp) return ret; } - migration_ops = g_malloc0(sizeof(MigrationOps)); - if (migrate_multifd()) { multifd_ram_save_setup(); - migration_ops->ram_save_target_page = ram_save_target_page_multifd; - } else { - migration_ops->ram_save_target_page = ram_save_target_page_legacy; } /* -- 2.48.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v5 2/5] migration: refactor ram_save_target_page functions 2025-02-05 12:27 ` [PATCH v5 2/5] migration: refactor ram_save_target_page functions Prasad Pandit @ 2025-02-06 22:43 ` Peter Xu 2025-02-07 12:19 ` Fabiano Rosas 0 siblings, 1 reply; 23+ messages in thread From: Peter Xu @ 2025-02-06 22:43 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Wed, Feb 05, 2025 at 05:57:09PM +0530, Prasad Pandit wrote: > From: Prasad Pandit <pjp@fedoraproject.org> > > Refactor ram_save_target_page legacy and multifd > functions into one. Other than simplifying it, > it frees 'migration_ops' object from usage, so it > is expunged. > > Reviewed-by: Fabiano Rosas <farosas@suse.de> > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Peter Xu <peterx@redhat.com> One nitpick below: [...] > -static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss) > +static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) > { > ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS; > int res; > > + if (!migrate_multifd() > + || migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) { > + if (save_zero_page(rs, pss, offset)) { > + return 1; > + } > + } > + > + if (migrate_multifd()) { > + RAMBlock *block = pss->block; > + return ram_save_multifd_page(block, offset); Can drop the var here : return ram_save_multifd_page(pss->block, offset); -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 2/5] migration: refactor ram_save_target_page functions 2025-02-06 22:43 ` Peter Xu @ 2025-02-07 12:19 ` Fabiano Rosas 0 siblings, 0 replies; 23+ messages in thread From: Fabiano Rosas @ 2025-02-07 12:19 UTC (permalink / raw) To: Peter Xu, Prasad Pandit; +Cc: qemu-devel, berrange, Prasad Pandit Peter Xu <peterx@redhat.com> writes: > On Wed, Feb 05, 2025 at 05:57:09PM +0530, Prasad Pandit wrote: >> From: Prasad Pandit <pjp@fedoraproject.org> >> >> Refactor ram_save_target_page legacy and multifd >> functions into one. Other than simplifying it, >> it frees 'migration_ops' object from usage, so it >> is expunged. >> >> Reviewed-by: Fabiano Rosas <farosas@suse.de> >> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> > > Reviewed-by: Peter Xu <peterx@redhat.com> > > One nitpick below: Too late, this one went in the last PR. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-05 12:27 [PATCH v5 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 1/5] migration/multifd: move macros to multifd header Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 2/5] migration: refactor ram_save_target_page functions Prasad Pandit @ 2025-02-05 12:27 ` Prasad Pandit 2025-02-06 23:16 ` Peter Xu 2025-02-05 12:27 ` [PATCH v5 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit 4 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-05 12:27 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Enable Multifd and Postcopy migration together. The migration_ioc_process_incoming() routine checks magic value sent on each channel and helps to properly setup multifd and postcopy channels. The Precopy and Multifd threads work during the initial guest RAM transfer. When migration moves to the Postcopy phase, the multifd threads are restrained and Postcopy threads on the destination request/pull data from the source side. Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- migration/migration.c | 106 +++++++++++++++++++++++-------------- migration/multifd-nocomp.c | 3 +- migration/options.c | 5 -- migration/ram.c | 4 +- 4 files changed, 70 insertions(+), 48 deletions(-) v4: no change - https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t diff --git a/migration/migration.c b/migration/migration.c index 2d1da917c7..a280722e9e 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -92,6 +92,9 @@ enum mig_rp_message_type { MIG_RP_MSG_MAX }; +/* Migration channel types */ +enum { CH_DEFAULT, CH_MULTIFD, CH_POSTCOPY }; + /* When we add fault tolerance, we could have several migrations at once. For now we don't need to add dynamic creation of migration */ @@ -929,26 +932,33 @@ void migration_fd_process_incoming(QEMUFile *f) /* * Returns true when we want to start a new incoming migration process, * false otherwise. + * + * All the required channels must be in place before a new incoming + * migration process starts. + * - Multifd enabled: + * The main channel and the multifd channels are required. + * - Multifd/Postcopy disabled: + * The main channel is required. + * - Postcopy enabled: + * We don't want to start a new incoming migration when + * the postcopy channel is created. Because it is created + * towards the end of the precopy migration. + * */ -static bool migration_should_start_incoming(bool main_channel) +static bool migration_should_start_incoming(uint8_t channel) { - /* Multifd doesn't start unless all channels are established */ - if (migrate_multifd()) { - return migration_has_all_channels(); - } + bool ret = false; + + if (channel != CH_POSTCOPY) { + MigrationIncomingState *mis = migration_incoming_get_current(); + ret = mis->from_src_file ? true : false; - /* Preempt channel only starts when the main channel is created */ - if (migrate_postcopy_preempt()) { - return main_channel; + if (ret && migrate_multifd()) { + ret = multifd_recv_all_channels_created(); + } } - /* - * For all the rest types of migration, we should only reach here when - * it's the main channel that's being created, and we should always - * proceed with this channel. - */ - assert(main_channel); - return true; + return ret; } void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) @@ -956,13 +966,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) MigrationIncomingState *mis = migration_incoming_get_current(); Error *local_err = NULL; QEMUFile *f; - bool default_channel = true; uint32_t channel_magic = 0; + uint8_t channel = CH_DEFAULT; int ret = 0; - if (migrate_multifd() && !migrate_mapped_ram() && - !migrate_postcopy_ram() && - qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { + if (!migration_should_start_incoming(channel)) { + if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { /* * With multiple channels, it is possible that we receive channels * out of order on destination side, causing incorrect mapping of @@ -973,42 +982,58 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) * tls handshake while initializing main channel so with tls this * issue is not possible. */ - ret = migration_channel_read_peek(ioc, (void *)&channel_magic, - sizeof(channel_magic), errp); + ret = migration_channel_read_peek(ioc, (void *)&channel_magic, + sizeof(channel_magic), errp); + if (ret != 0) { + return; + } - if (ret != 0) { - return; + if (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)) { + channel = CH_DEFAULT; + } else if (channel_magic == cpu_to_be32(MULTIFD_MAGIC)) { + channel = CH_MULTIFD; + } else if (!mis->from_src_file + && mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { + /* reconnect default channel for postcopy recovery */ + channel = CH_DEFAULT; + } else { + error_report("%s: could not identify channel, unknown magic: %u", + __func__, channel_magic); + return; + } + } else if (mis->from_src_file + && (!strcmp(ioc->name, "migration-tls-incoming") + || !strcmp(ioc->name, "migration-file-incoming"))) { + channel = CH_MULTIFD; } - - default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)); - } else { - default_channel = !mis->from_src_file; + } else if (mis->from_src_file) { // && migrate_postcopy_preempt() + channel = CH_POSTCOPY; } if (multifd_recv_setup(errp) != 0) { return; } - if (default_channel) { + if (channel == CH_DEFAULT) { f = qemu_file_new_input(ioc); migration_incoming_setup(f); - } else { + } else if (channel == CH_MULTIFD) { /* Multiple connections */ - assert(migration_needs_multiple_sockets()); if (migrate_multifd()) { multifd_recv_new_channel(ioc, &local_err); - } else { - assert(migrate_postcopy_preempt()); - f = qemu_file_new_input(ioc); - postcopy_preempt_new_channel(mis, f); } if (local_err) { error_propagate(errp, local_err); return; } + } else if (channel == CH_POSTCOPY) { + assert(migrate_postcopy_preempt()); + assert(!mis->postcopy_qemufile_dst); + f = qemu_file_new_input(ioc); + postcopy_preempt_new_channel(mis, f); } - if (migration_should_start_incoming(default_channel)) { + if (migration_should_start_incoming(channel)) { /* If it's a recovery, we're done */ if (postcopy_try_recover()) { return; @@ -1025,21 +1050,22 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) */ bool migration_has_all_channels(void) { + bool ret = false; MigrationIncomingState *mis = migration_incoming_get_current(); if (!mis->from_src_file) { - return false; + return ret; } if (migrate_multifd()) { - return multifd_recv_all_channels_created(); + ret = multifd_recv_all_channels_created(); } - if (migrate_postcopy_preempt()) { - return mis->postcopy_qemufile_dst != NULL; + if (ret && migrate_postcopy_preempt()) { + ret = mis->postcopy_qemufile_dst != NULL; } - return true; + return ret; } int migrate_send_rp_switchover_ack(MigrationIncomingState *mis) diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c index 1325dba97c..d0edec7cd1 100644 --- a/migration/multifd-nocomp.c +++ b/migration/multifd-nocomp.c @@ -16,6 +16,7 @@ #include "file.h" #include "multifd.h" #include "options.h" +#include "migration.h" #include "qapi/error.h" #include "qemu/cutils.h" #include "qemu/error-report.h" @@ -391,7 +392,7 @@ int multifd_ram_flush_and_sync(QEMUFile *f) MultiFDSyncReq req; int ret; - if (!migrate_multifd()) { + if (!migrate_multifd() || migration_in_postcopy()) { return 0; } diff --git a/migration/options.c b/migration/options.c index b8d5300326..8c878dea49 100644 --- a/migration/options.c +++ b/migration/options.c @@ -479,11 +479,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp) error_setg(errp, "Postcopy is not compatible with ignore-shared"); return false; } - - if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) { - error_setg(errp, "Postcopy is not yet compatible with multifd"); - return false; - } } if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) { diff --git a/migration/ram.c b/migration/ram.c index f2326788de..bdba7abe73 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1295,7 +1295,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) pss->page = 0; pss->block = QLIST_NEXT_RCU(pss->block, next); if (!pss->block) { - if (multifd_ram_sync_per_round()) { + if (multifd_ram_sync_per_round() && !migration_in_postcopy()) { QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; int ret = multifd_ram_flush_and_sync(f); if (ret < 0) { @@ -1969,7 +1969,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) } } - if (migrate_multifd()) { + if (migrate_multifd() && !migration_in_postcopy()) { RAMBlock *block = pss->block; return ram_save_multifd_page(block, offset); } -- 2.48.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-05 12:27 ` [PATCH v5 3/5] migration: enable multifd and postcopy together Prasad Pandit @ 2025-02-06 23:16 ` Peter Xu 2025-02-07 10:32 ` Prasad Pandit 0 siblings, 1 reply; 23+ messages in thread From: Peter Xu @ 2025-02-06 23:16 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Wed, Feb 05, 2025 at 05:57:10PM +0530, Prasad Pandit wrote: > From: Prasad Pandit <pjp@fedoraproject.org> > > Enable Multifd and Postcopy migration together. > The migration_ioc_process_incoming() routine > checks magic value sent on each channel and > helps to properly setup multifd and postcopy > channels. > > The Precopy and Multifd threads work during the > initial guest RAM transfer. When migration moves > to the Postcopy phase, the multifd threads are > restrained and Postcopy threads on the destination > request/pull data from the source side. > > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> > --- > migration/migration.c | 106 +++++++++++++++++++++++-------------- > migration/multifd-nocomp.c | 3 +- > migration/options.c | 5 -- > migration/ram.c | 4 +- > 4 files changed, 70 insertions(+), 48 deletions(-) > > v4: no change > - https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t > > diff --git a/migration/migration.c b/migration/migration.c > index 2d1da917c7..a280722e9e 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -92,6 +92,9 @@ enum mig_rp_message_type { > MIG_RP_MSG_MAX > }; > > +/* Migration channel types */ > +enum { CH_DEFAULT, CH_MULTIFD, CH_POSTCOPY }; Maybe s/DEFAULT/MAIN/? > + > /* When we add fault tolerance, we could have several > migrations at once. For now we don't need to add > dynamic creation of migration */ > @@ -929,26 +932,33 @@ void migration_fd_process_incoming(QEMUFile *f) > /* > * Returns true when we want to start a new incoming migration process, > * false otherwise. > + * > + * All the required channels must be in place before a new incoming > + * migration process starts. > + * - Multifd enabled: > + * The main channel and the multifd channels are required. > + * - Multifd/Postcopy disabled: > + * The main channel is required. > + * - Postcopy enabled: > + * We don't want to start a new incoming migration when > + * the postcopy channel is created. Because it is created > + * towards the end of the precopy migration. > + * > */ > -static bool migration_should_start_incoming(bool main_channel) > +static bool migration_should_start_incoming(uint8_t channel) > { > - /* Multifd doesn't start unless all channels are established */ > - if (migrate_multifd()) { > - return migration_has_all_channels(); > - } > + bool ret = false; > + > + if (channel != CH_POSTCOPY) { > + MigrationIncomingState *mis = migration_incoming_get_current(); > + ret = mis->from_src_file ? true : false; > > - /* Preempt channel only starts when the main channel is created */ > - if (migrate_postcopy_preempt()) { > - return main_channel; > + if (ret && migrate_multifd()) { > + ret = multifd_recv_all_channels_created(); > + } > } > > - /* > - * For all the rest types of migration, we should only reach here when > - * it's the main channel that's being created, and we should always > - * proceed with this channel. > - */ > - assert(main_channel); > - return true; > + return ret; > } > > void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) > @@ -956,13 +966,12 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) > MigrationIncomingState *mis = migration_incoming_get_current(); > Error *local_err = NULL; > QEMUFile *f; > - bool default_channel = true; > uint32_t channel_magic = 0; > + uint8_t channel = CH_DEFAULT; > int ret = 0; > > - if (migrate_multifd() && !migrate_mapped_ram() && > - !migrate_postcopy_ram() && > - qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { > + if (!migration_should_start_incoming(channel)) { This says "if we assume this is the main channel, and if we shouldn't start incoming migration, then we should peek at the buffers". Could you help explain? > + if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { > /* > * With multiple channels, it is possible that we receive channels > * out of order on destination side, causing incorrect mapping of > @@ -973,42 +982,58 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) > * tls handshake while initializing main channel so with tls this > * issue is not possible. > */ > - ret = migration_channel_read_peek(ioc, (void *)&channel_magic, > - sizeof(channel_magic), errp); > + ret = migration_channel_read_peek(ioc, (void *)&channel_magic, > + sizeof(channel_magic), errp); > + if (ret != 0) { > + return; > + } > > - if (ret != 0) { > - return; > + if (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)) { > + channel = CH_DEFAULT; > + } else if (channel_magic == cpu_to_be32(MULTIFD_MAGIC)) { > + channel = CH_MULTIFD; > + } else if (!mis->from_src_file > + && mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { > + /* reconnect default channel for postcopy recovery */ > + channel = CH_DEFAULT; This is still in the big "peek buffer" if condition. IMHO we can skip peeking buffer when postcopy paused, because in this stage the channel must be (1) main channel first, then (2) preempt channel next. > + } else { > + error_report("%s: could not identify channel, unknown magic: %u", > + __func__, channel_magic); > + return; > + } > + } else if (mis->from_src_file > + && (!strcmp(ioc->name, "migration-tls-incoming") > + || !strcmp(ioc->name, "migration-file-incoming"))) { > + channel = CH_MULTIFD; Confused here too. Why do we need to check ioc name? Shouldn't multifd has the headers? > } > - > - default_channel = (channel_magic == cpu_to_be32(QEMU_VM_FILE_MAGIC)); > - } else { > - default_channel = !mis->from_src_file; > + } else if (mis->from_src_file) { // && migrate_postcopy_preempt() > + channel = CH_POSTCOPY; > } > > if (multifd_recv_setup(errp) != 0) { > return; > } > > - if (default_channel) { > + if (channel == CH_DEFAULT) { > f = qemu_file_new_input(ioc); > migration_incoming_setup(f); > - } else { > + } else if (channel == CH_MULTIFD) { > /* Multiple connections */ > - assert(migration_needs_multiple_sockets()); Could I ask why removal? > if (migrate_multifd()) { > multifd_recv_new_channel(ioc, &local_err); > - } else { > - assert(migrate_postcopy_preempt()); > - f = qemu_file_new_input(ioc); > - postcopy_preempt_new_channel(mis, f); > } > if (local_err) { > error_propagate(errp, local_err); > return; > } > + } else if (channel == CH_POSTCOPY) { > + assert(migrate_postcopy_preempt()); > + assert(!mis->postcopy_qemufile_dst); > + f = qemu_file_new_input(ioc); > + postcopy_preempt_new_channel(mis, f); > } > > - if (migration_should_start_incoming(default_channel)) { > + if (migration_should_start_incoming(channel)) { > /* If it's a recovery, we're done */ > if (postcopy_try_recover()) { > return; > @@ -1025,21 +1050,22 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp) > */ > bool migration_has_all_channels(void) > { > + bool ret = false; > MigrationIncomingState *mis = migration_incoming_get_current(); > > if (!mis->from_src_file) { > - return false; > + return ret; > } > > if (migrate_multifd()) { > - return multifd_recv_all_channels_created(); > + ret = multifd_recv_all_channels_created(); > } > > - if (migrate_postcopy_preempt()) { > - return mis->postcopy_qemufile_dst != NULL; > + if (ret && migrate_postcopy_preempt()) { It might be better to avoid such "ret && XXX" nested check. E.g. do you think below easier to read? diff --git a/migration/migration.c b/migration/migration.c index 74c50cc72c..9eb2f3fdeb 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -1064,12 +1064,14 @@ bool migration_has_all_channels(void) return false; } - if (migrate_multifd()) { - return multifd_recv_all_channels_created(); + if (migrate_multifd() && + !multifd_recv_all_channels_created()) { + return false; } - if (migrate_postcopy_preempt()) { - return mis->postcopy_qemufile_dst != NULL; + if (migrate_postcopy_preempt() && + mis->postcopy_qemufile_dst == NULL) { + return false; } return true; > + ret = mis->postcopy_qemufile_dst != NULL; > } > > - return true; > + return ret; > } > > int migrate_send_rp_switchover_ack(MigrationIncomingState *mis) > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c > index 1325dba97c..d0edec7cd1 100644 > --- a/migration/multifd-nocomp.c > +++ b/migration/multifd-nocomp.c > @@ -16,6 +16,7 @@ > #include "file.h" > #include "multifd.h" > #include "options.h" > +#include "migration.h" > #include "qapi/error.h" > #include "qemu/cutils.h" > #include "qemu/error-report.h" > @@ -391,7 +392,7 @@ int multifd_ram_flush_and_sync(QEMUFile *f) > MultiFDSyncReq req; > int ret; > > - if (!migrate_multifd()) { > + if (!migrate_multifd() || migration_in_postcopy()) { > return 0; > } [1] > > diff --git a/migration/options.c b/migration/options.c > index b8d5300326..8c878dea49 100644 > --- a/migration/options.c > +++ b/migration/options.c > @@ -479,11 +479,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp) > error_setg(errp, "Postcopy is not compatible with ignore-shared"); > return false; > } > - > - if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) { > - error_setg(errp, "Postcopy is not yet compatible with multifd"); > - return false; > - } > } > > if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) { > diff --git a/migration/ram.c b/migration/ram.c > index f2326788de..bdba7abe73 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -1295,7 +1295,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) > pss->page = 0; > pss->block = QLIST_NEXT_RCU(pss->block, next); > if (!pss->block) { > - if (multifd_ram_sync_per_round()) { > + if (multifd_ram_sync_per_round() && !migration_in_postcopy()) { If you have above[1], why need this? > QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel; > int ret = multifd_ram_flush_and_sync(f); > if (ret < 0) { > @@ -1969,7 +1969,7 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss) > } > } > > - if (migrate_multifd()) { > + if (migrate_multifd() && !migration_in_postcopy()) { > RAMBlock *block = pss->block; > return ram_save_multifd_page(block, offset); > } > -- > 2.48.1 > This patch still did nothing for multifd in postcopy_start(). I'm not sure it's safe. What happens if some multifd pages were sent, then we start postcopy, dest vcpu threads running, then during postcopy some multifd pages finally arrived and modifying the guest pages during vcpus running? Thanks, -- Peter Xu ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-06 23:16 ` Peter Xu @ 2025-02-07 10:32 ` Prasad Pandit 2025-02-07 15:45 ` Peter Xu 0 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-07 10:32 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit Hi, On Fri, 7 Feb 2025 at 04:46, Peter Xu <peterx@redhat.com> wrote: > > +/* Migration channel types */ > > +enum { CH_DEFAULT, CH_MULTIFD, CH_POSTCOPY }; > > Maybe s/DEFAULT/MAIN/? * Okay. > > - if (migrate_multifd() && !migrate_mapped_ram() && > > - !migrate_postcopy_ram() && > > - qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { > > + if (!migration_should_start_incoming(channel)) { > > This says "if we assume this is the main channel, and if we shouldn't start > incoming migration, then we should peek at the buffers". > Could you help explain? * New migration starts only when the main channel and if 'multifd' is enabled all multifd channels are established. So, if 'main' and 'multifd' channels are _not_ established then migration should _not_ start. And in that case, incoming connection is likely for one of those channels and so we should peek at the buffers, because both 'main' and 'multifd' channels send magic values. * migration_should_start_incoming() function returns 'true' only when 'main' and 'multifd' channels are being established. For 'postcopy' channel it returns false. > > + } else if (!mis->from_src_file > > + && mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { > > + /* reconnect default channel for postcopy recovery */ > > + channel = CH_DEFAULT; > > This is still in the big "peek buffer" if condition. > IMHO we can skip peeking buffer when postcopy paused, because in this stage > the channel must be (1) main channel first, then (2) preempt channel next. * It is in the big 'peek buffer' condition because the 'main' channel (= CH_DEFAULT) is being established here. Ideally, all channels should send magic values to be consistent. The 'main' channel sends magic value when it is established before starting migration, but the same 'main' channel does not send magic value when it is established during postcopy recovery, that is an inconsistency (a bug) here. Ideal fix is to send a magic value every time the 'main' channel is established, irrespective of when it is established. * Adding conditionals to check if it is _POSTCOPY_PAUSED state then don't peek will only lead to complicated 'if' conditionals. This channel handling code is already complex and non-intuitive enough. > > + } else if (mis->from_src_file > > + && (!strcmp(ioc->name, "migration-tls-incoming") > > + || !strcmp(ioc->name, "migration-file-incoming"))) { > > + channel = CH_MULTIFD; > > Confused here too. Why do we need to check ioc name? Shouldn't multifd has > the headers? * Because they are not 'multifd' channels, tls/file channels don't send magic values, but are still handled by 'multifd_recv_new_channel()' function. === ... if (default_channel) { migration_incoming_setup(f); } else { if (migrate_multifd()) { multifd_recv_new_channel(ioc, &local_err); } else { postcopy_preempt_new_channel(mis, f); } === In the code above, if 'default_channel==false' and multifd() is enabled, all incoming connections are handled by 'multifd_recv_new_channel()', irrespective of whether it is a 'multifd' channel or not. While creating multifd channels, there is no check for channel type like: if(channel == CH_MULTIFD). * IMHO, if we make all channels behave with consistency, ie. either they all send magic value or none sends magic value, that'll simplify this code a lot. > > - assert(migration_needs_multiple_sockets()); > Could I ask why removal? * Because that function returns migrate_multifd() => migrate_multifd() || migrate_postcopy_preempt(); * And the very following check is also migrate_multifd(), as below: > > if (migrate_multifd()) { > > multifd_recv_new_channel(ioc, &local_err); > It might be better to avoid such "ret && XXX" nested check. E.g. do you > think below easier to read? > > diff --git a/migration/migration.c b/migration/migration.c > index 74c50cc72c..9eb2f3fdeb 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1064,12 +1064,14 @@ bool migration_has_all_channels(void) > return false; > } > > - if (migrate_multifd()) { > - return multifd_recv_all_channels_created(); > + if (migrate_multifd() && > + !multifd_recv_all_channels_created()) { > + return false; > } > > - if (migrate_postcopy_preempt()) { > - return mis->postcopy_qemufile_dst != NULL; > + if (migrate_postcopy_preempt() && > + mis->postcopy_qemufile_dst == NULL) { > + return false; > } > > return true; * Will try it. > > - if (!migrate_multifd()) { > > + if (!migrate_multifd() || migration_in_postcopy()) { > > return 0; > > } > > [1] > > > > > if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) { > > diff --git a/migration/ram.c b/migration/ram.c > > index f2326788de..bdba7abe73 100644 > > --- a/migration/ram.c > > +++ b/migration/ram.c > > @@ -1295,7 +1295,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) > > pss->page = 0; > > pss->block = QLIST_NEXT_RCU(pss->block, next); > > if (!pss->block) { > > - if (multifd_ram_sync_per_round()) { > > + if (multifd_ram_sync_per_round() && !migration_in_postcopy()) { > > If you have above[1], why need this? * True, I tried with just [1] above first, but it was failing for some reason. Will try again. > This patch still did nothing for multifd in postcopy_start(). I'm not sure > it's safe. > > What happens if some multifd pages were sent, then we start postcopy, dest > vcpu threads running, then during postcopy some multifd pages finally > arrived and modifying the guest pages during vcpus running? * ram_save_target_page() function saves multifd pages only when (..!migration_in_postcopy()) not in postcopy mode. Case of 'multifd' page arriving late on destination and 'postcopy' starting before that is strange, because if multifd page is getting late, that network latency should affect 'postcopy' channel too, no? But still if it is possible, do we want to call - multifd_ram_flush_and_sync() before postcopy_start()? Will that help? I'll check if/how it works. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-07 10:32 ` Prasad Pandit @ 2025-02-07 15:45 ` Peter Xu 2025-02-08 10:36 ` Prasad Pandit 0 siblings, 1 reply; 23+ messages in thread From: Peter Xu @ 2025-02-07 15:45 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Fri, Feb 07, 2025 at 04:02:44PM +0530, Prasad Pandit wrote: > Hi, > > On Fri, 7 Feb 2025 at 04:46, Peter Xu <peterx@redhat.com> wrote: > > > +/* Migration channel types */ > > > +enum { CH_DEFAULT, CH_MULTIFD, CH_POSTCOPY }; > > > > Maybe s/DEFAULT/MAIN/? > > * Okay. > > > > - if (migrate_multifd() && !migrate_mapped_ram() && [b] > > > - !migrate_postcopy_ram() && > > > - qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { > > > + if (!migration_should_start_incoming(channel)) { > > > > This says "if we assume this is the main channel, and if we shouldn't start > > incoming migration, then we should peek at the buffers". > > Could you help explain? > > * New migration starts only when the main channel and if 'multifd' is > enabled all multifd channels are established. So, if 'main' and > 'multifd' channels are _not_ established then migration should _not_ > start. And in that case, incoming connection is likely for one of > those channels and so we should peek at the buffers, because both > 'main' and 'multifd' channels send magic values. > > * migration_should_start_incoming() function returns 'true' only when > 'main' and 'multifd' channels are being established. For 'postcopy' > channel it returns false. This is not easy to follow neither with the current name, nor that you "assumed this is main channel" and test it. I think you may want to split migration_has_all_channels() into migration_has_essential_channels() which only covers main and multifd cases. Then you can check if (!has_esential) here. You'd better also add a comment that all "essential channels" can be peeked. You may also want to bypass a few things, e.g. "postcopy paused stage" here rather than inside, because postcopy-recover only happens: - First with a main channel, that is not peekable as no header when resume - Then with preempt channel, that is also not peekable [a] You may also need to keep the mapped-ram check. They also don't support peek. > > > > > + } else if (!mis->from_src_file > > > + && mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) { > > > + /* reconnect default channel for postcopy recovery */ > > > + channel = CH_DEFAULT; > > > > This is still in the big "peek buffer" if condition. > > IMHO we can skip peeking buffer when postcopy paused, because in this stage > > the channel must be (1) main channel first, then (2) preempt channel next. > > * It is in the big 'peek buffer' condition because the 'main' channel > (= CH_DEFAULT) is being established here. Ideally, all channels should > send magic values to be consistent. The 'main' channel sends magic > value when it is established before starting migration, but the same > 'main' channel does not send magic value when it is established during > postcopy recovery, that is an inconsistency (a bug) here. Ideal fix is For a reconnection we could do better to define a header format indeed for such extensions. I can't say it's a bug. > to send a magic value every time the 'main' channel is established, > irrespective of when it is established. > > * Adding conditionals to check if it is _POSTCOPY_PAUSED state then > don't peek will only lead to complicated 'if' conditionals. This > channel handling code is already complex and non-intuitive enough. Please see above [a]. > > > > + } else if (mis->from_src_file > > > + && (!strcmp(ioc->name, "migration-tls-incoming") > > > + || !strcmp(ioc->name, "migration-file-incoming"))) { > > > + channel = CH_MULTIFD; > > > > Confused here too. Why do we need to check ioc name? Shouldn't multifd has > > the headers? > > * Because they are not 'multifd' channels, tls/file channels don't > send magic values, but are still handled by It might be because you have a bug where you removed mapped-ram check at [b] above. I think we need to keep it. Why TLS channels don't send magic? > 'multifd_recv_new_channel()' function. > === > ... > if (default_channel) { > migration_incoming_setup(f); > } else { > if (migrate_multifd()) { > multifd_recv_new_channel(ioc, &local_err); > } else { > postcopy_preempt_new_channel(mis, f); > } > === > In the code above, if 'default_channel==false' and multifd() is > enabled, all incoming connections are handled by > 'multifd_recv_new_channel()', irrespective of whether it is a > 'multifd' channel or not. While creating multifd channels, there is no > check for channel type like: if(channel == CH_MULTIFD). > > * IMHO, if we make all channels behave with consistency, ie. either > they all send magic value or none sends magic value, that'll simplify > this code a lot. > > > > - assert(migration_needs_multiple_sockets()); > > Could I ask why removal? > > * Because that function returns migrate_multifd() => > migrate_multifd() || migrate_postcopy_preempt(); > * And the very following check is also migrate_multifd(), as below: > > > > if (migrate_multifd()) { > > > multifd_recv_new_channel(ioc, &local_err); > > > > It might be better to avoid such "ret && XXX" nested check. E.g. do you > > think below easier to read? > > > > diff --git a/migration/migration.c b/migration/migration.c > > index 74c50cc72c..9eb2f3fdeb 100644 > > --- a/migration/migration.c > > +++ b/migration/migration.c > > @@ -1064,12 +1064,14 @@ bool migration_has_all_channels(void) > > return false; > > } > > > > - if (migrate_multifd()) { > > - return multifd_recv_all_channels_created(); > > + if (migrate_multifd() && > > + !multifd_recv_all_channels_created()) { > > + return false; > > } > > > > - if (migrate_postcopy_preempt()) { > > - return mis->postcopy_qemufile_dst != NULL; > > + if (migrate_postcopy_preempt() && > > + mis->postcopy_qemufile_dst == NULL) { > > + return false; > > } > > > > return true; > > * Will try it. > > > > - if (!migrate_multifd()) { > > > + if (!migrate_multifd() || migration_in_postcopy()) { > > > return 0; > > > } > > > > [1] > > > > > > > > if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) { > > > diff --git a/migration/ram.c b/migration/ram.c > > > index f2326788de..bdba7abe73 100644 > > > --- a/migration/ram.c > > > +++ b/migration/ram.c > > > @@ -1295,7 +1295,7 @@ static int find_dirty_block(RAMState *rs, PageSearchStatus *pss) > > > pss->page = 0; > > > pss->block = QLIST_NEXT_RCU(pss->block, next); > > > if (!pss->block) { > > > - if (multifd_ram_sync_per_round()) { > > > + if (multifd_ram_sync_per_round() && !migration_in_postcopy()) { > > > > If you have above[1], why need this? > > * True, I tried with just [1] above first, but it was failing for some > reason. Will try again. Another approach (cleaner at least to me..) is we check in_postcopy in multifd_ram_sync_per_*() functions. > > > This patch still did nothing for multifd in postcopy_start(). I'm not sure > > it's safe. > > > > What happens if some multifd pages were sent, then we start postcopy, dest > > vcpu threads running, then during postcopy some multifd pages finally > > arrived and modifying the guest pages during vcpus running? > > * ram_save_target_page() function saves multifd pages only when > (..!migration_in_postcopy()) not in postcopy mode. Case of 'multifd' > page arriving late on destination and 'postcopy' starting before that > is strange, because if multifd page is getting late, that network > latency should affect 'postcopy' channel too, no? But still if it is I don't think so. postcopy doesn't use any multifd channels. > possible, do we want to call - multifd_ram_flush_and_sync() before > postcopy_start()? Will that help? I'll check if/how it works. Note that all things flushed may or may not be enough, because IIUC the flush only makes sure all threads are synced. It may not make sure the order of things to happen in multifd threads and postcopy thread. The latter is what we need - we need to make sure no page land in postcopy threads. That's why I was requesting to add an assert() in multifd recv thread to make sure we will never receive a page during postcopy. This part is the most important change of the whole series, please take your time to understand the workflow and let's make sure it won't happen. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-07 15:45 ` Peter Xu @ 2025-02-08 10:36 ` Prasad Pandit 2025-02-10 16:59 ` Peter Xu 0 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-08 10:36 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit Hello Peter, On Fri, 7 Feb 2025 at 21:16, Peter Xu <peterx@redhat.com> wrote: > This is not easy to follow neither with the current name, nor that you > "assumed this is main channel" and test it. * It is not my doing, nor is there any assumption, but that is how current implementation works. === static bool migration_should_start_incoming(bool main_channel) { /* Multifd doesn't start unless all channels are established */ if (migrate_multifd()) { return migration_has_all_channels(); } /* Preempt channel only starts when the main channel is created */ if (migrate_postcopy_preempt()) { return main_channel; } /* * For all the rest types of migration, we should only reach here when * it's the main channel that's being created, and we should always * proceed with this channel. */ assert(main_channel); return true; } === * Above code returns 'true' for 'multifd' and 'main_channel' cases. When migrate_postcopy_preempt() is true, main_channel is 'false', so it returns false. All I have done is reused the migration_should_start_incoming() function to simplify the 'if' conditional at the top. > I think you may want to split > migration_has_all_channels() into migration_has_essential_channels() which > only covers main and multifd cases. Then you can check if (!has_esential) > here. You'd better also add a comment that all "essential channels" can be > peeked. > > You may also want to bypass a few things, e.g. "postcopy paused stage" here > rather than inside, because postcopy-recover only happens: > > - First with a main channel, that is not peekable as no header when resume > - Then with preempt channel, that is also not peekable > > [a] > > You may also need to keep the mapped-ram check. They also don't support > peek. * Instead of adding specific conditions and splitting functions, my request is, let's work towards consistent channel behaviour that will automatically simplify these conditions and channel handling. Maybe we can do that in a subsequent series. > > > > > > + } else if (mis->from_src_file > > > > + && (!strcmp(ioc->name, "migration-tls-incoming") > > > > + || !strcmp(ioc->name, "migration-file-incoming"))) { > > > > + channel = CH_MULTIFD; > > > > > > Confused here too. Why do we need to check ioc name? Shouldn't multifd has > > > the headers? > > > > * Because they are not 'multifd' channels, tls/file channels don't > > send magic values, but are still handled by > > It might be because you have a bug where you removed mapped-ram check at > [b] above. I think we need to keep it. * ie. Because I removed the mapped-ram check, so tls/file channels are handled by multifd_recv_new_channel()? No, that's not the case. Rather, that is how it works currently. I have not changed anything, only made it more explicit to see that when it is tls/file channel, handle it as a CH_MULTIFD type. Looking at the current code, one can not see clearly how tls/file channels are handled. > Why TLS channels don't send magic? * Probably because they do TLS hand-shake while establishing a connection? > > because if multifd page is getting late, that network > > latency should affect 'postcopy' channel too, no? But still if it is > > I don't think so. postcopy doesn't use any multifd channels. * Yes, but it uses the same wire/network. > > possible, do we want to call - multifd_ram_flush_and_sync() before > > postcopy_start()? Will that help? I'll check if/how it works. > > Note that all things flushed may or may not be enough, because IIUC the > flush only makes sure all threads are synced. * We are again using 'flush' and 'sync' interchangeably. What does - flush only makes sure all threads are synced - mean really? Is it not writing all socket data onto the wire/channel? * Flush should write all socket data onto the network wire/channel. The _order_ in which threads flush/write their socket data onto the wire/channel is to synchronise them, maintaining/controlling that _order_ is not flush. > It may not make sure the order of things to happen in multifd threads and postcopy thread. The > latter is what we need - we need to make sure no page land in postcopy threads. * Let's see: 1) When migration is in Postcopy mode, ram_save_multifd_page() is _not_ called on the source side. ie. no multifd data gets enqueued on the multifd queue. 1.1) multifd_queue_page() function also calls multifd_send() if the queue is full, before enqueueing new pages. 2) If a multifd page reaches the destination during Postcopy mode, it must have been sent/written on the multifd channel before Postcopy mode started, right? 3) In this case, writing/flushing all multifd socket data onto the wire/channel, before calling postcopy_start() function should help IIUC. 3.1) ie. calling multifd_send() before postcopy_start() should help to clear the multifd queue, before Postcopy begins. 3.2) Same can be done by - multifd_ram_flush_and_sync() -> multifd_send() - sequence. * If all multifd pages are sent/written/flushed onto the multifd channels before postcopy_start() is called, then multifd pages should not arrive at the destination after postcopy starts IIUC. If that is happening, we need a reproducer for such a case. Do we have such a reproducer? > That's why I was requesting to add an assert() in multifd recv thread to > make sure we will never receive a page during postcopy. * ie. Add assert(!migrate_in_postcopy()) in multifd_recv_thread() function? Okay. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-08 10:36 ` Prasad Pandit @ 2025-02-10 16:59 ` Peter Xu 2025-02-11 9:04 ` Prasad Pandit 0 siblings, 1 reply; 23+ messages in thread From: Peter Xu @ 2025-02-10 16:59 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Sat, Feb 08, 2025 at 04:06:56PM +0530, Prasad Pandit wrote: > Hello Peter, > > On Fri, 7 Feb 2025 at 21:16, Peter Xu <peterx@redhat.com> wrote: > > This is not easy to follow neither with the current name, nor that you > > "assumed this is main channel" and test it. > > * It is not my doing, nor is there any assumption, but that is how > current implementation works. > === > static bool migration_should_start_incoming(bool main_channel) > { > /* Multifd doesn't start unless all channels are established */ > if (migrate_multifd()) { > return migration_has_all_channels(); > } > > /* Preempt channel only starts when the main channel is created */ > if (migrate_postcopy_preempt()) { > return main_channel; > } > > /* > * For all the rest types of migration, we should only reach here when > * it's the main channel that's being created, and we should always > * proceed with this channel. > */ > assert(main_channel); > return true; > } > === > * Above code returns 'true' for 'multifd' and 'main_channel' cases. > When migrate_postcopy_preempt() is true, main_channel is 'false', so > it returns false. All I have done is reused the > migration_should_start_incoming() function to simplify the 'if' > conditional at the top. Yes, and I suggest a rename or introduce a new helper, per previous reply. > > > I think you may want to split > > migration_has_all_channels() into migration_has_essential_channels() which > > only covers main and multifd cases. Then you can check if (!has_esential) > > here. You'd better also add a comment that all "essential channels" can be > > peeked. > > > > You may also want to bypass a few things, e.g. "postcopy paused stage" here > > rather than inside, because postcopy-recover only happens: > > > > - First with a main channel, that is not peekable as no header when resume > > - Then with preempt channel, that is also not peekable > > > > [a] > > > > You may also need to keep the mapped-ram check. They also don't support > > peek. > > * Instead of adding specific conditions and splitting functions, my > request is, let's work towards consistent channel behaviour that will > automatically simplify these conditions and channel handling. Maybe we > can do that in a subsequent series. I didn't follow, sorry - do you mean this patch is correct on dropping the mapped-ram check? I don't yet understand how it can work if without. > > > > > > > > > + } else if (mis->from_src_file > > > > > + && (!strcmp(ioc->name, "migration-tls-incoming") > > > > > + || !strcmp(ioc->name, "migration-file-incoming"))) { > > > > > + channel = CH_MULTIFD; > > > > > > > > Confused here too. Why do we need to check ioc name? Shouldn't multifd has > > > > the headers? > > > > > > * Because they are not 'multifd' channels, tls/file channels don't > > > send magic values, but are still handled by > > > > It might be because you have a bug where you removed mapped-ram check at > > [b] above. I think we need to keep it. > > * ie. Because I removed the mapped-ram check, so tls/file channels are > handled by multifd_recv_new_channel()? No, that's not the case. > Rather, that is how it works currently. I have not changed anything, > only made it more explicit to see that when it is tls/file channel, > handle it as a CH_MULTIFD type. Looking at the current code, one can > not see clearly how tls/file channels are handled. > > > Why TLS channels don't send magic? > > * Probably because they do TLS hand-shake while establishing a connection? I meant tls channels should have these magics too. Do you mean they're not? > > > > because if multifd page is getting late, that network > > > latency should affect 'postcopy' channel too, no? But still if it is > > > > I don't think so. postcopy doesn't use any multifd channels. > > * Yes, but it uses the same wire/network. > > > > possible, do we want to call - multifd_ram_flush_and_sync() before > > > postcopy_start()? Will that help? I'll check if/how it works. > > > > Note that all things flushed may or may not be enough, because IIUC the > > flush only makes sure all threads are synced. > > * We are again using 'flush' and 'sync' interchangeably. What does - > flush only makes sure all threads are synced - mean really? Is it not > writing all socket data onto the wire/channel? > > * Flush should write all socket data onto the network wire/channel. > The _order_ in which threads flush/write their socket data onto the > wire/channel is to synchronise them, maintaining/controlling that > _order_ is not flush. > > > It may not make sure the order of things to happen in multifd threads and postcopy thread. The > > latter is what we need - we need to make sure no page land in postcopy threads. > > * Let's see: > 1) When migration is in Postcopy mode, ram_save_multifd_page() is > _not_ called on the source side. ie. no multifd data gets enqueued on > the multifd queue. > 1.1) multifd_queue_page() function also calls multifd_send() if > the queue is full, before enqueueing new pages. > 2) If a multifd page reaches the destination during Postcopy mode, > it must have been sent/written on the multifd channel before Postcopy > mode started, right? Yes. > 3) In this case, writing/flushing all multifd socket data onto the > wire/channel, before calling postcopy_start() function should help > IIUC. > 3.1) ie. calling multifd_send() before postcopy_start() should > help to clear the multifd queue, before Postcopy begins. > 3.2) Same can be done by - multifd_ram_flush_and_sync() -> > multifd_send() - sequence. No I don't think so. Flushing sending side makes sure send buffer is empty. It doesn't guarantee recv buffer is empty on the other side. > > * If all multifd pages are sent/written/flushed onto the multifd > channels before postcopy_start() is called, then multifd pages should > not arrive at the destination after postcopy starts IIUC. If that is > happening, we need a reproducer for such a case. Do we have such a > reproducer? With or without a reproducer, we need to at least justify it in theory. If it doesn't even work in theory, it's a problem. > > > That's why I was requesting to add an assert() in multifd recv thread to > > make sure we will never receive a page during postcopy. > > * ie. Add assert(!migrate_in_postcopy()) in multifd_recv_thread() > function? Okay. Yes, probably before multifd_recv_state->ops->recv(). Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-10 16:59 ` Peter Xu @ 2025-02-11 9:04 ` Prasad Pandit 2025-02-11 15:20 ` Peter Xu 0 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-11 9:04 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Mon, 10 Feb 2025 at 22:29, Peter Xu <peterx@redhat.com> wrote: > Yes, and I suggest a rename or introduce a new helper, per previous reply. * Okay, will try it. > I didn't follow, sorry - do you mean this patch is correct on dropping the > mapped-ram check? I don't yet understand how it can work if without. * It goes for channel peek if '!migrate_mapped_ram', ie. when mapped_ram is not set. When it is enabled, likely it just falls into the multifd channel, like other tls/file channels. I'll see if we have to add a check for mapped_ram stream, like tls/file one. > I meant tls channels should have these magics too. Do you mean they're not? * Yes. AFAIU, tls/file channels don't send magic values. > No I don't think so. > Flushing sending side makes sure send buffer is empty. It doesn't > guarantee recv buffer is empty on the other side. * A simple 'flush' operation is not supposed to guarantee reception on the destination side. It is just a 'flush' operation. If we want to _confirm_ whether the pages sent to the destination are received or not, then the destination side should send an 'Acknowledgement' to the source side. Is there such a mechanism in place currently? > > > > * If all multifd pages are sent/written/flushed onto the multifd > > channels before postcopy_start() is called, then multifd pages should > > not arrive at the destination after postcopy starts IIUC. If that is > > happening, we need a reproducer for such a case. Do we have such a > > reproducer? > > With or without a reproducer, we need to at least justify it in theory. If > it doesn't even work in theory, it's a problem. * The theory that both multifd and postcopy channels use the same underlying network wire; And in that multifd pages get delayed, but postcopy pages don't, is not understandable. There must be something else happening in such a case, which a reproducer could help with. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-11 9:04 ` Prasad Pandit @ 2025-02-11 15:20 ` Peter Xu 2025-02-12 13:27 ` Prasad Pandit 0 siblings, 1 reply; 23+ messages in thread From: Peter Xu @ 2025-02-11 15:20 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Tue, Feb 11, 2025 at 02:34:07PM +0530, Prasad Pandit wrote: > On Mon, 10 Feb 2025 at 22:29, Peter Xu <peterx@redhat.com> wrote: > > Yes, and I suggest a rename or introduce a new helper, per previous reply. > > * Okay, will try it. > > > I didn't follow, sorry - do you mean this patch is correct on dropping the > > mapped-ram check? I don't yet understand how it can work if without. > > * It goes for channel peek if '!migrate_mapped_ram', ie. when > mapped_ram is not set. When it is enabled, likely it just falls into > the multifd channel, like other tls/file channels. I'll see if we have > to add a check for mapped_ram stream, like tls/file one. > > > I meant tls channels should have these magics too. Do you mean they're not? > > * Yes. AFAIU, tls/file channels don't send magic values. Please double check whether TLS will send magics. AFAICT, they should. > > > No I don't think so. > > Flushing sending side makes sure send buffer is empty. It doesn't > > guarantee recv buffer is empty on the other side. > > * A simple 'flush' operation is not supposed to guarantee reception on > the destination side. It is just a 'flush' operation. If we want to > _confirm_ whether the pages sent to the destination are received or > not, then the destination side should send an 'Acknowledgement' to the > source side. Is there such a mechanism in place currently? No. We need to figure out a way to do that properly, and that's exactly what I mentioned as one of the core changes we need in this series, which is still missing. We may or may not need an ACK message. Please think about it. > > > > > > > * If all multifd pages are sent/written/flushed onto the multifd > > > channels before postcopy_start() is called, then multifd pages should > > > not arrive at the destination after postcopy starts IIUC. If that is > > > happening, we need a reproducer for such a case. Do we have such a > > > reproducer? > > > > With or without a reproducer, we need to at least justify it in theory. If > > it doesn't even work in theory, it's a problem. > > * The theory that both multifd and postcopy channels use the same > underlying network wire; And in that multifd pages get delayed, but > postcopy pages don't, is not understandable. There must be something > else happening in such a case, which a reproducer could help with. Please consider the case where multifd recv threads may get scheduled out on dest host during precopy phase, not getting chance to be scheduled until postcopy already started running on dst, then the recv thread can stumble upon a page that was sent during precopy. As long as that can be always avoided, I think we should be good. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-11 15:20 ` Peter Xu @ 2025-02-12 13:27 ` Prasad Pandit 2025-02-12 14:37 ` Peter Xu 0 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-12 13:27 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit Hi, On Tue, 11 Feb 2025 at 20:50, Peter Xu <peterx@redhat.com> wrote: > > * Yes. AFAIU, tls/file channels don't send magic values. > Please double check whether TLS will send magics. AFAICT, they should. === * ... Also tls live migration already does * tls handshake while initializing main channel so with tls this * issue is not possible. */ if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { } else if (mis->from_src_file && (!strcmp(ioc->name, "migration-tls-incoming") || !strcmp(ioc->name, "migration-file-incoming"))) { channel = CH_MULTIFD; } === * From the comment and condition above, both 'tls' and 'file' channels are not peekable, ie. no magic value to peek. The 'migration-file-incoming' check also helps to cover the migrate_mapped_ram() case IIUC. > No. We need to figure out a way to do that properly, and that's exactly > what I mentioned as one of the core changes we need in this series, which > is still missing. We may or may not need an ACK message. Please think > about it. * First we tried to call 'multifd_send_shutdown()' to close multifd channels before calling postcopy_start(). That's the best case scenario wherein multifd channels are closed before postcopy starts. So that there's no confusion and/or jumbling of different data packets. It did not work, as qemu would crash during multifd_shutdown(). * Second is we push/flush all multifd pages before calling postcopy_start() and let the multifd channels exist/stay till the migration ends, after that they are duly shutdown. It is working well so far, passing all migration tests too. * Third, if we want to confirm that multifd pages are received on the destination before calling postcopy_start(), then the best way is for the destination to send an acknowledgement to the source side that it has received and processed all multifd pages and nothing is pending on the multifd channels. * Another could be to define a multifd_recv_flush() function, which could process and clear the receive side multifd queue, so that no multifd pages are pending there. Not sure how best to do this yet. Also considering it lacks proper communication and synchronisation between source and destination sides, it does not seem like the best solution. * Do you have any other option/approach in mind? > Please consider the case where multifd recv threads may get scheduled out > on dest host during precopy phase, not getting chance to be scheduled until > postcopy already started running on dst, then the recv thread can stumble > upon a page that was sent during precopy. As long as that can be always > avoided, I think we should be good. * TBH, this sounds like a remote corner case. * I'm testing the revised patch series and will send it shortly. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-12 13:27 ` Prasad Pandit @ 2025-02-12 14:37 ` Peter Xu 2025-02-12 17:36 ` Prasad Pandit 0 siblings, 1 reply; 23+ messages in thread From: Peter Xu @ 2025-02-12 14:37 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Wed, Feb 12, 2025 at 06:57:48PM +0530, Prasad Pandit wrote: > Hi, > > On Tue, 11 Feb 2025 at 20:50, Peter Xu <peterx@redhat.com> wrote: > > > * Yes. AFAIU, tls/file channels don't send magic values. > > Please double check whether TLS will send magics. AFAICT, they should. > === > * ... Also tls live migration already does > * tls handshake while initializing main channel so with tls this > * issue is not possible. > */ > if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_READ_MSG_PEEK)) { > } else if (mis->from_src_file > && (!strcmp(ioc->name, "migration-tls-incoming") > || !strcmp(ioc->name, "migration-file-incoming"))) { > channel = CH_MULTIFD; > } > === > * From the comment and condition above, both 'tls' and 'file' channels > are not peekable, ie. no magic value to peek. The > 'migration-file-incoming' check also helps to cover the > migrate_mapped_ram() case IIUC. I think it's not because TLS channels don't send magic, but TLS channels are not prone to ordering issues. In general, I'm not convinced we need to check names of iochannels. > > > No. We need to figure out a way to do that properly, and that's exactly > > what I mentioned as one of the core changes we need in this series, which > > is still missing. We may or may not need an ACK message. Please think > > about it. > > * First we tried to call 'multifd_send_shutdown()' to close multifd > channels before calling postcopy_start(). That's the best case > scenario wherein multifd channels are closed before postcopy starts. > So that there's no confusion and/or jumbling of different data > packets. It did not work, as qemu would crash during > multifd_shutdown(). Have we debugged the crash? I'm not saying we should go with this, but crash isn't a reason to not choose a design. > > * Second is we push/flush all multifd pages before calling > postcopy_start() and let the multifd channels exist/stay till the > migration ends, after that they are duly shutdown. It is working well > so far, passing all migration tests too. > > * Third, if we want to confirm that multifd pages are received on the > destination before calling postcopy_start(), then the best way is for > the destination to send an acknowledgement to the source side that it > has received and processed all multifd pages and nothing is pending on > the multifd channels. > > * Another could be to define a multifd_recv_flush() function, which > could process and clear the receive side multifd queue, so that no > multifd pages are pending there. Not sure how best to do this yet. > Also considering it lacks proper communication and synchronisation > between source and destination sides, it does not seem like the best > solution. > > * Do you have any other option/approach in mind? > > > Please consider the case where multifd recv threads may get scheduled out > > on dest host during precopy phase, not getting chance to be scheduled until > > postcopy already started running on dst, then the recv thread can stumble > > upon a page that was sent during precopy. As long as that can be always > > avoided, I think we should be good. > > * TBH, this sounds like a remote corner case. No this is not. As I mentioned tons of times.. copying page in socket buffers directly into guest page during vcpus running / postcopy is a very hard to debug issue. If it's possible to happen, the design is flawed. > > * I'm testing the revised patch series and will send it shortly. I don't think passing the unit tests prove the series is correct and should be merged. We need to understand how it work, or we can't merge it. I feel very frustrated multiple times that you seem to want to ignore what I comment. I don't know why you rush to repost things. After a 2nd thought, I think maybe multifd flush and sync could work on src side indeed, because when flush and sync there'll be a message (MULTIFD_FLUSH) on main channel and that should order against the rest postcopy messages that will also be sent on the main channel (if we do multifd flush before all the postcopy processes). Then it should guarantee when postcopy starts on dest, dest multifd recv threads flushed all messages, and no multifd message will arrive anymore. But again we should guard it with an assert() in recv threads if you want to postpone recycling of multifd threads, just to double check no outliers we overlooked. When proposing the patches you need to justify the design first before anything. Please evaluate above, these things are critical to appear in either code comments or commit messages. Please digest everything before reposting. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-12 14:37 ` Peter Xu @ 2025-02-12 17:36 ` Prasad Pandit 2025-02-12 17:52 ` Peter Xu 0 siblings, 1 reply; 23+ messages in thread From: Prasad Pandit @ 2025-02-12 17:36 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit Hi, On Wed, 12 Feb 2025 at 20:08, Peter Xu <peterx@redhat.com> wrote: > I think it's not because TLS channels don't send magic, but TLS channels > are not prone to ordering issues. > In general, I'm not convinced we need to check names of iochannels. * If the channel does not set '_READ_MSG_SEEK' flag, which magic value are we going to read? As for checking names of the channels, it tells the reader how an incoming channel is processed. >> It did not work, as qemu would crash during multifd_shutdown(). > > Have we debugged the crash? I'm not saying we should go with this, but > crash isn't a reason to not choose a design. * Yes, I did try to debug it but couldn't get to a conclusion in time. > No this is not. As I mentioned tons of times.. copying page in socket > buffers directly into guest page during vcpus running / postcopy is a very > hard to debug issue. If it's possible to happen, the design is flawed. * Sure, agreed. So far in my testing it does not seem to happen. The theory of multifd_recv_threads getting scheduled out and causing guest page over-write seems remote to me, but I get the possibility/probability. One possible solution is to have the destination side send an 'acknowledgement' to the source side. > I don't think passing the unit tests prove the series is correct and should > be merged. We need to understand how it work, or we can't merge it. * Well, passing unit tests should confirm that it does not break existing functionality. Is there any metric/limit to such understanding? Everybody understands/sees things differently. Understanding is an ever evolving thing. Saying that merge should happen based on understanding sounds weird to me. > I feel very frustrated multiple times that you seem to want to ignore what > I comment. I don't know why you rush to repost things. * I guess we are seeing/understanding things differently here. > After a 2nd thought, I think maybe multifd flush and sync could work on src > side indeed, because when flush and sync there'll be a message > (MULTIFD_FLUSH) on main channel and that should order against the rest > postcopy messages that will also be sent on the main channel (if we do > multifd flush before all the postcopy processes). Then it should guarantee > when postcopy starts on dest, dest multifd recv threads flushed all > messages, and no multifd message will arrive anymore. * After a 2nd thought - that's evolving understanding right there. :) * To mention here, to flush multifd channels with 'multifd_ram_flush_and_sync()' I tried calling migration_iteration_run -> multifd_ram_flush_and_sync(s->to_dst_file) It does not work, it crashes. Is 's->to_dst_file' a right parameter there? But calling a function below works === /* Send enqueued data pages onto next available multifd channel */ int multifd_send_flush(void) { if (!multifd_payload_empty(multifd_ram_send)) { if (!multifd_send(&multifd_ram_send)) { error_report("%s: multifd_send fail", __func__); return -1; } } return 0; } === migration_iteration_run -> multifd_send_flush() Works, but it is not sending the 'RAM_SAVE_FLAG_MULTIFD_FLUSH' message as done by 'multifd_ram_flush_and_sync()' function. > But again we should guard it with an assert() in recv threads if you want > to postpone recycling of multifd threads, just to double check no outliers > we overlooked. * Yes, the revised patch is working with the assert(!migrate_in_postcopy) in multifd_recv_thread. * I was going to send a revised series with these changes, but will wait on that for now. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 3/5] migration: enable multifd and postcopy together 2025-02-12 17:36 ` Prasad Pandit @ 2025-02-12 17:52 ` Peter Xu 0 siblings, 0 replies; 23+ messages in thread From: Peter Xu @ 2025-02-12 17:52 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Wed, Feb 12, 2025 at 11:06:00PM +0530, Prasad Pandit wrote: > * I was going to send a revised series with these changes, but will > wait on that for now. You were going to send some changes that you don't yet fully understand how it works? How would you do the flush in the new revision if your existing code crashes? Or did you do that in the revised series at all? Sorry, no - I don't think this is how it should or could ever work. I've repeated myself multiple times, I'll stop. -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v5 4/5] tests/qtest/migration: add postcopy tests with multifd 2025-02-05 12:27 [PATCH v5 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit ` (2 preceding siblings ...) 2025-02-05 12:27 ` [PATCH v5 3/5] migration: enable multifd and postcopy together Prasad Pandit @ 2025-02-05 12:27 ` Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit 4 siblings, 0 replies; 23+ messages in thread From: Prasad Pandit @ 2025-02-05 12:27 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Add new postcopy tests to run postcopy migration with multifd channels enabled. Add a boolean fields 'multifd' and 'postcopy_ram' to the MigrateCommon structure. It helps to enable multifd channels and postcopy-ram during migration tests. Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- tests/qtest/migration/compression-tests.c | 13 ++++++++ tests/qtest/migration/framework.c | 13 ++++++++ tests/qtest/migration/framework.h | 4 +++ tests/qtest/migration/postcopy-tests.c | 23 +++++++++++++ tests/qtest/migration/precopy-tests.c | 19 +++++++++++ tests/qtest/migration/tls-tests.c | 40 +++++++++++++++++++++++ 6 files changed, 112 insertions(+) v4: no change - https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t diff --git a/tests/qtest/migration/compression-tests.c b/tests/qtest/migration/compression-tests.c index d78f1f11f1..3252ba2f73 100644 --- a/tests/qtest/migration/compression-tests.c +++ b/tests/qtest/migration/compression-tests.c @@ -39,6 +39,17 @@ static void test_multifd_tcp_zstd(void) }; test_precopy_common(&args); } + +static void test_multifd_postcopy_tcp_zstd(void) +{ + MigrateCommon args = { + .postcopy_ram = true, + .listen_uri = "defer", + .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd, + }; + + test_precopy_common(&args); +} #endif /* CONFIG_ZSTD */ #ifdef CONFIG_QATZIP @@ -158,6 +169,8 @@ void migration_test_add_compression(MigrationTestEnv *env) #ifdef CONFIG_ZSTD migration_test_add("/migration/multifd/tcp/plain/zstd", test_multifd_tcp_zstd); + migration_test_add("/migration/multifd+postcopy/tcp/plain/zstd", + test_multifd_postcopy_tcp_zstd); #endif #ifdef CONFIG_QATZIP diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c index 4550cda129..00776f858c 100644 --- a/tests/qtest/migration/framework.c +++ b/tests/qtest/migration/framework.c @@ -427,6 +427,14 @@ static int migrate_postcopy_prepare(QTestState **from_ptr, migrate_set_capability(to, "postcopy-preempt", true); } + if (args->multifd) { + migrate_set_capability(from, "multifd", true); + migrate_set_capability(to, "multifd", true); + + migrate_set_parameter_int(from, "multifd-channels", 8); + migrate_set_parameter_int(to, "multifd-channels", 8); + } + migrate_ensure_non_converge(from); migrate_prepare_for_dirty_mem(from); @@ -691,6 +699,11 @@ void test_precopy_common(MigrateCommon *args) return; } + if (args->postcopy_ram) { + migrate_set_capability(from, "postcopy-ram", true); + migrate_set_capability(to, "postcopy-ram", true); + } + if (args->start_hook) { data_hook = args->start_hook(from, to); } diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h index 7991ee56b6..214288ca42 100644 --- a/tests/qtest/migration/framework.h +++ b/tests/qtest/migration/framework.h @@ -193,7 +193,11 @@ typedef struct { */ bool live; + /* set multifd on */ + bool multifd; + /* Postcopy specific fields */ + bool postcopy_ram; void *postcopy_data; bool postcopy_preempt; PostcopyRecoveryFailStage postcopy_recovery_fail_stage; diff --git a/tests/qtest/migration/postcopy-tests.c b/tests/qtest/migration/postcopy-tests.c index daf7449f2c..212a5ea600 100644 --- a/tests/qtest/migration/postcopy-tests.c +++ b/tests/qtest/migration/postcopy-tests.c @@ -79,6 +79,25 @@ static void test_postcopy_preempt_recovery(void) test_postcopy_recovery_common(&args); } +static void test_multifd_postcopy(void) +{ + MigrateCommon args = { + .multifd = true, + }; + + test_postcopy_common(&args); +} + +static void test_multifd_postcopy_preempt(void) +{ + MigrateCommon args = { + .multifd = true, + .postcopy_preempt = true, + }; + + test_postcopy_common(&args); +} + void migration_test_add_postcopy(MigrationTestEnv *env) { if (env->has_uffd) { @@ -98,6 +117,10 @@ void migration_test_add_postcopy(MigrationTestEnv *env) "/migration/postcopy/recovery/double-failures/reconnect", test_postcopy_recovery_fail_reconnect); + migration_test_add("/migration/multifd+postcopy/plain", + test_multifd_postcopy); + migration_test_add("/migration/multifd+postcopy/preempt/plain", + test_multifd_postcopy_preempt); if (env->is_x86) { migration_test_add("/migration/postcopy/suspend", test_postcopy_suspend); diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c index 23599b29ee..b1a4e7bbb1 100644 --- a/tests/qtest/migration/precopy-tests.c +++ b/tests/qtest/migration/precopy-tests.c @@ -33,6 +33,7 @@ #define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */ static char *tmpfs; +static bool postcopy_ram = false; static void test_precopy_unix_plain(void) { @@ -472,6 +473,11 @@ static void test_multifd_tcp_cancel(void) migrate_ensure_non_converge(from); migrate_prepare_for_dirty_mem(from); + if (postcopy_ram) { + migrate_set_capability(from, "postcopy-ram", true); + migrate_set_capability(to, "postcopy-ram", true); + } + migrate_set_parameter_int(from, "multifd-channels", 16); migrate_set_parameter_int(to, "multifd-channels", 16); @@ -513,6 +519,10 @@ static void test_multifd_tcp_cancel(void) return; } + if (postcopy_ram) { + migrate_set_capability(to2, "postcopy-ram", true); + } + migrate_set_parameter_int(to2, "multifd-channels", 16); migrate_set_capability(to2, "multifd", true); @@ -536,6 +546,13 @@ static void test_multifd_tcp_cancel(void) migrate_end(from, to2, true); } +static void test_multifd_postcopy_tcp_cancel(void) +{ + postcopy_ram = true; + test_multifd_tcp_cancel(); + postcopy_ram = false; +} + static void calc_dirty_rate(QTestState *who, uint64_t calc_time) { qtest_qmp_assert_success(who, @@ -999,6 +1016,8 @@ void migration_test_add_precopy(MigrationTestEnv *env) test_multifd_tcp_no_zero_page); migration_test_add("/migration/multifd/tcp/plain/cancel", test_multifd_tcp_cancel); + migration_test_add("migration/multifd+postcopy/tcp/plain/cancel", + test_multifd_postcopy_tcp_cancel); if (g_str_equal(env->arch, "x86_64") && env->has_kvm && env->has_dirty_ring) { diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c index 5704a1f992..094dc1d814 100644 --- a/tests/qtest/migration/tls-tests.c +++ b/tests/qtest/migration/tls-tests.c @@ -393,6 +393,17 @@ static void test_postcopy_recovery_tls_psk(void) test_postcopy_recovery_common(&args); } +static void test_multifd_postcopy_recovery_tls_psk(void) +{ + MigrateCommon args = { + .start_hook = migrate_hook_start_tls_psk_match, + .end_hook = migrate_hook_end_tls_psk, + .multifd = true, + }; + + test_postcopy_recovery_common(&args); +} + /* This contains preempt+recovery+tls test altogether */ static void test_postcopy_preempt_all(void) { @@ -405,6 +416,17 @@ static void test_postcopy_preempt_all(void) test_postcopy_recovery_common(&args); } +static void test_multifd_postcopy_preempt_recovery_tls_psk(void) +{ + MigrateCommon args = { + .start_hook = migrate_hook_start_tls_psk_match, + .end_hook = migrate_hook_end_tls_psk, + .multifd = true, + }; + + test_postcopy_recovery_common(&args); +} + static void test_precopy_unix_tls_psk(void) { g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); @@ -649,6 +671,18 @@ static void test_multifd_tcp_tls_psk_mismatch(void) test_precopy_common(&args); } +static void test_multifd_postcopy_tcp_tls_psk_match(void) +{ + MigrateCommon args = { + .multifd = true, + .listen_uri = "defer", + .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match, + .end_hook = migrate_hook_end_tls_psk, + }; + + test_precopy_common(&args); +} + #ifdef CONFIG_TASN1 static void test_multifd_tcp_tls_x509_default_host(void) { @@ -743,6 +777,10 @@ void migration_test_add_tls(MigrationTestEnv *env) test_postcopy_preempt_tls_psk); migration_test_add("/migration/postcopy/preempt/recovery/tls/psk", test_postcopy_preempt_all); + migration_test_add("/migration/multifd+postcopy/recovery/tls/psk", + test_multifd_postcopy_recovery_tls_psk); + migration_test_add("/migration/multifd+postcopy/preempt/recovery/tls/psk", + test_multifd_postcopy_preempt_recovery_tls_psk); } #ifdef CONFIG_TASN1 migration_test_add("/migration/precopy/unix/tls/x509/default-host", @@ -776,6 +814,8 @@ void migration_test_add_tls(MigrationTestEnv *env) test_multifd_tcp_tls_psk_match); migration_test_add("/migration/multifd/tcp/tls/psk/mismatch", test_multifd_tcp_tls_psk_mismatch); + migration_test_add("/migration/multifd+postcopy/tcp/tls/psk/match", + test_multifd_postcopy_tcp_tls_psk_match); #ifdef CONFIG_TASN1 migration_test_add("/migration/multifd/tcp/tls/x509/default-host", test_multifd_tcp_tls_x509_default_host); -- 2.48.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities 2025-02-05 12:27 [PATCH v5 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit ` (3 preceding siblings ...) 2025-02-05 12:27 ` [PATCH v5 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit @ 2025-02-05 12:27 ` Prasad Pandit 2025-02-06 22:44 ` Peter Xu 2025-02-07 22:01 ` Fabiano Rosas 4 siblings, 2 replies; 23+ messages in thread From: Prasad Pandit @ 2025-02-05 12:27 UTC (permalink / raw) To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit From: Prasad Pandit <pjp@fedoraproject.org> Migration capabilities are set in multiple '.start_hook' functions for various tests. Instead, consolidate setting capabilities in 'set_migration_capabilities()' function which is called from various 'test_*_common()' functions. While simplifying the capabilities setting, it helps to declutter the test sources. Suggested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> --- tests/qtest/migration/compression-tests.c | 7 +-- tests/qtest/migration/cpr-tests.c | 4 +- tests/qtest/migration/file-tests.c | 44 +++++----------- tests/qtest/migration/framework.c | 63 ++++++++++++++++------- tests/qtest/migration/framework.h | 8 ++- tests/qtest/migration/postcopy-tests.c | 10 ++-- tests/qtest/migration/precopy-tests.c | 19 +++---- tests/qtest/migration/tls-tests.c | 11 ++-- 8 files changed, 79 insertions(+), 87 deletions(-) diff --git a/tests/qtest/migration/compression-tests.c b/tests/qtest/migration/compression-tests.c index 3252ba2f73..13a2b2d74f 100644 --- a/tests/qtest/migration/compression-tests.c +++ b/tests/qtest/migration/compression-tests.c @@ -43,7 +43,7 @@ static void test_multifd_tcp_zstd(void) static void test_multifd_postcopy_tcp_zstd(void) { MigrateCommon args = { - .postcopy_ram = true, + .caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true, .listen_uri = "defer", .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd, }; @@ -114,10 +114,6 @@ migrate_hook_start_xbzrle(QTestState *from, QTestState *to) { migrate_set_parameter_int(from, "xbzrle-cache-size", 33554432); - - migrate_set_capability(from, "xbzrle", true); - migrate_set_capability(to, "xbzrle", true); - return NULL; } @@ -129,6 +125,7 @@ static void test_precopy_unix_xbzrle(void) .listen_uri = uri, .start_hook = migrate_hook_start_xbzrle, .iterations = 2, + .caps[MIGRATION_CAPABILITY_XBZRLE] = true, /* * XBZRLE needs pages to be modified when doing the 2nd+ round * iteration to have real data pushed to the stream. diff --git a/tests/qtest/migration/cpr-tests.c b/tests/qtest/migration/cpr-tests.c index 44ce89aa5b..818fa95133 100644 --- a/tests/qtest/migration/cpr-tests.c +++ b/tests/qtest/migration/cpr-tests.c @@ -24,9 +24,6 @@ static void *migrate_hook_start_mode_reboot(QTestState *from, QTestState *to) migrate_set_parameter_str(from, "mode", "cpr-reboot"); migrate_set_parameter_str(to, "mode", "cpr-reboot"); - migrate_set_capability(from, "x-ignore-shared", true); - migrate_set_capability(to, "x-ignore-shared", true); - return NULL; } @@ -39,6 +36,7 @@ static void test_mode_reboot(void) .connect_uri = uri, .listen_uri = "defer", .start_hook = migrate_hook_start_mode_reboot, + .caps[MIGRATION_CAPABILITY_X_IGNORE_SHARED] = true, }; test_file_common(&args, true); diff --git a/tests/qtest/migration/file-tests.c b/tests/qtest/migration/file-tests.c index 84225c8c33..bc551949f9 100644 --- a/tests/qtest/migration/file-tests.c +++ b/tests/qtest/migration/file-tests.c @@ -107,15 +107,6 @@ static void test_precopy_file_offset_bad(void) test_file_common(&args, false); } -static void *migrate_hook_start_mapped_ram(QTestState *from, - QTestState *to) -{ - migrate_set_capability(from, "mapped-ram", true); - migrate_set_capability(to, "mapped-ram", true); - - return NULL; -} - static void test_precopy_file_mapped_ram_live(void) { g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, @@ -123,7 +114,7 @@ static void test_precopy_file_mapped_ram_live(void) MigrateCommon args = { .connect_uri = uri, .listen_uri = "defer", - .start_hook = migrate_hook_start_mapped_ram, + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, }; test_file_common(&args, false); @@ -136,26 +127,12 @@ static void test_precopy_file_mapped_ram(void) MigrateCommon args = { .connect_uri = uri, .listen_uri = "defer", - .start_hook = migrate_hook_start_mapped_ram, + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, }; test_file_common(&args, true); } -static void *migrate_hook_start_multifd_mapped_ram(QTestState *from, - QTestState *to) -{ - migrate_hook_start_mapped_ram(from, to); - - migrate_set_parameter_int(from, "multifd-channels", 4); - migrate_set_parameter_int(to, "multifd-channels", 4); - - migrate_set_capability(from, "multifd", true); - migrate_set_capability(to, "multifd", true); - - return NULL; -} - static void test_multifd_file_mapped_ram_live(void) { g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, @@ -163,7 +140,8 @@ static void test_multifd_file_mapped_ram_live(void) MigrateCommon args = { .connect_uri = uri, .listen_uri = "defer", - .start_hook = migrate_hook_start_multifd_mapped_ram, + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; test_file_common(&args, false); @@ -176,7 +154,8 @@ static void test_multifd_file_mapped_ram(void) MigrateCommon args = { .connect_uri = uri, .listen_uri = "defer", - .start_hook = migrate_hook_start_multifd_mapped_ram, + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; test_file_common(&args, true); @@ -185,8 +164,6 @@ static void test_multifd_file_mapped_ram(void) static void *migrate_hook_start_multifd_mapped_ram_dio(QTestState *from, QTestState *to) { - migrate_hook_start_multifd_mapped_ram(from, to); - migrate_set_parameter_bool(from, "direct-io", true); migrate_set_parameter_bool(to, "direct-io", true); @@ -201,6 +178,8 @@ static void test_multifd_file_mapped_ram_dio(void) .connect_uri = uri, .listen_uri = "defer", .start_hook = migrate_hook_start_multifd_mapped_ram_dio, + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; if (!probe_o_direct_support(tmpfs)) { @@ -246,7 +225,6 @@ static void *migrate_hook_start_multifd_mapped_ram_fdset_dio(QTestState *from, fdset_add_fds(from, file, O_WRONLY, 2, true); fdset_add_fds(to, file, O_RDONLY, 2, true); - migrate_hook_start_multifd_mapped_ram(from, to); migrate_set_parameter_bool(from, "direct-io", true); migrate_set_parameter_bool(to, "direct-io", true); @@ -261,8 +239,6 @@ static void *migrate_hook_start_multifd_mapped_ram_fdset(QTestState *from, fdset_add_fds(from, file, O_WRONLY, 2, false); fdset_add_fds(to, file, O_RDONLY, 2, false); - migrate_hook_start_multifd_mapped_ram(from, to); - return NULL; } @@ -275,6 +251,8 @@ static void test_multifd_file_mapped_ram_fdset(void) .listen_uri = "defer", .start_hook = migrate_hook_start_multifd_mapped_ram_fdset, .end_hook = migrate_hook_end_multifd_mapped_ram_fdset, + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; test_file_common(&args, true); @@ -289,6 +267,8 @@ static void test_multifd_file_mapped_ram_fdset_dio(void) .listen_uri = "defer", .start_hook = migrate_hook_start_multifd_mapped_ram_fdset_dio, .end_hook = migrate_hook_end_multifd_mapped_ram_fdset, + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; if (!probe_o_direct_support(tmpfs)) { diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c index 00776f858c..a858b6ffec 100644 --- a/tests/qtest/migration/framework.c +++ b/tests/qtest/migration/framework.c @@ -204,6 +204,30 @@ static QList *migrate_start_get_qmp_capabilities(const MigrateStart *args) return capabilities; } +static void set_migration_capabilities(QTestState *from, + QTestState *to, MigrateCommon *args) +{ + /* + * MigrationCapability_lookup and MIGRATION_CAPABILITY_ constants + * are from qapi-types-migration.h. + */ + for(uint8_t i = 0; i < MIGRATION_CAPABILITY__MAX; i++) + { + if(args->caps[i]) { + if (from) { + migrate_set_capability(from, + MigrationCapability_lookup.array[i], true); + } + if (to) { + migrate_set_capability(to, + MigrationCapability_lookup.array[i], true); + } + } + } + + return; +} + int migrate_start(QTestState **from, QTestState **to, const char *uri, MigrateStart *args) { @@ -418,19 +442,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr, args->postcopy_data = args->start_hook(from, to); } - migrate_set_capability(from, "postcopy-ram", true); - migrate_set_capability(to, "postcopy-ram", true); - migrate_set_capability(to, "postcopy-blocktime", true); - - if (args->postcopy_preempt) { - migrate_set_capability(from, "postcopy-preempt", true); - migrate_set_capability(to, "postcopy-preempt", true); - } - - if (args->multifd) { - migrate_set_capability(from, "multifd", true); - migrate_set_capability(to, "multifd", true); - + set_migration_capabilities(from, to, args); + if (args->caps[MIGRATION_CAPABILITY_MULTIFD]) { migrate_set_parameter_int(from, "multifd-channels", 8); migrate_set_parameter_int(to, "multifd-channels", 8); } @@ -491,6 +504,10 @@ void test_postcopy_common(MigrateCommon *args) { QTestState *from, *to; + /* set postcopy capabilities */ + args->caps[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME] = true; + args->caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true; + if (migrate_postcopy_prepare(&from, &to, args)) { return; } @@ -631,6 +648,10 @@ void test_postcopy_recovery_common(MigrateCommon *args) /* Always hide errors for postcopy recover tests since they're expected */ args->start.hide_stderr = true; + /* set postcopy capabilities */ + args->caps[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME] = true; + args->caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true; + if (migrate_postcopy_prepare(&from, &to, args)) { return; } @@ -699,9 +720,10 @@ void test_precopy_common(MigrateCommon *args) return; } - if (args->postcopy_ram) { - migrate_set_capability(from, "postcopy-ram", true); - migrate_set_capability(to, "postcopy-ram", true); + set_migration_capabilities(from, to, args); + if (args->caps[MIGRATION_CAPABILITY_MULTIFD]) { + migrate_set_parameter_int(from, "multifd-channels", 4); + migrate_set_parameter_int(to, "multifd-channels", 4); } if (args->start_hook) { @@ -847,6 +869,12 @@ void test_file_common(MigrateCommon *args, bool stop_src) */ g_assert_false(args->live); + set_migration_capabilities(from, to, args); + if (args->caps[MIGRATION_CAPABILITY_MULTIFD]) { + migrate_set_parameter_int(from, "multifd-channels", 4); + migrate_set_parameter_int(to, "multifd-channels", 4); + } + if (g_strrstr(args->connect_uri, "offset=")) { check_offset = true; /* @@ -913,9 +941,6 @@ void *migrate_hook_start_precopy_tcp_multifd_common(QTestState *from, migrate_set_parameter_str(from, "multifd-compression", method); migrate_set_parameter_str(to, "multifd-compression", method); - migrate_set_capability(from, "multifd", true); - migrate_set_capability(to, "multifd", true); - /* Start incoming migration from the 1st socket */ migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}"); diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h index 214288ca42..cc1ea4d82b 100644 --- a/tests/qtest/migration/framework.h +++ b/tests/qtest/migration/framework.h @@ -12,6 +12,7 @@ #define TEST_FRAMEWORK_H #include "libqtest.h" +#include <qapi/qapi-types-migration.h> #define FILE_TEST_FILENAME "migfile" #define FILE_TEST_OFFSET 0x1000 @@ -193,14 +194,11 @@ typedef struct { */ bool live; - /* set multifd on */ - bool multifd; - /* Postcopy specific fields */ - bool postcopy_ram; void *postcopy_data; - bool postcopy_preempt; PostcopyRecoveryFailStage postcopy_recovery_fail_stage; + + bool caps[MIGRATION_CAPABILITY__MAX]; } MigrateCommon; void wait_for_serial(const char *side); diff --git a/tests/qtest/migration/postcopy-tests.c b/tests/qtest/migration/postcopy-tests.c index 212a5ea600..fa8815175f 100644 --- a/tests/qtest/migration/postcopy-tests.c +++ b/tests/qtest/migration/postcopy-tests.c @@ -39,7 +39,7 @@ static void test_postcopy_suspend(void) static void test_postcopy_preempt(void) { MigrateCommon args = { - .postcopy_preempt = true, + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, }; test_postcopy_common(&args); @@ -73,7 +73,7 @@ static void test_postcopy_recovery_fail_reconnect(void) static void test_postcopy_preempt_recovery(void) { MigrateCommon args = { - .postcopy_preempt = true, + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, }; test_postcopy_recovery_common(&args); @@ -82,7 +82,7 @@ static void test_postcopy_preempt_recovery(void) static void test_multifd_postcopy(void) { MigrateCommon args = { - .multifd = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; test_postcopy_common(&args); @@ -91,8 +91,8 @@ static void test_multifd_postcopy(void) static void test_multifd_postcopy_preempt(void) { MigrateCommon args = { - .multifd = true, - .postcopy_preempt = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, }; test_postcopy_common(&args); diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c index b1a4e7bbb1..80c8f2a9d2 100644 --- a/tests/qtest/migration/precopy-tests.c +++ b/tests/qtest/migration/precopy-tests.c @@ -108,23 +108,12 @@ static void test_precopy_tcp_plain(void) test_precopy_common(&args); } -static void *migrate_hook_start_switchover_ack(QTestState *from, QTestState *to) -{ - - migrate_set_capability(from, "return-path", true); - migrate_set_capability(to, "return-path", true); - - migrate_set_capability(from, "switchover-ack", true); - migrate_set_capability(to, "switchover-ack", true); - - return NULL; -} - static void test_precopy_tcp_switchover_ack(void) { MigrateCommon args = { .listen_uri = "tcp:127.0.0.1:0", - .start_hook = migrate_hook_start_switchover_ack, + .caps[MIGRATION_CAPABILITY_RETURN_PATH] = true, + .caps[MIGRATION_CAPABILITY_SWITCHOVER_ACK] = true, /* * Source VM must be running in order to consider the switchover ACK * when deciding to do switchover or not. @@ -393,6 +382,7 @@ static void test_multifd_tcp_uri_none(void) MigrateCommon args = { .listen_uri = "defer", .start_hook = migrate_hook_start_precopy_tcp_multifd, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, /* * Multifd is more complicated than most of the features, it * directly takes guest page buffers when sending, make sure @@ -408,6 +398,7 @@ static void test_multifd_tcp_zero_page_legacy(void) MigrateCommon args = { .listen_uri = "defer", .start_hook = migrate_hook_start_precopy_tcp_multifd_zero_page_legacy, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, /* * Multifd is more complicated than most of the features, it * directly takes guest page buffers when sending, make sure @@ -423,6 +414,7 @@ static void test_multifd_tcp_no_zero_page(void) MigrateCommon args = { .listen_uri = "defer", .start_hook = migrate_hook_start_precopy_tcp_multifd_no_zero_page, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, /* * Multifd is more complicated than most of the features, it * directly takes guest page buffers when sending, make sure @@ -439,6 +431,7 @@ static void test_multifd_tcp_channels_none(void) .listen_uri = "defer", .start_hook = migrate_hook_start_precopy_tcp_multifd, .live = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, .connect_channels = ("[ { 'channel-type': 'main'," " 'addr': { 'transport': 'socket'," " 'type': 'inet'," diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c index 094dc1d814..429fd797c0 100644 --- a/tests/qtest/migration/tls-tests.c +++ b/tests/qtest/migration/tls-tests.c @@ -375,9 +375,9 @@ static void test_postcopy_tls_psk(void) static void test_postcopy_preempt_tls_psk(void) { MigrateCommon args = { - .postcopy_preempt = true, .start_hook = migrate_hook_start_tls_psk_match, .end_hook = migrate_hook_end_tls_psk, + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, }; test_postcopy_common(&args); @@ -398,7 +398,7 @@ static void test_multifd_postcopy_recovery_tls_psk(void) MigrateCommon args = { .start_hook = migrate_hook_start_tls_psk_match, .end_hook = migrate_hook_end_tls_psk, - .multifd = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; test_postcopy_recovery_common(&args); @@ -408,9 +408,9 @@ static void test_multifd_postcopy_recovery_tls_psk(void) static void test_postcopy_preempt_all(void) { MigrateCommon args = { - .postcopy_preempt = true, .start_hook = migrate_hook_start_tls_psk_match, .end_hook = migrate_hook_end_tls_psk, + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, }; test_postcopy_recovery_common(&args); @@ -421,7 +421,7 @@ static void test_multifd_postcopy_preempt_recovery_tls_psk(void) MigrateCommon args = { .start_hook = migrate_hook_start_tls_psk_match, .end_hook = migrate_hook_end_tls_psk, - .multifd = true, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; test_postcopy_recovery_common(&args); @@ -674,8 +674,8 @@ static void test_multifd_tcp_tls_psk_mismatch(void) static void test_multifd_postcopy_tcp_tls_psk_match(void) { MigrateCommon args = { - .multifd = true, .listen_uri = "defer", + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match, .end_hook = migrate_hook_end_tls_psk, }; @@ -727,6 +727,7 @@ static void test_multifd_tcp_tls_x509_mismatch_host(void) .start_hook = migrate_hook_start_multifd_tls_x509_mismatch_host, .end_hook = migrate_hook_end_tls_x509, .result = MIG_TEST_FAIL, + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, }; test_precopy_common(&args); } -- 2.48.1 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities 2025-02-05 12:27 ` [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit @ 2025-02-06 22:44 ` Peter Xu 2025-02-07 8:59 ` Prasad Pandit 2025-02-07 22:01 ` Fabiano Rosas 1 sibling, 1 reply; 23+ messages in thread From: Peter Xu @ 2025-02-06 22:44 UTC (permalink / raw) To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Wed, Feb 05, 2025 at 05:57:12PM +0530, Prasad Pandit wrote: > From: Prasad Pandit <pjp@fedoraproject.org> > > Migration capabilities are set in multiple '.start_hook' > functions for various tests. Instead, consolidate setting > capabilities in 'set_migration_capabilities()' function > which is called from various 'test_*_common()' functions. > While simplifying the capabilities setting, it helps > to declutter the test sources. > > Suggested-by: Fabiano Rosas <farosas@suse.de> > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Would you mind reorder the two test patches, to avoid removing the lines added by previous patch? Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities 2025-02-06 22:44 ` Peter Xu @ 2025-02-07 8:59 ` Prasad Pandit 0 siblings, 0 replies; 23+ messages in thread From: Prasad Pandit @ 2025-02-07 8:59 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit On Fri, 7 Feb 2025 at 04:14, Peter Xu <peterx@redhat.com> wrote: > Would you mind reorder the two test patches, to avoid removing the lines > added by previous patch? * Both ways they are the same in the end, no? Anyway, will do. Thank you. --- - Prasad ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities 2025-02-05 12:27 ` [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit 2025-02-06 22:44 ` Peter Xu @ 2025-02-07 22:01 ` Fabiano Rosas 1 sibling, 0 replies; 23+ messages in thread From: Fabiano Rosas @ 2025-02-07 22:01 UTC (permalink / raw) To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit Prasad Pandit <ppandit@redhat.com> writes: > From: Prasad Pandit <pjp@fedoraproject.org> > > Migration capabilities are set in multiple '.start_hook' > functions for various tests. Instead, consolidate setting > capabilities in 'set_migration_capabilities()' function > which is called from various 'test_*_common()' functions. > While simplifying the capabilities setting, it helps > to declutter the test sources. > > Suggested-by: Fabiano Rosas <farosas@suse.de> > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> > --- > tests/qtest/migration/compression-tests.c | 7 +-- > tests/qtest/migration/cpr-tests.c | 4 +- > tests/qtest/migration/file-tests.c | 44 +++++----------- > tests/qtest/migration/framework.c | 63 ++++++++++++++++------- > tests/qtest/migration/framework.h | 8 ++- > tests/qtest/migration/postcopy-tests.c | 10 ++-- > tests/qtest/migration/precopy-tests.c | 19 +++---- Isn't there a 16 channel multifd setup in this file? I don't see it in this patch. > tests/qtest/migration/tls-tests.c | 11 ++-- > 8 files changed, 79 insertions(+), 87 deletions(-) > > diff --git a/tests/qtest/migration/compression-tests.c b/tests/qtest/migration/compression-tests.c > index 3252ba2f73..13a2b2d74f 100644 > --- a/tests/qtest/migration/compression-tests.c > +++ b/tests/qtest/migration/compression-tests.c > @@ -43,7 +43,7 @@ static void test_multifd_tcp_zstd(void) > static void test_multifd_postcopy_tcp_zstd(void) > { > MigrateCommon args = { > - .postcopy_ram = true, > + .caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true, > .listen_uri = "defer", > .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd, > }; > @@ -114,10 +114,6 @@ migrate_hook_start_xbzrle(QTestState *from, > QTestState *to) > { > migrate_set_parameter_int(from, "xbzrle-cache-size", 33554432); > - > - migrate_set_capability(from, "xbzrle", true); > - migrate_set_capability(to, "xbzrle", true); > - > return NULL; > } > > @@ -129,6 +125,7 @@ static void test_precopy_unix_xbzrle(void) > .listen_uri = uri, > .start_hook = migrate_hook_start_xbzrle, > .iterations = 2, > + .caps[MIGRATION_CAPABILITY_XBZRLE] = true, > /* > * XBZRLE needs pages to be modified when doing the 2nd+ round > * iteration to have real data pushed to the stream. > diff --git a/tests/qtest/migration/cpr-tests.c b/tests/qtest/migration/cpr-tests.c > index 44ce89aa5b..818fa95133 100644 > --- a/tests/qtest/migration/cpr-tests.c > +++ b/tests/qtest/migration/cpr-tests.c > @@ -24,9 +24,6 @@ static void *migrate_hook_start_mode_reboot(QTestState *from, QTestState *to) > migrate_set_parameter_str(from, "mode", "cpr-reboot"); > migrate_set_parameter_str(to, "mode", "cpr-reboot"); > > - migrate_set_capability(from, "x-ignore-shared", true); > - migrate_set_capability(to, "x-ignore-shared", true); > - > return NULL; > } > > @@ -39,6 +36,7 @@ static void test_mode_reboot(void) > .connect_uri = uri, > .listen_uri = "defer", > .start_hook = migrate_hook_start_mode_reboot, > + .caps[MIGRATION_CAPABILITY_X_IGNORE_SHARED] = true, > }; > > test_file_common(&args, true); > diff --git a/tests/qtest/migration/file-tests.c b/tests/qtest/migration/file-tests.c > index 84225c8c33..bc551949f9 100644 > --- a/tests/qtest/migration/file-tests.c > +++ b/tests/qtest/migration/file-tests.c > @@ -107,15 +107,6 @@ static void test_precopy_file_offset_bad(void) > test_file_common(&args, false); > } > > -static void *migrate_hook_start_mapped_ram(QTestState *from, > - QTestState *to) > -{ > - migrate_set_capability(from, "mapped-ram", true); > - migrate_set_capability(to, "mapped-ram", true); > - > - return NULL; > -} > - > static void test_precopy_file_mapped_ram_live(void) > { > g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, > @@ -123,7 +114,7 @@ static void test_precopy_file_mapped_ram_live(void) > MigrateCommon args = { > .connect_uri = uri, > .listen_uri = "defer", > - .start_hook = migrate_hook_start_mapped_ram, > + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, > }; > > test_file_common(&args, false); > @@ -136,26 +127,12 @@ static void test_precopy_file_mapped_ram(void) > MigrateCommon args = { > .connect_uri = uri, > .listen_uri = "defer", > - .start_hook = migrate_hook_start_mapped_ram, > + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, > }; > > test_file_common(&args, true); > } > > -static void *migrate_hook_start_multifd_mapped_ram(QTestState *from, > - QTestState *to) > -{ > - migrate_hook_start_mapped_ram(from, to); > - > - migrate_set_parameter_int(from, "multifd-channels", 4); > - migrate_set_parameter_int(to, "multifd-channels", 4); > - > - migrate_set_capability(from, "multifd", true); > - migrate_set_capability(to, "multifd", true); > - > - return NULL; > -} > - > static void test_multifd_file_mapped_ram_live(void) > { > g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs, > @@ -163,7 +140,8 @@ static void test_multifd_file_mapped_ram_live(void) > MigrateCommon args = { > .connect_uri = uri, > .listen_uri = "defer", > - .start_hook = migrate_hook_start_multifd_mapped_ram, > + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > test_file_common(&args, false); > @@ -176,7 +154,8 @@ static void test_multifd_file_mapped_ram(void) > MigrateCommon args = { > .connect_uri = uri, > .listen_uri = "defer", > - .start_hook = migrate_hook_start_multifd_mapped_ram, > + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > test_file_common(&args, true); > @@ -185,8 +164,6 @@ static void test_multifd_file_mapped_ram(void) > static void *migrate_hook_start_multifd_mapped_ram_dio(QTestState *from, > QTestState *to) > { > - migrate_hook_start_multifd_mapped_ram(from, to); > - > migrate_set_parameter_bool(from, "direct-io", true); > migrate_set_parameter_bool(to, "direct-io", true); > > @@ -201,6 +178,8 @@ static void test_multifd_file_mapped_ram_dio(void) > .connect_uri = uri, > .listen_uri = "defer", > .start_hook = migrate_hook_start_multifd_mapped_ram_dio, > + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > if (!probe_o_direct_support(tmpfs)) { > @@ -246,7 +225,6 @@ static void *migrate_hook_start_multifd_mapped_ram_fdset_dio(QTestState *from, > fdset_add_fds(from, file, O_WRONLY, 2, true); > fdset_add_fds(to, file, O_RDONLY, 2, true); > > - migrate_hook_start_multifd_mapped_ram(from, to); > migrate_set_parameter_bool(from, "direct-io", true); > migrate_set_parameter_bool(to, "direct-io", true); > > @@ -261,8 +239,6 @@ static void *migrate_hook_start_multifd_mapped_ram_fdset(QTestState *from, > fdset_add_fds(from, file, O_WRONLY, 2, false); > fdset_add_fds(to, file, O_RDONLY, 2, false); > > - migrate_hook_start_multifd_mapped_ram(from, to); > - > return NULL; > } > > @@ -275,6 +251,8 @@ static void test_multifd_file_mapped_ram_fdset(void) > .listen_uri = "defer", > .start_hook = migrate_hook_start_multifd_mapped_ram_fdset, > .end_hook = migrate_hook_end_multifd_mapped_ram_fdset, > + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > test_file_common(&args, true); > @@ -289,6 +267,8 @@ static void test_multifd_file_mapped_ram_fdset_dio(void) > .listen_uri = "defer", > .start_hook = migrate_hook_start_multifd_mapped_ram_fdset_dio, > .end_hook = migrate_hook_end_multifd_mapped_ram_fdset, > + .caps[MIGRATION_CAPABILITY_MAPPED_RAM] = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > if (!probe_o_direct_support(tmpfs)) { > diff --git a/tests/qtest/migration/framework.c b/tests/qtest/migration/framework.c > index 00776f858c..a858b6ffec 100644 > --- a/tests/qtest/migration/framework.c > +++ b/tests/qtest/migration/framework.c > @@ -204,6 +204,30 @@ static QList *migrate_start_get_qmp_capabilities(const MigrateStart *args) > return capabilities; > } > > +static void set_migration_capabilities(QTestState *from, > + QTestState *to, MigrateCommon *args) > +{ > + /* > + * MigrationCapability_lookup and MIGRATION_CAPABILITY_ constants > + * are from qapi-types-migration.h. > + */ > + for(uint8_t i = 0; i < MIGRATION_CAPABILITY__MAX; i++) > + { > + if(args->caps[i]) { > + if (from) { > + migrate_set_capability(from, > + MigrationCapability_lookup.array[i], true); > + } > + if (to) { > + migrate_set_capability(to, > + MigrationCapability_lookup.array[i], true); > + } > + } > + } > + > + return; > +} > + > int migrate_start(QTestState **from, QTestState **to, const char *uri, > MigrateStart *args) > { > @@ -418,19 +442,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr, > args->postcopy_data = args->start_hook(from, to); > } > > - migrate_set_capability(from, "postcopy-ram", true); > - migrate_set_capability(to, "postcopy-ram", true); > - migrate_set_capability(to, "postcopy-blocktime", true); > - > - if (args->postcopy_preempt) { > - migrate_set_capability(from, "postcopy-preempt", true); > - migrate_set_capability(to, "postcopy-preempt", true); > - } > - > - if (args->multifd) { > - migrate_set_capability(from, "multifd", true); > - migrate_set_capability(to, "multifd", true); > - > + set_migration_capabilities(from, to, args); > + if (args->caps[MIGRATION_CAPABILITY_MULTIFD]) { > migrate_set_parameter_int(from, "multifd-channels", 8); > migrate_set_parameter_int(to, "multifd-channels", 8); > } > @@ -491,6 +504,10 @@ void test_postcopy_common(MigrateCommon *args) > { > QTestState *from, *to; > > + /* set postcopy capabilities */ > + args->caps[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME] = true; > + args->caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true; > + > if (migrate_postcopy_prepare(&from, &to, args)) { > return; > } > @@ -631,6 +648,10 @@ void test_postcopy_recovery_common(MigrateCommon *args) > /* Always hide errors for postcopy recover tests since they're expected */ > args->start.hide_stderr = true; > > + /* set postcopy capabilities */ > + args->caps[MIGRATION_CAPABILITY_POSTCOPY_BLOCKTIME] = true; > + args->caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true; > + Why not keep them inside migrate_postcopy_prepare()? > if (migrate_postcopy_prepare(&from, &to, args)) { > return; > } > @@ -699,9 +720,10 @@ void test_precopy_common(MigrateCommon *args) > return; > } > > - if (args->postcopy_ram) { > - migrate_set_capability(from, "postcopy-ram", true); > - migrate_set_capability(to, "postcopy-ram", true); > + set_migration_capabilities(from, to, args); > + if (args->caps[MIGRATION_CAPABILITY_MULTIFD]) { > + migrate_set_parameter_int(from, "multifd-channels", 4); > + migrate_set_parameter_int(to, "multifd-channels", 4); > } > > if (args->start_hook) { > @@ -847,6 +869,12 @@ void test_file_common(MigrateCommon *args, bool stop_src) > */ > g_assert_false(args->live); > > + set_migration_capabilities(from, to, args); > + if (args->caps[MIGRATION_CAPABILITY_MULTIFD]) { > + migrate_set_parameter_int(from, "multifd-channels", 4); > + migrate_set_parameter_int(to, "multifd-channels", 4); > + } > + > if (g_strrstr(args->connect_uri, "offset=")) { > check_offset = true; > /* > @@ -913,9 +941,6 @@ void *migrate_hook_start_precopy_tcp_multifd_common(QTestState *from, > migrate_set_parameter_str(from, "multifd-compression", method); > migrate_set_parameter_str(to, "multifd-compression", method); > > - migrate_set_capability(from, "multifd", true); > - migrate_set_capability(to, "multifd", true); > - > /* Start incoming migration from the 1st socket */ > migrate_incoming_qmp(to, "tcp:127.0.0.1:0", "{}"); > > diff --git a/tests/qtest/migration/framework.h b/tests/qtest/migration/framework.h > index 214288ca42..cc1ea4d82b 100644 > --- a/tests/qtest/migration/framework.h > +++ b/tests/qtest/migration/framework.h > @@ -12,6 +12,7 @@ > #define TEST_FRAMEWORK_H > > #include "libqtest.h" > +#include <qapi/qapi-types-migration.h> > > #define FILE_TEST_FILENAME "migfile" > #define FILE_TEST_OFFSET 0x1000 > @@ -193,14 +194,11 @@ typedef struct { > */ > bool live; > > - /* set multifd on */ > - bool multifd; > - > /* Postcopy specific fields */ > - bool postcopy_ram; > void *postcopy_data; > - bool postcopy_preempt; > PostcopyRecoveryFailStage postcopy_recovery_fail_stage; > + > + bool caps[MIGRATION_CAPABILITY__MAX]; > } MigrateCommon; > > void wait_for_serial(const char *side); > diff --git a/tests/qtest/migration/postcopy-tests.c b/tests/qtest/migration/postcopy-tests.c > index 212a5ea600..fa8815175f 100644 > --- a/tests/qtest/migration/postcopy-tests.c > +++ b/tests/qtest/migration/postcopy-tests.c > @@ -39,7 +39,7 @@ static void test_postcopy_suspend(void) > static void test_postcopy_preempt(void) > { > MigrateCommon args = { > - .postcopy_preempt = true, > + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, > }; > > test_postcopy_common(&args); > @@ -73,7 +73,7 @@ static void test_postcopy_recovery_fail_reconnect(void) > static void test_postcopy_preempt_recovery(void) > { > MigrateCommon args = { > - .postcopy_preempt = true, > + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, > }; > > test_postcopy_recovery_common(&args); > @@ -82,7 +82,7 @@ static void test_postcopy_preempt_recovery(void) > static void test_multifd_postcopy(void) > { > MigrateCommon args = { > - .multifd = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > test_postcopy_common(&args); > @@ -91,8 +91,8 @@ static void test_multifd_postcopy(void) > static void test_multifd_postcopy_preempt(void) > { > MigrateCommon args = { > - .multifd = true, > - .postcopy_preempt = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, > }; > > test_postcopy_common(&args); > diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c > index b1a4e7bbb1..80c8f2a9d2 100644 > --- a/tests/qtest/migration/precopy-tests.c > +++ b/tests/qtest/migration/precopy-tests.c > @@ -108,23 +108,12 @@ static void test_precopy_tcp_plain(void) > test_precopy_common(&args); > } > > -static void *migrate_hook_start_switchover_ack(QTestState *from, QTestState *to) > -{ > - > - migrate_set_capability(from, "return-path", true); > - migrate_set_capability(to, "return-path", true); > - > - migrate_set_capability(from, "switchover-ack", true); > - migrate_set_capability(to, "switchover-ack", true); > - > - return NULL; > -} > - > static void test_precopy_tcp_switchover_ack(void) > { > MigrateCommon args = { > .listen_uri = "tcp:127.0.0.1:0", > - .start_hook = migrate_hook_start_switchover_ack, > + .caps[MIGRATION_CAPABILITY_RETURN_PATH] = true, > + .caps[MIGRATION_CAPABILITY_SWITCHOVER_ACK] = true, > /* > * Source VM must be running in order to consider the switchover ACK > * when deciding to do switchover or not. > @@ -393,6 +382,7 @@ static void test_multifd_tcp_uri_none(void) > MigrateCommon args = { > .listen_uri = "defer", > .start_hook = migrate_hook_start_precopy_tcp_multifd, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > /* > * Multifd is more complicated than most of the features, it > * directly takes guest page buffers when sending, make sure > @@ -408,6 +398,7 @@ static void test_multifd_tcp_zero_page_legacy(void) > MigrateCommon args = { > .listen_uri = "defer", > .start_hook = migrate_hook_start_precopy_tcp_multifd_zero_page_legacy, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > /* > * Multifd is more complicated than most of the features, it > * directly takes guest page buffers when sending, make sure > @@ -423,6 +414,7 @@ static void test_multifd_tcp_no_zero_page(void) > MigrateCommon args = { > .listen_uri = "defer", > .start_hook = migrate_hook_start_precopy_tcp_multifd_no_zero_page, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > /* > * Multifd is more complicated than most of the features, it > * directly takes guest page buffers when sending, make sure > @@ -439,6 +431,7 @@ static void test_multifd_tcp_channels_none(void) > .listen_uri = "defer", > .start_hook = migrate_hook_start_precopy_tcp_multifd, > .live = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > .connect_channels = ("[ { 'channel-type': 'main'," > " 'addr': { 'transport': 'socket'," > " 'type': 'inet'," > diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c > index 094dc1d814..429fd797c0 100644 > --- a/tests/qtest/migration/tls-tests.c > +++ b/tests/qtest/migration/tls-tests.c > @@ -375,9 +375,9 @@ static void test_postcopy_tls_psk(void) > static void test_postcopy_preempt_tls_psk(void) > { > MigrateCommon args = { > - .postcopy_preempt = true, > .start_hook = migrate_hook_start_tls_psk_match, > .end_hook = migrate_hook_end_tls_psk, > + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, > }; > > test_postcopy_common(&args); > @@ -398,7 +398,7 @@ static void test_multifd_postcopy_recovery_tls_psk(void) > MigrateCommon args = { > .start_hook = migrate_hook_start_tls_psk_match, > .end_hook = migrate_hook_end_tls_psk, > - .multifd = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > test_postcopy_recovery_common(&args); > @@ -408,9 +408,9 @@ static void test_multifd_postcopy_recovery_tls_psk(void) > static void test_postcopy_preempt_all(void) > { > MigrateCommon args = { > - .postcopy_preempt = true, > .start_hook = migrate_hook_start_tls_psk_match, > .end_hook = migrate_hook_end_tls_psk, > + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true, > }; > > test_postcopy_recovery_common(&args); > @@ -421,7 +421,7 @@ static void test_multifd_postcopy_preempt_recovery_tls_psk(void) > MigrateCommon args = { > .start_hook = migrate_hook_start_tls_psk_match, > .end_hook = migrate_hook_end_tls_psk, > - .multifd = true, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > > test_postcopy_recovery_common(&args); > @@ -674,8 +674,8 @@ static void test_multifd_tcp_tls_psk_mismatch(void) > static void test_multifd_postcopy_tcp_tls_psk_match(void) > { > MigrateCommon args = { > - .multifd = true, > .listen_uri = "defer", > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match, > .end_hook = migrate_hook_end_tls_psk, > }; > @@ -727,6 +727,7 @@ static void test_multifd_tcp_tls_x509_mismatch_host(void) > .start_hook = migrate_hook_start_multifd_tls_x509_mismatch_host, > .end_hook = migrate_hook_end_tls_x509, > .result = MIG_TEST_FAIL, > + .caps[MIGRATION_CAPABILITY_MULTIFD] = true, > }; > test_precopy_common(&args); > } ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2025-02-12 17:53 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-05 12:27 [PATCH v5 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 1/5] migration/multifd: move macros to multifd header Prasad Pandit 2025-02-06 22:41 ` Peter Xu 2025-02-05 12:27 ` [PATCH v5 2/5] migration: refactor ram_save_target_page functions Prasad Pandit 2025-02-06 22:43 ` Peter Xu 2025-02-07 12:19 ` Fabiano Rosas 2025-02-05 12:27 ` [PATCH v5 3/5] migration: enable multifd and postcopy together Prasad Pandit 2025-02-06 23:16 ` Peter Xu 2025-02-07 10:32 ` Prasad Pandit 2025-02-07 15:45 ` Peter Xu 2025-02-08 10:36 ` Prasad Pandit 2025-02-10 16:59 ` Peter Xu 2025-02-11 9:04 ` Prasad Pandit 2025-02-11 15:20 ` Peter Xu 2025-02-12 13:27 ` Prasad Pandit 2025-02-12 14:37 ` Peter Xu 2025-02-12 17:36 ` Prasad Pandit 2025-02-12 17:52 ` Peter Xu 2025-02-05 12:27 ` [PATCH v5 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit 2025-02-05 12:27 ` [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit 2025-02-06 22:44 ` Peter Xu 2025-02-07 8:59 ` Prasad Pandit 2025-02-07 22:01 ` Fabiano Rosas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).