* [PATCH v10 0/3] Allow to enable multifd and postcopy migration together
@ 2025-05-08 12:28 Prasad Pandit
2025-05-08 12:28 ` [PATCH v10 1/3] migration: enable multifd and postcopy together Prasad Pandit
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Prasad Pandit @ 2025-05-08 12:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit
From: Prasad Pandit <pjp@fedoraproject.org>
Hello,
* This series (v10) fixes the migration hang issue caused by the
optimised writing of zero pages during the multifd phase. It also
fixes a qtest failure caused by a missing 'env->has_uffd' check
before running a postcopy test.
Some of the patches from the v9 series were pulled upstream; this
series contains the remaining few.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 191.80s 81 subtests passed
===
v9: https://lore.kernel.org/qemu-devel/20250411114534.3370816-1-ppandit@redhat.com/T/#t
* This series (v9) makes minor refactoring and reordering changes as
suggested in the review of the earlier series (v8). We also tried to
reproduce/debug a qtest hang issue, but it could not be reproduced.
From the shared stack traces it looked like the Postcopy thread was
preparing to finish before migrating all the pages.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 170.50s 81 subtests passed
===
v8: https://lore.kernel.org/qemu-devel/20250318123846.1370312-1-ppandit@redhat.com/T/#t
* This series (v8) splits the earlier patch-2, which enabled the
multifd and postcopy options together, into two separate patches:
one modifies the channel discovery in the
migration_ioc_process_incoming() function, and the second enables
multifd and postcopy migration together. It also adds the
'save_postcopy_prepare' savevm_state handler to let different
sections take an action just before the Postcopy phase starts.
Thank you Peter for these patches.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 152.66s 81 subtests passed
===
v7: https://lore.kernel.org/qemu-devel/20250228121749.553184-1-ppandit@redhat.com/T/#t
* This series (v7) adds the 'MULTIFD_RECV_SYNC' migration command. It is
used to notify the destination migration thread to synchronise with the
Multifd threads. This allows the Multifd ('mig/dst/recv_x') threads on
the destination to receive all their data before they are shut down.
This series also updates the channel discovery function and the qtests
as suggested in the previous review comments.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 147.84s 81 subtests passed
===
v6: https://lore.kernel.org/qemu-devel/20250215123119.814345-1-ppandit@redhat.com/T/#t
* This series (v6) shuts down the Multifd threads before starting
Postcopy migration. This avoids multifd pages arriving late at the
destination during the Postcopy phase and corrupting the vCPU
state. It also reorders the qtest patches and makes some refactoring
changes suggested in the previous review.
===
67/67 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 161.35s 73 subtests passed
===
v5: https://lore.kernel.org/qemu-devel/20250205122712.229151-1-ppandit@redhat.com/T/#t
* This series (v5) consolidates the setting of migration capabilities
into one 'set_migration_capabilities()' function, thus simplifying the
test sources. It passes all migration tests.
===
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 143.66s 71 subtests passed
===
v4: https://lore.kernel.org/qemu-devel/20250127120823.144949-1-ppandit@redhat.com/T/#t
* This series (v4) adds more 'multifd+postcopy' qtests, which run
Precopy migration with the 'postcopy-ram' capability set and
Postcopy migrations with 'multifd' channels enabled.
===
$ ../qtest/migration-test --tap -k -r '/x86_64/migration/multifd+postcopy' | grep -i 'slow test'
# slow test /x86_64/migration/multifd+postcopy/plain executed in 1.29 secs
# slow test /x86_64/migration/multifd+postcopy/recovery/tls/psk executed in 2.48 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/plain executed in 1.49 secs
# slow test /x86_64/migration/multifd+postcopy/preempt/recovery/tls/psk executed in 2.52 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/tls/psk/match executed in 3.62 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/zstd executed in 1.34 secs
# slow test /x86_64/migration/multifd+postcopy/tcp/plain/cancel executed in 2.24 secs
...
66/66 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test OK 148.41s 71 subtests passed
===
v3: https://lore.kernel.org/qemu-devel/20250121131032.1611245-1-ppandit@redhat.com/T/#t
* This series (v3) passes all existing 'tests/qtest/migration/*' tests
and adds a new one to enable multifd channels with postcopy migration.
v2: https://lore.kernel.org/qemu-devel/20241129122256.96778-1-ppandit@redhat.com/T/#u
* This series (v2) further refactors the 'ram_save_target_page'
function to make it independent of the multifd & postcopy change.
v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
* This series removes the 4-byte magic value introduced in the
previous series for the Postcopy channel.
v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
* Currently Multifd and Postcopy migration cannot be used together;
QEMU shows a "Postcopy is not yet compatible with multifd" message.
When migrating guests with large (hundreds of GB) RAM, Multifd
threads help to accelerate migration, but the inability to use them
with Postcopy mode delays guest start-up on the destination side.
* This patch series allows enabling both Multifd and Postcopy
migration together. Precopy and Multifd threads work during the
initial guest (RAM) transfer. When migration moves to the
Postcopy phase, Multifd threads are restrained and the Postcopy
threads start to request pages from the source side.
* This series introduces a 4-byte magic value to be sent on the
Postcopy channel. It helps to differentiate the channels and properly
set up incoming connections on the destination side.
Thank you.
---
Prasad Pandit (3):
migration: enable multifd and postcopy together
tests/qtest/migration: add postcopy tests with multifd
migration: write zero pages when postcopy enabled
migration/multifd-nocomp.c | 3 +-
migration/multifd-zero-page.c | 22 +++++++++-
migration/multifd.c | 7 ++++
migration/options.c | 5 ---
migration/ram.c | 5 +--
tests/qtest/migration/compression-tests.c | 18 ++++++++
tests/qtest/migration/postcopy-tests.c | 27 ++++++++++++
tests/qtest/migration/precopy-tests.c | 22 ++++++++++
tests/qtest/migration/tls-tests.c | 50 +++++++++++++++++++++++
9 files changed, 148 insertions(+), 11 deletions(-)
--
2.49.0
* [PATCH v10 1/3] migration: enable multifd and postcopy together
2025-05-08 12:28 [PATCH v10 0/3] Allow to enable multifd and postcopy migration together Prasad Pandit
@ 2025-05-08 12:28 ` Prasad Pandit
2025-05-08 14:06 ` Fabiano Rosas
2025-05-08 12:28 ` [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
2025-05-08 12:28 ` [PATCH v10 3/3] migration: write zero pages when postcopy enabled Prasad Pandit
2 siblings, 1 reply; 16+ messages in thread
From: Prasad Pandit @ 2025-05-08 12:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit
From: Prasad Pandit <pjp@fedoraproject.org>
Enable Multifd and Postcopy migration together.
The migration_ioc_process_incoming() routine checks
the magic value sent on each channel and helps to
properly set up the multifd and postcopy channels.
The Precopy and Multifd threads work during the initial
guest RAM transfer. When migration moves to the Postcopy
phase, the multifd threads cease to send data on the
multifd channels, and the Postcopy threads on the
destination request/pull data from the source side.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
migration/multifd-nocomp.c | 3 ++-
migration/multifd.c | 7 +++++++
migration/options.c | 5 -----
migration/ram.c | 5 ++---
4 files changed, 11 insertions(+), 9 deletions(-)
v10: no change
v9:
- https://lore.kernel.org/qemu-devel/20250411114534.3370816-6-ppandit@redhat.com/
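[ Illustrative sketch, not part of the patch: the kind of magic-based
channel dispatch the commit message describes. The two magic constants
below are real QEMU values (savevm.c and multifd.h); the function and
enum names are hypothetical stand-ins, not QEMU's actual code. ]

/*
 * Peek the first 4 bytes of a new incoming connection and route it
 * to the right handler.  The wire format is big-endian.
 */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>          /* ntohl() */

#define QEMU_VM_FILE_MAGIC 0x5145564d   /* "QEVM": main migration channel */
#define MULTIFD_MAGIC      0x11223344   /* multifd data channel */

enum channel_type { CH_MAIN, CH_MULTIFD, CH_POSTCOPY };

static enum channel_type classify_channel(const uint8_t buf[4])
{
    uint32_t magic;

    memcpy(&magic, buf, sizeof(magic));  /* avoid unaligned reads */
    magic = ntohl(magic);

    if (magic == QEMU_VM_FILE_MAGIC) {
        return CH_MAIN;      /* precopy stream on the main channel */
    } else if (magic == MULTIFD_MAGIC) {
        return CH_MULTIFD;   /* hand off to a multifd recv thread */
    }
    return CH_POSTCOPY;      /* e.g. a postcopy (preempt) channel */
}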
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 88fe0f99f2..b48eae3d86 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -17,6 +17,7 @@
#include "migration-stats.h"
#include "multifd.h"
#include "options.h"
+#include "migration.h"
#include "qapi/error.h"
#include "qemu/cutils.h"
#include "qemu/error-report.h"
@@ -398,7 +399,7 @@ int multifd_ram_flush_and_sync(QEMUFile *f)
MultiFDSyncReq req;
int ret;
- if (!migrate_multifd()) {
+ if (!migrate_multifd() || migration_in_postcopy()) {
return 0;
}
diff --git a/migration/multifd.c b/migration/multifd.c
index ec108af624..f18b166bcf 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1379,6 +1379,13 @@ static void *multifd_recv_thread(void *opaque)
}
if (has_data) {
+ /*
+ * multifd thread should not be active and receive data
+ * when migration is in the Postcopy phase. Two threads
+ * writing the same memory area could easily corrupt
+ * the guest state.
+ */
+ assert(!migration_in_postcopy());
if (is_device_state) {
assert(use_packets);
ret = multifd_device_state_recv(p, &local_err);
diff --git a/migration/options.c b/migration/options.c
index b6ae95358d..3fcd577cd7 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -509,11 +509,6 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
error_setg(errp, "Postcopy is not compatible with ignore-shared");
return false;
}
-
- if (new_caps[MIGRATION_CAPABILITY_MULTIFD]) {
- error_setg(errp, "Postcopy is not yet compatible with multifd");
- return false;
- }
}
if (new_caps[MIGRATION_CAPABILITY_BACKGROUND_SNAPSHOT]) {
diff --git a/migration/ram.c b/migration/ram.c
index e12913b43e..d26dbd37c4 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1993,9 +1993,8 @@ static int ram_save_target_page(RAMState *rs, PageSearchStatus *pss)
}
}
- if (migrate_multifd()) {
- RAMBlock *block = pss->block;
- return ram_save_multifd_page(block, offset);
+ if (migrate_multifd() && !migration_in_postcopy()) {
+ return ram_save_multifd_page(pss->block, offset);
}
return ram_save_page(rs, pss);
--
2.49.0
* [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd
2025-05-08 12:28 [PATCH v10 0/3] Allow to enable multifd and postcopy migration together Prasad Pandit
2025-05-08 12:28 ` [PATCH v10 1/3] migration: enable multifd and postcopy together Prasad Pandit
@ 2025-05-08 12:28 ` Prasad Pandit
2025-05-08 19:04 ` Peter Xu
2025-05-08 12:28 ` [PATCH v10 3/3] migration: write zero pages when postcopy enabled Prasad Pandit
2 siblings, 1 reply; 16+ messages in thread
From: Prasad Pandit @ 2025-05-08 12:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit
From: Prasad Pandit <pjp@fedoraproject.org>
Add new qtests to run postcopy migration with multifd
channels enabled.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
tests/qtest/migration/compression-tests.c | 18 ++++++++
tests/qtest/migration/postcopy-tests.c | 27 ++++++++++++
tests/qtest/migration/precopy-tests.c | 22 ++++++++++
tests/qtest/migration/tls-tests.c | 50 +++++++++++++++++++++++
4 files changed, 117 insertions(+)
v10:
- Check 'env->has_uffd' before postcopy test
v9:
- https://lore.kernel.org/qemu-devel/20250411114534.3370816-8-ppandit@redhat.com/
diff --git a/tests/qtest/migration/compression-tests.c b/tests/qtest/migration/compression-tests.c
index 41e79f031b..b827665b8e 100644
--- a/tests/qtest/migration/compression-tests.c
+++ b/tests/qtest/migration/compression-tests.c
@@ -42,6 +42,20 @@ static void test_multifd_tcp_zstd(void)
};
test_precopy_common(&args);
}
+
+static void test_multifd_postcopy_tcp_zstd(void)
+{
+ MigrateCommon args = {
+ .listen_uri = "defer",
+ .start = {
+ .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+ .caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true,
+ },
+ .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd,
+ };
+
+ test_precopy_common(&args);
+}
#endif /* CONFIG_ZSTD */
#ifdef CONFIG_QATZIP
@@ -184,6 +198,10 @@ void migration_test_add_compression(MigrationTestEnv *env)
#ifdef CONFIG_ZSTD
migration_test_add("/migration/multifd/tcp/plain/zstd",
test_multifd_tcp_zstd);
+ if (env->has_uffd) {
+ migration_test_add("/migration/multifd+postcopy/tcp/plain/zstd",
+ test_multifd_postcopy_tcp_zstd);
+ }
#endif
#ifdef CONFIG_QATZIP
diff --git a/tests/qtest/migration/postcopy-tests.c b/tests/qtest/migration/postcopy-tests.c
index 483e3ff99f..eb637f94f7 100644
--- a/tests/qtest/migration/postcopy-tests.c
+++ b/tests/qtest/migration/postcopy-tests.c
@@ -94,6 +94,29 @@ static void migration_test_add_postcopy_smoke(MigrationTestEnv *env)
}
}
+static void test_multifd_postcopy(void)
+{
+ MigrateCommon args = {
+ .start = {
+ .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+ },
+ };
+
+ test_postcopy_common(&args);
+}
+
+static void test_multifd_postcopy_preempt(void)
+{
+ MigrateCommon args = {
+ .start = {
+ .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+ .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+ },
+ };
+
+ test_postcopy_common(&args);
+}
+
void migration_test_add_postcopy(MigrationTestEnv *env)
{
migration_test_add_postcopy_smoke(env);
@@ -114,6 +137,10 @@ void migration_test_add_postcopy(MigrationTestEnv *env)
"/migration/postcopy/recovery/double-failures/reconnect",
test_postcopy_recovery_fail_reconnect);
+ migration_test_add("/migration/postcopy/multifd/plain",
+ test_multifd_postcopy);
+ migration_test_add("/migration/postcopy/multifd/preempt/plain",
+ test_multifd_postcopy_preempt);
if (env->is_x86) {
migration_test_add("/migration/postcopy/suspend",
test_postcopy_suspend);
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index 87b0a7e8ef..a575791c72 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -34,6 +34,7 @@
#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
static char *tmpfs;
+static bool postcopy_ram = false;
static void test_precopy_unix_plain(void)
{
@@ -538,6 +539,11 @@ static void test_multifd_tcp_cancel(void)
migrate_ensure_non_converge(from);
migrate_prepare_for_dirty_mem(from);
+ if (postcopy_ram) {
+ migrate_set_capability(from, "postcopy-ram", true);
+ migrate_set_capability(to, "postcopy-ram", true);
+ }
+
migrate_set_parameter_int(from, "multifd-channels", 16);
migrate_set_parameter_int(to, "multifd-channels", 16);
@@ -579,6 +585,10 @@ static void test_multifd_tcp_cancel(void)
return;
}
+ if (postcopy_ram) {
+ migrate_set_capability(to2, "postcopy-ram", true);
+ }
+
migrate_set_parameter_int(to2, "multifd-channels", 16);
migrate_set_capability(to2, "multifd", true);
@@ -602,6 +612,13 @@ static void test_multifd_tcp_cancel(void)
migrate_end(from, to2, true);
}
+static void test_multifd_postcopy_tcp_cancel(void)
+{
+ postcopy_ram = true;
+ test_multifd_tcp_cancel();
+ postcopy_ram = false;
+}
+
static void test_cancel_src_after_failed(QTestState *from, QTestState *to,
const char *uri, const char *phase)
{
@@ -1189,6 +1206,11 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
test_multifd_tcp_uri_none);
migration_test_add("/migration/multifd/tcp/plain/cancel",
test_multifd_tcp_cancel);
+ if (env->has_uffd) {
+ migration_test_add("/migration/multifd+postcopy/tcp/plain/cancel",
+ test_multifd_postcopy_tcp_cancel);
+ }
+
#ifdef CONFIG_RDMA
migration_test_add("/migration/precopy/rdma/plain",
test_precopy_rdma_plain);
diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c
index 72f44defbb..50a07a1c0f 100644
--- a/tests/qtest/migration/tls-tests.c
+++ b/tests/qtest/migration/tls-tests.c
@@ -395,6 +395,19 @@ static void test_postcopy_recovery_tls_psk(void)
test_postcopy_recovery_common(&args);
}
+static void test_multifd_postcopy_recovery_tls_psk(void)
+{
+ MigrateCommon args = {
+ .start_hook = migrate_hook_start_tls_psk_match,
+ .end_hook = migrate_hook_end_tls_psk,
+ .start = {
+ .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+ },
+ };
+
+ test_postcopy_recovery_common(&args);
+}
+
/* This contains preempt+recovery+tls test altogether */
static void test_postcopy_preempt_all(void)
{
@@ -409,6 +422,20 @@ static void test_postcopy_preempt_all(void)
test_postcopy_recovery_common(&args);
}
+static void test_multifd_postcopy_preempt_recovery_tls_psk(void)
+{
+ MigrateCommon args = {
+ .start_hook = migrate_hook_start_tls_psk_match,
+ .end_hook = migrate_hook_end_tls_psk,
+ .start = {
+ .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+ .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
+ },
+ };
+
+ test_postcopy_recovery_common(&args);
+}
+
static void test_precopy_unix_tls_psk(void)
{
g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -657,6 +684,21 @@ static void test_multifd_tcp_tls_psk_mismatch(void)
test_precopy_common(&args);
}
+static void test_multifd_postcopy_tcp_tls_psk_match(void)
+{
+ MigrateCommon args = {
+ .start = {
+ .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
+ .caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true,
+ },
+ .listen_uri = "defer",
+ .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match,
+ .end_hook = migrate_hook_end_tls_psk,
+ };
+
+ test_precopy_common(&args);
+}
+
#ifdef CONFIG_TASN1
static void test_multifd_tcp_tls_x509_default_host(void)
{
@@ -774,6 +816,10 @@ void migration_test_add_tls(MigrationTestEnv *env)
test_postcopy_preempt_tls_psk);
migration_test_add("/migration/postcopy/preempt/recovery/tls/psk",
test_postcopy_preempt_all);
+ migration_test_add("/migration/postcopy/multifd/recovery/tls/psk",
+ test_multifd_postcopy_recovery_tls_psk);
+ migration_test_add("/migration/postcopy/multifd/preempt/recovery/tls/psk",
+ test_multifd_postcopy_preempt_recovery_tls_psk);
}
#ifdef CONFIG_TASN1
migration_test_add("/migration/precopy/unix/tls/x509/default-host",
@@ -805,6 +851,10 @@ void migration_test_add_tls(MigrationTestEnv *env)
test_multifd_tcp_tls_psk_match);
migration_test_add("/migration/multifd/tcp/tls/psk/mismatch",
test_multifd_tcp_tls_psk_mismatch);
+ if (env->has_uffd) {
+ migration_test_add("/migration/multifd+postcopy/tcp/tls/psk/match",
+ test_multifd_postcopy_tcp_tls_psk_match);
+ }
#ifdef CONFIG_TASN1
migration_test_add("/migration/multifd/tcp/tls/x509/default-host",
test_multifd_tcp_tls_x509_default_host);
--
2.49.0
* [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-08 12:28 [PATCH v10 0/3] Allow to enable multifd and postcopy migration together Prasad Pandit
2025-05-08 12:28 ` [PATCH v10 1/3] migration: enable multifd and postcopy together Prasad Pandit
2025-05-08 12:28 ` [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
@ 2025-05-08 12:28 ` Prasad Pandit
2025-05-08 13:57 ` Fabiano Rosas
2 siblings, 1 reply; 16+ messages in thread
From: Prasad Pandit @ 2025-05-08 12:28 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, farosas, berrange, Prasad Pandit
From: Prasad Pandit <pjp@fedoraproject.org>
During multifd migration, zero pages are written if
they are migrated more than ones.
This may result in a migration hang issue when Multifd
and Postcopy are enabled together.
When Postcopy is enabled, always write zero pages as and
when they are migrated.
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
migration/multifd-zero-page.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
v10: new patch, not present in v9 or earlier versions.
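[ Illustrative sketch, not part of the patch: a toy model of the
receive-side decision described above, assuming a plain byte-per-page
received map. The real logic is multifd_recv_zero_page_process() in
the diff below; the names here are simplified stand-ins. ]

#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

static void recv_zero_page(unsigned char *receivedmap, long pfn,
                           void *page, bool postcopy_enabled)
{
    bool received = receivedmap[pfn];

    if (!received) {
        receivedmap[pfn] = 1;    /* first arrival: mark as received */
    }

    /*
     * Previously the page was zeroed only when 'received' was set,
     * i.e. an older (possibly non-zero) copy had already landed.
     * With postcopy enabled, first-time zero pages must be written
     * too: otherwise a later fault on the page is never served,
     * because the received map already claims the page arrived.
     */
    if (received || postcopy_enabled) {
        memset(page, 0, PAGE_SIZE);
    }
}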
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index dbc1184921..9bfb3ef803 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -85,9 +85,27 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
for (int i = 0; i < p->zero_num; i++) {
void *page = p->host + p->zero[i];
- if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
+
+ /*
+ * During multifd migration zero page is written to the memory
+ * only if it is migrated more than ones.
+ *
+ * It becomes a problem when both Multifd & Postcopy options are
+ * enabled. If the zero page which was skipped during multifd phase,
+ * is accessed during the Postcopy phase of the migration, a page
+ * fault occurs. But this page fault is not served because the
+ * 'receivedmap' says the zero page is already received. Thus the
+ * migration hangs.
+ *
+ * When Postcopy is enabled, always write the zero page as and when
+ * it is migrated.
+ *
+ */
+ if (migrate_postcopy_ram() ||
+ ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
memset(page, 0, multifd_ram_page_size());
- } else {
+ }
+ if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
}
}
--
2.49.0
* Re: [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-08 12:28 ` [PATCH v10 3/3] migration: write zero pages when postcopy enabled Prasad Pandit
@ 2025-05-08 13:57 ` Fabiano Rosas
2025-05-08 15:40 ` Peter Xu
2025-05-09 5:31 ` Prasad Pandit
0 siblings, 2 replies; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-08 13:57 UTC (permalink / raw)
To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit
Prasad Pandit <ppandit@redhat.com> writes:
> From: Prasad Pandit <pjp@fedoraproject.org>
>
> During multifd migration, zero pages are written if
> they are migrated more than ones.
s/ones/once/
>
> This may result in a migration hang issue when Multifd
> and Postcopy are enabled together.
>
> When Postcopy is enabled, always write zero pages as and
> when they are migrated.
>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
This patch should come before 1/3, otherwise it'll break bisect.
> ---
> migration/multifd-zero-page.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> v10: new patch, not present in v9 or earlier versions.
>
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> index dbc1184921..9bfb3ef803 100644
> --- a/migration/multifd-zero-page.c
> +++ b/migration/multifd-zero-page.c
> @@ -85,9 +85,27 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
> - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> +
> + /*
> + * During multifd migration zero page is written to the memory
> + * only if it is migrated more than ones.
s/ones/once/
> + *
> + * It becomes a problem when both Multifd & Postcopy options are
> + * enabled. If the zero page which was skipped during multifd phase,
> + * is accessed during the Postcopy phase of the migration, a page
> + * fault occurs. But this page fault is not served because the
> + * 'receivedmap' says the zero page is already received. Thus the
> + * migration hangs.
> + *
> + * When Postcopy is enabled, always write the zero page as and when
> + * it is migrated.
> + *
extra blank line here^
> + */
nit: Inconsistent use of capitalization for the feature names. I'd keep
it all lowercase.
> + if (migrate_postcopy_ram() ||
> + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> memset(page, 0, multifd_ram_page_size());
> - } else {
> + }
> + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> }
> }
* Re: [PATCH v10 1/3] migration: enable multifd and postcopy together
2025-05-08 12:28 ` [PATCH v10 1/3] migration: enable multifd and postcopy together Prasad Pandit
@ 2025-05-08 14:06 ` Fabiano Rosas
0 siblings, 0 replies; 16+ messages in thread
From: Fabiano Rosas @ 2025-05-08 14:06 UTC (permalink / raw)
To: Prasad Pandit, qemu-devel; +Cc: peterx, berrange, Prasad Pandit
Prasad Pandit <ppandit@redhat.com> writes:
> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Enable Multifd and Postcopy migration together.
> The migration_ioc_process_incoming() routine checks
> the magic value sent on each channel and helps to
> properly set up the multifd and postcopy channels.
>
> The Precopy and Multifd threads work during the initial
> guest RAM transfer. When migration moves to the Postcopy
> phase, the multifd threads cease to send data on the
> multifd channels, and the Postcopy threads on the
> destination request/pull data from the source side.
>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
* Re: [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-08 13:57 ` Fabiano Rosas
@ 2025-05-08 15:40 ` Peter Xu
2025-05-09 6:04 ` Prasad Pandit
2025-05-09 5:31 ` Prasad Pandit
1 sibling, 1 reply; 16+ messages in thread
From: Peter Xu @ 2025-05-08 15:40 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: Prasad Pandit, qemu-devel, berrange, Prasad Pandit
On Thu, May 08, 2025 at 10:57:19AM -0300, Fabiano Rosas wrote:
> Prasad Pandit <ppandit@redhat.com> writes:
>
> > From: Prasad Pandit <pjp@fedoraproject.org>
> >
> > During multifd migration, zero pages are written if
> > they are migrated more than ones.
>
> s/ones/once/
>
> >
> > This may result in a migration hang issue when Multifd
> > and Postcopy are enabled together.
> >
> > When Postcopy is enabled, always write zero pages as and
> > when they are migrated.
> >
> > Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
>
> This patch should come before 1/3, otherwise it'll break bisect.
We could squash the two together, IMHO.
>
> > ---
> > migration/multifd-zero-page.c | 22 ++++++++++++++++++++--
> > 1 file changed, 20 insertions(+), 2 deletions(-)
> >
> > v10: new patch, not present in v9 or earlier versions.
> >
> > diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> > index dbc1184921..9bfb3ef803 100644
> > --- a/migration/multifd-zero-page.c
> > +++ b/migration/multifd-zero-page.c
> > @@ -85,9 +85,27 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> > {
> > for (int i = 0; i < p->zero_num; i++) {
> > void *page = p->host + p->zero[i];
> > - if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > +
> > + /*
> > + * During multifd migration zero page is written to the memory
> > + * only if it is migrated more than ones.
>
> s/ones/once/
>
> > + *
> > + * It becomes a problem when both Multifd & Postcopy options are
> > + * enabled. If the zero page which was skipped during multifd phase,
> > + * is accessed during the Postcopy phase of the migration, a page
> > + * fault occurs. But this page fault is not served because the
> > + * 'receivedmap' says the zero page is already received. Thus the
> > + * migration hangs.
A more accurate version could be: "the thread accessing the page may hang".
As discussed previously, in most cases IIUC it won't hang migration when
accessed in vcpu contexts, and it will move again when all pages are
migrated (which triggers uffd unregistrations).
> > + *
> > + * When Postcopy is enabled, always write the zero page as and when
> > + * it is migrated.
> > + *
>
> extra blank line here^
>
> > + */
>
> nit: Inconsistent use of capitalization for the feature names. I'd keep
> it all lowercase.
>
> > + if (migrate_postcopy_ram() ||
> > + ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > memset(page, 0, multifd_ram_page_size());
> > - } else {
> > + }
> > + if (!ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
> > ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> > }
Nitpick below: we could avoid checking the bitmap twice, and moving it
up a bit makes it easier to read.
Meanwhile when at it.. for postcopy if we want we don't need to set all
zeros.. just fault it in either using one inst. Summary:
void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
bool received;
for (int i = 0; i < p->zero_num; i++) {
void *page = p->host + p->zero[i];
received = ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i]);
if (!received) {
ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
}
if (received) {
/* If it has an older version, we must clear the whole page */
memset(page, 0, multifd_ram_page_size());
} else if (migrate_postcopy_ram()) {
/*
* If postcopy is enabled, we must fault in the page because
* XXX (please fill in..). Here we don't necessarily need to
* zero the whole page because we know it must be pre-filled
* with zeros anyway.
*/
*(uint8_t *)page = 0;
}
}
}
We could also use MADV_POPULATE_WRITE, but I'm not sure which one is
faster, and this might still be easier to follow anyway..
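[ For reference, a minimal sketch of the madvise() variant mentioned
above; MADV_POPULATE_WRITE is a real Linux (5.14+) flag, while the
helper name and the fallback policy are hypothetical. ]

#include <string.h>
#include <sys/mman.h>

/* Pre-fault a page-aligned region writable without storing to it. */
static void prefault_zero_page(void *page, size_t size)
{
    if (madvise(page, size, MADV_POPULATE_WRITE) != 0) {
        memset(page, 0, size);   /* fallback: explicit zero write */
    }
}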
--
Peter Xu
* Re: [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd
2025-05-08 12:28 ` [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
@ 2025-05-08 19:04 ` Peter Xu
2025-05-09 5:26 ` Prasad Pandit
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2025-05-08 19:04 UTC (permalink / raw)
To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit
On Thu, May 08, 2025 at 05:58:48PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
>
> Add new qtests to run postcopy migration with multifd
> channels enabled.
>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
> ---
> tests/qtest/migration/compression-tests.c | 18 ++++++++
> tests/qtest/migration/postcopy-tests.c | 27 ++++++++++++
> tests/qtest/migration/precopy-tests.c | 22 ++++++++++
> tests/qtest/migration/tls-tests.c | 50 +++++++++++++++++++++++
> 4 files changed, 117 insertions(+)
>
> v10:
> - Check 'env->has_uffd' before postcopy test
>
> v9:
> - https://lore.kernel.org/qemu-devel/20250411114534.3370816-8-ppandit@redhat.com/
>
> diff --git a/tests/qtest/migration/compression-tests.c b/tests/qtest/migration/compression-tests.c
> index 41e79f031b..b827665b8e 100644
> --- a/tests/qtest/migration/compression-tests.c
> +++ b/tests/qtest/migration/compression-tests.c
> @@ -42,6 +42,20 @@ static void test_multifd_tcp_zstd(void)
> };
> test_precopy_common(&args);
> }
> +
> +static void test_multifd_postcopy_tcp_zstd(void)
> +{
> + MigrateCommon args = {
> + .listen_uri = "defer",
> + .start = {
> + .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
> + .caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true,
> + },
> + .start_hook = migrate_hook_start_precopy_tcp_multifd_zstd,
> + };
> +
> + test_precopy_common(&args);
> +}
> #endif /* CONFIG_ZSTD */
>
> #ifdef CONFIG_QATZIP
> @@ -184,6 +198,10 @@ void migration_test_add_compression(MigrationTestEnv *env)
> #ifdef CONFIG_ZSTD
> migration_test_add("/migration/multifd/tcp/plain/zstd",
> test_multifd_tcp_zstd);
> + if (env->has_uffd) {
> + migration_test_add("/migration/multifd+postcopy/tcp/plain/zstd",
> + test_multifd_postcopy_tcp_zstd);
> + }
> #endif
>
> #ifdef CONFIG_QATZIP
> diff --git a/tests/qtest/migration/postcopy-tests.c b/tests/qtest/migration/postcopy-tests.c
> index 483e3ff99f..eb637f94f7 100644
> --- a/tests/qtest/migration/postcopy-tests.c
> +++ b/tests/qtest/migration/postcopy-tests.c
> @@ -94,6 +94,29 @@ static void migration_test_add_postcopy_smoke(MigrationTestEnv *env)
> }
> }
>
> +static void test_multifd_postcopy(void)
> +{
> + MigrateCommon args = {
> + .start = {
> + .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
> + },
> + };
> +
> + test_postcopy_common(&args);
> +}
> +
> +static void test_multifd_postcopy_preempt(void)
> +{
> + MigrateCommon args = {
> + .start = {
> + .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
> + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
> + },
> + };
> +
> + test_postcopy_common(&args);
> +}
> +
> void migration_test_add_postcopy(MigrationTestEnv *env)
> {
> migration_test_add_postcopy_smoke(env);
> @@ -114,6 +137,10 @@ void migration_test_add_postcopy(MigrationTestEnv *env)
> "/migration/postcopy/recovery/double-failures/reconnect",
> test_postcopy_recovery_fail_reconnect);
>
> + migration_test_add("/migration/postcopy/multifd/plain",
> + test_multifd_postcopy);
> + migration_test_add("/migration/postcopy/multifd/preempt/plain",
> + test_multifd_postcopy_preempt);
> if (env->is_x86) {
> migration_test_add("/migration/postcopy/suspend",
> test_postcopy_suspend);
> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
> index 87b0a7e8ef..a575791c72 100644
> --- a/tests/qtest/migration/precopy-tests.c
> +++ b/tests/qtest/migration/precopy-tests.c
> @@ -34,6 +34,7 @@
> #define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
>
> static char *tmpfs;
> +static bool postcopy_ram = false;
I may not have followed the whole discussions, but have you tried to avoid
this global?
>
> static void test_precopy_unix_plain(void)
> {
> @@ -538,6 +539,11 @@ static void test_multifd_tcp_cancel(void)
> migrate_ensure_non_converge(from);
> migrate_prepare_for_dirty_mem(from);
>
> + if (postcopy_ram) {
> + migrate_set_capability(from, "postcopy-ram", true);
> + migrate_set_capability(to, "postcopy-ram", true);
> + }
> +
> migrate_set_parameter_int(from, "multifd-channels", 16);
> migrate_set_parameter_int(to, "multifd-channels", 16);
>
> @@ -579,6 +585,10 @@ static void test_multifd_tcp_cancel(void)
> return;
> }
>
> + if (postcopy_ram) {
> + migrate_set_capability(to2, "postcopy-ram", true);
> + }
> +
> migrate_set_parameter_int(to2, "multifd-channels", 16);
>
> migrate_set_capability(to2, "multifd", true);
> @@ -602,6 +612,13 @@ static void test_multifd_tcp_cancel(void)
> migrate_end(from, to2, true);
> }
>
> +static void test_multifd_postcopy_tcp_cancel(void)
> +{
> + postcopy_ram = true;
> + test_multifd_tcp_cancel();
> + postcopy_ram = false;
> +}
> +
> static void test_cancel_src_after_failed(QTestState *from, QTestState *to,
> const char *uri, const char *phase)
> {
> @@ -1189,6 +1206,11 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
> test_multifd_tcp_uri_none);
> migration_test_add("/migration/multifd/tcp/plain/cancel",
> test_multifd_tcp_cancel);
> + if (env->has_uffd) {
> + migration_test_add("/migration/multifd+postcopy/tcp/plain/cancel",
> + test_multifd_postcopy_tcp_cancel);
> + }
> +
> #ifdef CONFIG_RDMA
> migration_test_add("/migration/precopy/rdma/plain",
> test_precopy_rdma_plain);
> diff --git a/tests/qtest/migration/tls-tests.c b/tests/qtest/migration/tls-tests.c
> index 72f44defbb..50a07a1c0f 100644
> --- a/tests/qtest/migration/tls-tests.c
> +++ b/tests/qtest/migration/tls-tests.c
> @@ -395,6 +395,19 @@ static void test_postcopy_recovery_tls_psk(void)
> test_postcopy_recovery_common(&args);
> }
>
> +static void test_multifd_postcopy_recovery_tls_psk(void)
> +{
> + MigrateCommon args = {
> + .start_hook = migrate_hook_start_tls_psk_match,
> + .end_hook = migrate_hook_end_tls_psk,
> + .start = {
> + .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
> + },
> + };
> +
> + test_postcopy_recovery_common(&args);
> +}
> +
> /* This contains preempt+recovery+tls test altogether */
> static void test_postcopy_preempt_all(void)
> {
> @@ -409,6 +422,20 @@ static void test_postcopy_preempt_all(void)
> test_postcopy_recovery_common(&args);
> }
>
> +static void test_multifd_postcopy_preempt_recovery_tls_psk(void)
> +{
> + MigrateCommon args = {
> + .start_hook = migrate_hook_start_tls_psk_match,
> + .end_hook = migrate_hook_end_tls_psk,
> + .start = {
> + .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
> + .caps[MIGRATION_CAPABILITY_POSTCOPY_PREEMPT] = true,
> + },
> + };
> +
> + test_postcopy_recovery_common(&args);
> +}
> +
> static void test_precopy_unix_tls_psk(void)
> {
> g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
> @@ -657,6 +684,21 @@ static void test_multifd_tcp_tls_psk_mismatch(void)
> test_precopy_common(&args);
> }
>
> +static void test_multifd_postcopy_tcp_tls_psk_match(void)
> +{
> + MigrateCommon args = {
> + .start = {
> + .caps[MIGRATION_CAPABILITY_MULTIFD] = true,
> + .caps[MIGRATION_CAPABILITY_POSTCOPY_RAM] = true,
> + },
> + .listen_uri = "defer",
> + .start_hook = migrate_hook_start_multifd_tcp_tls_psk_match,
> + .end_hook = migrate_hook_end_tls_psk,
> + };
> +
> + test_precopy_common(&args);
> +}
> +
> #ifdef CONFIG_TASN1
> static void test_multifd_tcp_tls_x509_default_host(void)
> {
> @@ -774,6 +816,10 @@ void migration_test_add_tls(MigrationTestEnv *env)
> test_postcopy_preempt_tls_psk);
> migration_test_add("/migration/postcopy/preempt/recovery/tls/psk",
> test_postcopy_preempt_all);
> + migration_test_add("/migration/postcopy/multifd/recovery/tls/psk",
> + test_multifd_postcopy_recovery_tls_psk);
> + migration_test_add("/migration/postcopy/multifd/preempt/recovery/tls/psk",
> + test_multifd_postcopy_preempt_recovery_tls_psk);
> }
> #ifdef CONFIG_TASN1
> migration_test_add("/migration/precopy/unix/tls/x509/default-host",
> @@ -805,6 +851,10 @@ void migration_test_add_tls(MigrationTestEnv *env)
> test_multifd_tcp_tls_psk_match);
> migration_test_add("/migration/multifd/tcp/tls/psk/mismatch",
> test_multifd_tcp_tls_psk_mismatch);
> + if (env->has_uffd) {
> + migration_test_add("/migration/multifd+postcopy/tcp/tls/psk/match",
> + test_multifd_postcopy_tcp_tls_psk_match);
> + }
> #ifdef CONFIG_TASN1
> migration_test_add("/migration/multifd/tcp/tls/x509/default-host",
> test_multifd_tcp_tls_x509_default_host);
> --
> 2.49.0
>
--
Peter Xu
* Re: [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd
2025-05-08 19:04 ` Peter Xu
@ 2025-05-09 5:26 ` Prasad Pandit
2025-05-09 14:30 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Prasad Pandit @ 2025-05-09 5:26 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit
On Fri, 9 May 2025 at 00:34, Peter Xu <peterx@redhat.com> wrote:
> I may not have followed the whole discussions, but have you tried to avoid
> this global?
-> https://lore.kernel.org/qemu-devel/875xkyyxyy.fsf@suse.de/
* Yes, it was discussed; passing it as a parameter would change the
function prototype and entail changing functions in many places.
Thank you.
---
- Prasad
* Re: [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-08 13:57 ` Fabiano Rosas
2025-05-08 15:40 ` Peter Xu
@ 2025-05-09 5:31 ` Prasad Pandit
1 sibling, 0 replies; 16+ messages in thread
From: Prasad Pandit @ 2025-05-09 5:31 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: qemu-devel, peterx, berrange, Prasad Pandit
On Thu, 8 May 2025 at 19:27, Fabiano Rosas <farosas@suse.de> wrote:
> > During multifd migration, zero pages are written if
> > they are migrated more than ones.
>
> s/ones/once/
> s/ones/once/
> extra blank line here^
>
> nit: Inconsistent use of capitalization for the feature names. I'd keep
> it all lowercase.
* Okay.
> This patch should come before 1/3, otherwise it'll break bisect.
===
0001-migration-write-zero-pages-when-postcopy-enabled.patch
0002-migration-enable-multifd-and-postcopy-together.patch
0003-tests-qtest-migration-add-postcopy-tests-with-multif.patch
===
* Okay, like above?
Thank you.
---
- Prasad
* Re: [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-08 15:40 ` Peter Xu
@ 2025-05-09 6:04 ` Prasad Pandit
2025-05-09 15:11 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Prasad Pandit @ 2025-05-09 6:04 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit
> > This patch should come before 1/3, otherwise it'll break bisect.
> We could squash the two together, IMHO.
* It adjusts the specific optimisation behaviour for the case when
Multifd and Postcopy are enabled together. I think it's better
as a separate patch. It'll help to see how that optimisation
changed/evolved over time.
> > s/ones/once/
> >
> > > + *
> > > + * It becomes a problem when both Multifd & Postcopy options are
> > > + * enabled. If the zero page which was skipped during multifd phase,
> > > + * is accessed during the Postcopy phase of the migration, a page
> > > + * fault occurs. But this page fault is not served because the
> > > + * 'receivedmap' says the zero page is already received. Thus the
> > > + * migration hangs.
>
> A more accurate version could be: "the thread accessing the page may hang".
> As discussed previously, in most cases IIUC it won't hang migration when
> accessed in vcpu contexts, and it will move again when all pages are
> migrated (which triggers uffd unregistrations).
* Okay.
> Meanwhile when at it.. for postcopy if we want we don't need to set all
> zeros.. just fault it in either using one inst. Summary:
>
> void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> {
> bool received;
>
> for (int i = 0; i < p->zero_num; i++) {
> void *page = p->host + p->zero[i];
>
> received = ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i]);
> if (!received) {
> ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> }
* Okay.
> if (received) {
> /* If it has an older version, we must clear the whole page */
> memset(page, 0, multifd_ram_page_size());
> } else if (migrate_postcopy_ram()) {
> /*
> * If postcopy is enabled, we must fault in the page because
> * XXX (please fill in..). Here we don't necessarily need to
> * zero the whole page because we know it must be pre-filled
> * with zeros anyway.
> */
> *(uint8_t *)page = 0;
>
> We could also use MADV_POPULATE_WRITE, but I'm not sure which one is
> faster, and this might still be easier to follow anyway..
* Not sure how this is supposed to work; during the Multifd phase
(Postcopy not yet running), when migrate_postcopy_ram() returns true,
shall we raise a page fault here?
* Could we zero-initialise the destination guest memory when migration
starts, and not migrate the zero pages from the source at all? i.e. we
mark the page received in the 'receivedmap' as is done now, but a page
fault should also not happen for that guest address, because the
memory was already zero-initialised at the beginning. I think there
might be some scope to send zero-page entries piggy-backed with
non-zero pages, whose contents are migrated anyway.
* Say there are 10 pages (4KB each, Total: 40KB). Of these 10 pages:
Non-zero pages: 1, 2, 4, 7, 9, 10
Zero pages: 3, 5-6 and 8
* We only migrate/send the non-zero pages from the source to the
destination. When non-zero page-4 is migrated, an entry/hint of page-3
being a zero page is piggy-backed with it. When non-zero page-7 is
sent, an entry/hint of pages 5-6 being zero pages is sent with it.
Similarly a hint of page-8 being a zero page is sent along with
page-9. (thinking aloud)
Thank you.
---
- Prasad
* Re: [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd
2025-05-09 5:26 ` Prasad Pandit
@ 2025-05-09 14:30 ` Peter Xu
2025-05-13 6:20 ` Prasad Pandit
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2025-05-09 14:30 UTC (permalink / raw)
To: Prasad Pandit; +Cc: qemu-devel, farosas, berrange, Prasad Pandit
On Fri, May 09, 2025 at 10:56:05AM +0530, Prasad Pandit wrote:
> On Fri, 9 May 2025 at 00:34, Peter Xu <peterx@redhat.com> wrote:
> > I may not have followed the whole discussions, but have you tried to avoid
> > this global?
>
> -> https://lore.kernel.org/qemu-devel/875xkyyxyy.fsf@suse.de/
>
> * Yes, it was discussed; passing it as a parameter would change the
> function prototype and entail changing functions in many places.
Would this work?
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index a575791c72..441a65bcf5 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -34,7 +34,6 @@
#define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
static char *tmpfs;
-static bool postcopy_ram = false;
static void test_precopy_unix_plain(void)
{
@@ -525,7 +524,7 @@ static void test_multifd_tcp_channels_none(void)
*
* And see that it works
*/
-static void test_multifd_tcp_cancel(void)
+static void test_multifd_tcp_cancel(bool postcopy_ram)
{
MigrateStart args = {
.hide_stderr = true,
@@ -612,11 +611,14 @@ static void test_multifd_tcp_cancel(void)
migrate_end(from, to2, true);
}
+static void test_multifd_precopy_tcp_cancel(void)
+{
+ test_multifd_tcp_cancel(false);
+}
+
static void test_multifd_postcopy_tcp_cancel(void)
{
- postcopy_ram = true;
- test_multifd_tcp_cancel();
- postcopy_ram = false;
+ test_multifd_tcp_cancel(true);
}
static void test_cancel_src_after_failed(QTestState *from, QTestState *to,
@@ -1205,7 +1207,7 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
migration_test_add("/migration/multifd/tcp/uri/plain/none",
test_multifd_tcp_uri_none);
migration_test_add("/migration/multifd/tcp/plain/cancel",
- test_multifd_tcp_cancel);
+ test_multifd_precopy_tcp_cancel);
if (env->has_uffd) {
migration_test_add("/migration/multifd+postcopy/tcp/plain/cancel",
test_multifd_postcopy_tcp_cancel);
--
Peter Xu
* Re: [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-09 6:04 ` Prasad Pandit
@ 2025-05-09 15:11 ` Peter Xu
2025-05-12 6:26 ` Prasad Pandit
0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2025-05-09 15:11 UTC (permalink / raw)
To: Prasad Pandit; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit
On Fri, May 09, 2025 at 11:34:06AM +0530, Prasad Pandit wrote:
> > > This patch should come before 1/3, otherwise it'll break bisect.
> > We could squash the two together, IMHO.
>
> * It adjusts the specific optimisation behaviour for the case when
> Multifd and Postcopy are enabled together. I think it's better
> as a separate patch. It'll help to see how that optimisation
> changed/evolved over time.
I'd say it'll not help me.. no more than if you squash it with the first.
Both of them are tiny patches, and one should never apply this one alone.
I can just find no good reason to keep this single change separate.
But it's fine to me - you can keep it separate if you prefer, as long as
reordered.
>
> > > s/ones/once/
> > >
> > > > + *
> > > > + * It becomes a problem when both Multifd & Postcopy options are
> > > > + * enabled. If the zero page which was skipped during multifd phase,
> > > > + * is accessed during the Postcopy phase of the migration, a page
> > > > + * fault occurs. But this page fault is not served because the
> > > > + * 'receivedmap' says the zero page is already received. Thus the
> > > > + * migration hangs.
> >
> > A more accurate version could be: "the thread accessing the page may hang".
> > As discussed previously, in most cases IIUC it won't hang migration when
> > accessed in vcpu contexts, and it will move again when all pages are
> > migrated (which triggers uffd unregistrations).
>
> * Okay.
>
> > Meanwhile when at it.. for postcopy if we want we don't need to set all
> > zeros.. just fault it in either using one inst. Summary:
> >
> > void multifd_recv_zero_page_process(MultiFDRecvParams *p)
> > {
> > bool received;
> >
> > for (int i = 0; i < p->zero_num; i++) {
> > void *page = p->host + p->zero[i];
> >
> > received = ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i]);
> > if (!received) {
> > ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
> > }
>
> * Okay.
>
> > if (received) {
> > /* If it has an older version, we must clear the whole page */
> > memset(page, 0, multifd_ram_page_size());
> > } else if (migrate_postcopy_ram()) {
> > /*
> > * If postcopy is enabled, we must fault in the page because
> > * XXX (please fill in..). Here we don't necessarily need to
> > * zero the whole page because we know it must be pre-filled
> > * with zeros anyway.
> > */
> > *(uint8_t *)page = 0;
> >
> > We could also use MADV_POPULATE_WRITE but not sure which one is faster, and
> > this might still be easier to follow anyway..
>
> * Not sure how this is supposed to work; during the Multifd phase
> (Postcopy not yet running), when migrate_postcopy_ram() returns true,
> shall we raise a page fault here?
Yes, if we reach here with received==0, the page must be zero. A
fault-in should be enough.
Well.. on second thought, maybe better to memset().. I need to double
check whether "migration didn't receive this page" always means it's
empty. I worry some other part of the code pre-loads the chunk. So
better not to do this trivial trick until one of us is sure..
My apologies for the confusion; please keep the good part of my comments
and stick with memset(). Please ignore the fault instruction and madvise
for now.
>
> * Could we zero-initialise the destination guest memory when migration
> starts, and not migrate the zero pages from the source at all? i.e. we
> mark the page received in the 'receivedmap' as is done now, but a page
> fault should also not happen for that guest address, because the
> memory was already zero-initialised at the beginning. I think there
> might be some scope to send zero-page entries piggy-backed with
> non-zero pages, whose contents are migrated anyway.
>
> * Say there are 10 pages (4KB each, Total: 40KB). Of these 10 pages:
>
> Non-zero pages: 1, 2, 4, 7, 9, 10
> Zero pages: 3, 5-6 and 8
>
> * We only migrate/send the non-zero pages from the source to the
> destination. When non-zero page-4 is migrated, an entry/hint of page-3
> being a zero page is piggy-backed with it. When non-zero page-7 is
> sent, an entry/hint of pages 5-6 being zero pages is sent with it.
> Similarly a hint of page-8 being a zero page is sent along with
> page-9. (thinking aloud)
Isn't that what multifd is doing already?
typedef struct {
...
/*
* This array contains the pointers to:
* - normal pages (initial normal_pages entries)
* - zero pages (following zero_pages entries)
*/
uint64_t offset[];
} __attribute__((packed)) MultiFDPacket_t;
Or maybe I missed what you meant.
Thanks,
--
Peter Xu
* Re: [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-09 15:11 ` Peter Xu
@ 2025-05-12 6:26 ` Prasad Pandit
2025-05-12 15:06 ` Peter Xu
0 siblings, 1 reply; 16+ messages in thread
From: Prasad Pandit @ 2025-05-12 6:26 UTC (permalink / raw)
To: Peter Xu; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit
Hi,
On Fri, 9 May 2025 at 20:41, Peter Xu <peterx@redhat.com> wrote:
> Isn't that what multifd is doing already?
> typedef struct {
> ...
> /*
> * This array contains the pointers to:
> * - normal pages (initial normal_pages entries)
> * - zero pages (following zero_pages entries)
> */
> uint64_t offset[];
> } __attribute__((packed)) MultiFDPacket_t;
> Or maybe I missed what you meant.
* Why are we memsetting zero pages on the receive side? What I'm
trying to get at is, if destination memory is zero-initialised at the
beginning of migration, we might be able to do away with this
memset(3) call and this optimisation altogether. All zero page entries
could point to the same zero page as well, if we don't want to
preallocate all zero pages at the start.
Thank you.
---
- Prasad
* Re: [PATCH v10 3/3] migration: write zero pages when postcopy enabled
2025-05-12 6:26 ` Prasad Pandit
@ 2025-05-12 15:06 ` Peter Xu
0 siblings, 0 replies; 16+ messages in thread
From: Peter Xu @ 2025-05-12 15:06 UTC (permalink / raw)
To: Prasad Pandit; +Cc: Fabiano Rosas, qemu-devel, berrange, Prasad Pandit
On Mon, May 12, 2025 at 11:56:31AM +0530, Prasad Pandit wrote:
> Hi,
>
> On Fri, 9 May 2025 at 20:41, Peter Xu <peterx@redhat.com> wrote:
> > Isn't that what multifd is doing already?
> > typedef struct {
> > ...
> > /*
> > * This array contains the pointers to:
> > * - normal pages (initial normal_pages entries)
> > * - zero pages (following zero_pages entries)
> > */
> > uint64_t offset[];
> > } __attribute__((packed)) MultiFDPacket_t;
> > Or maybe I missed what you meant.
>
> * Why are we memsetting zero pages on the receive side? What I'm
> trying to get at is, if destination memory is zero-initialised at the
> beginning of migration, we might be able to do away with this
> memset(3) call and this optimisation altogether. All zero page entries
> could point to the same zero page as well, if we don't want to
> preallocate all zero pages at the start.
I didn't follow, sorry.
It's still uncertain to me whether the initial pages would be zero before
migration touches them. I'll need to check some day, because it matters
too as long as we avoid the memset in the first round. But I don't
understand what you meant by "do away with memset... altogether".
What is the change you're suggesting?
--
Peter Xu
* Re: [PATCH v10 2/3] tests/qtest/migration: add postcopy tests with multifd
2025-05-09 14:30 ` Peter Xu
@ 2025-05-13 6:20 ` Prasad Pandit
0 siblings, 0 replies; 16+ messages in thread
From: Prasad Pandit @ 2025-05-13 6:20 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, farosas, berrange, Prasad Pandit
On Fri, 9 May 2025 at 20:00, Peter Xu <peterx@redhat.com> wrote:
> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
> index a575791c72..441a65bcf5 100644
> --- a/tests/qtest/migration/precopy-tests.c
> +++ b/tests/qtest/migration/precopy-tests.c
> @@ -34,7 +34,6 @@
> #define DIRTYLIMIT_TOLERANCE_RANGE 25 /* MB/s */
>
> static char *tmpfs;
> -static bool postcopy_ram = false;
>
> static void test_precopy_unix_plain(void)
> {
> @@ -525,7 +524,7 @@ static void test_multifd_tcp_channels_none(void)
> *
> * And see that it works
> */
> -static void test_multifd_tcp_cancel(void)
> +static void test_multifd_tcp_cancel(bool postcopy_ram)
> {
> MigrateStart args = {
> .hide_stderr = true,
> @@ -612,11 +611,14 @@ static void test_multifd_tcp_cancel(void)
> migrate_end(from, to2, true);
> }
>
> +static void test_multifd_precopy_tcp_cancel(void)
> +{
> + test_multifd_tcp_cancel(false);
> +}
> +
> static void test_multifd_postcopy_tcp_cancel(void)
> {
> - postcopy_ram = true;
> - test_multifd_tcp_cancel();
> - postcopy_ram = false;
> + test_multifd_tcp_cancel(true);
> }
>
> static void test_cancel_src_after_failed(QTestState *from, QTestState *to,
> @@ -1205,7 +1207,7 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
> migration_test_add("/migration/multifd/tcp/uri/plain/none",
> test_multifd_tcp_uri_none);
> migration_test_add("/migration/multifd/tcp/plain/cancel",
> - test_multifd_tcp_cancel);
> + test_multifd_precopy_tcp_cancel);
> if (env->has_uffd) {
> migration_test_add("/migration/multifd+postcopy/tcp/plain/cancel",
> test_multifd_postcopy_tcp_cancel);
>
>
> Would this work?
>
* I did; v11 includes it. Thank you.
---
- Prasad