qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/7] Introduce multifd zero page checking.
@ 2024-02-16 22:39 Hao Xiang
  2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:39 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

v2 update:
* Implement zero-page-detection switch with enumeration "legacy",
"none" and "multifd".
* Move normal/zero pages from MultiFDSendParams to MultiFDPages_t.
* Add zeros and zero_bytes accounting.

This patchset is based on Juan Quintela's earlier series:
https://lore.kernel.org/all/20220802063907.18882-1-quintela@redhat.com/

In the multifd live migration model, a single migration main thread
scans the page map and queues pages to multiple multifd sender
threads. The main thread runs zero page checking on every page before
queuing it to the sender threads. Zero page checking is a CPU
intensive task, so having a single thread do all of it doesn't scale
well. This change introduces a new function to run zero page checking
on the multifd sender threads. This patchset also lays the groundwork
for future changes that offload the zero page checking task to
accelerator hardware.

Two Intel 4th generation Xeon servers were used for testing.

Architecture:        x86_64
CPU(s):              192
Thread(s) per core:  2
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               143
Model name:          Intel(R) Xeon(R) Platinum 8457C
Stepping:            8
CPU MHz:             2538.624
CPU max MHz:         3800.0000
CPU min MHz:         800.0000

Multifd live migration was performed with the following setup:
1. VM has 100GB memory. All pages in the VM are zero pages.
2. Use a tcp socket for live migration.
3. Use 4 multifd channels with zero page checking on the migration main
thread.
4. Use 1/2/4 multifd channels with zero page checking on the multifd
sender threads.
5. Record the total migration time from the sender QEMU monitor's
"info migrate" command.
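For reference, the sender side of the setup above roughly corresponds to an HMP session like the following (a hypothetical transcript; destination address elided, and exact "info migrate" output wording varies by QEMU version):

```
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter multifd-channels 4
(qemu) migrate_set_parameter zero-page-detection multifd
(qemu) migrate -d tcp:<dest-host>:<port>
(qemu) info migrate
...
total time: 4143 ms
```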

+------------------------------------+
|zero-page-checking | total-time(ms) |
+------------------------------------+
|main-thread        | 9629           |
+------------------------------------+
|multifd-1-threads  | 6182           |
+------------------------------------+
|multifd-2-threads  | 4643           |
+------------------------------------+
|multifd-4-threads  | 4143           |
+------------------------------------+

This patchset applies on top of commit
5767815218efd3cbfd409505ed824d5f356044ae.

Hao Xiang (7):
  migration/multifd: Add new migration option zero-page-detection.
  migration/multifd: Support for zero pages transmission in multifd
    format.
  migration/multifd: Zero page transmission on the multifd thread.
  migration/multifd: Enable zero page checking from multifd threads.
  migration/multifd: Add new migration test cases for legacy zero page
    checking.
  migration/multifd: Add zero pages and zero bytes counter to migration
    status interface.
  Update maintainer contact for migration multifd zero page checking
    acceleration.

 MAINTAINERS                         |  5 ++
 hw/core/qdev-properties-system.c    | 10 ++++
 include/hw/qdev-properties-system.h |  4 ++
 migration/meson.build               |  1 +
 migration/migration-hmp-cmds.c      | 13 +++++
 migration/migration.c               |  2 +
 migration/multifd-zero-page.c       | 59 +++++++++++++++++++
 migration/multifd-zlib.c            | 26 +++++++--
 migration/multifd-zstd.c            | 25 ++++++--
 migration/multifd.c                 | 90 ++++++++++++++++++++++++-----
 migration/multifd.h                 | 28 ++++++++-
 migration/options.c                 | 21 +++++++
 migration/options.h                 |  1 +
 migration/ram.c                     | 50 ++++++++++++----
 migration/trace-events              |  8 +--
 qapi/migration.json                 | 47 +++++++++++++--
 tests/migration/guestperf/engine.py |  2 +
 tests/qtest/migration-test.c        | 52 +++++++++++++++++
 18 files changed, 399 insertions(+), 45 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

-- 
2.30.2




* [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
@ 2024-02-16 22:39 ` Hao Xiang
  2024-02-21 12:03   ` Markus Armbruster
                     ` (3 more replies)
  2024-02-16 22:39 ` [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format Hao Xiang
                   ` (5 subsequent siblings)
  6 siblings, 4 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:39 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

This new parameter controls where zero page checking runs.
1. If this parameter is set to 'legacy', zero page checking is
done in the migration main thread.
2. If this parameter is set to 'none', zero page checking is disabled.
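For example, the parameter added here can be set and read back over QMP (a sketched session; 'migrate-set-parameters' and 'query-migrate-parameters' are the existing QMP commands this patch extends):

```
-> { "execute": "migrate-set-parameters",
     "arguments": { "zero-page-detection": "none" } }
<- { "return": {} }

-> { "execute": "query-migrate-parameters" }
<- { "return": { "zero-page-detection": "none", ... } }
```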

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 hw/core/qdev-properties-system.c    | 10 ++++++++++
 include/hw/qdev-properties-system.h |  4 ++++
 migration/migration-hmp-cmds.c      |  9 +++++++++
 migration/options.c                 | 21 ++++++++++++++++++++
 migration/options.h                 |  1 +
 migration/ram.c                     |  4 ++++
 qapi/migration.json                 | 30 ++++++++++++++++++++++++++---
 7 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 1a396521d5..63843f18b5 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -679,6 +679,16 @@ const PropertyInfo qdev_prop_mig_mode = {
     .set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
+const PropertyInfo qdev_prop_zero_page_detection = {
+    .name = "ZeroPageDetection",
+    .description = "zero_page_detection values, "
+                   "multifd,legacy,none",
+    .enum_table = &ZeroPageDetection_lookup,
+    .get = qdev_propinfo_get_enum,
+    .set = qdev_propinfo_set_enum,
+    .set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
 /* --- Reserved Region --- */
 
 /*
diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h
index 06c359c190..839b170235 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -8,6 +8,7 @@ extern const PropertyInfo qdev_prop_macaddr;
 extern const PropertyInfo qdev_prop_reserved_region;
 extern const PropertyInfo qdev_prop_multifd_compression;
 extern const PropertyInfo qdev_prop_mig_mode;
+extern const PropertyInfo qdev_prop_zero_page_detection;
 extern const PropertyInfo qdev_prop_losttickpolicy;
 extern const PropertyInfo qdev_prop_blockdev_on_error;
 extern const PropertyInfo qdev_prop_bios_chs_trans;
@@ -47,6 +48,9 @@ extern const PropertyInfo qdev_prop_iothread_vq_mapping_list;
 #define DEFINE_PROP_MIG_MODE(_n, _s, _f, _d) \
     DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_mig_mode, \
                        MigMode)
+#define DEFINE_PROP_ZERO_PAGE_DETECTION(_n, _s, _f, _d) \
+    DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_zero_page_detection, \
+                       ZeroPageDetection)
 #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
     DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
                         LostTickPolicy)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 99b49df5dd..7e96ae6ffd 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -344,6 +344,11 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %s\n",
             MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_COMPRESSION),
             MultiFDCompression_str(params->multifd_compression));
+        assert(params->has_zero_page_detection);
+        monitor_printf(mon, "%s: %s\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_ZERO_PAGE_DETECTION),
+            qapi_enum_lookup(&ZeroPageDetection_lookup,
+                params->zero_page_detection));
         monitor_printf(mon, "%s: %" PRIu64 " bytes\n",
             MigrationParameter_str(MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE),
             params->xbzrle_cache_size);
@@ -634,6 +639,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_multifd_zstd_level = true;
         visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
         break;
+    case MIGRATION_PARAMETER_ZERO_PAGE_DETECTION:
+        p->has_zero_page_detection = true;
+        visit_type_ZeroPageDetection(v, param, &p->zero_page_detection, &err);
+        break;
     case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
         p->has_xbzrle_cache_size = true;
         if (!visit_type_size(v, param, &cache_size, &err)) {
diff --git a/migration/options.c b/migration/options.c
index 3e3e0b93b4..3c603391b0 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -179,6 +179,9 @@ Property migration_properties[] = {
     DEFINE_PROP_MIG_MODE("mode", MigrationState,
                       parameters.mode,
                       MIG_MODE_NORMAL),
+    DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
+                       parameters.zero_page_detection,
+                       ZERO_PAGE_DETECTION_LEGACY),
 
     /* Migration capabilities */
     DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -903,6 +906,13 @@ uint64_t migrate_xbzrle_cache_size(void)
     return s->parameters.xbzrle_cache_size;
 }
 
+ZeroPageDetection migrate_zero_page_detection(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->parameters.zero_page_detection;
+}
+
 /* parameter setters */
 
 void migrate_set_block_incremental(bool value)
@@ -1013,6 +1023,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->vcpu_dirty_limit = s->parameters.vcpu_dirty_limit;
     params->has_mode = true;
     params->mode = s->parameters.mode;
+    params->has_zero_page_detection = true;
+    params->zero_page_detection = s->parameters.zero_page_detection;
 
     return params;
 }
@@ -1049,6 +1061,7 @@ void migrate_params_init(MigrationParameters *params)
     params->has_x_vcpu_dirty_limit_period = true;
     params->has_vcpu_dirty_limit = true;
     params->has_mode = true;
+    params->has_zero_page_detection = true;
 }
 
 /*
@@ -1350,6 +1363,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_mode) {
         dest->mode = params->mode;
     }
+
+    if (params->has_zero_page_detection) {
+        dest->zero_page_detection = params->zero_page_detection;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1494,6 +1511,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
     if (params->has_mode) {
         s->parameters.mode = params->mode;
     }
+
+    if (params->has_zero_page_detection) {
+        s->parameters.zero_page_detection = params->zero_page_detection;
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index 246c160aee..b7c4fb3861 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -93,6 +93,7 @@ const char *migrate_tls_authz(void);
 const char *migrate_tls_creds(void);
 const char *migrate_tls_hostname(void);
 uint64_t migrate_xbzrle_cache_size(void);
+ZeroPageDetection migrate_zero_page_detection(void);
 
 /* parameters setters */
 
diff --git a/migration/ram.c b/migration/ram.c
index 4649a81204..556725c30f 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1123,6 +1123,10 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
     QEMUFile *file = pss->pss_channel;
     int len = 0;
 
+    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
+        return 0;
+    }
+
     if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
         return 0;
     }
diff --git a/qapi/migration.json b/qapi/migration.json
index 5a565d9b8d..99843a8e95 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -653,6 +653,17 @@
 { 'enum': 'MigMode',
   'data': [ 'normal', 'cpr-reboot' ] }
 
+##
+# @ZeroPageDetection:
+#
+# @legacy: Perform zero page checking from main migration thread. (since 9.0)
+#
+# @none: Do not perform zero page checking.
+#
+##
+{ 'enum': 'ZeroPageDetection',
+  'data': [ 'legacy', 'none' ] }
+
 ##
 # @BitmapMigrationBitmapAliasTransform:
 #
@@ -874,6 +885,9 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #        (Since 8.2)
 #
+# @zero-page-detection: See description in @ZeroPageDetection.
+#     Default is 'legacy'. (Since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -907,7 +921,8 @@
            'block-bitmap-mapping',
            { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
            'vcpu-dirty-limit',
-           'mode'] }
+           'mode',
+           'zero-page-detection'] }
 
 ##
 # @MigrateSetParameters:
@@ -1066,6 +1081,10 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #        (Since 8.2)
 #
+# @zero-page-detection: See description in @ZeroPageDetection.
+#     Default is 'legacy'. (Since 9.0)
+#
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1119,7 +1138,8 @@
             '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
                                             'features': [ 'unstable' ] },
             '*vcpu-dirty-limit': 'uint64',
-            '*mode': 'MigMode'} }
+            '*mode': 'MigMode',
+            '*zero-page-detection': 'ZeroPageDetection'} }
 
 ##
 # @migrate-set-parameters:
@@ -1294,6 +1314,9 @@
 # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
 #        (Since 8.2)
 #
+# @zero-page-detection: See description in @ZeroPageDetection.
+#     Default is 'legacy'. (Since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @block-incremental is deprecated.  Use
@@ -1344,7 +1367,8 @@
             '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
                                             'features': [ 'unstable' ] },
             '*vcpu-dirty-limit': 'uint64',
-            '*mode': 'MigMode'} }
+            '*mode': 'MigMode',
+            '*zero-page-detection': 'ZeroPageDetection'} }
 
 ##
 # @query-migrate-parameters:
-- 
2.30.2




* [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format.
  2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
  2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
@ 2024-02-16 22:39 ` Hao Xiang
  2024-02-21 15:37   ` Elena Ufimtseva
  2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:39 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

This change adds zero page counters and updates the multifd
send/receive trace formats to track the newly added counters.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 migration/multifd.c    | 43 ++++++++++++++++++++++++++++++++++--------
 migration/multifd.h    | 21 ++++++++++++++++++++-
 migration/ram.c        |  1 -
 migration/trace-events |  8 ++++----
 4 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index adfe8c9a0a..a33dba40d9 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -236,6 +236,8 @@ static void multifd_pages_reset(MultiFDPages_t *pages)
      * overwritten later when reused.
      */
     pages->num = 0;
+    pages->normal_num = 0;
+    pages->zero_num = 0;
     pages->block = NULL;
 }
 
@@ -309,6 +311,8 @@ static MultiFDPages_t *multifd_pages_init(uint32_t n)
 
     pages->allocated = n;
     pages->offset = g_new0(ram_addr_t, n);
+    pages->normal = g_new0(ram_addr_t, n);
+    pages->zero = g_new0(ram_addr_t, n);
 
     return pages;
 }
@@ -319,6 +323,10 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
     pages->allocated = 0;
     g_free(pages->offset);
     pages->offset = NULL;
+    g_free(pages->normal);
+    pages->normal = NULL;
+    g_free(pages->zero);
+    pages->zero = NULL;
     g_free(pages);
 }
 
@@ -332,6 +340,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
     packet->flags = cpu_to_be32(p->flags);
     packet->pages_alloc = cpu_to_be32(p->pages->allocated);
     packet->normal_pages = cpu_to_be32(pages->num);
+    packet->zero_pages = cpu_to_be32(pages->zero_num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
 
     packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
@@ -350,9 +359,10 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 
     p->packets_sent++;
     p->total_normal_pages += pages->num;
+    p->total_zero_pages += pages->zero_num;
 
-    trace_multifd_send(p->id, packet_num, pages->num, p->flags,
-                       p->next_packet_size);
+    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
+                       p->flags, p->next_packet_size);
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -393,20 +403,29 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     p->normal_num = be32_to_cpu(packet->normal_pages);
     if (p->normal_num > packet->pages_alloc) {
         error_setg(errp, "multifd: received packet "
-                   "with %u pages and expected maximum pages are %u",
+                   "with %u normal pages and expected maximum pages are %u",
                    p->normal_num, packet->pages_alloc) ;
         return -1;
     }
 
+    p->zero_num = be32_to_cpu(packet->zero_pages);
+    if (p->zero_num > packet->pages_alloc - p->normal_num) {
+        error_setg(errp, "multifd: received packet "
+                   "with %u zero pages and expected maximum zero pages are %u",
+                   p->zero_num, packet->pages_alloc - p->normal_num) ;
+        return -1;
+    }
+
     p->next_packet_size = be32_to_cpu(packet->next_packet_size);
     p->packet_num = be64_to_cpu(packet->packet_num);
     p->packets_recved++;
     p->total_normal_pages += p->normal_num;
+    p->total_zero_pages += p->zero_num;
 
-    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
-                       p->next_packet_size);
+    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
+                       p->flags, p->next_packet_size);
 
-    if (p->normal_num == 0) {
+    if (p->normal_num == 0 && p->zero_num == 0) {
         return 0;
     }
 
@@ -823,6 +842,8 @@ static void *multifd_send_thread(void *opaque)
 
             stat64_add(&mig_stats.multifd_bytes,
                        p->next_packet_size + p->packet_len);
+            stat64_add(&mig_stats.normal_pages, pages->num);
+            stat64_add(&mig_stats.zero_pages, pages->zero_num);
 
             multifd_pages_reset(p->pages);
             p->next_packet_size = 0;
@@ -866,7 +887,8 @@ out:
 
     rcu_unregister_thread();
     migration_threads_remove(thread);
-    trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages);
+    trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages,
+                                  p->total_zero_pages);
 
     return NULL;
 }
@@ -1132,6 +1154,8 @@ static void multifd_recv_cleanup_channel(MultiFDRecvParams *p)
     p->iov = NULL;
     g_free(p->normal);
     p->normal = NULL;
+    g_free(p->zero);
+    p->zero = NULL;
     multifd_recv_state->ops->recv_cleanup(p);
 }
 
@@ -1251,7 +1275,9 @@ static void *multifd_recv_thread(void *opaque)
     }
 
     rcu_unregister_thread();
-    trace_multifd_recv_thread_end(p->id, p->packets_recved, p->total_normal_pages);
+    trace_multifd_recv_thread_end(p->id, p->packets_recved,
+                                  p->total_normal_pages,
+                                  p->total_zero_pages);
 
     return NULL;
 }
@@ -1290,6 +1316,7 @@ int multifd_recv_setup(Error **errp)
         p->name = g_strdup_printf("multifdrecv_%d", i);
         p->iov = g_new0(struct iovec, page_count);
         p->normal = g_new0(ram_addr_t, page_count);
+        p->zero = g_new0(ram_addr_t, page_count);
         p->page_count = page_count;
         p->page_size = qemu_target_page_size();
     }
diff --git a/migration/multifd.h b/migration/multifd.h
index 8a1cad0996..9822ff298a 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -48,7 +48,10 @@ typedef struct {
     /* size of the next packet that contains pages */
     uint32_t next_packet_size;
     uint64_t packet_num;
-    uint64_t unused[4];    /* Reserved for future use */
+    /* zero pages */
+    uint32_t zero_pages;
+    uint32_t unused32[1];    /* Reserved for future use */
+    uint64_t unused64[3];    /* Reserved for future use */
     char ramblock[256];
     uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
@@ -56,10 +59,18 @@ typedef struct {
 typedef struct {
     /* number of used pages */
     uint32_t num;
+    /* number of normal pages */
+    uint32_t normal_num;
+    /* number of zero pages */
+    uint32_t zero_num;
     /* number of allocated pages */
     uint32_t allocated;
     /* offset of each page */
     ram_addr_t *offset;
+    /* offset of normal page */
+    ram_addr_t *normal;
+    /* offset of zero page */
+    ram_addr_t *zero;
     RAMBlock *block;
 } MultiFDPages_t;
 
@@ -124,6 +135,8 @@ typedef struct {
     uint64_t packets_sent;
     /* non zero pages sent through this channel */
     uint64_t total_normal_pages;
+    /* zero pages sent through this channel */
+    uint64_t total_zero_pages;
     /* buffers to send */
     struct iovec *iov;
     /* number of iovs used */
@@ -178,12 +191,18 @@ typedef struct {
     uint8_t *host;
     /* non zero pages recv through this channel */
     uint64_t total_normal_pages;
+    /* zero pages recv through this channel */
+    uint64_t total_zero_pages;
     /* buffers to recv */
     struct iovec *iov;
     /* Pages that are not zero */
     ram_addr_t *normal;
     /* num of non zero pages */
     uint32_t normal_num;
+    /* Pages that are zero */
+    ram_addr_t *zero;
+    /* num of zero pages */
+    uint32_t zero_num;
     /* used for de-compression methods */
     void *data;
 } MultiFDRecvParams;
diff --git a/migration/ram.c b/migration/ram.c
index 556725c30f..5ece9f042e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1259,7 +1259,6 @@ static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
     if (!multifd_queue_page(block, offset)) {
         return -1;
     }
-    stat64_add(&mig_stats.normal_pages, 1);
 
     return 1;
 }
diff --git a/migration/trace-events b/migration/trace-events
index 298ad2b0dd..9f1d7ae71a 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -128,21 +128,21 @@ postcopy_preempt_reset_channel(void) ""
 # multifd.c
 multifd_new_send_channel_async(uint8_t id) "channel %u"
 multifd_new_send_channel_async_error(uint8_t id, void *err) "channel=%u err=%p"
-multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
+multifd_recv(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t zero, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
 multifd_recv_new_channel(uint8_t id) "channel %u"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %u"
 multifd_recv_sync_main_wait(uint8_t id) "channel %u"
 multifd_recv_terminate_threads(bool error) "error %d"
-multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
+multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 " zero pages %" PRIu64
 multifd_recv_thread_start(uint8_t id) "%u"
-multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u flags 0x%x next packet size %u"
+multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal_pages, uint32_t zero_pages, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
 multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
 multifd_send_sync_main_wait(uint8_t id) "channel %u"
 multifd_send_terminate_threads(void) ""
-multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
+multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64 " zero pages %"  PRIu64
 multifd_send_thread_start(uint8_t id) "%u"
 multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
 multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"
-- 
2.30.2




* [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
  2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
  2024-02-16 22:39 ` [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format Hao Xiang
@ 2024-02-16 22:39 ` Hao Xiang
  2024-02-16 23:49   ` Richard Henderson
                     ` (3 more replies)
  2024-02-16 22:39 ` [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads Hao Xiang
                   ` (3 subsequent siblings)
  6 siblings, 4 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:39 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

1. Implement zero page detection and handling on the multifd threads
for the non-compression, zlib and zstd compression backends.
2. Add a new value 'multifd' to the ZeroPageDetection enumeration.
3. Add asserts to ensure pages->normal is used for normal pages in
all scenarios.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 migration/meson.build         |  1 +
 migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
 migration/multifd-zlib.c      | 26 ++++++++++++---
 migration/multifd-zstd.c      | 25 ++++++++++++---
 migration/multifd.c           | 50 +++++++++++++++++++++++------
 migration/multifd.h           |  7 +++++
 qapi/migration.json           |  4 ++-
 7 files changed, 151 insertions(+), 21 deletions(-)
 create mode 100644 migration/multifd-zero-page.c

diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..1eeb915ff6 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -22,6 +22,7 @@ system_ss.add(files(
   'migration.c',
   'multifd.c',
   'multifd-zlib.c',
+  'multifd-zero-page.c',
   'ram-compress.c',
   'options.c',
   'postcopy-ram.c',
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
new file mode 100644
index 0000000000..f0cd8e2c53
--- /dev/null
+++ b/migration/multifd-zero-page.c
@@ -0,0 +1,59 @@
+/*
+ * Multifd zero page detection implementation.
+ *
+ * Copyright (c) 2024 Bytedance Inc
+ *
+ * Authors:
+ *  Hao Xiang <hao.xiang@bytedance.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "exec/ramblock.h"
+#include "migration.h"
+#include "multifd.h"
+#include "options.h"
+#include "ram.h"
+
+void multifd_zero_page_check_send(MultiFDSendParams *p)
+{
+    /*
+     * QEMU versions older than 9.0 don't understand zero pages
+     * on the multifd channel. This switch is required to
+     * maintain backward compatibility.
+     */
+    bool use_multifd_zero_page =
+        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
+    MultiFDPages_t *pages = p->pages;
+    RAMBlock *rb = pages->block;
+
+    assert(pages->num != 0);
+    assert(pages->normal_num == 0);
+    assert(pages->zero_num == 0);
+
+    for (int i = 0; i < pages->num; i++) {
+        uint64_t offset = pages->offset[i];
+        if (use_multifd_zero_page &&
+            buffer_is_zero(rb->host + offset, p->page_size)) {
+            pages->zero[pages->zero_num] = offset;
+            pages->zero_num++;
+            ram_release_page(rb->idstr, offset);
+        } else {
+            pages->normal[pages->normal_num] = offset;
+            pages->normal_num++;
+        }
+    }
+}
+
+void multifd_zero_page_check_recv(MultiFDRecvParams *p)
+{
+    for (int i = 0; i < p->zero_num; i++) {
+        void *page = p->host + p->zero[i];
+        if (!buffer_is_zero(page, p->page_size)) {
+            memset(page, 0, p->page_size);
+        }
+    }
+}
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 012e3bdea1..cdfe0fa70e 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     int ret;
     uint32_t i;
 
+    multifd_zero_page_check_send(p);
+
+    if (!pages->normal_num) {
+        p->next_packet_size = 0;
+        goto out;
+    }
+
     multifd_send_prepare_header(p);
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         uint32_t available = z->zbuff_len - out_size;
         int flush = Z_NO_FLUSH;
 
-        if (i == pages->num - 1) {
+        if (i == pages->normal_num - 1) {
             flush = Z_SYNC_FLUSH;
         }
 
@@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
          * with compression. zlib does not guarantee that this is safe,
          * therefore copy the page before calling deflate().
          */
-        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
+        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
         zs->avail_in = p->page_size;
         zs->next_in = z->buf;
 
@@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
     p->iov[p->iovs_num].iov_len = out_size;
     p->iovs_num++;
     p->next_packet_size = out_size;
-    p->flags |= MULTIFD_FLAG_ZLIB;
 
+out:
+    p->flags |= MULTIFD_FLAG_ZLIB;
     multifd_send_fill_packet(p);
-
     return 0;
 }
 
@@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_ZLIB);
         return -1;
     }
+
+    multifd_zero_page_check_recv(p);
+
+    if (!p->normal_num) {
+        assert(in_size == 0);
+        return 0;
+    }
+
     ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
 
     if (ret != 0) {
@@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, out_size, expected_size);
         return -1;
     }
+
     return 0;
 }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index dc8fe43e94..27a1eba075 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     int ret;
     uint32_t i;
 
+    multifd_zero_page_check_send(p);
+
+    if (!pages->normal_num) {
+        p->next_packet_size = 0;
+        goto out;
+    }
+
     multifd_send_prepare_header(p);
 
     z->out.dst = z->zbuff;
     z->out.size = z->zbuff_len;
     z->out.pos = 0;
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         ZSTD_EndDirective flush = ZSTD_e_continue;
 
-        if (i == pages->num - 1) {
+        if (i == pages->normal_num - 1) {
             flush = ZSTD_e_flush;
         }
-        z->in.src = p->pages->block->host + pages->offset[i];
+        z->in.src = p->pages->block->host + pages->normal[i];
         z->in.size = p->page_size;
         z->in.pos = 0;
 
@@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
     p->iov[p->iovs_num].iov_len = z->out.pos;
     p->iovs_num++;
     p->next_packet_size = z->out.pos;
-    p->flags |= MULTIFD_FLAG_ZSTD;
 
+out:
+    p->flags |= MULTIFD_FLAG_ZSTD;
     multifd_send_fill_packet(p);
-
     return 0;
 }
 
@@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_ZSTD);
         return -1;
     }
+
+    multifd_zero_page_check_recv(p);
+
+    if (!p->normal_num) {
+        assert(in_size == 0);
+        return 0;
+    }
+
     ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
 
     if (ret != 0) {
diff --git a/migration/multifd.c b/migration/multifd.c
index a33dba40d9..fbb40ea10b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 #include "qemu/rcu.h"
 #include "exec/target_page.h"
 #include "sysemu/sysemu.h"
@@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
     MultiFDPages_t *pages = p->pages;
     int ret;
 
+    multifd_zero_page_check_send(p);
+
     if (!use_zero_copy_send) {
         /*
          * Only !zerocopy needs the header in IOV; zerocopy will
@@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
         multifd_send_prepare_header(p);
     }
 
-    for (int i = 0; i < pages->num; i++) {
-        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+    for (int i = 0; i < pages->normal_num; i++) {
+        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
         p->iov[p->iovs_num].iov_len = p->page_size;
         p->iovs_num++;
     }
 
-    p->next_packet_size = pages->num * p->page_size;
+    p->next_packet_size = pages->normal_num * p->page_size;
     p->flags |= MULTIFD_FLAG_NOCOMP;
 
     multifd_send_fill_packet(p);
@@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
                    p->id, flags, MULTIFD_FLAG_NOCOMP);
         return -1;
     }
+
+    multifd_zero_page_check_recv(p);
+
+    if (!p->normal_num) {
+        return 0;
+    }
+
     for (int i = 0; i < p->normal_num; i++) {
         p->iov[i].iov_base = p->host + p->normal[i];
         p->iov[i].iov_len = p->page_size;
@@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 
     packet->flags = cpu_to_be32(p->flags);
     packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-    packet->normal_pages = cpu_to_be32(pages->num);
+    packet->normal_pages = cpu_to_be32(pages->normal_num);
     packet->zero_pages = cpu_to_be32(pages->zero_num);
     packet->next_packet_size = cpu_to_be32(p->next_packet_size);
 
@@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
         strncpy(packet->ramblock, pages->block->idstr, 256);
     }
 
-    for (i = 0; i < pages->num; i++) {
+    for (i = 0; i < pages->normal_num; i++) {
         /* there are architectures where ram_addr_t is 32 bit */
-        uint64_t temp = pages->offset[i];
+        uint64_t temp = pages->normal[i];
 
         packet->offset[i] = cpu_to_be64(temp);
     }
 
+    for (i = 0; i < pages->zero_num; i++) {
+        /* there are architectures where ram_addr_t is 32 bit */
+        uint64_t temp = pages->zero[i];
+
+        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
+    }
+
     p->packets_sent++;
-    p->total_normal_pages += pages->num;
+    p->total_normal_pages += pages->normal_num;
     p->total_zero_pages += pages->zero_num;
 
-    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
+    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
                        p->flags, p->next_packet_size);
 }
 
@@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
         p->normal[i] = offset;
     }
 
+    for (i = 0; i < p->zero_num; i++) {
+        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
+
+        if (offset > (p->block->used_length - p->page_size)) {
+            error_setg(errp, "multifd: offset too long %" PRIu64
+                       " (max " RAM_ADDR_FMT ")",
+                       offset, p->block->used_length);
+            return -1;
+        }
+        p->zero[i] = offset;
+    }
+
     return 0;
 }
 
@@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
 
             stat64_add(&mig_stats.multifd_bytes,
                        p->next_packet_size + p->packet_len);
-            stat64_add(&mig_stats.normal_pages, pages->num);
+            stat64_add(&mig_stats.normal_pages, pages->normal_num);
             stat64_add(&mig_stats.zero_pages, pages->zero_num);
 
             multifd_pages_reset(p->pages);
@@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
         p->flags &= ~MULTIFD_FLAG_SYNC;
         qemu_mutex_unlock(&p->mutex);
 
-        if (p->normal_num) {
+        if (p->normal_num + p->zero_num) {
+            assert(!(flags & MULTIFD_FLAG_SYNC));
             ret = multifd_recv_state->ops->recv_pages(p, &local_err);
             if (ret != 0) {
                 break;
diff --git a/migration/multifd.h b/migration/multifd.h
index 9822ff298a..125f0bbe60 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -53,6 +53,11 @@ typedef struct {
     uint32_t unused32[1];    /* Reserved for future use */
     uint64_t unused64[3];    /* Reserved for future use */
     char ramblock[256];
+    /*
+     * This array contains the offsets of:
+     *  - normal pages (initial normal_pages entries)
+     *  - zero pages (following zero_pages entries)
+     */
     uint64_t offset[];
 } __attribute__((packed)) MultiFDPacket_t;
 
@@ -224,6 +229,8 @@ typedef struct {
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
 void multifd_send_fill_packet(MultiFDSendParams *p);
+void multifd_zero_page_check_send(MultiFDSendParams *p);
+void multifd_zero_page_check_recv(MultiFDRecvParams *p);
 
 static inline void multifd_send_prepare_header(MultiFDSendParams *p)
 {
diff --git a/qapi/migration.json b/qapi/migration.json
index 99843a8e95..e2450b92d4 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -660,9 +660,11 @@
 #
 # @none: Do not perform zero page checking.
 #
+# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
+#
 ##
 { 'enum': 'ZeroPageDetection',
-  'data': [ 'legacy', 'none' ] }
+  'data': [ 'legacy', 'none', 'multifd' ] }
 
 ##
 # @BitmapMigrationBitmapAliasTransform:
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
                   ` (2 preceding siblings ...)
  2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
@ 2024-02-16 22:39 ` Hao Xiang
  2024-02-21 16:11   ` Elena Ufimtseva
  2024-02-21 21:06   ` Fabiano Rosas
  2024-02-16 22:40 ` [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking Hao Xiang
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:39 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

This change adds a dedicated handler for MigrationOps::ram_save_target_page in
multifd live migration. Zero page checking can now be done in the multifd
threads, and this becomes the default configuration. For backward
compatibility, zero page checking can still be done from the migration main
thread.
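The control flow can be sketched in isolation like this (a simplified,
self-contained model, not the actual QEMU code; the types and helper
names are stand-ins):

```c
#include <stdbool.h>
#include <stddef.h>

enum ZeroPageDetection { DETECT_LEGACY, DETECT_NONE, DETECT_MULTIFD };

/* Stand-in for buffer_is_zero(). */
static bool page_is_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Sketch of ram_save_target_page_multifd(): in "legacy" mode the main
 * thread still filters zero pages before queueing; otherwise every page
 * is queued and the sender threads do the detection themselves.
 * Returns 1 if the page was handled as a zero page on the main thread,
 * 0 if it was queued for the multifd sender threads.
 */
static int save_target_page_multifd(enum ZeroPageDetection mode,
                                    const unsigned char *page, size_t n)
{
    if (mode == DETECT_LEGACY && page_is_zero(page, n)) {
        return 1;
    }
    return 0;
}
```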

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 migration/multifd.c |  1 +
 migration/options.c |  2 +-
 migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
 3 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index fbb40ea10b..ef5dad1019 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/rcu.h"
+#include "qemu/cutils.h"
 #include "exec/target_page.h"
 #include "sysemu/sysemu.h"
 #include "exec/ramblock.h"
diff --git a/migration/options.c b/migration/options.c
index 3c603391b0..3c79b6ccd4 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -181,7 +181,7 @@ Property migration_properties[] = {
                       MIG_MODE_NORMAL),
     DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
                        parameters.zero_page_detection,
-                       ZERO_PAGE_DETECTION_LEGACY),
+                       ZERO_PAGE_DETECTION_MULTIFD),
 
     /* Migration capabilities */
     DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
diff --git a/migration/ram.c b/migration/ram.c
index 5ece9f042e..b088c5a98c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
     QEMUFile *file = pss->pss_channel;
     int len = 0;
 
-    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
-        return 0;
-    }
-
     if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
         return 0;
     }
@@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
 
 static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
 {
+    assert(migrate_multifd());
+    assert(!migrate_compress());
+    assert(!migration_in_postcopy());
+
     if (!multifd_queue_page(block, offset)) {
         return -1;
     }
@@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
  */
 static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
 {
-    RAMBlock *block = pss->block;
     ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
     int res;
 
@@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
         return 1;
     }
 
+    return ram_save_page(rs, pss);
+}
+
+/**
+ * ram_save_target_page_multifd: save one target page
+ *
+ * Returns the number of pages written
+ *
+ * @rs: current RAM state
+ * @pss: data about the page we want to send
+ */
+static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
+{
+    RAMBlock *block = pss->block;
+    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
+
+    /* Multifd is not compatible with old compression. */
+    assert(!migrate_compress());
+
+    /* Multifd is not compatible with postcopy. */
+    assert(!migration_in_postcopy());
+
     /*
-     * Do not use multifd in postcopy as one whole host page should be
-     * placed.  Meanwhile postcopy requires atomic update of pages, so even
-     * if host page size == guest page size the dest guest during run may
-     * still see partially copied pages which is data corruption.
+     * Backward compatibility support. While using multifd live
+     * migration, we still need to handle zero page checking on the
+     * migration main thread.
      */
-    if (migrate_multifd() && !migration_in_postcopy()) {
-        return ram_save_multifd_page(block, offset);
+    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
+        if (save_zero_page(rs, pss, offset)) {
+            return 1;
+        }
     }
 
-    return ram_save_page(rs, pss);
+    return ram_save_multifd_page(block, offset);
 }
 
 /* Should be called before sending a host page */
@@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     }
 
     migration_ops = g_malloc0(sizeof(MigrationOps));
-    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+
+    if (migrate_multifd()) {
+        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
+    } else {
+        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
+    }
 
     bql_unlock();
     ret = multifd_send_sync_main();
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking.
  2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
                   ` (3 preceding siblings ...)
  2024-02-16 22:39 ` [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads Hao Xiang
@ 2024-02-16 22:40 ` Hao Xiang
  2024-02-21 20:59   ` Fabiano Rosas
  2024-02-16 22:40 ` [PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface Hao Xiang
  2024-02-16 22:40 ` [PATCH v2 7/7] Update maintainer contact for migration multifd zero page checking acceleration Hao Xiang
  6 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:40 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

Zero page checking is now done on the multifd sender threads by default,
but an option is still provided for backward compatibility. This change
adds qtest migration test cases that set the zero-page-detection option
to "legacy" (zero page checking on the migration main thread) and to
"none" (zero page checking disabled).
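For reference, outside of qtest the same parameter could be set over QMP
with something like the following (a sketch; the parameter itself is
introduced in patch 1 of this series):

```json
{ "execute": "migrate-set-parameters",
  "arguments": { "zero-page-detection": "legacy" } }
```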

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 tests/qtest/migration-test.c | 52 ++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 8a5bb1752e..c27083110a 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2621,6 +2621,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
     return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
 }
 
+static void *
+test_migrate_precopy_tcp_multifd_start_zero_page_legacy(QTestState *from,
+                                                        QTestState *to)
+{
+    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+    migrate_set_parameter_str(from, "zero-page-detection", "legacy");
+    return NULL;
+}
+
+static void *
+test_migration_precopy_tcp_multifd_start_no_zero_page(QTestState *from,
+                                                      QTestState *to)
+{
+    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
+    migrate_set_parameter_str(from, "zero-page-detection", "none");
+    return NULL;
+}
+
 static void *
 test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
                                             QTestState *to)
@@ -2652,6 +2670,36 @@ static void test_multifd_tcp_none(void)
     test_precopy_common(&args);
 }
 
+static void test_multifd_tcp_zero_page_legacy(void)
+{
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start_hook = test_migrate_precopy_tcp_multifd_start_zero_page_legacy,
+        /*
+         * Multifd is more complicated than most of the features, it
+         * directly takes guest page buffers when sending, make sure
+         * everything will work alright even if guest page is changing.
+         */
+        .live = true,
+    };
+    test_precopy_common(&args);
+}
+
+static void test_multifd_tcp_no_zero_page(void)
+{
+    MigrateCommon args = {
+        .listen_uri = "defer",
+        .start_hook = test_migration_precopy_tcp_multifd_start_no_zero_page,
+        /*
+         * Multifd is more complicated than most of the features, it
+         * directly takes guest page buffers when sending, make sure
+         * everything will work alright even if guest page is changing.
+         */
+        .live = true,
+    };
+    test_precopy_common(&args);
+}
+
 static void test_multifd_tcp_zlib(void)
 {
     MigrateCommon args = {
@@ -3550,6 +3598,10 @@ int main(int argc, char **argv)
     }
     migration_test_add("/migration/multifd/tcp/plain/none",
                        test_multifd_tcp_none);
+    migration_test_add("/migration/multifd/tcp/plain/zero_page_legacy",
+                       test_multifd_tcp_zero_page_legacy);
+    migration_test_add("/migration/multifd/tcp/plain/no_zero_page",
+                       test_multifd_tcp_no_zero_page);
     migration_test_add("/migration/multifd/tcp/plain/cancel",
                        test_multifd_tcp_cancel);
     migration_test_add("/migration/multifd/tcp/plain/zlib",
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.
  2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
                   ` (4 preceding siblings ...)
  2024-02-16 22:40 ` [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking Hao Xiang
@ 2024-02-16 22:40 ` Hao Xiang
  2024-02-21 12:07   ` Markus Armbruster
  2024-02-16 22:40 ` [PATCH v2 7/7] Update maintainer contact for migration multifd zero page checking acceleration Hao Xiang
  6 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:40 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

This change extends the MigrationStats interface to track zero page and
zero byte counters.
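The two counters are directly related: the byte counter is derived from
the page counter, as populate_ram_info() below does with the target page
size. A trivial stand-alone sketch of the relation (the page size value
in the test is only an example):

```c
#include <stdint.h>

/* zero-bytes is simply the zero page count scaled by the page size. */
static uint64_t zero_bytes(uint64_t zero_pages, uint64_t page_size)
{
    return zero_pages * page_size;
}
```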

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 migration/migration-hmp-cmds.c      |  4 ++++
 migration/migration.c               |  2 ++
 qapi/migration.json                 | 15 ++++++++++++++-
 tests/migration/guestperf/engine.py |  2 ++
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 7e96ae6ffd..abe035c9f2 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -111,6 +111,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->ram->normal);
         monitor_printf(mon, "normal bytes: %" PRIu64 " kbytes\n",
                        info->ram->normal_bytes >> 10);
+        monitor_printf(mon, "zero: %" PRIu64 " pages\n",
+                       info->ram->zero);
+        monitor_printf(mon, "zero bytes: %" PRIu64 " kbytes\n",
+                       info->ram->zero_bytes >> 10);
         monitor_printf(mon, "dirty sync count: %" PRIu64 "\n",
                        info->ram->dirty_sync_count);
         monitor_printf(mon, "page size: %" PRIu64 " kbytes\n",
diff --git a/migration/migration.c b/migration/migration.c
index ab21de2cad..1968ea7075 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1112,6 +1112,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
     info->ram->skipped = 0;
     info->ram->normal = stat64_get(&mig_stats.normal_pages);
     info->ram->normal_bytes = info->ram->normal * page_size;
+    info->ram->zero = stat64_get(&mig_stats.zero_pages);
+    info->ram->zero_bytes = info->ram->zero * page_size;
     info->ram->mbps = s->mbps;
     info->ram->dirty_sync_count =
         stat64_get(&mig_stats.dirty_sync_count);
diff --git a/qapi/migration.json b/qapi/migration.json
index e2450b92d4..892875da18 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -63,6 +63,10 @@
 #     between 0 and @dirty-sync-count * @multifd-channels.  (since
 #     7.1)
 #
+# @zero: number of zero pages (since 9.0)
+#
+# @zero-bytes: number of zero bytes sent (since 9.0)
+#
 # Features:
 #
 # @deprecated: Member @skipped is always zero since 1.5.3
@@ -81,7 +85,8 @@
            'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
            'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
            'postcopy-bytes': 'uint64',
-           'dirty-sync-missed-zero-copy': 'uint64' } }
+           'dirty-sync-missed-zero-copy': 'uint64',
+           'zero': 'int', 'zero-bytes': 'int' } }
 
 ##
 # @XBZRLECacheStats:
@@ -332,6 +337,8 @@
 #           "duplicate":123,
 #           "normal":123,
 #           "normal-bytes":123456,
+#           "zero":123,
+#           "zero-bytes":123456,
 #           "dirty-sync-count":15
 #         }
 #      }
@@ -358,6 +365,8 @@
 #             "duplicate":123,
 #             "normal":123,
 #             "normal-bytes":123456,
+#             "zero":123,
+#             "zero-bytes":123456,
 #             "dirty-sync-count":15
 #          }
 #       }
@@ -379,6 +388,8 @@
 #             "duplicate":123,
 #             "normal":123,
 #             "normal-bytes":123456,
+#             "zero":123,
+#             "zero-bytes":123456,
 #             "dirty-sync-count":15
 #          },
 #          "disk":{
@@ -405,6 +416,8 @@
 #             "duplicate":10,
 #             "normal":3333,
 #             "normal-bytes":3412992,
+#             "zero":3333,
+#             "zero-bytes":3412992,
 #             "dirty-sync-count":15
 #          },
 #          "xbzrle-cache":{
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index 608d7270f6..75315b99b7 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -92,6 +92,8 @@ def _migrate_progress(self, vm):
                 info["ram"].get("skipped", 0),
                 info["ram"].get("normal", 0),
                 info["ram"].get("normal-bytes", 0),
+                info["ram"].get("zero", 0),
+                info["ram"].get("zero-bytes", 0),
                 info["ram"].get("dirty-pages-rate", 0),
                 info["ram"].get("mbps", 0),
                 info["ram"].get("dirty-sync-count", 0)
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 7/7] Update maintainer contact for migration multifd zero page checking acceleration.
  2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
                   ` (5 preceding siblings ...)
  2024-02-16 22:40 ` [PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface Hao Xiang
@ 2024-02-16 22:40 ` Hao Xiang
  6 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-16 22:40 UTC (permalink / raw)
  To: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

Add myself as maintainer of the multifd zero page checking acceleration
function.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
---
 MAINTAINERS | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a24c2b51b6..3ca407cb58 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3403,6 +3403,11 @@ F: tests/migration/
 F: util/userfaultfd.c
 X: migration/rdma*
 
+Migration multifd zero page checking acceleration
+M: Hao Xiang <hao.xiang@bytedance.com>
+S: Maintained
+F: migration/multifd-zero-page.c
+
 RDMA Migration
 R: Li Zhijian <lizhijian@fujitsu.com>
 R: Peter Xu <peterx@redhat.com>
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
@ 2024-02-16 23:49   ` Richard Henderson
  2024-02-23  4:38     ` [External] " Hao Xiang
  2024-02-21 12:04   ` Markus Armbruster
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Richard Henderson @ 2024-02-16 23:49 UTC (permalink / raw)
  To: Hao Xiang, pbonzini, berrange, eduardo, peterx, farosas, eblake,
	armbru, thuth, lvivier, qemu-devel, jdenemar

On 2/16/24 12:39, Hao Xiang wrote:
> +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> +{
> +    for (int i = 0; i < p->zero_num; i++) {
> +        void *page = p->host + p->zero[i];
> +        if (!buffer_is_zero(page, p->page_size)) {
> +            memset(page, 0, p->page_size);
> +        }
> +    }
> +}

You should not check the buffer is zero here, you should just zero it.
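A stand-alone sketch of that simplification (illustrative types only,
not the real MultiFDRecvParams from migration/multifd.h):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *host;        /* base of the receive buffer */
    uint64_t *zero;       /* offsets of advertised zero pages */
    uint32_t zero_num;
    size_t page_size;
} RecvParamsSketch;

/* Unconditionally clear each advertised zero page instead of reading it
 * first with buffer_is_zero(). */
static void zero_page_recv(RecvParamsSketch *p)
{
    for (uint32_t i = 0; i < p->zero_num; i++) {
        memset(p->host + p->zero[i], 0, p->page_size);
    }
}
```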


r~


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
@ 2024-02-21 12:03   ` Markus Armbruster
  2024-02-23  4:22     ` [External] " Hao Xiang
  2024-02-21 13:58   ` Elena Ufimtseva
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Markus Armbruster @ 2024-02-21 12:03 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, thuth,
	lvivier, qemu-devel, jdenemar

Hao Xiang <hao.xiang@bytedance.com> writes:

> This new parameter controls where the zero page checking is running.
> 1. If this parameter is set to 'legacy', zero page checking is
> done in the migration main thread.
> 2. If this parameter is set to 'none', zero page checking is disabled.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5a565d9b8d..99843a8e95 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -653,6 +653,17 @@
>  { 'enum': 'MigMode',
>    'data': [ 'normal', 'cpr-reboot' ] }
>  
> +##
> +# @ZeroPageDetection:
> +#
> +# @legacy: Perform zero page checking from main migration thread. (since 9.0)
> +#
> +# @none: Do not perform zero page checking.
> +#
> +##

The entire type is since 9.0.  Thus:

   ##
   # @ZeroPageDetection:
   #
   # @legacy: Perform zero page checking from main migration thread.
   #
   # @none: Do not perform zero page checking.
   #
   # Since: 9.0
   ##

> +{ 'enum': 'ZeroPageDetection',
> +  'data': [ 'legacy', 'none' ] }
> +
>  ##
>  # @BitmapMigrationBitmapAliasTransform:
>  #
> @@ -874,6 +885,9 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @zero-page-detection: See description in @ZeroPageDetection.
> +#     Default is 'legacy'. (Since 9.0)

The description feels a bit lazy :)

Suggest

   # @zero-page-detection: Whether and how to detect zero pages.  Default
   #     is 'legacy'.  (since 9.0)

Same for the other two copies.

> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -907,7 +921,8 @@
>             'block-bitmap-mapping',
>             { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
>             'vcpu-dirty-limit',
> -           'mode'] }
> +           'mode',
> +           'zero-page-detection'] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -1066,6 +1081,10 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @zero-page-detection: See description in @ZeroPageDetection.
> +#     Default is 'legacy'. (Since 9.0)
> +#
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1119,7 +1138,8 @@
>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
> -            '*mode': 'MigMode'} }
> +            '*mode': 'MigMode',
> +            '*zero-page-detection': 'ZeroPageDetection'} }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1294,6 +1314,9 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @zero-page-detection: See description in @ZeroPageDetection.
> +#     Default is 'legacy'. (Since 9.0)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1344,7 +1367,8 @@
>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
> -            '*mode': 'MigMode'} }
> +            '*mode': 'MigMode',
> +            '*zero-page-detection': 'ZeroPageDetection'} }
>  
>  ##
>  # @query-migrate-parameters:



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
  2024-02-16 23:49   ` Richard Henderson
@ 2024-02-21 12:04   ` Markus Armbruster
  2024-02-21 16:00   ` Elena Ufimtseva
  2024-02-21 21:04   ` Fabiano Rosas
  3 siblings, 0 replies; 42+ messages in thread
From: Markus Armbruster @ 2024-02-21 12:04 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, thuth,
	lvivier, qemu-devel, jdenemar

Hao Xiang <hao.xiang@bytedance.com> writes:

> 1. Implements the zero page detection and handling on the multifd
> threads for non-compression, zlib and zstd compression backends.
> 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> 3. Add proper asserts to ensure pages->normal are used for normal pages
> in all scenarios.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 99843a8e95..e2450b92d4 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -660,9 +660,11 @@
>  #
>  # @none: Do not perform zero page checking.
>  #
> +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)

As pointed out in my review of PATCH 1, the entire type is since 9.0.

> +#
>  ##
>  { 'enum': 'ZeroPageDetection',
> -  'data': [ 'legacy', 'none' ] }
> +  'data': [ 'legacy', 'none', 'multifd' ] }
>  
>  ##
>  # @BitmapMigrationBitmapAliasTransform:

I don't like having 'none' (don't detect) between the two ways to
detect.  Put it either first or last.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface.
  2024-02-16 22:40 ` [PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface Hao Xiang
@ 2024-02-21 12:07   ` Markus Armbruster
  0 siblings, 0 replies; 42+ messages in thread
From: Markus Armbruster @ 2024-02-21 12:07 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, thuth,
	lvivier, qemu-devel, jdenemar

Hao Xiang <hao.xiang@bytedance.com> writes:

> This change extends the MigrationStatus interface to track zero pages
> and zero bytes counter.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  migration/migration-hmp-cmds.c      |  4 ++++
>  migration/migration.c               |  2 ++
>  qapi/migration.json                 | 15 ++++++++++++++-
>  tests/migration/guestperf/engine.py |  2 ++
>  4 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
> index 7e96ae6ffd..abe035c9f2 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -111,6 +111,10 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
>                         info->ram->normal);
>          monitor_printf(mon, "normal bytes: %" PRIu64 " kbytes\n",
>                         info->ram->normal_bytes >> 10);
> +        monitor_printf(mon, "zero: %" PRIu64 " pages\n",
> +                       info->ram->zero);
> +        monitor_printf(mon, "zero bytes: %" PRIu64 " kbytes\n",
> +                       info->ram->zero_bytes >> 10);
>          monitor_printf(mon, "dirty sync count: %" PRIu64 "\n",
>                         info->ram->dirty_sync_count);
>          monitor_printf(mon, "page size: %" PRIu64 " kbytes\n",
> diff --git a/migration/migration.c b/migration/migration.c
> index ab21de2cad..1968ea7075 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1112,6 +1112,8 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>      info->ram->skipped = 0;
>      info->ram->normal = stat64_get(&mig_stats.normal_pages);
>      info->ram->normal_bytes = info->ram->normal * page_size;
> +    info->ram->zero = stat64_get(&mig_stats.zero_pages);
> +    info->ram->zero_bytes = info->ram->zero * page_size;
>      info->ram->mbps = s->mbps;
>      info->ram->dirty_sync_count =
>          stat64_get(&mig_stats.dirty_sync_count);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index e2450b92d4..892875da18 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -63,6 +63,10 @@
>  #     between 0 and @dirty-sync-count * @multifd-channels.  (since
>  #     7.1)
>  #
> +# @zero: number of zero pages (since 9.0)
> +#
> +# @zero-bytes: number of zero bytes sent (since 9.0)

Awfully terse.  How are these two related?

Recommend to name the first one @zero-pages.

> +#
>  # Features:
>  #
>  # @deprecated: Member @skipped is always zero since 1.5.3
> @@ -81,7 +85,8 @@
>             'multifd-bytes': 'uint64', 'pages-per-second': 'uint64',
>             'precopy-bytes': 'uint64', 'downtime-bytes': 'uint64',
>             'postcopy-bytes': 'uint64',
> -           'dirty-sync-missed-zero-copy': 'uint64' } }
> +           'dirty-sync-missed-zero-copy': 'uint64',
> +           'zero': 'int', 'zero-bytes': 'int' } }

Please use 'size' for byte counts such as @zero-bytes.
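
Combined with the @zero-pages naming suggestion above, the amended members
might read (a sketch of one possible resolution, not the final patch):

```
           'dirty-sync-missed-zero-copy': 'uint64',
           'zero-pages': 'uint64', 'zero-bytes': 'size' } }
```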

>  
>  ##
>  # @XBZRLECacheStats:
> @@ -332,6 +337,8 @@
>  #           "duplicate":123,
>  #           "normal":123,
>  #           "normal-bytes":123456,
> +#           "zero":123,
> +#           "zero-bytes":123456,
>  #           "dirty-sync-count":15
>  #         }
>  #      }
> @@ -358,6 +365,8 @@
>  #             "duplicate":123,
>  #             "normal":123,
>  #             "normal-bytes":123456,
> +#             "zero":123,
> +#             "zero-bytes":123456,
>  #             "dirty-sync-count":15
>  #          }
>  #       }
> @@ -379,6 +388,8 @@
>  #             "duplicate":123,
>  #             "normal":123,
>  #             "normal-bytes":123456,
> +#             "zero":123,
> +#             "zero-bytes":123456,
>  #             "dirty-sync-count":15
>  #          },
>  #          "disk":{
> @@ -405,6 +416,8 @@
>  #             "duplicate":10,
>  #             "normal":3333,
>  #             "normal-bytes":3412992,
> +#             "zero":3333,
> +#             "zero-bytes":3412992,
>  #             "dirty-sync-count":15
>  #          },
>  #          "xbzrle-cache":{
> diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
> index 608d7270f6..75315b99b7 100644
> --- a/tests/migration/guestperf/engine.py
> +++ b/tests/migration/guestperf/engine.py
> @@ -92,6 +92,8 @@ def _migrate_progress(self, vm):
>                  info["ram"].get("skipped", 0),
>                  info["ram"].get("normal", 0),
>                  info["ram"].get("normal-bytes", 0),
> +                info["ram"].get("zero", 0),
> +                info["ram"].get("zero-bytes", 0),
>                  info["ram"].get("dirty-pages-rate", 0),
>                  info["ram"].get("mbps", 0),
>                  info["ram"].get("dirty-sync-count", 0)



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
  2024-02-21 12:03   ` Markus Armbruster
@ 2024-02-21 13:58   ` Elena Ufimtseva
  2024-02-23  4:37     ` [External] " Hao Xiang
  2024-02-22 10:36   ` Peter Xu
  2024-02-26  7:18   ` Wang, Lei
  3 siblings, 1 reply; 42+ messages in thread
From: Elena Ufimtseva @ 2024-02-21 13:58 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

[-- Attachment #1: Type: text/plain, Size: 11151 bytes --]

On Fri, Feb 16, 2024 at 2:41 PM Hao Xiang <hao.xiang@bytedance.com> wrote:

> This new parameter controls where the zero page checking is running.
> 1. If this parameter is set to 'legacy', zero page checking is
> done in the migration main thread.
> 2. If this parameter is set to 'none', zero page checking is disabled.
>
>
Hello Hao

A few questions and comments.

First, the commit message states that the parameter controls where the zero
page checking is done, but it also controls whether the sending of zero
pages is done by the multifd threads or not.



> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  hw/core/qdev-properties-system.c    | 10 ++++++++++
>  include/hw/qdev-properties-system.h |  4 ++++
>  migration/migration-hmp-cmds.c      |  9 +++++++++
>  migration/options.c                 | 21 ++++++++++++++++++++
>  migration/options.h                 |  1 +
>  migration/ram.c                     |  4 ++++
>  qapi/migration.json                 | 30 ++++++++++++++++++++++++++---
>  7 files changed, 76 insertions(+), 3 deletions(-)
>
> diff --git a/hw/core/qdev-properties-system.c
> b/hw/core/qdev-properties-system.c
> index 1a396521d5..63843f18b5 100644
> --- a/hw/core/qdev-properties-system.c
> +++ b/hw/core/qdev-properties-system.c
> @@ -679,6 +679,16 @@ const PropertyInfo qdev_prop_mig_mode = {
>      .set_default_value = qdev_propinfo_set_default_value_enum,
>  };
>
> +const PropertyInfo qdev_prop_zero_page_detection = {
> +    .name = "ZeroPageDetection",
> +    .description = "zero_page_detection values, "
> +                   "multifd,legacy,none",
> +    .enum_table = &ZeroPageDetection_lookup,
> +    .get = qdev_propinfo_get_enum,
> +    .set = qdev_propinfo_set_enum,
> +    .set_default_value = qdev_propinfo_set_default_value_enum,
> +};
> +
>  /* --- Reserved Region --- */
>
>  /*
> diff --git a/include/hw/qdev-properties-system.h
> b/include/hw/qdev-properties-system.h
> index 06c359c190..839b170235 100644
> --- a/include/hw/qdev-properties-system.h
> +++ b/include/hw/qdev-properties-system.h
> @@ -8,6 +8,7 @@ extern const PropertyInfo qdev_prop_macaddr;
>  extern const PropertyInfo qdev_prop_reserved_region;
>  extern const PropertyInfo qdev_prop_multifd_compression;
>  extern const PropertyInfo qdev_prop_mig_mode;
> +extern const PropertyInfo qdev_prop_zero_page_detection;
>  extern const PropertyInfo qdev_prop_losttickpolicy;
>  extern const PropertyInfo qdev_prop_blockdev_on_error;
>  extern const PropertyInfo qdev_prop_bios_chs_trans;
> @@ -47,6 +48,9 @@ extern const PropertyInfo
> qdev_prop_iothread_vq_mapping_list;
>  #define DEFINE_PROP_MIG_MODE(_n, _s, _f, _d) \
>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_mig_mode, \
>                         MigMode)
> +#define DEFINE_PROP_ZERO_PAGE_DETECTION(_n, _s, _f, _d) \
> +    DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_zero_page_detection, \
> +                       ZeroPageDetection)
>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
>                          LostTickPolicy)
> diff --git a/migration/migration-hmp-cmds.c
> b/migration/migration-hmp-cmds.c
> index 99b49df5dd..7e96ae6ffd 100644
> --- a/migration/migration-hmp-cmds.c
> +++ b/migration/migration-hmp-cmds.c
> @@ -344,6 +344,11 @@ void hmp_info_migrate_parameters(Monitor *mon, const
> QDict *qdict)
>          monitor_printf(mon, "%s: %s\n",
>
>  MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_COMPRESSION),
>              MultiFDCompression_str(params->multifd_compression));
> +        assert(params->has_zero_page_detection);
>

What is the reason to have assert here?


> +        monitor_printf(mon, "%s: %s\n",
> +
> MigrationParameter_str(MIGRATION_PARAMETER_ZERO_PAGE_DETECTION),
> +            qapi_enum_lookup(&ZeroPageDetection_lookup,
> +                params->zero_page_detection));
>          monitor_printf(mon, "%s: %" PRIu64 " bytes\n",
>              MigrationParameter_str(MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE),
>              params->xbzrle_cache_size);
> @@ -634,6 +639,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const
> QDict *qdict)
>          p->has_multifd_zstd_level = true;
>          visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
>          break;
> +    case MIGRATION_PARAMETER_ZERO_PAGE_DETECTION:
> +        p->has_zero_page_detection = true;
> +        visit_type_ZeroPageDetection(v, param, &p->zero_page_detection,
> &err);
> +        break;
>      case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
>          p->has_xbzrle_cache_size = true;
>          if (!visit_type_size(v, param, &cache_size, &err)) {
> diff --git a/migration/options.c b/migration/options.c
> index 3e3e0b93b4..3c603391b0 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -179,6 +179,9 @@ Property migration_properties[] = {
>      DEFINE_PROP_MIG_MODE("mode", MigrationState,
>                        parameters.mode,
>                        MIG_MODE_NORMAL),
> +    DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
> +                       parameters.zero_page_detection,
> +                       ZERO_PAGE_DETECTION_LEGACY),
>
>      /* Migration capabilities */
>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> @@ -903,6 +906,13 @@ uint64_t migrate_xbzrle_cache_size(void)
>      return s->parameters.xbzrle_cache_size;
>  }
>
> +ZeroPageDetection migrate_zero_page_detection(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->parameters.zero_page_detection;
> +}
> +
>  /* parameter setters */
>
>  void migrate_set_block_incremental(bool value)
> @@ -1013,6 +1023,8 @@ MigrationParameters
> *qmp_query_migrate_parameters(Error **errp)
>      params->vcpu_dirty_limit = s->parameters.vcpu_dirty_limit;
>      params->has_mode = true;
>      params->mode = s->parameters.mode;
> +    params->has_zero_page_detection = true;
> +    params->zero_page_detection = s->parameters.zero_page_detection;
>
>      return params;
>  }
> @@ -1049,6 +1061,7 @@ void migrate_params_init(MigrationParameters *params)
>      params->has_x_vcpu_dirty_limit_period = true;
>      params->has_vcpu_dirty_limit = true;
>      params->has_mode = true;
> +    params->has_zero_page_detection = true;
>  }
>
>  /*
> @@ -1350,6 +1363,10 @@ static void
> migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_mode) {
>          dest->mode = params->mode;
>      }
> +
> +    if (params->has_zero_page_detection) {
> +        dest->zero_page_detection = params->zero_page_detection;
> +    }
>  }
>
>  static void migrate_params_apply(MigrateSetParameters *params, Error
> **errp)
> @@ -1494,6 +1511,10 @@ static void
> migrate_params_apply(MigrateSetParameters *params, Error **errp)
>      if (params->has_mode) {
>          s->parameters.mode = params->mode;
>      }
> +
> +    if (params->has_zero_page_detection) {
> +        s->parameters.zero_page_detection = params->zero_page_detection;
> +    }
>  }
>
>  void qmp_migrate_set_parameters(MigrateSetParameters *params, Error
> **errp)
> diff --git a/migration/options.h b/migration/options.h
> index 246c160aee..b7c4fb3861 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -93,6 +93,7 @@ const char *migrate_tls_authz(void);
>  const char *migrate_tls_creds(void);
>  const char *migrate_tls_hostname(void);
>  uint64_t migrate_xbzrle_cache_size(void);
> +ZeroPageDetection migrate_zero_page_detection(void);
>
>  /* parameters setters */
>
> diff --git a/migration/ram.c b/migration/ram.c
> index 4649a81204..556725c30f 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1123,6 +1123,10 @@ static int save_zero_page(RAMState *rs,
> PageSearchStatus *pss,
>      QEMUFile *file = pss->pss_channel;
>      int len = 0;
>
> +    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> +        return 0;
> +    }
> +
>      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>          return 0;
>      }
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5a565d9b8d..99843a8e95 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -653,6 +653,17 @@
>  { 'enum': 'MigMode',
>    'data': [ 'normal', 'cpr-reboot' ] }
>
> +##
> +# @ZeroPageDetection:
> +#
> +# @legacy: Perform zero page checking from main migration thread. (since
> 9.0)
> +#
> +# @none: Do not perform zero page checking.
> +#
> +##
> +{ 'enum': 'ZeroPageDetection',
> +  'data': [ 'legacy', 'none' ] }
> +
>

Above you have introduced the qdev property qdev_prop_zero_page_detection
with multifd, but it is not present in the schema.
Perhaps 'multifd' in qdev_prop_zero_page_detection belongs to another patch?



>  ##
>  # @BitmapMigrationBitmapAliasTransform:
>  #
> @@ -874,6 +885,9 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @zero-page-detection: See description in @ZeroPageDetection.
> +#     Default is 'legacy'. (Since 9.0)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -907,7 +921,8 @@
>             'block-bitmap-mapping',
>             { 'name': 'x-vcpu-dirty-limit-period', 'features':
> ['unstable'] },
>             'vcpu-dirty-limit',
> -           'mode'] }
> +           'mode',
> +           'zero-page-detection'] }
>
>  ##
>  # @MigrateSetParameters:
> @@ -1066,6 +1081,10 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @zero-page-detection: See description in @ZeroPageDetection.
> +#     Default is 'legacy'. (Since 9.0)
> +#
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1119,7 +1138,8 @@
>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
> -            '*mode': 'MigMode'} }
> +            '*mode': 'MigMode',
> +            '*zero-page-detection': 'ZeroPageDetection'} }
>
>  ##
>  # @migrate-set-parameters:
> @@ -1294,6 +1314,9 @@
>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>  #        (Since 8.2)
>  #
> +# @zero-page-detection: See description in @ZeroPageDetection.
> +#     Default is 'legacy'. (Since 9.0)
> +#
>  # Features:
>  #
>  # @deprecated: Member @block-incremental is deprecated.  Use
> @@ -1344,7 +1367,8 @@
>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>                                              'features': [ 'unstable' ] },
>              '*vcpu-dirty-limit': 'uint64',
> -            '*mode': 'MigMode'} }
> +            '*mode': 'MigMode',
> +            '*zero-page-detection': 'ZeroPageDetection'} }
>
>  ##
>  # @query-migrate-parameters:
> --
> 2.30.2
>
>
>

-- 
Elena


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format.
  2024-02-16 22:39 ` [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format Hao Xiang
@ 2024-02-21 15:37   ` Elena Ufimtseva
  2024-02-23  4:18     ` [External] " Hao Xiang
  0 siblings, 1 reply; 42+ messages in thread
From: Elena Ufimtseva @ 2024-02-21 15:37 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

[-- Attachment #1: Type: text/plain, Size: 11680 bytes --]

On Fri, Feb 16, 2024 at 2:41 PM Hao Xiang <hao.xiang@bytedance.com> wrote:

> This change adds zero page counters and updates multifd send/receive
> tracing format to track the newly added counters.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  migration/multifd.c    | 43 ++++++++++++++++++++++++++++++++++--------
>  migration/multifd.h    | 21 ++++++++++++++++++++-
>  migration/ram.c        |  1 -
>  migration/trace-events |  8 ++++----
>  4 files changed, 59 insertions(+), 14 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index adfe8c9a0a..a33dba40d9 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -236,6 +236,8 @@ static void multifd_pages_reset(MultiFDPages_t *pages)
>       * overwritten later when reused.
>       */
>      pages->num = 0;
> +    pages->normal_num = 0;
> +    pages->zero_num = 0;
>      pages->block = NULL;
>  }
>

> @@ -309,6 +311,8 @@ static MultiFDPages_t *multifd_pages_init(uint32_t n)
>
>      pages->allocated = n;
>      pages->offset = g_new0(ram_addr_t, n);
> +    pages->normal = g_new0(ram_addr_t, n);
> +    pages->zero = g_new0(ram_addr_t, n);
>
>
>      return pages;
>  }
> @@ -319,6 +323,10 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
>      pages->allocated = 0;
>      g_free(pages->offset);
>      pages->offset = NULL;
> +    g_free(pages->normal);
> +    pages->normal = NULL;
> +    g_free(pages->zero);
> +    pages->zero = NULL;
>      g_free(pages);
>  }
>
> @@ -332,6 +340,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>      packet->flags = cpu_to_be32(p->flags);
>      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
>      packet->normal_pages = cpu_to_be32(pages->num);
> +    packet->zero_pages = cpu_to_be32(pages->zero_num);
>      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>
>      packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
> @@ -350,9 +359,10 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>
>      p->packets_sent++;
>      p->total_normal_pages += pages->num;
> +    p->total_zero_pages += pages->zero_num;
>
> -    trace_multifd_send(p->id, packet_num, pages->num, p->flags,
> -                       p->next_packet_size);
> +    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> +                       p->flags, p->next_packet_size);
>  }
>
>  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> @@ -393,20 +403,29 @@ static int
> multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>      p->normal_num = be32_to_cpu(packet->normal_pages);
>      if (p->normal_num > packet->pages_alloc) {
>          error_setg(errp, "multifd: received packet "
> -                   "with %u pages and expected maximum pages are %u",
> +                   "with %u normal pages and expected maximum pages are
> %u",
>                     p->normal_num, packet->pages_alloc) ;
>          return -1;
>      }
>
> +    p->zero_num = be32_to_cpu(packet->zero_pages);
> +    if (p->zero_num > packet->pages_alloc - p->normal_num) {
> +        error_setg(errp, "multifd: received packet "
> +                   "with %u zero pages and expected maximum zero pages
> are %u",
> +                   p->zero_num, packet->pages_alloc - p->normal_num) ;
> +        return -1;
> +    }


You could probably combine this check with the normal_num check against
pages_alloc above.

> +
>      p->next_packet_size = be32_to_cpu(packet->next_packet_size);
>      p->packet_num = be64_to_cpu(packet->packet_num);
>      p->packets_recved++;
>      p->total_normal_pages += p->normal_num;
> +    p->total_zero_pages += p->zero_num;
>
> -    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
> -                       p->next_packet_size);
> +    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
> +                       p->flags, p->next_packet_size);
>
> -    if (p->normal_num == 0) {
> +    if (p->normal_num == 0 && p->zero_num == 0) {
>          return 0;
>      }
>
> @@ -823,6 +842,8 @@ static void *multifd_send_thread(void *opaque)
>
>              stat64_add(&mig_stats.multifd_bytes,
>                         p->next_packet_size + p->packet_len);
> +            stat64_add(&mig_stats.normal_pages, pages->num);
>

That seems wrong. pages->num is the total number of pages in the packet.
But the next patch changes it, so I suggest changing it here and not in 3/7.

> +            stat64_add(&mig_stats.zero_pages, pages->zero_num);
>
>              multifd_pages_reset(p->pages);
>              p->next_packet_size = 0;
> @@ -866,7 +887,8 @@ out:
>
>      rcu_unregister_thread();
>      migration_threads_remove(thread);
> -    trace_multifd_send_thread_end(p->id, p->packets_sent,
> p->total_normal_pages);
> +    trace_multifd_send_thread_end(p->id, p->packets_sent,
> p->total_normal_pages,
> +                                  p->total_zero_pages);
>
>      return NULL;
>  }
> @@ -1132,6 +1154,8 @@ static void
> multifd_recv_cleanup_channel(MultiFDRecvParams *p)
>      p->iov = NULL;
>      g_free(p->normal);
>      p->normal = NULL;
> +    g_free(p->zero);
> +    p->zero = NULL;
>      multifd_recv_state->ops->recv_cleanup(p);
>  }
>
> @@ -1251,7 +1275,9 @@ static void *multifd_recv_thread(void *opaque)
>      }
>
>      rcu_unregister_thread();
> -    trace_multifd_recv_thread_end(p->id, p->packets_recved,
> p->total_normal_pages);
> +    trace_multifd_recv_thread_end(p->id, p->packets_recved,
> +                                  p->total_normal_pages,
> +                                  p->total_zero_pages);
>
>      return NULL;
>  }
> @@ -1290,6 +1316,7 @@ int multifd_recv_setup(Error **errp)
>          p->name = g_strdup_printf("multifdrecv_%d", i);
>          p->iov = g_new0(struct iovec, page_count);
>          p->normal = g_new0(ram_addr_t, page_count);
> +        p->zero = g_new0(ram_addr_t, page_count);
>          p->page_count = page_count;
>          p->page_size = qemu_target_page_size();
>      }
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 8a1cad0996..9822ff298a 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -48,7 +48,10 @@ typedef struct {
>      /* size of the next packet that contains pages */
>      uint32_t next_packet_size;
>      uint64_t packet_num;
> -    uint64_t unused[4];    /* Reserved for future use */
> +    /* zero pages */
> +    uint32_t zero_pages;
> +    uint32_t unused32[1];    /* Reserved for future use */
> +    uint64_t unused64[3];    /* Reserved for future use */
>      char ramblock[256];
>      uint64_t offset[];
>  } __attribute__((packed)) MultiFDPacket_t;
> @@ -56,10 +59,18 @@ typedef struct {
>  typedef struct {
>      /* number of used pages */
>      uint32_t num;
> +    /* number of normal pages */
> +    uint32_t normal_num;
> +    /* number of zero pages */
> +    uint32_t zero_num;
>      /* number of allocated pages */
>      uint32_t allocated;
>      /* offset of each page */
>      ram_addr_t *offset;
> +    /* offset of normal page */
> +    ram_addr_t *normal;
> +    /* offset of zero page */
> +    ram_addr_t *zero;
>      RAMBlock *block;
>  } MultiFDPages_t;
>
> @@ -124,6 +135,8 @@ typedef struct {
>      uint64_t packets_sent;
>      /* non zero pages sent through this channel */
>      uint64_t total_normal_pages;
> +    /* zero pages sent through this channel */
> +    uint64_t total_zero_pages;
>

Can we initialize these to zero when threads are being set up?
Also, I have a strong desire to rename these... later.


>      /* buffers to send */
>      struct iovec *iov;
>      /* number of iovs used */
> @@ -178,12 +191,18 @@ typedef struct {
>      uint8_t *host;
>      /* non zero pages recv through this channel */
>      uint64_t total_normal_pages;
> +    /* zero pages recv through this channel */
> +    uint64_t total_zero_pages;
>      /* buffers to recv */
>      struct iovec *iov;
>      /* Pages that are not zero */
>      ram_addr_t *normal;
>      /* num of non zero pages */
>      uint32_t normal_num;
> +    /* Pages that are zero */
> +    ram_addr_t *zero;
> +    /* num of zero pages */
> +    uint32_t zero_num;
>      /* used for de-compression methods */
>      void *data;
>  } MultiFDRecvParams;
> diff --git a/migration/ram.c b/migration/ram.c
> index 556725c30f..5ece9f042e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1259,7 +1259,6 @@ static int ram_save_multifd_page(RAMBlock *block,
> ram_addr_t offset)
>      if (!multifd_queue_page(block, offset)) {
>          return -1;
>      }
> -    stat64_add(&mig_stats.normal_pages, 1);
>
>      return 1;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index 298ad2b0dd..9f1d7ae71a 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -128,21 +128,21 @@ postcopy_preempt_reset_channel(void) ""
>  # multifd.c
>  multifd_new_send_channel_async(uint8_t id) "channel %u"
>  multifd_new_send_channel_async_error(uint8_t id, void *err) "channel=%u
> err=%p"
> -multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t
> flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages
> %u flags 0x%x next packet size %u"
> +multifd_recv(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t
> zero, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %"
> PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
>  multifd_recv_new_channel(uint8_t id) "channel %u"
>  multifd_recv_sync_main(long packet_num) "packet num %ld"
>  multifd_recv_sync_main_signal(uint8_t id) "channel %u"
>  multifd_recv_sync_main_wait(uint8_t id) "channel %u"
>  multifd_recv_terminate_threads(bool error) "error %d"
> -multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages)
> "channel %u packets %" PRIu64 " pages %" PRIu64
> +multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t
> normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal
> pages %" PRIu64 " zero pages %" PRIu64
>  multifd_recv_thread_start(uint8_t id) "%u"
> -multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t
> flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal
> pages %u flags 0x%x next packet size %u"
> +multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal_pages,
> uint32_t zero_pages, uint32_t flags, uint32_t next_packet_size) "channel %u
> packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet
> size %u"
>  multifd_send_error(uint8_t id) "channel %u"
>  multifd_send_sync_main(long packet_num) "packet num %ld"
>  multifd_send_sync_main_signal(uint8_t id) "channel %u"
>  multifd_send_sync_main_wait(uint8_t id) "channel %u"
>  multifd_send_terminate_threads(void) ""
> -multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t
> normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
> +multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t
> normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal
> pages %"  PRIu64 " zero pages %"  PRIu64
>  multifd_send_thread_start(uint8_t id) "%u"
>  multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char
> *hostname) "ioc=%p tioc=%p hostname=%s"
>  multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p
> err=%s"
> --
> 2.30.2
>
>
>

-- 
Elena


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
  2024-02-16 23:49   ` Richard Henderson
  2024-02-21 12:04   ` Markus Armbruster
@ 2024-02-21 16:00   ` Elena Ufimtseva
  2024-02-23  4:59     ` [External] " Hao Xiang
  2024-02-21 21:04   ` Fabiano Rosas
  3 siblings, 1 reply; 42+ messages in thread
From: Elena Ufimtseva @ 2024-02-21 16:00 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

[-- Attachment #1: Type: text/plain, Size: 14576 bytes --]

On Fri, Feb 16, 2024 at 2:42 PM Hao Xiang <hao.xiang@bytedance.com> wrote:

> 1. Implements the zero page detection and handling on the multifd
> threads for non-compression, zlib and zstd compression backends.
> 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> 3. Add proper asserts to ensure pages->normal are used for normal pages
> in all scenarios.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  migration/meson.build         |  1 +
>  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
>  migration/multifd-zlib.c      | 26 ++++++++++++---
>  migration/multifd-zstd.c      | 25 ++++++++++++---
>  migration/multifd.c           | 50 +++++++++++++++++++++++------
>  migration/multifd.h           |  7 +++++
>  qapi/migration.json           |  4 ++-
>  7 files changed, 151 insertions(+), 21 deletions(-)
>  create mode 100644 migration/multifd-zero-page.c
>
> diff --git a/migration/meson.build b/migration/meson.build
> index 92b1cc4297..1eeb915ff6 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -22,6 +22,7 @@ system_ss.add(files(
>    'migration.c',
>    'multifd.c',
>    'multifd-zlib.c',
> +  'multifd-zero-page.c',
>    'ram-compress.c',
>    'options.c',
>    'postcopy-ram.c',
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> new file mode 100644
> index 0000000000..f0cd8e2c53
> --- /dev/null
> +++ b/migration/multifd-zero-page.c
> @@ -0,0 +1,59 @@
> +/*
> + * Multifd zero page detection implementation.
> + *
> + * Copyright (c) 2024 Bytedance Inc
> + *
> + * Authors:
> + *  Hao Xiang <hao.xiang@bytedance.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/cutils.h"
> +#include "exec/ramblock.h"
> +#include "migration.h"
> +#include "multifd.h"
> +#include "options.h"
> +#include "ram.h"
> +
> +void multifd_zero_page_check_send(MultiFDSendParams *p)
> +{
> +    /*
> +     * QEMU older than 9.0 don't understand zero page
> +     * on multifd channel. This switch is required to
> +     * maintain backward compatibility.
> +     */
> +    bool use_multifd_zero_page =
> +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
> +    MultiFDPages_t *pages = p->pages;
> +    RAMBlock *rb = pages->block;
> +
> +    assert(pages->num != 0);
>

Not needed, the check is done right before calling send_prepare.


> +    assert(pages->normal_num == 0);
> +    assert(pages->zero_num == 0);
>

Why these asserts are needed?

> +
>
> +    for (int i = 0; i < pages->num; i++) {
> +        uint64_t offset = pages->offset[i];
> +        if (use_multifd_zero_page &&
> +            buffer_is_zero(rb->host + offset, p->page_size)) {
> +            pages->zero[pages->zero_num] = offset;
> +            pages->zero_num++;
> +            ram_release_page(rb->idstr, offset);
> +        } else {
> +            pages->normal[pages->normal_num] = offset;
> +            pages->normal_num++;
> +        }
> +    }
> +}
> +
> +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> +{
> +    for (int i = 0; i < p->zero_num; i++) {
> +        void *page = p->host + p->zero[i];
> +        if (!buffer_is_zero(page, p->page_size)) {
> +            memset(page, 0, p->page_size);
> +        }
> +    }
> +}
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 012e3bdea1..cdfe0fa70e 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>      int ret;
>      uint32_t i;
>
> +    multifd_zero_page_check_send(p);
> +
> +    if (!pages->normal_num) {
> +        p->next_packet_size = 0;
> +        goto out;
> +    }
> +
>      multifd_send_prepare_header(p);
>
> -    for (i = 0; i < pages->num; i++) {
> +    for (i = 0; i < pages->normal_num; i++) {
>          uint32_t available = z->zbuff_len - out_size;
>          int flush = Z_NO_FLUSH;
>
> -        if (i == pages->num - 1) {
> +        if (i == pages->normal_num - 1) {
>              flush = Z_SYNC_FLUSH;
>          }
>
> @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>           * with compression. zlib does not guarantee that this is safe,
>           * therefore copy the page before calling deflate().
>           */
> -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
> +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
>          zs->avail_in = p->page_size;
>          zs->next_in = z->buf;
>
> @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>      p->iov[p->iovs_num].iov_len = out_size;
>      p->iovs_num++;
>      p->next_packet_size = out_size;
> -    p->flags |= MULTIFD_FLAG_ZLIB;
>
> +out:
> +    p->flags |= MULTIFD_FLAG_ZLIB;
>      multifd_send_fill_packet(p);
> -
>
Spurious?

>      return 0;
>  }
>
> @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, flags, MULTIFD_FLAG_ZLIB);
>          return -1;
>      }
> +
> +    multifd_zero_page_check_recv(p);
> +
> +    if (!p->normal_num) {
> +        assert(in_size == 0);
> +        return 0;
>

The return here will have no effect. Also, why is the assert needed here?
This change also does not seem to fit the description of the patch; a
separate patch would probably be better.


> +    }
> +
>      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>
>      if (ret != 0) {
> @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, out_size, expected_size);
>          return -1;
>      }
> +
>      return 0;
>  }
>
> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> index dc8fe43e94..27a1eba075 100644
> --- a/migration/multifd-zstd.c
> +++ b/migration/multifd-zstd.c
> @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>      int ret;
>      uint32_t i;
>
> +    multifd_zero_page_check_send(p);
> +
> +    if (!pages->normal_num) {
> +        p->next_packet_size = 0;
> +        goto out;
> +    }
> +
>      multifd_send_prepare_header(p);
>
>      z->out.dst = z->zbuff;
>      z->out.size = z->zbuff_len;
>      z->out.pos = 0;
>
> -    for (i = 0; i < pages->num; i++) {
> +    for (i = 0; i < pages->normal_num; i++) {
>          ZSTD_EndDirective flush = ZSTD_e_continue;
>
> -        if (i == pages->num - 1) {
> +        if (i == pages->normal_num - 1) {
>              flush = ZSTD_e_flush;
>          }
> -        z->in.src = p->pages->block->host + pages->offset[i];
> +        z->in.src = p->pages->block->host + pages->normal[i];
>          z->in.size = p->page_size;
>          z->in.pos = 0;
>
> @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>      p->iov[p->iovs_num].iov_len = z->out.pos;
>      p->iovs_num++;
>      p->next_packet_size = z->out.pos;
> -    p->flags |= MULTIFD_FLAG_ZSTD;
>
> +out:
> +    p->flags |= MULTIFD_FLAG_ZSTD;
>      multifd_send_fill_packet(p);
> -
>
Spurious removal.


>      return 0;
>  }
>
> @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, flags, MULTIFD_FLAG_ZSTD);
>          return -1;
>      }
> +
> +    multifd_zero_page_check_recv(p);
> +
> +    if (!p->normal_num) {
> +        assert(in_size == 0);
> +        return 0;
> +    }
> +
>
Same question here about assert.


>      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>
>      if (ret != 0) {
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a33dba40d9..fbb40ea10b 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -11,6 +11,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include "qemu/rcu.h"
>  #include "exec/target_page.h"
>  #include "sysemu/sysemu.h"
> @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>      MultiFDPages_t *pages = p->pages;
>      int ret;
>
> +    multifd_zero_page_check_send(p);
> +
>      if (!use_zero_copy_send) {
>          /*
>           * Only !zerocopy needs the header in IOV; zerocopy will
> @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>          multifd_send_prepare_header(p);
>      }
>
> -    for (int i = 0; i < pages->num; i++) {
> -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> +    for (int i = 0; i < pages->normal_num; i++) {
> +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
>          p->iov[p->iovs_num].iov_len = p->page_size;
>          p->iovs_num++;
>      }
>
> -    p->next_packet_size = pages->num * p->page_size;
> +    p->next_packet_size = pages->normal_num * p->page_size;
>      p->flags |= MULTIFD_FLAG_NOCOMP;
>
>      multifd_send_fill_packet(p);
> @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, flags, MULTIFD_FLAG_NOCOMP);
>          return -1;
>      }
> +
> +    multifd_zero_page_check_recv(p);
> +
> +    if (!p->normal_num) {
> +        return 0;
> +    }
> +
>      for (int i = 0; i < p->normal_num; i++) {
>          p->iov[i].iov_base = p->host + p->normal[i];
>          p->iov[i].iov_len = p->page_size;
> @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>
>      packet->flags = cpu_to_be32(p->flags);
>      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
> -    packet->normal_pages = cpu_to_be32(pages->num);
> +    packet->normal_pages = cpu_to_be32(pages->normal_num);
>      packet->zero_pages = cpu_to_be32(pages->zero_num);
>      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>
> @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>          strncpy(packet->ramblock, pages->block->idstr, 256);
>      }
>
> -    for (i = 0; i < pages->num; i++) {
> +    for (i = 0; i < pages->normal_num; i++) {
>          /* there are architectures where ram_addr_t is 32 bit */
> -        uint64_t temp = pages->offset[i];
> +        uint64_t temp = pages->normal[i];
>
>          packet->offset[i] = cpu_to_be64(temp);
>      }
>
> +    for (i = 0; i < pages->zero_num; i++) {
> +        /* there are architectures where ram_addr_t is 32 bit */
> +        uint64_t temp = pages->zero[i];
> +
> +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
> +    }
> +
>      p->packets_sent++;
> -    p->total_normal_pages += pages->num;
> +    p->total_normal_pages += pages->normal_num;
>      p->total_zero_pages += pages->zero_num;
>
> -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
>                         p->flags, p->next_packet_size);
>  }
>
> @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          p->normal[i] = offset;
>      }
>
> +    for (i = 0; i < p->zero_num; i++) {
> +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> +
> +        if (offset > (p->block->used_length - p->page_size)) {
> +            error_setg(errp, "multifd: offset too long %" PRIu64
> +                       " (max " RAM_ADDR_FMT ")",
> +                       offset, p->block->used_length);
> +            return -1;
> +        }
> +        p->zero[i] = offset;
> +    }
> +
>      return 0;
>  }
>
> @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
>
>              stat64_add(&mig_stats.multifd_bytes,
>                         p->next_packet_size + p->packet_len);
> -            stat64_add(&mig_stats.normal_pages, pages->num);
> +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
>              stat64_add(&mig_stats.zero_pages, pages->zero_num);
>
>              multifd_pages_reset(p->pages);
> @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
>          p->flags &= ~MULTIFD_FLAG_SYNC;
>          qemu_mutex_unlock(&p->mutex);
>
> -        if (p->normal_num) {
> +        if (p->normal_num + p->zero_num) {
> +            assert(!(flags & MULTIFD_FLAG_SYNC));
>
This assertion does not seem relevant to this patch. Could you post it
separately and explain why it's needed here?


>              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
>              if (ret != 0) {
>                  break;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 9822ff298a..125f0bbe60 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -53,6 +53,11 @@ typedef struct {
>      uint32_t unused32[1];    /* Reserved for future use */
>      uint64_t unused64[3];    /* Reserved for future use */
>      char ramblock[256];
> +    /*
> +     * This array contains the pointers to:
> +     *  - normal pages (initial normal_pages entries)
> +     *  - zero pages (following zero_pages entries)
> +     */
>      uint64_t offset[];
>  } __attribute__((packed)) MultiFDPacket_t;
>
> @@ -224,6 +229,8 @@ typedef struct {
>
>  void multifd_register_ops(int method, MultiFDMethods *ops);
>  void multifd_send_fill_packet(MultiFDSendParams *p);
> +void multifd_zero_page_check_send(MultiFDSendParams *p);
> +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
>
>  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
>  {
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 99843a8e95..e2450b92d4 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -660,9 +660,11 @@
>  #
>  # @none: Do not perform zero page checking.
>  #
> +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
> +#
>  ##
>  { 'enum': 'ZeroPageDetection',
> -  'data': [ 'legacy', 'none' ] }
> +  'data': [ 'legacy', 'none', 'multifd' ] }
>
>  ##
>  # @BitmapMigrationBitmapAliasTransform:
> --
> 2.30.2
>
>
>

-- 
Elena


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-16 22:39 ` [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads Hao Xiang
@ 2024-02-21 16:11   ` Elena Ufimtseva
  2024-02-23  5:24     ` [External] " Hao Xiang
  2024-02-21 21:06   ` Fabiano Rosas
  1 sibling, 1 reply; 42+ messages in thread
From: Elena Ufimtseva @ 2024-02-21 16:11 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar


On Fri, Feb 16, 2024 at 2:42 PM Hao Xiang <hao.xiang@bytedance.com> wrote:

> This change adds a dedicated handler for MigrationOps::ram_save_target_page
> in multifd live migration. Now zero page checking can be done in the
> multifd threads and this becomes the default configuration. We still
> provide backward compatibility where zero page checking is done from the
> migration main thread.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  migration/multifd.c |  1 +
>  migration/options.c |  2 +-
>  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
>  3 files changed, 42 insertions(+), 14 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index fbb40ea10b..ef5dad1019 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/cutils.h"
>  #include "qemu/rcu.h"
> +#include "qemu/cutils.h"
>  #include "exec/target_page.h"
>  #include "sysemu/sysemu.h"
>  #include "exec/ramblock.h"
> diff --git a/migration/options.c b/migration/options.c
> index 3c603391b0..3c79b6ccd4 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -181,7 +181,7 @@ Property migration_properties[] = {
>                        MIG_MODE_NORMAL),
>      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
>                         parameters.zero_page_detection,
> -                       ZERO_PAGE_DETECTION_LEGACY),
> +                       ZERO_PAGE_DETECTION_MULTIFD),
>
>      /* Migration capabilities */
>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> diff --git a/migration/ram.c b/migration/ram.c
> index 5ece9f042e..b088c5a98c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
>      QEMUFile *file = pss->pss_channel;
>      int len = 0;
>
> -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> -        return 0;
> -    }
> -
>      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>          return 0;
>      }
> @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>
>  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
>  {
> +    assert(migrate_multifd());
>
We only call ram_save_multifd_page() if:

    if (migrate_multifd()) {
        migration_ops->ram_save_target_page = ram_save_target_page_multifd;

so this assert is not needed.

> +    assert(!migrate_compress());
> +    assert(!migration_in_postcopy());
>
These two are redundant; both conditions are already checked before we get here.

> +
>      if (!multifd_queue_page(block, offset)) {
>          return -1;
>      }
> @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
>   */
>  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>  {
> -    RAMBlock *block = pss->block;
>      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>      int res;
>
> @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>          return 1;
>      }
>
> +    return ram_save_page(rs, pss);
> +}
> +
> +/**
> + * ram_save_target_page_multifd: save one target page
> + *
> + * Returns the number of pages written
> + *
> + * @rs: current RAM state
> + * @pss: data about the page we want to send
> + */
> +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> +{
> +    RAMBlock *block = pss->block;
> +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> +
> +    /* Multifd is not compatible with old compression. */
> +    assert(!migrate_compress());
>
Do we need to check this for every page?


> +    /* Multifd is not compabible with postcopy. */
> +    assert(!migration_in_postcopy());
> +
>      /*
> -     * Do not use multifd in postcopy as one whole host page should be
> -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> -     * if host page size == guest page size the dest guest during run may
> -     * still see partially copied pages which is data corruption.
> +     * Backward compatibility support. While using multifd live
> +     * migration, we still need to handle zero page checking on the
> +     * migration main thread.
>       */
> -    if (migrate_multifd() && !migration_in_postcopy()) {
> -        return ram_save_multifd_page(block, offset);
> +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> +        if (save_zero_page(rs, pss, offset)) {
> +            return 1;
> +        }
>      }
>
> -    return ram_save_page(rs, pss);
> +    return ram_save_multifd_page(block, offset);
>  }
>
>  /* Should be called before sending a host page */
> @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      }
>
>      migration_ops = g_malloc0(sizeof(MigrationOps));
> -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> +
> +    if (migrate_multifd()) {
> +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> +    } else {
> +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> +    }
>
>      bql_unlock();
>      ret = multifd_send_sync_main();
> --
> 2.30.2
>
>
>

-- 
Elena


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking.
  2024-02-16 22:40 ` [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking Hao Xiang
@ 2024-02-21 20:59   ` Fabiano Rosas
  2024-02-23  4:20     ` [External] " Hao Xiang
  0 siblings, 1 reply; 42+ messages in thread
From: Fabiano Rosas @ 2024-02-21 20:59 UTC (permalink / raw)
  To: Hao Xiang, pbonzini, berrange, eduardo, peterx, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

Hao Xiang <hao.xiang@bytedance.com> writes:

> Now that zero page checking is done on the multifd sender threads by
> default, we still provide an option for backward compatibility. This
> change adds a qtest migration test case to set the zero-page-detection
> option to "legacy" and run multifd migration with zero page checking on the
> migration main thread.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  tests/qtest/migration-test.c | 52 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 52 insertions(+)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 8a5bb1752e..c27083110a 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2621,6 +2621,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
>      return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
>  }
>  
> +static void *
> +test_migrate_precopy_tcp_multifd_start_zero_page_legacy(QTestState *from,
> +                                                        QTestState *to)
> +{
> +    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
> +    migrate_set_parameter_str(from, "zero-page-detection", "legacy");
> +    return NULL;
> +}
> +
> +static void *
> +test_migration_precopy_tcp_multifd_start_no_zero_page(QTestState *from,
> +                                                      QTestState *to)
> +{
> +    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
> +    migrate_set_parameter_str(from, "zero-page-detection", "none");
> +    return NULL;
> +}
> +
>  static void *
>  test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
>                                              QTestState *to)
> @@ -2652,6 +2670,36 @@ static void test_multifd_tcp_none(void)
>      test_precopy_common(&args);
>  }
>  
> +static void test_multifd_tcp_zero_page_legacy(void)
> +{
> +    MigrateCommon args = {
> +        .listen_uri = "defer",
> +        .start_hook = test_migrate_precopy_tcp_multifd_start_zero_page_legacy,
> +        /*
> +         * Multifd is more complicated than most of the features, it
> +         * directly takes guest page buffers when sending, make sure
> +         * everything will work alright even if guest page is changing.
> +         */
> +        .live = true,
> +    };
> +    test_precopy_common(&args);
> +}
> +
> +static void test_multifd_tcp_no_zero_page(void)
> +{
> +    MigrateCommon args = {
> +        .listen_uri = "defer",
> +        .start_hook = test_migration_precopy_tcp_multifd_start_no_zero_page,
> +        /*
> +         * Multifd is more complicated than most of the features, it
> +         * directly takes guest page buffers when sending, make sure
> +         * everything will work alright even if guest page is changing.
> +         */
> +        .live = true,
> +    };
> +    test_precopy_common(&args);
> +}
> +
>  static void test_multifd_tcp_zlib(void)
>  {
>      MigrateCommon args = {
> @@ -3550,6 +3598,10 @@ int main(int argc, char **argv)
>      }
>      migration_test_add("/migration/multifd/tcp/plain/none",
>                         test_multifd_tcp_none);
> +    migration_test_add("/migration/multifd/tcp/plain/zero_page_legacy",
> +                       test_multifd_tcp_zero_page_legacy);
> +    migration_test_add("/migration/multifd/tcp/plain/no_zero_page",
> +                       test_multifd_tcp_no_zero_page);

Here it's better to separate the main feature from the states. That way
we can run only the zero-page tests with:

 migration-test -r /x86_64/migration/multifd/tcp/plain/zero-page

Like so: (also dashes instead of underscores)
/zero-page/legacy
/zero-page/none
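
The reason the shared prefix helps: g_test-style runners select every
registered test whose path starts with the given `-r`/`-p` filter, so a
common `/zero-page/` component lets one filter run both variants. A toy
sketch of that prefix matching (illustrative only, not the actual glib
code):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of how a -r PREFIX filter selects registered test paths. */
static bool test_selected(const char *filter, const char *path)
{
    return strncmp(path, filter, strlen(filter)) == 0;
}
```

With the suggested naming, `-r /migration/multifd/tcp/plain/zero-page`
would match both the `legacy` and `none` registrations but not, say,
`.../plain/cancel`.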

>      migration_test_add("/migration/multifd/tcp/plain/cancel",
>                         test_multifd_tcp_cancel);
>      migration_test_add("/migration/multifd/tcp/plain/zlib",


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
                     ` (2 preceding siblings ...)
  2024-02-21 16:00   ` Elena Ufimtseva
@ 2024-02-21 21:04   ` Fabiano Rosas
  2024-02-23  2:20     ` Peter Xu
  2024-02-23  5:18     ` Hao Xiang
  3 siblings, 2 replies; 42+ messages in thread
From: Fabiano Rosas @ 2024-02-21 21:04 UTC (permalink / raw)
  To: Hao Xiang, pbonzini, berrange, eduardo, peterx, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

Hao Xiang <hao.xiang@bytedance.com> writes:

> 1. Implements the zero page detection and handling on the multifd
> threads for non-compression, zlib and zstd compression backends.
> 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> 3. Add proper asserts to ensure pages->normal are used for normal pages
> in all scenarios.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  migration/meson.build         |  1 +
>  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
>  migration/multifd-zlib.c      | 26 ++++++++++++---
>  migration/multifd-zstd.c      | 25 ++++++++++++---
>  migration/multifd.c           | 50 +++++++++++++++++++++++------
>  migration/multifd.h           |  7 +++++
>  qapi/migration.json           |  4 ++-
>  7 files changed, 151 insertions(+), 21 deletions(-)
>  create mode 100644 migration/multifd-zero-page.c
>
> diff --git a/migration/meson.build b/migration/meson.build
> index 92b1cc4297..1eeb915ff6 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -22,6 +22,7 @@ system_ss.add(files(
>    'migration.c',
>    'multifd.c',
>    'multifd-zlib.c',
> +  'multifd-zero-page.c',
>    'ram-compress.c',
>    'options.c',
>    'postcopy-ram.c',
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> new file mode 100644
> index 0000000000..f0cd8e2c53
> --- /dev/null
> +++ b/migration/multifd-zero-page.c
> @@ -0,0 +1,59 @@
> +/*
> + * Multifd zero page detection implementation.
> + *
> + * Copyright (c) 2024 Bytedance Inc
> + *
> + * Authors:
> + *  Hao Xiang <hao.xiang@bytedance.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/cutils.h"
> +#include "exec/ramblock.h"
> +#include "migration.h"
> +#include "multifd.h"
> +#include "options.h"
> +#include "ram.h"
> +
> +void multifd_zero_page_check_send(MultiFDSendParams *p)
> +{
> +    /*
> +     * QEMU older than 9.0 don't understand zero page
> +     * on multifd channel. This switch is required to
> +     * maintain backward compatibility.
> +     */
> +    bool use_multifd_zero_page =
> +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
> +    MultiFDPages_t *pages = p->pages;
> +    RAMBlock *rb = pages->block;
> +
> +    assert(pages->num != 0);
> +    assert(pages->normal_num == 0);
> +    assert(pages->zero_num == 0);

We can drop these before the final version.

> +
> +    for (int i = 0; i < pages->num; i++) {
> +        uint64_t offset = pages->offset[i];
> +        if (use_multifd_zero_page &&
> +            buffer_is_zero(rb->host + offset, p->page_size)) {
> +            pages->zero[pages->zero_num] = offset;
> +            pages->zero_num++;
> +            ram_release_page(rb->idstr, offset);
> +        } else {
> +            pages->normal[pages->normal_num] = offset;
> +            pages->normal_num++;
> +        }
> +    }

I don't think it's super clean to have three arrays offset, zero and
normal, all sized for the full packet size. It might be possible to just
carry a bitmap of non-zero pages along with pages->offset and operate on
that instead.

What do you think?

Peter, any ideas? Should we just leave this for another time?
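
For illustration, a bitmap-based variant could look roughly like this
(the struct name, field names, and sizes here are made up for the sketch;
they are not the actual MultiFDPages_t layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PACKET_PAGES 128

/*
 * Illustrative stand-in for MultiFDPages_t: keep the single offset[]
 * array and mark non-zero (normal) pages in a bitmap, instead of
 * maintaining separate normal[] and zero[] arrays each sized for the
 * full packet.
 */
typedef struct {
    uint32_t num;
    uint64_t offset[SKETCH_PACKET_PAGES];
    uint64_t nonzero_bitmap[(SKETCH_PACKET_PAGES + 63) / 64];
} SketchPages;

static void sketch_set_nonzero(SketchPages *p, uint32_t i)
{
    p->nonzero_bitmap[i / 64] |= UINT64_C(1) << (i % 64);
}

static bool sketch_is_nonzero(const SketchPages *p, uint32_t i)
{
    return (p->nonzero_bitmap[i / 64] >> (i % 64)) & 1;
}
```

The sender side would then walk offset[] once, setting the bit for pages
that fail buffer_is_zero(), and send_prepare/fill_packet would iterate
offset[] testing the bitmap rather than indexing two extra arrays.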

> +}
> +
> +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> +{
> +    for (int i = 0; i < p->zero_num; i++) {
> +        void *page = p->host + p->zero[i];
> +        if (!buffer_is_zero(page, p->page_size)) {
> +            memset(page, 0, p->page_size);
> +        }
> +    }
> +}
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 012e3bdea1..cdfe0fa70e 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>      int ret;
>      uint32_t i;
>  
> +    multifd_zero_page_check_send(p);
> +
> +    if (!pages->normal_num) {
> +        p->next_packet_size = 0;
> +        goto out;
> +    }
> +
>      multifd_send_prepare_header(p);
>  
> -    for (i = 0; i < pages->num; i++) {
> +    for (i = 0; i < pages->normal_num; i++) {
>          uint32_t available = z->zbuff_len - out_size;
>          int flush = Z_NO_FLUSH;
>  
> -        if (i == pages->num - 1) {
> +        if (i == pages->normal_num - 1) {
>              flush = Z_SYNC_FLUSH;
>          }
>  
> @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>           * with compression. zlib does not guarantee that this is safe,
>           * therefore copy the page before calling deflate().
>           */
> -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
> +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
>          zs->avail_in = p->page_size;
>          zs->next_in = z->buf;
>  
> @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>      p->iov[p->iovs_num].iov_len = out_size;
>      p->iovs_num++;
>      p->next_packet_size = out_size;
> -    p->flags |= MULTIFD_FLAG_ZLIB;
>  
> +out:
> +    p->flags |= MULTIFD_FLAG_ZLIB;
>      multifd_send_fill_packet(p);
> -
>      return 0;
>  }
>  
> @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, flags, MULTIFD_FLAG_ZLIB);
>          return -1;
>      }
> +
> +    multifd_zero_page_check_recv(p);
> +
> +    if (!p->normal_num) {
> +        assert(in_size == 0);
> +        return 0;
> +    }
> +
>      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>  
>      if (ret != 0) {
> @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, out_size, expected_size);
>          return -1;
>      }
> +
>      return 0;
>  }
>  
> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> index dc8fe43e94..27a1eba075 100644
> --- a/migration/multifd-zstd.c
> +++ b/migration/multifd-zstd.c
> @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>      int ret;
>      uint32_t i;
>  
> +    multifd_zero_page_check_send(p);
> +
> +    if (!pages->normal_num) {
> +        p->next_packet_size = 0;
> +        goto out;
> +    }
> +
>      multifd_send_prepare_header(p);
>  
>      z->out.dst = z->zbuff;
>      z->out.size = z->zbuff_len;
>      z->out.pos = 0;
>  
> -    for (i = 0; i < pages->num; i++) {
> +    for (i = 0; i < pages->normal_num; i++) {
>          ZSTD_EndDirective flush = ZSTD_e_continue;
>  
> -        if (i == pages->num - 1) {
> +        if (i == pages->normal_num - 1) {
>              flush = ZSTD_e_flush;
>          }
> -        z->in.src = p->pages->block->host + pages->offset[i];
> +        z->in.src = p->pages->block->host + pages->normal[i];
>          z->in.size = p->page_size;
>          z->in.pos = 0;
>  
> @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>      p->iov[p->iovs_num].iov_len = z->out.pos;
>      p->iovs_num++;
>      p->next_packet_size = z->out.pos;
> -    p->flags |= MULTIFD_FLAG_ZSTD;
>  
> +out:
> +    p->flags |= MULTIFD_FLAG_ZSTD;
>      multifd_send_fill_packet(p);
> -
>      return 0;
>  }
>  
> @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, flags, MULTIFD_FLAG_ZSTD);
>          return -1;
>      }
> +
> +    multifd_zero_page_check_recv(p);
> +
> +    if (!p->normal_num) {
> +        assert(in_size == 0);
> +        return 0;
> +    }
> +
>      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>  
>      if (ret != 0) {
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a33dba40d9..fbb40ea10b 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
>  #include "qemu/rcu.h"
>  #include "exec/target_page.h"
>  #include "sysemu/sysemu.h"
> @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>      MultiFDPages_t *pages = p->pages;
>      int ret;
>  
> +    multifd_zero_page_check_send(p);
> +
>      if (!use_zero_copy_send) {
>          /*
>           * Only !zerocopy needs the header in IOV; zerocopy will
> @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>          multifd_send_prepare_header(p);
>      }
>  
> -    for (int i = 0; i < pages->num; i++) {
> -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> +    for (int i = 0; i < pages->normal_num; i++) {
> +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
>          p->iov[p->iovs_num].iov_len = p->page_size;
>          p->iovs_num++;
>      }
>  
> -    p->next_packet_size = pages->num * p->page_size;
> +    p->next_packet_size = pages->normal_num * p->page_size;
>      p->flags |= MULTIFD_FLAG_NOCOMP;
>  
>      multifd_send_fill_packet(p);
> @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
>                     p->id, flags, MULTIFD_FLAG_NOCOMP);
>          return -1;
>      }
> +
> +    multifd_zero_page_check_recv(p);
> +
> +    if (!p->normal_num) {
> +        return 0;
> +    }
> +
>      for (int i = 0; i < p->normal_num; i++) {
>          p->iov[i].iov_base = p->host + p->normal[i];
>          p->iov[i].iov_len = p->page_size;
> @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>  
>      packet->flags = cpu_to_be32(p->flags);
>      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
> -    packet->normal_pages = cpu_to_be32(pages->num);
> +    packet->normal_pages = cpu_to_be32(pages->normal_num);
>      packet->zero_pages = cpu_to_be32(pages->zero_num);
>      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>  
> @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>          strncpy(packet->ramblock, pages->block->idstr, 256);
>      }
>  
> -    for (i = 0; i < pages->num; i++) {
> +    for (i = 0; i < pages->normal_num; i++) {
>          /* there are architectures where ram_addr_t is 32 bit */
> -        uint64_t temp = pages->offset[i];
> +        uint64_t temp = pages->normal[i];
>  
>          packet->offset[i] = cpu_to_be64(temp);
>      }
>  
> +    for (i = 0; i < pages->zero_num; i++) {
> +        /* there are architectures where ram_addr_t is 32 bit */
> +        uint64_t temp = pages->zero[i];
> +
> +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
> +    }
> +
>      p->packets_sent++;
> -    p->total_normal_pages += pages->num;
> +    p->total_normal_pages += pages->normal_num;
>      p->total_zero_pages += pages->zero_num;
>  
> -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
>                         p->flags, p->next_packet_size);
>  }
>  
> @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>          p->normal[i] = offset;
>      }
>  
> +    for (i = 0; i < p->zero_num; i++) {
> +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> +
> +        if (offset > (p->block->used_length - p->page_size)) {
> +            error_setg(errp, "multifd: offset too long %" PRIu64
> +                       " (max " RAM_ADDR_FMT ")",
> +                       offset, p->block->used_length);
> +            return -1;
> +        }
> +        p->zero[i] = offset;
> +    }
> +
>      return 0;
>  }
>  
> @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
>  
>              stat64_add(&mig_stats.multifd_bytes,
>                         p->next_packet_size + p->packet_len);
> -            stat64_add(&mig_stats.normal_pages, pages->num);
> +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
>              stat64_add(&mig_stats.zero_pages, pages->zero_num);
>  
>              multifd_pages_reset(p->pages);
> @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
>          p->flags &= ~MULTIFD_FLAG_SYNC;
>          qemu_mutex_unlock(&p->mutex);
>  
> -        if (p->normal_num) {
> +        if (p->normal_num + p->zero_num) {
> +            assert(!(flags & MULTIFD_FLAG_SYNC));

This breaks 8.2 -> 9.0 migration. QEMU 8.2 is still sending the SYNC
along with the data packet.

>              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
>              if (ret != 0) {
>                  break;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 9822ff298a..125f0bbe60 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -53,6 +53,11 @@ typedef struct {
>      uint32_t unused32[1];    /* Reserved for future use */
>      uint64_t unused64[3];    /* Reserved for future use */
>      char ramblock[256];
> +    /*
> +     * This array contains the pointers to:
> +     *  - normal pages (initial normal_pages entries)
> +     *  - zero pages (following zero_pages entries)
> +     */
>      uint64_t offset[];
>  } __attribute__((packed)) MultiFDPacket_t;
>  
> @@ -224,6 +229,8 @@ typedef struct {
>  
>  void multifd_register_ops(int method, MultiFDMethods *ops);
>  void multifd_send_fill_packet(MultiFDSendParams *p);
> +void multifd_zero_page_check_send(MultiFDSendParams *p);
> +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
>  
>  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
>  {
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 99843a8e95..e2450b92d4 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -660,9 +660,11 @@
>  #
>  # @none: Do not perform zero page checking.
>  #
> +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
> +#
>  ##
>  { 'enum': 'ZeroPageDetection',
> -  'data': [ 'legacy', 'none' ] }
> +  'data': [ 'legacy', 'none', 'multifd' ] }
>  
>  ##
>  # @BitmapMigrationBitmapAliasTransform:


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-16 22:39 ` [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads Hao Xiang
  2024-02-21 16:11   ` Elena Ufimtseva
@ 2024-02-21 21:06   ` Fabiano Rosas
  2024-02-23  2:33     ` Peter Xu
  2024-02-23  5:47     ` Hao Xiang
  1 sibling, 2 replies; 42+ messages in thread
From: Fabiano Rosas @ 2024-02-21 21:06 UTC (permalink / raw)
  To: Hao Xiang, pbonzini, berrange, eduardo, peterx, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar
  Cc: Hao Xiang

Hao Xiang <hao.xiang@bytedance.com> writes:

> This change adds a dedicated handler for MigrationOps::ram_save_target_page in

nit: Add a dedicated handler...

Usually "this patch/change" is used only when necessary to avoid
ambiguity.

> multifd live migration. Now zero page checking can be done in the multifd threads
> and this becomes the default configuration. We still provide backward compatibility
> where zero page checking is done from the migration main thread.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  migration/multifd.c |  1 +
>  migration/options.c |  2 +-
>  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
>  3 files changed, 42 insertions(+), 14 deletions(-)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index fbb40ea10b..ef5dad1019 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -13,6 +13,7 @@
>  #include "qemu/osdep.h"
>  #include "qemu/cutils.h"

This include...

>  #include "qemu/rcu.h"
> +#include "qemu/cutils.h"

is there already.

>  #include "exec/target_page.h"
>  #include "sysemu/sysemu.h"
>  #include "exec/ramblock.h"
> diff --git a/migration/options.c b/migration/options.c
> index 3c603391b0..3c79b6ccd4 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -181,7 +181,7 @@ Property migration_properties[] = {
>                        MIG_MODE_NORMAL),
>      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
>                         parameters.zero_page_detection,
> -                       ZERO_PAGE_DETECTION_LEGACY),
> +                       ZERO_PAGE_DETECTION_MULTIFD),

I think we'll need something to avoid a 9.0 -> 8.2 migration with this
enabled. Otherwise it will go along happily until we get data corruption
because the new QEMU didn't send any zero pages on the migration thread
and the old QEMU did not look for them in the multifd packet.

Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
in use. We'd just need to fix the test in the new QEMU to check
(msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).
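A minimal standalone sketch of that relaxed check (the bumped MULTIFD_VERSION value and the error text here are assumptions for illustration, not QEMU's actual code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: assume the version is bumped to 2 when multifd zero
 * pages change the packet layout. */
#define MULTIFD_VERSION 2

/*
 * Relaxed check: a new QEMU accepts packets from any protocol version
 * up to its own, while an old QEMU (still comparing with !=) cleanly
 * rejects the newer version instead of misparsing the stream.
 */
static int multifd_version_check(uint32_t msg_version)
{
    if (msg_version > MULTIFD_VERSION) {
        fprintf(stderr,
                "multifd: received packet version %u, expected at most %u\n",
                msg_version, MULTIFD_VERSION);
        return -1;
    }
    return 0;
}
```

The asymmetry is the point: 9.0 -> 8.2 fails fast at handshake time on the old side, instead of corrupting data later.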

>  
>      /* Migration capabilities */
>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> diff --git a/migration/ram.c b/migration/ram.c
> index 5ece9f042e..b088c5a98c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
>      QEMUFile *file = pss->pss_channel;
>      int len = 0;
>  
> -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> -        return 0;
> -    }

How does 'none' work now?

> -
>      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>          return 0;
>      }
> @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>  
>  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
>  {
> +    assert(migrate_multifd());
> +    assert(!migrate_compress());
> +    assert(!migration_in_postcopy());

Drop these, please. Keep only the asserts that are likely to trigger
during development, such as the existing ones at multifd_send_pages.

> +
>      if (!multifd_queue_page(block, offset)) {
>          return -1;
>      }
> @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
>   */
>  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>  {
> -    RAMBlock *block = pss->block;
>      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>      int res;
>  
> @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>          return 1;
>      }
>  
> +    return ram_save_page(rs, pss);

Look at where git put this! Are you using the default diff algorithm? If
so, try --patience to see if it improves the diff.

> +}
> +
> +/**
> + * ram_save_target_page_multifd: save one target page
> + *
> + * Returns the number of pages written

We could be more precise here:

 ram_save_target_page_multifd: send one target page to multifd workers
 
 Returns 1 if the page was queued, -1 otherwise.

> + *
> + * @rs: current RAM state
> + * @pss: data about the page we want to send
> + */
> +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> +{
> +    RAMBlock *block = pss->block;
> +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> +
> +    /* Multifd is not compatible with old compression. */
> +    assert(!migrate_compress());

This should already be enforced at options.c.

> +
> +    /* Multifd is not compatible with postcopy. */
> +    assert(!migration_in_postcopy());

Same here.

> +
>      /*
> -     * Do not use multifd in postcopy as one whole host page should be
> -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> -     * if host page size == guest page size the dest guest during run may
> -     * still see partially copied pages which is data corruption.
> +     * Backward compatibility support. While using multifd live
> +     * migration, we still need to handle zero page checking on the
> +     * migration main thread.
>       */
> -    if (migrate_multifd() && !migration_in_postcopy()) {
> -        return ram_save_multifd_page(block, offset);
> +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> +        if (save_zero_page(rs, pss, offset)) {
> +            return 1;
> +        }
>      }
>  
> -    return ram_save_page(rs, pss);
> +    return ram_save_multifd_page(block, offset);
>  }
>  
>  /* Should be called before sending a host page */
> @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      }
>  
>      migration_ops = g_malloc0(sizeof(MigrationOps));
> -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> +
> +    if (migrate_multifd()) {
> +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> +    } else {
> +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> +    }
>  
>      bql_unlock();
>      ret = multifd_send_sync_main();



* Re: [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
  2024-02-21 12:03   ` Markus Armbruster
  2024-02-21 13:58   ` Elena Ufimtseva
@ 2024-02-22 10:36   ` Peter Xu
  2024-02-26  7:18   ` Wang, Lei
  3 siblings, 0 replies; 42+ messages in thread
From: Peter Xu @ 2024-02-22 10:36 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, farosas, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Fri, Feb 16, 2024 at 10:39:56PM +0000, Hao Xiang wrote:
> @@ -1123,6 +1123,10 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
>      QEMUFile *file = pss->pss_channel;
>      int len = 0;
>  
> +    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> +        return 0;
> +    }

Nitpick: use "== NONE" here seems clearer.

> +
>      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>          return 0;
>      }

-- 
Peter Xu




* Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-21 21:04   ` Fabiano Rosas
@ 2024-02-23  2:20     ` Peter Xu
  2024-02-23  5:15       ` [External] " Hao Xiang
  2024-02-23  5:18     ` Hao Xiang
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Xu @ 2024-02-23  2:20 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Hao Xiang, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 06:04:10PM -0300, Fabiano Rosas wrote:
> Hao Xiang <hao.xiang@bytedance.com> writes:
> 
> > 1. Implements the zero page detection and handling on the multifd
> > threads for non-compression, zlib and zstd compression backends.
> > 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> > 3. Add proper asserts to ensure pages->normal are used for normal pages
> > in all scenarios.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > ---
> >  migration/meson.build         |  1 +
> >  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
> >  migration/multifd-zlib.c      | 26 ++++++++++++---
> >  migration/multifd-zstd.c      | 25 ++++++++++++---
> >  migration/multifd.c           | 50 +++++++++++++++++++++++------
> >  migration/multifd.h           |  7 +++++
> >  qapi/migration.json           |  4 ++-
> >  7 files changed, 151 insertions(+), 21 deletions(-)
> >  create mode 100644 migration/multifd-zero-page.c
> >
> > diff --git a/migration/meson.build b/migration/meson.build
> > index 92b1cc4297..1eeb915ff6 100644
> > --- a/migration/meson.build
> > +++ b/migration/meson.build
> > @@ -22,6 +22,7 @@ system_ss.add(files(
> >    'migration.c',
> >    'multifd.c',
> >    'multifd-zlib.c',
> > +  'multifd-zero-page.c',
> >    'ram-compress.c',
> >    'options.c',
> >    'postcopy-ram.c',
> > diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> > new file mode 100644
> > index 0000000000..f0cd8e2c53
> > --- /dev/null
> > +++ b/migration/multifd-zero-page.c
> > @@ -0,0 +1,59 @@
> > +/*
> > + * Multifd zero page detection implementation.
> > + *
> > + * Copyright (c) 2024 Bytedance Inc
> > + *
> > + * Authors:
> > + *  Hao Xiang <hao.xiang@bytedance.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/cutils.h"
> > +#include "exec/ramblock.h"
> > +#include "migration.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "ram.h"
> > +
> > +void multifd_zero_page_check_send(MultiFDSendParams *p)
> > +{
> > +    /*
> > +     * QEMU versions older than 9.0 don't understand zero pages
> > +     * on the multifd channel. This switch is required to
> > +     * maintain backward compatibility.
> > +     */
> > +    bool use_multifd_zero_page =
> > +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
> > +    MultiFDPages_t *pages = p->pages;
> > +    RAMBlock *rb = pages->block;
> > +
> > +    assert(pages->num != 0);
> > +    assert(pages->normal_num == 0);
> > +    assert(pages->zero_num == 0);
> 
> We can drop these before the final version.
> 
> > +
> > +    for (int i = 0; i < pages->num; i++) {
> > +        uint64_t offset = pages->offset[i];
> > +        if (use_multifd_zero_page &&
> > +            buffer_is_zero(rb->host + offset, p->page_size)) {
> > +            pages->zero[pages->zero_num] = offset;
> > +            pages->zero_num++;
> > +            ram_release_page(rb->idstr, offset);
> > +        } else {
> > +            pages->normal[pages->normal_num] = offset;
> > +            pages->normal_num++;
> > +        }
> > +    }
> 
> I don't think it's super clean to have three arrays offset, zero and
> normal, all sized for the full packet size. It might be possible to just
> carry a bitmap of non-zero pages along with pages->offset and operate on
> that instead.
> 
> What do you think?
> 
> Peter, any ideas? Should we just leave this for another time?

Yeah, I think a bitmap would indeed save quite a few fields. It would,
however, make the later iteration slightly harder: we'd walk both (offset[],
bitmap) and process a page only if its bit is set for the offset.

IIUC we perhaps don't even need a bitmap?  AFAIU all we need in
MultiFDPages_t is one extra field to mark "how many normal pages", aka,
normal_num here (zero_num can be calculated from num - normal_num).  Then
the zero page detection logic should do two things:
the zero page detection logic should do two things:

  - Sort offset[] array so that it starts with normal pages, followed by
    zero pages

  - Setup normal_num to be the number of normal pages

Then we reduce 2 new arrays (normal[], zero[]) + 2 new fields (normal_num,
zero_num) -> 1 new field (normal_num).  It'll also be trivial to fill the
packet header later because offset[] is exactly that.
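The two steps above could be sketched as an in-place partition. The types below are simplified stand-ins for MultiFDPages_t, not QEMU's actual structures; a strict sort isn't needed, only a partition, since each offset is self-describing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define MAX_PAGES 128

/*
 * Simplified stand-in for MultiFDPages_t: one flat offset[] plus
 * normal_num, as suggested, instead of separate normal[]/zero[] arrays.
 */
typedef struct {
    uint32_t num;        /* total pages queued */
    uint32_t normal_num; /* first normal_num entries of offset[] are normal */
    uint64_t offset[MAX_PAGES];
} Pages;

static int page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (page[i]) {
            return 0;
        }
    }
    return 1;
}

/*
 * Partition offset[] in place so normal pages come first and zero pages
 * last.  Relative order within each group is not preserved, which is
 * fine because every offset carries its own address.
 * 'host' is the ramblock base the offsets index into.
 */
static void zero_page_partition(Pages *pages, const uint8_t *host)
{
    uint32_t i = 0, j = pages->num;

    while (i < j) {
        if (page_is_zero(host + pages->offset[i])) {
            uint64_t tmp = pages->offset[i];    /* swap zero page to the back */
            pages->offset[i] = pages->offset[--j];
            pages->offset[j] = tmp;
        } else {
            i++;
        }
    }
    pages->normal_num = i;    /* zero_num is simply num - normal_num */
}
```

With this layout, packet->offset[] can be filled straight from pages->offset[], with normal_pages = normal_num and zero_pages = num - normal_num.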

Side note - I still think it's confusing to read this patch and the
previous patch separately.  The previous patch introduced these new fields
without yet justifying their use.  IMHO it'll be easier to review if you
merge the two patches.

> 
> > +}
> > +
> > +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> > +{
> > +    for (int i = 0; i < p->zero_num; i++) {
> > +        void *page = p->host + p->zero[i];
> > +        if (!buffer_is_zero(page, p->page_size)) {
> > +            memset(page, 0, p->page_size);
> > +        }
> > +    }
> > +}
> > diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> > index 012e3bdea1..cdfe0fa70e 100644
> > --- a/migration/multifd-zlib.c
> > +++ b/migration/multifd-zlib.c
> > @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> >      int ret;
> >      uint32_t i;
> >  
> > +    multifd_zero_page_check_send(p);
> > +
> > +    if (!pages->normal_num) {
> > +        p->next_packet_size = 0;
> > +        goto out;
> > +    }
> > +
> >      multifd_send_prepare_header(p);
> >  
> > -    for (i = 0; i < pages->num; i++) {
> > +    for (i = 0; i < pages->normal_num; i++) {
> >          uint32_t available = z->zbuff_len - out_size;
> >          int flush = Z_NO_FLUSH;
> >  
> > -        if (i == pages->num - 1) {
> > +        if (i == pages->normal_num - 1) {
> >              flush = Z_SYNC_FLUSH;
> >          }
> >  
> > @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> >           * with compression. zlib does not guarantee that this is safe,
> >           * therefore copy the page before calling deflate().
> >           */
> > -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
> > +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
> >          zs->avail_in = p->page_size;
> >          zs->next_in = z->buf;
> >  
> > @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> >      p->iov[p->iovs_num].iov_len = out_size;
> >      p->iovs_num++;
> >      p->next_packet_size = out_size;
> > -    p->flags |= MULTIFD_FLAG_ZLIB;
> >  
> > +out:
> > +    p->flags |= MULTIFD_FLAG_ZLIB;
> >      multifd_send_fill_packet(p);
> > -
> >      return 0;
> >  }
> >  
> > @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, flags, MULTIFD_FLAG_ZLIB);
> >          return -1;
> >      }
> > +
> > +    multifd_zero_page_check_recv(p);
> > +
> > +    if (!p->normal_num) {
> > +        assert(in_size == 0);
> > +        return 0;
> > +    }
> > +
> >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> >  
> >      if (ret != 0) {
> > @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, out_size, expected_size);
> >          return -1;
> >      }
> > +
> >      return 0;
> >  }
> >  
> > diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> > index dc8fe43e94..27a1eba075 100644
> > --- a/migration/multifd-zstd.c
> > +++ b/migration/multifd-zstd.c
> > @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> >      int ret;
> >      uint32_t i;
> >  
> > +    multifd_zero_page_check_send(p);
> > +
> > +    if (!pages->normal_num) {
> > +        p->next_packet_size = 0;
> > +        goto out;
> > +    }
> > +
> >      multifd_send_prepare_header(p);

If this forms a pattern we can introduce multifd_send_prepare_common(),
roughly:

bool multifd_send_prepare_common(MultiFDSendParams *p)
{
    multifd_zero_page_check_send(p);

    if (!p->pages->normal_num) {
        p->next_packet_size = 0;
        return false;
    }

    multifd_send_prepare_header(p);
    return true;
}

and let each backend return early when it yields false.

> >  
> >      z->out.dst = z->zbuff;
> >      z->out.size = z->zbuff_len;
> >      z->out.pos = 0;
> >  
> > -    for (i = 0; i < pages->num; i++) {
> > +    for (i = 0; i < pages->normal_num; i++) {
> >          ZSTD_EndDirective flush = ZSTD_e_continue;
> >  
> > -        if (i == pages->num - 1) {
> > +        if (i == pages->normal_num - 1) {
> >              flush = ZSTD_e_flush;
> >          }
> > -        z->in.src = p->pages->block->host + pages->offset[i];
> > +        z->in.src = p->pages->block->host + pages->normal[i];
> >          z->in.size = p->page_size;
> >          z->in.pos = 0;
> >  
> > @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> >      p->iov[p->iovs_num].iov_len = z->out.pos;
> >      p->iovs_num++;
> >      p->next_packet_size = z->out.pos;
> > -    p->flags |= MULTIFD_FLAG_ZSTD;
> >  
> > +out:
> > +    p->flags |= MULTIFD_FLAG_ZSTD;
> >      multifd_send_fill_packet(p);
> > -
> >      return 0;
> >  }
> >  
> > @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, flags, MULTIFD_FLAG_ZSTD);
> >          return -1;
> >      }
> > +
> > +    multifd_zero_page_check_recv(p);
> > +
> > +    if (!p->normal_num) {
> > +        assert(in_size == 0);
> > +        return 0;
> > +    }
> > +
> >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> >  
> >      if (ret != 0) {
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index a33dba40d9..fbb40ea10b 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -11,6 +11,7 @@
> >   */
> >  
> >  #include "qemu/osdep.h"
> > +#include "qemu/cutils.h"
> >  #include "qemu/rcu.h"
> >  #include "exec/target_page.h"
> >  #include "sysemu/sysemu.h"
> > @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> >      MultiFDPages_t *pages = p->pages;
> >      int ret;
> >  
> > +    multifd_zero_page_check_send(p);
> > +
> >      if (!use_zero_copy_send) {
> >          /*
> >           * Only !zerocopy needs the header in IOV; zerocopy will
> > @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> >          multifd_send_prepare_header(p);
> >      }
> >  
> > -    for (int i = 0; i < pages->num; i++) {
> > -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> > +    for (int i = 0; i < pages->normal_num; i++) {
> > +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
> >          p->iov[p->iovs_num].iov_len = p->page_size;
> >          p->iovs_num++;
> >      }
> >  
> > -    p->next_packet_size = pages->num * p->page_size;
> > +    p->next_packet_size = pages->normal_num * p->page_size;
> >      p->flags |= MULTIFD_FLAG_NOCOMP;
> >  
> >      multifd_send_fill_packet(p);
> > @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, flags, MULTIFD_FLAG_NOCOMP);
> >          return -1;
> >      }
> > +
> > +    multifd_zero_page_check_recv(p);
> > +
> > +    if (!p->normal_num) {
> > +        return 0;
> > +    }
> > +
> >      for (int i = 0; i < p->normal_num; i++) {
> >          p->iov[i].iov_base = p->host + p->normal[i];
> >          p->iov[i].iov_len = p->page_size;
> > @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> >  
> >      packet->flags = cpu_to_be32(p->flags);
> >      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
> > -    packet->normal_pages = cpu_to_be32(pages->num);
> > +    packet->normal_pages = cpu_to_be32(pages->normal_num);
> >      packet->zero_pages = cpu_to_be32(pages->zero_num);
> >      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
> >  
> > @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> >          strncpy(packet->ramblock, pages->block->idstr, 256);
> >      }
> >  
> > -    for (i = 0; i < pages->num; i++) {
> > +    for (i = 0; i < pages->normal_num; i++) {
> >          /* there are architectures where ram_addr_t is 32 bit */
> > -        uint64_t temp = pages->offset[i];
> > +        uint64_t temp = pages->normal[i];
> >  
> >          packet->offset[i] = cpu_to_be64(temp);
> >      }
> >  
> > +    for (i = 0; i < pages->zero_num; i++) {
> > +        /* there are architectures where ram_addr_t is 32 bit */
> > +        uint64_t temp = pages->zero[i];
> > +
> > +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
> > +    }
> > +
> >      p->packets_sent++;
> > -    p->total_normal_pages += pages->num;
> > +    p->total_normal_pages += pages->normal_num;
> >      p->total_zero_pages += pages->zero_num;
> >  
> > -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> > +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
> >                         p->flags, p->next_packet_size);
> >  }
> >  
> > @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >          p->normal[i] = offset;
> >      }
> >  
> > +    for (i = 0; i < p->zero_num; i++) {
> > +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> > +
> > +        if (offset > (p->block->used_length - p->page_size)) {
> > +            error_setg(errp, "multifd: offset too long %" PRIu64
> > +                       " (max " RAM_ADDR_FMT ")",
> > +                       offset, p->block->used_length);
> > +            return -1;
> > +        }
> > +        p->zero[i] = offset;
> > +    }
> > +
> >      return 0;
> >  }
> >  
> > @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
> >  
> >              stat64_add(&mig_stats.multifd_bytes,
> >                         p->next_packet_size + p->packet_len);
> > -            stat64_add(&mig_stats.normal_pages, pages->num);
> > +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
> >              stat64_add(&mig_stats.zero_pages, pages->zero_num);
> >  
> >              multifd_pages_reset(p->pages);
> > @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
> >          p->flags &= ~MULTIFD_FLAG_SYNC;
> >          qemu_mutex_unlock(&p->mutex);
> >  
> > -        if (p->normal_num) {
> > +        if (p->normal_num + p->zero_num) {
> > +            assert(!(flags & MULTIFD_FLAG_SYNC));
> 
> This breaks 8.2 -> 9.0 migration. QEMU 8.2 is still sending the SYNC
> along with the data packet.
> 
> >              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
> >              if (ret != 0) {
> >                  break;
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 9822ff298a..125f0bbe60 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -53,6 +53,11 @@ typedef struct {
> >      uint32_t unused32[1];    /* Reserved for future use */
> >      uint64_t unused64[3];    /* Reserved for future use */
> >      char ramblock[256];
> > +    /*
> > +     * This array contains the pointers to:
> > +     *  - normal pages (initial normal_pages entries)
> > +     *  - zero pages (following zero_pages entries)
> > +     */
> >      uint64_t offset[];
> >  } __attribute__((packed)) MultiFDPacket_t;
> >  
> > @@ -224,6 +229,8 @@ typedef struct {
> >  
> >  void multifd_register_ops(int method, MultiFDMethods *ops);
> >  void multifd_send_fill_packet(MultiFDSendParams *p);
> > +void multifd_zero_page_check_send(MultiFDSendParams *p);
> > +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
> >  
> >  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
> >  {
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 99843a8e95..e2450b92d4 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -660,9 +660,11 @@
> >  #
> >  # @none: Do not perform zero page checking.
> >  #
> > +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
> > +#
> >  ##
> >  { 'enum': 'ZeroPageDetection',
> > -  'data': [ 'legacy', 'none' ] }
> > +  'data': [ 'legacy', 'none', 'multifd' ] }
> >  
> >  ##
> >  # @BitmapMigrationBitmapAliasTransform:
> 

-- 
Peter Xu




* Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-21 21:06   ` Fabiano Rosas
@ 2024-02-23  2:33     ` Peter Xu
  2024-02-23  6:02       ` [External] " Hao Xiang
  2024-02-23  5:47     ` Hao Xiang
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Xu @ 2024-02-23  2:33 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: Hao Xiang, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 06:06:19PM -0300, Fabiano Rosas wrote:
> Hao Xiang <hao.xiang@bytedance.com> writes:
> 
> > This change adds a dedicated handler for MigrationOps::ram_save_target_page in
> 
> nit: Add a dedicated handler...
> 
> Usually "this patch/change" is used only when necessary to avoid
> ambiguity.
> 
> > multifd live migration. Now zero page checking can be done in the multifd threads
> > and this becomes the default configuration. We still provide backward compatibility
> > where zero page checking is done from the migration main thread.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > ---
> >  migration/multifd.c |  1 +
> >  migration/options.c |  2 +-
> >  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
> >  3 files changed, 42 insertions(+), 14 deletions(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index fbb40ea10b..ef5dad1019 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -13,6 +13,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qemu/cutils.h"
> 
> This include...
> 
> >  #include "qemu/rcu.h"
> > +#include "qemu/cutils.h"
> 
> is there already.
> 
> >  #include "exec/target_page.h"
> >  #include "sysemu/sysemu.h"
> >  #include "exec/ramblock.h"
> > diff --git a/migration/options.c b/migration/options.c
> > index 3c603391b0..3c79b6ccd4 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -181,7 +181,7 @@ Property migration_properties[] = {
> >                        MIG_MODE_NORMAL),
> >      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
> >                         parameters.zero_page_detection,
> > -                       ZERO_PAGE_DETECTION_LEGACY),
> > +                       ZERO_PAGE_DETECTION_MULTIFD),
> 
> I think we'll need something to avoid a 9.0 -> 8.2 migration with this
> enabled. Otherwise it will go along happily until we get data corruption
> because the new QEMU didn't send any zero pages on the migration thread
> and the old QEMU did not look for them in the multifd packet.

It could be even worse: since the new QEMU only attaches the "normal" pages
after the multifd packet header, the old QEMU could read more data than it
was sent, expecting all pages to be there.

> 
> Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
> in use. We'd just need to fix the test in the new QEMU to check
> (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).

IMHO we don't need to change MULTIFD_VERSION yet; what we need is perhaps a
compat entry in hw_compat_8_2 setting "zero-page-detection" to "legacy".
We should make sure that when "legacy" is set, multifd runs the old protocol
(zero_num will always be 0, and will be ignored by old QEMUs, IIUC).

One more comment: when reposting, please consider splitting this patch into
two; the new ram_save_target_page_multifd() hook can be done in a separate
patch, AFAIU.
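Such a compat entry would be a one-line addition along these lines (untested
sketch, not from the patch; the exact placement inside hw/core/machine.c may
differ):

```c
/* hw/core/machine.c -- hypothetical sketch: machine types of version
 * <= 8.2 fall back to the legacy zero-page protocol, so a 9.0 -> 8.2
 * migration keeps sending zero pages on the main migration thread. */
GlobalProperty hw_compat_8_2[] = {
    /* ... existing entries ... */
    { "migration", "zero-page-detection", "legacy" },
};
```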

> 
> >  
> >      /* Migration capabilities */
> >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 5ece9f042e..b088c5a98c 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
> >      QEMUFile *file = pss->pss_channel;
> >      int len = 0;
> >  
> > -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> > -        return 0;
> > -    }
> 
> How does 'none' work now?
> 
> > -
> >      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> >          return 0;
> >      }
> > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> >  
> >  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
> >  {
> > +    assert(migrate_multifd());
> > +    assert(!migrate_compress());
> > +    assert(!migration_in_postcopy());
> 
> Drop these, please. Keep only the asserts that are likely to trigger
> during development, such as the existing ones at multifd_send_pages.
> 
> > +
> >      if (!multifd_queue_page(block, offset)) {
> >          return -1;
> >      }
> > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
> >   */
> >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> >  {
> > -    RAMBlock *block = pss->block;
> >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> >      int res;
> >  
> > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> >          return 1;
> >      }
> >  
> > +    return ram_save_page(rs, pss);
> 
> Look at where git put this! Are you using the default diff algorithm? If
> not, try using --patience to see if it improves the diff.
> 
> > +}
> > +
> > +/**
> > + * ram_save_target_page_multifd: save one target page
> > + *
> > + * Returns the number of pages written
> 
> We could be more precise here:
> 
>  ram_save_target_page_multifd: send one target page to multifd workers
>  
>  Returns 1 if the page was queued, -1 otherwise.
> 
> > + *
> > + * @rs: current RAM state
> > + * @pss: data about the page we want to send
> > + */
> > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> > +{
> > +    RAMBlock *block = pss->block;
> > +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > +
> > +    /* Multifd is not compatible with old compression. */
> > +    assert(!migrate_compress());
> 
> This should already be enforced at options.c.
> 
> > +
> > +    /* Multifd is not compatible with postcopy. */
> > +    assert(!migration_in_postcopy());
> 
> Same here.
> 
> > +
> >      /*
> > -     * Do not use multifd in postcopy as one whole host page should be
> > -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> > -     * if host page size == guest page size the dest guest during run may
> > -     * still see partially copied pages which is data corruption.
> > +     * Backward compatibility support. While using multifd live
> > +     * migration, we still need to handle zero page checking on the
> > +     * migration main thread.
> >       */
> > -    if (migrate_multifd() && !migration_in_postcopy()) {
> > -        return ram_save_multifd_page(block, offset);
> > +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > +        if (save_zero_page(rs, pss, offset)) {
> > +            return 1;
> > +        }
> >      }
> >  
> > -    return ram_save_page(rs, pss);
> > +    return ram_save_multifd_page(block, offset);
> >  }
> >  
> >  /* Should be called before sending a host page */
> > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >      }
> >  
> >      migration_ops = g_malloc0(sizeof(MigrationOps));
> > -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > +
> > +    if (migrate_multifd()) {
> > +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> > +    } else {
> > +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > +    }
> >  
> >      bql_unlock();
> >      ret = multifd_send_sync_main();
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [External] Re: [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format.
  2024-02-21 15:37   ` Elena Ufimtseva
@ 2024-02-23  4:18     ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  4:18 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 7:37 AM Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>
>
>
> On Fri, Feb 16, 2024 at 2:41 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>>
>> This change adds zero page counters and updates multifd send/receive
>> tracing format to track the newly added counters.
>>
>> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>> ---
>>  migration/multifd.c    | 43 ++++++++++++++++++++++++++++++++++--------
>>  migration/multifd.h    | 21 ++++++++++++++++++++-
>>  migration/ram.c        |  1 -
>>  migration/trace-events |  8 ++++----
>>  4 files changed, 59 insertions(+), 14 deletions(-)
>>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index adfe8c9a0a..a33dba40d9 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -236,6 +236,8 @@ static void multifd_pages_reset(MultiFDPages_t *pages)
>>       * overwritten later when reused.
>>       */
>>      pages->num = 0;
>> +    pages->normal_num = 0;
>> +    pages->zero_num = 0;
>>      pages->block = NULL;
>>  }
>>
>>
>> @@ -309,6 +311,8 @@ static MultiFDPages_t *multifd_pages_init(uint32_t n)
>>
>>      pages->allocated = n;
>>      pages->offset = g_new0(ram_addr_t, n);
>> +    pages->normal = g_new0(ram_addr_t, n);
>> +    pages->zero = g_new0(ram_addr_t, n);
>>
>>
>>      return pages;
>>  }
>> @@ -319,6 +323,10 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
>>      pages->allocated = 0;
>>      g_free(pages->offset);
>>      pages->offset = NULL;
>> +    g_free(pages->normal);
>> +    pages->normal = NULL;
>> +    g_free(pages->zero);
>> +    pages->zero = NULL;
>>      g_free(pages);
>>  }
>>
>> @@ -332,6 +340,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>>      packet->flags = cpu_to_be32(p->flags);
>>      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
>>      packet->normal_pages = cpu_to_be32(pages->num);
>> +    packet->zero_pages = cpu_to_be32(pages->zero_num);
>>      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>>
>>      packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
>> @@ -350,9 +359,10 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>>
>>      p->packets_sent++;
>>      p->total_normal_pages += pages->num;
>> +    p->total_zero_pages += pages->zero_num;
>>
>> -    trace_multifd_send(p->id, packet_num, pages->num, p->flags,
>> -                       p->next_packet_size);
>> +    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
>> +                       p->flags, p->next_packet_size);
>>  }
>>
>>  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> @@ -393,20 +403,29 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>      p->normal_num = be32_to_cpu(packet->normal_pages);
>>      if (p->normal_num > packet->pages_alloc) {
>>          error_setg(errp, "multifd: received packet "
>> -                   "with %u pages and expected maximum pages are %u",
>> +                   "with %u normal pages and expected maximum pages are %u",
>>                     p->normal_num, packet->pages_alloc) ;
>>          return -1;
>>      }
>>
>> +    p->zero_num = be32_to_cpu(packet->zero_pages);
>> +    if (p->zero_num > packet->pages_alloc - p->normal_num) {
>> +        error_setg(errp, "multifd: received packet "
>> +                   "with %u zero pages and expected maximum zero pages are %u",
>> +                   p->zero_num, packet->pages_alloc - p->normal_num) ;
>> +        return -1;
>> +    }
>
>
> You could probably combine this check with normal_num against pages_alloc.
>>
>> +
>>      p->next_packet_size = be32_to_cpu(packet->next_packet_size);
>>      p->packet_num = be64_to_cpu(packet->packet_num);
>>      p->packets_recved++;
>>      p->total_normal_pages += p->normal_num;
>> +    p->total_zero_pages += p->zero_num;
>>
>> -    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
>> -                       p->next_packet_size);
>> +    trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->zero_num,
>> +                       p->flags, p->next_packet_size);
>>
>> -    if (p->normal_num == 0) {
>> +    if (p->normal_num == 0 && p->zero_num == 0) {
>>          return 0;
>>      }
>>
>> @@ -823,6 +842,8 @@ static void *multifd_send_thread(void *opaque)
>>
>>              stat64_add(&mig_stats.multifd_bytes,
>>                         p->next_packet_size + p->packet_len);
>> +            stat64_add(&mig_stats.normal_pages, pages->num);
>
>
> That seems wrong. pages->num is the number of pages total in the packet.
> But next patch changes it, so I suggest or change it here and not in 3/7.

In this patch, multifd zero pages are not enabled yet, so pages->num
is the number of normal pages, not the total number of pages in the
packet. The zero pages are still sent in a different format by
save_zero_page. Later on, when multifd zero page detection is enabled,
pages->normal_num counts the number of normal pages and the accounting
changes accordingly.

>
>> +            stat64_add(&mig_stats.zero_pages, pages->zero_num);
>>
>>              multifd_pages_reset(p->pages);
>>              p->next_packet_size = 0;
>> @@ -866,7 +887,8 @@ out:
>>
>>      rcu_unregister_thread();
>>      migration_threads_remove(thread);
>> -    trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages);
>> +    trace_multifd_send_thread_end(p->id, p->packets_sent, p->total_normal_pages,
>> +                                  p->total_zero_pages);
>>
>>      return NULL;
>>  }
>> @@ -1132,6 +1154,8 @@ static void multifd_recv_cleanup_channel(MultiFDRecvParams *p)
>>      p->iov = NULL;
>>      g_free(p->normal);
>>      p->normal = NULL;
>> +    g_free(p->zero);
>> +    p->zero = NULL;
>>      multifd_recv_state->ops->recv_cleanup(p);
>>  }
>>
>> @@ -1251,7 +1275,9 @@ static void *multifd_recv_thread(void *opaque)
>>      }
>>
>>      rcu_unregister_thread();
>> -    trace_multifd_recv_thread_end(p->id, p->packets_recved, p->total_normal_pages);
>> +    trace_multifd_recv_thread_end(p->id, p->packets_recved,
>> +                                  p->total_normal_pages,
>> +                                  p->total_zero_pages);
>>
>>      return NULL;
>>  }
>> @@ -1290,6 +1316,7 @@ int multifd_recv_setup(Error **errp)
>>          p->name = g_strdup_printf("multifdrecv_%d", i);
>>          p->iov = g_new0(struct iovec, page_count);
>>          p->normal = g_new0(ram_addr_t, page_count);
>> +        p->zero = g_new0(ram_addr_t, page_count);
>>          p->page_count = page_count;
>>          p->page_size = qemu_target_page_size();
>>      }
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index 8a1cad0996..9822ff298a 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -48,7 +48,10 @@ typedef struct {
>>      /* size of the next packet that contains pages */
>>      uint32_t next_packet_size;
>>      uint64_t packet_num;
>> -    uint64_t unused[4];    /* Reserved for future use */
>> +    /* zero pages */
>> +    uint32_t zero_pages;
>> +    uint32_t unused32[1];    /* Reserved for future use */
>> +    uint64_t unused64[3];    /* Reserved for future use */
>>      char ramblock[256];
>>      uint64_t offset[];
>>  } __attribute__((packed)) MultiFDPacket_t;
>> @@ -56,10 +59,18 @@ typedef struct {
>>  typedef struct {
>>      /* number of used pages */
>>      uint32_t num;
>> +    /* number of normal pages */
>> +    uint32_t normal_num;
>> +    /* number of zero pages */
>> +    uint32_t zero_num;
>>      /* number of allocated pages */
>>      uint32_t allocated;
>>      /* offset of each page */
>>      ram_addr_t *offset;
>> +    /* offset of normal page */
>> +    ram_addr_t *normal;
>> +    /* offset of zero page */
>> +    ram_addr_t *zero;
>>      RAMBlock *block;
>>  } MultiFDPages_t;
>>
>> @@ -124,6 +135,8 @@ typedef struct {
>>      uint64_t packets_sent;
>>      /* non zero pages sent through this channel */
>>      uint64_t total_normal_pages;
>> +    /* zero pages sent through this channel */
>> +    uint64_t total_zero_pages;
>
>
> Can we initialize these to zero when threads are being set up?
> Also, I have a strong desire to rename these.. later.

When MultiFDSendParams are allocated in multifd_send_setup, g_new0
will initialize them to zero.

>
>>
>>      /* buffers to send */
>>      struct iovec *iov;
>>      /* number of iovs used */
>> @@ -178,12 +191,18 @@ typedef struct {
>>      uint8_t *host;
>>      /* non zero pages recv through this channel */
>>      uint64_t total_normal_pages;
>> +    /* zero pages recv through this channel */
>> +    uint64_t total_zero_pages;
>>      /* buffers to recv */
>>      struct iovec *iov;
>>      /* Pages that are not zero */
>>      ram_addr_t *normal;
>>      /* num of non zero pages */
>>      uint32_t normal_num;
>> +    /* Pages that are zero */
>> +    ram_addr_t *zero;
>> +    /* num of zero pages */
>> +    uint32_t zero_num;
>>      /* used for de-compression methods */
>>      void *data;
>>  } MultiFDRecvParams;
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 556725c30f..5ece9f042e 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1259,7 +1259,6 @@ static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
>>      if (!multifd_queue_page(block, offset)) {
>>          return -1;
>>      }
>> -    stat64_add(&mig_stats.normal_pages, 1);
>>
>>      return 1;
>>  }
>> diff --git a/migration/trace-events b/migration/trace-events
>> index 298ad2b0dd..9f1d7ae71a 100644
>> --- a/migration/trace-events
>> +++ b/migration/trace-events
>> @@ -128,21 +128,21 @@ postcopy_preempt_reset_channel(void) ""
>>  # multifd.c
>>  multifd_new_send_channel_async(uint8_t id) "channel %u"
>>  multifd_new_send_channel_async_error(uint8_t id, void *err) "channel=%u err=%p"
>> -multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " pages %u flags 0x%x next packet size %u"
>> +multifd_recv(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t zero, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
>>  multifd_recv_new_channel(uint8_t id) "channel %u"
>>  multifd_recv_sync_main(long packet_num) "packet num %ld"
>>  multifd_recv_sync_main_signal(uint8_t id) "channel %u"
>>  multifd_recv_sync_main_wait(uint8_t id) "channel %u"
>>  multifd_recv_terminate_threads(bool error) "error %d"
>> -multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t pages) "channel %u packets %" PRIu64 " pages %" PRIu64
>> +multifd_recv_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %" PRIu64 " zero pages %" PRIu64
>>  multifd_recv_thread_start(uint8_t id) "%u"
>> -multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u flags 0x%x next packet size %u"
>> +multifd_send(uint8_t id, uint64_t packet_num, uint32_t normal_pages, uint32_t zero_pages, uint32_t flags, uint32_t next_packet_size) "channel %u packet_num %" PRIu64 " normal pages %u zero pages %u flags 0x%x next packet size %u"
>>  multifd_send_error(uint8_t id) "channel %u"
>>  multifd_send_sync_main(long packet_num) "packet num %ld"
>>  multifd_send_sync_main_signal(uint8_t id) "channel %u"
>>  multifd_send_sync_main_wait(uint8_t id) "channel %u"
>>  multifd_send_terminate_threads(void) ""
>> -multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64
>> +multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages, uint64_t zero_pages) "channel %u packets %" PRIu64 " normal pages %"  PRIu64 " zero pages %"  PRIu64
>>  multifd_send_thread_start(uint8_t id) "%u"
>>  multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char *hostname) "ioc=%p tioc=%p hostname=%s"
>>  multifd_tls_outgoing_handshake_error(void *ioc, const char *err) "ioc=%p err=%s"
>> --
>> 2.30.2
>>
>>
>
>
> --
> Elena



* Re: [External] Re: [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking.
  2024-02-21 20:59   ` Fabiano Rosas
@ 2024-02-23  4:20     ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  4:20 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: pbonzini, berrange, eduardo, peterx, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 12:59 PM Fabiano Rosas <farosas@suse.de> wrote:
>
> Hao Xiang <hao.xiang@bytedance.com> writes:
>
> > Now that zero page checking is done on the multifd sender threads by
> > default, we still provide an option for backward compatibility. This
> > change adds a qtest migration test case to set the zero-page-detection
> > option to "legacy" and run multifd migration with zero page checking on the
> > migration main thread.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > ---
> >  tests/qtest/migration-test.c | 52 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 52 insertions(+)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index 8a5bb1752e..c27083110a 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -2621,6 +2621,24 @@ test_migrate_precopy_tcp_multifd_start(QTestState *from,
> >      return test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
> >  }
> >
> > +static void *
> > +test_migrate_precopy_tcp_multifd_start_zero_page_legacy(QTestState *from,
> > +                                                        QTestState *to)
> > +{
> > +    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
> > +    migrate_set_parameter_str(from, "zero-page-detection", "legacy");
> > +    return NULL;
> > +}
> > +
> > +static void *
> > +test_migration_precopy_tcp_multifd_start_no_zero_page(QTestState *from,
> > +                                                      QTestState *to)
> > +{
> > +    test_migrate_precopy_tcp_multifd_start_common(from, to, "none");
> > +    migrate_set_parameter_str(from, "zero-page-detection", "none");
> > +    return NULL;
> > +}
> > +
> >  static void *
> >  test_migrate_precopy_tcp_multifd_zlib_start(QTestState *from,
> >                                              QTestState *to)
> > @@ -2652,6 +2670,36 @@ static void test_multifd_tcp_none(void)
> >      test_precopy_common(&args);
> >  }
> >
> > +static void test_multifd_tcp_zero_page_legacy(void)
> > +{
> > +    MigrateCommon args = {
> > +        .listen_uri = "defer",
> > +        .start_hook = test_migrate_precopy_tcp_multifd_start_zero_page_legacy,
> > +        /*
> > +         * Multifd is more complicated than most of the features, it
> > +         * directly takes guest page buffers when sending, make sure
> > +         * everything will work alright even if guest page is changing.
> > +         */
> > +        .live = true,
> > +    };
> > +    test_precopy_common(&args);
> > +}
> > +
> > +static void test_multifd_tcp_no_zero_page(void)
> > +{
> > +    MigrateCommon args = {
> > +        .listen_uri = "defer",
> > +        .start_hook = test_migration_precopy_tcp_multifd_start_no_zero_page,
> > +        /*
> > +         * Multifd is more complicated than most of the features, it
> > +         * directly takes guest page buffers when sending, make sure
> > +         * everything will work alright even if guest page is changing.
> > +         */
> > +        .live = true,
> > +    };
> > +    test_precopy_common(&args);
> > +}
> > +
> >  static void test_multifd_tcp_zlib(void)
> >  {
> >      MigrateCommon args = {
> > @@ -3550,6 +3598,10 @@ int main(int argc, char **argv)
> >      }
> >      migration_test_add("/migration/multifd/tcp/plain/none",
> >                         test_multifd_tcp_none);
> > +    migration_test_add("/migration/multifd/tcp/plain/zero_page_legacy",
> > +                       test_multifd_tcp_zero_page_legacy);
> > +    migration_test_add("/migration/multifd/tcp/plain/no_zero_page",
> > +                       test_multifd_tcp_no_zero_page);
>
> Here it's better to separate the main feature from the states. That way
> we can run only the zero-page tests with:
>
>  migration-test -r /x86_64/migration/multifd/tcp/plain/zero-page
>
> Like so: (also dashes instead of underscores)
> /zero-page/legacy
> /zero-page/none
>

Sounds good.

> >      migration_test_add("/migration/multifd/tcp/plain/cancel",
> >                         test_multifd_tcp_cancel);
> >      migration_test_add("/migration/multifd/tcp/plain/zlib",



* Re: [External] Re: [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-21 12:03   ` Markus Armbruster
@ 2024-02-23  4:22     ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  4:22 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, thuth,
	lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 4:03 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Hao Xiang <hao.xiang@bytedance.com> writes:
>
> > This new parameter controls where the zero page checking is running.
> > 1. If this parameter is set to 'legacy', zero page checking is
> > done in the migration main thread.
> > 2. If this parameter is set to 'none', zero page checking is disabled.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>
> [...]
>
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 5a565d9b8d..99843a8e95 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -653,6 +653,17 @@
> >  { 'enum': 'MigMode',
> >    'data': [ 'normal', 'cpr-reboot' ] }
> >
> > +##
> > +# @ZeroPageDetection:
> > +#
> > +# @legacy: Perform zero page checking from main migration thread. (since 9.0)
> > +#
> > +# @none: Do not perform zero page checking.
> > +#
> > +##
>
> The entire type is since 9.0.  Thus:
>
>    ##
>    # @ZeroPageDetection:
>    #
>    # @legacy: Perform zero page checking from main migration thread.
>    #
>    # @none: Do not perform zero page checking.
>    #
>    # Since: 9.0
>    ##
>
> > +{ 'enum': 'ZeroPageDetection',
> > +  'data': [ 'legacy', 'none' ] }
> > +
> >  ##
> >  # @BitmapMigrationBitmapAliasTransform:
> >  #
> > @@ -874,6 +885,9 @@
> >  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
> >  #        (Since 8.2)
> >  #
> > +# @zero-page-detection: See description in @ZeroPageDetection.
> > +#     Default is 'legacy'. (Since 9.0)
>
> The description feels a bit lazy :)
>
> Suggest
>
>    # @zero-page-detection: Whether and how to detect zero pages.  Default
>    #     is 'legacy'.  (since 9.0)
>
> Same for the other two copies.

I will fix these in the next version.

>
> > +#
> >  # Features:
> >  #
> >  # @deprecated: Member @block-incremental is deprecated.  Use
> > @@ -907,7 +921,8 @@
> >             'block-bitmap-mapping',
> >             { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
> >             'vcpu-dirty-limit',
> > -           'mode'] }
> > +           'mode',
> > +           'zero-page-detection'] }
> >
> >  ##
> >  # @MigrateSetParameters:
> > @@ -1066,6 +1081,10 @@
> >  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
> >  #        (Since 8.2)
> >  #
> > +# @zero-page-detection: See description in @ZeroPageDetection.
> > +#     Default is 'legacy'. (Since 9.0)
> > +#
> > +#
> >  # Features:
> >  #
> >  # @deprecated: Member @block-incremental is deprecated.  Use
> > @@ -1119,7 +1138,8 @@
> >              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
> >                                              'features': [ 'unstable' ] },
> >              '*vcpu-dirty-limit': 'uint64',
> > -            '*mode': 'MigMode'} }
> > +            '*mode': 'MigMode',
> > +            '*zero-page-detection': 'ZeroPageDetection'} }
> >
> >  ##
> >  # @migrate-set-parameters:
> > @@ -1294,6 +1314,9 @@
> >  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
> >  #        (Since 8.2)
> >  #
> > +# @zero-page-detection: See description in @ZeroPageDetection.
> > +#     Default is 'legacy'. (Since 9.0)
> > +#
> >  # Features:
> >  #
> >  # @deprecated: Member @block-incremental is deprecated.  Use
> > @@ -1344,7 +1367,8 @@
> >              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
> >                                              'features': [ 'unstable' ] },
> >              '*vcpu-dirty-limit': 'uint64',
> > -            '*mode': 'MigMode'} }
> > +            '*mode': 'MigMode',
> > +            '*zero-page-detection': 'ZeroPageDetection'} }
> >
> >  ##
> >  # @query-migrate-parameters:
>



* Re: [External] Re: [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-21 13:58   ` Elena Ufimtseva
@ 2024-02-23  4:37     ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  4:37 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 5:58 AM Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>
>
>
> On Fri, Feb 16, 2024 at 2:41 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>>
>> This new parameter controls where the zero page checking is running.
>> 1. If this parameter is set to 'legacy', zero page checking is
>> done in the migration main thread.
>> 2. If this parameter is set to 'none', zero page checking is disabled.
>>
>
> Hello Hao
>
> Few questions and comments.
>
> First, the commit message states that the parameter controls where the checking is done, but it also
> controls whether the sending of zero pages is done by the multifd threads.
>
>
>>
>> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>> ---
>>  hw/core/qdev-properties-system.c    | 10 ++++++++++
>>  include/hw/qdev-properties-system.h |  4 ++++
>>  migration/migration-hmp-cmds.c      |  9 +++++++++
>>  migration/options.c                 | 21 ++++++++++++++++++++
>>  migration/options.h                 |  1 +
>>  migration/ram.c                     |  4 ++++
>>  qapi/migration.json                 | 30 ++++++++++++++++++++++++++---
>>  7 files changed, 76 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
>> index 1a396521d5..63843f18b5 100644
>> --- a/hw/core/qdev-properties-system.c
>> +++ b/hw/core/qdev-properties-system.c
>> @@ -679,6 +679,16 @@ const PropertyInfo qdev_prop_mig_mode = {
>>      .set_default_value = qdev_propinfo_set_default_value_enum,
>>  };
>>
>> +const PropertyInfo qdev_prop_zero_page_detection = {
>> +    .name = "ZeroPageDetection",
>> +    .description = "zero_page_detection values, "
>> +                   "multifd,legacy,none",
>> +    .enum_table = &ZeroPageDetection_lookup,
>> +    .get = qdev_propinfo_get_enum,
>> +    .set = qdev_propinfo_set_enum,
>> +    .set_default_value = qdev_propinfo_set_default_value_enum,
>> +};
>> +
>>  /* --- Reserved Region --- */
>>
>>  /*
>> diff --git a/include/hw/qdev-properties-system.h b/include/hw/qdev-properties-system.h
>> index 06c359c190..839b170235 100644
>> --- a/include/hw/qdev-properties-system.h
>> +++ b/include/hw/qdev-properties-system.h
>> @@ -8,6 +8,7 @@ extern const PropertyInfo qdev_prop_macaddr;
>>  extern const PropertyInfo qdev_prop_reserved_region;
>>  extern const PropertyInfo qdev_prop_multifd_compression;
>>  extern const PropertyInfo qdev_prop_mig_mode;
>> +extern const PropertyInfo qdev_prop_zero_page_detection;
>>  extern const PropertyInfo qdev_prop_losttickpolicy;
>>  extern const PropertyInfo qdev_prop_blockdev_on_error;
>>  extern const PropertyInfo qdev_prop_bios_chs_trans;
>> @@ -47,6 +48,9 @@ extern const PropertyInfo qdev_prop_iothread_vq_mapping_list;
>>  #define DEFINE_PROP_MIG_MODE(_n, _s, _f, _d) \
>>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_mig_mode, \
>>                         MigMode)
>> +#define DEFINE_PROP_ZERO_PAGE_DETECTION(_n, _s, _f, _d) \
>> +    DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_zero_page_detection, \
>> +                       ZeroPageDetection)
>>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
>>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
>>                          LostTickPolicy)
>> diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
>> index 99b49df5dd..7e96ae6ffd 100644
>> --- a/migration/migration-hmp-cmds.c
>> +++ b/migration/migration-hmp-cmds.c
>> @@ -344,6 +344,11 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
>>          monitor_printf(mon, "%s: %s\n",
>>              MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_COMPRESSION),
>>              MultiFDCompression_str(params->multifd_compression));
>> +        assert(params->has_zero_page_detection);
>
>
> What is the reason to have assert here?

It's just to verify that the option is initialized properly before we
reach here. The same is done for the other options.

>
>>
>> +        monitor_printf(mon, "%s: %s\n",
>> +            MigrationParameter_str(MIGRATION_PARAMETER_ZERO_PAGE_DETECTION),
>> +            qapi_enum_lookup(&ZeroPageDetection_lookup,
>> +                params->zero_page_detection));
>>          monitor_printf(mon, "%s: %" PRIu64 " bytes\n",
>>              MigrationParameter_str(MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE),
>>              params->xbzrle_cache_size);
>> @@ -634,6 +639,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
>>          p->has_multifd_zstd_level = true;
>>          visit_type_uint8(v, param, &p->multifd_zstd_level, &err);
>>          break;
>> +    case MIGRATION_PARAMETER_ZERO_PAGE_DETECTION:
>> +        p->has_zero_page_detection = true;
>> +        visit_type_ZeroPageDetection(v, param, &p->zero_page_detection, &err);
>> +        break;
>>      case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
>>          p->has_xbzrle_cache_size = true;
>>          if (!visit_type_size(v, param, &cache_size, &err)) {
>> diff --git a/migration/options.c b/migration/options.c
>> index 3e3e0b93b4..3c603391b0 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -179,6 +179,9 @@ Property migration_properties[] = {
>>      DEFINE_PROP_MIG_MODE("mode", MigrationState,
>>                        parameters.mode,
>>                        MIG_MODE_NORMAL),
>> +    DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
>> +                       parameters.zero_page_detection,
>> +                       ZERO_PAGE_DETECTION_LEGACY),
>>
>>      /* Migration capabilities */
>>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>> @@ -903,6 +906,13 @@ uint64_t migrate_xbzrle_cache_size(void)
>>      return s->parameters.xbzrle_cache_size;
>>  }
>>
>> +ZeroPageDetection migrate_zero_page_detection(void)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +
>> +    return s->parameters.zero_page_detection;
>> +}
>> +
>>  /* parameter setters */
>>
>>  void migrate_set_block_incremental(bool value)
>> @@ -1013,6 +1023,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>>      params->vcpu_dirty_limit = s->parameters.vcpu_dirty_limit;
>>      params->has_mode = true;
>>      params->mode = s->parameters.mode;
>> +    params->has_zero_page_detection = true;
>> +    params->zero_page_detection = s->parameters.zero_page_detection;
>>
>>      return params;
>>  }
>> @@ -1049,6 +1061,7 @@ void migrate_params_init(MigrationParameters *params)
>>      params->has_x_vcpu_dirty_limit_period = true;
>>      params->has_vcpu_dirty_limit = true;
>>      params->has_mode = true;
>> +    params->has_zero_page_detection = true;
>>  }
>>
>>  /*
>> @@ -1350,6 +1363,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>>      if (params->has_mode) {
>>          dest->mode = params->mode;
>>      }
>> +
>> +    if (params->has_zero_page_detection) {
>> +        dest->zero_page_detection = params->zero_page_detection;
>> +    }
>>  }
>>
>>  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>> @@ -1494,6 +1511,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>>      if (params->has_mode) {
>>          s->parameters.mode = params->mode;
>>      }
>> +
>> +    if (params->has_zero_page_detection) {
>> +        s->parameters.zero_page_detection = params->zero_page_detection;
>> +    }
>>  }
>>
>>  void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
>> diff --git a/migration/options.h b/migration/options.h
>> index 246c160aee..b7c4fb3861 100644
>> --- a/migration/options.h
>> +++ b/migration/options.h
>> @@ -93,6 +93,7 @@ const char *migrate_tls_authz(void);
>>  const char *migrate_tls_creds(void);
>>  const char *migrate_tls_hostname(void);
>>  uint64_t migrate_xbzrle_cache_size(void);
>> +ZeroPageDetection migrate_zero_page_detection(void);
>>
>>  /* parameters setters */
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 4649a81204..556725c30f 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1123,6 +1123,10 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
>>      QEMUFile *file = pss->pss_channel;
>>      int len = 0;
>>
>> +    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
>> +        return 0;
>> +    }
>> +
>>      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>          return 0;
>>      }
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 5a565d9b8d..99843a8e95 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -653,6 +653,17 @@
>>  { 'enum': 'MigMode',
>>    'data': [ 'normal', 'cpr-reboot' ] }
>>
>> +##
>> +# @ZeroPageDetection:
>> +#
>> +# @legacy: Perform zero page checking from main migration thread. (since 9.0)
>> +#
>> +# @none: Do not perform zero page checking.
>> +#
>> +##
>> +{ 'enum': 'ZeroPageDetection',
>> +  'data': [ 'legacy', 'none' ] }
>> +
>
>
> Above you have introduced the qdev property qdev_prop_zero_page_detection with multifd, but it is not present in the scheme.
> Perhaps 'mulitfd' in qdev_prop_zero_page_detection belongs to another patch?

You are right. I will fix that.

>
>
>>
>>  ##
>>  # @BitmapMigrationBitmapAliasTransform:
>>  #
>> @@ -874,6 +885,9 @@
>>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>>  #        (Since 8.2)
>>  #
>> +# @zero-page-detection: See description in @ZeroPageDetection.
>> +#     Default is 'legacy'. (Since 9.0)
>> +#
>>  # Features:
>>  #
>>  # @deprecated: Member @block-incremental is deprecated.  Use
>> @@ -907,7 +921,8 @@
>>             'block-bitmap-mapping',
>>             { 'name': 'x-vcpu-dirty-limit-period', 'features': ['unstable'] },
>>             'vcpu-dirty-limit',
>> -           'mode'] }
>> +           'mode',
>> +           'zero-page-detection'] }
>>
>>  ##
>>  # @MigrateSetParameters:
>> @@ -1066,6 +1081,10 @@
>>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>>  #        (Since 8.2)
>>  #
>> +# @zero-page-detection: See description in @ZeroPageDetection.
>> +#     Default is 'legacy'. (Since 9.0)
>> +#
>> +#
>>  # Features:
>>  #
>>  # @deprecated: Member @block-incremental is deprecated.  Use
>> @@ -1119,7 +1138,8 @@
>>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>>                                              'features': [ 'unstable' ] },
>>              '*vcpu-dirty-limit': 'uint64',
>> -            '*mode': 'MigMode'} }
>> +            '*mode': 'MigMode',
>> +            '*zero-page-detection': 'ZeroPageDetection'} }
>>
>>  ##
>>  # @migrate-set-parameters:
>> @@ -1294,6 +1314,9 @@
>>  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
>>  #        (Since 8.2)
>>  #
>> +# @zero-page-detection: See description in @ZeroPageDetection.
>> +#     Default is 'legacy'. (Since 9.0)
>> +#
>>  # Features:
>>  #
>>  # @deprecated: Member @block-incremental is deprecated.  Use
>> @@ -1344,7 +1367,8 @@
>>              '*x-vcpu-dirty-limit-period': { 'type': 'uint64',
>>                                              'features': [ 'unstable' ] },
>>              '*vcpu-dirty-limit': 'uint64',
>> -            '*mode': 'MigMode'} }
>> +            '*mode': 'MigMode',
>> +            '*zero-page-detection': 'ZeroPageDetection'} }
>>
>>  ##
>>  # @query-migrate-parameters:
>> --
>> 2.30.2
>>
>>
>
>
> --
> Elena


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-16 23:49   ` Richard Henderson
@ 2024-02-23  4:38     ` Hao Xiang
  2024-02-24 19:06       ` Hao Xiang
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  4:38 UTC (permalink / raw)
  To: Richard Henderson
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

On Fri, Feb 16, 2024 at 9:08 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/16/24 12:39, Hao Xiang wrote:
> > +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> > +{
> > +    for (int i = 0; i < p->zero_num; i++) {
> > +        void *page = p->host + p->zero[i];
> > +        if (!buffer_is_zero(page, p->page_size)) {
> > +            memset(page, 0, p->page_size);
> > +        }
> > +    }
> > +}
>
> You should not check the buffer is zero here, you should just zero it.

I will fix it in the next version.

>
>
> r~



* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-21 16:00   ` Elena Ufimtseva
@ 2024-02-23  4:59     ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  4:59 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 8:00 AM Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>
>
>
> On Fri, Feb 16, 2024 at 2:42 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>>
>> 1. Implements the zero page detection and handling on the multifd
>> threads for non-compression, zlib and zstd compression backends.
>> 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
>> 3. Add proper asserts to ensure pages->normal are used for normal pages
>> in all scenarios.
>>
>> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>> ---
>>  migration/meson.build         |  1 +
>>  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
>>  migration/multifd-zlib.c      | 26 ++++++++++++---
>>  migration/multifd-zstd.c      | 25 ++++++++++++---
>>  migration/multifd.c           | 50 +++++++++++++++++++++++------
>>  migration/multifd.h           |  7 +++++
>>  qapi/migration.json           |  4 ++-
>>  7 files changed, 151 insertions(+), 21 deletions(-)
>>  create mode 100644 migration/multifd-zero-page.c
>>
>> diff --git a/migration/meson.build b/migration/meson.build
>> index 92b1cc4297..1eeb915ff6 100644
>> --- a/migration/meson.build
>> +++ b/migration/meson.build
>> @@ -22,6 +22,7 @@ system_ss.add(files(
>>    'migration.c',
>>    'multifd.c',
>>    'multifd-zlib.c',
>> +  'multifd-zero-page.c',
>>    'ram-compress.c',
>>    'options.c',
>>    'postcopy-ram.c',
>> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
>> new file mode 100644
>> index 0000000000..f0cd8e2c53
>> --- /dev/null
>> +++ b/migration/multifd-zero-page.c
>> @@ -0,0 +1,59 @@
>> +/*
>> + * Multifd zero page detection implementation.
>> + *
>> + * Copyright (c) 2024 Bytedance Inc
>> + *
>> + * Authors:
>> + *  Hao Xiang <hao.xiang@bytedance.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/cutils.h"
>> +#include "exec/ramblock.h"
>> +#include "migration.h"
>> +#include "multifd.h"
>> +#include "options.h"
>> +#include "ram.h"
>> +
>> +void multifd_zero_page_check_send(MultiFDSendParams *p)
>> +{
>> +    /*
>> +     * QEMU older than 9.0 don't understand zero page
>> +     * on multifd channel. This switch is required to
>> +     * maintain backward compatibility.
>> +     */
>> +    bool use_multifd_zero_page =
>> +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
>> +    MultiFDPages_t *pages = p->pages;
>> +    RAMBlock *rb = pages->block;
>> +
>> +    assert(pages->num != 0);
>
>
> Not needed, the check is done right before calling send_prepare.
>
>>
>> +    assert(pages->normal_num == 0);
>> +    assert(pages->zero_num == 0);
>
>
> Why these asserts are needed?

The idea is that when multifd_zero_page_check_send is called, I want
to make sure zero page checking has not already been run on this
packet. It is perhaps redundant, and the asserts are compiled out in a
free (non-debug) build anyway.

>>
>> +
>>
>> +    for (int i = 0; i < pages->num; i++) {
>> +        uint64_t offset = pages->offset[i];
>> +        if (use_multifd_zero_page &&
>> +            buffer_is_zero(rb->host + offset, p->page_size)) {
>> +            pages->zero[pages->zero_num] = offset;
>> +            pages->zero_num++;
>> +            ram_release_page(rb->idstr, offset);
>> +        } else {
>> +            pages->normal[pages->normal_num] = offset;
>> +            pages->normal_num++;
>> +        }
>> +    }
>> +}
>> +
>> +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
>> +{
>> +    for (int i = 0; i < p->zero_num; i++) {
>> +        void *page = p->host + p->zero[i];
>> +        if (!buffer_is_zero(page, p->page_size)) {
>> +            memset(page, 0, p->page_size);
>> +        }
>> +    }
>> +}
>> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
>> index 012e3bdea1..cdfe0fa70e 100644
>> --- a/migration/multifd-zlib.c
>> +++ b/migration/multifd-zlib.c
>> @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>>      int ret;
>>      uint32_t i;
>>
>> +    multifd_zero_page_check_send(p);
>> +
>> +    if (!pages->normal_num) {
>> +        p->next_packet_size = 0;
>> +        goto out;
>> +    }
>> +
>>      multifd_send_prepare_header(p);
>>
>> -    for (i = 0; i < pages->num; i++) {
>> +    for (i = 0; i < pages->normal_num; i++) {
>>          uint32_t available = z->zbuff_len - out_size;
>>          int flush = Z_NO_FLUSH;
>>
>> -        if (i == pages->num - 1) {
>> +        if (i == pages->normal_num - 1) {
>>              flush = Z_SYNC_FLUSH;
>>          }
>>
>> @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>>           * with compression. zlib does not guarantee that this is safe,
>>           * therefore copy the page before calling deflate().
>>           */
>> -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
>> +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
>>          zs->avail_in = p->page_size;
>>          zs->next_in = z->buf;
>>
>> @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>>      p->iov[p->iovs_num].iov_len = out_size;
>>      p->iovs_num++;
>>      p->next_packet_size = out_size;
>> -    p->flags |= MULTIFD_FLAG_ZLIB;
>>
>> +out:
>> +    p->flags |= MULTIFD_FLAG_ZLIB;
>>      multifd_send_fill_packet(p);
>> -
>
> Spurious?

We need to set the flag anyway, otherwise the receiver side will complain.

>
>>      return 0;
>>  }
>>
>> @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>>                     p->id, flags, MULTIFD_FLAG_ZLIB);
>>          return -1;
>>      }
>> +
>> +    multifd_zero_page_check_recv(p);
>> +
>> +    if (!p->normal_num) {
>> +        assert(in_size == 0);
>> +        return 0;
>
>
> return here will have no effect. Also, why is assert needed here?

We return here so we don't end up calling qio_channel_read_all with
buflen = 0. The assert makes sure that normal_num/next_packet_size
states are consistent.

> This change also does not seem to fit the description of the patch, probaby separate patch will be better.

These changes are needed to ensure the basic functionality still works
after enabling the multifd zero page format.

>
>>
>> +    }
>> +
>>      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>>
>>      if (ret != 0) {
>> @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>>                     p->id, out_size, expected_size);
>>          return -1;
>>      }
>> +
>>      return 0;
>>  }
>>
>> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
>> index dc8fe43e94..27a1eba075 100644
>> --- a/migration/multifd-zstd.c
>> +++ b/migration/multifd-zstd.c
>> @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>>      int ret;
>>      uint32_t i;
>>
>> +    multifd_zero_page_check_send(p);
>> +
>> +    if (!pages->normal_num) {
>> +        p->next_packet_size = 0;
>> +        goto out;
>> +    }
>> +
>>      multifd_send_prepare_header(p);
>>
>>      z->out.dst = z->zbuff;
>>      z->out.size = z->zbuff_len;
>>      z->out.pos = 0;
>>
>> -    for (i = 0; i < pages->num; i++) {
>> +    for (i = 0; i < pages->normal_num; i++) {
>>          ZSTD_EndDirective flush = ZSTD_e_continue;
>>
>> -        if (i == pages->num - 1) {
>> +        if (i == pages->normal_num - 1) {
>>              flush = ZSTD_e_flush;
>>          }
>> -        z->in.src = p->pages->block->host + pages->offset[i];
>> +        z->in.src = p->pages->block->host + pages->normal[i];
>>          z->in.size = p->page_size;
>>          z->in.pos = 0;
>>
>> @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>>      p->iov[p->iovs_num].iov_len = z->out.pos;
>>      p->iovs_num++;
>>      p->next_packet_size = z->out.pos;
>> -    p->flags |= MULTIFD_FLAG_ZSTD;
>>
>> +out:
>> +    p->flags |= MULTIFD_FLAG_ZSTD;
>>      multifd_send_fill_packet(p);
>> -
>
> Spurious removal.

Can you elaborate on this?

>
>>
>>      return 0;
>>  }
>>
>> @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
>>                     p->id, flags, MULTIFD_FLAG_ZSTD);
>>          return -1;
>>      }
>> +
>> +    multifd_zero_page_check_recv(p);
>> +
>> +    if (!p->normal_num) {
>> +        assert(in_size == 0);
>> +        return 0;
>> +    }
>> +
>
> Same question here about assert.
>
>>
>>      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>>
>>      if (ret != 0) {
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index a33dba40d9..fbb40ea10b 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -11,6 +11,7 @@
>>   */
>>
>>  #include "qemu/osdep.h"
>> +#include "qemu/cutils.h"
>>  #include "qemu/rcu.h"
>>  #include "exec/target_page.h"
>>  #include "sysemu/sysemu.h"
>> @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>>      MultiFDPages_t *pages = p->pages;
>>      int ret;
>>
>> +    multifd_zero_page_check_send(p);
>> +
>>      if (!use_zero_copy_send) {
>>          /*
>>           * Only !zerocopy needs the header in IOV; zerocopy will
>> @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>>          multifd_send_prepare_header(p);
>>      }
>>
>> -    for (int i = 0; i < pages->num; i++) {
>> -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
>> +    for (int i = 0; i < pages->normal_num; i++) {
>> +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
>>          p->iov[p->iovs_num].iov_len = p->page_size;
>>          p->iovs_num++;
>>      }
>>
>> -    p->next_packet_size = pages->num * p->page_size;
>> +    p->next_packet_size = pages->normal_num * p->page_size;
>>      p->flags |= MULTIFD_FLAG_NOCOMP;
>>
>>      multifd_send_fill_packet(p);
>> @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
>>                     p->id, flags, MULTIFD_FLAG_NOCOMP);
>>          return -1;
>>      }
>> +
>> +    multifd_zero_page_check_recv(p);
>> +
>> +    if (!p->normal_num) {
>> +        return 0;
>> +    }
>> +
>>      for (int i = 0; i < p->normal_num; i++) {
>>          p->iov[i].iov_base = p->host + p->normal[i];
>>          p->iov[i].iov_len = p->page_size;
>> @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>>
>>      packet->flags = cpu_to_be32(p->flags);
>>      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
>> -    packet->normal_pages = cpu_to_be32(pages->num);
>> +    packet->normal_pages = cpu_to_be32(pages->normal_num);
>>      packet->zero_pages = cpu_to_be32(pages->zero_num);
>>      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>>
>> @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>>          strncpy(packet->ramblock, pages->block->idstr, 256);
>>      }
>>
>> -    for (i = 0; i < pages->num; i++) {
>> +    for (i = 0; i < pages->normal_num; i++) {
>>          /* there are architectures where ram_addr_t is 32 bit */
>> -        uint64_t temp = pages->offset[i];
>> +        uint64_t temp = pages->normal[i];
>>
>>          packet->offset[i] = cpu_to_be64(temp);
>>      }
>>
>> +    for (i = 0; i < pages->zero_num; i++) {
>> +        /* there are architectures where ram_addr_t is 32 bit */
>> +        uint64_t temp = pages->zero[i];
>> +
>> +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
>> +    }
>> +
>>      p->packets_sent++;
>> -    p->total_normal_pages += pages->num;
>> +    p->total_normal_pages += pages->normal_num;
>>      p->total_zero_pages += pages->zero_num;
>>
>> -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
>> +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
>>                         p->flags, p->next_packet_size);
>>  }
>>
>> @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>>          p->normal[i] = offset;
>>      }
>>
>> +    for (i = 0; i < p->zero_num; i++) {
>> +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
>> +
>> +        if (offset > (p->block->used_length - p->page_size)) {
>> +            error_setg(errp, "multifd: offset too long %" PRIu64
>> +                       " (max " RAM_ADDR_FMT ")",
>> +                       offset, p->block->used_length);
>> +            return -1;
>> +        }
>> +        p->zero[i] = offset;
>> +    }
>> +
>>      return 0;
>>  }
>>
>> @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
>>
>>              stat64_add(&mig_stats.multifd_bytes,
>>                         p->next_packet_size + p->packet_len);
>> -            stat64_add(&mig_stats.normal_pages, pages->num);
>> +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
>>              stat64_add(&mig_stats.zero_pages, pages->zero_num);
>>
>>              multifd_pages_reset(p->pages);
>> @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
>>          p->flags &= ~MULTIFD_FLAG_SYNC;
>>          qemu_mutex_unlock(&p->mutex);
>>
>> -        if (p->normal_num) {
>> +        if (p->normal_num + p->zero_num) {
>> +            assert(!(flags & MULTIFD_FLAG_SYNC));
>
> This assertion seems to be not relevant to this patch. Could you post it separately and explain why it's needed here?

I thought this was nice to have because it ensures that sync packets and
data packets are mutually exclusive. I agree it is not needed for
things to work, but does it deserve its own patch?

>
>>
>>              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
>>              if (ret != 0) {
>>                  break;
>> diff --git a/migration/multifd.h b/migration/multifd.h
>> index 9822ff298a..125f0bbe60 100644
>> --- a/migration/multifd.h
>> +++ b/migration/multifd.h
>> @@ -53,6 +53,11 @@ typedef struct {
>>      uint32_t unused32[1];    /* Reserved for future use */
>>      uint64_t unused64[3];    /* Reserved for future use */
>>      char ramblock[256];
>> +    /*
>> +     * This array contains the pointers to:
>> +     *  - normal pages (initial normal_pages entries)
>> +     *  - zero pages (following zero_pages entries)
>> +     */
>>      uint64_t offset[];
>>  } __attribute__((packed)) MultiFDPacket_t;
>>
>> @@ -224,6 +229,8 @@ typedef struct {
>>
>>  void multifd_register_ops(int method, MultiFDMethods *ops);
>>  void multifd_send_fill_packet(MultiFDSendParams *p);
>> +void multifd_zero_page_check_send(MultiFDSendParams *p);
>> +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
>>
>>  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
>>  {
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 99843a8e95..e2450b92d4 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -660,9 +660,11 @@
>>  #
>>  # @none: Do not perform zero page checking.
>>  #
>> +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
>> +#
>>  ##
>>  { 'enum': 'ZeroPageDetection',
>> -  'data': [ 'legacy', 'none' ] }
>> +  'data': [ 'legacy', 'none', 'multifd' ] }
>>
>>  ##
>>  # @BitmapMigrationBitmapAliasTransform:
>> --
>> 2.30.2
>>
>>
>
>
> --
> Elena



* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-23  2:20     ` Peter Xu
@ 2024-02-23  5:15       ` Hao Xiang
  2024-02-24 22:56         ` Hao Xiang
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  5:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Thu, Feb 22, 2024 at 6:21 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Feb 21, 2024 at 06:04:10PM -0300, Fabiano Rosas wrote:
> > Hao Xiang <hao.xiang@bytedance.com> writes:
> >
> > > 1. Implements the zero page detection and handling on the multifd
> > > threads for non-compression, zlib and zstd compression backends.
> > > 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> > > 3. Add proper asserts to ensure pages->normal are used for normal pages
> > > in all scenarios.
> > >
> > > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > > ---
> > >  migration/meson.build         |  1 +
> > >  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
> > >  migration/multifd-zlib.c      | 26 ++++++++++++---
> > >  migration/multifd-zstd.c      | 25 ++++++++++++---
> > >  migration/multifd.c           | 50 +++++++++++++++++++++++------
> > >  migration/multifd.h           |  7 +++++
> > >  qapi/migration.json           |  4 ++-
> > >  7 files changed, 151 insertions(+), 21 deletions(-)
> > >  create mode 100644 migration/multifd-zero-page.c
> > >
> > > diff --git a/migration/meson.build b/migration/meson.build
> > > index 92b1cc4297..1eeb915ff6 100644
> > > --- a/migration/meson.build
> > > +++ b/migration/meson.build
> > > @@ -22,6 +22,7 @@ system_ss.add(files(
> > >    'migration.c',
> > >    'multifd.c',
> > >    'multifd-zlib.c',
> > > +  'multifd-zero-page.c',
> > >    'ram-compress.c',
> > >    'options.c',
> > >    'postcopy-ram.c',
> > > diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> > > new file mode 100644
> > > index 0000000000..f0cd8e2c53
> > > --- /dev/null
> > > +++ b/migration/multifd-zero-page.c
> > > @@ -0,0 +1,59 @@
> > > +/*
> > > + * Multifd zero page detection implementation.
> > > + *
> > > + * Copyright (c) 2024 Bytedance Inc
> > > + *
> > > + * Authors:
> > > + *  Hao Xiang <hao.xiang@bytedance.com>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > + * See the COPYING file in the top-level directory.
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qemu/cutils.h"
> > > +#include "exec/ramblock.h"
> > > +#include "migration.h"
> > > +#include "multifd.h"
> > > +#include "options.h"
> > > +#include "ram.h"
> > > +
> > > +void multifd_zero_page_check_send(MultiFDSendParams *p)
> > > +{
> > > +    /*
> > > +     * QEMU older than 9.0 don't understand zero page
> > > +     * on multifd channel. This switch is required to
> > > +     * maintain backward compatibility.
> > > +     */
> > > +    bool use_multifd_zero_page =
> > > +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
> > > +    MultiFDPages_t *pages = p->pages;
> > > +    RAMBlock *rb = pages->block;
> > > +
> > > +    assert(pages->num != 0);
> > > +    assert(pages->normal_num == 0);
> > > +    assert(pages->zero_num == 0);
> >
> > We can drop these before the final version.
> >
> > > +
> > > +    for (int i = 0; i < pages->num; i++) {
> > > +        uint64_t offset = pages->offset[i];
> > > +        if (use_multifd_zero_page &&
> > > +            buffer_is_zero(rb->host + offset, p->page_size)) {
> > > +            pages->zero[pages->zero_num] = offset;
> > > +            pages->zero_num++;
> > > +            ram_release_page(rb->idstr, offset);
> > > +        } else {
> > > +            pages->normal[pages->normal_num] = offset;
> > > +            pages->normal_num++;
> > > +        }
> > > +    }
> >
> > I don't think it's super clean to have three arrays offset, zero and
> > normal, all sized for the full packet size. It might be possible to just
> > carry a bitmap of non-zero pages along with pages->offset and operate on
> > that instead.
> >
> > What do you think?
> >
> > Peter, any ideas? Should we just leave this for another time?
>
> Yeah I think a bitmap should save quite a few fields indeed, it'll however
> make the latter iteration slightly harder by walking both (offset[],
> bitmap), process the page only if bitmap is set for the offset.
>
> IIUC we perhaps don't even need a bitmap?  AFAIU what we only need in
> Multifdpages_t is one extra field to mark "how many normal pages", aka,
> normal_num here (zero_num can be calculated from num-normal_num).  Then
> the zero page detection logic should do two things:
>
>   - Sort offset[] array so that it starts with normal pages, followed up by
>     zero pages
>
>   - Setup normal_num to be the number of normal pages
>
> Then we reduce 2 new arrays (normal[], zero[]) + 2 new fields (normal_num,
> zero_num) -> 1 new field (normal_num).  It'll also be trivial to fill the
> packet header later because offset[] is exactly that.
>
> Side note - I still think it's confusing to read this patch and previous
> patch separately.  Obviously previous patch introduced these new fields
> without justifying their values yet.  IMHO it'll be easier to review if you
> merge the two patches.

Fabiano, thanks for catching this. I totally missed the backward
compatibility thing.
Peter, I will code the sorting and merge this patch with the previous one.

>
> >
> > > +}
> > > +
> > > +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> > > +{
> > > +    for (int i = 0; i < p->zero_num; i++) {
> > > +        void *page = p->host + p->zero[i];
> > > +        if (!buffer_is_zero(page, p->page_size)) {
> > > +            memset(page, 0, p->page_size);
> > > +        }
> > > +    }
> > > +}
> > > diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> > > index 012e3bdea1..cdfe0fa70e 100644
> > > --- a/migration/multifd-zlib.c
> > > +++ b/migration/multifd-zlib.c
> > > @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> > >      int ret;
> > >      uint32_t i;
> > >
> > > +    multifd_zero_page_check_send(p);
> > > +
> > > +    if (!pages->normal_num) {
> > > +        p->next_packet_size = 0;
> > > +        goto out;
> > > +    }
> > > +
> > >      multifd_send_prepare_header(p);
> > >
> > > -    for (i = 0; i < pages->num; i++) {
> > > +    for (i = 0; i < pages->normal_num; i++) {
> > >          uint32_t available = z->zbuff_len - out_size;
> > >          int flush = Z_NO_FLUSH;
> > >
> > > -        if (i == pages->num - 1) {
> > > +        if (i == pages->normal_num - 1) {
> > >              flush = Z_SYNC_FLUSH;
> > >          }
> > >
> > > @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> > >           * with compression. zlib does not guarantee that this is safe,
> > >           * therefore copy the page before calling deflate().
> > >           */
> > > -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
> > > +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
> > >          zs->avail_in = p->page_size;
> > >          zs->next_in = z->buf;
> > >
> > > @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> > >      p->iov[p->iovs_num].iov_len = out_size;
> > >      p->iovs_num++;
> > >      p->next_packet_size = out_size;
> > > -    p->flags |= MULTIFD_FLAG_ZLIB;
> > >
> > > +out:
> > > +    p->flags |= MULTIFD_FLAG_ZLIB;
> > >      multifd_send_fill_packet(p);
> > > -
> > >      return 0;
> > >  }
> > >
> > > @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> > >                     p->id, flags, MULTIFD_FLAG_ZLIB);
> > >          return -1;
> > >      }
> > > +
> > > +    multifd_zero_page_check_recv(p);
> > > +
> > > +    if (!p->normal_num) {
> > > +        assert(in_size == 0);
> > > +        return 0;
> > > +    }
> > > +
> > >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> > >
> > >      if (ret != 0) {
> > > @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> > >                     p->id, out_size, expected_size);
> > >          return -1;
> > >      }
> > > +
> > >      return 0;
> > >  }
> > >
> > > diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> > > index dc8fe43e94..27a1eba075 100644
> > > --- a/migration/multifd-zstd.c
> > > +++ b/migration/multifd-zstd.c
> > > @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> > >      int ret;
> > >      uint32_t i;
> > >
> > > +    multifd_zero_page_check_send(p);
> > > +
> > > +    if (!pages->normal_num) {
> > > +        p->next_packet_size = 0;
> > > +        goto out;
> > > +    }
> > > +
> > >      multifd_send_prepare_header(p);
>
> If this forms a pattern we can introduce multifd_send_prepare_common():

I will add that in the next version.

>
> bool multifd_send_prepare_common()
> {
>     multifd_zero_page_check_send();
>     if (...) {
>
>     }
>     multifd_send_prepare_header();
> }
>
> > >
> > >      z->out.dst = z->zbuff;
> > >      z->out.size = z->zbuff_len;
> > >      z->out.pos = 0;
> > >
> > > -    for (i = 0; i < pages->num; i++) {
> > > +    for (i = 0; i < pages->normal_num; i++) {
> > >          ZSTD_EndDirective flush = ZSTD_e_continue;
> > >
> > > -        if (i == pages->num - 1) {
> > > +        if (i == pages->normal_num - 1) {
> > >              flush = ZSTD_e_flush;
> > >          }
> > > -        z->in.src = p->pages->block->host + pages->offset[i];
> > > +        z->in.src = p->pages->block->host + pages->normal[i];
> > >          z->in.size = p->page_size;
> > >          z->in.pos = 0;
> > >
> > > @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> > >      p->iov[p->iovs_num].iov_len = z->out.pos;
> > >      p->iovs_num++;
> > >      p->next_packet_size = z->out.pos;
> > > -    p->flags |= MULTIFD_FLAG_ZSTD;
> > >
> > > +out:
> > > +    p->flags |= MULTIFD_FLAG_ZSTD;
> > >      multifd_send_fill_packet(p);
> > > -
> > >      return 0;
> > >  }
> > >
> > > @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
> > >                     p->id, flags, MULTIFD_FLAG_ZSTD);
> > >          return -1;
> > >      }
> > > +
> > > +    multifd_zero_page_check_recv(p);
> > > +
> > > +    if (!p->normal_num) {
> > > +        assert(in_size == 0);
> > > +        return 0;
> > > +    }
> > > +
> > >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> > >
> > >      if (ret != 0) {
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index a33dba40d9..fbb40ea10b 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -11,6 +11,7 @@
> > >   */
> > >
> > >  #include "qemu/osdep.h"
> > > +#include "qemu/cutils.h"
> > >  #include "qemu/rcu.h"
> > >  #include "exec/target_page.h"
> > >  #include "sysemu/sysemu.h"
> > > @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> > >      MultiFDPages_t *pages = p->pages;
> > >      int ret;
> > >
> > > +    multifd_zero_page_check_send(p);
> > > +
> > >      if (!use_zero_copy_send) {
> > >          /*
> > >           * Only !zerocopy needs the header in IOV; zerocopy will
> > > @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> > >          multifd_send_prepare_header(p);
> > >      }
> > >
> > > -    for (int i = 0; i < pages->num; i++) {
> > > -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> > > +    for (int i = 0; i < pages->normal_num; i++) {
> > > +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
> > >          p->iov[p->iovs_num].iov_len = p->page_size;
> > >          p->iovs_num++;
> > >      }
> > >
> > > -    p->next_packet_size = pages->num * p->page_size;
> > > +    p->next_packet_size = pages->normal_num * p->page_size;
> > >      p->flags |= MULTIFD_FLAG_NOCOMP;
> > >
> > >      multifd_send_fill_packet(p);
> > > @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
> > >                     p->id, flags, MULTIFD_FLAG_NOCOMP);
> > >          return -1;
> > >      }
> > > +
> > > +    multifd_zero_page_check_recv(p);
> > > +
> > > +    if (!p->normal_num) {
> > > +        return 0;
> > > +    }
> > > +
> > >      for (int i = 0; i < p->normal_num; i++) {
> > >          p->iov[i].iov_base = p->host + p->normal[i];
> > >          p->iov[i].iov_len = p->page_size;
> > > @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> > >
> > >      packet->flags = cpu_to_be32(p->flags);
> > >      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
> > > -    packet->normal_pages = cpu_to_be32(pages->num);
> > > +    packet->normal_pages = cpu_to_be32(pages->normal_num);
> > >      packet->zero_pages = cpu_to_be32(pages->zero_num);
> > >      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
> > >
> > > @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> > >          strncpy(packet->ramblock, pages->block->idstr, 256);
> > >      }
> > >
> > > -    for (i = 0; i < pages->num; i++) {
> > > +    for (i = 0; i < pages->normal_num; i++) {
> > >          /* there are architectures where ram_addr_t is 32 bit */
> > > -        uint64_t temp = pages->offset[i];
> > > +        uint64_t temp = pages->normal[i];
> > >
> > >          packet->offset[i] = cpu_to_be64(temp);
> > >      }
> > >
> > > +    for (i = 0; i < pages->zero_num; i++) {
> > > +        /* there are architectures where ram_addr_t is 32 bit */
> > > +        uint64_t temp = pages->zero[i];
> > > +
> > > +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
> > > +    }
> > > +
> > >      p->packets_sent++;
> > > -    p->total_normal_pages += pages->num;
> > > +    p->total_normal_pages += pages->normal_num;
> > >      p->total_zero_pages += pages->zero_num;
> > >
> > > -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> > > +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
> > >                         p->flags, p->next_packet_size);
> > >  }
> > >
> > > @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> > >          p->normal[i] = offset;
> > >      }
> > >
> > > +    for (i = 0; i < p->zero_num; i++) {
> > > +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> > > +
> > > +        if (offset > (p->block->used_length - p->page_size)) {
> > > +            error_setg(errp, "multifd: offset too long %" PRIu64
> > > +                       " (max " RAM_ADDR_FMT ")",
> > > +                       offset, p->block->used_length);
> > > +            return -1;
> > > +        }
> > > +        p->zero[i] = offset;
> > > +    }
> > > +
> > >      return 0;
> > >  }
> > >
> > > @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
> > >
> > >              stat64_add(&mig_stats.multifd_bytes,
> > >                         p->next_packet_size + p->packet_len);
> > > -            stat64_add(&mig_stats.normal_pages, pages->num);
> > > +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
> > >              stat64_add(&mig_stats.zero_pages, pages->zero_num);
> > >
> > >              multifd_pages_reset(p->pages);
> > > @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
> > >          p->flags &= ~MULTIFD_FLAG_SYNC;
> > >          qemu_mutex_unlock(&p->mutex);
> > >
> > > -        if (p->normal_num) {
> > > +        if (p->normal_num + p->zero_num) {
> > > +            assert(!(flags & MULTIFD_FLAG_SYNC));
> >
> > This breaks 8.2 -> 9.0 migration. QEMU 8.2 is still sending the SYNC
> > along with the data packet.
> >
> > >              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
> > >              if (ret != 0) {
> > >                  break;
> > > diff --git a/migration/multifd.h b/migration/multifd.h
> > > index 9822ff298a..125f0bbe60 100644
> > > --- a/migration/multifd.h
> > > +++ b/migration/multifd.h
> > > @@ -53,6 +53,11 @@ typedef struct {
> > >      uint32_t unused32[1];    /* Reserved for future use */
> > >      uint64_t unused64[3];    /* Reserved for future use */
> > >      char ramblock[256];
> > > +    /*
> > > +     * This array contains the pointers to:
> > > +     *  - normal pages (initial normal_pages entries)
> > > +     *  - zero pages (following zero_pages entries)
> > > +     */
> > >      uint64_t offset[];
> > >  } __attribute__((packed)) MultiFDPacket_t;
> > >
> > > @@ -224,6 +229,8 @@ typedef struct {
> > >
> > >  void multifd_register_ops(int method, MultiFDMethods *ops);
> > >  void multifd_send_fill_packet(MultiFDSendParams *p);
> > > +void multifd_zero_page_check_send(MultiFDSendParams *p);
> > > +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
> > >
> > >  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
> > >  {
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 99843a8e95..e2450b92d4 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -660,9 +660,11 @@
> > >  #
> > >  # @none: Do not perform zero page checking.
> > >  #
> > > +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
> > > +#
> > >  ##
> > >  { 'enum': 'ZeroPageDetection',
> > > -  'data': [ 'legacy', 'none' ] }
> > > +  'data': [ 'legacy', 'none', 'multifd' ] }
> > >
> > >  ##
> > >  # @BitmapMigrationBitmapAliasTransform:
> >
>
> --
> Peter Xu
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-21 21:04   ` Fabiano Rosas
  2024-02-23  2:20     ` Peter Xu
@ 2024-02-23  5:18     ` Hao Xiang
  2024-02-23 14:47       ` Fabiano Rosas
  1 sibling, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  5:18 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: pbonzini, berrange, eduardo, peterx, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 1:04 PM Fabiano Rosas <farosas@suse.de> wrote:
>
> Hao Xiang <hao.xiang@bytedance.com> writes:
>
> > 1. Implements the zero page detection and handling on the multifd
> > threads for non-compression, zlib and zstd compression backends.
> > 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> > 3. Add proper asserts to ensure pages->normal are used for normal pages
> > in all scenarios.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > ---
> >  migration/meson.build         |  1 +
> >  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
> >  migration/multifd-zlib.c      | 26 ++++++++++++---
> >  migration/multifd-zstd.c      | 25 ++++++++++++---
> >  migration/multifd.c           | 50 +++++++++++++++++++++++------
> >  migration/multifd.h           |  7 +++++
> >  qapi/migration.json           |  4 ++-
> >  7 files changed, 151 insertions(+), 21 deletions(-)
> >  create mode 100644 migration/multifd-zero-page.c
> >
> > diff --git a/migration/meson.build b/migration/meson.build
> > index 92b1cc4297..1eeb915ff6 100644
> > --- a/migration/meson.build
> > +++ b/migration/meson.build
> > @@ -22,6 +22,7 @@ system_ss.add(files(
> >    'migration.c',
> >    'multifd.c',
> >    'multifd-zlib.c',
> > +  'multifd-zero-page.c',
> >    'ram-compress.c',
> >    'options.c',
> >    'postcopy-ram.c',
> > diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> > new file mode 100644
> > index 0000000000..f0cd8e2c53
> > --- /dev/null
> > +++ b/migration/multifd-zero-page.c
> > @@ -0,0 +1,59 @@
> > +/*
> > + * Multifd zero page detection implementation.
> > + *
> > + * Copyright (c) 2024 Bytedance Inc
> > + *
> > + * Authors:
> > + *  Hao Xiang <hao.xiang@bytedance.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > + * See the COPYING file in the top-level directory.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qemu/cutils.h"
> > +#include "exec/ramblock.h"
> > +#include "migration.h"
> > +#include "multifd.h"
> > +#include "options.h"
> > +#include "ram.h"
> > +
> > +void multifd_zero_page_check_send(MultiFDSendParams *p)
> > +{
> > +    /*
> > +     * QEMU versions older than 9.0 don't understand zero pages
> > +     * on the multifd channel. This switch is required to
> > +     * maintain backward compatibility.
> > +     */
> > +    bool use_multifd_zero_page =
> > +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
> > +    MultiFDPages_t *pages = p->pages;
> > +    RAMBlock *rb = pages->block;
> > +
> > +    assert(pages->num != 0);
> > +    assert(pages->normal_num == 0);
> > +    assert(pages->zero_num == 0);
>
> We can drop these before the final version.

Elena has the same concern. I will drop these.

>
> > +
> > +    for (int i = 0; i < pages->num; i++) {
> > +        uint64_t offset = pages->offset[i];
> > +        if (use_multifd_zero_page &&
> > +            buffer_is_zero(rb->host + offset, p->page_size)) {
> > +            pages->zero[pages->zero_num] = offset;
> > +            pages->zero_num++;
> > +            ram_release_page(rb->idstr, offset);
> > +        } else {
> > +            pages->normal[pages->normal_num] = offset;
> > +            pages->normal_num++;
> > +        }
> > +    }
>
> I don't think it's super clean to have three arrays offset, zero and
> normal, all sized for the full packet size. It might be possible to just
> carry a bitmap of non-zero pages along with pages->offset and operate on
> that instead.
>
> What do you think?
>
> Peter, any ideas? Should we just leave this for another time?
>
> > +}
> > +
> > +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> > +{
> > +    for (int i = 0; i < p->zero_num; i++) {
> > +        void *page = p->host + p->zero[i];
> > +        if (!buffer_is_zero(page, p->page_size)) {
> > +            memset(page, 0, p->page_size);
> > +        }
> > +    }
> > +}
> > diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> > index 012e3bdea1..cdfe0fa70e 100644
> > --- a/migration/multifd-zlib.c
> > +++ b/migration/multifd-zlib.c
> > @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> >      int ret;
> >      uint32_t i;
> >
> > +    multifd_zero_page_check_send(p);
> > +
> > +    if (!pages->normal_num) {
> > +        p->next_packet_size = 0;
> > +        goto out;
> > +    }
> > +
> >      multifd_send_prepare_header(p);
> >
> > -    for (i = 0; i < pages->num; i++) {
> > +    for (i = 0; i < pages->normal_num; i++) {
> >          uint32_t available = z->zbuff_len - out_size;
> >          int flush = Z_NO_FLUSH;
> >
> > -        if (i == pages->num - 1) {
> > +        if (i == pages->normal_num - 1) {
> >              flush = Z_SYNC_FLUSH;
> >          }
> >
> > @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> >           * with compression. zlib does not guarantee that this is safe,
> >           * therefore copy the page before calling deflate().
> >           */
> > -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
> > +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
> >          zs->avail_in = p->page_size;
> >          zs->next_in = z->buf;
> >
> > @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> >      p->iov[p->iovs_num].iov_len = out_size;
> >      p->iovs_num++;
> >      p->next_packet_size = out_size;
> > -    p->flags |= MULTIFD_FLAG_ZLIB;
> >
> > +out:
> > +    p->flags |= MULTIFD_FLAG_ZLIB;
> >      multifd_send_fill_packet(p);
> > -
> >      return 0;
> >  }
> >
> > @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, flags, MULTIFD_FLAG_ZLIB);
> >          return -1;
> >      }
> > +
> > +    multifd_zero_page_check_recv(p);
> > +
> > +    if (!p->normal_num) {
> > +        assert(in_size == 0);
> > +        return 0;
> > +    }
> > +
> >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> >
> >      if (ret != 0) {
> > @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, out_size, expected_size);
> >          return -1;
> >      }
> > +
> >      return 0;
> >  }
> >
> > diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> > index dc8fe43e94..27a1eba075 100644
> > --- a/migration/multifd-zstd.c
> > +++ b/migration/multifd-zstd.c
> > @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> >      int ret;
> >      uint32_t i;
> >
> > +    multifd_zero_page_check_send(p);
> > +
> > +    if (!pages->normal_num) {
> > +        p->next_packet_size = 0;
> > +        goto out;
> > +    }
> > +
> >      multifd_send_prepare_header(p);
> >
> >      z->out.dst = z->zbuff;
> >      z->out.size = z->zbuff_len;
> >      z->out.pos = 0;
> >
> > -    for (i = 0; i < pages->num; i++) {
> > +    for (i = 0; i < pages->normal_num; i++) {
> >          ZSTD_EndDirective flush = ZSTD_e_continue;
> >
> > -        if (i == pages->num - 1) {
> > +        if (i == pages->normal_num - 1) {
> >              flush = ZSTD_e_flush;
> >          }
> > -        z->in.src = p->pages->block->host + pages->offset[i];
> > +        z->in.src = p->pages->block->host + pages->normal[i];
> >          z->in.size = p->page_size;
> >          z->in.pos = 0;
> >
> > @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> >      p->iov[p->iovs_num].iov_len = z->out.pos;
> >      p->iovs_num++;
> >      p->next_packet_size = z->out.pos;
> > -    p->flags |= MULTIFD_FLAG_ZSTD;
> >
> > +out:
> > +    p->flags |= MULTIFD_FLAG_ZSTD;
> >      multifd_send_fill_packet(p);
> > -
> >      return 0;
> >  }
> >
> > @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, flags, MULTIFD_FLAG_ZSTD);
> >          return -1;
> >      }
> > +
> > +    multifd_zero_page_check_recv(p);
> > +
> > +    if (!p->normal_num) {
> > +        assert(in_size == 0);
> > +        return 0;
> > +    }
> > +
> >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> >
> >      if (ret != 0) {
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index a33dba40d9..fbb40ea10b 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -11,6 +11,7 @@
> >   */
> >
> >  #include "qemu/osdep.h"
> > +#include "qemu/cutils.h"
> >  #include "qemu/rcu.h"
> >  #include "exec/target_page.h"
> >  #include "sysemu/sysemu.h"
> > @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> >      MultiFDPages_t *pages = p->pages;
> >      int ret;
> >
> > +    multifd_zero_page_check_send(p);
> > +
> >      if (!use_zero_copy_send) {
> >          /*
> >           * Only !zerocopy needs the header in IOV; zerocopy will
> > @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> >          multifd_send_prepare_header(p);
> >      }
> >
> > -    for (int i = 0; i < pages->num; i++) {
> > -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> > +    for (int i = 0; i < pages->normal_num; i++) {
> > +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
> >          p->iov[p->iovs_num].iov_len = p->page_size;
> >          p->iovs_num++;
> >      }
> >
> > -    p->next_packet_size = pages->num * p->page_size;
> > +    p->next_packet_size = pages->normal_num * p->page_size;
> >      p->flags |= MULTIFD_FLAG_NOCOMP;
> >
> >      multifd_send_fill_packet(p);
> > @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
> >                     p->id, flags, MULTIFD_FLAG_NOCOMP);
> >          return -1;
> >      }
> > +
> > +    multifd_zero_page_check_recv(p);
> > +
> > +    if (!p->normal_num) {
> > +        return 0;
> > +    }
> > +
> >      for (int i = 0; i < p->normal_num; i++) {
> >          p->iov[i].iov_base = p->host + p->normal[i];
> >          p->iov[i].iov_len = p->page_size;
> > @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> >
> >      packet->flags = cpu_to_be32(p->flags);
> >      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
> > -    packet->normal_pages = cpu_to_be32(pages->num);
> > +    packet->normal_pages = cpu_to_be32(pages->normal_num);
> >      packet->zero_pages = cpu_to_be32(pages->zero_num);
> >      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
> >
> > @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> >          strncpy(packet->ramblock, pages->block->idstr, 256);
> >      }
> >
> > -    for (i = 0; i < pages->num; i++) {
> > +    for (i = 0; i < pages->normal_num; i++) {
> >          /* there are architectures where ram_addr_t is 32 bit */
> > -        uint64_t temp = pages->offset[i];
> > +        uint64_t temp = pages->normal[i];
> >
> >          packet->offset[i] = cpu_to_be64(temp);
> >      }
> >
> > +    for (i = 0; i < pages->zero_num; i++) {
> > +        /* there are architectures where ram_addr_t is 32 bit */
> > +        uint64_t temp = pages->zero[i];
> > +
> > +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
> > +    }
> > +
> >      p->packets_sent++;
> > -    p->total_normal_pages += pages->num;
> > +    p->total_normal_pages += pages->normal_num;
> >      p->total_zero_pages += pages->zero_num;
> >
> > -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> > +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
> >                         p->flags, p->next_packet_size);
> >  }
> >
> > @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> >          p->normal[i] = offset;
> >      }
> >
> > +    for (i = 0; i < p->zero_num; i++) {
> > +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> > +
> > +        if (offset > (p->block->used_length - p->page_size)) {
> > +            error_setg(errp, "multifd: offset too long %" PRIu64
> > +                       " (max " RAM_ADDR_FMT ")",
> > +                       offset, p->block->used_length);
> > +            return -1;
> > +        }
> > +        p->zero[i] = offset;
> > +    }
> > +
> >      return 0;
> >  }
> >
> > @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
> >
> >              stat64_add(&mig_stats.multifd_bytes,
> >                         p->next_packet_size + p->packet_len);
> > -            stat64_add(&mig_stats.normal_pages, pages->num);
> > +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
> >              stat64_add(&mig_stats.zero_pages, pages->zero_num);
> >
> >              multifd_pages_reset(p->pages);
> > @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
> >          p->flags &= ~MULTIFD_FLAG_SYNC;
> >          qemu_mutex_unlock(&p->mutex);
> >
> > -        if (p->normal_num) {
> > +        if (p->normal_num + p->zero_num) {
> > +            assert(!(flags & MULTIFD_FLAG_SYNC));
>
> This breaks 8.2 -> 9.0 migration. QEMU 8.2 is still sending the SYNC
> along with the data packet.

I keep missing the compatibility thing. Will remove this.

>
> >              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
> >              if (ret != 0) {
> >                  break;
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 9822ff298a..125f0bbe60 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -53,6 +53,11 @@ typedef struct {
> >      uint32_t unused32[1];    /* Reserved for future use */
> >      uint64_t unused64[3];    /* Reserved for future use */
> >      char ramblock[256];
> > +    /*
> > +     * This array contains the pointers to:
> > +     *  - normal pages (initial normal_pages entries)
> > +     *  - zero pages (following zero_pages entries)
> > +     */
> >      uint64_t offset[];
> >  } __attribute__((packed)) MultiFDPacket_t;
> >
> > @@ -224,6 +229,8 @@ typedef struct {
> >
> >  void multifd_register_ops(int method, MultiFDMethods *ops);
> >  void multifd_send_fill_packet(MultiFDSendParams *p);
> > +void multifd_zero_page_check_send(MultiFDSendParams *p);
> > +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
> >
> >  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
> >  {
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 99843a8e95..e2450b92d4 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -660,9 +660,11 @@
> >  #
> >  # @none: Do not perform zero page checking.
> >  #
> > +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
> > +#
> >  ##
> >  { 'enum': 'ZeroPageDetection',
> > -  'data': [ 'legacy', 'none' ] }
> > +  'data': [ 'legacy', 'none', 'multifd' ] }
> >
> >  ##
> >  # @BitmapMigrationBitmapAliasTransform:


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-21 16:11   ` Elena Ufimtseva
@ 2024-02-23  5:24     ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  5:24 UTC (permalink / raw)
  To: Elena Ufimtseva
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 8:11 AM Elena Ufimtseva <ufimtseva@gmail.com> wrote:
>
>
>
> On Fri, Feb 16, 2024 at 2:42 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>>
>> This change adds a dedicated handler for MigrationOps::ram_save_target_page in
>> multifd live migration. Now zero page checking can be done in the multifd threads
>> and this becomes the default configuration. We still provide backward compatibility
>> where zero page checking is done from the migration main thread.
>>
>> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>> ---
>>  migration/multifd.c |  1 +
>>  migration/options.c |  2 +-
>>  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
>>  3 files changed, 42 insertions(+), 14 deletions(-)
>>
>> diff --git a/migration/multifd.c b/migration/multifd.c
>> index fbb40ea10b..ef5dad1019 100644
>> --- a/migration/multifd.c
>> +++ b/migration/multifd.c
>> @@ -13,6 +13,7 @@
>>  #include "qemu/osdep.h"
>>  #include "qemu/cutils.h"
>>  #include "qemu/rcu.h"
>> +#include "qemu/cutils.h"
>>  #include "exec/target_page.h"
>>  #include "sysemu/sysemu.h"
>>  #include "exec/ramblock.h"
>> diff --git a/migration/options.c b/migration/options.c
>> index 3c603391b0..3c79b6ccd4 100644
>> --- a/migration/options.c
>> +++ b/migration/options.c
>> @@ -181,7 +181,7 @@ Property migration_properties[] = {
>>                        MIG_MODE_NORMAL),
>>      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
>>                         parameters.zero_page_detection,
>> -                       ZERO_PAGE_DETECTION_LEGACY),
>> +                       ZERO_PAGE_DETECTION_MULTIFD),
>>
>>      /* Migration capabilities */
>>      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 5ece9f042e..b088c5a98c 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
>>      QEMUFile *file = pss->pss_channel;
>>      int len = 0;
>>
>> -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
>> -        return 0;
>> -    }
>> -
>>      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>          return 0;
>>      }
>> @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>>
>>  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
>>  {
>> +    assert(migrate_multifd());
>
> We only call ram_save_multifd_page() if:
>  if (migrate_multifd()) {
>         migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> So this assert is not needed.

The point of an assert is to ensure the current function is called
with the correct assumptions. If someone later moves this function to a
different place, the asserts will catch the broken assumption.

>
>> +    assert(!migrate_compress());
>>
>> +    assert(!migration_in_postcopy());
>
> These two are redundant and done before we call in here.
>
>> +
>>      if (!multifd_queue_page(block, offset)) {
>>          return -1;
>>      }
>> @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
>>   */
>>  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>>  {
>> -    RAMBlock *block = pss->block;
>>      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>>      int res;
>>
>> @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>>          return 1;
>>      }
>>
>> +    return ram_save_page(rs, pss);
>> +}
>> +
>> +/**
>> + * ram_save_target_page_multifd: save one target page
>> + *
>> + * Returns the number of pages written
>> + *
>> + * @rs: current RAM state
>> + * @pss: data about the page we want to send
>> + */
>> +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
>> +{
>> +    RAMBlock *block = pss->block;
>> +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>> +
>> +    /* Multifd is not compatible with old compression. */
>> +    assert(!migrate_compress());
>
> Do we need to check this for every page?
>
>>
>> +    /* Multifd is not compatible with postcopy. */
>> +    assert(!migration_in_postcopy());
>> +
>>      /*
>> -     * Do not use multifd in postcopy as one whole host page should be
>> -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
>> -     * if host page size == guest page size the dest guest during run may
>> -     * still see partially copied pages which is data corruption.
>> +     * Backward compatibility support. While using multifd live
>> +     * migration, we still need to handle zero page checking on the
>> +     * migration main thread.
>>       */
>> -    if (migrate_multifd() && !migration_in_postcopy()) {
>> -        return ram_save_multifd_page(block, offset);
>> +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
>> +        if (save_zero_page(rs, pss, offset)) {
>> +            return 1;
>> +        }
>>      }
>>
>> -    return ram_save_page(rs, pss);
>> +    return ram_save_multifd_page(block, offset);
>>  }
>>
>>  /* Should be called before sending a host page */
>> @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>>      }
>>
>>      migration_ops = g_malloc0(sizeof(MigrationOps));
>> -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
>> +
>> +    if (migrate_multifd()) {
>> +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
>> +    } else {
>> +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
>> +    }
>>
>>      bql_unlock();
>>      ret = multifd_send_sync_main();
>> --
>> 2.30.2
>>
>>
>
>
> --
> Elena


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-21 21:06   ` Fabiano Rosas
  2024-02-23  2:33     ` Peter Xu
@ 2024-02-23  5:47     ` Hao Xiang
  2024-02-23 14:38       ` Fabiano Rosas
  1 sibling, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  5:47 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: pbonzini, berrange, eduardo, peterx, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Wed, Feb 21, 2024 at 1:06 PM Fabiano Rosas <farosas@suse.de> wrote:
>
> Hao Xiang <hao.xiang@bytedance.com> writes:
>
> > This change adds a dedicated handler for MigrationOps::ram_save_target_page in
>
> nit: Add a dedicated handler...
>
> Usually "this patch/change" is used only when necessary to avoid
> ambiguity.

Will do.

>
> > multifd live migration. Now zero page checking can be done in the multifd threads
> > and this becomes the default configuration. We still provide backward compatibility
> > where zero page checking is done from the migration main thread.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > ---
> >  migration/multifd.c |  1 +
> >  migration/options.c |  2 +-
> >  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
> >  3 files changed, 42 insertions(+), 14 deletions(-)
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index fbb40ea10b..ef5dad1019 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -13,6 +13,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qemu/cutils.h"
>
> This include...
>
> >  #include "qemu/rcu.h"
> > +#include "qemu/cutils.h"
>
> is there already.
>
> >  #include "exec/target_page.h"
> >  #include "sysemu/sysemu.h"
> >  #include "exec/ramblock.h"
> > diff --git a/migration/options.c b/migration/options.c
> > index 3c603391b0..3c79b6ccd4 100644
> > --- a/migration/options.c
> > +++ b/migration/options.c
> > @@ -181,7 +181,7 @@ Property migration_properties[] = {
> >                        MIG_MODE_NORMAL),
> >      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
> >                         parameters.zero_page_detection,
> > -                       ZERO_PAGE_DETECTION_LEGACY),
> > +                       ZERO_PAGE_DETECTION_MULTIFD),
>
> I think we'll need something to avoid a 9.0 -> 8.2 migration with this
> enabled. Otherwise it will go along happily until we get data corruption
> because the new QEMU didn't send any zero pages on the migration thread
> and the old QEMU did not look for them in the multifd packet.
>
> Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
> in use. We'd just need to fix the test in the new QEMU to check
> (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).
>
> >
> >      /* Migration capabilities */
> >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 5ece9f042e..b088c5a98c 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
> >      QEMUFile *file = pss->pss_channel;
> >      int len = 0;
> >
> > -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> > -        return 0;
> > -    }
>
> How does 'none' work now?

I tested it and all pages are transferred with payload (including the
zero pages).

>
> > -
> >      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> >          return 0;
> >      }
> > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> >
> >  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
> >  {
> > +    assert(migrate_multifd());
> > +    assert(!migrate_compress());
> > +    assert(!migration_in_postcopy());
>
> Drop these, please. Keep only the asserts that are likely to trigger
> during development, such as the existing ones at multifd_send_pages.

I think I have gotten enough feedback about too many asserts. I will
drop these. assert() is not compiled into release builds, correct?

>
> > +
> >      if (!multifd_queue_page(block, offset)) {
> >          return -1;
> >      }
> > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
> >   */
> >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> >  {
> > -    RAMBlock *block = pss->block;
> >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> >      int res;
> >
> > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> >          return 1;
> >      }
> >
> > +    return ram_save_page(rs, pss);
>
> Look at where git put this! Are you using the default diff algorithm? If
> not try using --patience to see if it improves the diff.

I used the default diff algorithm.

>
> > +}
> > +
> > +/**
> > + * ram_save_target_page_multifd: save one target page
> > + *
> > + * Returns the number of pages written
>
> We could be more precise here:
>
>  ram_save_target_page_multifd: send one target page to multifd workers
>
>  Returns 1 if the page was queued, -1 otherwise.

Will do.

>
> > + *
> > + * @rs: current RAM state
> > + * @pss: data about the page we want to send
> > + */
> > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> > +{
> > +    RAMBlock *block = pss->block;
> > +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > +
> > +    /* Multifd is not compatible with old compression. */
> > +    assert(!migrate_compress());
>
> This should already be enforced at options.c.
>
> > +
> > +    /* Multifd is not compatible with postcopy. */
> > +    assert(!migration_in_postcopy());
>
> Same here.
>
> > +
> >      /*
> > -     * Do not use multifd in postcopy as one whole host page should be
> > -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> > -     * if host page size == guest page size the dest guest during run may
> > -     * still see partially copied pages which is data corruption.
> > +     * Backward compatibility support. While using multifd live
> > +     * migration, we still need to handle zero page checking on the
> > +     * migration main thread.
> >       */
> > -    if (migrate_multifd() && !migration_in_postcopy()) {
> > -        return ram_save_multifd_page(block, offset);
> > +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > +        if (save_zero_page(rs, pss, offset)) {
> > +            return 1;
> > +        }
> >      }
> >
> > -    return ram_save_page(rs, pss);
> > +    return ram_save_multifd_page(block, offset);
> >  }
> >
> >  /* Should be called before sending a host page */
> > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >      }
> >
> >      migration_ops = g_malloc0(sizeof(MigrationOps));
> > -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > +
> > +    if (migrate_multifd()) {
> > +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> > +    } else {
> > +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > +    }
> >
> >      bql_unlock();
> >      ret = multifd_send_sync_main();



* Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-23  2:33     ` Peter Xu
@ 2024-02-23  6:02       ` Hao Xiang
  2024-02-24 23:03         ` Hao Xiang
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-23  6:02 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Thu, Feb 22, 2024 at 6:33 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Feb 21, 2024 at 06:06:19PM -0300, Fabiano Rosas wrote:
> > Hao Xiang <hao.xiang@bytedance.com> writes:
> >
> > > This change adds a dedicated handler for MigrationOps::ram_save_target_page in
> >
> > nit: Add a dedicated handler...
> >
> > Usually "this patch/change" is used only when necessary to avoid
> > ambiguity.
> >
> > > multifd live migration. Now zero page checking can be done in the multifd threads
> > > and this becomes the default configuration. We still provide backward compatibility
> > > where zero page checking is done from the migration main thread.
> > >
> > > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > > ---
> > >  migration/multifd.c |  1 +
> > >  migration/options.c |  2 +-
> > >  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
> > >  3 files changed, 42 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index fbb40ea10b..ef5dad1019 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -13,6 +13,7 @@
> > >  #include "qemu/osdep.h"
> > >  #include "qemu/cutils.h"
> >
> > This include...
> >
> > >  #include "qemu/rcu.h"
> > > +#include "qemu/cutils.h"
> >
> > is there already.
> >
> > >  #include "exec/target_page.h"
> > >  #include "sysemu/sysemu.h"
> > >  #include "exec/ramblock.h"
> > > diff --git a/migration/options.c b/migration/options.c
> > > index 3c603391b0..3c79b6ccd4 100644
> > > --- a/migration/options.c
> > > +++ b/migration/options.c
> > > @@ -181,7 +181,7 @@ Property migration_properties[] = {
> > >                        MIG_MODE_NORMAL),
> > >      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
> > >                         parameters.zero_page_detection,
> > > -                       ZERO_PAGE_DETECTION_LEGACY),
> > > +                       ZERO_PAGE_DETECTION_MULTIFD),
> >
> > I think we'll need something to avoid a 9.0 -> 8.2 migration with this
> > enabled. Otherwise it will go along happily until we get data corruption
> > because the new QEMU didn't send any zero pages on the migration thread
> > and the old QEMU did not look for them in the multifd packet.
>
> It could be even worse, as the new QEMU will only attach "normal" pages
> after the multifd packet, the old QEMU could read more than it could,
> expecting all pages..
>
> >
> > Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
> > in use. We'd just need to fix the test in the new QEMU to check
> > (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).
>
> IMHO we don't need yet to change MULTIFD_VERSION, what we need is perhaps a
> compat entry in hw_compat_8_2 setting "zero-page-detection" to "legacy".
> We should make sure when "legacy" is set, multifd ran the old protocol
> (zero_num will always be 0, and will be ignored by old QEMUs, IIUC).
>
> One more comment is, when repost please consider split this patch into two;
> The new ram_save_target_page_multifd() hook can be done in another patch,
> AFAIU.

Sorry, I kept missing this. I will keep telling myself: compatibility
is king. I will add the hw_compat_8_2 entry and make sure to test that
a 9.0 -> 8.2 migration fails cleanly with the "multifd" option set.
Will split the patches.
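For what it's worth, the compat entry Peter describes would presumably be a one-line addition to the hw_compat_8_2 array in hw/core/machine.c, along these lines (a hypothetical fragment, assuming the "zero-page-detection" parameter name this series introduces):

```c
GlobalProperty hw_compat_8_2[] = {
    /* ...existing 8.2 compat properties... */
    /* Keep pre-9.0 machine types on main-thread zero page detection so
     * an 8.2 destination never sees zero pages in a multifd packet. */
    { "migration", "zero-page-detection", "legacy" },
};
```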

>
> >
> > >
> > >      /* Migration capabilities */
> > >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5ece9f042e..b088c5a98c 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
> > >      QEMUFile *file = pss->pss_channel;
> > >      int len = 0;
> > >
> > > -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> > > -        return 0;
> > > -    }
> >
> > How does 'none' work now?
> >
> > > -
> > >      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> > >          return 0;
> > >      }
> > > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> > >
> > >  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
> > >  {
> > > +    assert(migrate_multifd());
> > > +    assert(!migrate_compress());
> > > +    assert(!migration_in_postcopy());
> >
> > Drop these, please. Keep only the asserts that are likely to trigger
> > during development, such as the existing ones at multifd_send_pages.
> >
> > > +
> > >      if (!multifd_queue_page(block, offset)) {
> > >          return -1;
> > >      }
> > > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
> > >   */
> > >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> > >  {
> > > -    RAMBlock *block = pss->block;
> > >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > >      int res;
> > >
> > > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> > >          return 1;
> > >      }
> > >
> > > +    return ram_save_page(rs, pss);
> >
> > Look at where git put this! Are you using the default diff algorithm? If
> > not try using --patience to see if it improves the diff.
> >
> > > +}
> > > +
> > > +/**
> > > + * ram_save_target_page_multifd: save one target page
> > > + *
> > > + * Returns the number of pages written
> >
> > We could be more precise here:
> >
> >  ram_save_target_page_multifd: send one target page to multifd workers
> >
> >  Returns 1 if the page was queued, -1 otherwise.
> >
> > > + *
> > > + * @rs: current RAM state
> > > + * @pss: data about the page we want to send
> > > + */
> > > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> > > +{
> > > +    RAMBlock *block = pss->block;
> > > +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > > +
> > > +    /* Multifd is not compatible with old compression. */
> > > +    assert(!migrate_compress());
> >
> > This should already be enforced at options.c.
> >
> > > +
> > > +    /* Multifd is not compatible with postcopy. */
> > > +    assert(!migration_in_postcopy());
> >
> > Same here.
> >
> > > +
> > >      /*
> > > -     * Do not use multifd in postcopy as one whole host page should be
> > > -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> > > -     * if host page size == guest page size the dest guest during run may
> > > -     * still see partially copied pages which is data corruption.
> > > +     * Backward compatibility support. While using multifd live
> > > +     * migration, we still need to handle zero page checking on the
> > > +     * migration main thread.
> > >       */
> > > -    if (migrate_multifd() && !migration_in_postcopy()) {
> > > -        return ram_save_multifd_page(block, offset);
> > > +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > > +        if (save_zero_page(rs, pss, offset)) {
> > > +            return 1;
> > > +        }
> > >      }
> > >
> > > -    return ram_save_page(rs, pss);
> > > +    return ram_save_multifd_page(block, offset);
> > >  }
> > >
> > >  /* Should be called before sending a host page */
> > > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> > >      }
> > >
> > >      migration_ops = g_malloc0(sizeof(MigrationOps));
> > > -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > > +
> > > +    if (migrate_multifd()) {
> > > +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> > > +    } else {
> > > +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > > +    }
> > >
> > >      bql_unlock();
> > >      ret = multifd_send_sync_main();
> >
>
> --
> Peter Xu
>



* Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-23  5:47     ` Hao Xiang
@ 2024-02-23 14:38       ` Fabiano Rosas
  0 siblings, 0 replies; 42+ messages in thread
From: Fabiano Rosas @ 2024-02-23 14:38 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

Hao Xiang <hao.xiang@bytedance.com> writes:

> On Wed, Feb 21, 2024 at 1:06 PM Fabiano Rosas <farosas@suse.de> wrote:
>>
>> Hao Xiang <hao.xiang@bytedance.com> writes:
>>
>> > This change adds a dedicated handler for MigrationOps::ram_save_target_page in
>>
>> nit: Add a dedicated handler...
>>
>> Usually "this patch/change" is used only when necessary to avoid
>> ambiguity.
>
> Will do.
>
>>
>> > multifd live migration. Now zero page checking can be done in the multifd threads
>> > and this becomes the default configuration. We still provide backward compatibility
>> > where zero page checking is done from the migration main thread.
>> >
>> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>> > ---
>> >  migration/multifd.c |  1 +
>> >  migration/options.c |  2 +-
>> >  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
>> >  3 files changed, 42 insertions(+), 14 deletions(-)
>> >
>> > diff --git a/migration/multifd.c b/migration/multifd.c
>> > index fbb40ea10b..ef5dad1019 100644
>> > --- a/migration/multifd.c
>> > +++ b/migration/multifd.c
>> > @@ -13,6 +13,7 @@
>> >  #include "qemu/osdep.h"
>> >  #include "qemu/cutils.h"
>>
>> This include...
>>
>> >  #include "qemu/rcu.h"
>> > +#include "qemu/cutils.h"
>>
>> is there already.
>>
>> >  #include "exec/target_page.h"
>> >  #include "sysemu/sysemu.h"
>> >  #include "exec/ramblock.h"
>> > diff --git a/migration/options.c b/migration/options.c
>> > index 3c603391b0..3c79b6ccd4 100644
>> > --- a/migration/options.c
>> > +++ b/migration/options.c
>> > @@ -181,7 +181,7 @@ Property migration_properties[] = {
>> >                        MIG_MODE_NORMAL),
>> >      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
>> >                         parameters.zero_page_detection,
>> > -                       ZERO_PAGE_DETECTION_LEGACY),
>> > +                       ZERO_PAGE_DETECTION_MULTIFD),
>>
>> I think we'll need something to avoid a 9.0 -> 8.2 migration with this
>> enabled. Otherwise it will go along happily until we get data corruption
>> because the new QEMU didn't send any zero pages on the migration thread
>> and the old QEMU did not look for them in the multifd packet.
>>
>> Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
>> in use. We'd just need to fix the test in the new QEMU to check
>> (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).
>>
>> >
>> >      /* Migration capabilities */
>> >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
>> > diff --git a/migration/ram.c b/migration/ram.c
>> > index 5ece9f042e..b088c5a98c 100644
>> > --- a/migration/ram.c
>> > +++ b/migration/ram.c
>> > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
>> >      QEMUFile *file = pss->pss_channel;
>> >      int len = 0;
>> >
>> > -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
>> > -        return 0;
>> > -    }
>>
>> How does 'none' work now?
>
> I tested it and all pages are transferred with payload (including the
> zero pages).
>
>>
>> > -
>> >      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>> >          return 0;
>> >      }
>> > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
>> >
>> >  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
>> >  {
>> > +    assert(migrate_multifd());
>> > +    assert(!migrate_compress());
>> > +    assert(!migration_in_postcopy());
>>
>> Drop these, please. Keep only the asserts that are likely to trigger
>> during development, such as the existing ones at multifd_send_pages.
>
> I think I have gotten enough feedback about too many asserts. I will
> drop these. assert() is not compiled into release builds, correct?
>

From include/qemu/osdep.h:

  /*
   * We have a lot of unaudited code that may fail in strange ways, or
   * even be a security risk during migration, if you disable assertions
   * at compile-time.  You may comment out these safety checks if you
   * absolutely want to disable assertion overhead, but it is not
   * supported upstream so the risk is all yours.  Meanwhile, please
   * submit patches to remove any side-effects inside an assertion, or
   * fixing error handling that should use Error instead of assert.
   */
  #ifdef NDEBUG
  #error building with NDEBUG is not supported
  #endif
  #ifdef G_DISABLE_ASSERT
  #error building with G_DISABLE_ASSERT is not supported
  #endif

>>
>> > +
>> >      if (!multifd_queue_page(block, offset)) {
>> >          return -1;
>> >      }
>> > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
>> >   */
>> >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>> >  {
>> > -    RAMBlock *block = pss->block;
>> >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>> >      int res;
>> >
>> > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
>> >          return 1;
>> >      }
>> >
>> > +    return ram_save_page(rs, pss);
>>
>> Look at where git put this! Are you using the default diff algorithm? If
>> not try using --patience to see if it improves the diff.
>
> I used the default diff algorithm.
>
>>
>> > +}
>> > +
>> > +/**
>> > + * ram_save_target_page_multifd: save one target page
>> > + *
>> > + * Returns the number of pages written
>>
>> We could be more precise here:
>>
>>  ram_save_target_page_multifd: send one target page to multifd workers
>>
>>  Returns 1 if the page was queued, -1 otherwise.
>
> Will do.
>
>>
>> > + *
>> > + * @rs: current RAM state
>> > + * @pss: data about the page we want to send
>> > + */
>> > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
>> > +{
>> > +    RAMBlock *block = pss->block;
>> > +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
>> > +
>> > +    /* Multifd is not compatible with old compression. */
>> > +    assert(!migrate_compress());
>>
>> This should already be enforced at options.c.
>>
>> > +
>> > +    /* Multifd is not compatible with postcopy. */
>> > +    assert(!migration_in_postcopy());
>>
>> Same here.
>>
>> > +
>> >      /*
>> > -     * Do not use multifd in postcopy as one whole host page should be
>> > -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
>> > -     * if host page size == guest page size the dest guest during run may
>> > -     * still see partially copied pages which is data corruption.
>> > +     * Backward compatibility support. While using multifd live
>> > +     * migration, we still need to handle zero page checking on the
>> > +     * migration main thread.
>> >       */
>> > -    if (migrate_multifd() && !migration_in_postcopy()) {
>> > -        return ram_save_multifd_page(block, offset);
>> > +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
>> > +        if (save_zero_page(rs, pss, offset)) {
>> > +            return 1;
>> > +        }
>> >      }
>> >
>> > -    return ram_save_page(rs, pss);
>> > +    return ram_save_multifd_page(block, offset);
>> >  }
>> >
>> >  /* Should be called before sending a host page */
>> > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>> >      }
>> >
>> >      migration_ops = g_malloc0(sizeof(MigrationOps));
>> > -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
>> > +
>> > +    if (migrate_multifd()) {
>> > +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
>> > +    } else {
>> > +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
>> > +    }
>> >
>> >      bql_unlock();
>> >      ret = multifd_send_sync_main();



* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-23  5:18     ` Hao Xiang
@ 2024-02-23 14:47       ` Fabiano Rosas
  0 siblings, 0 replies; 42+ messages in thread
From: Fabiano Rosas @ 2024-02-23 14:47 UTC (permalink / raw)
  To: Hao Xiang
  Cc: pbonzini, berrange, eduardo, peterx, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

Hao Xiang <hao.xiang@bytedance.com> writes:

> On Wed, Feb 21, 2024 at 1:04 PM Fabiano Rosas <farosas@suse.de> wrote:
>>
>> Hao Xiang <hao.xiang@bytedance.com> writes:
>>
>> > 1. Implements the zero page detection and handling on the multifd
>> > threads for non-compression, zlib and zstd compression backends.
>> > 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
>> > 3. Add proper asserts to ensure pages->normal are used for normal pages
>> > in all scenarios.
>> >
>> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
>> > ---
>> >  migration/meson.build         |  1 +
>> >  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
>> >  migration/multifd-zlib.c      | 26 ++++++++++++---
>> >  migration/multifd-zstd.c      | 25 ++++++++++++---
>> >  migration/multifd.c           | 50 +++++++++++++++++++++++------
>> >  migration/multifd.h           |  7 +++++
>> >  qapi/migration.json           |  4 ++-
>> >  7 files changed, 151 insertions(+), 21 deletions(-)
>> >  create mode 100644 migration/multifd-zero-page.c
>> >
>> > diff --git a/migration/meson.build b/migration/meson.build
>> > index 92b1cc4297..1eeb915ff6 100644
>> > --- a/migration/meson.build
>> > +++ b/migration/meson.build
>> > @@ -22,6 +22,7 @@ system_ss.add(files(
>> >    'migration.c',
>> >    'multifd.c',
>> >    'multifd-zlib.c',
>> > +  'multifd-zero-page.c',
>> >    'ram-compress.c',
>> >    'options.c',
>> >    'postcopy-ram.c',
>> > diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
>> > new file mode 100644
>> > index 0000000000..f0cd8e2c53
>> > --- /dev/null
>> > +++ b/migration/multifd-zero-page.c
>> > @@ -0,0 +1,59 @@
>> > +/*
>> > + * Multifd zero page detection implementation.
>> > + *
>> > + * Copyright (c) 2024 Bytedance Inc
>> > + *
>> > + * Authors:
>> > + *  Hao Xiang <hao.xiang@bytedance.com>
>> > + *
>> > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> > + * See the COPYING file in the top-level directory.
>> > + */
>> > +
>> > +#include "qemu/osdep.h"
>> > +#include "qemu/cutils.h"
>> > +#include "exec/ramblock.h"
>> > +#include "migration.h"
>> > +#include "multifd.h"
>> > +#include "options.h"
>> > +#include "ram.h"
>> > +
>> > +void multifd_zero_page_check_send(MultiFDSendParams *p)
>> > +{
>> > +    /*
>> > +     * QEMU versions older than 9.0 don't understand zero pages
>> > +     * on the multifd channel. This switch is required to
>> > +     * maintain backward compatibility.
>> > +     */
>> > +    bool use_multifd_zero_page =
>> > +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
>> > +    MultiFDPages_t *pages = p->pages;
>> > +    RAMBlock *rb = pages->block;
>> > +
>> > +    assert(pages->num != 0);
>> > +    assert(pages->normal_num == 0);
>> > +    assert(pages->zero_num == 0);
>>
>> We can drop these before the final version.
>
> Elena has the same concern. I will drop these.
>
>>
>> > +
>> > +    for (int i = 0; i < pages->num; i++) {
>> > +        uint64_t offset = pages->offset[i];
>> > +        if (use_multifd_zero_page &&
>> > +            buffer_is_zero(rb->host + offset, p->page_size)) {
>> > +            pages->zero[pages->zero_num] = offset;
>> > +            pages->zero_num++;
>> > +            ram_release_page(rb->idstr, offset);
>> > +        } else {
>> > +            pages->normal[pages->normal_num] = offset;
>> > +            pages->normal_num++;
>> > +        }
>> > +    }
>>
>> I don't think it's super clean to have three arrays offset, zero and
>> normal, all sized for the full packet size. It might be possible to just
>> carry a bitmap of non-zero pages along with pages->offset and operate on
>> that instead.
>>
>> What do you think?
>>
>> Peter, any ideas? Should we just leave this for another time?
>>
>> > +}
>> > +
>> > +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
>> > +{
>> > +    for (int i = 0; i < p->zero_num; i++) {
>> > +        void *page = p->host + p->zero[i];
>> > +        if (!buffer_is_zero(page, p->page_size)) {
>> > +            memset(page, 0, p->page_size);
>> > +        }
>> > +    }
>> > +}
>> > diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
>> > index 012e3bdea1..cdfe0fa70e 100644
>> > --- a/migration/multifd-zlib.c
>> > +++ b/migration/multifd-zlib.c
>> > @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>> >      int ret;
>> >      uint32_t i;
>> >
>> > +    multifd_zero_page_check_send(p);
>> > +
>> > +    if (!pages->normal_num) {
>> > +        p->next_packet_size = 0;
>> > +        goto out;
>> > +    }
>> > +
>> >      multifd_send_prepare_header(p);
>> >
>> > -    for (i = 0; i < pages->num; i++) {
>> > +    for (i = 0; i < pages->normal_num; i++) {
>> >          uint32_t available = z->zbuff_len - out_size;
>> >          int flush = Z_NO_FLUSH;
>> >
>> > -        if (i == pages->num - 1) {
>> > +        if (i == pages->normal_num - 1) {
>> >              flush = Z_SYNC_FLUSH;
>> >          }
>> >
>> > @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>> >           * with compression. zlib does not guarantee that this is safe,
>> >           * therefore copy the page before calling deflate().
>> >           */
>> > -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
>> > +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
>> >          zs->avail_in = p->page_size;
>> >          zs->next_in = z->buf;
>> >
>> > @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
>> >      p->iov[p->iovs_num].iov_len = out_size;
>> >      p->iovs_num++;
>> >      p->next_packet_size = out_size;
>> > -    p->flags |= MULTIFD_FLAG_ZLIB;
>> >
>> > +out:
>> > +    p->flags |= MULTIFD_FLAG_ZLIB;
>> >      multifd_send_fill_packet(p);
>> > -
>> >      return 0;
>> >  }
>> >
>> > @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>> >                     p->id, flags, MULTIFD_FLAG_ZLIB);
>> >          return -1;
>> >      }
>> > +
>> > +    multifd_zero_page_check_recv(p);
>> > +
>> > +    if (!p->normal_num) {
>> > +        assert(in_size == 0);
>> > +        return 0;
>> > +    }
>> > +
>> >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>> >
>> >      if (ret != 0) {
>> > @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
>> >                     p->id, out_size, expected_size);
>> >          return -1;
>> >      }
>> > +
>> >      return 0;
>> >  }
>> >
>> > diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
>> > index dc8fe43e94..27a1eba075 100644
>> > --- a/migration/multifd-zstd.c
>> > +++ b/migration/multifd-zstd.c
>> > @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>> >      int ret;
>> >      uint32_t i;
>> >
>> > +    multifd_zero_page_check_send(p);
>> > +
>> > +    if (!pages->normal_num) {
>> > +        p->next_packet_size = 0;
>> > +        goto out;
>> > +    }
>> > +
>> >      multifd_send_prepare_header(p);
>> >
>> >      z->out.dst = z->zbuff;
>> >      z->out.size = z->zbuff_len;
>> >      z->out.pos = 0;
>> >
>> > -    for (i = 0; i < pages->num; i++) {
>> > +    for (i = 0; i < pages->normal_num; i++) {
>> >          ZSTD_EndDirective flush = ZSTD_e_continue;
>> >
>> > -        if (i == pages->num - 1) {
>> > +        if (i == pages->normal_num - 1) {
>> >              flush = ZSTD_e_flush;
>> >          }
>> > -        z->in.src = p->pages->block->host + pages->offset[i];
>> > +        z->in.src = p->pages->block->host + pages->normal[i];
>> >          z->in.size = p->page_size;
>> >          z->in.pos = 0;
>> >
>> > @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
>> >      p->iov[p->iovs_num].iov_len = z->out.pos;
>> >      p->iovs_num++;
>> >      p->next_packet_size = z->out.pos;
>> > -    p->flags |= MULTIFD_FLAG_ZSTD;
>> >
>> > +out:
>> > +    p->flags |= MULTIFD_FLAG_ZSTD;
>> >      multifd_send_fill_packet(p);
>> > -
>> >      return 0;
>> >  }
>> >
>> > @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
>> >                     p->id, flags, MULTIFD_FLAG_ZSTD);
>> >          return -1;
>> >      }
>> > +
>> > +    multifd_zero_page_check_recv(p);
>> > +
>> > +    if (!p->normal_num) {
>> > +        assert(in_size == 0);
>> > +        return 0;
>> > +    }
>> > +
>> >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>> >
>> >      if (ret != 0) {
>> > diff --git a/migration/multifd.c b/migration/multifd.c
>> > index a33dba40d9..fbb40ea10b 100644
>> > --- a/migration/multifd.c
>> > +++ b/migration/multifd.c
>> > @@ -11,6 +11,7 @@
>> >   */
>> >
>> >  #include "qemu/osdep.h"
>> > +#include "qemu/cutils.h"
>> >  #include "qemu/rcu.h"
>> >  #include "exec/target_page.h"
>> >  #include "sysemu/sysemu.h"
>> > @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>> >      MultiFDPages_t *pages = p->pages;
>> >      int ret;
>> >
>> > +    multifd_zero_page_check_send(p);
>> > +
>> >      if (!use_zero_copy_send) {
>> >          /*
>> >           * Only !zerocopy needs the header in IOV; zerocopy will
>> > @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
>> >          multifd_send_prepare_header(p);
>> >      }
>> >
>> > -    for (int i = 0; i < pages->num; i++) {
>> > -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
>> > +    for (int i = 0; i < pages->normal_num; i++) {
>> > +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
>> >          p->iov[p->iovs_num].iov_len = p->page_size;
>> >          p->iovs_num++;
>> >      }
>> >
>> > -    p->next_packet_size = pages->num * p->page_size;
>> > +    p->next_packet_size = pages->normal_num * p->page_size;
>> >      p->flags |= MULTIFD_FLAG_NOCOMP;
>> >
>> >      multifd_send_fill_packet(p);
>> > @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
>> >                     p->id, flags, MULTIFD_FLAG_NOCOMP);
>> >          return -1;
>> >      }
>> > +
>> > +    multifd_zero_page_check_recv(p);
>> > +
>> > +    if (!p->normal_num) {
>> > +        return 0;
>> > +    }
>> > +
>> >      for (int i = 0; i < p->normal_num; i++) {
>> >          p->iov[i].iov_base = p->host + p->normal[i];
>> >          p->iov[i].iov_len = p->page_size;
>> > @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>> >
>> >      packet->flags = cpu_to_be32(p->flags);
>> >      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
>> > -    packet->normal_pages = cpu_to_be32(pages->num);
>> > +    packet->normal_pages = cpu_to_be32(pages->normal_num);
>> >      packet->zero_pages = cpu_to_be32(pages->zero_num);
>> >      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>> >
>> > @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>> >          strncpy(packet->ramblock, pages->block->idstr, 256);
>> >      }
>> >
>> > -    for (i = 0; i < pages->num; i++) {
>> > +    for (i = 0; i < pages->normal_num; i++) {
>> >          /* there are architectures where ram_addr_t is 32 bit */
>> > -        uint64_t temp = pages->offset[i];
>> > +        uint64_t temp = pages->normal[i];
>> >
>> >          packet->offset[i] = cpu_to_be64(temp);
>> >      }
>> >
>> > +    for (i = 0; i < pages->zero_num; i++) {
>> > +        /* there are architectures where ram_addr_t is 32 bit */
>> > +        uint64_t temp = pages->zero[i];
>> > +
>> > +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
>> > +    }
>> > +
>> >      p->packets_sent++;
>> > -    p->total_normal_pages += pages->num;
>> > +    p->total_normal_pages += pages->normal_num;
>> >      p->total_zero_pages += pages->zero_num;
>> >
>> > -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
>> > +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
>> >                         p->flags, p->next_packet_size);
>> >  }
>> >
>> > @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
>> >          p->normal[i] = offset;
>> >      }
>> >
>> > +    for (i = 0; i < p->zero_num; i++) {
>> > +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
>> > +
>> > +        if (offset > (p->block->used_length - p->page_size)) {
>> > +            error_setg(errp, "multifd: offset too long %" PRIu64
>> > +                       " (max " RAM_ADDR_FMT ")",
>> > +                       offset, p->block->used_length);
>> > +            return -1;
>> > +        }
>> > +        p->zero[i] = offset;
>> > +    }
>> > +
>> >      return 0;
>> >  }
>> >
>> > @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
>> >
>> >              stat64_add(&mig_stats.multifd_bytes,
>> >                         p->next_packet_size + p->packet_len);
>> > -            stat64_add(&mig_stats.normal_pages, pages->num);
>> > +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
>> >              stat64_add(&mig_stats.zero_pages, pages->zero_num);
>> >
>> >              multifd_pages_reset(p->pages);
>> > @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
>> >          p->flags &= ~MULTIFD_FLAG_SYNC;
>> >          qemu_mutex_unlock(&p->mutex);
>> >
>> > -        if (p->normal_num) {
>> > +        if (p->normal_num + p->zero_num) {
>> > +            assert(!(flags & MULTIFD_FLAG_SYNC));
>>
>> This breaks 8.2 -> 9.0 migration. QEMU 8.2 is still sending the SYNC
>> along with the data packet.
>
> I keep missing the compatibility thing. Will remove this.
>

If you send this through CI it will catch these issues on x86 at least.

You can also keep an 8.2 build around and run the tests on your local
machine like this:

$ git checkout <devel>
$ mkdir build
$ cd build
$ ../configure && make

$ git checkout v8.2.0
$ mkdir ../build-8.2.0
$ cd ../build-8.2.0
$ ../configure && make

# this has to use the tests from the *previous* version
# 8.2 -> devel
$ QTEST_QEMU_BINARY_SRC=./qemu-system-x86_64  \
  QTEST_QEMU_BINARY=../build/qemu-system-x86_64 \
  ./tests/qtest/migration-test

# devel -> 8.2
$ QTEST_QEMU_BINARY_DST=./qemu-system-x86_64  \
  QTEST_QEMU_BINARY=../build/qemu-system-x86_64 \
  ./tests/qtest/migration-test

>>
>> >              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
>> >              if (ret != 0) {
>> >                  break;
>> > diff --git a/migration/multifd.h b/migration/multifd.h
>> > index 9822ff298a..125f0bbe60 100644
>> > --- a/migration/multifd.h
>> > +++ b/migration/multifd.h
>> > @@ -53,6 +53,11 @@ typedef struct {
>> >      uint32_t unused32[1];    /* Reserved for future use */
>> >      uint64_t unused64[3];    /* Reserved for future use */
>> >      char ramblock[256];
>> > +    /*
>> > +     * This array contains the pointers to:
>> > +     *  - normal pages (initial normal_pages entries)
>> > +     *  - zero pages (following zero_pages entries)
>> > +     */
>> >      uint64_t offset[];
>> >  } __attribute__((packed)) MultiFDPacket_t;
>> >
>> > @@ -224,6 +229,8 @@ typedef struct {
>> >
>> >  void multifd_register_ops(int method, MultiFDMethods *ops);
>> >  void multifd_send_fill_packet(MultiFDSendParams *p);
>> > +void multifd_zero_page_check_send(MultiFDSendParams *p);
>> > +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
>> >
>> >  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
>> >  {
>> > diff --git a/qapi/migration.json b/qapi/migration.json
>> > index 99843a8e95..e2450b92d4 100644
>> > --- a/qapi/migration.json
>> > +++ b/qapi/migration.json
>> > @@ -660,9 +660,11 @@
>> >  #
>> >  # @none: Do not perform zero page checking.
>> >  #
>> > +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
>> > +#
>> >  ##
>> >  { 'enum': 'ZeroPageDetection',
>> > -  'data': [ 'legacy', 'none' ] }
>> > +  'data': [ 'legacy', 'none', 'multifd' ] }
>> >
>> >  ##
>> >  # @BitmapMigrationBitmapAliasTransform:


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-23  4:38     ` [External] " Hao Xiang
@ 2024-02-24 19:06       ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-24 19:06 UTC (permalink / raw)
  To: Richard Henderson
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

On Thu, Feb 22, 2024 at 8:38 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>
> On Fri, Feb 16, 2024 at 9:08 PM Richard Henderson
> <richard.henderson@linaro.org> wrote:
> >
> > On 2/16/24 12:39, Hao Xiang wrote:
> > > +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> > > +{
> > > +    for (int i = 0; i < p->zero_num; i++) {
> > > +        void *page = p->host + p->zero[i];
> > > +        if (!buffer_is_zero(page, p->page_size)) {
> > > +            memset(page, 0, p->page_size);
> > > +        }
> > > +    }
> > > +}
> >
> > You should not check the buffer is zero here, you should just zero it.
>
> I will fix it in the next version.

I tested unconditionally zeroing out all pages, but the performance is
worse than before. In my test case, most pages are zero pages. I think
what happened is that the destination host already has those pages
zeroed, so a memcmp-style check is much faster than a memset across all
zero pages.
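The check-before-write idea can be illustrated with a minimal stand-alone sketch (page_is_zero() here is a naive byte scan standing in for QEMU's optimized buffer_is_zero(), and plain buffers stand in for guest pages):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive stand-in for QEMU's optimized buffer_is_zero(). */
static int page_is_zero(const uint8_t *page, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (page[i]) {
            return 0;
        }
    }
    return 1;
}

/*
 * Receive-side handling of one zero page: only write when the page is
 * not already zero.  A read-only scan avoids dirtying (and on a fresh
 * destination, even allocating) pages that are already zero.
 */
static void recv_zero_page(uint8_t *page, size_t page_size)
{
    if (!page_is_zero(page, page_size)) {
        memset(page, 0, page_size);
    }
}
```

On the destination, where most pages start out zero, the read-only path is the common case, which matches the measurement above.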

>
> >
> >
> > r~



* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-23  5:15       ` [External] " Hao Xiang
@ 2024-02-24 22:56         ` Hao Xiang
  2024-02-26  1:30           ` Peter Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-24 22:56 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Thu, Feb 22, 2024 at 9:15 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>
> On Thu, Feb 22, 2024 at 6:21 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Feb 21, 2024 at 06:04:10PM -0300, Fabiano Rosas wrote:
> > > Hao Xiang <hao.xiang@bytedance.com> writes:
> > >
> > > > 1. Implements the zero page detection and handling on the multifd
> > > > threads for non-compression, zlib and zstd compression backends.
> > > > 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> > > > 3. Add proper asserts to ensure pages->normal are used for normal pages
> > > > in all scenarios.
> > > >
> > > > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > > > ---
> > > >  migration/meson.build         |  1 +
> > > >  migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
> > > >  migration/multifd-zlib.c      | 26 ++++++++++++---
> > > >  migration/multifd-zstd.c      | 25 ++++++++++++---
> > > >  migration/multifd.c           | 50 +++++++++++++++++++++++------
> > > >  migration/multifd.h           |  7 +++++
> > > >  qapi/migration.json           |  4 ++-
> > > >  7 files changed, 151 insertions(+), 21 deletions(-)
> > > >  create mode 100644 migration/multifd-zero-page.c
> > > >
> > > > diff --git a/migration/meson.build b/migration/meson.build
> > > > index 92b1cc4297..1eeb915ff6 100644
> > > > --- a/migration/meson.build
> > > > +++ b/migration/meson.build
> > > > @@ -22,6 +22,7 @@ system_ss.add(files(
> > > >    'migration.c',
> > > >    'multifd.c',
> > > >    'multifd-zlib.c',
> > > > +  'multifd-zero-page.c',
> > > >    'ram-compress.c',
> > > >    'options.c',
> > > >    'postcopy-ram.c',
> > > > diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> > > > new file mode 100644
> > > > index 0000000000..f0cd8e2c53
> > > > --- /dev/null
> > > > +++ b/migration/multifd-zero-page.c
> > > > @@ -0,0 +1,59 @@
> > > > +/*
> > > > + * Multifd zero page detection implementation.
> > > > + *
> > > > + * Copyright (c) 2024 Bytedance Inc
> > > > + *
> > > > + * Authors:
> > > > + *  Hao Xiang <hao.xiang@bytedance.com>
> > > > + *
> > > > + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> > > > + * See the COPYING file in the top-level directory.
> > > > + */
> > > > +
> > > > +#include "qemu/osdep.h"
> > > > +#include "qemu/cutils.h"
> > > > +#include "exec/ramblock.h"
> > > > +#include "migration.h"
> > > > +#include "multifd.h"
> > > > +#include "options.h"
> > > > +#include "ram.h"
> > > > +
> > > > +void multifd_zero_page_check_send(MultiFDSendParams *p)
> > > > +{
> > > > +    /*
> > > > +     * QEMU older than 9.0 don't understand zero page
> > > > +     * on multifd channel. This switch is required to
> > > > +     * maintain backward compatibility.
> > > > +     */
> > > > +    bool use_multifd_zero_page =
> > > > +        (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
> > > > +    MultiFDPages_t *pages = p->pages;
> > > > +    RAMBlock *rb = pages->block;
> > > > +
> > > > +    assert(pages->num != 0);
> > > > +    assert(pages->normal_num == 0);
> > > > +    assert(pages->zero_num == 0);
> > >
> > > We can drop these before the final version.
> > >
> > > > +
> > > > +    for (int i = 0; i < pages->num; i++) {
> > > > +        uint64_t offset = pages->offset[i];
> > > > +        if (use_multifd_zero_page &&
> > > > +            buffer_is_zero(rb->host + offset, p->page_size)) {
> > > > +            pages->zero[pages->zero_num] = offset;
> > > > +            pages->zero_num++;
> > > > +            ram_release_page(rb->idstr, offset);
> > > > +        } else {
> > > > +            pages->normal[pages->normal_num] = offset;
> > > > +            pages->normal_num++;
> > > > +        }
> > > > +    }
> > >
> > > I don't think it's super clean to have three arrays offset, zero and
> > > normal, all sized for the full packet size. It might be possible to just
> > > carry a bitmap of non-zero pages along with pages->offset and operate on
> > > that instead.
> > >
> > > What do you think?
> > >
> > > Peter, any ideas? Should we just leave this for another time?
> >
> > Yeah, I think a bitmap would indeed save quite a few fields. It would,
> > however, make the later iteration slightly harder: walk both (offset[],
> > bitmap) and process a page only if its bit is set in the bitmap.
> >
> > IIUC we perhaps don't even need a bitmap?  AFAIU all we need in
> > MultiFDPages_t is one extra field to record "how many normal pages", aka
> > normal_num here (zero_num can be calculated from num - normal_num).  Then
> > the zero page detection logic should do two things:
> >
> >   - Sort offset[] array so that it starts with normal pages, followed up by
> >     zero pages
> >
> >   - Setup normal_num to be the number of normal pages
> >
> > Then we reduce 2 new arrays (normal[], zero[]) + 2 new fields (normal_num,
> > zero_num) -> 1 new field (normal_num).  It'll also be trivial to fill the
> > packet header later because offset[] is exactly that.
> >
> > Side note: I still think it's confusing to read this patch and the previous
> > one separately, since the previous patch introduced these new fields
> > without yet justifying their values.  IMHO it'll be easier to review if you
> > merge the two patches.
>
> Fabiano, thanks for catching this. I totally missed the backward
> compatibility thing.
> Peter, I will code the sorting and merge this patch with the previous one.
>
It turns out that we still need to add a "zero_pages" field in
MultiFDPacket_t, because the existing field "pages_alloc" is not the
total number of pages in "offset". The source can compute "zero_pages"
as pages->num - pages->normal_num, but the value still needs to be
carried in the packet.
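Peter's suggestion above, partitioning offset[] so normal pages come first, can be sketched stand-alone like this (the function names and the naive page_is_zero() scan are illustrative stand-ins for the real QEMU code):

```c
#include <stddef.h>
#include <stdint.h>

/* Naive byte scan standing in for QEMU's optimized buffer_is_zero(). */
static int page_is_zero(const uint8_t *page, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (page[i]) {
            return 0;
        }
    }
    return 1;
}

/*
 * Partition offset[] in place so that normal (non-zero) pages occupy
 * offset[0..normal_num) and zero pages the remainder.  Only normal_num
 * needs tracking; zero_num follows as num - normal_num.  The relative
 * order of the zero pages is not preserved, which is fine for the
 * packet layout.
 */
static uint32_t partition_offsets(uint64_t *offset, uint32_t num,
                                  const uint8_t *host, size_t page_size)
{
    uint32_t normal_num = 0;

    for (uint32_t i = 0; i < num; i++) {
        if (!page_is_zero(host + offset[i], page_size)) {
            /* Swap this normal page into the front section. */
            uint64_t tmp = offset[normal_num];
            offset[normal_num] = offset[i];
            offset[i] = tmp;
            normal_num++;
        }
    }
    return normal_num;
}
```

After this runs, offset[0..normal_num) holds the normal pages and offset[normal_num..num) the zero pages, which matches the normal-then-zero layout documented for MultiFDPacket_t.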
> >
> > >
> > > > +}
> > > > +
> > > > +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> > > > +{
> > > > +    for (int i = 0; i < p->zero_num; i++) {
> > > > +        void *page = p->host + p->zero[i];
> > > > +        if (!buffer_is_zero(page, p->page_size)) {
> > > > +            memset(page, 0, p->page_size);
> > > > +        }
> > > > +    }
> > > > +}
> > > > diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> > > > index 012e3bdea1..cdfe0fa70e 100644
> > > > --- a/migration/multifd-zlib.c
> > > > +++ b/migration/multifd-zlib.c
> > > > @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> > > >      int ret;
> > > >      uint32_t i;
> > > >
> > > > +    multifd_zero_page_check_send(p);
> > > > +
> > > > +    if (!pages->normal_num) {
> > > > +        p->next_packet_size = 0;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > >      multifd_send_prepare_header(p);
> > > >
> > > > -    for (i = 0; i < pages->num; i++) {
> > > > +    for (i = 0; i < pages->normal_num; i++) {
> > > >          uint32_t available = z->zbuff_len - out_size;
> > > >          int flush = Z_NO_FLUSH;
> > > >
> > > > -        if (i == pages->num - 1) {
> > > > +        if (i == pages->normal_num - 1) {
> > > >              flush = Z_SYNC_FLUSH;
> > > >          }
> > > >
> > > > @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> > > >           * with compression. zlib does not guarantee that this is safe,
> > > >           * therefore copy the page before calling deflate().
> > > >           */
> > > > -        memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
> > > > +        memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
> > > >          zs->avail_in = p->page_size;
> > > >          zs->next_in = z->buf;
> > > >
> > > > @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> > > >      p->iov[p->iovs_num].iov_len = out_size;
> > > >      p->iovs_num++;
> > > >      p->next_packet_size = out_size;
> > > > -    p->flags |= MULTIFD_FLAG_ZLIB;
> > > >
> > > > +out:
> > > > +    p->flags |= MULTIFD_FLAG_ZLIB;
> > > >      multifd_send_fill_packet(p);
> > > > -
> > > >      return 0;
> > > >  }
> > > >
> > > > @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> > > >                     p->id, flags, MULTIFD_FLAG_ZLIB);
> > > >          return -1;
> > > >      }
> > > > +
> > > > +    multifd_zero_page_check_recv(p);
> > > > +
> > > > +    if (!p->normal_num) {
> > > > +        assert(in_size == 0);
> > > > +        return 0;
> > > > +    }
> > > > +
> > > >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> > > >
> > > >      if (ret != 0) {
> > > > @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> > > >                     p->id, out_size, expected_size);
> > > >          return -1;
> > > >      }
> > > > +
> > > >      return 0;
> > > >  }
> > > >
> > > > diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> > > > index dc8fe43e94..27a1eba075 100644
> > > > --- a/migration/multifd-zstd.c
> > > > +++ b/migration/multifd-zstd.c
> > > > @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> > > >      int ret;
> > > >      uint32_t i;
> > > >
> > > > +    multifd_zero_page_check_send(p);
> > > > +
> > > > +    if (!pages->normal_num) {
> > > > +        p->next_packet_size = 0;
> > > > +        goto out;
> > > > +    }
> > > > +
> > > >      multifd_send_prepare_header(p);
> >
> > If this forms a pattern we can introduce multifd_send_prepare_common():
>
> I will add that in the next version.
>
> >
> > bool multifd_send_prepare_common()
> > {
> >     multifd_zero_page_check_send();
> >     if (...) {
> >
> >     }
> >     multifd_send_prepare_header();
> > }
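Fleshed out as a self-contained sketch (the struct layouts, stub bodies, and helper name are assumptions taken from the discussion, not the committed code), the shared prologue could look like:

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins for the real QEMU types; layouts are assumptions. */
typedef struct {
    uint32_t normal_num;
} MultiFDPages_t;

typedef struct {
    MultiFDPages_t *pages;
    uint32_t next_packet_size;
    int zero_checked;      /* instrumentation for this sketch only */
    int header_prepared;   /* instrumentation for this sketch only */
} MultiFDSendParams;

/* Stub: the real code partitions pages into normal/zero here. */
static void multifd_zero_page_check_send(MultiFDSendParams *p)
{
    p->zero_checked = 1;
}

/* Stub: the real code reserves an iov entry for the packet header here. */
static void multifd_send_prepare_header(MultiFDSendParams *p)
{
    p->header_prepared = 1;
}

/*
 * Shared prologue for the per-method send_prepare hooks.  Returns false
 * when the packet carries only zero pages, so the caller can skip the
 * compression loop and go straight to filling the packet.
 */
static bool multifd_send_prepare_common(MultiFDSendParams *p)
{
    multifd_zero_page_check_send(p);

    if (!p->pages->normal_num) {
        p->next_packet_size = 0;
        return false;
    }

    multifd_send_prepare_header(p);
    return true;
}
```

Each send_prepare hook (nocomp, zlib, zstd) would then open with this call instead of duplicating the zero-page check, the empty-packet early exit, and the header setup.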
> >
> > > >
> > > >      z->out.dst = z->zbuff;
> > > >      z->out.size = z->zbuff_len;
> > > >      z->out.pos = 0;
> > > >
> > > > -    for (i = 0; i < pages->num; i++) {
> > > > +    for (i = 0; i < pages->normal_num; i++) {
> > > >          ZSTD_EndDirective flush = ZSTD_e_continue;
> > > >
> > > > -        if (i == pages->num - 1) {
> > > > +        if (i == pages->normal_num - 1) {
> > > >              flush = ZSTD_e_flush;
> > > >          }
> > > > -        z->in.src = p->pages->block->host + pages->offset[i];
> > > > +        z->in.src = p->pages->block->host + pages->normal[i];
> > > >          z->in.size = p->page_size;
> > > >          z->in.pos = 0;
> > > >
> > > > @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> > > >      p->iov[p->iovs_num].iov_len = z->out.pos;
> > > >      p->iovs_num++;
> > > >      p->next_packet_size = z->out.pos;
> > > > -    p->flags |= MULTIFD_FLAG_ZSTD;
> > > >
> > > > +out:
> > > > +    p->flags |= MULTIFD_FLAG_ZSTD;
> > > >      multifd_send_fill_packet(p);
> > > > -
> > > >      return 0;
> > > >  }
> > > >
> > > > @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
> > > >                     p->id, flags, MULTIFD_FLAG_ZSTD);
> > > >          return -1;
> > > >      }
> > > > +
> > > > +    multifd_zero_page_check_recv(p);
> > > > +
> > > > +    if (!p->normal_num) {
> > > > +        assert(in_size == 0);
> > > > +        return 0;
> > > > +    }
> > > > +
> > > >      ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
> > > >
> > > >      if (ret != 0) {
> > > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > > index a33dba40d9..fbb40ea10b 100644
> > > > --- a/migration/multifd.c
> > > > +++ b/migration/multifd.c
> > > > @@ -11,6 +11,7 @@
> > > >   */
> > > >
> > > >  #include "qemu/osdep.h"
> > > > +#include "qemu/cutils.h"
> > > >  #include "qemu/rcu.h"
> > > >  #include "exec/target_page.h"
> > > >  #include "sysemu/sysemu.h"
> > > > @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> > > >      MultiFDPages_t *pages = p->pages;
> > > >      int ret;
> > > >
> > > > +    multifd_zero_page_check_send(p);
> > > > +
> > > >      if (!use_zero_copy_send) {
> > > >          /*
> > > >           * Only !zerocopy needs the header in IOV; zerocopy will
> > > > @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> > > >          multifd_send_prepare_header(p);
> > > >      }
> > > >
> > > > -    for (int i = 0; i < pages->num; i++) {
> > > > -        p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> > > > +    for (int i = 0; i < pages->normal_num; i++) {
> > > > +        p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
> > > >          p->iov[p->iovs_num].iov_len = p->page_size;
> > > >          p->iovs_num++;
> > > >      }
> > > >
> > > > -    p->next_packet_size = pages->num * p->page_size;
> > > > +    p->next_packet_size = pages->normal_num * p->page_size;
> > > >      p->flags |= MULTIFD_FLAG_NOCOMP;
> > > >
> > > >      multifd_send_fill_packet(p);
> > > > @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
> > > >                     p->id, flags, MULTIFD_FLAG_NOCOMP);
> > > >          return -1;
> > > >      }
> > > > +
> > > > +    multifd_zero_page_check_recv(p);
> > > > +
> > > > +    if (!p->normal_num) {
> > > > +        return 0;
> > > > +    }
> > > > +
> > > >      for (int i = 0; i < p->normal_num; i++) {
> > > >          p->iov[i].iov_base = p->host + p->normal[i];
> > > >          p->iov[i].iov_len = p->page_size;
> > > > @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> > > >
> > > >      packet->flags = cpu_to_be32(p->flags);
> > > >      packet->pages_alloc = cpu_to_be32(p->pages->allocated);
> > > > -    packet->normal_pages = cpu_to_be32(pages->num);
> > > > +    packet->normal_pages = cpu_to_be32(pages->normal_num);
> > > >      packet->zero_pages = cpu_to_be32(pages->zero_num);
> > > >      packet->next_packet_size = cpu_to_be32(p->next_packet_size);
> > > >
> > > > @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> > > >          strncpy(packet->ramblock, pages->block->idstr, 256);
> > > >      }
> > > >
> > > > -    for (i = 0; i < pages->num; i++) {
> > > > +    for (i = 0; i < pages->normal_num; i++) {
> > > >          /* there are architectures where ram_addr_t is 32 bit */
> > > > -        uint64_t temp = pages->offset[i];
> > > > +        uint64_t temp = pages->normal[i];
> > > >
> > > >          packet->offset[i] = cpu_to_be64(temp);
> > > >      }
> > > >
> > > > +    for (i = 0; i < pages->zero_num; i++) {
> > > > +        /* there are architectures where ram_addr_t is 32 bit */
> > > > +        uint64_t temp = pages->zero[i];
> > > > +
> > > > +        packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
> > > > +    }
> > > > +
> > > >      p->packets_sent++;
> > > > -    p->total_normal_pages += pages->num;
> > > > +    p->total_normal_pages += pages->normal_num;
> > > >      p->total_zero_pages += pages->zero_num;
> > > >
> > > > -    trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> > > > +    trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
> > > >                         p->flags, p->next_packet_size);
> > > >  }
> > > >
> > > > @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> > > >          p->normal[i] = offset;
> > > >      }
> > > >
> > > > +    for (i = 0; i < p->zero_num; i++) {
> > > > +        uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> > > > +
> > > > +        if (offset > (p->block->used_length - p->page_size)) {
> > > > +            error_setg(errp, "multifd: offset too long %" PRIu64
> > > > +                       " (max " RAM_ADDR_FMT ")",
> > > > +                       offset, p->block->used_length);
> > > > +            return -1;
> > > > +        }
> > > > +        p->zero[i] = offset;
> > > > +    }
> > > > +
> > > >      return 0;
> > > >  }
> > > >
> > > > @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
> > > >
> > > >              stat64_add(&mig_stats.multifd_bytes,
> > > >                         p->next_packet_size + p->packet_len);
> > > > -            stat64_add(&mig_stats.normal_pages, pages->num);
> > > > +            stat64_add(&mig_stats.normal_pages, pages->normal_num);
> > > >              stat64_add(&mig_stats.zero_pages, pages->zero_num);
> > > >
> > > >              multifd_pages_reset(p->pages);
> > > > @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
> > > >          p->flags &= ~MULTIFD_FLAG_SYNC;
> > > >          qemu_mutex_unlock(&p->mutex);
> > > >
> > > > -        if (p->normal_num) {
> > > > +        if (p->normal_num + p->zero_num) {
> > > > +            assert(!(flags & MULTIFD_FLAG_SYNC));
> > >
> > > This breaks 8.2 -> 9.0 migration. QEMU 8.2 is still sending the SYNC
> > > along with the data packet.
> > >
> > > >              ret = multifd_recv_state->ops->recv_pages(p, &local_err);
> > > >              if (ret != 0) {
> > > >                  break;
> > > > diff --git a/migration/multifd.h b/migration/multifd.h
> > > > index 9822ff298a..125f0bbe60 100644
> > > > --- a/migration/multifd.h
> > > > +++ b/migration/multifd.h
> > > > @@ -53,6 +53,11 @@ typedef struct {
> > > >      uint32_t unused32[1];    /* Reserved for future use */
> > > >      uint64_t unused64[3];    /* Reserved for future use */
> > > >      char ramblock[256];
> > > > +    /*
> > > > +     * This array contains the pointers to:
> > > > +     *  - normal pages (initial normal_pages entries)
> > > > +     *  - zero pages (following zero_pages entries)
> > > > +     */
> > > >      uint64_t offset[];
> > > >  } __attribute__((packed)) MultiFDPacket_t;
> > > >
> > > > @@ -224,6 +229,8 @@ typedef struct {
> > > >
> > > >  void multifd_register_ops(int method, MultiFDMethods *ops);
> > > >  void multifd_send_fill_packet(MultiFDSendParams *p);
> > > > +void multifd_zero_page_check_send(MultiFDSendParams *p);
> > > > +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
> > > >
> > > >  static inline void multifd_send_prepare_header(MultiFDSendParams *p)
> > > >  {
> > > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > > index 99843a8e95..e2450b92d4 100644
> > > > --- a/qapi/migration.json
> > > > +++ b/qapi/migration.json
> > > > @@ -660,9 +660,11 @@
> > > >  #
> > > >  # @none: Do not perform zero page checking.
> > > >  #
> > > > +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
> > > > +#
> > > >  ##
> > > >  { 'enum': 'ZeroPageDetection',
> > > > -  'data': [ 'legacy', 'none' ] }
> > > > +  'data': [ 'legacy', 'none', 'multifd' ] }
> > > >
> > > >  ##
> > > >  # @BitmapMigrationBitmapAliasTransform:
> > >
> >
> > --
> > Peter Xu
> >



* Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-23  6:02       ` [External] " Hao Xiang
@ 2024-02-24 23:03         ` Hao Xiang
  2024-02-26  1:43           ` Peter Xu
  0 siblings, 1 reply; 42+ messages in thread
From: Hao Xiang @ 2024-02-24 23:03 UTC (permalink / raw)
  To: Peter Xu
  Cc: Fabiano Rosas, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Thu, Feb 22, 2024 at 10:02 PM Hao Xiang <hao.xiang@bytedance.com> wrote:
>
> On Thu, Feb 22, 2024 at 6:33 PM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Feb 21, 2024 at 06:06:19PM -0300, Fabiano Rosas wrote:
> > > Hao Xiang <hao.xiang@bytedance.com> writes:
> > >
> > > > This change adds a dedicated handler for MigrationOps::ram_save_target_page in
> > >
> > > nit: Add a dedicated handler...
> > >
> > > Usually "this patch/change" is used only when necessary to avoid
> > > ambiguity.
> > >
> > > > multifd live migration. Now zero page checking can be done in the multifd threads
> > > > and this becomes the default configuration. We still provide backward compatibility
> > > > where zero page checking is done from the migration main thread.
> > > >
> > > > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > > > ---
> > > >  migration/multifd.c |  1 +
> > > >  migration/options.c |  2 +-
> > > >  migration/ram.c     | 53 ++++++++++++++++++++++++++++++++++-----------
> > > >  3 files changed, 42 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > > index fbb40ea10b..ef5dad1019 100644
> > > > --- a/migration/multifd.c
> > > > +++ b/migration/multifd.c
> > > > @@ -13,6 +13,7 @@
> > > >  #include "qemu/osdep.h"
> > > >  #include "qemu/cutils.h"
> > >
> > > This include...
> > >
> > > >  #include "qemu/rcu.h"
> > > > +#include "qemu/cutils.h"
> > >
> > > is there already.
> > >
> > > >  #include "exec/target_page.h"
> > > >  #include "sysemu/sysemu.h"
> > > >  #include "exec/ramblock.h"
> > > > diff --git a/migration/options.c b/migration/options.c
> > > > index 3c603391b0..3c79b6ccd4 100644
> > > > --- a/migration/options.c
> > > > +++ b/migration/options.c
> > > > @@ -181,7 +181,7 @@ Property migration_properties[] = {
> > > >                        MIG_MODE_NORMAL),
> > > >      DEFINE_PROP_ZERO_PAGE_DETECTION("zero-page-detection", MigrationState,
> > > >                         parameters.zero_page_detection,
> > > > -                       ZERO_PAGE_DETECTION_LEGACY),
> > > > +                       ZERO_PAGE_DETECTION_MULTIFD),
> > >
> > > I think we'll need something to avoid a 9.0 -> 8.2 migration with this
> > > enabled. Otherwise it will go along happily until we get data corruption
> > > because the new QEMU didn't send any zero pages on the migration thread
> > > and the old QEMU did not look for them in the multifd packet.
> >
> > It could be even worse: since the new QEMU only attaches "normal" pages
> > after the multifd packet, the old QEMU could try to read more data than
> > was actually sent, expecting all pages.
> >
> > >
> > > Perhaps bumping the MULTIFD_VERSION when ZERO_PAGE_DETECTION_MULTIFD is
> > > in use. We'd just need to fix the test in the new QEMU to check
> > > (msg.version > MULTIFD_VERSION) instead of (msg.version != MULTIFD_VERSION).
> >
> > IMHO we don't need to change MULTIFD_VERSION yet; what we need is perhaps a
> > compat entry in hw_compat_8_2 setting "zero-page-detection" to "legacy".
> > We should make sure that when "legacy" is set, multifd runs the old protocol
> > (zero_num will always be 0, and will be ignored by old QEMUs, IIUC).
> >
> > One more comment is, when repost please consider split this patch into two;
> > The new ram_save_target_page_multifd() hook can be done in another patch,
> > AFAIU.
>
> Sorry, I kept missing this. I will keep telling myself, compatibility
> is king. I will set the hw_compat_8_2 setting and make sure to test
> migration 9.0 -> 8.2 fails with "multifd" option set.
> Will split patches.

I just want to make sure I am coding the right solution. I added a
"zero-page-detection" = "legacy" entry to hw_compat_8_2 and tested it.
The behavior is that with machine type pc-q35-8.2, zero-page-detection
automatically becomes "legacy", while with pc-q35-9.0 it keeps the
default value "multifd". However, this is not a hard restriction: I can
still explicitly override zero-page-detection to "multifd" on machine
type pc-q35-8.2. Is this OK?
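For reference, such a compat entry would look roughly like this (a sketch only; the exact table contents and placement need to be verified against hw/core/machine.c):

```c
/* hw/core/machine.c (sketch) */
GlobalProperty hw_compat_8_2[] = {
    /* ... existing entries ... */
    { "migration", "zero-page-detection", "legacy" },
};
```

Compat tables only change the property's default for older machine types; an explicit user override is still honored, so the behavior observed above is the expected one.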

>
> >
> > >
> > > >
> > > >      /* Migration capabilities */
> > > >      DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
> > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > index 5ece9f042e..b088c5a98c 100644
> > > > --- a/migration/ram.c
> > > > +++ b/migration/ram.c
> > > > @@ -1123,10 +1123,6 @@ static int save_zero_page(RAMState *rs, PageSearchStatus *pss,
> > > >      QEMUFile *file = pss->pss_channel;
> > > >      int len = 0;
> > > >
> > > > -    if (migrate_zero_page_detection() != ZERO_PAGE_DETECTION_LEGACY) {
> > > > -        return 0;
> > > > -    }
> > >
> > > How does 'none' work now?
> > >
> > > > -
> > > >      if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
> > > >          return 0;
> > > >      }
> > > > @@ -1256,6 +1252,10 @@ static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> > > >
> > > >  static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
> > > >  {
> > > > +    assert(migrate_multifd());
> > > > +    assert(!migrate_compress());
> > > > +    assert(!migration_in_postcopy());
> > >
> > > Drop these, please. Keep only the asserts that are likely to trigger
> > > during development, such as the existing ones at multifd_send_pages.
> > >
> > > > +
> > > >      if (!multifd_queue_page(block, offset)) {
> > > >          return -1;
> > > >      }
> > > > @@ -2046,7 +2046,6 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
> > > >   */
> > > >  static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> > > >  {
> > > > -    RAMBlock *block = pss->block;
> > > >      ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > > >      int res;
> > > >
> > > > @@ -2062,17 +2061,40 @@ static int ram_save_target_page_legacy(RAMState *rs, PageSearchStatus *pss)
> > > >          return 1;
> > > >      }
> > > >
> > > > +    return ram_save_page(rs, pss);
> > >
> > > Look at where git put this! Are you using the default diff algorithm? If
> > > not, try using --patience to see if it improves the diff.
> > >
> > > > +}
> > > > +
> > > > +/**
> > > > + * ram_save_target_page_multifd: save one target page
> > > > + *
> > > > + * Returns the number of pages written
> > >
> > > We could be more precise here:
> > >
> > >  ram_save_target_page_multifd: send one target page to multifd workers
> > >
> > >  Returns 1 if the page was queued, -1 otherwise.
> > >
> > > > + *
> > > > + * @rs: current RAM state
> > > > + * @pss: data about the page we want to send
> > > > + */
> > > > +static int ram_save_target_page_multifd(RAMState *rs, PageSearchStatus *pss)
> > > > +{
> > > > +    RAMBlock *block = pss->block;
> > > > +    ram_addr_t offset = ((ram_addr_t)pss->page) << TARGET_PAGE_BITS;
> > > > +
> > > > +    /* Multifd is not compatible with old compression. */
> > > > +    assert(!migrate_compress());
> > >
> > > This should already be enforced at options.c.
> > >
> > > > +
> > > > +    /* Multifd is not compatible with postcopy. */
> > > > +    assert(!migration_in_postcopy());
> > >
> > > Same here.
> > >
> > > > +
> > > >      /*
> > > > -     * Do not use multifd in postcopy as one whole host page should be
> > > > -     * placed.  Meanwhile postcopy requires atomic update of pages, so even
> > > > -     * if host page size == guest page size the dest guest during run may
> > > > -     * still see partially copied pages which is data corruption.
> > > > +     * Backward compatibility support. While using multifd live
> > > > +     * migration, we still need to handle zero page checking on the
> > > > +     * migration main thread.
> > > >       */
> > > > -    if (migrate_multifd() && !migration_in_postcopy()) {
> > > > -        return ram_save_multifd_page(block, offset);
> > > > +    if (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_LEGACY) {
> > > > +        if (save_zero_page(rs, pss, offset)) {
> > > > +            return 1;
> > > > +        }
> > > >      }
> > > >
> > > > -    return ram_save_page(rs, pss);
> > > > +    return ram_save_multifd_page(block, offset);
> > > >  }
> > > >
> > > >  /* Should be called before sending a host page */
> > > > @@ -2984,7 +3006,12 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> > > >      }
> > > >
> > > >      migration_ops = g_malloc0(sizeof(MigrationOps));
> > > > -    migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > > > +
> > > > +    if (migrate_multifd()) {
> > > > +        migration_ops->ram_save_target_page = ram_save_target_page_multifd;
> > > > +    } else {
> > > > +        migration_ops->ram_save_target_page = ram_save_target_page_legacy;
> > > > +    }
> > > >
> > > >      bql_unlock();
> > > >      ret = multifd_send_sync_main();
> > >
> >
> > --
> > Peter Xu
> >


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [External] Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
  2024-02-24 22:56         ` Hao Xiang
@ 2024-02-26  1:30           ` Peter Xu
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Xu @ 2024-02-26  1:30 UTC (permalink / raw)
  To: Hao Xiang
  Cc: Fabiano Rosas, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Sat, Feb 24, 2024 at 02:56:15PM -0800, Hao Xiang wrote:
> > > > I don't think it's super clean to have three arrays offset, zero and
> > > > normal, all sized for the full packet size. It might be possible to just
> > > > carry a bitmap of non-zero pages along with pages->offset and operate on
> > > > that instead.
> > > >
> > > > What do you think?
> > > >
> > > > Peter, any ideas? Should we just leave this for another time?
> > >
> > > Yeah I think a bitmap should save quite a few fields indeed, it'll however
> > > make the latter iteration slightly harder by walking both (offset[],
> > > bitmap), process the page only if bitmap is set for the offset.
> > >
> > > IIUC we perhaps don't even need a bitmap?  AFAIU all we need in
> > > MultiFDPages_t is one extra field to mark "how many normal pages", aka,
> > > normal_num here (zero_num can be calculated from num-normal_num).  Then
> > > the zero page detection logic should do two things:
> > >
> > >   - Sort offset[] array so that it starts with normal pages, followed up by
> > >     zero pages
> > >
> > >   - Setup normal_num to be the number of normal pages
> > >
> > > Then we reduce 2 new arrays (normal[], zero[]) + 2 new fields (normal_num,
> > > zero_num) -> 1 new field (normal_num).  It'll also be trivial to fill the
> > > packet header later because offset[] is exactly that.
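[The sort-into-two-sections idea above can be sketched as a swap-based
partition over offset[]. This is illustrative only: page_is_zero() stands in
for QEMU's buffer_is_zero(), and partition_pages() is a hypothetical name,
not the actual multifd code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical stand-in for QEMU's buffer_is_zero(). */
static bool page_is_zero(const uint8_t *p)
{
    for (size_t i = 0; i < PAGE_SIZE; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Partition offset[] in place so normal (non-zero) pages come first,
 * followed by zero pages.  Returns normal_num; the zero-page count is
 * then num - normal_num, so no extra arrays are needed.
 */
static uint32_t partition_pages(const uint8_t *ram, uint64_t *offset,
                                uint32_t num)
{
    uint32_t normal_num = 0;

    for (uint32_t i = 0; i < num; i++) {
        uint64_t off = offset[i];
        if (!page_is_zero(ram + off)) {
            /* Swap this normal page into the front section. */
            offset[i] = offset[normal_num];
            offset[normal_num] = off;
            normal_num++;
        }
    }
    return normal_num;
}
```

[The sender then fills the packet header straight from offset[]: the first
normal_num entries carry payload, the rest are zero pages.]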
> > >
> > > Side note - I still think it's confusing to read this patch and previous
> > > patch separately.  Obviously previous patch introduced these new fields
> > > without justifying their values yet.  IMHO it'll be easier to review if you
> > > merge the two patches.
> >
> > Fabiano, thanks for catching this. I totally missed the backward
> > compatibility thing.
> > Peter, I will code the sorting and merge this patch with the previous one.
> >
> It turns out that we still need to add a "zero_pages" field in
> MultiFDPacket_t because the existing field "pages_alloc" is not the
> total number of pages in "offset". So the source can set "zero_pages"
> from pages->num - pages->num_normal, but "zero_pages" still needs to
> be carried in the packet.

Yes, one more field should be needed in MultiFDPacket_t.  Note that what I
said above was about MultiFDPages_t, not MultiFDPacket_t (which is the wire
protocol instead).  To support zero page offloading we should need one more
field for each.

IMHO MultiFDPacket_t.pages_alloc is redundant and actually not useful..
It's just that it existed in the wire protocol already so maybe we'd still
better keep it there..

-- 
Peter Xu




* Re: [External] Re: [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads.
  2024-02-24 23:03         ` Hao Xiang
@ 2024-02-26  1:43           ` Peter Xu
  0 siblings, 0 replies; 42+ messages in thread
From: Peter Xu @ 2024-02-26  1:43 UTC (permalink / raw)
  To: Hao Xiang
  Cc: Fabiano Rosas, pbonzini, berrange, eduardo, eblake, armbru, thuth,
	lvivier, qemu-devel, jdenemar

On Sat, Feb 24, 2024 at 03:03:15PM -0800, Hao Xiang wrote:
> So I just want to make sure I am coding the right solution. I added
> setting "zero-page-detection" to "legacy" in hw_compat_8_2 and tested
> it. The behavior is that if I set machine type to pc-q35-8.2,
> zero-page-detection will automatically be set to "legacy". But if I
> set the machine type to pc-q35-9.0, zero-page-detection will be the
> default value "multifd". However, this doesn't seem to be a hard
> requirement because I can still override zero-page-detection to
> multifd on machine type pc-q35-8.2. Is this OK?

What we want to guarantee is old 8.2 users can smoothly migrate to the new
qemus, and existing 8.2 (or prior) users definitely don't have any override
over the new parameter zero-page-detection simply because 8.2 (or prior)
binary doesn't support it yet.

Then, if someone is using a new binary with 8.2 machine types while
overriding this default value, it means it's the user's choice to do so,
and the user should guarantee all the qemus he/she manages also keep this
parameter override, to make sure migration will work between these qemu
processes.

So in short, that's all fine.

Thanks,

-- 
Peter Xu




* Re: [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
                     ` (2 preceding siblings ...)
  2024-02-22 10:36   ` Peter Xu
@ 2024-02-26  7:18   ` Wang, Lei
  2024-02-26 19:45     ` [External] " Hao Xiang
  3 siblings, 1 reply; 42+ messages in thread
From: Wang, Lei @ 2024-02-26  7:18 UTC (permalink / raw)
  To: Hao Xiang, pbonzini, berrange, eduardo, peterx, farosas, eblake,
	armbru, thuth, lvivier, qemu-devel, jdenemar

On 2/17/2024 6:39, Hao Xiang wrote:
> This new parameter controls where the zero page checking is running.
> 1. If this parameter is set to 'legacy', zero page checking is
> done in the migration main thread.
> 2. If this parameter is set to 'none', zero page checking is disabled.
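[The corresponding QAPI enum in qapi/migration.json would presumably look
something like the following -- a sketch based on the cover letter's three
values, with the doc comments omitted:]

```json
{ 'enum': 'ZeroPageDetection',
  'data': [ 'none', 'legacy', 'multifd' ] }
```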
> 
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
>  hw/core/qdev-properties-system.c    | 10 ++++++++++
>  include/hw/qdev-properties-system.h |  4 ++++
>  migration/migration-hmp-cmds.c      |  9 +++++++++
>  migration/options.c                 | 21 ++++++++++++++++++++
>  migration/options.h                 |  1 +
>  migration/ram.c                     |  4 ++++
>  qapi/migration.json                 | 30 ++++++++++++++++++++++++++---
>  7 files changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
> index 1a396521d5..63843f18b5 100644
> --- a/hw/core/qdev-properties-system.c
> +++ b/hw/core/qdev-properties-system.c
> @@ -679,6 +679,16 @@ const PropertyInfo qdev_prop_mig_mode = {
>      .set_default_value = qdev_propinfo_set_default_value_enum,
>  };
>  
> +const PropertyInfo qdev_prop_zero_page_detection = {
> +    .name = "ZeroPageDetection",
> +    .description = "zero_page_detection values, "
> +                   "multifd,legacy,none",

Nit: Maybe multifd/legacy/none?



* Re: [External] Re: [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection.
  2024-02-26  7:18   ` Wang, Lei
@ 2024-02-26 19:45     ` Hao Xiang
  0 siblings, 0 replies; 42+ messages in thread
From: Hao Xiang @ 2024-02-26 19:45 UTC (permalink / raw)
  To: Wang, Lei
  Cc: pbonzini, berrange, eduardo, peterx, farosas, eblake, armbru,
	thuth, lvivier, qemu-devel, jdenemar

On Sun, Feb 25, 2024 at 11:19 PM Wang, Lei <lei4.wang@intel.com> wrote:
>
> On 2/17/2024 6:39, Hao Xiang wrote:
> > This new parameter controls where the zero page checking is running.
> > 1. If this parameter is set to 'legacy', zero page checking is
> > done in the migration main thread.
> > 2. If this parameter is set to 'none', zero page checking is disabled.
> >
> > Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> > ---
> >  hw/core/qdev-properties-system.c    | 10 ++++++++++
> >  include/hw/qdev-properties-system.h |  4 ++++
> >  migration/migration-hmp-cmds.c      |  9 +++++++++
> >  migration/options.c                 | 21 ++++++++++++++++++++
> >  migration/options.h                 |  1 +
> >  migration/ram.c                     |  4 ++++
> >  qapi/migration.json                 | 30 ++++++++++++++++++++++++++---
> >  7 files changed, 76 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
> > index 1a396521d5..63843f18b5 100644
> > --- a/hw/core/qdev-properties-system.c
> > +++ b/hw/core/qdev-properties-system.c
> > @@ -679,6 +679,16 @@ const PropertyInfo qdev_prop_mig_mode = {
> >      .set_default_value = qdev_propinfo_set_default_value_enum,
> >  };
> >
> > +const PropertyInfo qdev_prop_zero_page_detection = {
> > +    .name = "ZeroPageDetection",
> > +    .description = "zero_page_detection values, "
> > +                   "multifd,legacy,none",
>
> Nit: Maybe multifd/legacy/none?

I changed it to

    .description = "zero_page_detection values, "
                   "none,legacy,multifd",

Since both "," and "/" are used in existing code, I think it would be
fine either way.



end of thread, other threads:[~2024-02-26 19:46 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-16 22:39 [PATCH v2 0/7] Introduce multifd zero page checking Hao Xiang
2024-02-16 22:39 ` [PATCH v2 1/7] migration/multifd: Add new migration option zero-page-detection Hao Xiang
2024-02-21 12:03   ` Markus Armbruster
2024-02-23  4:22     ` [External] " Hao Xiang
2024-02-21 13:58   ` Elena Ufimtseva
2024-02-23  4:37     ` [External] " Hao Xiang
2024-02-22 10:36   ` Peter Xu
2024-02-26  7:18   ` Wang, Lei
2024-02-26 19:45     ` [External] " Hao Xiang
2024-02-16 22:39 ` [PATCH v2 2/7] migration/multifd: Support for zero pages transmission in multifd format Hao Xiang
2024-02-21 15:37   ` Elena Ufimtseva
2024-02-23  4:18     ` [External] " Hao Xiang
2024-02-16 22:39 ` [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread Hao Xiang
2024-02-16 23:49   ` Richard Henderson
2024-02-23  4:38     ` [External] " Hao Xiang
2024-02-24 19:06       ` Hao Xiang
2024-02-21 12:04   ` Markus Armbruster
2024-02-21 16:00   ` Elena Ufimtseva
2024-02-23  4:59     ` [External] " Hao Xiang
2024-02-21 21:04   ` Fabiano Rosas
2024-02-23  2:20     ` Peter Xu
2024-02-23  5:15       ` [External] " Hao Xiang
2024-02-24 22:56         ` Hao Xiang
2024-02-26  1:30           ` Peter Xu
2024-02-23  5:18     ` Hao Xiang
2024-02-23 14:47       ` Fabiano Rosas
2024-02-16 22:39 ` [PATCH v2 4/7] migration/multifd: Enable zero page checking from multifd threads Hao Xiang
2024-02-21 16:11   ` Elena Ufimtseva
2024-02-23  5:24     ` [External] " Hao Xiang
2024-02-21 21:06   ` Fabiano Rosas
2024-02-23  2:33     ` Peter Xu
2024-02-23  6:02       ` [External] " Hao Xiang
2024-02-24 23:03         ` Hao Xiang
2024-02-26  1:43           ` Peter Xu
2024-02-23  5:47     ` Hao Xiang
2024-02-23 14:38       ` Fabiano Rosas
2024-02-16 22:40 ` [PATCH v2 5/7] migration/multifd: Add new migration test cases for legacy zero page checking Hao Xiang
2024-02-21 20:59   ` Fabiano Rosas
2024-02-23  4:20     ` [External] " Hao Xiang
2024-02-16 22:40 ` [PATCH v2 6/7] migration/multifd: Add zero pages and zero bytes counter to migration status interface Hao Xiang
2024-02-21 12:07   ` Markus Armbruster
2024-02-16 22:40 ` [PATCH v2 7/7] Update maintainer contact for migration multifd zero page checking acceleration Hao Xiang
