From: Fabiano Rosas <farosas@suse.de>
To: Hao Xiang <hao.xiang@bytedance.com>,
pbonzini@redhat.com, berrange@redhat.com, eduardo@habkost.net,
peterx@redhat.com, eblake@redhat.com, armbru@redhat.com,
thuth@redhat.com, lvivier@redhat.com, qemu-devel@nongnu.org,
jdenemar@redhat.com
Cc: Hao Xiang <hao.xiang@bytedance.com>
Subject: Re: [PATCH v2 3/7] migration/multifd: Zero page transmission on the multifd thread.
Date: Wed, 21 Feb 2024 18:04:10 -0300 [thread overview]
Message-ID: <877cixbkc5.fsf@suse.de> (raw)
In-Reply-To: <20240216224002.1476890-4-hao.xiang@bytedance.com>
Hao Xiang <hao.xiang@bytedance.com> writes:
> 1. Implements the zero page detection and handling on the multifd
> threads for non-compression, zlib and zstd compression backends.
> 2. Added a new value 'multifd' in ZeroPageDetection enumeration.
> 3. Add proper asserts to ensure pages->normal are used for normal pages
> in all scenarios.
>
> Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
> ---
> migration/meson.build | 1 +
> migration/multifd-zero-page.c | 59 +++++++++++++++++++++++++++++++++++
> migration/multifd-zlib.c | 26 ++++++++++++---
> migration/multifd-zstd.c | 25 ++++++++++++---
> migration/multifd.c | 50 +++++++++++++++++++++++------
> migration/multifd.h | 7 +++++
> qapi/migration.json | 4 ++-
> 7 files changed, 151 insertions(+), 21 deletions(-)
> create mode 100644 migration/multifd-zero-page.c
>
> diff --git a/migration/meson.build b/migration/meson.build
> index 92b1cc4297..1eeb915ff6 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -22,6 +22,7 @@ system_ss.add(files(
> 'migration.c',
> 'multifd.c',
> 'multifd-zlib.c',
> + 'multifd-zero-page.c',
> 'ram-compress.c',
> 'options.c',
> 'postcopy-ram.c',
> diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
> new file mode 100644
> index 0000000000..f0cd8e2c53
> --- /dev/null
> +++ b/migration/multifd-zero-page.c
> @@ -0,0 +1,59 @@
> +/*
> + * Multifd zero page detection implementation.
> + *
> + * Copyright (c) 2024 Bytedance Inc
> + *
> + * Authors:
> + * Hao Xiang <hao.xiang@bytedance.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/cutils.h"
> +#include "exec/ramblock.h"
> +#include "migration.h"
> +#include "multifd.h"
> +#include "options.h"
> +#include "ram.h"
> +
> +void multifd_zero_page_check_send(MultiFDSendParams *p)
> +{
> + /*
> + * QEMU older than 9.0 don't understand zero page
> + * on multifd channel. This switch is required to
> + * maintain backward compatibility.
> + */
> + bool use_multifd_zero_page =
> + (migrate_zero_page_detection() == ZERO_PAGE_DETECTION_MULTIFD);
> + MultiFDPages_t *pages = p->pages;
> + RAMBlock *rb = pages->block;
> +
> + assert(pages->num != 0);
> + assert(pages->normal_num == 0);
> + assert(pages->zero_num == 0);
We can drop these asserts before the final version.
> +
> + for (int i = 0; i < pages->num; i++) {
> + uint64_t offset = pages->offset[i];
> + if (use_multifd_zero_page &&
> + buffer_is_zero(rb->host + offset, p->page_size)) {
> + pages->zero[pages->zero_num] = offset;
> + pages->zero_num++;
> + ram_release_page(rb->idstr, offset);
> + } else {
> + pages->normal[pages->normal_num] = offset;
> + pages->normal_num++;
> + }
> + }
I don't think it's very clean to have three arrays (offset, zero and
normal), each sized for the full packet. It might be possible to carry
just a bitmap of non-zero pages along with pages->offset and operate on
that instead.
What do you think?
Peter, any ideas? Should we just leave this for another time?
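Something like this, roughly (just a sketch to show the shape of the idea;
the struct, field names and sizes below are made up for illustration, not
the actual multifd structures):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for MultiFDPages_t: one offset array plus a
 * bitmap marking which entries are zero pages, instead of separate
 * normal[]/zero[] arrays each sized for the full packet. */
#define SKETCH_MAX_PAGES 128
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

typedef struct {
    uint64_t offset[SKETCH_MAX_PAGES];
    unsigned long zero_bitmap[SKETCH_MAX_PAGES / BITS_PER_LONG + 1];
    uint32_t num;
    uint32_t zero_num;
} SketchPages;

/* Stand-in for buffer_is_zero(). */
static bool page_is_zero(const uint8_t *page, size_t page_size)
{
    for (size_t i = 0; i < page_size; i++) {
        if (page[i]) {
            return false;
        }
    }
    return true;
}

/* Scan all pages once, setting a bit per zero page; senders then
 * iterate offset[] and test the bitmap rather than walking two
 * separate arrays. */
static void sketch_zero_page_scan(SketchPages *pages, const uint8_t *host,
                                  size_t page_size)
{
    memset(pages->zero_bitmap, 0, sizeof(pages->zero_bitmap));
    pages->zero_num = 0;
    for (uint32_t i = 0; i < pages->num; i++) {
        if (page_is_zero(host + pages->offset[i], page_size)) {
            pages->zero_bitmap[i / BITS_PER_LONG] |=
                1UL << (i % BITS_PER_LONG);
            pages->zero_num++;
        }
    }
}
```

The packet layout wouldn't need to change; the fill-packet code could do
the normal-first/zero-second split while walking the bitmap.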
> +}
> +
> +void multifd_zero_page_check_recv(MultiFDRecvParams *p)
> +{
> + for (int i = 0; i < p->zero_num; i++) {
> + void *page = p->host + p->zero[i];
> + if (!buffer_is_zero(page, p->page_size)) {
> + memset(page, 0, p->page_size);
> + }
> + }
> +}
> diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> index 012e3bdea1..cdfe0fa70e 100644
> --- a/migration/multifd-zlib.c
> +++ b/migration/multifd-zlib.c
> @@ -123,13 +123,20 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> int ret;
> uint32_t i;
>
> + multifd_zero_page_check_send(p);
> +
> + if (!pages->normal_num) {
> + p->next_packet_size = 0;
> + goto out;
> + }
> +
> multifd_send_prepare_header(p);
>
> - for (i = 0; i < pages->num; i++) {
> + for (i = 0; i < pages->normal_num; i++) {
> uint32_t available = z->zbuff_len - out_size;
> int flush = Z_NO_FLUSH;
>
> - if (i == pages->num - 1) {
> + if (i == pages->normal_num - 1) {
> flush = Z_SYNC_FLUSH;
> }
>
> @@ -138,7 +145,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> * with compression. zlib does not guarantee that this is safe,
> * therefore copy the page before calling deflate().
> */
> - memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
> + memcpy(z->buf, p->pages->block->host + pages->normal[i], p->page_size);
> zs->avail_in = p->page_size;
> zs->next_in = z->buf;
>
> @@ -172,10 +179,10 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
> p->iov[p->iovs_num].iov_len = out_size;
> p->iovs_num++;
> p->next_packet_size = out_size;
> - p->flags |= MULTIFD_FLAG_ZLIB;
>
> +out:
> + p->flags |= MULTIFD_FLAG_ZLIB;
> multifd_send_fill_packet(p);
> -
> return 0;
> }
>
> @@ -261,6 +268,14 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> p->id, flags, MULTIFD_FLAG_ZLIB);
> return -1;
> }
> +
> + multifd_zero_page_check_recv(p);
> +
> + if (!p->normal_num) {
> + assert(in_size == 0);
> + return 0;
> + }
> +
> ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>
> if (ret != 0) {
> @@ -310,6 +325,7 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
> p->id, out_size, expected_size);
> return -1;
> }
> +
> return 0;
> }
>
> diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> index dc8fe43e94..27a1eba075 100644
> --- a/migration/multifd-zstd.c
> +++ b/migration/multifd-zstd.c
> @@ -118,19 +118,26 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> int ret;
> uint32_t i;
>
> + multifd_zero_page_check_send(p);
> +
> + if (!pages->normal_num) {
> + p->next_packet_size = 0;
> + goto out;
> + }
> +
> multifd_send_prepare_header(p);
>
> z->out.dst = z->zbuff;
> z->out.size = z->zbuff_len;
> z->out.pos = 0;
>
> - for (i = 0; i < pages->num; i++) {
> + for (i = 0; i < pages->normal_num; i++) {
> ZSTD_EndDirective flush = ZSTD_e_continue;
>
> - if (i == pages->num - 1) {
> + if (i == pages->normal_num - 1) {
> flush = ZSTD_e_flush;
> }
> - z->in.src = p->pages->block->host + pages->offset[i];
> + z->in.src = p->pages->block->host + pages->normal[i];
> z->in.size = p->page_size;
> z->in.pos = 0;
>
> @@ -161,10 +168,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
> p->iov[p->iovs_num].iov_len = z->out.pos;
> p->iovs_num++;
> p->next_packet_size = z->out.pos;
> - p->flags |= MULTIFD_FLAG_ZSTD;
>
> +out:
> + p->flags |= MULTIFD_FLAG_ZSTD;
> multifd_send_fill_packet(p);
> -
> return 0;
> }
>
> @@ -257,6 +264,14 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
> p->id, flags, MULTIFD_FLAG_ZSTD);
> return -1;
> }
> +
> + multifd_zero_page_check_recv(p);
> +
> + if (!p->normal_num) {
> + assert(in_size == 0);
> + return 0;
> + }
> +
> ret = qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);
>
> if (ret != 0) {
> diff --git a/migration/multifd.c b/migration/multifd.c
> index a33dba40d9..fbb40ea10b 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -11,6 +11,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include "qemu/cutils.h"
> #include "qemu/rcu.h"
> #include "exec/target_page.h"
> #include "sysemu/sysemu.h"
> @@ -126,6 +127,8 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> MultiFDPages_t *pages = p->pages;
> int ret;
>
> + multifd_zero_page_check_send(p);
> +
> if (!use_zero_copy_send) {
> /*
> * Only !zerocopy needs the header in IOV; zerocopy will
> @@ -134,13 +137,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
> multifd_send_prepare_header(p);
> }
>
> - for (int i = 0; i < pages->num; i++) {
> - p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
> + for (int i = 0; i < pages->normal_num; i++) {
> + p->iov[p->iovs_num].iov_base = pages->block->host + pages->normal[i];
> p->iov[p->iovs_num].iov_len = p->page_size;
> p->iovs_num++;
> }
>
> - p->next_packet_size = pages->num * p->page_size;
> + p->next_packet_size = pages->normal_num * p->page_size;
> p->flags |= MULTIFD_FLAG_NOCOMP;
>
> multifd_send_fill_packet(p);
> @@ -202,6 +205,13 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
> p->id, flags, MULTIFD_FLAG_NOCOMP);
> return -1;
> }
> +
> + multifd_zero_page_check_recv(p);
> +
> + if (!p->normal_num) {
> + return 0;
> + }
> +
> for (int i = 0; i < p->normal_num; i++) {
> p->iov[i].iov_base = p->host + p->normal[i];
> p->iov[i].iov_len = p->page_size;
> @@ -339,7 +349,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
>
> packet->flags = cpu_to_be32(p->flags);
> packet->pages_alloc = cpu_to_be32(p->pages->allocated);
> - packet->normal_pages = cpu_to_be32(pages->num);
> + packet->normal_pages = cpu_to_be32(pages->normal_num);
> packet->zero_pages = cpu_to_be32(pages->zero_num);
> packet->next_packet_size = cpu_to_be32(p->next_packet_size);
>
> @@ -350,18 +360,25 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
> strncpy(packet->ramblock, pages->block->idstr, 256);
> }
>
> - for (i = 0; i < pages->num; i++) {
> + for (i = 0; i < pages->normal_num; i++) {
> /* there are architectures where ram_addr_t is 32 bit */
> - uint64_t temp = pages->offset[i];
> + uint64_t temp = pages->normal[i];
>
> packet->offset[i] = cpu_to_be64(temp);
> }
>
> + for (i = 0; i < pages->zero_num; i++) {
> + /* there are architectures where ram_addr_t is 32 bit */
> + uint64_t temp = pages->zero[i];
> +
> + packet->offset[pages->normal_num + i] = cpu_to_be64(temp);
> + }
> +
> p->packets_sent++;
> - p->total_normal_pages += pages->num;
> + p->total_normal_pages += pages->normal_num;
> p->total_zero_pages += pages->zero_num;
>
> - trace_multifd_send(p->id, packet_num, pages->num, pages->zero_num,
> + trace_multifd_send(p->id, packet_num, pages->normal_num, pages->zero_num,
> p->flags, p->next_packet_size);
> }
>
> @@ -451,6 +468,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
> p->normal[i] = offset;
> }
>
> + for (i = 0; i < p->zero_num; i++) {
> + uint64_t offset = be64_to_cpu(packet->offset[p->normal_num + i]);
> +
> + if (offset > (p->block->used_length - p->page_size)) {
> + error_setg(errp, "multifd: offset too long %" PRIu64
> + " (max " RAM_ADDR_FMT ")",
> + offset, p->block->used_length);
> + return -1;
> + }
> + p->zero[i] = offset;
> + }
> +
> return 0;
> }
>
> @@ -842,7 +871,7 @@ static void *multifd_send_thread(void *opaque)
>
> stat64_add(&mig_stats.multifd_bytes,
> p->next_packet_size + p->packet_len);
> - stat64_add(&mig_stats.normal_pages, pages->num);
> + stat64_add(&mig_stats.normal_pages, pages->normal_num);
> stat64_add(&mig_stats.zero_pages, pages->zero_num);
>
> multifd_pages_reset(p->pages);
> @@ -1256,7 +1285,8 @@ static void *multifd_recv_thread(void *opaque)
> p->flags &= ~MULTIFD_FLAG_SYNC;
> qemu_mutex_unlock(&p->mutex);
>
> - if (p->normal_num) {
> + if (p->normal_num + p->zero_num) {
> + assert(!(flags & MULTIFD_FLAG_SYNC));
This assert breaks 8.2 -> 9.0 migration: QEMU 8.2 still sends the SYNC
flag along with the data packet, so a 9.0 destination would abort here
on a valid packet from an older source.
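To spell out the hazard (purely a sketch; the flag name and value are
placeholders, not the real MULTIFD_FLAG_SYNC definition): an 8.2-era
sender can legitimately produce a single packet that both carries pages
and has SYNC set, so any receiver-side check treating SYNC and payload
as mutually exclusive fires on valid legacy traffic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real value lives in migration/multifd.h. */
#define SKETCH_FLAG_SYNC 0x1

/*
 * Returns true when an assert of the form
 * assert(!(flags & SYNC)) guarding data packets would abort.
 * An 8.2 sender may emit (flags = SYNC, normal_num > 0) together.
 */
static bool sync_assert_would_trip(uint32_t flags, uint32_t normal_num,
                                   uint32_t zero_num)
{
    return (normal_num + zero_num) && (flags & SKETCH_FLAG_SYNC);
}
```

So the assert can only go in once we know the source negotiates the
new behaviour.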
> ret = multifd_recv_state->ops->recv_pages(p, &local_err);
> if (ret != 0) {
> break;
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 9822ff298a..125f0bbe60 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -53,6 +53,11 @@ typedef struct {
> uint32_t unused32[1]; /* Reserved for future use */
> uint64_t unused64[3]; /* Reserved for future use */
> char ramblock[256];
> + /*
> + * This array contains the pointers to:
> + * - normal pages (initial normal_pages entries)
> + * - zero pages (following zero_pages entries)
> + */
> uint64_t offset[];
> } __attribute__((packed)) MultiFDPacket_t;
>
> @@ -224,6 +229,8 @@ typedef struct {
>
> void multifd_register_ops(int method, MultiFDMethods *ops);
> void multifd_send_fill_packet(MultiFDSendParams *p);
> +void multifd_zero_page_check_send(MultiFDSendParams *p);
> +void multifd_zero_page_check_recv(MultiFDRecvParams *p);
>
> static inline void multifd_send_prepare_header(MultiFDSendParams *p)
> {
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 99843a8e95..e2450b92d4 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -660,9 +660,11 @@
> #
> # @none: Do not perform zero page checking.
> #
> +# @multifd: Perform zero page checking on the multifd sender thread. (since 9.0)
> +#
> ##
> { 'enum': 'ZeroPageDetection',
> - 'data': [ 'legacy', 'none' ] }
> + 'data': [ 'legacy', 'none', 'multifd' ] }
>
> ##
> # @BitmapMigrationBitmapAliasTransform: