From: Peter Xu <peterx@redhat.com>
To: Chuang Xu <xuchuangxclwt@bytedance.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, sgarzare@redhat.com,
richard.henderson@linaro.org, pbonzini@redhat.com,
david@kernel.org, philmd@linaro.org, farosas@suse.de
Subject: Re: [RFC v1 2/2] migration: merge fragmented clear_dirty ioctls
Date: Tue, 9 Dec 2025 16:10:03 -0500 [thread overview]
Message-ID: <aTiQK-a-ZcANCFbk@x1.local> (raw)
In-Reply-To: <20251208120952.37563-3-xuchuangxclwt@bytedance.com>
On Mon, Dec 08, 2025 at 08:09:52PM +0800, Chuang Xu wrote:
> From: xuchuangxclwt <xuchuangxclwt@bytedance.com>
>
> When the addresses processed are not aligned, a large number of
> clear_dirty ioctl occur (e.g. a 16MB misaligned memory can generate
> 4096 clear_dirty ioctls), which increases the time required for
> bitmap_sync and makes it more difficult for dirty pages to converge.
>
> Attempt to merge those fragmented clear_dirty ioctls.
(besides separate perf results I requested as in the cover letter reply..)
Could you add something into the commit log explaining at least one example
that you observe? E.g. what is the VM setup, how many ramblocks are the
ones not aligned?
Have you considered setting rb->clear_bmap when it's available? It'll
postpone the remote clear even further until page sent. Logically it
should be more efficient, but it may depend on the size of unaligned
ramblocks that you're hitting indeed, as clear_bmap isn't PAGE_SIZE based
but it can be much bigger. Some discussion around this would be nice on
how you chose the current solution.
>
> Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
> ---
> accel/tcg/cputlb.c | 5 +++--
> include/system/physmem.h | 3 ++-
> migration/ram.c | 17 ++++++++++++++++-
> system/memory.c | 2 +-
> system/physmem.c | 19 +++++++++++--------
> 5 files changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index fd1606c856..8a054cb545 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -857,8 +857,9 @@ void tlb_flush_page_bits_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
> void tlb_protect_code(ram_addr_t ram_addr)
> {
> physical_memory_test_and_clear_dirty(ram_addr & TARGET_PAGE_MASK,
> - TARGET_PAGE_SIZE,
> - DIRTY_MEMORY_CODE);
> + TARGET_PAGE_SIZE,
> + DIRTY_MEMORY_CODE,
> + true);
> }
>
> /* update the TLB so that writes in physical page 'phys_addr' are no longer
> diff --git a/include/system/physmem.h b/include/system/physmem.h
> index 879f6eae38..8529f0510a 100644
> --- a/include/system/physmem.h
> +++ b/include/system/physmem.h
> @@ -41,7 +41,8 @@ void physical_memory_dirty_bits_cleared(ram_addr_t start, ram_addr_t length);
>
> bool physical_memory_test_and_clear_dirty(ram_addr_t start,
> ram_addr_t length,
> - unsigned client);
> + unsigned client,
> + bool clear_dirty);
>
> DirtyBitmapSnapshot *
> physical_memory_snapshot_and_clear_dirty(MemoryRegion *mr, hwaddr offset,
> diff --git a/migration/ram.c b/migration/ram.c
> index 29f016cb25..329168d081 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -995,18 +995,33 @@ static uint64_t physical_memory_sync_dirty_bitmap(RAMBlock *rb,
> }
> } else {
> ram_addr_t offset = rb->offset;
> + unsigned long end, start_page;
> + uint64_t mr_offset, mr_size;
>
> for (addr = 0; addr < length; addr += TARGET_PAGE_SIZE) {
> + /*
> + * Here we don't expect so many clear_dirty, so we call
> + * physical_memory_test_and_clear_dirty with clear_dirty
> + * set to false. Later we'll do make an overall clear_dirty
> + * outside the loop.
> + */
> if (physical_memory_test_and_clear_dirty(
> start + addr + offset,
> TARGET_PAGE_SIZE,
> - DIRTY_MEMORY_MIGRATION)) {
> + DIRTY_MEMORY_MIGRATION,
> + false)) {
> long k = (start + addr) >> TARGET_PAGE_BITS;
> if (!test_and_set_bit(k, dest)) {
> num_dirty++;
> }
> }
> }
> + end = TARGET_PAGE_ALIGN(start + addr + offset + TARGET_PAGE_SIZE)
> + >> TARGET_PAGE_BITS;
> + start_page = (start + offset) >> TARGET_PAGE_BITS;
> + mr_offset = (ram_addr_t)(start_page << TARGET_PAGE_BITS) - offset;
> + mr_size = (end - start_page) << TARGET_PAGE_BITS;
> + memory_region_clear_dirty_bitmap(rb->mr, mr_offset, mr_size);
> }
>
> return num_dirty;
> diff --git a/system/memory.c b/system/memory.c
> index 8b84661ae3..7b81b84f30 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2424,7 +2424,7 @@ void memory_region_reset_dirty(MemoryRegion *mr, hwaddr addr,
> {
> assert(mr->ram_block);
> physical_memory_test_and_clear_dirty(
> - memory_region_get_ram_addr(mr) + addr, size, client);
> + memory_region_get_ram_addr(mr) + addr, size, client, true);
> }
>
> int memory_region_get_fd(MemoryRegion *mr)
> diff --git a/system/physmem.c b/system/physmem.c
> index c9869e4049..21b2db3884 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -1091,8 +1091,9 @@ void physical_memory_set_dirty_range(ram_addr_t start, ram_addr_t length,
>
> /* Note: start and end must be within the same ram block. */
> bool physical_memory_test_and_clear_dirty(ram_addr_t start,
> - ram_addr_t length,
> - unsigned client)
> + ram_addr_t length,
> + unsigned client,
> + bool clear_dirty)
> {
> DirtyMemoryBlocks *blocks;
> unsigned long end, page, start_page;
> @@ -1126,9 +1127,11 @@ bool physical_memory_test_and_clear_dirty(ram_addr_t start,
> page += num;
> }
>
> - mr_offset = (ram_addr_t)(start_page << TARGET_PAGE_BITS) - ramblock->offset;
> - mr_size = (end - start_page) << TARGET_PAGE_BITS;
> - memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
> + if (clear_dirty) {
> + mr_offset = (ram_addr_t)(start_page << TARGET_PAGE_BITS) - ramblock->offset;
> + mr_size = (end - start_page) << TARGET_PAGE_BITS;
> + memory_region_clear_dirty_bitmap(ramblock->mr, mr_offset, mr_size);
> + }
> }
>
> if (dirty) {
> @@ -1140,9 +1143,9 @@ bool physical_memory_test_and_clear_dirty(ram_addr_t start,
>
> static void physical_memory_clear_dirty_range(ram_addr_t addr, ram_addr_t length)
> {
> - physical_memory_test_and_clear_dirty(addr, length, DIRTY_MEMORY_MIGRATION);
> - physical_memory_test_and_clear_dirty(addr, length, DIRTY_MEMORY_VGA);
> - physical_memory_test_and_clear_dirty(addr, length, DIRTY_MEMORY_CODE);
> + physical_memory_test_and_clear_dirty(addr, length, DIRTY_MEMORY_MIGRATION, true);
> + physical_memory_test_and_clear_dirty(addr, length, DIRTY_MEMORY_VGA, true);
> + physical_memory_test_and_clear_dirty(addr, length, DIRTY_MEMORY_CODE, true);
> }
>
> DirtyBitmapSnapshot *physical_memory_snapshot_and_clear_dirty
> --
> 2.20.1
>
--
Peter Xu
next prev parent reply other threads:[~2025-12-09 21:10 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-08 12:09 [RFC v1 0/2] migration: reduce bitmap sync time and make dirty pages converge much more easily Chuang Xu
2025-12-08 12:09 ` [RFC v1 1/2] vhost: eliminate duplicate dirty_bitmap sync when log shared by multiple devices Chuang Xu
2025-12-09 20:47 ` Peter Xu
2025-12-10 6:51 ` Jason Wang
2025-12-10 13:52 ` Chuang Xu
2025-12-08 12:09 ` [RFC v1 2/2] migration: merge fragmented clear_dirty ioctls Chuang Xu
2025-12-09 21:10 ` Peter Xu [this message]
2025-12-10 14:18 ` Chuang Xu
2025-12-10 19:45 ` Peter Xu
2025-12-09 21:06 ` [RFC v1 0/2] migration: reduce bitmap sync time and make dirty pages converge much more easily Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTiQK-a-ZcANCFbk@x1.local \
--to=peterx@redhat.com \
--cc=david@kernel.org \
--cc=farosas@suse.de \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=sgarzare@redhat.com \
--cc=xuchuangxclwt@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).