From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Yong Huang <yong.huang@smartx.com>,
qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
Wei Wang <wei.w.wang@intel.com>,
"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration
Date: Wed, 13 Nov 2024 19:49:44 +0100
Message-ID: <382461ab-d620-4d2e-becd-720daadf3c55@redhat.com>
In-Reply-To: <ZzTkopUrLGL5iqSv@x1n>
On 13.11.24 18:40, Peter Xu wrote:
> On Tue, Nov 12, 2024 at 11:08:44AM +0100, David Hildenbrand wrote:
>> On 11.11.24 12:37, Yong Huang wrote:
>>>
>>>
>>> On Mon, Nov 11, 2024 at 6:42 PM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 11.11.24 11:08, Yong Huang wrote:
>>> >
>>> >
>>> > On Mon, Nov 11, 2024 at 5:27 PM David Hildenbrand <david@redhat.com> wrote:
>>> >
>>> > On 09.11.24 05:59, Hyman Huang wrote:
>>> > > The first iteration's RAMBlock dirty sync can be omitted because QEMU
>>> > > always initializes the RAMBlock's bmap to all 1s by default.
>>> > >
>>> > > Signed-off-by: Hyman Huang <yong.huang@smartx.com>
>>> > > ---
>>> > > migration/cpu-throttle.c | 2 +-
>>> > > migration/ram.c | 11 ++++++++---
>>> > > 2 files changed, 9 insertions(+), 4 deletions(-)
>>> > >
>>> > > diff --git a/migration/cpu-throttle.c b/migration/cpu-throttle.c
>>> > > index 5179019e33..674dc2004e 100644
>>> > > --- a/migration/cpu-throttle.c
>>> > > +++ b/migration/cpu-throttle.c
>>> > > @@ -141,7 +141,7 @@ void cpu_throttle_dirty_sync_timer_tick(void *opaque)
>>> > > * effect on guest performance, therefore omit it to avoid
>>> > > * paying extra for the sync penalty.
>>> > > */
>>> > > - if (sync_cnt <= 1) {
>>> > > + if (!sync_cnt) {
>>> > > goto end;
>>> > > }
>>> > >
>>> > > diff --git a/migration/ram.c b/migration/ram.c
>>> > > index 05ff9eb328..571dba10b7 100644
>>> > > --- a/migration/ram.c
>>> > > +++ b/migration/ram.c
>>> > > @@ -2718,7 +2718,7 @@ static void ram_list_init_bitmaps(void)
>>> > > {
>>> > > MigrationState *ms = migrate_get_current();
>>> > > RAMBlock *block;
>>> > > - unsigned long pages;
>>> > > + unsigned long pages, clear_bmap_pages;
>>> > > uint8_t shift;
>>> > >
>>> > > /* Skip setting bitmap if there is no RAM */
>>> > > @@ -2736,6 +2736,7 @@ static void ram_list_init_bitmaps(void)
>>> > >
>>> > > RAMBLOCK_FOREACH_NOT_IGNORED(block) {
>>> > > pages = block->max_length >> TARGET_PAGE_BITS;
>>> > > + clear_bmap_pages = clear_bmap_size(pages, shift);
>>> > > /*
>>> > > * The initial dirty bitmap for migration must be set with all
>>> > > * ones to make sure we'll migrate every guest RAM page to
>>> > > @@ -2751,7 +2752,12 @@ static void ram_list_init_bitmaps(void)
>>> > > block->file_bmap = bitmap_new(pages);
>>> > > }
>>> > > block->clear_bmap_shift = shift;
>>> > > - block->clear_bmap = bitmap_new(clear_bmap_size(pages, shift));
>>> > > + block->clear_bmap = bitmap_new(clear_bmap_pages);
>>> > > + /*
>>> > > + * Set clear_bmap to 1 unconditionally, as we always set bmap
>>> > > + * to all 1s by default.
>>> > > + */
>>> > > + bitmap_set(block->clear_bmap, 0, clear_bmap_pages);
>>> > > }
>>> > > }
>>> > > }
>>> > > @@ -2783,7 +2789,6 @@ static bool ram_init_bitmaps(RAMState *rs, Error **errp)
>>> > > if (!ret) {
>>> > > goto out_unlock;
>>> > > }
>>> > > - migration_bitmap_sync_precopy(false);
>>> > > }
>>> > > }
>>> > > out_unlock:
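
For reference, this is how the clear_bmap part of ram_list_init_bitmaps()
would read with the hunk above applied; it is reconstructed purely from the
quoted diff context ("..." marks code the diff does not show), not from the
actual tree:

  static void ram_list_init_bitmaps(void)
  {
      MigrationState *ms = migrate_get_current();
      RAMBlock *block;
      unsigned long pages, clear_bmap_pages;
      uint8_t shift;
      ...
      RAMBLOCK_FOREACH_NOT_IGNORED(block) {
          pages = block->max_length >> TARGET_PAGE_BITS;
          clear_bmap_pages = clear_bmap_size(pages, shift);
          ...
          block->clear_bmap_shift = shift;
          block->clear_bmap = bitmap_new(clear_bmap_pages);
          /*
           * Set clear_bmap to 1 unconditionally, as we always set bmap
           * to all 1s by default.
           */
          bitmap_set(block->clear_bmap, 0, clear_bmap_pages);
      }
  }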
>>> >
>>> >
>>> > For virtio-mem, we rely on the migration_bitmap_clear_discarded_pages()
>>> > call to clear all bits that correspond to unplugged memory ranges.
>>> >
>>> > If we omit the sync, we can likely have bits of unplugged ranges still
>>> > set to "1", meaning we would try to migrate them later, although we
>>> > shouldn't?
>>> >
>>> >
>>> >
>>> > IIUC, migration_bitmap_clear_discarded_pages() is still called at the end
>>> > of ram_init_bitmaps() no matter if we omit the first sync.
>>> >
>>> > The PRECOPY_NOTIFY_SETUP notification is sent out at the end of
>>> > ram_save_setup() (which calls ram_list_init_bitmaps()), so when
>>> > virtio_balloon_free_page_start() is called,
>>> > migration_bitmap_clear_discarded_pages() has already completed and the
>>> > bmap has been correctly cleared.
>>> >
>>> > ram_save_setup
>>> > -> ram_list_init_bitmaps
>>> > -> migration_bitmap_clear_discarded_pages
>>> > -> return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>>> >
>>> > You can double check it.
>>>
>>> That's not my concern, let me clarify :)
>>>
>>>
>>> Assume in KVM the bitmap is all 1s ("everything dirty").
>>>
>>> In current code, we will sync the bitmap once (IIRC, clearing any dirty
>>> bits from KVM).
>>>
>>>
>>> With the old logic, write-protecting and clearing dirty bits are both done
>>> in the KVM_GET_DIRTY_LOG API, while with the
>>> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 feature enabled, clearing the dirty
>>> bits is postponed to the KVM_CLEAR_DIRTY_LOG API, which is called right
>>> before page sending in the migration thread in QEMU.
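
To make that two-step protocol a bit more concrete, here is a rough,
stand-alone sketch against the KVM UAPI (QEMU's own wrappers live in
accel/kvm/kvm-all.c); it is only an illustration, skips enabling
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2, and does no error handling:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>
  #include <stdint.h>

  static void sync_then_clear(int vm_fd, uint32_t slot, void *bitmap,
                              uint64_t first_page, uint32_t num_pages)
  {
      /* Step 1: fetch the dirty bitmap for this memslot.  With the
       * manual-clear capability enabled, this no longer resets the
       * dirty/write-protect state in KVM by itself. */
      struct kvm_dirty_log log = {
          .slot = slot,
          .dirty_bitmap = bitmap,
      };
      ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);

      /* ... the migration thread picks dirty pages out of 'bitmap' ... */

      /* Step 2: right before (re)sending a range, clear its dirty state
       * in KVM so that subsequent guest writes are tracked again. */
      struct kvm_clear_dirty_log clear = {
          .slot = slot,
          .first_page = first_page,
          .num_pages = num_pages,
          .dirty_bitmap = bitmap,
      };
      ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
  }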
>>>
>>>
>>> Then we call migration_bitmap_clear_discarded_pages() to clear all
>>> "discarded" pages that we shouldn't touch.
>>>
>>> When we do the next bitmap sync, we will not get a "1" for discarded
>>> ranges, and we will never try migrating discarded ranges.
>>>
>>>
>>> With your patch, we're omitting the first sync. Could we possibly get
>>> discarded ranges reported from KVM as dirty during the "now first" sync
>>> *after* the migration_bitmap_clear_discarded_pages() call, and try
>>> migrating discarded ranges?
>>>
>>> I did not dive deep into the code, maybe
>>> migration_bitmap_clear_discarded_pages() ends up clearing the bits in KVM
>>> as well.
>>>
>>>
>>> Yes, migration_bitmap_clear_discarded_pages() clears the bits in
>>> KVM via:
>>> ramblock_dirty_bitmap_clear_discarded_pages
>>> -> dirty_bitmap_clear_section
>>> -> migration_clear_memory_region_dirty_bitmap_range
>>> -> migration_clear_memory_region_dirty_bitmap
>>> -> memory_region_clear_dirty_bitmap
>>> -> KVM_CLEAR_DIRTY_LOG ioctl
>>>
>>
>> I just tried, and your patch breaks virtio-mem migration as I suspected.
>>
>> sudo build/qemu-system-x86_64 \
>> --enable-kvm \
>> -m 16G,maxmem=24G \
>> -object memory-backend-ram,id=mem1,size=16G \
>> -machine q35,memory-backend=mem1 \
>> -cpu max \
>> -smp 16 \
>> -nographic \
>> -nodefaults \
>> -net nic -net user \
>> -chardev stdio,nosignal,id=serial \
>> -hda Fedora-Server-KVM-40-1.14.x86_64.qcow2 \
>> -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
>> -device isa-serial,chardev=serial \
>> -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
>> -mon chardev=monitor,mode=readline \
>> -device pcie-root-port,id=root,slot=0 \
>> -object memory-backend-file,share=on,mem-path=/dev/shm/vm,id=mem2,size=8G \
>> -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=16M,bus=root,dynamic-memslots=on,prealloc=on \
>>
>>
>> Once the VM has booted up, we're consuming 16M, as expected:
>>
>>
>> $ stat /dev/shm/vm
>>   File: /dev/shm/vm
>>   Size: 8589934592   Blocks: 32768   IO Block: 4096   regular file
>> Device: 0/25   Inode: 2087   Links: 1
>> tmpfs tmpfs 16G 16M 16G 1% /dev/shm
>>
>>
>> Let's start a migration:
>>
>> $ echo "migrate exec:cat>state" | sudo nc -U /var/tmp/mon_src
>>
>>
>> ... and we end up reading discarded memory:
>>
>> $ LANG=C df -ahT | grep /dev/shm
>> tmpfs tmpfs 16G 8.0G 7.6G 52% /dev/shm
>>
>>
>>
>> Running with TCG only also doesn't work. So somewhere, we have a bitmap filled with
>> all 1s that is not cleared if we drop the first sync.
>
> Hmm, I'm not yet sure why this happened, but indeed this reminds me that at
> least vhost can have a similar issue: when vhost devices are used, vhost has
> its own bitmap, so it can keep having 1s in the unplugged regions when it is
> reported the first time.
I'm also surprised that it triggers even with TCG. Somewhere a bitmap with
all 1s seems to be hiding :)
>
> Is virtio-mem plug/unplug allowed during migration?  I'm wondering whether
> the below could happen while migration is in progress:
>
> migration starts..
> bitmap init, discard all unplugged mem in dirty bmap
> plug mem region X, dirty some pages
> unplug mem region X
> dirty sync, reports mem region X dirty (even though unplugged..)
> ...
No, for this (and other) reasons virtio_mem_is_busy() checks for
"migration_in_incoming_postcopy() || migration_is_running();" and
rejects any memory plug/unplug requests.
So the discarded state is stable while migration is running.
>
> So if unplugged mem should never be touched by QEMU, then I'm not sure
> whether the above can trigger this case too.
>
> With or without the above, I wonder if migration_bitmap_clear_discarded_pages()
> shouldn't rely on the initial sync of the dirty bitmap; instead it could be
> done after each global sync: either another log_global_after_sync() hook,
> or just move it over into migration_bitmap_sync().
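
A very rough sketch of that second option (not a tested patch; the body of
migration_bitmap_sync() in migration/ram.c is abbreviated to comments,
current signatures are assumed, and only the placement of the call is the
point):

  static void migration_bitmap_sync(RAMState *rs, bool last_stage)
  {
      /* ... existing code: walk the RAM blocks and pull the dirty bits
       * in from the memory core / KVM ... */

      /*
       * Unplugged (discarded) ranges must never be migrated, so drop
       * any bits the sync may have (re)set for them -- after every
       * sync, not only after the initial one.
       */
      migration_bitmap_clear_discarded_pages(rs);

      /* ... existing code: update dirty rates, trigger throttling,
       * accounting, etc. ... */
  }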
I think I had precisely that, and I recall you suggested having it only
after the initial sync. That would work for me, but I'd still like to
understand why essentially none of the "discard" was effective -- all of
guest RAM got touched.
--
Cheers,
David / dhildenb