From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Yong Huang <yong.huang@smartx.com>,
	qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Wei Wang <wei.w.wang@intel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration
Date: Wed, 13 Nov 2024 19:49:44 +0100
Message-ID: <382461ab-d620-4d2e-becd-720daadf3c55@redhat.com>
In-Reply-To: <ZzTkopUrLGL5iqSv@x1n>

On 13.11.24 18:40, Peter Xu wrote:
> On Tue, Nov 12, 2024 at 11:08:44AM +0100, David Hildenbrand wrote:
>> On 11.11.24 12:37, Yong Huang wrote:
>>>
>>>
>>> On Mon, Nov 11, 2024 at 6:42 PM David Hildenbrand <david@redhat.com
>>> <mailto:david@redhat.com>> wrote:
>>>
>>>      On 11.11.24 11:08, Yong Huang wrote:
>>>       >
>>>       >
>>>       > On Mon, Nov 11, 2024 at 5:27 PM David Hildenbrand
>>>      <david@redhat.com <mailto:david@redhat.com>
>>>       > <mailto:david@redhat.com <mailto:david@redhat.com>>> wrote:
>>>       >
>>>       >     On 09.11.24 05:59, Hyman Huang wrote:
>>>       >      > The first iteration's RAMBlock dirty sync can be omitted
>>>      because QEMU
>>>       >      > always initializes the RAMBlock's bmap to all 1s by default.
>>>       >      >
>>>       >      > Signed-off-by: Hyman Huang <yong.huang@smartx.com
>>>      <mailto:yong.huang@smartx.com>
>>>       >     <mailto:yong.huang@smartx.com <mailto:yong.huang@smartx.com>>>
>>>       >      > ---
>>>       >      >   migration/cpu-throttle.c |  2 +-
>>>       >      >   migration/ram.c          | 11 ++++++++---
>>>       >      >   2 files changed, 9 insertions(+), 4 deletions(-)
>>>       >      >
>>>       >      > diff --git a/migration/cpu-throttle.c b/migration/cpu-
>>>      throttle.c
>>>       >      > index 5179019e33..674dc2004e 100644
>>>       >      > --- a/migration/cpu-throttle.c
>>>       >      > +++ b/migration/cpu-throttle.c
>>>       >      > @@ -141,7 +141,7 @@ void
>>>      cpu_throttle_dirty_sync_timer_tick(void
>>>       >     *opaque)
>>>       >      >        * effect on guest performance, therefore omit it to
>>>      avoid
>>>       >      >        * paying extra for the sync penalty.
>>>       >      >        */
>>>       >      > -    if (sync_cnt <= 1) {
>>>       >      > +    if (!sync_cnt) {
>>>       >      >           goto end;
>>>       >      >       }
>>>       >      >
>>>       >      > diff --git a/migration/ram.c b/migration/ram.c
>>>       >      > index 05ff9eb328..571dba10b7 100644
>>>       >      > --- a/migration/ram.c
>>>       >      > +++ b/migration/ram.c
>>>       >      > @@ -2718,7 +2718,7 @@ static void ram_list_init_bitmaps(void)
>>>       >      >   {
>>>       >      >       MigrationState *ms = migrate_get_current();
>>>       >      >       RAMBlock *block;
>>>       >      > -    unsigned long pages;
>>>       >      > +    unsigned long pages, clear_bmap_pages;
>>>       >      >       uint8_t shift;
>>>       >      >
>>>       >      >       /* Skip setting bitmap if there is no RAM */
>>>       >      > @@ -2736,6 +2736,7 @@ static void ram_list_init_bitmaps(void)
>>>       >      >
>>>       >      >           RAMBLOCK_FOREACH_NOT_IGNORED(block) {
>>>       >      >               pages = block->max_length >> TARGET_PAGE_BITS;
>>>       >      > +            clear_bmap_pages = clear_bmap_size(pages, shift);
>>>       >      >               /*
>>>       >      >                * The initial dirty bitmap for migration
>>>      must be
>>>       >     set with all
>>>       >      >                * ones to make sure we'll migrate every
>>>      guest RAM
>>>       >     page to
>>>       >      > @@ -2751,7 +2752,12 @@ static void ram_list_init_bitmaps(void)
>>>       >      >                   block->file_bmap = bitmap_new(pages);
>>>       >      >               }
>>>       >      >               block->clear_bmap_shift = shift;
>>>       >      > -            block->clear_bmap =
>>>       >     bitmap_new(clear_bmap_size(pages, shift));
>>>       >      > +            block->clear_bmap = bitmap_new(clear_bmap_pages);
>>>       >      > +            /*
>>>       >      > +             * Set clear_bmap to 1 unconditionally, as we
>>>      always
>>>       >     set bmap
>>>       >      > +             * to all 1s by default.
>>>       >      > +             */
>>>       >      > +            bitmap_set(block->clear_bmap, 0,
>>>      clear_bmap_pages);
>>>       >      >           }
>>>       >      >       }
>>>       >      >   }
>>>       >      > @@ -2783,7 +2789,6 @@ static bool
>>>      ram_init_bitmaps(RAMState *rs,
>>>       >     Error **errp)
>>>       >      >               if (!ret) {
>>>       >      >                   goto out_unlock;
>>>       >      >               }
>>>       >      > -            migration_bitmap_sync_precopy(false);
>>>       >      >           }
>>>       >      >       }
>>>       >      >   out_unlock:
>>>       >
>>>       >
>>>       >     For virtio-mem, we rely on the
>>>      migration_bitmap_clear_discarded_pages()
>>>       >     call to clear all bits that correspond to unplugged memory
>>>      ranges.
>>>       >
>>>       >
>>>       >     If we omit the sync, we can likely have bits of unplugged
>>>      ranges still
>>>       >     set to "1", meaning we would try to migrate them later, although we
>>>       >     shouldn't?
>>>       >
>>>       >
>>>       >
>>>       > IIUC, migration_bitmap_clear_discarded_pages is still called at
>>>      the end of
>>>       > ram_init_bitmaps no matter if we omit the first sync.
>>>       >
>>>       > PRECOPY_NOTIFY_SETUP notification is sent out at the end of
>>>       > ram_save_setup (ram_list_init_bitmaps), when
>>>       > virtio_balloon_free_page_start() is
>>>       > called, migration_bitmap_clear_discarded_pages() has already
>>>      completed
>>>       > and the
>>>       > bmap has been correctly cleared.
>>>       >
>>>       > ram_save_setup
>>>       >     -> ram_list_init_bitmaps
>>>       >         -> migration_bitmap_clear_discarded_pages
>>>       >      -> return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
>>>       >
>>>       > You can double check it.
>>>
>>>      That's not my concern, let me clarify :)
>>>
>>>
>>>      Assume in KVM the bitmap is all 1s ("everything dirty").
>>>
>>>      In current code, we will sync the bitmap once (IIRC, clearing any dirty
>>>      bits from KVM).
>>>
>>>
>>> For the old logic, write-protecting and clearing dirty bits are both done in
>>> the KVM_GET_DIRTY_LOG API, while with the
>>> KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 feature enabled, clearing
>>> dirty bits is postponed to the KVM_CLEAR_DIRTY_LOG API, which
>>> is called right before page sending in the migration thread in QEMU.
>>>
>>>
>>>      Then we call migration_bitmap_clear_discarded_pages() to clear all
>>>      "discarded" pages that we shouldn't touch.
>>>
>>>      When we do the next bitmap sync, we will not get a "1" for discarded
>>>      ranges, and we will never try migrating discarded ranges.
>>>
>>>
>>>      With your patch, we're omitting the first sync. Could we possibly get
>>>      discarded ranges reported from KVM as dirty during the "now first" sync
>>>      *after* the migration_bitmap_clear_discarded_pages() call, and try
>>>      migrating discarded ranges?
>>>
>>>      I did not dive deep into the code, maybe
>>>      migration_bitmap_clear_discarded_pages() ends up clearing the bits in KVM as well?
>>>
>>>
>>> Yes, migration_bitmap_clear_discarded_pages() clears the bits in
>>> KVM in:
>>> ramblock_dirty_bitmap_clear_discarded_pages
>>>       -> dirty_bitmap_clear_section
>>>           -> migration_clear_memory_region_dirty_bitmap_range
>>>               -> migration_clear_memory_region_dirty_bitmap
>>>                   -> memory_region_clear_dirty_bitmap
>>>                       -> KVM_CLEAR_DIRTY_LOG ioctl
>>>
>>
>> I just tried, and your patch breaks virtio-mem migration as I suspected.
>>
>> sudo build/qemu-system-x86_64 \
>>      --enable-kvm \
>>      -m 16G,maxmem=24G \
>>      -object memory-backend-ram,id=mem1,size=16G \
>>      -machine q35,memory-backend=mem1 \
>>      -cpu max \
>>      -smp 16 \
>>      -nographic \
>>      -nodefaults \
>>      -net nic -net user \
>>      -chardev stdio,nosignal,id=serial \
>>      -hda Fedora-Server-KVM-40-1.14.x86_64.qcow2 \
>>      -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
>>      -device isa-serial,chardev=serial \
>>      -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
>>      -mon chardev=monitor,mode=readline \
>>      -device pcie-root-port,id=root,slot=0 \
>>      -object memory-backend-file,share=on,mem-path=/dev/shm/vm,id=mem2,size=8G \
>>      -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=16M,bus=root,dynamic-memslots=on,prealloc=on \
>>
>>
>> Once the VM booted up, as expected we're consuming 16M
>>
>>
>> $ stat /dev/shm/vm
>>   File: /dev/shm/vm
>>   Size: 8589934592      Blocks: 32768      IO Block: 4096   regular file
>> Device: 0/25     Inode: 2087        Links: 1
>> tmpfs                   tmpfs             16G   16M   16G   1% /dev/shm
>>
>>
>> Let's start a migration:
>>
>> $ echo "migrate exec:cat>state" | sudo nc -U /var/tmp/mon_src
>>
>>
>> ... and we end up reading discarded memory:
>>
>> $ LANG=C df -ahT  | grep /dev/shm
>> tmpfs                   tmpfs             16G  8.0G  7.6G  52% /dev/shm
>>
>>
>>
>> Running with TCG only also doesn't work. So somewhere, we have a bitmap filled with
>> all 1s that is not cleared if we drop the first sync.
> 
> Hmm, I'm not yet sure why this happened, but indeed this reminds me that at
> least vhost can have a similar issue: when vhost devices are used, vhost keeps
> its own bitmap, so it can still have 1s in the unplugged regions when that
> bitmap is reported the first time.

I'm also surprised that it triggers even with TCG. Somewhere, a bitmap 
with all 1s seems to be hiding :)
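
To make that suspicion concrete, here is a toy model (plain C, all names 
made up, not QEMU code) of how a secondary all-1s bitmap that only gets 
folded into the migration bitmap during a sync resurrects discarded ranges 
once the first sync is skipped:

/* Toy model only -- not QEMU code, all names are made up. */
#include <stdio.h>
#include <string.h>

#define PAGES 8

static unsigned char secondary[PAGES]; /* e.g. a device/global log, starts all 1s */
static unsigned char bmap[PAGES];      /* per-RAMBlock migration bitmap, starts all 1s */

/* A dirty sync folds the secondary log into the migration bitmap. */
static void sync_bitmaps(void)
{
    for (int i = 0; i < PAGES; i++) {
        bmap[i] |= secondary[i];
        secondary[i] = 0;
    }
}

/* Models migration_bitmap_clear_discarded_pages(): clears bmap only,
 * not the secondary log. */
static void clear_discarded(int first, int last)
{
    for (int i = first; i <= last; i++) {
        bmap[i] = 0;
    }
}

int main(void)
{
    memset(secondary, 1, sizeof(secondary));
    memset(bmap, 1, sizeof(bmap));

    clear_discarded(4, 7);  /* pages 4..7 are unplugged/discarded */
    sync_bitmaps();         /* the "now first" sync happens after the clear... */
    printf("page 4 dirty again: %d\n", bmap[4]); /* ...and prints 1 */
    return 0;
}

With the old ordering (sync first, then clear), pages 4..7 would have 
stayed clear and never been migrated.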

> 
> Is virtio-mem plug/unplug allowed during migration?  I'm wondering whether
> below could happen while migration in progress:
> 
>    migration starts..
>    bitmap init, discard all unplugged mem in dirty bmap
>    plug mem region X, dirty some pages
>    unplug mem region X
>    dirty sync, reports mem region X dirty (even though unplugged..)
>    ...

No, for this (and other) reasons virtio_mem_is_busy() checks for 
"migration_in_incoming_postcopy() || migration_is_running();" and 
rejects any memory plug/unplug requests.

So the discarded state is stable while migration is running.
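
For reference, the check is roughly the following (a paraphrased sketch of 
hw/virtio/virtio-mem.c, not copied verbatim from this exact tree):

static bool virtio_mem_is_busy(void)
{
    /*
     * Postcopy cannot handle concurrent discards, and for precopy we don't
     * want unplugged blocks in the migration stream, so any plug/unplug
     * request is refused while migration is in flight.
     */
    return migration_in_incoming_postcopy() || migration_is_running();
}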

> 
> So if unplugged mem should never be touched by qemu, then I'm not sure
> whether the above can trigger this case too.
>
> With/without the above, I wonder whether migration_bitmap_clear_discarded_pages()
> shouldn't rely on the initial sync of the dirty bitmap; instead it could be
> done after each global sync: either another log_global_after_sync() hook,
> or just moved over into migration_bitmap_sync().

I think I had precisely that, and I recall you suggested having it only 
after the initial sync. That would work for me, but I'd still like to 
understand why essentially none of the "discard" was effective -- all of 
guest RAM got touched.
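
If we go back to that, a rough sketch (untested; signatures from memory, 
not checked against this exact tree) would be to re-run the clearing at 
the end of every sync:

static void migration_bitmap_sync(RAMState *rs, bool last_stage)
{
    RAMBlock *block;

    /* ... existing accounting and memory_global_dirty_log_sync() ... */

    WITH_RCU_READ_LOCK_GUARD() {
        RAMBLOCK_FOREACH_NOT_IGNORED(block) {
            ramblock_sync_dirty_bitmap(rs, block);
        }
    }

    /*
     * Drop again any bits the sync re-set inside discarded (e.g.
     * virtio-mem unplugged) ranges.
     */
    migration_bitmap_clear_discarded_pages(rs);

    /* ... existing threshold/throttle handling ... */
}

A log_global_after_sync() hook would achieve the same without touching 
ram.c.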

-- 
Cheers,

David / dhildenb




Thread overview: 22+ messages
2024-11-09  4:59 [PATCH v1 0/2] migration: Skip the first dirty sync Hyman Huang
2024-11-09  4:59 ` [PATCH v1 1/2] virtio-balloon: Enable free page hinting during PRECOPY_NOTIFY_SETUP Hyman Huang
2024-11-12 10:10   ` David Hildenbrand
2024-11-09  4:59 ` [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration Hyman Huang
2024-11-11  9:07   ` Wang, Wei W
2024-11-11 10:20     ` Yong Huang
2024-11-11  9:27   ` David Hildenbrand
2024-11-11 10:08     ` Yong Huang
2024-11-11 10:42       ` David Hildenbrand
2024-11-11 11:14         ` Yong Huang
2024-11-11 11:37         ` Yong Huang
2024-11-12 10:08           ` David Hildenbrand
2024-11-13 17:40             ` Peter Xu
2024-11-13 18:49               ` David Hildenbrand [this message]
2024-11-13 20:12                 ` Peter Xu
2024-11-14  9:02                   ` David Hildenbrand
2024-11-14 19:28                     ` Peter Xu
2024-11-14 21:16                       ` David Hildenbrand
2024-11-14 22:40                         ` Peter Xu
2024-11-15  9:11                           ` David Hildenbrand
2024-11-15 15:41                             ` Peter Xu
2024-11-15 15:49                               ` David Hildenbrand
