qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Yong Huang <yong.huang@smartx.com>,
	qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Wei Wang <wei.w.wang@intel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration
Date: Wed, 13 Nov 2024 12:40:50 -0500	[thread overview]
Message-ID: <ZzTkopUrLGL5iqSv@x1n> (raw)
In-Reply-To: <3049bc19-2556-4fbf-9d34-578db523b63b@redhat.com>

On Tue, Nov 12, 2024 at 11:08:44AM +0100, David Hildenbrand wrote:
> On 11.11.24 12:37, Yong Huang wrote:
> > 
> > 
> > On Mon, Nov 11, 2024 at 6:42 PM David Hildenbrand <david@redhat.com
> > <mailto:david@redhat.com>> wrote:
> > 
> >     On 11.11.24 11:08, Yong Huang wrote:
> >      >
> >      >
> >      > On Mon, Nov 11, 2024 at 5:27 PM David Hildenbrand
> >     <david@redhat.com <mailto:david@redhat.com>
> >      > <mailto:david@redhat.com <mailto:david@redhat.com>>> wrote:
> >      >
> >      >     On 09.11.24 05:59, Hyman Huang wrote:
> >      >      > The first iteration's RAMBlock dirty sync can be omitted
> >     because QEMU
> >      >      > always initializes the RAMBlock's bmap to all 1s by default.
> >      >      >
> >      >      > Signed-off-by: Hyman Huang <yong.huang@smartx.com
> >     <mailto:yong.huang@smartx.com>
> >      >     <mailto:yong.huang@smartx.com <mailto:yong.huang@smartx.com>>>
> >      >      > ---
> >      >      >   migration/cpu-throttle.c |  2 +-
> >      >      >   migration/ram.c          | 11 ++++++++---
> >      >      >   2 files changed, 9 insertions(+), 4 deletions(-)
> >      >      >
> >      >      > diff --git a/migration/cpu-throttle.c b/migration/cpu-
> >     throttle.c
> >      >      > index 5179019e33..674dc2004e 100644
> >      >      > --- a/migration/cpu-throttle.c
> >      >      > +++ b/migration/cpu-throttle.c
> >      >      > @@ -141,7 +141,7 @@ void
> >     cpu_throttle_dirty_sync_timer_tick(void
> >      >     *opaque)
> >      >      >        * effect on guest performance, therefore omit it to
> >     avoid
> >      >      >        * paying extra for the sync penalty.
> >      >      >        */
> >      >      > -    if (sync_cnt <= 1) {
> >      >      > +    if (!sync_cnt) {
> >      >      >           goto end;
> >      >      >       }
> >      >      >
> >      >      > diff --git a/migration/ram.c b/migration/ram.c
> >      >      > index 05ff9eb328..571dba10b7 100644
> >      >      > --- a/migration/ram.c
> >      >      > +++ b/migration/ram.c
> >      >      > @@ -2718,7 +2718,7 @@ static void ram_list_init_bitmaps(void)
> >      >      >   {
> >      >      >       MigrationState *ms = migrate_get_current();
> >      >      >       RAMBlock *block;
> >      >      > -    unsigned long pages;
> >      >      > +    unsigned long pages, clear_bmap_pages;
> >      >      >       uint8_t shift;
> >      >      >
> >      >      >       /* Skip setting bitmap if there is no RAM */
> >      >      > @@ -2736,6 +2736,7 @@ static void ram_list_init_bitmaps(void)
> >      >      >
> >      >      >           RAMBLOCK_FOREACH_NOT_IGNORED(block) {
> >      >      >               pages = block->max_length >> TARGET_PAGE_BITS;
> >      >      > +            clear_bmap_pages = clear_bmap_size(pages, shift);
> >      >      >               /*
> >      >      >                * The initial dirty bitmap for migration
> >     must be
> >      >     set with all
> >      >      >                * ones to make sure we'll migrate every
> >     guest RAM
> >      >     page to
> >      >      > @@ -2751,7 +2752,12 @@ static void ram_list_init_bitmaps(void)
> >      >      >                   block->file_bmap = bitmap_new(pages);
> >      >      >               }
> >      >      >               block->clear_bmap_shift = shift;
> >      >      > -            block->clear_bmap =
> >      >     bitmap_new(clear_bmap_size(pages, shift));
> >      >      > +            block->clear_bmap = bitmap_new(clear_bmap_pages);
> >      >      > +            /*
> >      >      > +             * Set clear_bmap to 1 unconditionally, as we
> >     always
> >      >     set bmap
> >      >      > +             * to all 1s by default.
> >      >      > +             */
> >      >      > +            bitmap_set(block->clear_bmap, 0,
> >     clear_bmap_pages);
> >      >      >           }
> >      >      >       }
> >      >      >   }
> >      >      > @@ -2783,7 +2789,6 @@ static bool
> >     ram_init_bitmaps(RAMState *rs,
> >      >     Error **errp)
> >      >      >               if (!ret) {
> >      >      >                   goto out_unlock;
> >      >      >               }
> >      >      > -            migration_bitmap_sync_precopy(false);
> >      >      >           }
> >      >      >       }
> >      >      >   out_unlock:
> >      >
> >      >
> >      >     For virtio-mem, we rely on the
> >     migration_bitmap_clear_discarded_pages()
> >      >     call to clear all bits that correspond to unplugged memory
> >     ranges.
> >      >
> >      >
> >      >     If we omit the sync, we can likely have bits of unplugged
> >     ranges still
> >      >     set to "1", meaning we would try migrate them later, although we
> >      >     shouldn't?
> >      >
> >      >
> >      >
> >      > IIUC, migration_bitmap_clear_discarded_pages() is still called at
> >      > the end of ram_init_bitmaps no matter if we omit the first sync.
> >      >
> >      > The PRECOPY_NOTIFY_SETUP notification is sent out at the end of
> >      > ram_save_setup (ram_list_init_bitmaps), so when
> >      > virtio_balloon_free_page_start() is called,
> >      > migration_bitmap_clear_discarded_pages() has already completed
> >      > and the bmap has been correctly cleared.
> >      >
> >      > ram_save_setup
> >      >     -> ram_list_init_bitmaps
> >      >         -> migration_bitmap_clear_discarded_pages
> >      >      -> return precopy_notify(PRECOPY_NOTIFY_SETUP, errp);
> >      >
> >      > You can double check it.
> > 
> >     That's not my concern, let me clarify :)
> > 
> > 
> >     Assume in KVM the bitmap is all 1s ("everything dirty").
> > 
> >     In current code, we will sync the bitmap once (IIRC, clearing any dirty
> >     bits from KVM).
> > 
> > 
> > For the old logic, write-protect and clear dirty bits are all done in
> > the KVM_GET_DIRTY_LOG API, while with
> > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 feature enabled, clearing
> > dirty bits are postponed in the KVM_CLEAR_DIRTY_LOG API, which
> > is called right before page sending in the migration thread in QEMU.
> > 
> > 
> >     Then we call migration_bitmap_clear_discarded_pages() to clear all
> >     "discarded" pages that we shouldn't touch.
> > 
> >     When we do the next bitmap sync, we will not get a "1" for discarded
> >     ranges, and we will never try migrating discarded ranges.
> > 
> > 
> >     With your patch, we're omitting the first sync. Could we possibly get
> >     discarded ranges reported from KVM as dirty during the "now first" sync
> >     *after* the migration_bitmap_clear_discarded_pages() call, and try
> >     migrating discarded ranges?
> > 
> >     I did not dive deep into the code, maybe
> >     migration_bitmap_clear_discarded_pages() ends up clearing the bits in
> > 
> > 
> > Yes, the migration_bitmap_clear_discarded_pages clear the bits in
> > KVM in:
> > ramblock_dirty_bitmap_clear_discarded_pages
> >      -> dirty_bitmap_clear_section
> >          -> migration_clear_memory_region_dirty_bitmap_range
> >              -> migration_clear_memory_region_dirty_bitmap
> >                  -> memory_region_clear_dirty_bitmap
> >                      -> KVM_CLEAR_DIRTY_LOG ioctl
> > 
> 
> I just tried, and your patch breaks virtio-mem migration as I suspected.
> 
> sudo build/qemu-system-x86_64 \
>     --enable-kvm \
>     -m 16G,maxmem=24G \
>     -object memory-backend-ram,id=mem1,size=16G \
>     -machine q35,memory-backend=mem1 \
>     -cpu max \
>     -smp 16 \
>     -nographic \
>     -nodefaults \
>     -net nic -net user \
>     -chardev stdio,nosignal,id=serial \
>     -hda Fedora-Server-KVM-40-1.14.x86_64.qcow2 \
>     -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
>     -device isa-serial,chardev=serial \
>     -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
>     -mon chardev=monitor,mode=readline \
>     -device pcie-root-port,id=root,slot=0 \
>     -object memory-backend-file,share=on,mem-path=/dev/shm/vm,id=mem2,size=8G \
>     -device virtio-mem-pci,id=vmem0,memdev=mem2,requested-size=16M,bus=root,dynamic-memslots=on,prealloc=on \
> 
> 
> Once the VM booted up, as expected we're consuming 16M
> 
> 
> $ stat /dev/shm/vm
>  File: /dev/shm/vm
>  Size: 8589934592      Blocks: 32768      IO Block: 4096   regular file
> Device: 0/25     Inode: 2087        Links: 1
> tmpfs                   tmpfs             16G   16M   16G   1% /dev/shm
> 
> 
> Let's start a migration:
> 
> $ echo "migrate exec:cat>state" | sudo nc -U /var/tmp/mon_src
> 
> 
> ... and we end up reading discarded memory:
> 
> $ LANG=C df -ahT  | grep /dev/shm
> tmpfs                   tmpfs             16G  8.0G  7.6G  52% /dev/shm
> 
> 
> 
> Running with TCG only also doesn't work. So somewhere, we have a bitmap filled with
> all 1s that is not cleared if we drop the first sync.

Hmm, I'm not yet sure why this happened, but it does remind me that at
least vhost can have a similar issue: when vhost devices are used, each
device keeps its own dirty bitmap, so it can still report 1s for the
unplugged regions the first time that bitmap is synced.

Is virtio-mem plug/unplug allowed during migration?  I'm wondering whether
below could happen while migration in progress:

  migration starts..
  bitmap init, discard all unplugged mem in dirty bmap
  plug mem region X, dirty some pages
  unplug mem region X
  dirty sync, reports mem region X dirty (even though unplugged..)
  ...

So if unplugged mem should never be touched by QEMU, I'm not sure whether
the above could trigger this case too.

With/without the above, I wonder if migration_bitmap_clear_discarded_pages()
shouldn't rely on the initial dirty bitmap sync; instead it could be done
after each global sync: either via another log_global_after_sync() hook,
or by just moving it into migration_bitmap_sync().

Thanks,

-- 
Peter Xu



Thread overview: 22+ messages
2024-11-09  4:59 [PATCH v1 0/2] migration: Skip the first dirty sync Hyman Huang
2024-11-09  4:59 ` [PATCH v1 1/2] virtio-balloon: Enable free page hinting during PRECOPY_NOTIFY_SETUP Hyman Huang
2024-11-12 10:10   ` David Hildenbrand
2024-11-09  4:59 ` [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration Hyman Huang
2024-11-11  9:07   ` Wang, Wei W
2024-11-11 10:20     ` Yong Huang
2024-11-11  9:27   ` David Hildenbrand
2024-11-11 10:08     ` Yong Huang
2024-11-11 10:42       ` David Hildenbrand
2024-11-11 11:14         ` Yong Huang
2024-11-11 11:37         ` Yong Huang
2024-11-12 10:08           ` David Hildenbrand
2024-11-13 17:40             ` Peter Xu [this message]
2024-11-13 18:49               ` David Hildenbrand
2024-11-13 20:12                 ` Peter Xu
2024-11-14  9:02                   ` David Hildenbrand
2024-11-14 19:28                     ` Peter Xu
2024-11-14 21:16                       ` David Hildenbrand
2024-11-14 22:40                         ` Peter Xu
2024-11-15  9:11                           ` David Hildenbrand
2024-11-15 15:41                             ` Peter Xu
2024-11-15 15:49                               ` David Hildenbrand
