qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Yong Huang <yong.huang@smartx.com>,
	qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Wei Wang <wei.w.wang@intel.com>,
	"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration
Date: Fri, 15 Nov 2024 10:41:01 -0500	[thread overview]
Message-ID: <ZzdrjTXgNFlrqqEu@x1n> (raw)
In-Reply-To: <5577f4cc-63f9-4f4a-b857-48a9405075c1@redhat.com>

On Fri, Nov 15, 2024 at 10:11:51AM +0100, David Hildenbrand wrote:
> > > > > 
> > > > > But then I realized that even memory_region_clear_dirty_bitmap() will not
> > > > > clear the ramblock_dirty_bitmap_ bits! It's only concerned about
> > > > > listener->log_clear() calls.
> > > > > 
> > > > > Looking for DIRTY_MEMORY_BLOCK_SIZE users, only
> > > > > cpu_physical_memory_sync_dirty_bitmap() and
> > > > > cpu_physical_memory_clear_dirty_range() clear them, whereby the latter is
> > > > > only used when resizing RAMblocks.
> > > > > 
> > > > > At first I wondered whether ramblock_dirty_bitmap_clear_discarded_pages()
> > > > > should also call cpu_physical_memory_clear_dirty_range(), but then I am not
> > > > > so sure if that is really the right approach.
> > > > 
> > > > That sounds actually reasonable to me so far.. What's the concern in your
> > > > mind?
> > > 
> > > I think what I had in mind was that for the initial bitmap sync, when we set
> > > the bmap to all-1s already, we could just clear the whole
> > > ramblock_dirty_bitmap_ + KVM ... bitmaps.
> > > 
> > > So, instead of an "initial sync" we might just want to do an "initial
> > > clearing" of all bitmaps.
> > 
> > Logically most dirty tracking bitmaps should start with all zeros.  KVM old
> > kernels are like that; KVM_DIRTY_LOG_INITIALLY_SET is not, but it's a
> > separate feature.  I still hope it's pretty common for the rest, e.g. vhost
> > should have all zeros in its init bitmap even without initial sync.
> 
> We better double check and document that, because it must be guaranteed, not
> "let's cross fingers".

Yes, we should double-check at least the known-good use cases, maybe not
all of them.

E.g., I see that nvmm_log_sync() and whpx_log_sync() unconditionally mark
all memory dirty on every sync.  I actually don't know how some of these
trackers work for live migration, or whether they work at all.

If by accident this change breaks something, I also wonder whether we
should fix the patch that breaks it, or fix the assumption that the 1st
sync must happen at setup.  That assumption seems simply wrong to me, and
not all dirty tracking users share it either; e.g., VGA tracking won't
even have a SETUP phase at all.

The simplest solution for any potential breakage is to provide a
log_global_start() and sync once there; that's exactly what SETUP should
do.  But I hope it won't be needed at all..

This is legacy tech debt to me.  The earlier we understand it the better,
so I'm personally OK with someone trying this, as long as the major archs
stay safe.

> 
> Also, I'm not 100% sure how KVM internals implement that (I recall some
> s390x oddities, but might be wrong ...).
> 
> > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > virtio-balloon() calls qemu_guest_free_page_hint() which calls
> > > > > 
> > > > > migration_clear_memory_region_dirty_bitmap_range()
> > > > > bitmap_clear()
> > > > > 
> > > > > So it would maybe have the same issue.
> > > > 
> > > > Should virtio-balloon do the same?
> > > 
> > > virtio-balloon is more interesting, because I assume here we could run after
> > > the "initial clearing" and would want to mark it clean everywhere.
> > 
> > Yes, if it does what I mentioned below, IIUC it'll clear all dirty bits
> > across the whole stack.  Only the ram_list bitmap is missing.  IIUC it
> > could mean it could stop working for at least TCG, as TCG solely uses
> > it.. even with kvm some MRs may use it.  Maybe we want to fix it
> > separately.
> 
> Yes, that definitely needs care.
> 
> [...]
> 
> > > (but I'm confused because we have way too many bitmaps, and maybe the KVM
> > > one could be problematic without an initial sync ... we'd want an initial
> > > clearing for that as well ...)
> > 
> > So IMHO most of the bitmaps should be initialized with zeros, not
> > ones.. like I mentioned above.
> > 
> > Migration bitmap is special, because it's not about dirty tracking
> > capability / reporting but that we know we need to migrate the first round.
> > So setting all ones makes sense for migration only, not a reporting
> > facility.  While KVM_DIRTY_LOG_INITIALLY_SET existed for its own reasoning
> > on speeding up migration starts..
> > 
> > So, now I am thinking whether we should not set all ones in ram_list bitmap
> > at all, corresponds to this change:
> > 
> > ===8<===
> > diff --git a/system/physmem.c b/system/physmem.c
> > index dc1db3a384..10966fa68c 100644
> > --- a/system/physmem.c
> > +++ b/system/physmem.c
> > @@ -1913,10 +1913,6 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
> >       ram_list.version++;
> >       qemu_mutex_unlock_ramlist();
> > -    cpu_physical_memory_set_dirty_range(new_block->offset,
> > -                                        new_block->used_length,
> > -                                        DIRTY_CLIENTS_ALL);
> > -
> >       if (new_block->host) {
> >           qemu_ram_setup_dump(new_block->host, new_block->max_length);
> >           qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
> > ===8<===
> > 
> 
> It will leave things in an inconsistent state with regard to
> qemu_ram_resize() though. So we'd want to do the same thing there as well.
> 
> I don't know about DIRTY_CLIENTS_ALL, though ... no expert on that, which
> side effects it has.
> 
> Because saying "initially, all memory is dirty" can make sense, depending on
> from which angle you look at it.

Migration definitely doesn't need it, so to be safe we could also change
DIRTY_CLIENTS_ALL to CODE|VGA there.

And if we agree that virtio-mem will need to punch holes in all the
bitmaps, then this patch isn't even needed.  More on that below.

> 
> > I'm guessing whether above could fix the virtio-mem regression after
> > Hyman's current patch applied.
> > 
> > Said that, IMHO virtio-mem should still use the same helper just like
> > virtio-balloon as I discussed previously, so as to reset bitmap for the
> > whole stack (which seems to always be the right thing to do to not miss one
> > layer of them)?
> 
> Well, the difference is that virtio-mem really only gets called exactly once
> initially. If we can just reset all bitmaps initially, then virtio-mem
> really only has to zap the rb->bmap.

I think virtio-mem had better always punch through all the layers,
regardless of whether we make the change above.  Consider the KVM
"initial-all-set" feature I mentioned: when that feature bit is set by
QEMU, KVM initializes the bitmap to all ones, so punching through may be
required for virtio-mem to clear the 1s in KVM at least.

> 
> The most robust solution would be to process discards after every bitmap
> sync of course.
> 
> > 
> > Hence: 1 patch to virtio-balloon covering ram_list bitmap (which could be a
> > real fix to virtio-balloon on e.g. tcg?); 1 patch to virtio-mem reusing
> > that helper of virtio-balloon just as a cleanup to also cover all bitmaps;
> > 1 patch like above to avoid setting ones at all in ram_list bitmap as
> > cleanup; then finally remove the sync() in SETUP, which is this patch.
> > IIUC after all these changes applied it'll work for all cases.
> 
> Would work for me.
> 
> BTW, I was wondering if we could get rid of one of the bitmaps with KVM ...
> but likely not that easy.

Yeah.  The memslot bitmaps can be completely overwritten by the KVM
kernel, so at least we can't simply reuse the migration ram_list bitmap
there..

-- 
Peter Xu



  reply	other threads:[~2024-11-15 15:41 UTC|newest]

Thread overview: 22+ messages
2024-11-09  4:59 [PATCH v1 0/2] migration: Skip the first dirty sync Hyman Huang
2024-11-09  4:59 ` [PATCH v1 1/2] virtio-balloon: Enable free page hinting during PRECOPY_NOTIFY_SETUP Hyman Huang
2024-11-12 10:10   ` David Hildenbrand
2024-11-09  4:59 ` [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration Hyman Huang
2024-11-11  9:07   ` Wang, Wei W
2024-11-11 10:20     ` Yong Huang
2024-11-11  9:27   ` David Hildenbrand
2024-11-11 10:08     ` Yong Huang
2024-11-11 10:42       ` David Hildenbrand
2024-11-11 11:14         ` Yong Huang
2024-11-11 11:37         ` Yong Huang
2024-11-12 10:08           ` David Hildenbrand
2024-11-13 17:40             ` Peter Xu
2024-11-13 18:49               ` David Hildenbrand
2024-11-13 20:12                 ` Peter Xu
2024-11-14  9:02                   ` David Hildenbrand
2024-11-14 19:28                     ` Peter Xu
2024-11-14 21:16                       ` David Hildenbrand
2024-11-14 22:40                         ` Peter Xu
2024-11-15  9:11                           ` David Hildenbrand
2024-11-15 15:41                             ` Peter Xu [this message]
2024-11-15 15:49                               ` David Hildenbrand
