From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Yong Huang <yong.huang@smartx.com>,
qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
Wei Wang <wei.w.wang@intel.com>,
"Michael S . Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH v1 2/2] migration: Do not perform RAMBlock dirty sync during the first iteration
Date: Fri, 15 Nov 2024 16:49:56 +0100 [thread overview]
Message-ID: <91a7fcc0-f4b6-4129-bde2-c776c1e09859@redhat.com> (raw)
In-Reply-To: <ZzdrjTXgNFlrqqEu@x1n>
>> We better double check and document that, because it must be guaranteed, not
>> "let's cross fingers".
>
> Yes, we should double check on at least known good use cases, maybe not
> all.
>
> E.g., I see nvmm_log_sync() and whpx_log_sync() unconditionally mark all
> memory dirty. I actually don't know how some of these trackers work
> for live migration, or whether they work at all.
>
> If by accident this change breaks something, I also wonder whether we
> should fix the patch that breaks it, or fix the assumption that the 1st
> sync must happen at setup. That assumption is simply wrong to me.. and
> not all dirty tracking users have a setup phase either. E.g. vga
> tracking won't even have a SETUP phase at all.
>
> The simplest solution for any potential breakage is to provide a
> log_global_start() and sync once there, that's exactly what SETUP should
> do. But I hope it won't happen at all..
>
> This is legacy tech debt to me. The earlier we understand it the better,
> so I'm personally ok if someone tries to do this, as long as the major
> archs are safe.
>
>>
>> Also, I'm not 100% sure how KVM internals implement that (I recall some
>> s390x oddities, but might be wrong ...).
>>
>>>>>> virtio-balloon calls qemu_guest_free_page_hint(), which calls
>>>>>>
>>>>>> migration_clear_memory_region_dirty_bitmap_range()
>>>>>> bitmap_clear()
>>>>>>
>>>>>> So it might have the same issue.
>>>>>
>>>>> Should virtio-balloon do the same?
>>>>
>>>> virtio-balloon is more interesting, because I assume here we could run after
>>>> the "initial clearing" and would want to mark it clean everywhere.
>>>
>>> Yes, if it does what I mentioned below, IIUC it'll clear all dirty bits
>>> across the whole stack; only the ram_list bitmap is missing. IIUC that
>>> could mean it stops working for at least tcg, as tcg solely uses
>>> it.. even with kvm some MRs may use it. Maybe we want to fix it
>>> separately.
>>
>> Yes, that definitely needs care.
>>
>> [...]
>>
>>>> (but I'm confused because we have way too many bitmaps, and maybe the KVM
>>>> one could be problematic without an initial sync ... we'd want an initial
>>>> clearing for that as well ...)
>>>
>>> So IMHO most of the bitmaps should be initialized with zeros, not
>>> ones.. like I mentioned above.
>>>
>>> Migration bitmap is special, because it's not about dirty tracking
>>> capability / reporting, but about knowing that we need to migrate
>>> everything in the first round. So setting all ones makes sense for
>>> migration only, not for a reporting facility. Meanwhile
>>> KVM_DIRTY_LOG_INITIALLY_SET exists for its own reason of speeding up
>>> migration start..
>>>
>>> So, now I am thinking whether we should stop setting all ones in the
>>> ram_list bitmap entirely, corresponding to this change:
>>>
>>> ===8<===
>>> diff --git a/system/physmem.c b/system/physmem.c
>>> index dc1db3a384..10966fa68c 100644
>>> --- a/system/physmem.c
>>> +++ b/system/physmem.c
>>> @@ -1913,10 +1913,6 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>>>      ram_list.version++;
>>>      qemu_mutex_unlock_ramlist();
>>>  
>>> -    cpu_physical_memory_set_dirty_range(new_block->offset,
>>> -                                        new_block->used_length,
>>> -                                        DIRTY_CLIENTS_ALL);
>>> -
>>>      if (new_block->host) {
>>>          qemu_ram_setup_dump(new_block->host, new_block->max_length);
>>>          qemu_madvise(new_block->host, new_block->max_length, QEMU_MADV_HUGEPAGE);
>>> ===8<===
>>>
>>
>> It will leave things in an inconsistent state with regard to
>> qemu_ram_resize(), though. So we'd want to do the same thing there as well.
>>
>> I don't know about DIRTY_CLIENTS_ALL, though ... no expert on that, or on
>> which side effects it has.
>>
>> Because saying "initially, all memory is dirty" can make sense, depending on
>> from which angle you look at it.
>
> Migration definitely doesn't need it, so to be safe we could also change
> DIRTY_CLIENTS_ALL to CODE|VGA.
>
> And if we agree virtio-mem will need to punch holes in all the bitmaps,
> then this patch is not even needed. For that, more below.
Yes, it's fine for me as a fix. It's not optimal, but I don't really
care as long as it works :)
>
>>
>>> I'm wondering whether the above could fix the virtio-mem regression after
>>> Hyman's current patch is applied.
>>>
>>> That said, IMHO virtio-mem should still use the same helper as
>>> virtio-balloon, as I discussed previously, so as to reset the bitmap for
>>> the whole stack (which always seems to be the right thing to do, to not
>>> miss one layer of them)?
>>
>> Well, the difference is that virtio-mem really only gets called exactly once
>> initially. If we can just reset all bitmaps initially, then virtio-mem
>> really only has to zap the rb->bmap.
>
> I think virtio-mem had better always punch through, regardless of
> whether we can make the above change. Consider the KVM "initial-all-set"
> feature I mentioned above: when that feature bit is set by QEMU, KVM
> initially sets all bits in the bitmap to one. So punching through may be
> required for virtio-mem to clear those 1s in KVM, at least.
Yes.
Anyhow, I'm happy as long as we don't break virtio-mem. Clearing all
bitmaps will work.
Getting virtio-mem to clear discarded ranges after every bitmap sync would
be even more robust, for example if we had some stupid sync()
implementation that simply always sets all memory dirty ...
--
Cheers,
David / dhildenb