From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
David Hildenbrand <david@redhat.com>,
Philippe Mathieu-Daudé <philmd@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] migration: ram block cpr blockers
Date: Tue, 25 Feb 2025 10:46:23 -0500
Message-ID: <492dee01-fe64-4ce4-b298-a9eb5d731ea6@oracle.com>
In-Reply-To: <Z7Sw228UUsmGD3_t@x1.local>

On 2/18/2025 11:10 AM, Peter Xu wrote:
> On Fri, Feb 14, 2025 at 03:12:22PM -0500, Steven Sistare wrote:
>> On 1/30/2025 12:01 PM, Peter Xu wrote:
>>> On Wed, Jan 29, 2025 at 01:20:13PM -0500, Steven Sistare wrote:
>>>> On 1/17/2025 6:57 PM, Peter Xu wrote:
>>>>> On Fri, Jan 17, 2025 at 02:10:14PM -0500, Steven Sistare wrote:
>>>>>> On 1/17/2025 1:16 PM, Peter Xu wrote:
>>>>>>> On Fri, Jan 17, 2025 at 09:46:11AM -0800, Steve Sistare wrote:
>>>>>>>> +/*
>>>>>>>> + * Return true if ram contents would be lost during CPR.
>>>>>>>> + * Return false for ram_device because it is remapped in new QEMU. Do not
>>>>>>>> + * exclude rom, even though it is readonly, because the rom file could change
>>>>>>>> + * in new QEMU. Return false for non-migratable blocks. They are either
>>>>>>>> + * re-created in new QEMU, or are handled specially, or are covered by a
>>>>>>>> + * device-level CPR blocker. Return false for an fd, because it is visible and
>>>>>>>> + * can be remapped in new QEMU.
>>>>>>>> + */
>>>>>>>> +static bool ram_is_volatile(RAMBlock *rb)
>>>>>>>> +{
>>>>>>>> +    MemoryRegion *mr = rb->mr;
>>>>>>>> +
>>>>>>>> +    return mr &&
>>>>>>>> +        memory_region_is_ram(mr) &&
>>>>>>>> +        !memory_region_is_ram_device(mr) &&
>>>>>>>> +        (!qemu_ram_is_shared(rb) || !qemu_ram_is_named_file(rb)) &&
>>>>>>>> +        qemu_ram_is_migratable(rb) &&
>>>>>>>> +        rb->fd < 0;
>>>>>>>> +}
>>>>>>>
>>>>>>> Blocking guest_memfd looks ok, but comparing to add one more block
>>>>>>> notifier, can we check all ramblocks once in migrate_prepare(), and fail
>>>>>>> that command directly if it fails the check?
>>>>>>
>>>>>> In an upcoming patch, I will be adding an option analogous to only-migratable,
>>>>>> which prevents QEMU from starting if anything would block cpr-transfer. That
>>>>>> option will be checked when blockers are added, as for only-migratable;
>>>>>> migrate_prepare is too late.
>>>>>>
>>>>>>> OTOH, is there any way to simplify the check conditions? It would at
>>>>>>> least be nice to break these checks into smaller if conditions for
>>>>>>> readability..
>>>>>>
>>>>>> I thought the function header comments made it clear, but I could move each
>>>>>> comment next to each condition:
>>>>>>
>>>>>> ...
>>>>>>     /*
>>>>>>      * Return false for an fd, because it is visible and can be remapped in
>>>>>>      * new QEMU.
>>>>>>      */
>>>>>>     if (rb->fd >= 0) {
>>>>>>         return false;
>>>>>>     }
>>>>>> ...
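
[For concreteness: assembling the conditions and the comments from the quoted
patch, the fully expanded function would read roughly as follows. This is a
sketch of the refactoring, not the final patch.]

static bool ram_is_volatile(RAMBlock *rb)
{
    MemoryRegion *mr = rb->mr;

    /* Only ram-backed regions can lose contents during CPR. */
    if (!mr || !memory_region_is_ram(mr)) {
        return false;
    }

    /* ram_device is remapped in new QEMU. */
    if (memory_region_is_ram_device(mr)) {
        return false;
    }

    /* A shared named file is visible and can be reopened in new QEMU. */
    if (qemu_ram_is_shared(rb) && qemu_ram_is_named_file(rb)) {
        return false;
    }

    /*
     * Non-migratable blocks are either re-created in new QEMU, handled
     * specially, or covered by a device-level CPR blocker.
     */
    if (!qemu_ram_is_migratable(rb)) {
        return false;
    }

    /* An fd is visible and can be remapped in new QEMU. */
    if (rb->fd >= 0) {
        return false;
    }

    return true;
}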
>>>>>>
>>>>>>> I wonder if we could stick with looping over all ramblocks, then make sure
>>>>>>> each of them is on the cpr saved-fd list. That may require cpr_save_fd() to
>>>>>>> always register with the name of the ramblock to allow such a lookup, or
>>>>>>> maybe we could also cache the ramblock pointer in CprFd, so that the lookup
>>>>>>> becomes a pointer match check.
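
[For concreteness, a sketch of what that suggested check might have looked
like in migrate_prepare(), assuming cpr_find_fd() can look up a saved fd by
the ramblock's idstr; hypothetical, since this approach was not taken:]

    RAMBlock *rb;

    RAMBLOCK_FOREACH(rb) {
        if (ram_is_volatile(rb) && cpr_find_fd(rb->idstr, 0) < 0) {
            error_setg(errp, "ram block %s would not survive CPR", rb->idstr);
            return false;
        }
    }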
>>>>>>
>>>>>> Some ramblocks are not on the list, such as named files. Plus, looping in
>>>>>> migrate_prepare is too late, as noted above.
>>>>>>
>>>>>> IMO what I have already implemented using blockers is clean and elegant.
>>>>>
>>>>> OK if we need to fail it early at boot, then yes blockers are probably
>>>>> better.
>>>>>
>>>>> We'll need one more cmdline parameter. I've no objection, but I don't know
>>>>> how to judge when it's OK to add one and when it's better not to.. I'll
>>>>> leave others to comment on this.
>>>>>
>>>>> But still, could we check it when ramblocks are created? That way, whatever
>>>>> is forbidden is flagged in its own creation path, which I feel could be
>>>>> clearer (like what you did with gmemfd).
>>>>
>>>> When the ramblock is created, we don't yet know if it is migratable. A
>>>> ramblock that is not migratable does not block cpr. Migratability is not
>>>> known until vmstate_register_ram calls qemu_ram_set_migratable. Hence that is
>>>> where I evaluate conditions and install a blocker.
>>>>
>>>> Because that is the only place where ram_block_add_cpr_blocker is called,
>>>> the test qemu_ram_is_migratable() inside ram_block_add_cpr_blocker is
>>>> redundant, and I should delete it.
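
[For reference: vmstate_register_ram is where a block becomes migratable, so
that is the natural hook. A sketch of the call site, with exact placement
assumed:]

void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
{
    qemu_ram_set_idstr(mr->ram_block, memory_region_name(mr), dev);
    qemu_ram_set_migratable(mr->ram_block);
    /* Migratability is now known: install a CPR blocker if needed. */
    ram_block_add_cpr_blocker(mr->ram_block, &error_fatal);
}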
>>>
>>> Hmm.. sounds reasonable.
>>>
>>>>
>>>>> For example, if I start to convert some of your requirements above, then
>>>>> memory_region_is_ram_device() implies RAM_PREALLOC. Actually, ram_device
>>>>> is not the only RAM_PREALLOC user.. Say, would it also not work with all
>>>>> memory_region_init_ram_ptr() users (even if they're not ram_device)? For
>>>>> example, it looks like virtio-gpu can create random ramblocks on the fly
>>>>> with preallocated buffers. I am not sure whether they can be pinned by
>>>>> VFIO too. You may know better.
>>>>
>>>> That memory is not visible to the guest. It is not part of system_memory,
>>>> and is not marked migratable.
>>>
>>> I _think_ that can still be visible at least for the virtio-gpu use case,
>>> which hangs under VirtIOGPUBase.hostmem. Relevant code for reference:
>>>
>>> virtio_gpu_virgl_map_resource_blob:
>>>     memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>>>     memory_region_add_subregion(&b->hostmem, offset, mr);
>>>     memory_region_set_enabled(mr, true);
>
> [1]
>
>>>
>>> virtio_gpu_pci_base_realize:
>>>     memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
>>>                        g->conf.hostmem);
>>>     pci_register_bar(&vpci_dev->pci_dev, 4,
>>>                      PCI_BASE_ADDRESS_SPACE_MEMORY |
>>>                      PCI_BASE_ADDRESS_MEM_PREFETCH |
>>>                      PCI_BASE_ADDRESS_MEM_TYPE_64,
>>>                      &g->hostmem);
>
> [2]
>
>>>
>>> pci_update_mappings:
>>>     memory_region_add_subregion_overlap(r->address_space,
>>>                                         r->addr, r->memory, 1);
>
> [3]
>
>>>
>>> but indeed I'm not sure how it works with migration so far, because it
>>> doesn't have RAM_MIGRATABLE set. So at least cpr didn't make it more
>>> special. I assume this isn't something we must figure out right now,
>>> then.. but if you do, please kindly share.
>>
>> AFAICT this memory cannot be pinned, because it is not attached to the
>> "system_memory" mr and the "address_space_memory" address space. The
>> listener that maps/pins is attached to address_space_memory.
>
> I still think it's part of system_memory - every PCI BAR that gets mapped as
> an MMIO region can be part of guest system memory, and IIUC can be
> pinned.
>
> Normally these MMIOs are IO regions, so "pinning" may not make much
> difference IIUC, but here it looks like real RAM is backing it when
> emulated, even though they're "MMIO regions"..
>
> The code clips above are the paths by which, I think, the blob gets attached
> to system_memory. It isn't more special than a generic PCI device's BARs
> being mapped into system_memory, though:
>
> - Firstly, at [1], the blob is part of VirtIOGPUBase.hostmem as a subregion
>
> - Then, at [2], VirtIOGPUBase.hostmem is registered as a PCI BAR; in this
>   case, it's BAR4.
>
> - Then, it's the same as any other PCI device BAR: it can logically be
>   mapped into guest physical address space and become part of system_memory.
>   [3] above is where the BAR memory gets added into PCIIORegion.address_space.
>
> Then, to fill in the last missing piece: taking i440fx as an example,
> the PCI bus memory address space is ultimately attached to
> system_memory here:
>
> i440fx_pcihost_realize:
>     /* setup pci memory mapping */
>     pc_pci_as_mapping_init(s->system_memory, s->pci_address_space);
>
> Again, I'm not asking to resolve this immediately: I have no idea how
> migration works at all with random new ramblocks being added.. but I'd
> still like to check that we're on the same page on this specific case.
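
[Editorial summary of the chain described above, assembled from [1], [2],
[3], and the i440fx clip:

    system_memory
      -> pci_address_space                  (pc_pci_as_mapping_init)
        -> PCIIORegion.address_space
          -> BAR4: VirtIOGPUBase.hostmem    (pci_register_bar)
            -> "blob" ram_ptr region        (memory_region_add_subregion)]
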
I agree with your analysis; thanks for connecting the dots. The vfio memory
listener callback vfio_listener_region_add would be called, and would pin the
memory for DMA.
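
[For reference, a condensed sketch of the pinning path, loosely based on
hw/vfio/common.c; simplified and abridged, not the exact upstream code:]

static void vfio_listener_region_add(MemoryListener *listener,
                                     MemoryRegionSection *section)
{
    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
    hwaddr iova = section->offset_within_address_space;
    void *vaddr;

    if (!memory_region_is_ram(section->mr)) {
        return;
    }

    /*
     * RAM-backed sections, including RAM-backed PCI BARs such as the
     * virtio-gpu blob, get DMA-mapped here, which pins the pages.
     */
    vaddr = memory_region_get_ram_ptr(section->mr) +
            section->offset_within_region;
    vfio_dma_map(container, iova, int128_get64(section->size),
                 vaddr, section->readonly);
}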
The memory is allocated by an external source, the virglrenderer library, so cpr
is not supported for this device. virtio_gpu_base_device_realize already calls
migrate_add_blocker, which blocks all migration modes.
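
[The existing blocker, paraphrased from virtio_gpu_base_device_realize; the
exact message text may differ upstream:]

    if (virtio_gpu_virgl_enabled(g->conf)) {
        error_setg(&g->migration_blocker, "virgl is not yet migratable");
        if (migrate_add_blocker(&g->migration_blocker, errp) < 0) {
            return false;
        }
    }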
- Steve