qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] migration: ram block cpr blockers
Date: Fri, 14 Feb 2025 15:12:22 -0500	[thread overview]
Message-ID: <f1f1c640-367d-4966-93f5-a98e3b764e28@oracle.com> (raw)
In-Reply-To: <Z5uwUNtYhQfJJlg3@x1.local>

On 1/30/2025 12:01 PM, Peter Xu wrote:
> On Wed, Jan 29, 2025 at 01:20:13PM -0500, Steven Sistare wrote:
>> On 1/17/2025 6:57 PM, Peter Xu wrote:
>>> On Fri, Jan 17, 2025 at 02:10:14PM -0500, Steven Sistare wrote:
>>>> On 1/17/2025 1:16 PM, Peter Xu wrote:
>>>>> On Fri, Jan 17, 2025 at 09:46:11AM -0800, Steve Sistare wrote:
>>>>>> +/*
>>>>>> + * Return true if ram contents would be lost during CPR.
>>>>>> + * Return false for ram_device because it is remapped in new QEMU.  Do not
>>>>>> + * exclude rom, even though it is readonly, because the rom file could change
>>>>>> + * in new QEMU.  Return false for non-migratable blocks.  They are either
>>>>>> + * re-created in new QEMU, or are handled specially, or are covered by a
>>>>>> + * device-level CPR blocker.  Return false for an fd, because it is visible and
>>>>>> + * can be remapped in new QEMU.
>>>>>> + */
>>>>>> +static bool ram_is_volatile(RAMBlock *rb)
>>>>>> +{
>>>>>> +    MemoryRegion *mr = rb->mr;
>>>>>> +
>>>>>> +    return mr &&
>>>>>> +        memory_region_is_ram(mr) &&
>>>>>> +        !memory_region_is_ram_device(mr) &&
>>>>>> +        (!qemu_ram_is_shared(rb) || !qemu_ram_is_named_file(rb)) &&
>>>>>> +        qemu_ram_is_migratable(rb) &&
>>>>>> +        rb->fd < 0;
>>>>>> +}
>>>>>
>>>>> Blocking guest_memfd looks ok, but comparing to add one more block
>>>>> notifier, can we check all ramblocks once in migrate_prepare(), and fail
>>>>> that command directly if it fails the check?
>>>>
>>>> In an upcoming patch, I will be adding an option analogous to only-migratable which
>>>> prevents QEMU from starting if anything would block cpr-transfer.  That option
>>>> will be checked when blockers are added, like for only-migratable. migrate_prepare
>>>> is too late.
>>>>
>>>>> OTOH, is there any simpler way to simplify the check conditions?  It'll be
>>>>> at least nice to break these checks into smaller if conditions for
>>>>> readability..
>>>>
>>>> I thought the function header comments made it clear, but I could move each
>>>> comment next to each condition:
>>>>
>>>>       ...
>>>>       /*
>>>>        * Return false for an fd, because it is visible and can be remapped in
>>>>        * new QEMU.
>>>>        */
>>>>       if (rb->fd >= 0) {
>>>>           return false;
>>>>       }
>>>>       ...
>>>>
>>>>> I wonder if we could stick with looping over all ramblocks, then make sure
>>>>> each of them is on the cpr saved fd list.  It may need to make
>>>>> cpr_save_fd() always register with the name of ramblock to do such lookup,
>>>>> or maybe we could also cache the ramblock pointer in CprFd, then the lookup
>>>>> will be a pointer match check.
>>>>
>>>> Some ramblocks are not on the list, such as named files.  Plus looping in
>>>> migrate_prepare is too late as noted above.
>>>>
>>>> IMO what I have already implemented using blockers is clean and elegant.
>>>
>>> OK if we need to fail it early at boot, then yes blockers are probably
>>> better.
>>>
>>> We'll need one more cmdline parameter. I've no objection, but I don't know
>>> how to judge when it's ok to add, when it's better not.. I'll leave others
>>> to comment on this.
>>>
>>> But still, could we check it when ramblocks are created?  So in that way
>>> whatever is forbidden is clear in its own path, I feel like that could be
>>> clearer (like what you did with gmemfd).
>>
>> When the ramblock is created, we don't yet know if it is migratable. A
>> ramblock that is not migratable does not block cpr. Migratable is not known
>> until vmstate_register_ram calls qemu_ram_set_migratable.  Hence that is
>> where I evaluate conditions and install a blocker.
>>
>> Because that is the only place where ram_block_add_cpr_blocker is called,
>> the test qemu_ram_is_migratable() inside ram_block_add_cpr_blocker is
>> redundant, and I should delete it.
> 
> Hmm.. sounds reasonable.
> 
>>
>>> For example, if I start to convert some of your requirements above, then
>>> memory_region_is_ram_device() implies RAM_PREALLOC.  Actually, ram_device
>>> is not the only RAM_PREALLOC user..  Say, would it also not work with all
>>> memory_region_init_ram_ptr() users (even if they're not ram_device)?  An
>>> example is, looks like virtio-gpu can create random ramblocks on the fly
>>> with prealloced buffers.  I am not sure whether they can be pinned by VFIO
>>> too.  You may know better.
>>
>> That memory is not visible to the guest.  It is not part of system_memory,
>> and is not marked migratable.
> 
> I _think_ that can still be visible at least for the virtio-gpu use case,
> which hangs under VirtIOGPUBase.hostmem.  Relevant code for reference:
> 
> virtio_gpu_virgl_map_resource_blob:
>      memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
>      memory_region_add_subregion(&b->hostmem, offset, mr);
>      memory_region_set_enabled(mr, true);
> 
> virtio_gpu_pci_base_realize:
>          memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
>                             g->conf.hostmem);
>          pci_register_bar(&vpci_dev->pci_dev, 4,
>                           PCI_BASE_ADDRESS_SPACE_MEMORY |
>                           PCI_BASE_ADDRESS_MEM_PREFETCH |
>                           PCI_BASE_ADDRESS_MEM_TYPE_64,
>                           &g->hostmem);
> 
> pci_update_mappings:
>              memory_region_add_subregion_overlap(r->address_space,
>                                                  r->addr, r->memory, 1);
> 
> but indeed I'm not sure how it work with migration so far, because it
> doesn't have RAM_MIGRATABLE set. So at least cpr didn't make it more
> special.  I assume this isn't something we must figure out as of now
> then.. but if you do, please kindly share.

AFAICT this memory cannot be pinned, because it is not attached to the
"system_memory" mr and the "address_space_memory" address space.  The
listener than maps/pins is attached to address_space_memory.

I am submitting V2 of this patch with more embedded comments.

- Steve


  reply	other threads:[~2025-02-14 20:13 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-17 17:46 [PATCH] migration: ram block cpr blockers Steve Sistare
2025-01-17 18:16 ` Peter Xu
2025-01-17 19:10   ` Steven Sistare
2025-01-17 23:57     ` Peter Xu
2025-01-29 18:20       ` Steven Sistare
2025-01-30 17:01         ` Peter Xu
2025-02-14 20:12           ` Steven Sistare [this message]
2025-02-18 16:10             ` Peter Xu
2025-02-25 15:46               ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f1f1c640-367d-4966-93f5-a98e3b764e28@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=david@redhat.com \
    --cc=farosas@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).