Re: [PATCH V4 02/19] physmem: fd-based shared memory

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Eduardo Habkost <eduardo@habkost.net>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH V4 02/19] physmem: fd-based shared memory
Date: Fri, 13 Dec 2024 11:41:45 -0500	[thread overview]
Message-ID: <bf0e7550-54a0-42ff-a281-6a65cb1ba7b5@oracle.com> (raw)
In-Reply-To: <Z1tUBUcpf1XcVRhG@x1n>

On 12/12/2024 4:22 PM, Peter Xu wrote:
> On Thu, Dec 12, 2024 at 03:38:00PM -0500, Steven Sistare wrote:
>> On 12/9/2024 2:42 PM, Peter Xu wrote:
>>> On Mon, Dec 02, 2024 at 05:19:54AM -0800, Steve Sistare wrote:
>>>> @@ -2089,13 +2154,23 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>>>>        new_block->page_size = qemu_real_host_page_size();
>>>>        new_block->host = host;
>>>>        new_block->flags = ram_flags;
>>>> +
>>>> +    if (!host && !xen_enabled()) {
>>>
>>> Adding one more xen check is unnecessary.  This patch needed it could mean
>>> that the patch can be refactored.. because we have xen checks in both
>>> ram_block_add() and also in the fd allocation path.
>>>
>>> At the meantime, see:
>>>
>>> qemu_ram_alloc_from_fd():
>>>       if (kvm_enabled() && !kvm_has_sync_mmu()) {
>>>           error_setg(errp,
>>>                      "host lacks kvm mmu notifiers, -mem-path unsupported");
>>>           return NULL;
>>>       }
>>>
>>> I don't think any decent kernel could hit this, but that could be another
>>> sign that this patch duplicated some file allocations.
>>>
>>>> +        if ((new_block->flags & RAM_SHARED) &&
>>>> +            !qemu_ram_alloc_shared(new_block, &local_err)) {
>>>> +            goto err;
>>>> +        }
>>>> +    }
>>>> +
>>>>        ram_block_add(new_block, &local_err);
>>>> -    if (local_err) {
>>>> -        g_free(new_block);
>>>> -        error_propagate(errp, local_err);
>>>> -        return NULL;
>>>> +    if (!local_err) {
>>>> +        return new_block;
>>>>        }
>>>> -    return new_block;
>>>> +
>>>> +err:
>>>> +    g_free(new_block);
>>>> +    error_propagate(errp, local_err);
>>>> +    return NULL;
>>>>    }
>>>
>>> IIUC we only need to conditionally convert an anon-allocation into an
>>> fd-allocation, and then we don't need to mostly duplicate
>>> qemu_ram_alloc_from_fd(), instead we reuse it.
>>>
>>> I do have a few other comments elsewhere, but when I was trying to comment.
>>> E.g., we either shouldn't need to bother caching qemu_memfd_check()
>>> results, or do it in qemu_memfd_check() directly.. and some more.
>>
>> Someone thought it a good idea to cache the result of qemu_memfd_alloc_check,
>> and qemu_memfd_check will be called more often.  I'll cache the result inside
>> qemu_memfd_check for the special case of flags=0.
> 
> OK.
> 
>>
>>> Then I think it's easier I provide a patch, and also show that it can be
>>> also smaller changes to do the same thing, with everything fixed up
>>> (e.g. addressing above mmu notifier missing issue).  What do you think as
>>> below?
>>
>> The key change you make is calling qemu_ram_alloc_from_fd instead of file_ram_alloc,
>> which buys the xen and kvm checks for free.  Sounds good, I will do that in the
>> context of my patch.
>>
>> Here are some other changes in your patch, and my responses:
>>
>> I will drop the "Retrying using MAP_ANON|MAP_SHARED" message, as you did.
>>
>> However, I am keeping QEMU_VMALLOC_ALIGN, qemu_set_cloexec, and trace_qemu_ram_alloc_shared.
> 
> I guess no huge deal on these, however since we're talking..  Is that
> QEMU_VMALLOC_ALIGN from qemu_anon_ram_alloc()?
> 
> A quick dig tells me that it was used to be for anon THPs..
> 
>      commit 36b586284e678da28df3af9fd0907d2b16f9311c
>      Author: Avi Kivity <avi@redhat.com>
>      Date:   Mon Sep 5 11:07:05 2011 +0300
> 
>      qemu_vmalloc: align properly for transparent hugepages and KVM
> 
> And I'm guessing if at that time was also majorly for guest ram.
> 
> Considering that this path won't make an effect until the new aux mem
> option is on, I'd think it better to stick without anything special like
> QEMU_VMALLOC_ALIGN, until it's justified to be worthwhile.  E.g., Avi used
> to explicitly mention this in that commit message:
> 
>      Adjust qemu_vmalloc() to honor that requirement.  Ignore it for small regions
>      to avoid fragmentation.
> 
> And this is exactly mostly small regions when it's AUX.. probably except
> VGA, but it'll be SHARED on top of shmem not PRIVATE on anon anyway... so
> it'll be totally different things.
> 
> So I won't worry on that 2M alignment, and I will try to not carry over
> that, because then trying to remove it will be harder.. even when we want.

Yes, currently the aux allocations get QEMU_VMALLOC_ALIGN alignment in
qemu_anon_ram_alloc.  I do the same for the shared fd mappings to guarantee
no performance regression, as some of them are larger than 2M and would
benefit from using huge pages.  The VA fragmentation is trivial for this small
number of aux blocks in a 64-bit address space, and is no different than it was
for qemu_anon_ram_alloc.

> For the 2nd.. Any quick answer on why explicit qemu_set_cloexec() needed?

qemu sets cloexec for all descriptors it opens to prevent them from accidentally
being leaked to another process via fork+exec.

> For 3rd, tracepoint would definitely be fine whenever you feel necessary.
> 
>> Also, when qemu_memfd_create + qemu_ram_alloc_from_fd fails, qemu should fail and exit,
>> and not fall back, because something unexpected went wrong.  David said the same.
> 
> Why?  I was trying to rely on such fallback to make it work on e.g. Xen.
> In that case, Xen fails there and fallback to xen_ram_alloc() inside the
> later call to ram_block_add(), no?

Why -- because something went wrong that should have worked, and we should report the
first fault so its cause can be fixed and cpr can be used.

However, to do the above, but still quietly fallback if qemu_ram_alloc_from_fd
fails because of xen or kvm, I would need to return different error codes from
qemu_ram_alloc_from_fd.  Doable, but requires tweaks to all occurrences of
qemu_ram_alloc_from_fd.

And BTW, qemu_ram_alloc_from_fd is defined for CONFIG_POSIX only.  I need
to modify the call site in the patch accordingly.

Overall, I am not convinced that using qemu_ram_alloc_from_fd in this patch
is better/simpler than my V4 patch using file_ram_alloc, plus adding xen and
kvm_has_sync_mmu checks in qemu_ram_alloc_internal.

- Steve

next prev parent reply	other threads:[~2024-12-13 16:43 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-02 13:19 [PATCH V4 00/19] Live update: cpr-transfer Steve Sistare
2024-12-02 13:19 ` [PATCH V4 01/19] backends/hostmem-shm: factor out allocation of "anonymous shared memory with an fd" Steve Sistare
2024-12-09 17:36   ` Peter Xu
2024-12-12 20:37     ` Steven Sistare
2024-12-02 13:19 ` [PATCH V4 02/19] physmem: fd-based shared memory Steve Sistare
2024-12-09 19:42   ` Peter Xu
2024-12-12 20:38     ` Steven Sistare
2024-12-12 21:22       ` Peter Xu
2024-12-13 16:41         ` Steven Sistare [this message]
2024-12-13 17:05           ` Steven Sistare
2024-12-16 18:19           ` Peter Xu
2024-12-17 21:54             ` Steven Sistare
2024-12-17 22:46               ` Peter Xu
2024-12-18 16:34                 ` Steven Sistare
2024-12-02 13:19 ` [PATCH V4 03/19] memory: add RAM_PRIVATE Steve Sistare
2024-12-09 19:45   ` Peter Xu
2024-12-02 13:19 ` [PATCH V4 04/19] machine: aux-ram-share option Steve Sistare
2024-12-05  8:25   ` Markus Armbruster
2024-12-05 14:24     ` Steven Sistare
2024-12-05 12:08   ` Markus Armbruster
2024-12-05 12:19     ` Markus Armbruster
2024-12-05 14:24       ` Steven Sistare
2024-12-09 19:54   ` Peter Xu
2024-12-12 20:38     ` Steven Sistare
2024-12-12 21:22       ` Peter Xu
2024-12-02 13:19 ` [PATCH V4 05/19] migration: cpr-state Steve Sistare
2024-12-02 13:19 ` [PATCH V4 06/19] physmem: preserve ram blocks for cpr Steve Sistare
2024-12-09 20:07   ` Peter Xu
2024-12-12 20:38     ` Steven Sistare
2024-12-12 22:48       ` Peter Xu
2024-12-13 15:21         ` Peter Xu
2024-12-13 15:30           ` Steven Sistare
2024-12-18 16:34             ` Steven Sistare
2024-12-18 17:00               ` Peter Xu
2024-12-18 20:22                 ` Steven Sistare
2024-12-18 20:33                   ` Peter Xu
2024-12-02 13:19 ` [PATCH V4 07/19] hostmem-memfd: preserve " Steve Sistare
2024-12-18 19:53   ` Steven Sistare
2024-12-18 20:23     ` Peter Xu
2024-12-02 13:20 ` [PATCH V4 08/19] hostmem-shm: " Steve Sistare
2024-12-12 17:38   ` Peter Xu
2024-12-02 13:20 ` [PATCH V4 09/19] migration: incoming channel Steve Sistare
2024-12-05 15:23   ` Markus Armbruster
2024-12-05 20:45     ` Steven Sistare
2024-12-09 12:12       ` Markus Armbruster
2024-12-09 16:36         ` Peter Xu
2024-12-11  9:18         ` Markus Armbruster
2024-12-11 18:58         ` Steven Sistare
2024-12-10 12:46     ` Markus Armbruster
2024-12-02 13:20 ` [PATCH V4 10/19] migration: cpr channel Steve Sistare
2024-12-05 15:37   ` Markus Armbruster
2024-12-05 20:46     ` Steven Sistare
2024-12-06  9:31       ` Markus Armbruster
2024-12-18 19:53         ` Steven Sistare
2024-12-18 20:27           ` Peter Xu
2024-12-18 20:31             ` Steven Sistare
2024-12-02 13:20 ` [PATCH V4 11/19] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-12-02 13:20 ` [PATCH V4 12/19] migration: VMSTATE_FD Steve Sistare
2024-12-02 13:20 ` [PATCH V4 13/19] migration: cpr-transfer save and load Steve Sistare
2024-12-02 13:20 ` [PATCH V4 14/19] migration: cpr-transfer mode Steve Sistare
2024-12-04 16:10   ` Steven Sistare
2024-12-10 12:26   ` Markus Armbruster
2024-12-11 22:05     ` Steven Sistare
2024-12-02 13:20 ` [PATCH V4 15/19] tests/migration-test: memory_backend Steve Sistare
2024-12-02 13:20 ` [PATCH V4 16/19] tests/qtest: defer connection Steve Sistare
2024-12-18 21:02   ` Steven Sistare
2024-12-19 15:46   ` Peter Xu
2024-12-19 22:33     ` Steven Sistare
2024-12-02 13:20 ` [PATCH V4 17/19] tests/migration-test: " Steve Sistare
2024-12-02 13:20 ` [PATCH V4 18/19] migration-test: cpr-transfer Steve Sistare
2024-12-18 21:03   ` Steven Sistare
2024-12-19 16:56   ` Peter Xu
2024-12-19 22:34     ` Steven Sistare
2024-12-20 15:41       ` Peter Xu
2024-12-02 13:20 ` [PATCH V4 19/19] migration: cpr-transfer documentation Steve Sistare
2024-12-18 21:03   ` Steven Sistare
2024-12-19 17:02   ` Peter Xu
2024-12-19 22:35     ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bf0e7550-54a0-42ff-a281-6a65cb1ba7b5@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).