From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>,
qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Eduardo Habkost <eduardo@habkost.net>,
Philippe Mathieu-Daude <philmd@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"Daniel P. Berrange" <berrange@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH V2 01/13] machine: alloc-anon option
Date: Fri, 4 Oct 2024 14:54:38 +0200 [thread overview]
Message-ID: <bffa3dc0-36b7-4fa1-a0b6-cce34743a46c@redhat.com> (raw)
In-Reply-To: <Zv_ghrH6i4QOzne8@x1n>
On 04.10.24 14:33, Peter Xu wrote:
> On Fri, Oct 04, 2024 at 12:14:35PM +0200, David Hildenbrand wrote:
>> On 03.10.24 18:14, Peter Xu wrote:
>>> On Mon, Sep 30, 2024 at 12:40:32PM -0700, Steve Sistare wrote:
>>>> Allocate anonymous memory using mmap MAP_ANON or memfd_create depending
>>>> on the value of the anon-alloc machine property. This option applies to
>>>> memory allocated as a side effect of creating various devices. It does
>>>> not apply to memory-backend-objects, whether explicitly specified on
>>>> the command line, or implicitly created by the -m command line option.
>>>>
>>>> The memfd option is intended to support new migration modes, in which the
>>>> memory region can be transferred in place to a new QEMU process, by sending
>>>> the memfd file descriptor to the process. Memory contents are preserved,
>>>> and if the mode also transfers device descriptors, then pages that are
>>>> locked in memory for DMA remain locked. This behavior is a pre-requisite
>>>> for supporting vfio, vdpa, and iommufd devices with the new modes.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>
>>> [Igor seems missing in the loop; added]
>>>
>>>> ---
>>>> hw/core/machine.c | 19 +++++++++++++++++++
>>>> include/hw/boards.h | 1 +
>>>> qapi/machine.json | 14 ++++++++++++++
>>>> qemu-options.hx | 11 +++++++++++
>>>> system/physmem.c | 35 +++++++++++++++++++++++++++++++++++
>>>> system/trace-events | 3 +++
>>>> 6 files changed, 83 insertions(+)
>>>>
>>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>>> index adaba17..a89a32b 100644
>>>> --- a/hw/core/machine.c
>>>> +++ b/hw/core/machine.c
>>>> @@ -460,6 +460,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
>>>> ms->mem_merge = value;
>>>> }
>>>> +static int machine_get_anon_alloc(Object *obj, Error **errp)
>>>> +{
>>>> + MachineState *ms = MACHINE(obj);
>>>> +
>>>> + return ms->anon_alloc;
>>>> +}
>>>> +
>>>> +static void machine_set_anon_alloc(Object *obj, int value, Error **errp)
>>>> +{
>>>> + MachineState *ms = MACHINE(obj);
>>>> +
>>>> + ms->anon_alloc = value;
>>>> +}
>>>> +
>>>> static bool machine_get_usb(Object *obj, Error **errp)
>>>> {
>>>> MachineState *ms = MACHINE(obj);
>>>> @@ -1078,6 +1092,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
>>>> object_class_property_set_description(oc, "mem-merge",
>>>> "Enable/disable memory merge support");
>>>> + object_class_property_add_enum(oc, "anon-alloc", "AnonAllocOption",
>>>> + &AnonAllocOption_lookup,
>>>> + machine_get_anon_alloc,
>>>> + machine_set_anon_alloc);
>>>> +
>>>> object_class_property_add_bool(oc, "usb",
>>>> machine_get_usb, machine_set_usb);
>>>> object_class_property_set_description(oc, "usb",
>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>>>> index 5966069..5a87647 100644
>>>> --- a/include/hw/boards.h
>>>> +++ b/include/hw/boards.h
>>>> @@ -393,6 +393,7 @@ struct MachineState {
>>>> bool enable_graphics;
>>>> ConfidentialGuestSupport *cgs;
>>>> HostMemoryBackend *memdev;
>>>> + AnonAllocOption anon_alloc;
>>>> /*
>>>> * convenience alias to ram_memdev_id backend memory region
>>>> * or to numa container memory region
>>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>>> index a6b8795..d4a63f5 100644
>>>> --- a/qapi/machine.json
>>>> +++ b/qapi/machine.json
>>>> @@ -1898,3 +1898,17 @@
>>>> { 'command': 'x-query-interrupt-controllers',
>>>> 'returns': 'HumanReadableText',
>>>> 'features': [ 'unstable' ]}
>>>> +
>>>> +##
>>>> +# @AnonAllocOption:
>>>> +#
>>>> +# An enumeration of the options for allocating anonymous guest memory.
>>>> +#
>>>> +# @mmap: allocate using mmap MAP_ANON
>>>> +#
>>>> +# @memfd: allocate using memfd_create
>>>> +#
>>>> +# Since: 9.2
>>>> +##
>>>> +{ 'enum': 'AnonAllocOption',
>>>> + 'data': [ 'mmap', 'memfd' ] }
>>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>>> index d94e2cb..90ab943 100644
>>>> --- a/qemu-options.hx
>>>> +++ b/qemu-options.hx
>>>> @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>>> " nvdimm=on|off controls NVDIMM support (default=off)\n"
>>>> " memory-encryption=@var{} memory encryption object to use (default=none)\n"
>>>> " hmat=on|off controls ACPI HMAT support (default=off)\n"
>>>> + " anon-alloc=mmap|memfd allocate anonymous guest RAM using mmap MAP_ANON or memfd_create (default: mmap)\n"
>>>> " memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
>>>> " cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
>>>> QEMU_ARCH_ALL)
>>>> @@ -101,6 +102,16 @@ SRST
>>>> Enables or disables ACPI Heterogeneous Memory Attribute Table
>>>> (HMAT) support. The default is off.
>>>> + ``anon-alloc=mmap|memfd``
>>>> + Allocate anonymous guest RAM using mmap MAP_ANON (the default)
>>>> + or memfd_create. This option applies to memory allocated as a
>>>> + side effect of creating various devices. It does not apply to
>>>> + memory-backend-objects, whether explicitly specified on the
>>>> + command line, or implicitly created by the -m command line
>>>> + option.
>>>> +
>>>> + Some migration modes require anon-alloc=memfd.
>>>> +
>>>> ``memory-backend='id'``
>>>> An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
>>>> Allows to use a memory backend as main RAM.
>>>> diff --git a/system/physmem.c b/system/physmem.c
>>>> index dc1db3a..174f7e0 100644
>>>> --- a/system/physmem.c
>>>> +++ b/system/physmem.c
>>>> @@ -47,6 +47,7 @@
>>>> #include "qemu/qemu-print.h"
>>>> #include "qemu/log.h"
>>>> #include "qemu/memalign.h"
>>>> +#include "qemu/memfd.h"
>>>> #include "exec/memory.h"
>>>> #include "exec/ioport.h"
>>>> #include "sysemu/dma.h"
>>>> @@ -69,6 +70,8 @@
>>>> #include "qemu/pmem.h"
>>>> +#include "qapi/qapi-types-migration.h"
>>>> +#include "migration/options.h"
>>>> #include "migration/vmstate.h"
>>>> #include "qemu/range.h"
>>>> @@ -1849,6 +1852,35 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>>>> qemu_mutex_unlock_ramlist();
>>>> return;
>>>> }
>>>> +
>>>> + } else if (current_machine->anon_alloc == ANON_ALLOC_OPTION_MEMFD &&
>>>> + !object_dynamic_cast(new_block->mr->parent_obj.parent,
>>>> + TYPE_MEMORY_BACKEND)) {
>>>
>>> This is pretty fragile.. if someone adds yet another layer on top of memory
>>> backend objects, the ownership links can change and this might silently run
>>> into something else even without any warning..
>>>
>>> I wished we dig into what is missing, but maybe that's too trivial. If
>>> not, we still need to make this as solid. Perhaps that can be a ram flag
>>> and let relevant callers pass in that flag explicitly.
>>
>> How would they decide whether or not we want to set the flag in the current
>> configuration?
>
> It was in the previous email where it got cut.. I listed four paths that
> may need change.
That's not my question. Who would decide whether we want to set
MAP_SHARED in these callers or not?
If you have "unconditionally" in mind, I think it's a bad idea. If there
is some other toggle to perform that setting conditionally, why not.
>
>>
>>>
>>> I think RAM_SHARED can actually be that flag already - I mean, in all paths
>>> that we may create anon mem (but not memory-backend-* objects), is it
>>> always safe we always switch to RAM_SHARED from anon?
>>
>> Do you mean only setting the flag (-> anonymous shmem) or switching also to
>> memfd, which is a bigger change?
>
> Switching to memfd. I thought anon shmem (mmap(MAP_SHARED)) is mostly the
> same internally, if we create memfd then mmap(MAP_SHARED) on top of it, no?
Memfd is Linux specific, keep that in mind. Apart from that there
shouldn't be much difference between anon shmem and memfd (there are
memory commit differences, though).
Of course, there is a difference between anon memory and shmem, for
example regarding what viritofsd faced (e.g., KSM) recently.
--
Cheers,
David / dhildenb
next prev parent reply other threads:[~2024-10-04 12:55 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-30 19:40 [PATCH V2 00/13] Live update: cpr-transfer Steve Sistare
2024-09-30 19:40 ` [PATCH V2 01/13] machine: alloc-anon option Steve Sistare
2024-10-03 16:14 ` Peter Xu
2024-10-04 10:14 ` David Hildenbrand
2024-10-04 12:33 ` Peter Xu
2024-10-04 12:54 ` David Hildenbrand [this message]
2024-10-04 13:24 ` Peter Xu
2024-10-07 16:23 ` David Hildenbrand
2024-10-07 19:05 ` Peter Xu
2024-10-07 15:36 ` Peter Xu
2024-10-07 19:30 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 02/13] migration: cpr-state Steve Sistare
2024-10-07 14:14 ` Peter Xu
2024-10-07 19:30 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 03/13] migration: save cpr mode Steve Sistare
2024-10-07 15:18 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-10-07 20:10 ` Peter Xu
2024-10-08 15:57 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 04/13] migration: stop vm earlier for cpr Steve Sistare
2024-10-07 15:27 ` Peter Xu
2024-10-07 20:52 ` Steven Sistare
2024-10-08 15:35 ` Peter Xu
2024-10-08 19:13 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 05/13] physmem: preserve ram blocks " Steve Sistare
2024-10-07 15:49 ` Peter Xu
2024-10-07 16:28 ` Peter Xu
2024-10-08 15:17 ` Steven Sistare
2024-10-08 16:26 ` Peter Xu
2024-10-08 21:05 ` Steven Sistare
2024-10-08 21:32 ` Peter Xu
2024-10-31 20:32 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 06/13] hostmem-memfd: preserve " Steve Sistare
2024-10-07 15:52 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 07/13] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-10-07 16:06 ` Peter Xu
2024-10-07 16:35 ` Daniel P. Berrangé
2024-10-07 18:12 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 08/13] migration: VMSTATE_FD Steve Sistare
2024-10-07 16:36 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 09/13] migration: cpr-transfer save and load Steve Sistare
2024-10-07 16:47 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-10-08 15:36 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 10/13] migration: cpr-uri parameter Steve Sistare
2024-10-07 16:49 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 11/13] migration: cpr-uri option Steve Sistare
2024-10-07 16:50 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 12/13] migration: split qmp_migrate Steve Sistare
2024-10-07 19:18 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 13/13] migration: cpr-transfer mode Steve Sistare
2024-10-07 19:44 ` Peter Xu
2024-10-07 20:39 ` Steven Sistare
2024-10-08 15:45 ` Peter Xu
2024-10-08 19:12 ` Steven Sistare
2024-10-08 19:38 ` Peter Xu
2024-10-08 18:28 ` Fabiano Rosas
2024-10-08 18:47 ` Peter Xu
2024-10-08 19:11 ` Fabiano Rosas
2024-10-08 19:33 ` Steven Sistare
2024-10-08 19:48 ` Peter Xu
2024-10-09 18:43 ` Steven Sistare
2024-10-09 19:06 ` Peter Xu
2024-10-09 19:59 ` Peter Xu
2024-10-09 20:18 ` Steven Sistare
2024-10-09 20:57 ` Peter Xu
2024-10-09 22:08 ` Fabiano Rosas
2024-10-10 20:05 ` Steven Sistare
2024-10-09 20:09 ` Steven Sistare
2024-10-09 20:36 ` Peter Xu
2024-10-10 20:06 ` Steven Sistare
2024-10-10 21:23 ` Peter Xu
2024-10-24 21:12 ` Steven Sistare
2024-10-25 13:55 ` Peter Xu
2024-10-25 15:04 ` Steven Sistare
2024-10-08 19:29 ` Steven Sistare
2024-10-08 14:33 ` [PATCH V2 00/13] Live update: cpr-transfer Vladimir Sementsov-Ogievskiy
2024-10-08 21:13 ` Steven Sistare
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bffa3dc0-36b7-4fa1-a0b6-cce34743a46c@redhat.com \
--to=david@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=eduardo@habkost.net \
--cc=farosas@suse.de \
--cc=imammedo@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=steven.sistare@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).