From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>,
qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Eduardo Habkost <eduardo@habkost.net>,
Philippe Mathieu-Daude <philmd@linaro.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"Daniel P. Berrange" <berrange@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH V2 01/13] machine: alloc-anon option
Date: Fri, 4 Oct 2024 08:33:10 -0400 [thread overview]
Message-ID: <Zv_ghrH6i4QOzne8@x1n> (raw)
In-Reply-To: <2143f803-439e-4b8b-ae92-07caa913d646@redhat.com>
On Fri, Oct 04, 2024 at 12:14:35PM +0200, David Hildenbrand wrote:
> On 03.10.24 18:14, Peter Xu wrote:
> > On Mon, Sep 30, 2024 at 12:40:32PM -0700, Steve Sistare wrote:
> > > Allocate anonymous memory using mmap MAP_ANON or memfd_create depending
> > > on the value of the anon-alloc machine property. This option applies to
> > > memory allocated as a side effect of creating various devices. It does
> > > not apply to memory-backend-objects, whether explicitly specified on
> > > the command line, or implicitly created by the -m command line option.
> > >
> > > The memfd option is intended to support new migration modes, in which the
> > > memory region can be transferred in place to a new QEMU process, by sending
> > > the memfd file descriptor to the process. Memory contents are preserved,
> > > and if the mode also transfers device descriptors, then pages that are
> > > locked in memory for DMA remain locked. This behavior is a pre-requisite
> > > for supporting vfio, vdpa, and iommufd devices with the new modes.
> > >
> > > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >
> > [Igor seems missing in the loop; added]
> >
> > > ---
> > > hw/core/machine.c | 19 +++++++++++++++++++
> > > include/hw/boards.h | 1 +
> > > qapi/machine.json | 14 ++++++++++++++
> > > qemu-options.hx | 11 +++++++++++
> > > system/physmem.c | 35 +++++++++++++++++++++++++++++++++++
> > > system/trace-events | 3 +++
> > > 6 files changed, 83 insertions(+)
> > >
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index adaba17..a89a32b 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -460,6 +460,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
> > > ms->mem_merge = value;
> > > }
> > > +static int machine_get_anon_alloc(Object *obj, Error **errp)
> > > +{
> > > + MachineState *ms = MACHINE(obj);
> > > +
> > > + return ms->anon_alloc;
> > > +}
> > > +
> > > +static void machine_set_anon_alloc(Object *obj, int value, Error **errp)
> > > +{
> > > + MachineState *ms = MACHINE(obj);
> > > +
> > > + ms->anon_alloc = value;
> > > +}
> > > +
> > > static bool machine_get_usb(Object *obj, Error **errp)
> > > {
> > > MachineState *ms = MACHINE(obj);
> > > @@ -1078,6 +1092,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
> > > object_class_property_set_description(oc, "mem-merge",
> > > "Enable/disable memory merge support");
> > > + object_class_property_add_enum(oc, "anon-alloc", "AnonAllocOption",
> > > + &AnonAllocOption_lookup,
> > > + machine_get_anon_alloc,
> > > + machine_set_anon_alloc);
> > > +
> > > object_class_property_add_bool(oc, "usb",
> > > machine_get_usb, machine_set_usb);
> > > object_class_property_set_description(oc, "usb",
> > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > index 5966069..5a87647 100644
> > > --- a/include/hw/boards.h
> > > +++ b/include/hw/boards.h
> > > @@ -393,6 +393,7 @@ struct MachineState {
> > > bool enable_graphics;
> > > ConfidentialGuestSupport *cgs;
> > > HostMemoryBackend *memdev;
> > > + AnonAllocOption anon_alloc;
> > > /*
> > > * convenience alias to ram_memdev_id backend memory region
> > > * or to numa container memory region
> > > diff --git a/qapi/machine.json b/qapi/machine.json
> > > index a6b8795..d4a63f5 100644
> > > --- a/qapi/machine.json
> > > +++ b/qapi/machine.json
> > > @@ -1898,3 +1898,17 @@
> > > { 'command': 'x-query-interrupt-controllers',
> > > 'returns': 'HumanReadableText',
> > > 'features': [ 'unstable' ]}
> > > +
> > > +##
> > > +# @AnonAllocOption:
> > > +#
> > > +# An enumeration of the options for allocating anonymous guest memory.
> > > +#
> > > +# @mmap: allocate using mmap MAP_ANON
> > > +#
> > > +# @memfd: allocate using memfd_create
> > > +#
> > > +# Since: 9.2
> > > +##
> > > +{ 'enum': 'AnonAllocOption',
> > > + 'data': [ 'mmap', 'memfd' ] }
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index d94e2cb..90ab943 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> > > " nvdimm=on|off controls NVDIMM support (default=off)\n"
> > > " memory-encryption=@var{} memory encryption object to use (default=none)\n"
> > > " hmat=on|off controls ACPI HMAT support (default=off)\n"
> > > + " anon-alloc=mmap|memfd allocate anonymous guest RAM using mmap MAP_ANON or memfd_create (default: mmap)\n"
> > > " memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
> > > " cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
> > > QEMU_ARCH_ALL)
> > > @@ -101,6 +102,16 @@ SRST
> > > Enables or disables ACPI Heterogeneous Memory Attribute Table
> > > (HMAT) support. The default is off.
> > > + ``anon-alloc=mmap|memfd``
> > > + Allocate anonymous guest RAM using mmap MAP_ANON (the default)
> > > + or memfd_create. This option applies to memory allocated as a
> > > + side effect of creating various devices. It does not apply to
> > > + memory-backend-objects, whether explicitly specified on the
> > > + command line, or implicitly created by the -m command line
> > > + option.
> > > +
> > > + Some migration modes require anon-alloc=memfd.
> > > +
> > > ``memory-backend='id'``
> > > An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
> > > Allows to use a memory backend as main RAM.
> > > diff --git a/system/physmem.c b/system/physmem.c
> > > index dc1db3a..174f7e0 100644
> > > --- a/system/physmem.c
> > > +++ b/system/physmem.c
> > > @@ -47,6 +47,7 @@
> > > #include "qemu/qemu-print.h"
> > > #include "qemu/log.h"
> > > #include "qemu/memalign.h"
> > > +#include "qemu/memfd.h"
> > > #include "exec/memory.h"
> > > #include "exec/ioport.h"
> > > #include "sysemu/dma.h"
> > > @@ -69,6 +70,8 @@
> > > #include "qemu/pmem.h"
> > > +#include "qapi/qapi-types-migration.h"
> > > +#include "migration/options.h"
> > > #include "migration/vmstate.h"
> > > #include "qemu/range.h"
> > > @@ -1849,6 +1852,35 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
> > > qemu_mutex_unlock_ramlist();
> > > return;
> > > }
> > > +
> > > + } else if (current_machine->anon_alloc == ANON_ALLOC_OPTION_MEMFD &&
> > > + !object_dynamic_cast(new_block->mr->parent_obj.parent,
> > > + TYPE_MEMORY_BACKEND)) {
> >
> > This is pretty fragile.. if someone adds yet another layer on top of memory
> > backend objects, the ownership links can change and this might silently run
> > into something else even without any warning..
> >
> > I wished we dig into what is missing, but maybe that's too trivial. If
> > not, we still need to make this as solid. Perhaps that can be a ram flag
> > and let relevant callers pass in that flag explicitly.
>
> How would they decide whether or not we want to set the flag in the current
> configuration?
It was in the previous email where it got cut.. I listed four paths that
may need change.
>
> >
> > I think RAM_SHARED can actually be that flag already - I mean, in all paths
> > that we may create anon mem (but not memory-backend-* objects), is it
> > always safe we always switch to RAM_SHARED from anon?
>
> Do you mean only setting the flag (-> anonymous shmem) or switching also to
> memfd, which is a bigger change?
Switching to memfd. I thought anon shmem (mmap(MAP_SHARED)) is mostly the
same internally, if we create memfd then mmap(MAP_SHARED) on top of it, no?
In this case we need the fds so we need to do the latter to cache the fds.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2024-10-04 12:34 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-30 19:40 [PATCH V2 00/13] Live update: cpr-transfer Steve Sistare
2024-09-30 19:40 ` [PATCH V2 01/13] machine: alloc-anon option Steve Sistare
2024-10-03 16:14 ` Peter Xu
2024-10-04 10:14 ` David Hildenbrand
2024-10-04 12:33 ` Peter Xu [this message]
2024-10-04 12:54 ` David Hildenbrand
2024-10-04 13:24 ` Peter Xu
2024-10-07 16:23 ` David Hildenbrand
2024-10-07 19:05 ` Peter Xu
2024-10-07 15:36 ` Peter Xu
2024-10-07 19:30 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 02/13] migration: cpr-state Steve Sistare
2024-10-07 14:14 ` Peter Xu
2024-10-07 19:30 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 03/13] migration: save cpr mode Steve Sistare
2024-10-07 15:18 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-10-07 20:10 ` Peter Xu
2024-10-08 15:57 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 04/13] migration: stop vm earlier for cpr Steve Sistare
2024-10-07 15:27 ` Peter Xu
2024-10-07 20:52 ` Steven Sistare
2024-10-08 15:35 ` Peter Xu
2024-10-08 19:13 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 05/13] physmem: preserve ram blocks " Steve Sistare
2024-10-07 15:49 ` Peter Xu
2024-10-07 16:28 ` Peter Xu
2024-10-08 15:17 ` Steven Sistare
2024-10-08 16:26 ` Peter Xu
2024-10-08 21:05 ` Steven Sistare
2024-10-08 21:32 ` Peter Xu
2024-10-31 20:32 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 06/13] hostmem-memfd: preserve " Steve Sistare
2024-10-07 15:52 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 07/13] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-10-07 16:06 ` Peter Xu
2024-10-07 16:35 ` Daniel P. Berrangé
2024-10-07 18:12 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 08/13] migration: VMSTATE_FD Steve Sistare
2024-10-07 16:36 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 09/13] migration: cpr-transfer save and load Steve Sistare
2024-10-07 16:47 ` Peter Xu
2024-10-07 19:31 ` Steven Sistare
2024-10-08 15:36 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 10/13] migration: cpr-uri parameter Steve Sistare
2024-10-07 16:49 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 11/13] migration: cpr-uri option Steve Sistare
2024-10-07 16:50 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 12/13] migration: split qmp_migrate Steve Sistare
2024-10-07 19:18 ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 13/13] migration: cpr-transfer mode Steve Sistare
2024-10-07 19:44 ` Peter Xu
2024-10-07 20:39 ` Steven Sistare
2024-10-08 15:45 ` Peter Xu
2024-10-08 19:12 ` Steven Sistare
2024-10-08 19:38 ` Peter Xu
2024-10-08 18:28 ` Fabiano Rosas
2024-10-08 18:47 ` Peter Xu
2024-10-08 19:11 ` Fabiano Rosas
2024-10-08 19:33 ` Steven Sistare
2024-10-08 19:48 ` Peter Xu
2024-10-09 18:43 ` Steven Sistare
2024-10-09 19:06 ` Peter Xu
2024-10-09 19:59 ` Peter Xu
2024-10-09 20:18 ` Steven Sistare
2024-10-09 20:57 ` Peter Xu
2024-10-09 22:08 ` Fabiano Rosas
2024-10-10 20:05 ` Steven Sistare
2024-10-09 20:09 ` Steven Sistare
2024-10-09 20:36 ` Peter Xu
2024-10-10 20:06 ` Steven Sistare
2024-10-10 21:23 ` Peter Xu
2024-10-24 21:12 ` Steven Sistare
2024-10-25 13:55 ` Peter Xu
2024-10-25 15:04 ` Steven Sistare
2024-10-08 19:29 ` Steven Sistare
2024-10-08 14:33 ` [PATCH V2 00/13] Live update: cpr-transfer Vladimir Sementsov-Ogievskiy
2024-10-08 21:13 ` Steven Sistare
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zv_ghrH6i4QOzne8@x1n \
--to=peterx@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=farosas@suse.de \
--cc=imammedo@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=pbonzini@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=steven.sistare@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).