qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Steve Sistare <steven.sistare@oracle.com>,
	qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Eduardo Habkost <eduardo@habkost.net>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH V2 01/13] machine: alloc-anon option
Date: Fri, 4 Oct 2024 08:33:10 -0400	[thread overview]
Message-ID: <Zv_ghrH6i4QOzne8@x1n> (raw)
In-Reply-To: <2143f803-439e-4b8b-ae92-07caa913d646@redhat.com>

On Fri, Oct 04, 2024 at 12:14:35PM +0200, David Hildenbrand wrote:
> On 03.10.24 18:14, Peter Xu wrote:
> > On Mon, Sep 30, 2024 at 12:40:32PM -0700, Steve Sistare wrote:
> > > Allocate anonymous memory using mmap MAP_ANON or memfd_create depending
> > > on the value of the anon-alloc machine property.  This option applies to
> > > memory allocated as a side effect of creating various devices. It does
> > > not apply to memory-backend-objects, whether explicitly specified on
> > > the command line, or implicitly created by the -m command line option.
> > > 
> > > The memfd option is intended to support new migration modes, in which the
> > > memory region can be transferred in place to a new QEMU process, by sending
> > > the memfd file descriptor to the process.  Memory contents are preserved,
> > > and if the mode also transfers device descriptors, then pages that are
> > > locked in memory for DMA remain locked.  This behavior is a pre-requisite
> > > for supporting vfio, vdpa, and iommufd devices with the new modes.
> > > 
> > > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > 
> > [Igor seems missing in the loop; added]
> > 
> > > ---
> > >   hw/core/machine.c   | 19 +++++++++++++++++++
> > >   include/hw/boards.h |  1 +
> > >   qapi/machine.json   | 14 ++++++++++++++
> > >   qemu-options.hx     | 11 +++++++++++
> > >   system/physmem.c    | 35 +++++++++++++++++++++++++++++++++++
> > >   system/trace-events |  3 +++
> > >   6 files changed, 83 insertions(+)
> > > 
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index adaba17..a89a32b 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -460,6 +460,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
> > >       ms->mem_merge = value;
> > >   }
> > > +static int machine_get_anon_alloc(Object *obj, Error **errp)
> > > +{
> > > +    MachineState *ms = MACHINE(obj);
> > > +
> > > +    return ms->anon_alloc;
> > > +}
> > > +
> > > +static void machine_set_anon_alloc(Object *obj, int value, Error **errp)
> > > +{
> > > +    MachineState *ms = MACHINE(obj);
> > > +
> > > +    ms->anon_alloc = value;
> > > +}
> > > +
> > >   static bool machine_get_usb(Object *obj, Error **errp)
> > >   {
> > >       MachineState *ms = MACHINE(obj);
> > > @@ -1078,6 +1092,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
> > >       object_class_property_set_description(oc, "mem-merge",
> > >           "Enable/disable memory merge support");
> > > +    object_class_property_add_enum(oc, "anon-alloc", "AnonAllocOption",
> > > +                                   &AnonAllocOption_lookup,
> > > +                                   machine_get_anon_alloc,
> > > +                                   machine_set_anon_alloc);
> > > +
> > >       object_class_property_add_bool(oc, "usb",
> > >           machine_get_usb, machine_set_usb);
> > >       object_class_property_set_description(oc, "usb",
> > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > index 5966069..5a87647 100644
> > > --- a/include/hw/boards.h
> > > +++ b/include/hw/boards.h
> > > @@ -393,6 +393,7 @@ struct MachineState {
> > >       bool enable_graphics;
> > >       ConfidentialGuestSupport *cgs;
> > >       HostMemoryBackend *memdev;
> > > +    AnonAllocOption anon_alloc;
> > >       /*
> > >        * convenience alias to ram_memdev_id backend memory region
> > >        * or to numa container memory region
> > > diff --git a/qapi/machine.json b/qapi/machine.json
> > > index a6b8795..d4a63f5 100644
> > > --- a/qapi/machine.json
> > > +++ b/qapi/machine.json
> > > @@ -1898,3 +1898,17 @@
> > >   { 'command': 'x-query-interrupt-controllers',
> > >     'returns': 'HumanReadableText',
> > >     'features': [ 'unstable' ]}
> > > +
> > > +##
> > > +# @AnonAllocOption:
> > > +#
> > > +# An enumeration of the options for allocating anonymous guest memory.
> > > +#
> > > +# @mmap: allocate using mmap MAP_ANON
> > > +#
> > > +# @memfd: allocate using memfd_create
> > > +#
> > > +# Since: 9.2
> > > +##
> > > +{ 'enum': 'AnonAllocOption',
> > > +  'data': [ 'mmap', 'memfd' ] }
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index d94e2cb..90ab943 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> > >       "                nvdimm=on|off controls NVDIMM support (default=off)\n"
> > >       "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
> > >       "                hmat=on|off controls ACPI HMAT support (default=off)\n"
> > > +    "                anon-alloc=mmap|memfd allocate anonymous guest RAM using mmap MAP_ANON or memfd_create (default: mmap)\n"
> > >       "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
> > >       "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
> > >       QEMU_ARCH_ALL)
> > > @@ -101,6 +102,16 @@ SRST
> > >           Enables or disables ACPI Heterogeneous Memory Attribute Table
> > >           (HMAT) support. The default is off.
> > > +    ``anon-alloc=mmap|memfd``
> > > +        Allocate anonymous guest RAM using mmap MAP_ANON (the default)
> > > +        or memfd_create.  This option applies to memory allocated as a
> > > +        side effect of creating various devices. It does not apply to
> > > +        memory-backend-objects, whether explicitly specified on the
> > > +        command line, or implicitly created by the -m command line
> > > +        option.
> > > +
> > > +        Some migration modes require anon-alloc=memfd.
> > > +
> > >       ``memory-backend='id'``
> > >           An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
> > >           Allows to use a memory backend as main RAM.
> > > diff --git a/system/physmem.c b/system/physmem.c
> > > index dc1db3a..174f7e0 100644
> > > --- a/system/physmem.c
> > > +++ b/system/physmem.c
> > > @@ -47,6 +47,7 @@
> > >   #include "qemu/qemu-print.h"
> > >   #include "qemu/log.h"
> > >   #include "qemu/memalign.h"
> > > +#include "qemu/memfd.h"
> > >   #include "exec/memory.h"
> > >   #include "exec/ioport.h"
> > >   #include "sysemu/dma.h"
> > > @@ -69,6 +70,8 @@
> > >   #include "qemu/pmem.h"
> > > +#include "qapi/qapi-types-migration.h"
> > > +#include "migration/options.h"
> > >   #include "migration/vmstate.h"
> > >   #include "qemu/range.h"
> > > @@ -1849,6 +1852,35 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
> > >                   qemu_mutex_unlock_ramlist();
> > >                   return;
> > >               }
> > > +
> > > +        } else if (current_machine->anon_alloc == ANON_ALLOC_OPTION_MEMFD &&
> > > +                   !object_dynamic_cast(new_block->mr->parent_obj.parent,
> > > +                                        TYPE_MEMORY_BACKEND)) {
> > 
> > This is pretty fragile.. if someone adds yet another layer on top of memory
> > backend objects, the ownership links can change and this might silently run
> > into something else even without any warning..
> > 
> > I wished we dig into what is missing, but maybe that's too trivial.  If
> > not, we still need to make this as solid.  Perhaps that can be a ram flag
> > and let relevant callers pass in that flag explicitly.
> 
> How would they decide whether or not we want to set the flag in the current
> configuration?

It was in the previous email where it got cut..  I listed four paths that
may need change.

> 
> > 
> > I think RAM_SHARED can actually be that flag already - I mean, in all paths
> > that we may create anon mem (but not memory-backend-* objects), is it
> > always safe we always switch to RAM_SHARED from anon?
> 
> Do you mean only setting the flag (-> anonymous shmem) or switching also to
> memfd, which is a bigger change?

Switching to memfd.  I thought anon shmem (mmap(MAP_SHARED)) is mostly the
same internally, if we create memfd then mmap(MAP_SHARED) on top of it, no?

In this case we need the fds so we need to do the latter to cache the fds.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2024-10-04 12:34 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-30 19:40 [PATCH V2 00/13] Live update: cpr-transfer Steve Sistare
2024-09-30 19:40 ` [PATCH V2 01/13] machine: alloc-anon option Steve Sistare
2024-10-03 16:14   ` Peter Xu
2024-10-04 10:14     ` David Hildenbrand
2024-10-04 12:33       ` Peter Xu [this message]
2024-10-04 12:54         ` David Hildenbrand
2024-10-04 13:24           ` Peter Xu
2024-10-07 16:23             ` David Hildenbrand
2024-10-07 19:05               ` Peter Xu
2024-10-07 15:36   ` Peter Xu
2024-10-07 19:30     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 02/13] migration: cpr-state Steve Sistare
2024-10-07 14:14   ` Peter Xu
2024-10-07 19:30     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 03/13] migration: save cpr mode Steve Sistare
2024-10-07 15:18   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-10-07 20:10       ` Peter Xu
2024-10-08 15:57         ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 04/13] migration: stop vm earlier for cpr Steve Sistare
2024-10-07 15:27   ` Peter Xu
2024-10-07 20:52     ` Steven Sistare
2024-10-08 15:35       ` Peter Xu
2024-10-08 19:13         ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 05/13] physmem: preserve ram blocks " Steve Sistare
2024-10-07 15:49   ` Peter Xu
2024-10-07 16:28     ` Peter Xu
2024-10-08 15:17       ` Steven Sistare
2024-10-08 16:26         ` Peter Xu
2024-10-08 21:05           ` Steven Sistare
2024-10-08 21:32             ` Peter Xu
2024-10-31 20:32               ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 06/13] hostmem-memfd: preserve " Steve Sistare
2024-10-07 15:52   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 07/13] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-10-07 16:06   ` Peter Xu
2024-10-07 16:35     ` Daniel P. Berrangé
2024-10-07 18:12       ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 08/13] migration: VMSTATE_FD Steve Sistare
2024-10-07 16:36   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 09/13] migration: cpr-transfer save and load Steve Sistare
2024-10-07 16:47   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-10-08 15:36       ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 10/13] migration: cpr-uri parameter Steve Sistare
2024-10-07 16:49   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 11/13] migration: cpr-uri option Steve Sistare
2024-10-07 16:50   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 12/13] migration: split qmp_migrate Steve Sistare
2024-10-07 19:18   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 13/13] migration: cpr-transfer mode Steve Sistare
2024-10-07 19:44   ` Peter Xu
2024-10-07 20:39     ` Steven Sistare
2024-10-08 15:45       ` Peter Xu
2024-10-08 19:12         ` Steven Sistare
2024-10-08 19:38           ` Peter Xu
2024-10-08 18:28       ` Fabiano Rosas
2024-10-08 18:47         ` Peter Xu
2024-10-08 19:11           ` Fabiano Rosas
2024-10-08 19:33             ` Steven Sistare
2024-10-08 19:48             ` Peter Xu
2024-10-09 18:43               ` Steven Sistare
2024-10-09 19:06                 ` Peter Xu
2024-10-09 19:59                   ` Peter Xu
2024-10-09 20:18                     ` Steven Sistare
2024-10-09 20:57                       ` Peter Xu
2024-10-09 22:08                         ` Fabiano Rosas
2024-10-10 20:05                           ` Steven Sistare
2024-10-09 20:09                   ` Steven Sistare
2024-10-09 20:36                     ` Peter Xu
2024-10-10 20:06                       ` Steven Sistare
2024-10-10 21:23                         ` Peter Xu
2024-10-24 21:12                           ` Steven Sistare
2024-10-25 13:55                             ` Peter Xu
2024-10-25 15:04                               ` Steven Sistare
2024-10-08 19:29           ` Steven Sistare
2024-10-08 14:33 ` [PATCH V2 00/13] Live update: cpr-transfer Vladimir Sementsov-Ogievskiy
2024-10-08 21:13   ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zv_ghrH6i4QOzne8@x1n \
    --to=peterx@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=imammedo@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).