qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>, Steve Sistare <steven.sistare@oracle.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Eduardo Habkost <eduardo@habkost.net>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH V2 01/13] machine: alloc-anon option
Date: Fri, 4 Oct 2024 12:14:35 +0200	[thread overview]
Message-ID: <2143f803-439e-4b8b-ae92-07caa913d646@redhat.com> (raw)
In-Reply-To: <Zv7C7MeVP2X8bEJU@x1n>

On 03.10.24 18:14, Peter Xu wrote:
> On Mon, Sep 30, 2024 at 12:40:32PM -0700, Steve Sistare wrote:
>> Allocate anonymous memory using mmap MAP_ANON or memfd_create depending
>> on the value of the anon-alloc machine property.  This option applies to
>> memory allocated as a side effect of creating various devices. It does
>> not apply to memory-backend-objects, whether explicitly specified on
>> the command line, or implicitly created by the -m command line option.
>>
>> The memfd option is intended to support new migration modes, in which the
>> memory region can be transferred in place to a new QEMU process, by sending
>> the memfd file descriptor to the process.  Memory contents are preserved,
>> and if the mode also transfers device descriptors, then pages that are
>> locked in memory for DMA remain locked.  This behavior is a pre-requisite
>> for supporting vfio, vdpa, and iommufd devices with the new modes.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> 
> [Igor seems missing in the loop; added]
> 
>> ---
>>   hw/core/machine.c   | 19 +++++++++++++++++++
>>   include/hw/boards.h |  1 +
>>   qapi/machine.json   | 14 ++++++++++++++
>>   qemu-options.hx     | 11 +++++++++++
>>   system/physmem.c    | 35 +++++++++++++++++++++++++++++++++++
>>   system/trace-events |  3 +++
>>   6 files changed, 83 insertions(+)
>>
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index adaba17..a89a32b 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -460,6 +460,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
>>       ms->mem_merge = value;
>>   }
>>   
>> +static int machine_get_anon_alloc(Object *obj, Error **errp)
>> +{
>> +    MachineState *ms = MACHINE(obj);
>> +
>> +    return ms->anon_alloc;
>> +}
>> +
>> +static void machine_set_anon_alloc(Object *obj, int value, Error **errp)
>> +{
>> +    MachineState *ms = MACHINE(obj);
>> +
>> +    ms->anon_alloc = value;
>> +}
>> +
>>   static bool machine_get_usb(Object *obj, Error **errp)
>>   {
>>       MachineState *ms = MACHINE(obj);
>> @@ -1078,6 +1092,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
>>       object_class_property_set_description(oc, "mem-merge",
>>           "Enable/disable memory merge support");
>>   
>> +    object_class_property_add_enum(oc, "anon-alloc", "AnonAllocOption",
>> +                                   &AnonAllocOption_lookup,
>> +                                   machine_get_anon_alloc,
>> +                                   machine_set_anon_alloc);
>> +
>>       object_class_property_add_bool(oc, "usb",
>>           machine_get_usb, machine_set_usb);
>>       object_class_property_set_description(oc, "usb",
>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>> index 5966069..5a87647 100644
>> --- a/include/hw/boards.h
>> +++ b/include/hw/boards.h
>> @@ -393,6 +393,7 @@ struct MachineState {
>>       bool enable_graphics;
>>       ConfidentialGuestSupport *cgs;
>>       HostMemoryBackend *memdev;
>> +    AnonAllocOption anon_alloc;
>>       /*
>>        * convenience alias to ram_memdev_id backend memory region
>>        * or to numa container memory region
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index a6b8795..d4a63f5 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -1898,3 +1898,17 @@
>>   { 'command': 'x-query-interrupt-controllers',
>>     'returns': 'HumanReadableText',
>>     'features': [ 'unstable' ]}
>> +
>> +##
>> +# @AnonAllocOption:
>> +#
>> +# An enumeration of the options for allocating anonymous guest memory.
>> +#
>> +# @mmap: allocate using mmap MAP_ANON
>> +#
>> +# @memfd: allocate using memfd_create
>> +#
>> +# Since: 9.2
>> +##
>> +{ 'enum': 'AnonAllocOption',
>> +  'data': [ 'mmap', 'memfd' ] }
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index d94e2cb..90ab943 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -38,6 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>       "                nvdimm=on|off controls NVDIMM support (default=off)\n"
>>       "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
>>       "                hmat=on|off controls ACPI HMAT support (default=off)\n"
>> +    "                anon-alloc=mmap|memfd allocate anonymous guest RAM using mmap MAP_ANON or memfd_create (default: mmap)\n"
>>       "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
>>       "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
>>       QEMU_ARCH_ALL)
>> @@ -101,6 +102,16 @@ SRST
>>           Enables or disables ACPI Heterogeneous Memory Attribute Table
>>           (HMAT) support. The default is off.
>>   
>> +    ``anon-alloc=mmap|memfd``
>> +        Allocate anonymous guest RAM using mmap MAP_ANON (the default)
>> +        or memfd_create.  This option applies to memory allocated as a
>> +        side effect of creating various devices. It does not apply to
>> +        memory-backend-objects, whether explicitly specified on the
>> +        command line, or implicitly created by the -m command line
>> +        option.
>> +
>> +        Some migration modes require anon-alloc=memfd.
>> +
>>       ``memory-backend='id'``
>>           An alternative to legacy ``-mem-path`` and ``mem-prealloc`` options.
>>           Allows to use a memory backend as main RAM.
>> diff --git a/system/physmem.c b/system/physmem.c
>> index dc1db3a..174f7e0 100644
>> --- a/system/physmem.c
>> +++ b/system/physmem.c
>> @@ -47,6 +47,7 @@
>>   #include "qemu/qemu-print.h"
>>   #include "qemu/log.h"
>>   #include "qemu/memalign.h"
>> +#include "qemu/memfd.h"
>>   #include "exec/memory.h"
>>   #include "exec/ioport.h"
>>   #include "sysemu/dma.h"
>> @@ -69,6 +70,8 @@
>>   
>>   #include "qemu/pmem.h"
>>   
>> +#include "qapi/qapi-types-migration.h"
>> +#include "migration/options.h"
>>   #include "migration/vmstate.h"
>>   
>>   #include "qemu/range.h"
>> @@ -1849,6 +1852,35 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>>                   qemu_mutex_unlock_ramlist();
>>                   return;
>>               }
>> +
>> +        } else if (current_machine->anon_alloc == ANON_ALLOC_OPTION_MEMFD &&
>> +                   !object_dynamic_cast(new_block->mr->parent_obj.parent,
>> +                                        TYPE_MEMORY_BACKEND)) {
> 
> This is pretty fragile.. if someone adds yet another layer on top of memory
> backend objects, the ownership links can change and this might silently run
> into something else even without any warning..
> 
> I wished we dig into what is missing, but maybe that's too trivial.  If
> not, we still need to make this as solid.  Perhaps that can be a ram flag
> and let relevant callers pass in that flag explicitly.

How would they decide whether or not we want to set the flag in the 
current configuration?

> 
> I think RAM_SHARED can actually be that flag already - I mean, in all paths
> that we may create anon mem (but not memory-backend-* objects), is it
> always safe we always switch to RAM_SHARED from anon?

Do you mean only setting the flag (-> anonymous shmem) or switching also 
to memfd, which is a bigger change?

-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-10-04 10:15 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-30 19:40 [PATCH V2 00/13] Live update: cpr-transfer Steve Sistare
2024-09-30 19:40 ` [PATCH V2 01/13] machine: alloc-anon option Steve Sistare
2024-10-03 16:14   ` Peter Xu
2024-10-04 10:14     ` David Hildenbrand [this message]
2024-10-04 12:33       ` Peter Xu
2024-10-04 12:54         ` David Hildenbrand
2024-10-04 13:24           ` Peter Xu
2024-10-07 16:23             ` David Hildenbrand
2024-10-07 19:05               ` Peter Xu
2024-10-07 15:36   ` Peter Xu
2024-10-07 19:30     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 02/13] migration: cpr-state Steve Sistare
2024-10-07 14:14   ` Peter Xu
2024-10-07 19:30     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 03/13] migration: save cpr mode Steve Sistare
2024-10-07 15:18   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-10-07 20:10       ` Peter Xu
2024-10-08 15:57         ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 04/13] migration: stop vm earlier for cpr Steve Sistare
2024-10-07 15:27   ` Peter Xu
2024-10-07 20:52     ` Steven Sistare
2024-10-08 15:35       ` Peter Xu
2024-10-08 19:13         ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 05/13] physmem: preserve ram blocks " Steve Sistare
2024-10-07 15:49   ` Peter Xu
2024-10-07 16:28     ` Peter Xu
2024-10-08 15:17       ` Steven Sistare
2024-10-08 16:26         ` Peter Xu
2024-10-08 21:05           ` Steven Sistare
2024-10-08 21:32             ` Peter Xu
2024-10-31 20:32               ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 06/13] hostmem-memfd: preserve " Steve Sistare
2024-10-07 15:52   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 07/13] migration: SCM_RIGHTS for QEMUFile Steve Sistare
2024-10-07 16:06   ` Peter Xu
2024-10-07 16:35     ` Daniel P. Berrangé
2024-10-07 18:12       ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 08/13] migration: VMSTATE_FD Steve Sistare
2024-10-07 16:36   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-09-30 19:40 ` [PATCH V2 09/13] migration: cpr-transfer save and load Steve Sistare
2024-10-07 16:47   ` Peter Xu
2024-10-07 19:31     ` Steven Sistare
2024-10-08 15:36       ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 10/13] migration: cpr-uri parameter Steve Sistare
2024-10-07 16:49   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 11/13] migration: cpr-uri option Steve Sistare
2024-10-07 16:50   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 12/13] migration: split qmp_migrate Steve Sistare
2024-10-07 19:18   ` Peter Xu
2024-09-30 19:40 ` [PATCH V2 13/13] migration: cpr-transfer mode Steve Sistare
2024-10-07 19:44   ` Peter Xu
2024-10-07 20:39     ` Steven Sistare
2024-10-08 15:45       ` Peter Xu
2024-10-08 19:12         ` Steven Sistare
2024-10-08 19:38           ` Peter Xu
2024-10-08 18:28       ` Fabiano Rosas
2024-10-08 18:47         ` Peter Xu
2024-10-08 19:11           ` Fabiano Rosas
2024-10-08 19:33             ` Steven Sistare
2024-10-08 19:48             ` Peter Xu
2024-10-09 18:43               ` Steven Sistare
2024-10-09 19:06                 ` Peter Xu
2024-10-09 19:59                   ` Peter Xu
2024-10-09 20:18                     ` Steven Sistare
2024-10-09 20:57                       ` Peter Xu
2024-10-09 22:08                         ` Fabiano Rosas
2024-10-10 20:05                           ` Steven Sistare
2024-10-09 20:09                   ` Steven Sistare
2024-10-09 20:36                     ` Peter Xu
2024-10-10 20:06                       ` Steven Sistare
2024-10-10 21:23                         ` Peter Xu
2024-10-24 21:12                           ` Steven Sistare
2024-10-25 13:55                             ` Peter Xu
2024-10-25 15:04                               ` Steven Sistare
2024-10-08 19:29           ` Steven Sistare
2024-10-08 14:33 ` [PATCH V2 00/13] Live update: cpr-transfer Vladimir Sementsov-Ogievskiy
2024-10-08 21:13   ` Steven Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2143f803-439e-4b8b-ae92-07caa913d646@redhat.com \
    --to=david@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=imammedo@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).