qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Steve Sistare <steven.sistare@oracle.com>
Cc: qemu-devel@nongnu.org, Fabiano Rosas <farosas@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Eduardo Habkost <eduardo@habkost.net>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH V1 17/26] machine: memfd-alloc option
Date: Tue, 28 May 2024 17:12:02 -0400	[thread overview]
Message-ID: <ZlZIoiH5Dj4XBbLO@x1n> (raw)
In-Reply-To: <1714406135-451286-18-git-send-email-steven.sistare@oracle.com>

On Mon, Apr 29, 2024 at 08:55:26AM -0700, Steve Sistare wrote:
> Allocate anonymous memory using memfd_create if the memfd-alloc machine
> option is set.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  hw/core/machine.c   | 22 ++++++++++++++++++++++
>  include/hw/boards.h |  1 +
>  qemu-options.hx     |  6 ++++++
>  system/memory.c     |  9 ++++++---
>  system/physmem.c    | 18 +++++++++++++++++-
>  system/trace-events |  1 +
>  6 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 582c2df..9567b97 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -443,6 +443,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
>      ms->mem_merge = value;
>  }
>  
> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    return ms->memfd_alloc;
> +}
> +
> +static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    ms->memfd_alloc = value;
> +}
> +
>  static bool machine_get_usb(Object *obj, Error **errp)
>  {
>      MachineState *ms = MACHINE(obj);
> @@ -1044,6 +1058,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
>      object_class_property_set_description(oc, "mem-merge",
>          "Enable/disable memory merge support");
>  
> +    object_class_property_add_bool(oc, "memfd-alloc",
> +        machine_get_memfd_alloc, machine_set_memfd_alloc);
> +    object_class_property_set_description(oc, "memfd-alloc",
> +        "Enable/disable allocating anonymous memory using memfd_create");
> +
>      object_class_property_add_bool(oc, "usb",
>          machine_get_usb, machine_set_usb);
>      object_class_property_set_description(oc, "usb",
> @@ -1387,6 +1406,9 @@ static bool create_default_memdev(MachineState *ms, const char *path, Error **er
>      if (!object_property_set_int(obj, "size", ms->ram_size, errp)) {
>          goto out;
>      }
> +    if (!object_property_set_bool(obj, "share", ms->memfd_alloc, errp)) {
> +        goto out;
> +    }
>      object_property_add_child(object_get_objects_root(), mc->default_ram_id,
>                                obj);
>      /* Ensure backend's memory region name is equal to mc->default_ram_id */
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 69c1ba4..96259c3 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -372,6 +372,7 @@ struct MachineState {
>      bool dump_guest_core;
>      bool mem_merge;
>      bool require_guest_memfd;
> +    bool memfd_alloc;
>      bool usb;
>      bool usb_disabled;
>      char *firmware;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index cf61f6b..f0dfda5 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -32,6 +32,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>      "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
>      "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>      "                mem-merge=on|off controls memory merge support (default: on)\n"
> +    "                memfd-alloc=on|off controls allocating anonymous guest RAM using memfd_create (default: off)\n"
>      "                aes-key-wrap=on|off controls support for AES key wrapping (default=on)\n"
>      "                dea-key-wrap=on|off controls support for DEA key wrapping (default=on)\n"
>      "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
> @@ -79,6 +80,11 @@ SRST
>          supported by the host, de-duplicates identical memory pages
>          among VMs instances (enabled by default).
>  
> +    ``memfd-alloc=on|off``
> +        Enables or disables allocation of anonymous guest RAM using
> +        memfd_create.  Any associated memory-backend objects are created with
> +        share=on.  The memfd-alloc default is off.
> +
>      ``aes-key-wrap=on|off``
>          Enables or disables AES key wrapping support on s390-ccw hosts.
>          This feature controls whether AES wrapping keys will be created
> diff --git a/system/memory.c b/system/memory.c
> index 49f1cb2..ca04a0e 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -1552,8 +1552,9 @@ bool memory_region_init_ram_nomigrate(MemoryRegion *mr,
>                                        uint64_t size,
>                                        Error **errp)
>  {
> +    uint32_t flags = current_machine->memfd_alloc ? RAM_SHARED : 0;

If there's a machine option to "use memfd for allocations", then it's
shared mem... Hmm..

It is a bit confusing to me in quite a few levels:

  - Why memory allocation method will be defined by a machine property,
    even if we have memory-backend-* which should cover everything?

  - Even if we have such a machine property, why setting "memfd" will
    always imply shared?  why not private?  After all it's not called
    "memfd-shared-alloc", and we can create private mappings using
    e.g. memory-backend-memfd,share=off.

>      return memory_region_init_ram_flags_nomigrate(mr, owner, name,
> -                                                  size, 0, errp);
> +                                                  size, flags, errp);
>  }
>  
>  bool memory_region_init_ram_flags_nomigrate(MemoryRegion *mr,
> @@ -1713,8 +1714,9 @@ bool memory_region_init_rom_nomigrate(MemoryRegion *mr,
>                                        uint64_t size,
>                                        Error **errp)
>  {
> +    uint32_t flags = current_machine->memfd_alloc ? RAM_SHARED : 0;
>      if (!memory_region_init_ram_flags_nomigrate(mr, owner, name,
> -                                                size, 0, errp)) {
> +                                                size, flags, errp)) {
>           return false;
>      }
>      mr->readonly = true;
> @@ -1731,6 +1733,7 @@ bool memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
>                                               Error **errp)
>  {
>      Error *err = NULL;
> +    uint32_t flags = current_machine->memfd_alloc ? RAM_SHARED : 0;
>      assert(ops);
>      memory_region_init(mr, owner, name, size);
>      mr->ops = ops;
> @@ -1738,7 +1741,7 @@ bool memory_region_init_rom_device_nomigrate(MemoryRegion *mr,
>      mr->terminates = true;
>      mr->rom_device = true;
>      mr->destructor = memory_region_destructor_ram;
> -    mr->ram_block = qemu_ram_alloc(size, 0, mr, &err);
> +    mr->ram_block = qemu_ram_alloc(size, flags, mr, &err);
>      if (err) {
>          mr->size = int128_zero();
>          object_unparent(OBJECT(mr));
> diff --git a/system/physmem.c b/system/physmem.c
> index c736af5..36d97ec 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -45,6 +45,7 @@
>  #include "qemu/qemu-print.h"
>  #include "qemu/log.h"
>  #include "qemu/memalign.h"
> +#include "qemu/memfd.h"
>  #include "exec/memory.h"
>  #include "exec/ioport.h"
>  #include "sysemu/dma.h"
> @@ -1825,6 +1826,19 @@ static void *ram_block_alloc_host(RAMBlock *rb, Error **errp)
>      if (xen_enabled()) {
>          xen_ram_alloc(rb->offset, rb->max_length, mr, errp);
>  
> +    } else if (rb->flags & RAM_SHARED) {
> +        if (rb->fd == -1) {
> +            mr->align = QEMU_VMALLOC_ALIGN;
> +            rb->fd = qemu_memfd_create(rb->idstr, rb->max_length + mr->align,
> +                                       0, 0, 0, errp);
> +        }

We used to have case where RAM_SHARED && rb->fd==-1, I think.

We have some code that checks explicitly on rb->fd against -1 to know
whether it's a fd based.  I'm not sure whether there'll be implications to
affect those codes.

Maybe it's mostly fine, OTOH I worry more on the whole idea.  I'm not sure
whether this is relevant to "we want to be able to share the mem with the
new process", in this case can we simply require the user to use file based
memory backends, rather than such change?

Thanks,

> +        if (rb->fd >= 0) {
> +            int mfd = rb->fd;
> +            qemu_set_cloexec(mfd);
> +            host = file_ram_alloc(rb, rb->max_length, mfd, false, 0, errp);
> +            trace_qemu_anon_memfd_alloc(rb->idstr, rb->max_length, mfd, host);
> +        }
> +
>      } else {
>          host = qemu_anon_ram_alloc(rb->max_length, &mr->align,
>                                     qemu_ram_is_shared(rb),
> @@ -2106,8 +2120,10 @@ RAMBlock *qemu_ram_alloc_resizeable(ram_addr_t size, ram_addr_t maxsz,
>                                                       void *host),
>                                       MemoryRegion *mr, Error **errp)
>  {
> +    uint32_t flags = current_machine->memfd_alloc ? RAM_SHARED : 0;
> +    flags |= RAM_RESIZEABLE;
>      return qemu_ram_alloc_internal(size, maxsz, resized, NULL,
> -                                   RAM_RESIZEABLE, mr, errp);
> +                                   flags, mr, errp);
>  }
>  
>  static void reclaim_ramblock(RAMBlock *block)
> diff --git a/system/trace-events b/system/trace-events
> index f0a80ba..0092734 100644
> --- a/system/trace-events
> +++ b/system/trace-events
> @@ -41,3 +41,4 @@ dirtylimit_vcpu_execute(int cpu_index, int64_t sleep_time_us) "CPU[%d] sleep %"P
>  
>  # physmem.c
>  ram_block_create(const char *name, uint32_t flags, int fd, size_t used_length, size_t max_length, size_t align) "%s, flags %u, fd %d, len %lu, maxlen %lu, align %lu"
> +qemu_anon_memfd_alloc(const char *name, size_t size, int fd, void *ptr) "%s size %zu fd %d -> %p"
> -- 
> 1.8.3.1
> 

-- 
Peter Xu



  reply	other threads:[~2024-05-28 21:13 UTC|newest]

Thread overview: 122+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-29 15:55 [PATCH V1 00/26] Live update: cpr-exec Steve Sistare
2024-04-29 15:55 ` [PATCH V1 01/26] oslib: qemu_clear_cloexec Steve Sistare
2024-05-06 23:27   ` Fabiano Rosas
2024-05-07  8:56     ` Daniel P. Berrangé
2024-05-07 13:54       ` Fabiano Rosas
2024-04-29 15:55 ` [PATCH V1 02/26] vl: helper to request re-exec Steve Sistare
2024-04-29 15:55 ` [PATCH V1 03/26] migration: SAVEVM_FOREACH Steve Sistare
2024-05-06 23:17   ` Fabiano Rosas
2024-05-13 19:27     ` Steven Sistare
2024-05-27 18:14       ` Peter Xu
2024-04-29 15:55 ` [PATCH V1 04/26] migration: delete unused parameter mis Steve Sistare
2024-05-06 21:50   ` Fabiano Rosas
2024-05-27 18:02   ` Peter Xu
2024-04-29 15:55 ` [PATCH V1 05/26] migration: precreate vmstate Steve Sistare
2024-05-07 21:02   ` Fabiano Rosas
2024-05-13 19:28     ` Steven Sistare
2024-05-24 13:56   ` Fabiano Rosas
2024-05-27 18:16   ` Peter Xu
2024-05-28 15:09     ` Steven Sistare via
2024-05-29 18:39       ` Peter Xu
2024-05-30 17:04         ` Steven Sistare via
2024-04-29 15:55 ` [PATCH V1 06/26] migration: precreate vmstate for exec Steve Sistare
2024-05-06 23:34   ` Fabiano Rosas
2024-05-13 19:28     ` Steven Sistare
2024-05-13 21:21       ` Fabiano Rosas
2024-04-29 15:55 ` [PATCH V1 07/26] migration: VMStateId Steve Sistare
2024-05-07 21:03   ` Fabiano Rosas
2024-05-27 18:20   ` Peter Xu
2024-05-28 15:10     ` Steven Sistare via
2024-05-28 17:44       ` Peter Xu
2024-05-29 17:30         ` Steven Sistare via
2024-05-29 18:53           ` Peter Xu
2024-05-30 17:11             ` Steven Sistare via
2024-05-30 18:03               ` Peter Xu
2024-04-29 15:55 ` [PATCH V1 08/26] migration: vmstate_info_void_ptr Steve Sistare
2024-05-07 21:33   ` Fabiano Rosas
2024-05-27 18:31   ` Peter Xu
2024-05-28 15:10     ` Steven Sistare via
2024-05-28 18:21       ` Peter Xu
2024-05-29 17:30         ` Steven Sistare via
2024-04-29 15:55 ` [PATCH V1 09/26] migration: vmstate_register_named Steve Sistare
2024-05-09 14:19   ` Fabiano Rosas
2024-05-09 14:32     ` Fabiano Rosas
2024-05-13 19:29       ` Steven Sistare
2024-04-29 15:55 ` [PATCH V1 10/26] migration: vmstate_unregister_named Steve Sistare
2024-04-29 15:55 ` [PATCH V1 11/26] migration: vmstate_register at init time Steve Sistare
2024-04-29 15:55 ` [PATCH V1 12/26] migration: vmstate factory object Steve Sistare
2024-04-29 15:55 ` [PATCH V1 13/26] physmem: ram_block_create Steve Sistare
2024-05-13 18:37   ` Fabiano Rosas
2024-05-13 19:30     ` Steven Sistare
2024-04-29 15:55 ` [PATCH V1 14/26] physmem: hoist guest_memfd creation Steve Sistare
2024-04-29 15:55 ` [PATCH V1 15/26] physmem: hoist host memory allocation Steve Sistare
2024-04-29 15:55 ` [PATCH V1 16/26] physmem: set ram block idstr earlier Steve Sistare
2024-04-29 15:55 ` [PATCH V1 17/26] machine: memfd-alloc option Steve Sistare
2024-05-28 21:12   ` Peter Xu [this message]
2024-05-29 17:31     ` Steven Sistare via
2024-05-29 19:14       ` Peter Xu
2024-05-30 17:11         ` Steven Sistare via
2024-05-30 18:14           ` Peter Xu
2024-05-31 19:32             ` Steven Sistare via
2024-06-03 21:48               ` Peter Xu
2024-06-04  7:13                 ` Daniel P. Berrangé
2024-06-04 15:58                   ` Peter Xu
2024-06-04 16:14                     ` David Hildenbrand
2024-06-04 16:41                       ` Peter Xu
2024-06-04 17:16                         ` David Hildenbrand
2024-06-03 10:17       ` Daniel P. Berrangé
2024-06-03 11:59         ` Steven Sistare via
2024-04-29 15:55 ` [PATCH V1 18/26] migration: cpr-exec-args parameter Steve Sistare
2024-05-02 12:23   ` Markus Armbruster
2024-05-02 16:00     ` Steven Sistare
2024-05-21  8:13   ` Daniel P. Berrangé
2024-04-29 15:55 ` [PATCH V1 19/26] physmem: preserve ram blocks for cpr Steve Sistare
2024-05-28 21:44   ` Peter Xu
2024-05-29 17:31     ` Steven Sistare via
2024-05-29 19:25       ` Peter Xu
2024-05-30 17:12         ` Steven Sistare via
2024-05-30 18:39           ` Peter Xu
2024-05-31 19:32             ` Steven Sistare via
2024-06-03 22:29               ` Peter Xu
2024-04-29 15:55 ` [PATCH V1 20/26] migration: cpr-exec mode Steve Sistare
2024-05-02 12:23   ` Markus Armbruster
2024-05-02 16:00     ` Steven Sistare
2024-05-03  6:26       ` Markus Armbruster
2024-05-21  8:20   ` Daniel P. Berrangé
2024-05-24 14:58   ` Fabiano Rosas
2024-05-27 18:54     ` Steven Sistare via
2024-04-29 15:55 ` [PATCH V1 21/26] migration: migrate_add_blocker_mode Steve Sistare
2024-05-09 17:47   ` Fabiano Rosas
2024-04-29 15:55 ` [PATCH V1 22/26] migration: ram block cpr-exec blockers Steve Sistare
2024-05-09 18:01   ` Fabiano Rosas
2024-05-13 19:29     ` Steven Sistare
2024-04-29 15:55 ` [PATCH V1 23/26] migration: misc " Steve Sistare
2024-05-09 18:05   ` Fabiano Rosas
2024-05-24 12:40   ` Fabiano Rosas
2024-05-27 19:02     ` Steven Sistare via
2024-04-29 15:55 ` [PATCH V1 24/26] seccomp: cpr-exec blocker Steve Sistare
2024-05-09 18:16   ` Fabiano Rosas
2024-05-10  7:54   ` Daniel P. Berrangé
2024-05-13 19:29     ` Steven Sistare
2024-05-21  7:14       ` Daniel P. Berrangé
2024-04-29 15:55 ` [PATCH V1 25/26] migration: fix mismatched GPAs during cpr-exec Steve Sistare
2024-05-09 18:39   ` Fabiano Rosas
2024-04-29 15:55 ` [PATCH V1 26/26] migration: only-migratable-modes Steve Sistare
2024-05-09 19:14   ` Fabiano Rosas
2024-05-13 19:48     ` Steven Sistare
2024-05-13 21:57       ` Fabiano Rosas
2024-05-21  8:05   ` Daniel P. Berrangé
2024-05-02 16:13 ` cpr-exec doc (was Re: [PATCH V1 00/26] Live update: cpr-exec) Steven Sistare
2024-05-02 18:15   ` Peter Xu
2024-05-20 18:30 ` [PATCH V1 00/26] Live update: cpr-exec Steven Sistare
2024-05-20 22:28   ` Fabiano Rosas
2024-05-21  2:31     ` Peter Xu
2024-05-21 11:46       ` Steven Sistare
2024-05-27 17:45         ` Peter Xu
2024-05-28 15:10           ` Steven Sistare via
2024-05-28 16:42             ` Peter Xu
2024-05-30 17:17               ` Steven Sistare via
2024-05-30 19:23                 ` Peter Xu
2024-05-24 13:02 ` Fabiano Rosas
2024-05-24 14:07   ` Steven Sistare
2024-05-27 18:07 ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZlZIoiH5Dj4XBbLO@x1n \
    --to=peterx@redhat.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=imammedo@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).