From: Igor Mammedov <imammedo@redhat.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: "Jason Zeng" <jason.zeng@linux.intel.com>,
	"Juan Quintela" <quintela@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Zheng Chuan" <zhengchuan@huawei.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Daniel P. Berrange" <berrange@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Markus Armbruster" <armbru@redhat.com>
Subject: Re: [PATCH V7 10/29] machine: memfd-alloc option
Date: Fri, 11 Mar 2022 10:42:52 +0100
Message-ID: <20220311104252.548c5fb4@redhat.com>
In-Reply-To: <88be3aa0-0d7f-08c5-8278-07a3c5b701c8@oracle.com>

On Thu, 10 Mar 2022 13:18:35 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 3/10/2022 12:28 PM, Steven Sistare wrote:
> > On 3/10/2022 11:00 AM, Igor Mammedov wrote:  
> >> On Thu, 10 Mar 2022 10:36:08 -0500
> >> Steven Sistare <steven.sistare@oracle.com> wrote:
> >>  
> >>> On 3/8/2022 2:20 AM, Igor Mammedov wrote:  
> >>>> On Tue, 8 Mar 2022 01:50:11 -0500
> >>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>     
> >>>>> On Mon, Mar 07, 2022 at 09:41:44AM -0500, Steven Sistare wrote:    
> >>>>>> On 3/4/2022 5:41 AM, Igor Mammedov wrote:      
> >>>>>>> On Thu, 3 Mar 2022 12:21:15 -0500
> >>>>>>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>>>>>>       
> >>>>>>>> On Wed, Dec 22, 2021 at 11:05:15AM -0800, Steve Sistare wrote:      
> >>>>>>>>> Allocate anonymous memory using memfd_create if the memfd-alloc machine
> >>>>>>>>> option is set.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>>>>>>>> ---
> >>>>>>>>>  hw/core/machine.c   | 19 +++++++++++++++++++
> >>>>>>>>>  include/hw/boards.h |  1 +
> >>>>>>>>>  qemu-options.hx     |  6 ++++++
> >>>>>>>>>  softmmu/physmem.c   | 47 ++++++++++++++++++++++++++++++++++++++---------
> >>>>>>>>>  softmmu/vl.c        |  1 +
> >>>>>>>>>  trace-events        |  1 +
> >>>>>>>>>  util/qemu-config.c  |  4 ++++
> >>>>>>>>>  7 files changed, 70 insertions(+), 9 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >>>>>>>>> index 53a99ab..7739d88 100644
> >>>>>>>>> --- a/hw/core/machine.c
> >>>>>>>>> +++ b/hw/core/machine.c
> >>>>>>>>> @@ -392,6 +392,20 @@ static void machine_set_mem_merge(Object *obj, bool value, Error **errp)
> >>>>>>>>>      ms->mem_merge = value;
> >>>>>>>>>  }
> >>>>>>>>>  
> >>>>>>>>> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> >>>>>>>>> +{
> >>>>>>>>> +    MachineState *ms = MACHINE(obj);
> >>>>>>>>> +
> >>>>>>>>> +    return ms->memfd_alloc;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
> >>>>>>>>> +{
> >>>>>>>>> +    MachineState *ms = MACHINE(obj);
> >>>>>>>>> +
> >>>>>>>>> +    ms->memfd_alloc = value;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  static bool machine_get_usb(Object *obj, Error **errp)
> >>>>>>>>>  {
> >>>>>>>>>      MachineState *ms = MACHINE(obj);
> >>>>>>>>> @@ -829,6 +843,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
> >>>>>>>>>      object_class_property_set_description(oc, "mem-merge",
> >>>>>>>>>          "Enable/disable memory merge support");
> >>>>>>>>>  
> >>>>>>>>> +    object_class_property_add_bool(oc, "memfd-alloc",
> >>>>>>>>> +        machine_get_memfd_alloc, machine_set_memfd_alloc);
> >>>>>>>>> +    object_class_property_set_description(oc, "memfd-alloc",
> >>>>>>>>> +        "Enable/disable allocating anonymous memory using memfd_create");
> >>>>>>>>> +
> >>>>>>>>>      object_class_property_add_bool(oc, "usb",
> >>>>>>>>>          machine_get_usb, machine_set_usb);
> >>>>>>>>>      object_class_property_set_description(oc, "usb",
> >>>>>>>>> diff --git a/include/hw/boards.h b/include/hw/boards.h
> >>>>>>>>> index 9c1c190..a57d7a0 100644
> >>>>>>>>> --- a/include/hw/boards.h
> >>>>>>>>> +++ b/include/hw/boards.h
> >>>>>>>>> @@ -327,6 +327,7 @@ struct MachineState {
> >>>>>>>>>      char *dt_compatible;
> >>>>>>>>>      bool dump_guest_core;
> >>>>>>>>>      bool mem_merge;
> >>>>>>>>> +    bool memfd_alloc;
> >>>>>>>>>      bool usb;
> >>>>>>>>>      bool usb_disabled;
> >>>>>>>>>      char *firmware;
> >>>>>>>>> diff --git a/qemu-options.hx b/qemu-options.hx
> >>>>>>>>> index 7d47510..33c8173 100644
> >>>>>>>>> --- a/qemu-options.hx
> >>>>>>>>> +++ b/qemu-options.hx
> >>>>>>>>> @@ -30,6 +30,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
> >>>>>>>>>      "                vmport=on|off|auto controls emulation of vmport (default: auto)\n"
> >>>>>>>>>      "                dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
> >>>>>>>>>      "                mem-merge=on|off controls memory merge support (default: on)\n"
> >>>>>>>>> +    "                memfd-alloc=on|off controls allocating anonymous guest RAM using memfd_create (default: off)\n"        
> >>>>>>>>
> >>>>>>>> Question: are there any disadvantages associated with using
> >>>>>>>> memfd_create? I guess we are using up an fd, but that seems minor.  Any
> >>>>>>>> reason not to set to on by default? maybe with a fallback option to
> >>>>>>>> disable that?      
> >>>>>>
> >>>>>> Old Linux host kernels, circa 4.1, do not support huge pages for shared memory.
> >>>>>> Also, the tunable to enable huge pages for shared memory is different from the one for
> >>>>>> anon memory, so there could be performance loss if it is not set correctly.
> >>>>>>     /sys/kernel/mm/transparent_hugepage/enabled
> >>>>>>     vs
> >>>>>>     /sys/kernel/mm/transparent_hugepage/shmem_enabled      
> >>>>>
> >>>>> I guess we can test this when launching the VM, and select
> >>>>> a good default.
> >>>>>    
> >>>>>> It might make sense to use memfd_create by default for the secondary segments.      
> >>>>>
> >>>>> Well there's also KSM now you mention it.    
> >>>>
> >>>> then another question: is there a downside to always using memfd_create
> >>>> without any knobs being involved?    
> >>>
> >>> Lower performance if small pages are used (but Michael suggests qemu could 
> >>> automatically check the tunable and use anon memory instead)
> >>>
> >>> KSM (same page merging) is not supported for shared memory, so ram_block_add ->
> >>> memory_try_enable_merging will not enable it.
> >>>
> >>> In both cases, I expect the degradation would be negligible if memfd_create is
> >>> only automatically applied to the secondary segments, which are typically small.
> >>> But, someone's secondary segment could be larger, and it is time consuming to
> >>> prove innocence when someone claims your change caused their performance regression.  
> >>
> >> Adding David as memory subsystem maintainer, maybe he will have a better
> >> idea than introducing a global knob that would also magically alter 
> >> backends' behavior regardless of their configured settings.  
> > 
> > OK, in ram_block_add I can set the RAM_SHARED flag based on the memory-backend object's
> > shared flag.  I already set the latter in create_default_memdev when memfd-alloc is
> > specified.  With that change, we do not override configured settings.  Users can no longer
> > use memory-backend-ram for CPR, and must change all memory-backend-ram to memory-backend-memfd
> > in the command-line arguments.  That is fine.
> > 
> > With that change, are you OK with this patch?  
> 
> Sorry, I mis-read my own code in ram_block_add.  The existing code is correct and does 
> not alter any backend's behavior.   It only sets the shared flag when the ram is *not* 
> being allocated for a backend:
> 
>                 if (!object_dynamic_cast(parent, TYPE_MEMORY_BACKEND)) {
>                     new_block->flags |= RAM_SHARED;
>                 }
> 
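
For readers following the thread, here is a minimal sketch of what
"allocating anonymous memory using memfd_create" amounts to at the
system-call level. It is not the code from the patch: the helper name
alloc_shared_memfd and its parameters are made up for illustration, and it
assumes Linux with glibc 2.27 or later. The point is that the memory is
backed by a file descriptor and mapped MAP_SHARED, so a second mapping of
the same descriptor sees the same pages.

```c
#define _GNU_SOURCE          /* for memfd_create() in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a QEMU function: create an anonymous memfd of
 * 'size' bytes and map it shared.  Returns the mapping, or NULL on error.
 * On success, *fd_out holds the descriptor backing the mapping.
 */
void *alloc_shared_memfd(const char *name, size_t size, int *fd_out)
{
    void *ptr;
    int fd;

    assert(name && fd_out && size > 0);

    fd = memfd_create(name, MFD_CLOEXEC);
    if (fd < 0) {
        return NULL;
    }
    /* A new memfd has length 0; give it a size before mapping. */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return ptr;
}
```

A caller would write through the returned pointer and could mmap the same
descriptor again with MAP_SHARED to reach the same contents. MFD_CLOEXEC is
used here only as a conservative default; whether the descriptor should
survive exec is a policy decision for the caller.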

OK, maybe instead of introducing a generic option, introduce a high-level
feature option that turns on this and the other quirks necessary for it to
work (i.e. something like live-update=on|off).
That will not make QEMU internals any better, but at least it will hide the
obscure memfd-alloc option from users.
Is there a patch that makes QEMU error out if a backend without
shared=on is used?

Also, can you please answer the question below,
or point to a patch in the series that takes care of that invariant?

[...]

> >>>>>> There is currently no way to specify memory backends for the secondary memory
> >>>>>> segments (vram, roms, etc), and IMO it would be onerous to specify a backend for
> >>>>>> each of them.  On x86_64, these include pc.bios, vga.vram, pc.rom, vga.rom,
> >>>>>> /rom@etc/acpi/tables, /rom@etc/table-loader, /rom@etc/acpi/rsdp.  
> >>
> >> MemoryRegion is not the only place where state is stored.
> >> If we only talk about fwcfg entries' state, it can also reference
> >> plain malloced memory allocated elsewhere or make a deep copy internally.
> >> Similarly, devices may also store state outside of the RAMBlock framework.
> >>
> >> How are you dealing with that?
[...]
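
Regarding the tunable check suggested earlier in the thread: below is a rough
sketch of how a program could find out which transparent hugepage policy is
active, for either of the two sysfs files quoted above. The kernel marks the
active value with square brackets (for example "always madvise [never]").
The function names are invented for this sketch and are not QEMU APIs; what
to do with the result (for instance, falling back to anon memory) is left
open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (active) value of a THP sysfs line into 'out'.
 * Returns 0 on success, -1 if no "[value]" is present or 'out' is too small.
 */
int thp_selected(const char *line, char *out, size_t outlen)
{
    const char *open, *close;
    size_t len;

    assert(line && out);

    open = strchr(line, '[');
    close = open ? strchr(open, ']') : NULL;
    if (!open || !close) {
        return -1;
    }
    len = (size_t)(close - open - 1);
    if (len + 1 > outlen) {
        return -1;
    }
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/*
 * Read the first line of 'path' and extract its active value.
 * Returns 0 on success, -1 if the file cannot be read or parsed.
 */
int thp_read_selected(const char *path, char *out, size_t outlen)
{
    char line[256];
    FILE *f = fopen(path, "r");
    int ret = -1;

    if (!f) {
        return -1;
    }
    if (fgets(line, sizeof(line), f)) {
        ret = thp_selected(line, out, outlen);
    }
    fclose(f);
    return ret;
}
```

Calling thp_read_selected() once on .../transparent_hugepage/enabled and once
on .../transparent_hugepage/shmem_enabled would show whether the two settings
differ on a given host.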



