From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Xiao Guangrong" <xiaoguangrong.eric@gmail.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"Michal Privoznik" <mprivozn@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Gavin Shan" <gshan@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v3 00/16] virtio-mem: Expose device memory through multiple memslots
Date: Mon, 11 Sep 2023 09:45:05 +0200	[thread overview]
Message-ID: <87e38689-c99b-0c92-3567-589cd9a2bc4c@redhat.com> (raw)
In-Reply-To: <20230908142136.403541-1-david@redhat.com>

@MST, any comment on the vhost bits (mostly uncontroversial and only in 
the memslot domain)?

I'm planning on queuing this myself (but will wait a bit more), unless 
you want to take it.


On 08.09.23 16:21, David Hildenbrand wrote:
> Quoting from patch #14:
> 
>      Having large virtio-mem devices that only expose little memory to a VM
>      is currently a problem: we map the whole sparse memory region into the
>      guest using a single memslot, resulting in one gigantic memslot in KVM.
>      KVM allocates metadata for the whole memslot, which can result in quite
>      some memory waste.
> 
>      Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
>      1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
>      allocate metadata for that 1 TiB memslot: on x86, this implies allocating
>      a significant amount of memory for metadata:
> 
>      (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
>          -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~ 2 GiB (0.2 %)
> 
>          With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
>          allocated lazily when required for nested VMs
>      (2) gfn_track: 2 bytes per 4 KiB
>          -> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
>      (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
>          -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
>      (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
>          -> For 1 TiB: 536870912 bits = 64 MiB (0.006 %)
> 
>      So we primarily care about (1) and (2). The bad thing is that the
>      memory consumption doubles once SMM is enabled, because we create the
>      memslot once for !SMM and once for SMM.
> 
>      Having a 1 TiB memslot without the TDP MMU consumes around:
>      * With SMM: 5 GiB
>      * Without SMM: 2.5 GiB
>      Having a 1 TiB memslot with the TDP MMU consumes around:
>      * With SMM: 1 GiB
>      * Without SMM: 512 MiB
> 
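
To double-check these numbers, the per-memslot metadata can be recomputed from
the bytes-per-page figures quoted above. The following is only a sketch with
made-up names, not KVM code and not code from this series:

    #include <stdio.h>
    #include <stdint.h>

    #define KiB (1024ULL)
    #define MiB (1024ULL * KiB)
    #define GiB (1024ULL * MiB)
    #define TiB (1024ULL * GiB)

    int main(void)
    {
        const uint64_t slot = 1 * TiB;              /* memslot size to model */

        /* (1) rmap: 8 bytes per 4 KiB, per 2 MiB and per 1 GiB page */
        uint64_t rmap = 8 * (slot / (4 * KiB)) +
                        8 * (slot / (2 * MiB)) +
                        8 * (slot / (1 * GiB));
        /* (2) gfn_track: 2 bytes per 4 KiB page */
        uint64_t gfn_track = 2 * (slot / (4 * KiB));
        /* (3) lpage_info: 4 bytes per 2 MiB and per 1 GiB page */
        uint64_t lpage_info = 4 * (slot / (2 * MiB)) +
                              4 * (slot / (1 * GiB));
        /* (4) two dirty bitmaps: 2 x 1 bit per 4 KiB page, converted to bytes */
        uint64_t bitmaps = 2 * (slot / (4 * KiB)) / 8;

        uint64_t total = rmap + gfn_track + lpage_info + bitmaps;

        printf("rmap:       %6llu MiB\n", (unsigned long long)(rmap / MiB));
        printf("gfn_track:  %6llu MiB\n", (unsigned long long)(gfn_track / MiB));
        printf("lpage_info: %6llu MiB\n", (unsigned long long)(lpage_info / MiB));
        printf("bitmaps:    %6llu MiB\n", (unsigned long long)(bitmaps / MiB));
        printf("total:      %6llu MiB (roughly doubles with SMM)\n",
               (unsigned long long)(total / MiB));
        return 0;
    }

For a 1 TiB slot this prints a total of roughly 2.5 GiB, matching the
"without TDP MMU, without SMM" figure above; with smaller memslots, only the
ones that are actually mapped contribute to the overhead.
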
>      ... and that's really something we want to optimize, to be able to just
>      start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
>      that can grow very large (e.g., 1 TiB).
> 
>      Consequently, using multiple memslots and only mapping the memslots we
>      really need can significantly reduce memory waste and speed up
>      memslot-related operations. Let's expose the sparse RAM memory region using
>      multiple memslots, mapping only the memslots we currently need into our
>      device memory region container.
> 
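
The mechanism described in the quoted text (a large sparse RAM region carved
into aliases that get mapped into the device memory region container only when
needed) can be pictured roughly as follows. This is an illustrative sketch
only, not the virtio-mem implementation: the function and parameter names are
invented, while the memory_region_*() calls are the generic QEMU memory API.
Error handling, unmapping and alignment are omitted.

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    /*
     * Sketch: expose a sparse RAM region via multiple memslots by aliasing
     * fixed-size chunks of it into the device memory region container. Only
     * chunks marked as needed are mapped and thus become KVM memslots.
     */
    static void map_needed_memslots(MemoryRegion *container, /* device container */
                                    MemoryRegion *ram,       /* sparse RAM region */
                                    MemoryRegion *aliases,   /* array of nb_slots MRs */
                                    unsigned int nb_slots,
                                    const bool *slot_needed)
    {
        const uint64_t slot_size = memory_region_size(ram) / nb_slots;
        unsigned int i;

        for (i = 0; i < nb_slots; i++) {
            if (!slot_needed[i]) {
                continue;
            }
            /* Alias the i-th chunk of the RAM region ... */
            memory_region_init_alias(&aliases[i], NULL, "virtio-mem-memslot",
                                     ram, i * slot_size, slot_size);
            /* ... and map it at the same offset within the container. */
            memory_region_add_subregion(container, i * slot_size, &aliases[i]);
        }
    }

Unmapping a chunk that is no longer needed would correspondingly use
memory_region_del_subregion(); that is what keeps the number of KVM memslots
proportional to the exposed memory rather than to the device size.
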
> The hyper-v balloon driver has similar demands [1].
> 
> For virtio-mem, this has to be turned on manually ("multiple-memslots=on"),
> due to the interaction with vhost (see below).
> 
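
For example, a hypothetical invocation could look as follows; only the
"multiple-memslots=on" property comes from this series, the remaining options
are ordinary memory-backend/virtio-mem settings and the sizes are made up:

    qemu-system-x86_64 \
        -m 4G,maxmem=1028G \
        -object memory-backend-ram,id=vmem0,size=1T,reserve=off \
        -device virtio-mem-pci,id=vm0,memdev=vmem0,requested-size=16G,multiple-memslots=on
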
> If we have less than 509 memslots available, we always default to a single
> memslot. Otherwise, we automatically decide how many memslots to use
> based on a simple heuristic (see patch #12), and try not to use more than
> 256 memslots across all memory devices: our historical DIMM limit.
> 
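
To make the shape of that decision more concrete, here is a deliberately
simplified sketch of such a heuristic. It is not the logic of patch #12; the
names and the exact budgeting are invented for illustration, only the 509 and
256 constants come from the text above:

    /*
     * Illustrative sketch only: bound the number of memslots a memory device
     * may consume by the free memslots and by a global budget shared across
     * all memory devices (256, the historical DIMM limit). With fewer than
     * 509 memslots available, fall back to a single memslot.
     */
    #define MEMSLOTS_GLOBAL_BUDGET  256
    #define MEMSLOTS_SAFE_MINIMUM   509

    static unsigned int decide_memslots(unsigned int free_memslots,
                                        unsigned int used_by_memory_devices,
                                        unsigned int wanted_by_device)
    {
        unsigned int budget, limit;

        /* Too few memslots in the system: don't get fancy. */
        if (free_memslots < MEMSLOTS_SAFE_MINIMUM) {
            return 1;
        }

        /* What is left of the budget shared by all memory devices ... */
        budget = MEMSLOTS_GLOBAL_BUDGET > used_by_memory_devices ?
                 MEMSLOTS_GLOBAL_BUDGET - used_by_memory_devices : 0;
        /* ... capped by what is actually still free. */
        limit = budget < free_memslots ? budget : free_memslots;

        if (limit == 0) {
            return 1;
        }
        return wanted_by_device < limit ? wanted_by_device : limit;
    }

The real decision additionally accounts for reserved memslots and per-device
properties; see patch #12 for the actual rules.
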
> As soon as any memory device has automatically decided on using more than
> one memslot, vhost devices that support less than 509 memslots (e.g.,
> currently most vhost-user devices, such as virtiofsd) can no longer be
> plugged, as a precaution.
> 
> Quoting from patch #12:
> 
>      Plugging vhost devices with less than 509 memslots available while we
>      have memory devices plugged that consume multiple memslots due to
>      automatic decisions can be problematic. Most configurations might just fail
>      due to "limit < used + reserved"; however, it can also happen that these
>      memory devices would suddenly consume memslots that would actually be
>      required by other memslot consumers (boot, PCI BARs) later. Note that this
>      has always been sketchy with vhost devices that support only a small number
>      of memslots; but we don't want to make it any worse. So let's keep it simple
>      and simply reject plugging such vhost devices in such a configuration.
> 
>      Eventually, all vhost devices that want to be fully compatible with such
>      memory devices should support a decent number of memslots (>= 509).
> 
> 
> The recommendation is to plug such vhost devices before the virtio-mem
> device decides on the number of memslots, or to not set
> "multiple-memslots=on". As soon as these vhost devices
> support a reasonable number of memslots (>= 509), this will start working
> automatically.
> 
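
Roughly speaking, the resulting plug-time rule for vhost devices can be
pictured like this. Again, this is only an illustrative sketch with invented
names; the actual checks are spread across the memory-device and vhost patches:

    #include <stdbool.h>

    #define MEMSLOTS_SAFE_MINIMUM 509

    /* Sketch of the "reject small vhost backends" rule described above. */
    static bool vhost_backend_may_be_plugged(unsigned int backend_limit,
                                             unsigned int used_memslots,
                                             unsigned int reserved_memslots,
                                             bool multi_memslot_devices)
    {
        /* Backends supporting a decent number of memslots are always fine. */
        if (backend_limit >= MEMSLOTS_SAFE_MINIMUM) {
            return true;
        }
        /*
         * Once any memory device automatically decided on multiple memslots,
         * refuse small backends as a precaution ...
         */
        if (multi_memslot_devices) {
            return false;
        }
        /*
         * ... otherwise it is enough that the backend can hold what is
         * currently used plus what is reserved for memory devices
         * (i.e., "limit < used + reserved" fails the plug).
         */
        return backend_limit >= used_memslots + reserved_memslots;
    }
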
> I ran some tests on x86_64, now also including vfio tests. Everything seems
> to work as expected, even when multiple memslots are used.
> 
> 
> Patches #1 -- #3 are from [2] and were not picked up yet.
> 
> Patches #4 -- #12 add handling of multiple memslots to memory devices.
> 
> Patches #13 -- #14 add "multiple-memslots=on" support to virtio-mem.
> 
> Patches #15 -- #16 make sure that virtio-mem memslots can be enabled/disabled
>    atomically.
> 
> v2 -> v3:
> * "kvm: Return number of free memslots"
>   -> Return 0 in stub
> * "kvm: Add stub for kvm_get_max_memslots()"
>   -> Return 0 in stub
> * Adjust other patches to check for kvm_enabled() before calling
>    kvm_get_free_memslots()/kvm_get_max_memslots()
> * Add RBs
> 
> v1 -> v2:
> * Include patches from [1]
> * A lot of code simplification and reorganization, too many changes to
>    spell out
> * Don't add a general soft-limit on memslots, to avoid warnings in sane
>    setups
> * Simplify handling of vhost devices with a small number of memslots:
>    simply fail plugging them
> * "virtio-mem: Expose device memory via multiple memslots if enabled"
>   -> Fix one "is this the last memslot" check
> * Much more testing
> 
> 
> [1] https://lkml.kernel.org/r/cover.1689786474.git.maciej.szmigiero@oracle.com
> [2] https://lkml.kernel.org/r/20230523185915.540373-1-david@redhat.com
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Igor Mammedov <imammedo@redhat.com>
> Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
> Cc: Eduardo Habkost <eduardo@habkost.net>
> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> Cc: Yanan Wang <wangyanan55@huawei.com>
> Cc: Michal Privoznik <mprivozn@redhat.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Gavin Shan <gshan@redhat.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
> Cc: kvm@vger.kernel.org
> 
> David Hildenbrand (16):
>    vhost: Rework memslot filtering and fix "used_memslot" tracking
>    vhost: Remove vhost_backend_can_merge() callback
>    softmmu/physmem: Fixup qemu_ram_block_from_host() documentation
>    kvm: Return number of free memslots
>    vhost: Return number of free memslots
>    memory-device: Support memory devices with multiple memslots
>    stubs: Rename qmp_memory_device.c to memory_device.c
>    memory-device: Track required and actually used memslots in
>      DeviceMemoryState
>    memory-device,vhost: Support memory devices that dynamically consume
>      memslots
>    kvm: Add stub for kvm_get_max_memslots()
>    vhost: Add vhost_get_max_memslots()
>    memory-device,vhost: Support automatic decision on the number of
>      memslots
>    memory: Clarify mapping requirements for RamDiscardManager
>    virtio-mem: Expose device memory via multiple memslots if enabled
>    memory,vhost: Allow for marking memory device memory regions
>      unmergeable
>    virtio-mem: Mark memslot alias memory regions unmergeable
> 
>   MAINTAINERS                                   |   1 +
>   accel/kvm/kvm-all.c                           |  35 ++-
>   accel/stubs/kvm-stub.c                        |   9 +-
>   hw/mem/memory-device.c                        | 196 ++++++++++++-
>   hw/virtio/vhost-stub.c                        |   9 +-
>   hw/virtio/vhost-user.c                        |  21 +-
>   hw/virtio/vhost-vdpa.c                        |   1 -
>   hw/virtio/vhost.c                             | 103 +++++--
>   hw/virtio/virtio-mem-pci.c                    |  21 ++
>   hw/virtio/virtio-mem.c                        | 272 +++++++++++++++++-
>   include/exec/cpu-common.h                     |  15 +
>   include/exec/memory.h                         |  27 +-
>   include/hw/boards.h                           |  14 +-
>   include/hw/mem/memory-device.h                |  57 ++++
>   include/hw/virtio/vhost-backend.h             |   9 +-
>   include/hw/virtio/vhost.h                     |   3 +-
>   include/hw/virtio/virtio-mem.h                |  23 +-
>   include/sysemu/kvm.h                          |   4 +-
>   include/sysemu/kvm_int.h                      |   1 +
>   softmmu/memory.c                              |  35 ++-
>   softmmu/physmem.c                             |  17 --
>   .../{qmp_memory_device.c => memory_device.c}  |  10 +
>   stubs/meson.build                             |   2 +-
>   23 files changed, 779 insertions(+), 106 deletions(-)
>   rename stubs/{qmp_memory_device.c => memory_device.c} (56%)
> 

-- 
Cheers,

David / dhildenb



Thread overview: 33+ messages
2023-09-08 14:21 [PATCH v3 00/16] virtio-mem: Expose device memory through multiple memslots David Hildenbrand
2023-09-08 14:21 ` [PATCH v3 01/16] vhost: Rework memslot filtering and fix "used_memslot" tracking David Hildenbrand
2023-09-08 14:21 ` [PATCH v3 02/16] vhost: Remove vhost_backend_can_merge() callback David Hildenbrand
2023-09-08 14:21 ` [PATCH v3 03/16] softmmu/physmem: Fixup qemu_ram_block_from_host() documentation David Hildenbrand
2023-09-08 14:21 ` [PATCH v3 04/16] kvm: Return number of free memslots David Hildenbrand
2023-09-16 16:05   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 05/16] vhost: " David Hildenbrand
2023-09-16 16:07   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 06/16] memory-device: Support memory devices with multiple memslots David Hildenbrand
2023-09-16 16:27   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 07/16] stubs: Rename qmp_memory_device.c to memory_device.c David Hildenbrand
2023-09-16 16:28   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 08/16] memory-device: Track required and actually used memslots in DeviceMemoryState David Hildenbrand
2023-09-16 16:36   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 09/16] memory-device, vhost: Support memory devices that dynamically consume memslots David Hildenbrand
2023-09-16 17:52   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 10/16] kvm: Add stub for kvm_get_max_memslots() David Hildenbrand
2023-09-16 17:13   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 11/16] vhost: Add vhost_get_max_memslots() David Hildenbrand
2023-09-16 17:16   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 12/16] memory-device, vhost: Support automatic decision on the number of memslots David Hildenbrand
2023-09-17 10:46   ` Maciej S. Szmigiero
2023-09-18 12:33     ` David Hildenbrand
2023-09-08 14:21 ` [PATCH v3 13/16] memory: Clarify mapping requirements for RamDiscardManager David Hildenbrand
2023-09-16 17:31   ` Maciej S. Szmigiero
2023-09-08 14:21 ` [PATCH v3 14/16] virtio-mem: Expose device memory via multiple memslots if enabled David Hildenbrand
2023-09-17 11:47   ` Maciej S. Szmigiero
2023-09-19  8:08     ` David Hildenbrand
2023-09-08 14:21 ` [PATCH v3 15/16] memory, vhost: Allow for marking memory device memory regions unmergeable David Hildenbrand
2023-09-08 14:21 ` [PATCH v3 16/16] virtio-mem: Mark memslot alias " David Hildenbrand
2023-09-11  7:45 ` David Hildenbrand [this message]
2023-09-19  8:20   ` [PATCH v3 00/16] virtio-mem: Expose device memory through multiple memslots David Hildenbrand
2023-09-19  9:34     ` David Hildenbrand
