From: David Hildenbrand <david@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"David Hildenbrand" <david@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"Igor Mammedov" <imammedo@redhat.com>,
"Ani Sinha" <ani@anisinha.ca>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots
Date: Wed, 13 Oct 2021 12:33:15 +0200
Message-ID: <20211013103330.26869-1-david@redhat.com>
Based-on: 20211011175346.15499-1-david@redhat.com
A virtio-mem device is represented by a single large RAM memory region
backed by a single large mmap.
Right now, we map that complete memory region into guest physical address
space, resulting in a very large memory mapping, KVM memory slot, ...,
although only a small amount of memory might actually be exposed to the VM.
For example, when starting a VM with a 1 TiB virtio-mem device that initially
exposes only a little device memory (e.g., 1 GiB) to the VM, in order to
hotplug more memory later, we waste a lot of memory on metadata for KVM
memory slots (> 2 GiB!) and the accompanying bitmaps. Although some
optimizations in KVM are being worked on to reduce this metadata overhead
on x86-64 in some cases, it remains a problem with nested VMs, and there are
other reasons why we would want to reduce the total memory slot size to a
reasonable minimum.
We want to:
a) Reduce the metadata overhead, including bitmap sizes inside KVM but also
inside QEMU KVM code where possible.
b) Not always expose all device-memory to the VM, to reduce the attack
surface of malicious VMs without using userfaultfd.
So instead, expose the RAM memory region not via a single large mapping
(consuming one memslot) but via multiple mappings, each consuming
one memslot. To do that, we divide the RAM memory region into separate
parts via aliases and only map the aliases we actually need into a device
container. We have to make sure that QEMU won't silently merge the memory
sections corresponding to the aliases (and thereby also the memslots),
otherwise we lose atomic updates with KVM and vhost-user, which we deeply
care about when adding/removing memory. Further, such merging is better
avoided to get memslot accounting right.
Within these memslots, virtio-mem can dynamically (un)plug memory at a
smaller granularity. So memslots are a pure optimization to tackle a) and
b) above.
Right now, memslots are mapped once they fall into the usable device region
(which currently grows/shrinks on demand, either when requesting to
hotplug more memory or during/after reboots). In the future, with
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to (un)map aliases even
more dynamically when (un)plugging device blocks.
Adding a 500 GiB virtio-mem device and not hotplugging any memory results in:
0000000140000000-000001047fffffff (prio 0, i/o): device-memory
0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
Requesting the VM to consume 2 GiB results in (note: the usable region size
is bigger than 2 GiB, so 3 * 1 GiB memslots are required):
0000000140000000-000001047fffffff (prio 0, i/o): device-memory
0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
0000000140000000-000000017fffffff (prio 0, ram): alias virtio-mem-memslot-0 @mem0 0000000000000000-000000003fffffff
0000000180000000-00000001bfffffff (prio 0, ram): alias virtio-mem-memslot-1 @mem0 0000000040000000-000000007fffffff
00000001c0000000-00000001ffffffff (prio 0, ram): alias virtio-mem-memslot-2 @mem0 0000000080000000-00000000bfffffff
Requesting the VM to consume 20 GiB results in:
0000000140000000-000001047fffffff (prio 0, i/o): device-memory
0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
0000000140000000-000000017fffffff (prio 0, ram): alias virtio-mem-memslot-0 @mem0 0000000000000000-000000003fffffff
0000000180000000-00000001bfffffff (prio 0, ram): alias virtio-mem-memslot-1 @mem0 0000000040000000-000000007fffffff
00000001c0000000-00000001ffffffff (prio 0, ram): alias virtio-mem-memslot-2 @mem0 0000000080000000-00000000bfffffff
0000000200000000-000000023fffffff (prio 0, ram): alias virtio-mem-memslot-3 @mem0 00000000c0000000-00000000ffffffff
0000000240000000-000000027fffffff (prio 0, ram): alias virtio-mem-memslot-4 @mem0 0000000100000000-000000013fffffff
0000000280000000-00000002bfffffff (prio 0, ram): alias virtio-mem-memslot-5 @mem0 0000000140000000-000000017fffffff
00000002c0000000-00000002ffffffff (prio 0, ram): alias virtio-mem-memslot-6 @mem0 0000000180000000-00000001bfffffff
0000000300000000-000000033fffffff (prio 0, ram): alias virtio-mem-memslot-7 @mem0 00000001c0000000-00000001ffffffff
0000000340000000-000000037fffffff (prio 0, ram): alias virtio-mem-memslot-8 @mem0 0000000200000000-000000023fffffff
0000000380000000-00000003bfffffff (prio 0, ram): alias virtio-mem-memslot-9 @mem0 0000000240000000-000000027fffffff
00000003c0000000-00000003ffffffff (prio 0, ram): alias virtio-mem-memslot-10 @mem0 0000000280000000-00000002bfffffff
0000000400000000-000000043fffffff (prio 0, ram): alias virtio-mem-memslot-11 @mem0 00000002c0000000-00000002ffffffff
0000000440000000-000000047fffffff (prio 0, ram): alias virtio-mem-memslot-12 @mem0 0000000300000000-000000033fffffff
0000000480000000-00000004bfffffff (prio 0, ram): alias virtio-mem-memslot-13 @mem0 0000000340000000-000000037fffffff
00000004c0000000-00000004ffffffff (prio 0, ram): alias virtio-mem-memslot-14 @mem0 0000000380000000-00000003bfffffff
0000000500000000-000000053fffffff (prio 0, ram): alias virtio-mem-memslot-15 @mem0 00000003c0000000-00000003ffffffff
0000000540000000-000000057fffffff (prio 0, ram): alias virtio-mem-memslot-16 @mem0 0000000400000000-000000043fffffff
0000000580000000-00000005bfffffff (prio 0, ram): alias virtio-mem-memslot-17 @mem0 0000000440000000-000000047fffffff
00000005c0000000-00000005ffffffff (prio 0, ram): alias virtio-mem-memslot-18 @mem0 0000000480000000-00000004bfffffff
0000000600000000-000000063fffffff (prio 0, ram): alias virtio-mem-memslot-19 @mem0 00000004c0000000-00000004ffffffff
0000000640000000-000000067fffffff (prio 0, ram): alias virtio-mem-memslot-20 @mem0 0000000500000000-000000053fffffff
Requesting the VM to consume 5 GiB and rebooting (note: usable region size
will change during reboots) results in:
0000000140000000-000001047fffffff (prio 0, i/o): device-memory
0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
0000000140000000-000000017fffffff (prio 0, ram): alias virtio-mem-memslot-0 @mem0 0000000000000000-000000003fffffff
0000000180000000-00000001bfffffff (prio 0, ram): alias virtio-mem-memslot-1 @mem0 0000000040000000-000000007fffffff
00000001c0000000-00000001ffffffff (prio 0, ram): alias virtio-mem-memslot-2 @mem0 0000000080000000-00000000bfffffff
0000000200000000-000000023fffffff (prio 0, ram): alias virtio-mem-memslot-3 @mem0 00000000c0000000-00000000ffffffff
0000000240000000-000000027fffffff (prio 0, ram): alias virtio-mem-memslot-4 @mem0 0000000100000000-000000013fffffff
0000000280000000-00000002bfffffff (prio 0, ram): alias virtio-mem-memslot-5 @mem0 0000000140000000-000000017fffffff
In addition to other factors, we limit the number of memslots to 1024 per
device and the size of one memslot to at least 1 GiB. So only a 1 TiB
virtio-mem device could consume 1024 memslots in the "worst" case. To
calculate a memslot limit for a device, we use a heuristic based on all
available memslots for memory devices and the percentage of
"device size":"total memory device area size". Users can further limit
the maximum number of memslots that will be used by a device by setting
the "max-memslots" property. It's expected to be set to "0" (auto) in most
setups.
In recent setups (e.g., KVM with ~32k memslots, vhost-user with ~4k
memslots after this series), we'll get the biggest benefit. In special
setups (e.g., older KVM, vhost kernel with 64 memslots), we'll get some
benefit -- the individual memslots will be bigger.
Future work:
- vhost-user and libvhost-user optimizations for handling more memslots
more efficiently.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Ani Sinha <ani@anisinha.ca>
Cc: Peter Xu <peterx@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: kvm@vger.kernel.org
David Hildenbrand (15):
memory: Drop mapping check from
memory_region_get_ram_discard_manager()
kvm: Return number of free memslots
vhost: Return number of free memslots
memory: Allow for marking memory region aliases unmergeable
vhost: Don't merge unmergeable memory sections
memory-device: Move memory_device_check_addable() directly into
memory_device_pre_plug()
memory-device: Generalize memory_device_used_region_size()
memory-device: Support memory devices that consume a variable number
of memslots
vhost: Respect reserved memslots for memory devices when realizing a
vhost device
virtio-mem: Set the RamDiscardManager for the RAM memory region
earlier
virtio-mem: Fix typo in virito_mem_intersect_memory_section() function
name
virtio-mem: Expose device memory via separate memslots
vhost-user: Increase VHOST_USER_MAX_RAM_SLOTS to 496 with
CONFIG_VIRTIO_MEM
libvhost-user: Increase VHOST_USER_MAX_RAM_SLOTS to 4096
virtio-mem: Set "max-memslots" to 0 (auto) for the 6.2 machine
accel/kvm/kvm-all.c | 24 ++-
accel/stubs/kvm-stub.c | 4 +-
hw/core/machine.c | 1 +
hw/mem/memory-device.c | 167 +++++++++++++++---
hw/virtio/vhost-stub.c | 2 +-
hw/virtio/vhost-user.c | 7 +-
hw/virtio/vhost.c | 17 +-
hw/virtio/virtio-mem-pci.c | 22 +++
hw/virtio/virtio-mem.c | 202 ++++++++++++++++++++--
include/exec/memory.h | 23 +++
include/hw/mem/memory-device.h | 32 ++++
include/hw/virtio/vhost.h | 2 +-
include/hw/virtio/virtio-mem.h | 29 +++-
include/sysemu/kvm.h | 2 +-
softmmu/memory.c | 35 +++-
stubs/qmp_memory_device.c | 5 +
subprojects/libvhost-user/libvhost-user.h | 7 +-
17 files changed, 499 insertions(+), 82 deletions(-)
--
2.31.1