From: David Hildenbrand <david@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
qemu-devel@nongnu.org, "Peter Xu" <peterx@redhat.com>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"Igor Mammedov" <imammedo@redhat.com>,
"Ani Sinha" <ani@anisinha.ca>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots
Date: Thu, 14 Oct 2021 09:01:59 +0200 [thread overview]
Message-ID: <f63fc766-a044-7978-df40-27dff8f79bf5@redhat.com> (raw)
In-Reply-To: <YWcthytjDJUXdN0w@work-vm>
On 13.10.21 21:03, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (david@redhat.com) wrote:
>> Based-on: 20211011175346.15499-1-david@redhat.com
>>
>> A virtio-mem device is represented by a single large RAM memory region
>> backed by a single large mmap.
>>
>> Right now, we map that complete memory region into guest physical address
>> space, resulting in a very large memory mapping, KVM memory slot, ...
>> although only a small amount of memory might actually be exposed to the VM.
>>
>> For example, when starting a VM with a 1 TiB virtio-mem device that only
>> exposes little device memory (e.g., 1 GiB) towards the VM initially,
>> in order to hotplug more memory later, we waste a lot of memory on metadata
>> for KVM memory slots (> 2 GiB!) and accompanying bitmaps. Although some
>> optimizations in KVM are being worked on to reduce this metadata overhead
>> on x86-64 in some cases, it remains a problem with nested VMs and there are
>> other reasons why we would want to reduce the total memory slot size to a
>> reasonable minimum.
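
(A rough back-of-the-envelope illustration of where that figure comes from,
assuming ~8 bytes of x86 rmap metadata plus one dirty-bitmap bit per 4 KiB
page; the exact per-slot overhead depends on the KVM version and
configuration:)

/* Rough estimate of per-memslot metadata for a 1 TiB slot on x86-64.
 * Assumes 8 bytes of rmap per 4 KiB page and a 1-bit-per-page dirty
 * bitmap; actual KVM overhead varies with version and configuration. */
#include <stdio.h>

int main(void)
{
    unsigned long long slot_size = 1ULL << 40;          /* 1 TiB */
    unsigned long long pages     = slot_size >> 12;     /* 4 KiB pages */
    unsigned long long rmap      = pages * 8;           /* ~2 GiB */
    unsigned long long bitmap    = pages / 8;           /* ~32 MiB */

    printf("rmap:   %llu MiB\n", rmap >> 20);
    printf("bitmap: %llu MiB\n", bitmap >> 20);
    return 0;
}
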
>>
>> We want to:
>> a) Reduce the metadata overhead, including bitmap sizes inside KVM but also
>> inside QEMU KVM code where possible.
>> b) Not always expose all device-memory to the VM, to reduce the attack
>> surface of malicious VMs without using userfaultfd.
>>
>> So instead, expose the RAM memory region not via a single large mapping
>> (consuming one memslot) but via multiple mappings, each consuming one
>> memslot. To do that, we divide the RAM memory region via aliases into
>> separate parts and only map the aliases we actually need into a device
>> container. We have to make sure that QEMU won't silently merge the memory
>> sections corresponding to the aliases (and thereby also the memslots),
>> otherwise we lose the atomic updates with KVM and vhost-user that we deeply
>> care about when adding/removing memory. Further, avoiding such merging also
>> keeps the memslot accounting right.
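
(A minimal sketch of the alias idea using the existing
memory_region_init_alias()/memory_region_add_subregion() API; the function
and parameter names below are illustrative only, not the actual patch code:)

/* Minimal sketch (not the actual patch): carve one big backing RAM
 * region into fixed-size aliases and map only the needed ones into the
 * device's container region.  Each mapped alias ends up as a separate
 * KVM/vhost memslot, provided the corresponding sections are not merged. */
#include "qemu/osdep.h"
#include "exec/memory.h"

static void map_memslot_alias(Object *owner, MemoryRegion *container,
                              MemoryRegion *backing, MemoryRegion *alias,
                              unsigned int idx, uint64_t memslot_size)
{
    uint64_t offset = (uint64_t)idx * memslot_size;
    g_autofree char *name = g_strdup_printf("virtio-mem-memslot-%u", idx);

    /* Alias a memslot-sized window of the backing RAM region ... */
    memory_region_init_alias(alias, owner, name, backing, offset,
                             memslot_size);
    /* ... and map it at the matching offset in the device container. */
    memory_region_add_subregion(container, offset, alias);
}
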
>>
>> Within the memslots, virtio-mem can dynamically (un)plug memory at a smaller
>> granularity. So memslots are a pure optimization to tackle a) and b) above.
>>
>> Memslots are currently mapped once they fall into the usable device region
>> (which grows/shrinks on demand, either when requesting to hotplug more
>> memory or during/after reboots). In the future, with
>> VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to (un)map aliases even
>> more dynamically when (un)plugging device blocks.
>>
>>
>> Adding a 500GiB virtio-mem device and not hotplugging any memory results in:
>> 0000000140000000-000001047fffffff (prio 0, i/o): device-memory
>> 0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
>>
>> Requesting the VM to consume 2 GiB results in (note: the usable region size
>> is bigger than 2 GiB, so 3 * 1 GiB memslots are required):
>> 0000000140000000-000001047fffffff (prio 0, i/o): device-memory
>> 0000000140000000-0000007e3fffffff (prio 0, i/o): virtio-mem-memslots
>> 0000000140000000-000000017fffffff (prio 0, ram): alias virtio-mem-memslot-0 @mem0 0000000000000000-000000003fffffff
>> 0000000180000000-00000001bfffffff (prio 0, ram): alias virtio-mem-memslot-1 @mem0 0000000040000000-000000007fffffff
>> 00000001c0000000-00000001ffffffff (prio 0, ram): alias virtio-mem-memslot-2 @mem0 0000000080000000-00000000bfffffff
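
(The memslot count in this example just follows from rounding the usable
region size up to the memslot size; a trivial sketch with an illustrative
helper name, assuming 1 GiB memslots as above:)

/* Sketch: how many memslots must be mapped so that they cover the usable
 * device region.  A usable region slightly larger than 2 GiB therefore
 * needs three 1 GiB memslots, as shown in the layout above. */
static unsigned int memslots_for_usable_region(uint64_t usable_size,
                                               uint64_t memslot_size)
{
    return (usable_size + memslot_size - 1) / memslot_size;
}
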
>
> I've got a vague memory that there were some devices that didn't like
> doing split IO across a memory region (or something) - some virtio
> devices? Do you know if that's still true and if that causes a problem?
Interesting point! I am not aware of any such issues, and I'd be
surprised if we'd still have such buggy devices, because the layout
virtio-mem now creates is just very similar to the layout we'll
automatically create with ordinary DIMMs.
If we hotplug DIMMs, they end up consecutive in guest physical address
space, but each has its own memory region and requires its own memory
slot. So, very similar to a virtio-mem device now.
Maybe the catch is that it's hard to cross memory regions that are,
e.g., >= 128 MiB aligned, because ordinary allocations (e.g., via the
buddy allocator in Linux, which supports <= 4 MiB pages) won't cross
these blocks.
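
(The underlying argument, sketched below under the assumption that buddy
allocations are naturally aligned to their own power-of-two size: an
allocation of at most 4 MiB then can never straddle a 128 MiB-aligned
boundary, since 128 MiB is a multiple of every buddy order size.)

/* Sketch: a naturally aligned allocation (addr % size == 0) of size
 * <= 4 MiB cannot cross a 128 MiB-aligned boundary. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
    return (addr / boundary) != ((addr + size - 1) / boundary);
}

int main(void)
{
    const uint64_t MiB = 1024 * 1024;

    for (uint64_t size = 4096; size <= 4 * MiB; size *= 2) {
        for (uint64_t addr = 0; addr < 256 * MiB; addr += size) {
            /* addr is naturally aligned to size by construction. */
            assert(!crosses_boundary(addr, size, 128 * MiB));
        }
    }
    return 0;
}
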
--
Thanks,
David / dhildenb
Thread overview: 21+ messages
2021-10-13 10:33 [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 01/15] memory: Drop mapping check from memory_region_get_ram_discard_manager() David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 02/15] kvm: Return number of free memslots David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 03/15] vhost: " David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 04/15] memory: Allow for marking memory region aliases unmergeable David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 05/15] vhost: Don't merge unmergeable memory sections David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 06/15] memory-device: Move memory_device_check_addable() directly into memory_device_pre_plug() David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 07/15] memory-device: Generalize memory_device_used_region_size() David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 08/15] memory-device: Support memory devices that consume a variable number of memslots David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 09/15] vhost: Respect reserved memslots for memory devices when realizing a vhost device David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 10/15] virtio-mem: Set the RamDiscardManager for the RAM memory region earlier David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 11/15] virtio-mem: Fix typo in virito_mem_intersect_memory_section() function name David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 12/15] virtio-mem: Expose device memory via separate memslots David Hildenbrand
2021-10-14 11:45 ` Dr. David Alan Gilbert
2021-10-14 13:17 ` David Hildenbrand
2021-10-20 12:17 ` David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 13/15] vhost-user: Increase VHOST_USER_MAX_RAM_SLOTS to 496 with CONFIG_VIRTIO_MEM David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 14/15] libvhost-user: Increase VHOST_USER_MAX_RAM_SLOTS to 4096 David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 15/15] virtio-mem: Set "max-memslots" to 0 (auto) for the 6.2 machine David Hildenbrand
2021-10-13 19:03 ` [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots Dr. David Alan Gilbert
2021-10-14 7:01 ` David Hildenbrand [this message]