From: David Hildenbrand <david@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
"Igor Mammedov" <imammedo@redhat.com>,
"Xiao Guangrong" <xiaoguangrong.eric@gmail.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Yanan Wang" <wangyanan55@huawei.com>,
"Michal Privoznik" <mprivozn@redhat.com>,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Gavin Shan" <gshan@redhat.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
kvm@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [PATCH v4 16/18] virtio-mem: Expose device memory dynamically via multiple memslots if enabled
Date: Mon, 2 Oct 2023 10:57:40 +0200
Message-ID: <56cec8a9-f4bb-5e9e-a1c4-223359f8b491@redhat.com>
In-Reply-To: <11c6efbd-b794-4a05-9c51-4928fb545db4@maciej.szmigiero.name>
On 30.09.23 19:31, Maciej S. Szmigiero wrote:
> On 26.09.2023 20:57, David Hildenbrand wrote:
>> Having large virtio-mem devices that only expose little memory to a VM
>> is currently a problem: we map the whole sparse memory region into the
>> guest using a single memslot, resulting in one gigantic memslot in KVM.
>> KVM allocates metadata for the whole memslot, which can result in
>> considerable memory waste.
>>
>> Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
>> 1 GiB) memory, we would create a single 1 TiB memslot for which KVM has
>> to allocate metadata: on x86, this implies allocating a significant
>> amount of memory:
>>
>> (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
>> -> For 1 TiB: 2147483648 + 4194304 + 8192 bytes = ~2 GiB (0.2 %)
>>
>> With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
>> allocated lazily when required for nested VMs.
>> (2) gfn_track: 2 bytes per 4 KiB
>> -> For 1 TiB: 536870912 bytes = ~512 MiB (0.05 %)
>> (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
>> -> For 1 TiB: 2097152 + 4096 bytes = ~2 MiB (0.0002 %)
>> (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
>> -> For 1 TiB: 536870912 bits = 64 MiB (0.006 %)
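>>
>> For illustration, the numbers above can be reproduced with a small
>> standalone program (a sketch of the arithmetic only, not actual
>> KVM/QEMU code):
>>
>>   #include <stdio.h>
>>   #include <stdint.h>
>>
>>   #define KiB 1024ULL
>>   #define MiB (1024 * KiB)
>>   #define GiB (1024 * MiB)
>>   #define TiB (1024 * GiB)
>>
>>   int main(void)
>>   {
>>       const uint64_t slot = 1 * TiB; /* memslot size */
>>       /* (1) rmap: 8 bytes per 4 KiB, 2 MiB and 1 GiB page */
>>       uint64_t rmap = slot / (4 * KiB) * 8 + slot / (2 * MiB) * 8 +
>>                       slot / (1 * GiB) * 8;
>>       /* (2) gfn_track: 2 bytes per 4 KiB page */
>>       uint64_t gfn_track = slot / (4 * KiB) * 2;
>>       /* (3) lpage_info: 4 bytes per 2 MiB and 1 GiB page */
>>       uint64_t lpage_info = slot / (2 * MiB) * 4 + slot / (1 * GiB) * 4;
>>       /* (4) two dirty bitmaps: 1 bit per 4 KiB page each */
>>       uint64_t bitmaps = 2 * (slot / (4 * KiB) / 8);
>>
>>       /* prints 2048 / 512 / 2 / 64 MiB respectively */
>>       printf("rmap:       %llu MiB\n", (unsigned long long)(rmap / MiB));
>>       printf("gfn_track:  %llu MiB\n", (unsigned long long)(gfn_track / MiB));
>>       printf("lpage_info: %llu MiB\n", (unsigned long long)(lpage_info / MiB));
>>       printf("bitmaps:    %llu MiB\n", (unsigned long long)(bitmaps / MiB));
>>       return 0;
>>   }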
>>
>> So we primarily care about (1) and (2). The bad thing is that the
>> memory consumption *doubles* once SMM is enabled, because we create the
>> memslot once for !SMM and once for SMM.
>>
>> Having a 1 TiB memslot without the TDP MMU consumes around:
>> * With SMM: 5 GiB
>> * Without SMM: 2.5 GiB
>> Having a 1 TiB memslot with the TDP MMU consumes around:
>> * With SMM: 1 GiB
>> * Without SMM: 512 MiB
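>>
>> (To connect these totals to the items above: without the TDP MMU,
>> (1) + (2) dominate, i.e., ~2 GiB + ~512 MiB ~= 2.5 GiB, doubled to
>> ~5 GiB with SMM; with the TDP MMU, (1) is allocated lazily, leaving
>> (2)'s ~512 MiB, doubled to ~1 GiB with SMM.)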
>>
>> ... and that's really something we want to optimize, to be able to just
>> start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
>> that can grow very large (e.g., 1 TiB).
>>
>> Consequently, using multiple memslots and only mapping the memslots we
>> really need can significantly reduce memory waste and speed up
>> memslot-related operations. Let's expose the sparse RAM memory region using
>> multiple memslots, mapping only the memslots we currently need into our
>> device memory region container.
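>>
>> Conceptually, mapping one memslot boils down to inserting an alias to
>> the corresponding chunk of the sparse RAM region into the container
>> (a simplified, hypothetical sketch; the real code also takes care of
>> sizing, bookkeeping and unmapping):
>>
>>   #include "qemu/osdep.h"
>>   #include "exec/memory.h"
>>
>>   static void memslot_map_sketch(MemoryRegion *container, MemoryRegion *ram,
>>                                  unsigned int idx, uint64_t slot_size)
>>   {
>>       MemoryRegion *alias = g_new0(MemoryRegion, 1);
>>
>>       /* alias covers [idx * slot_size, (idx + 1) * slot_size) of "ram" */
>>       memory_region_init_alias(alias, NULL, "memslot", ram,
>>                                idx * slot_size, slot_size);
>>       /* map it at the same offset within the device's container */
>>       memory_region_add_subregion(container, idx * slot_size, alias);
>>   }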
>>
>> The feature can be enabled using "dynamic-memslots=on" and requires
>> "unplugged-inaccessible=on", which is nowadays the default.
>>
>> Once enabled, we'll auto-detect the number of memslots to use based on the
>> memslot limit provided by the core. We'll use at most 1 memslot per
>> gigabyte. Note that our global limit of memslots across all memory devices
>> is currently set to 256: even with multiple large virtio-mem devices,
>> we'd still have a sane limit on the number of memslots used.
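>>
>> In pseudo-code, the auto-detection amounts to something like this
>> (a hypothetical sketch using QEMU's DIV_ROUND_UP/MIN/MAX and GiB
>> helpers; not the actual function):
>>
>>   #include "qemu/osdep.h"
>>   #include "qemu/units.h"
>>
>>   /* "limit" is the per-device memslot budget handed out by the
>>    * memory device core (global cap across all devices: 256). */
>>   static unsigned int memslots_auto_sketch(uint64_t region_size,
>>                                            unsigned int limit)
>>   {
>>       /* use at most one memslot per GiB of device memory ... */
>>       uint64_t memslots = DIV_ROUND_UP(region_size, 1 * GiB);
>>
>>       /* ... but never more than the core allows, and at least one */
>>       return MAX(1ULL, MIN(memslots, (uint64_t)limit));
>>   }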
>>
>> The default is to not dynamically map memslots for now
>> ("dynamic-memslots=off"). The optimization must be enabled manually,
>> because some vhost setups (e.g., hotplug of vhost-user devices) might be
>> problematic until we support more memslots, especially in vhost-user backends.
>>
>> Note that "dynamic-memslots=on" is just a hint that multiple memslots
>> *may* be used for internal optimizations, not that multiple memslots
>> *must* be used. The actual number of memslots that are used is an
>> internal detail: for example, once memslot metadata is no longer an
>> issue, we could simply stop optimizing for that. Migration source and
>> destination can differ on the setting of "dynamic-memslots".
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>
> The changes seem reasonable, so:
> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Thanks Maciej!
--
Cheers,
David / dhildenb