From: David Hildenbrand <david@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>,
	qemu-devel@nongnu.org
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Xiao Guangrong" <xiaoguangrong.eric@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"Michal Privoznik" <mprivozn@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>,
	"Gavin Shan" <gshan@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v1 13/15] virtio-mem: Expose device memory via multiple memslots if enabled
Date: Fri, 14 Jul 2023 12:01:04 +0200	[thread overview]
Message-ID: <34f749c1-db52-f435-f887-f8a9852150d1@redhat.com> (raw)
In-Reply-To: <3bd720ec-8f61-d3e9-c998-4873e0c4f778@maciej.szmigiero.name>

On 13.07.23 21:58, Maciej S. Szmigiero wrote:
> On 16.06.2023 11:26, David Hildenbrand wrote:
>> Having large virtio-mem devices that only expose little memory to a VM
>> is currently a problem: we map the whole sparse memory region into the
>> guest using a single memslot, resulting in one gigantic memslot in KVM.
>> KVM allocates metadata for the whole memslot, which can result in quite
>> some memory waste.
>>
>> Assuming we have a 1 TiB virtio-mem device and only expose little (e.g.,
>> 1 GiB) memory, we would create a single 1 TiB memslot and KVM has to
>> allocate metadata for that 1 TiB memslot: on x86, this implies allocating
>> a significant amount of memory for metadata:
>>
>> (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
>>       -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~2 GiB (0.2 %)
>>
>>       With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets
>>       allocated lazily when required for nested VMs
>> (2) gfn_track: 2 bytes per 4 KiB
>>       -> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
>> (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
>>       -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
>> (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
>>       -> For 1 TiB: 536870912 bits = 64 MiB (0.006 %)
>>
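(For anyone who wants to double-check the numbers above, a minimal
stand-alone sketch in C; the per-entry sizes are copied from the list
above, not read out of KVM's actual structures:)

    /* Cross-check of the per-type metadata totals for a 1 TiB memslot;
     * per-entry sizes are taken from the list above, not from KVM. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t tib = 1ULL << 40;
        const uint64_t rmap = 8 * (tib >> 12) + 8 * (tib >> 21) +
                              8 * (tib >> 30);
        const uint64_t gfn_track = 2 * (tib >> 12);
        const uint64_t lpage_info = 4 * (tib >> 21) + 4 * (tib >> 30);
        const uint64_t bitmaps = 2 * (tib >> 12) / 8; /* 2x 1 bit/page */

        printf("rmap:       ~%" PRIu64 " MiB\n", rmap >> 20);       /* ~2048 */
        printf("gfn_track:   %" PRIu64 " MiB\n", gfn_track >> 20);  /* 512 */
        printf("lpage_info: ~%" PRIu64 " MiB\n", lpage_info >> 20); /* ~2 */
        printf("bitmaps:     %" PRIu64 " MiB\n", bitmaps >> 20);    /* 64 */
        return 0;
    }
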
>> So we primarily care about (1) and (2). The bad thing is that the
>> memory consumption *doubles* once SMM is enabled, because we create the
>> memslot once for !SMM and once for SMM.
>>
>> Having a 1 TiB memslot without the TDP MMU consumes around:
>> * With SMM: 5 GiB
>> * Without SMM: 2.5 GiB
>> Having a 1 TiB memslot with the TDP MMU consumes around:
>> * With SMM: 1 GiB
>> * Without SMM: 512 MiB
>>
>> ... and that's really something we want to optimize, to be able to just
>> start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device
>> that can grow very large (e.g., 1 TiB).
>>
>> Consequently, using multiple memslots and only mapping the memslots we
>> really need can significantly reduce memory waste and speed up
>> memslot-related operations. Let's expose the sparse RAM memory region using
>> multiple memslots, mapping only the memslots we currently need into our
>> device memory region container.
>>
>> * With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we only map the memslots that
>>     actually have memory plugged, and dynamically (un)map when
>>     (un)plugging memory blocks.
>>
>> * Without VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we always map the memslots
>>     covered by the usable region, and dynamically (un)map when resizing the
>>     usable region.
>>
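To make the two modes above concrete, here is a simplified sketch of
the "should this memslot currently be mapped?" decision. Note that
virtio_mem_has_feature(), virtio_mem_memslot_has_plugged_blocks() and
the usable_region_size field are illustrative names, not the patch's
actual code:

    /* Simplified sketch; helper and field names are illustrative. */
    static bool virtio_mem_memslot_should_be_mapped(const VirtIOMEM *vmem,
                                                    unsigned int idx)
    {
        const uint64_t memslot_start = idx * vmem->memslot_size;

        if (virtio_mem_has_feature(vmem,
                                   VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE)) {
            /* Map only memslots that contain plugged memory blocks. */
            return virtio_mem_memslot_has_plugged_blocks(vmem, idx);
        }
        /* Otherwise, map every memslot covered by the usable region. */
        return memslot_start < vmem->usable_region_size;
    }
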
>> We'll auto-determine the number of memslots to use based on the suggested
>> memslot limit provided by the core. We'll use at most 1 memslot per
>> gigabyte. Note that our global limit of memslots across all memory devices
>> is currently set to 256: even with multiple large virtio-mem devices, we'd
>> still have a sane limit on the number of memslots used.
>>
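The auto-sizing rule, as a sketch (assuming QEMU's DIV_ROUND_UP/MIN/MAX
macros and the GiB constant from "qemu/units.h"; the patch's actual
computation may differ in detail):

    /* Sketch of the "at most 1 memslot per GiB, capped by the
     * suggested limit" rule described above; not the literal code. */
    static unsigned int sketch_nb_memslots(uint64_t region_size,
                                           unsigned int suggested_limit)
    {
        unsigned int memslots = DIV_ROUND_UP(region_size, 1 * GiB);

        return MIN(MAX(memslots, 1u), suggested_limit);
    }
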
>> The default is a single memslot for now ("multiple-memslots=off"). The
>> optimization must be enabled manually using "multiple-memslots=on", because
>> some vhost setups (e.g., hotplug of vhost-user devices) might be
>> problematic until we support more memslots especially in vhost-user
>> backends.
>>
>> Note that "multiple-memslots=on" is just a hint that multiple memslots
>> *may* be used for internal optimizations, not that multiple memslots
>> *must* be used. The actual number of memslots that are used is an
>> internal detail: for example, once memslot metadata is no longer an
>> issue, we could simply stop optimizing for that. Migration source and
>> destination can differ on the setting of "multiple-memslots".
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>    hw/virtio/virtio-mem-pci.c     |  21 +++
>>    hw/virtio/virtio-mem.c         | 265 ++++++++++++++++++++++++++++++++-
>>    include/hw/virtio/virtio-mem.h |  23 ++-
>>    3 files changed, 304 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
>> index b85c12668d..8b403e7e78 100644
>> --- a/hw/virtio/virtio-mem-pci.c
>> +++ b/hw/virtio/virtio-mem-pci.c
> (...)
>> @@ -790,6 +921,43 @@ static void virtio_mem_system_reset(void *opaque)
>>        virtio_mem_unplug_all(vmem);
>>    }
>>    
>> +static void virtio_mem_prepare_mr(VirtIOMEM *vmem)
>> +{
>> +    const uint64_t region_size = memory_region_size(&vmem->memdev->mr);
>> +
>> +    g_assert(!vmem->mr);
>> +    vmem->mr = g_new0(MemoryRegion, 1);
>> +    memory_region_init(vmem->mr, OBJECT(vmem), "virtio-mem",
>> +                       region_size);
>> +    vmem->mr->align = memory_region_get_alignment(&vmem->memdev->mr);
>> +}
>> +
>> +static void virtio_mem_prepare_memslots(VirtIOMEM *vmem)
>> +{
>> +    const uint64_t region_size = memory_region_size(&vmem->memdev->mr);
>> +    unsigned int idx;
>> +
>> +    g_assert(!vmem->memslots && vmem->nb_memslots);
>> +    vmem->memslots = g_new0(MemoryRegion, vmem->nb_memslots);
>> +
>> +    /* Initialize our memslots, but don't map them yet. */
>> +    for (idx = 0; idx < vmem->nb_memslots; idx++) {
>> +        const uint64_t memslot_offset = idx * vmem->memslot_size;
>> +        uint64_t memslot_size = vmem->memslot_size;
>> +        char name[20];
>> +
>> +        /* The size of the last memslot might be smaller. */
>> +        if (idx == vmem->nb_memslots) {
>                             ^
> I guess this should be "vmem->nb_memslots - 1" since that's the last
> memslot index.

Indeed, thanks!
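
For reference, the corrected check as a minimal sketch (the statement
inside the "if" is assumed here, since the quote above cuts off before
the body):

    /* The size of the last memslot might be smaller. */
    if (idx == vmem->nb_memslots - 1) {
        /* Assumed body: clamp the last memslot to the region end. */
        memslot_size = region_size - memslot_offset;
    }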

-- 
Cheers,

David / dhildenb




Thread overview: 18+ messages
2023-06-16  9:26 [PATCH v1 00/15] virtio-mem: Expose device memory through multiple memslots David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 01/15] memory-device: Track the required memslots in DeviceMemoryState David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 02/15] kvm: Add stub for kvm_get_max_memslots() David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 03/15] vhost: Add vhost_get_max_memslots() David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 04/15] memory-device, vhost: Add a memslot soft limit for memory devices David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 05/15] kvm: Return number of free memslots David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 06/15] vhost: " David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 07/15] memory-device: Support memory devices that statically consume multiple memslots David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 08/15] memory-device: Track the actually used memslots in DeviceMemoryState David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 09/15] memory-device, vhost: Support memory devices that dynamically consume multiple memslots David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 10/15] pc-dimm: Provide pc_dimm_get_free_slots() to query free ram slots David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 11/15] memory-device: Support memory-devices with auto-detection of the number of memslots David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 12/15] memory: Clarify mapping requirements for RamDiscardManager David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 13/15] virtio-mem: Expose device memory via multiple memslots if enabled David Hildenbrand
2023-07-13 19:58   ` Maciej S. Szmigiero
2023-07-14 10:01     ` David Hildenbrand [this message]
2023-06-16  9:26 ` [PATCH v1 14/15] memory, vhost: Allow for marking memory device memory regions unmergeable David Hildenbrand
2023-06-16  9:26 ` [PATCH v1 15/15] virtio-mem: Mark memslot alias " David Hildenbrand
