From: David Hildenbrand <david@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Eduardo Habkost" <ehabkost@redhat.com>,
kvm@vger.kernel.org, "Michael S. Tsirkin" <mst@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
qemu-devel@nongnu.org, "Peter Xu" <peterx@redhat.com>,
"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"Igor Mammedov" <imammedo@redhat.com>,
"Ani Sinha" <ani@anisinha.ca>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 12/15] virtio-mem: Expose device memory via separate memslots
Date: Wed, 20 Oct 2021 14:17:16 +0200 [thread overview]
Message-ID: <81fc0417-8335-cbce-e4ad-53cbb52183a6@redhat.com> (raw)
In-Reply-To: <83270a38-a179-b2c5-9bab-7dd614dc07d6@redhat.com>
On 14.10.21 15:17, David Hildenbrand wrote:
> On 14.10.21 13:45, Dr. David Alan Gilbert wrote:
>> * David Hildenbrand (david@redhat.com) wrote:
>>> KVM nowadays supports a lot of memslots. We want to exploit that in
>>> virtio-mem, exposing device memory to the guest via separate memslots
>>> on demand. This essentially reduces the total size of the KVM memory
>>> slots significantly (and thereby the memslot metadata in KVM and in
>>> QEMU), especially when initially exposing only a small amount of memory
>>> via a virtio-mem device to the guest, to hotplug more later. Further,
>>> not always exposing the full device memory region to the guest reduces
>>> the attack surface in many setups without requiring other mechanisms
>>> like uffd for protection of unplugged memory.
>>>
>>> So split the original RAM region via memory region aliases into separate
>>> chunks (ending up as individual memslots), and dynamically map the
>>> required chunks (falling into the usable region) into the container.
>>>
>>> For now, we always map the memslots covered by the usable region. In the
>>> future, with VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to map
>>> memslots on actual demand and optimize further.
>>>
>>> Users can specify via the "max-memslots" property how many memslots the
>>> virtio-mem device is allowed to use at most. "0" translates to "auto, no
>>> limit" and the actual limit is determined automatically using a
>>> heuristic. When a maximum (> 1) is specified, that auto-determined value
>>> is capped. The parameter doesn't have to be migrated and can differ
>>> between source and destination. The only reason the parameter exists is
>>> to make some corner-case setups (multiple large virtio-mem devices
>>> assigned to a single virtual NUMA node with only very limited available
>>> memslots, hotplug of vhost devices) work. The parameter will default to
>>> "0" soon, while it will remain "1" for compat machines.
>>>
>>> The properties "memslots" and "used-memslots" are read-only.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> I think you need to move this patch after the vhost-user patches so that
>> you don't break a bisect including vhost-user.
>
> As the default is set to 1 and only changed to 0 ("auto") in the last
> patch of this series, there should be (almost) no difference regarding
> vhost-user.
>
>>
>> But I do worry about the effect on vhost-user:
>
> The 4096 limit was certainly more of a "let's make it extreme so we
> raise some eyebrows and can talk about the implications". I'd be
> perfectly happy with 256 or, better, 512. Anything that's bigger than
> 32 in the case of virtiofsd :)
>
>> a) What about external programs like dpdk?
>
> At least initially virtio-mem won't apply to dpdk and similar workloads
> (RT). For example, virtio-mem is incompatible with mlock. So I think the
> most important use case to optimize for is virtio-mem+virtiofsd
> (especially kata).
>
>> b) I worry that if you end up with a LOT of slots, you end up with a
>> lot of mmaps and fds in vhost-user; I'm not quite sure what all the
>> effects of that will be.
>
> At least for virtio-mem, there will only be a small number of fds,
> because many memslots share the same fd; so it's not an issue there.
>
> #VMAs is indeed worth discussing. Usually we can have up to 64k VMAs in
> a process. The downside of having many is somewhat reduced pagefault
> performance. It really also depends on the target application. Maybe
> there should be some libvhost-user toggle via which the application can
> opt in to allow more?
>
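(For reference: that per-process VMA limit corresponds to the
vm.max_map_count sysctl, which defaults to roughly 64k (65530) on Linux
and can be raised by the admin.)
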
I just did a simple test with memfds: the 1024 open-fd limit does not
apply to fds we have already closed again. So the limit does not apply
when we do

    fd = open(...);
    addr = mmap(..., fd, ...);
    close(fd);
For example, I created 4096 memfds, mapped each of them, and then closed
its fd. The end result is:

    $ ls -la /proc/38113/map_files/ | wc -l
    4115
    $ ls -la /proc/38113/fd/ | wc -l
    6

Meaning: there are many individual mappings, but only very few open file
descriptors.
This should be precisely what we are doing in the libvhost-user code (and
what we should be doing in any other vhost-user code): once we have done
the mmap(), we should let go of the fd.
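
For anyone who wants to reproduce this, here is a minimal sketch of that
experiment. It is purely illustrative: the 2 MiB mapping size, the 4096
count and the minimal error handling are arbitrary choices, not taken from
QEMU or libvhost-user, and memfd_create() needs Linux with glibc >= 2.27.

/* Minimal sketch of the memfd test: create many memfds, map them,
 * close each fd right away, then let the process idle for inspection. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t size = 2 * 1024 * 1024;   /* 2 MiB per memfd */
    int i;

    for (i = 0; i < 4096; i++) {
        int fd = memfd_create("slot", 0);

        if (fd < 0 || ftruncate(fd, size) != 0) {
            perror("memfd_create/ftruncate");
            exit(1);
        }
        if (mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, 0) == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
        /* The mapping stays alive; the fd is no longer needed. */
        close(fd);
    }

    /* Keep the process alive so /proc/<pid>/map_files and
     * /proc/<pid>/fd can be inspected from another shell. */
    printf("pid: %d\n", getpid());
    pause();
    return 0;
}

Running it and then inspecting /proc/<pid>/map_files vs. /proc/<pid>/fd
from another shell should show many mappings but only a handful of open
file descriptors, matching the numbers above.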
--
Thanks,
David / dhildenb
Thread overview: 21+ messages
2021-10-13 10:33 [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 01/15] memory: Drop mapping check from memory_region_get_ram_discard_manager() David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 02/15] kvm: Return number of free memslots David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 03/15] vhost: " David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 04/15] memory: Allow for marking memory region aliases unmergeable David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 05/15] vhost: Don't merge unmergeable memory sections David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 06/15] memory-device: Move memory_device_check_addable() directly into memory_device_pre_plug() David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 07/15] memory-device: Generalize memory_device_used_region_size() David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 08/15] memory-device: Support memory devices that consume a variable number of memslots David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 09/15] vhost: Respect reserved memslots for memory devices when realizing a vhost device David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 10/15] virtio-mem: Set the RamDiscardManager for the RAM memory region earlier David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 11/15] virtio-mem: Fix typo in virito_mem_intersect_memory_section() function name David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 12/15] virtio-mem: Expose device memory via separate memslots David Hildenbrand
2021-10-14 11:45 ` Dr. David Alan Gilbert
2021-10-14 13:17 ` David Hildenbrand
2021-10-20 12:17 ` David Hildenbrand [this message]
2021-10-13 10:33 ` [PATCH RFC 13/15] vhost-user: Increase VHOST_USER_MAX_RAM_SLOTS to 496 with CONFIG_VIRTIO_MEM David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 14/15] libvhost-user: Increase VHOST_USER_MAX_RAM_SLOTS to 4096 David Hildenbrand
2021-10-13 10:33 ` [PATCH RFC 15/15] virtio-mem: Set "max-memslots" to 0 (auto) for the 6.2 machine David Hildenbrand
2021-10-13 19:03 ` [PATCH RFC 00/15] virtio-mem: Expose device memory via separate memslots Dr. David Alan Gilbert
2021-10-14 7:01 ` David Hildenbrand