From: "Michael S. Tsirkin" <mst@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Ani Sinha" <ani@anisinha.ca>, "Peter Xu" <peterx@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	"Hui Zhu" <teawater@gmail.com>,
	"Sebastien Boeuf" <sebastien.boeuf@intel.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v1 00/12] virtio-mem: Expose device memory via multiple memslots
Date: Tue, 2 Nov 2021 07:35:31 -0400
Message-ID: <20211102072843-mutt-send-email-mst@kernel.org>
In-Reply-To: <a5c94705-b66d-1b19-1c1f-52e99d9dacce@redhat.com>

On Tue, Nov 02, 2021 at 09:33:55AM +0100, David Hildenbrand wrote:
> On 01.11.21 23:15, Michael S. Tsirkin wrote:
> > On Wed, Oct 27, 2021 at 02:45:19PM +0200, David Hildenbrand wrote:
> >> This is the follow-up of [1], dropping auto-detection and vhost-user
> >> changes from the initial RFC.
> >>
> >> Based-on: 20211011175346.15499-1-david@redhat.com
> >>
> >> A virtio-mem device is represented by a single large RAM memory region
> >> backed by a single large mmap.
> >>
> >> Right now, we map that complete memory region into guest physical address
> >> space, resulting in a very large memory mapping, KVM memory slot, ...
> >> although only a small amount of memory might actually be exposed to the VM.
> >>
> >> For example, when starting a VM with a 1 TiB virtio-mem device that only
> >> exposes little device memory (e.g., 1 GiB) towards the VM initially,
> >> in order to hotplug more memory later, we waste a lot of memory on metadata
> >> for KVM memory slots (> 2 GiB!) and accompanied bitmaps. Although some
> >> optimizations in KVM are being worked on to reduce this metadata overhead
> >> on x86-64 in some cases, it remains a problem with nested VMs and there are
> >> other reasons why we would want to reduce the total memory slot size to a
> >> reasonable minimum.
> >>
> >> We want to:
> >> a) Reduce the metadata overhead, including bitmap sizes inside KVM but also
> >>    inside QEMU KVM code where possible.
> >> b) Not always expose all device-memory to the VM, to reduce the attack
> >>    surface of malicious VMs without using userfaultfd.
> > 
> > I'm confused by the mention of these security considerations,
> > and I expect users will be just as confused.
> 
> Malicious VMs wanting to consume more memory than desired is only
> relevant when running untrusted VMs in some environments, and it can be
> caught differently, for example, by carefully monitoring and limiting
> the maximum memory consumption of a VM. We have the same issue already
> when using virtio-balloon to logically unplug memory. For me, it's a
> secondary concern (optimizing a) is much more important).
> 
> Some users showed interest in having QEMU disallow access to unplugged
> memory, because coming up with a maximum memory consumption for a VM is
> hard. This is one step in that direction without having to run with
> uffd enabled all of the time.

Sorry about missing the memo - is there a lot of overhead associated
with uffd then?

> ("security" is somewhat the wrong word. We won't be able to steal any
> information from the hypervisor.)

Right. Let's just spell it out.
Further, removing memory still requires guest cooperation.

> 
> > So let's say user wants to not be exposed. What value for
> > the option should be used? What if a lower option is used?
> > Is there still some security advantage?
> 
> My recommendation will be to use 1 memslot per gigabyte as default if
> possible in the configuration. If we have a virtio-mem device with a
> maximum size of 128 GiB, the suggestion will be to use memslots=128.
> Some setups will require less (e.g., vhost-user until adjusted, old
> KVM), some setups can allow for more. I assume that most users will
> later set "memslots=0", to enable auto-detection mode.
> 
> 
> Assume we have a virtio-mem device with a maximum size of 1 TiB and we
> hotplugged 1 GiB to the VM. With "memslots=1", the malicious VM could
> actually access the whole 1 TiB. With "memslots=1024", the malicious VM
> could only access additional ~ 1 GiB. With "memslots=512", ~ 2 GiB.
> That's the reduced attack surface.
> 
> Of course, it's different after we hotunplugged memory, before we have
> VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE support in QEMU, because all memory
> inside the usable region has to be accessible and we cannot "unplug" the
> memslots.
> 
> 
> Note: With upcoming VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE changes in QEMU,
> one will be able to disallow any access for malicious VMs by setting the
> memslot size just as big as the device block size.
> 
> So with a 128 GiB virtio-mem device with memslots=128,block-size=1G, or
> with memslots=1024,block-size=128M we could make it impossible for a
> malicious VM to consume more memory than intended. But we lose
> flexibility due to the block size and the limited number of available
> memslots.
> 
> But again, for "full protection against malicious VMs" I consider
> userfaultfd protection more flexible. This approach here gives some
> advantage, especially when having large virtio-mem devices that start
> out small.
> 
> -- 
> Thanks,
> 
> David / dhildenb
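
The memslot arithmetic quoted above can be sketched in a few lines of
Python. This is only an illustration, assuming the device's maximum size
is split evenly into equally sized memslots; the helper name
memslot_size is hypothetical and is not part of QEMU. Going by the
figures in the message, the memory a guest can reach beyond what was
plugged is on the order of one memslot.

```python
MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40


def memslot_size(device_max_size: int, memslots: int) -> int:
    """Return the size in bytes of each memslot for an even split.

    Hypothetical helper for illustration only; it assumes that all
    memslots of a device have the same size.
    """
    if memslots < 1:
        raise ValueError("memslots must be at least 1")
    if device_max_size % memslots:
        raise ValueError("device size is not a multiple of the memslot count")
    return device_max_size // memslots


if __name__ == "__main__":
    # (device maximum size, memslots) pairs taken from the examples above.
    examples = [
        (TiB, 1),
        (TiB, 512),
        (TiB, 1024),
        (128 * GiB, 128),
        (128 * GiB, 1024),
    ]
    for size, count in examples:
        per_slot = memslot_size(size, count)
        print(f"{size // GiB} GiB device, memslots={count}: "
              f"{per_slot // MiB} MiB per memslot")
```

For a 128 GiB device, memslots=128 gives 1 GiB memslots and
memslots=1024 gives 128 MiB memslots, which matches the block-size
values in the two example configurations.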



