[virtio-dev] Re: Constraining where a guest may allocate virtio accessible resources

From: "Michael S. Tsirkin" <mst@redhat.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: virtio-dev@lists.oasis-open.org,
	"David Hildenbrand" <david@redhat.com>,
	jan.kiszka@siemens.com,
	"Srivatsa Vaddagiri" <vatsa@codeaurora.org>,
	"Azzedine Touzni" <atouzni@qti.qualcomm.com>,
	"François Ozog" <francois.ozog@linaro.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Soni, Trilok" <tsoni@quicinc.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Jean-Philippe Brucker" <jean-philippe@linaro.org>
Subject: [virtio-dev] Re: Constraining where a guest may allocate virtio accessible resources
Date: Thu, 18 Jun 2020 03:30:08 -0400
Message-ID: <20200618032405-mutt-send-email-mst@kernel.org>
In-Reply-To: <87a7194kgt.fsf@linaro.org>

On Wed, Jun 17, 2020 at 06:31:15PM +0100, Alex Bennée wrote:
> 
> Hi,
> 
> This follows on from the discussion in the last thread I raised:
> 
>   Subject: Backend libraries for VirtIO device emulation
>   Date: Fri, 06 Mar 2020 18:33:57 +0000
>   Message-ID: <874kv15o4q.fsf@linaro.org>
> 
> To support the concept of a VirtIO backend having limited visibility of
> a guest's memory space there needs to be some mechanism to limit where
> that guest may place things. A simple VirtIO device can be expressed
> purely in virt resources, for example:
> 
>    * status, feature and config fields
>    * notification/doorbell
>    * one or more virtqueues
> 
> Using a PCI backend the location of everything but the virtqueues is
> controlled by the mapping of the PCI device, so it is something that is
> controllable by the host/hypervisor. However the guest is free to
> allocate the virtqueues anywhere in the virtual address space of system
> RAM.
> 
> In theory this shouldn't matter because sharing virtual pages is just a
> matter of putting the appropriate translations in place. However there
> are multiple ways the host and guest may interact:
> 
> * QEMU TCG
> 
> QEMU sees a block of system memory in its virtual address space that
> has a one to one mapping with the guest's physical address space. If QEMU
> wants to share a subset of that address space it can only realistically
> do it for a contiguous region of its address space which implies the
> guest must use a contiguous region of its physical address space.
> 
> * QEMU KVM
> 
> The situation here is broadly the same - although both QEMU and the
> guest are seeing their own virtual views of a linear address space
> which may well actually be a fragmented set of physical pages on the
> host.
> 
> KVM based guests have additional constraints if they ever want to access
> real hardware in the host as you need to ensure any address accessed by
> the guest can be eventually translated into an address that can
> physically access the bus which a device is on (for device
> pass-through). The area also has to be DMA coherent so updates from a
> bus are reliably visible to software accessing the same address space.
> 
> * Xen (and other type-1's?)
> 
> Here the situation is a little different because the guest explicitly
> makes its pages visible to other domains by way of grant tables. The
> guest is still free to use whatever parts of its address space it wishes
> to. Other domains then request access to those pages via the hypervisor.
> 
> In theory the requester is free to map the granted pages anywhere in
> its own address space. However there are differences between the
> architectures on how well this is supported.
> 
> So I think this makes a case for having a mechanism by which the guest
> can restrict its allocation to a specific area of the guest physical
> address space. The question is then what is the best way to inform the
> guest kernel of the limitation?

Something that's unclear to me is whether you envision each
device having its own dedicated memory it can access,
or more broadly a couple of groups of devices,
much as there are 32-bit and 64-bit DMA capable PCI devices,
or devices with VIRTIO_F_ACCESS_PLATFORM and devices
without it?
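
To make the "groups of devices" alternative concrete, here is a minimal
Python sketch of a group-wide constraint modelled on a DMA address mask.
The function name and the group table are hypothetical illustrations, not
part of any VirtIO or kernel interface.

```python
# Hypothetical sketch: a device group is described only by how many
# address bits its members can drive, as with 32-bit vs 64-bit DMA
# capable PCI devices. Names here are illustrative assumptions.

# Assumed example groups; real platforms would discover these somehow.
DEVICE_GROUPS = {"dma32": 32, "dma64": 64}


def fits_dma_mask(addr: int, length: int, mask_bits: int) -> bool:
    """Return True if [addr, addr + length) is reachable with mask_bits bits."""
    if addr < 0 or length <= 0:
        return False
    highest_addr = (1 << mask_bits) - 1
    return addr + length - 1 <= highest_addr
```

In the per-device alternative each device would carry its own list of
permitted ranges instead of sharing one mask with its group.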


> Option 1 - Kernel Command Line
> ==============================
> 
> This isn't without precedent - the kernel supports options like "memmap"
> which can with the appropriate amount of crafting be used to carve out
> sections of bad ram from the physical address space. Other formulations
> can be used to mark specific areas of the address space as particular
> types of memory.  
> 
> However there are cons to this approach as it then becomes a job for
> whatever builds the VMM command lines to ensure that both the backend
> and the kernel know where things are. It is also very Linux centric and
> doesn't solve the problem for other guest OSes. Considering the rest of
> VirtIO can be made discoverable this seems like it would be a backward
> step.
> 
> Option 2 - Additional Platform Data
> ===================================
> 
> This would mean extending something like device tree or ACPI tables,
> which could define regions of memory that would inform the low level
> memory allocation routines where they could allocate from. There is
> already the concept of "dma-ranges" in device tree which can be a
> per-device property which defines the region of space that is DMA
> coherent for a device.
> 
> There is the question of how you tie regions declared here with the
> eventual instantiating of the VirtIO devices?
> 
> For a fully distributed set of backends (one backend per device per
> worker VM) you would need several different regions. Would each region
> be tied to each device or just a set of areas the guest would allocate
> from in sequence?
> 
> Option 3 - Abusing PCI Regions
> ==============================
> 
> One of the reasons to use the VirtIO PCI backend is to help with
> automatic probing and setup. Could we define a new PCI region which on
> the backend just maps to RAM but from the front-end's point of view is a
> region in which it can allocate its virtqueues? Could we go one step
> further and just let the host define and allocate the virtqueue in the
> reserved PCI space and pass the base of it somehow?
> 
> Option 4 - Extend VirtIO Config
> ===============================
> 
> Another approach would be to extend the VirtIO configuration and
> start-up handshake to supply these limitations to the guest. This could
> be handled by the addition of a feature bit (VIRTIO_F_HOST_QUEUE?) and
> additional configuration information.
> 
> One problem I can foresee is device initialisation is usually done
> fairly late in the start-up of a kernel by which time any memory zoning
> restrictions will likely need to have informed the kernel's low level
> memory management. Does that mean we would have to combine such a
> feature behaviour with another method anyway?
> 
> Option 5 - Additional Device
> ============================
> 
> The final approach would be to tie the allocation of virtqueues to
> memory regions as defined by additional devices. For example the
> proposed IVSHMEMv2 spec offers the ability for the hypervisor to present
> a fixed non-mappable region of the address space. Other proposals like
> virtio-mem allow for hot plugging of "physical" memory into the guest
> (conveniently treatable as separate shareable memory objects for QEMU
> ;-).

Another approach would be supplying this information through virtio-iommu.
That already has topology information, and can be used together with
VIRTIO_F_ACCESS_PLATFORM to limit device access to memory.
As virtio-iommu is fairly new I kind of like this approach myself -
not a lot of legacy to contend with.
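
As a rough illustration of what "limit device access to memory" means
here, the following Python sketch models a per-device allow-list of
guest-physical ranges. ToyIommu, Region and the device names are
hypothetical; this is not the virtio-iommu request format or any real
driver interface, only a model of the check an IOMMU-style mechanism
would enforce.

```python
# Hypothetical model only: a per-device allow-list of guest-physical
# ranges, checked before a device access is permitted.
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A contiguous guest-physical range [base, base + size)."""
    base: int
    size: int

    def contains(self, addr: int, length: int) -> bool:
        # The whole access must fall inside the region.
        return (length > 0
                and self.base <= addr
                and addr + length <= self.base + self.size)


class ToyIommu:
    """Tracks which regions each (assumed) device id may touch."""

    def __init__(self) -> None:
        self._allowed: dict[str, list[Region]] = {}

    def allow(self, device_id: str, region: Region) -> None:
        self._allowed.setdefault(device_id, []).append(region)

    def may_access(self, device_id: str, addr: int, length: int) -> bool:
        # Devices with no entry are denied everything.
        regions = self._allowed.get(device_id, [])
        return any(r.contains(addr, length) for r in regions)
```

With a structure like this the guest would only need to place virtqueues
and buffers inside a range already granted to the device in question.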

> 
> Closing Thoughts and Open Questions
> ===================================
> 
> Currently all of this is considering just virtqueues themselves but of
> course only a subset of devices interact purely by virtqueue messages.
> Network and Block devices often end up filling up additional structures
> in memory that are usually across the whole of system memory. To achieve
> better isolation you either need to ensure that specific bits of kernel
> allocation are done in certain regions (i.e. block cache in "shared"
> region) or implement some sort of bounce buffer [1] that allows you to bring
> data from backend to frontend (which is more like the channel concept of
> Xen's PV).
> 
> I suspect the solution will end up being a combination of all of these
> approaches. The setup of different systems might mean we need a
> plethora of ways to carve out and define regions in ways a kernel can
> understand and make decisions about.
> 
> I think there will always have to be an element of VirtIO config
> involved as that is *the* mechanism by which front/back end negotiate if
> they can get up and running in a way they are both happy with.
> 
> One potential approach would be to introduce the concept of a region id
> at the VirtIO config level which is simply a reasonably unique magic
> number that the virtio driver passes down into the kernel when requesting
> memory for its virtqueues. It could then be left to the kernel to
> use that id when identifying the physical address range to
> allocate from. This seems a bit of a loose binding between the driver
> level and the kernel level but perhaps that is preferable to allow for
> flexibility about how such regions are discovered by kernels?
> 
> I hope this message hasn't rambled on too much. I feel this is a complex
> topic and I want to be sure I've thought through all the potential
> options before starting to prototype a solution. For those that have
> made it this far the final questions are:
> 
>   - is constraining guest allocation of virtqueues a reasonable requirement?
> 
>   - could virtqueues ever be directly host/hypervisor assigned?
> 
>   - should there be a tight or loose coupling between front-end driver
>     and kernel/hypervisor support for allocating memory?
> 
> Of course if this is all solvable with existing code I'd be more than
> happy but please let me know how ;-)
> 
> Regards,
> 
> 
> -- 
> Alex Bennée
> 
> [1] Example bounce buffer approach
> 
> Subject: [PATCH 0/5] virtio on Type-1 hypervisor
> Message-Id: <1588073958-1793-1-git-send-email-vatsa@codeaurora.org>
