[virtio-dev] VM memory protection and zero-copy transfers.

Discussion of the implementations of VIRTIO specification

* [virtio-dev] VM memory protection and zero-copy transfers.
From: Afsa, Baptiste @ 2022-07-08 13:56 UTC
  To: virtio-dev@lists.oasis-open.org

Hello everyone,

The traditional virtio model relies on the ability for the host to access the
entire memory of the guest VM. Virtio is also used in system configurations
where the devices are not provided by the host (which may not exist as such in
the case of a Type-1 hypervisor) but by another, unprivileged guest VM. In such
a configuration, the guest VM memory sharing requirement would raise security
concerns.

The following proposal removes that requirement by introducing an alternative
model where the interactions between the virtio driver and the virtio device are
mediated by the hypervisor. This concept is applicable to both Type-1 and Type-2
hypervisors. In the following write-up, the "host" thus refers either to the
host OS or to the guest VM that executes the virtio device.

The main objective is to keep the memory of the VM that runs the driver isolated
from the host or VM that runs the device, while still allowing zero-copy
transfers between the two domains. The operations that control the exchange of
the virtio buffers are handled by hypervisor code that sits between the device
and the driver.

As opposed to the regular virtio model, the virtqueues allocated by the driver
are not shared with the device directly. Instead, the hypervisor allocates a
separate set of virtqueues that have the same sizes as the original ones and
shares this second set with the device. These hypervisor-allocated virtqueues
are referred to as the "shadow virtqueues".

During device operation, the hypervisor copies the descriptors between the
driver and the shadow virtqueues as the buffers cycle between the driver and the
device.

Whenever the driver adds some buffers to the available ring, the hypervisor
validates the descriptors and dynamically grants the I/O buffers to the host or
VM that runs the device. The hypervisor then copies these descriptors to the
shadow virtqueue's available ring. At the other end, when the device returns
buffers to the shadow virtqueue's used ring, the hypervisor unmaps these buffers
from the host's address space and copies the descriptor to the driver's used
ring.
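
As an illustration of the flow described above, here is a minimal Python sketch
of the bookkeeping involved. It is a toy model under stated assumptions, not the
actual hypervisor implementation: the names Desc, ShadowQueue, on_driver_avail
and on_device_used are hypothetical, the page size is assumed to be 4 KiB, and
a descriptor is reduced to an address and a length.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class Desc:
    """Simplified descriptor: guest-physical address and length only."""
    addr: int
    length: int


@dataclass
class ShadowQueue:
    """Toy hypervisor-side state for one driver/shadow virtqueue pair."""
    guest_mem_size: int
    shadow_avail: list = field(default_factory=list)  # seen by the device
    driver_used: list = field(default_factory=list)   # seen by the driver
    granted: dict = field(default_factory=dict)       # page number -> refcount

    def _pages(self, d):
        first = d.addr // PAGE_SIZE
        last = (d.addr + d.length - 1) // PAGE_SIZE
        return range(first, last + 1)

    def on_driver_avail(self, d):
        """Driver published a buffer: validate, grant, then forward."""
        if d.length == 0 or d.addr < 0 or d.addr + d.length > self.guest_mem_size:
            raise ValueError("descriptor outside guest memory")
        for p in self._pages(d):
            self.granted[p] = self.granted.get(p, 0) + 1
        self.shadow_avail.append(d)

    def on_device_used(self, d):
        """Device returned a buffer: revoke the grant, then forward."""
        self.shadow_avail.remove(d)
        for p in self._pages(d):
            self.granted[p] -= 1
            if self.granted[p] == 0:
                del self.granted[p]
        self.driver_used.append(d)
```

The per-page reference count is a design choice of this sketch: two in-flight
buffers may share a page, so the page can only be unmapped from the host once
both have come back through the used ring.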

Although the virtio buffers can be allocated anywhere in the guest memory and
are not necessarily page-aligned, the memory sharing granularity is constrained
by the page size. So when a buffer is mapped to the host address space, the
hypervisor may end up sharing more memory than what is strictly needed.
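
The amount of extra memory exposed can be worked out with a few lines of
Python. This is only an arithmetic sketch: the function names granted_span and
overshare are hypothetical and the 4 KiB page size is an assumption.

```python
PAGE_SIZE = 4096  # assumed page size


def granted_span(addr, length, page_size=PAGE_SIZE):
    """Return (start, size) of the page-aligned region covering the buffer."""
    start = addr - (addr % page_size)
    # Round the end of the buffer up to the next page boundary.
    end = -(-(addr + length) // page_size) * page_size
    return start, end - start


def overshare(addr, length, page_size=PAGE_SIZE):
    """Bytes shared with the host beyond what the buffer itself needs."""
    _, size = granted_span(addr, length, page_size)
    return size - length
```

For example, a 768-byte buffer at address 0x1F00 straddles a page boundary, so
two full pages (8192 bytes) are shared and overshare(0x1F00, 0x300) is 7424.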

The cost of granting the memory dynamically as virtio transfers proceed is
significant, though. We measured up to 40% performance degradation when using
this dynamic buffer granting mechanism.

We also compared this solution to other approaches that we have seen elsewhere.
One example is using the swiotlb mechanism along with the
VIRTIO_F_ACCESS_PLATFORM feature bit to force a copy of the I/O buffers to a
statically shared memory region. In that case, the same set of benchmarks shows
an even bigger performance degradation, up to 60%, compared to the original
virtio performance.

Although the shadow virtqueue concept looks fairly simple, there is still one
point that has not been covered yet: indirect descriptors.

To support indirect descriptors, the following two options were considered
initially:

  1. Grant the indirect descriptor as-is to the host while it is on the used
     ring. This introduces a security issue because a compromised guest OS can
     modify the indirect descriptor after it has been pushed to the available
     ring. This would cause the device to fault while trying to access any
     arbitrary memory that was not actually granted.

     Note that in the shadow virtqueue model, there is no need for the device to
     validate the descriptors in the available rings, because the hypervisor
     already performed such checks before granting the memory.

  2. Follow the same logic that is used for the "normal" descriptors and
     introduce shadow indirect descriptors. This would require the hypervisor to
     provision a memory pool to allocate these shadow indirect descriptors and
     determining the size of this pool may not be trivial.

     Additionally, indirect descriptors can be as large as the driver wants them
     to be, something that can cause the hypervisor to copy an arbitrarily large
     amount of data.

An alternative approach consists in introducing a new virtio feature bit. This
feature bit, when set by the device, instructs the driver to allocate indirect
descriptors using dedicated memory pages. These pages shall hold no other data
than the indirect descriptors. Since a correct virtio driver implementation does
not modify an indirect descriptor once it has been pushed to the device, the
pages where the indirect descriptors lie can later on be remapped as read-only
in the guest address space.

This allows the hypervisor to validate the content of the indirect descriptor,
grant it to the host (along with all the buffers referenced by this descriptor)
and remap the indirect descriptor read-only in the guest address space as long
as it is granted to the host (i.e. until the indirect descriptor is returned
through the used ring).
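
The sequence above (validate, protect while granted, release on return) can be
sketched as follows. This is a simplified model, not a definitive design: the
names ReadOnlyPages and indirect_table_pages are hypothetical, the page size is
assumed to be 4 KiB, and the only validation shown is on the table length
(16 bytes per descriptor in the split virtqueue layout).

```python
PAGE_SIZE = 4096   # assumed page size
DESC_SIZE = 16     # bytes per descriptor in the split virtqueue layout


class ReadOnlyPages:
    """Toy stand-in for the guest page permissions managed by the hypervisor."""

    def __init__(self):
        self._refs = {}  # page number -> number of granted tables on that page

    def protect(self, pages):
        for p in pages:
            self._refs[p] = self._refs.get(p, 0) + 1

    def release(self, pages):
        for p in pages:
            self._refs[p] -= 1
            if self._refs[p] == 0:
                del self._refs[p]

    def is_read_only(self, page):
        return page in self._refs


def indirect_table_pages(table_addr, table_len):
    """Check the table length and return the guest pages the table occupies."""
    if table_len == 0 or table_len % DESC_SIZE:
        raise ValueError("table length is not a multiple of the descriptor size")
    first = table_addr // PAGE_SIZE
    last = (table_addr + table_len - 1) // PAGE_SIZE
    return list(range(first, last + 1))
```

A hypervisor following this model would call protect() when the indirect
descriptor is forwarded to the shadow available ring and release() when it
comes back through the used ring. The reference count covers the case where
several indirect tables share one dedicated page.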

The present proposal has some obvious drawbacks, but we believe that memory
protection will not come for free. We know that there are other folks out there
who are trying to address this issue of memory sharing between VMs, so we would
be pleased to hear what you guys think about this approach.

Additionally, we would like to know whether a feature bit similar to the one
that was discussed here could be considered for addition to the virtio standard.

Looking forward to hearing from you.
Baptiste

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Stefan Hajnoczi @ 2022-07-12 13:49 UTC
  To: Afsa, Baptiste
  Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker,
	Michael S. Tsirkin


On Fri, Jul 08, 2022 at 01:56:31PM +0000, Afsa, Baptiste wrote:
> Hello everyone,
> 
> The traditional virtio model relies on the ability for the host to access the
> entire memory of the guest VM.

The VIRTIO device model (virtqueues, configuration space, feature
negotiation, etc) does not rely on shared memory access between the
device and the driver.

There is a shared memory resource in the device model that some devices
use, but that's the only thing that requires shared memory.

It's the virtio-pci, virtio-mmio, etc transports and their use of the
vring layout that requires shared memory access.

This might seem pedantic but there's a practical reason for making the
distinction. It should be possible to have a virtio-tcp or other message
passing transport for VIRTIO one day. Correctly layered drivers will
work regardless of whether the underlying transport relies on shared
memory or message passing.

> Virtio is also used in system configurations
> where the devices are not featured by the host (which may not exist as such in
> the case of a Type-1 hypervisor) but by another, unprivileged guest VM. In such
> a configuration, the guest VM memory sharing requirement would raise security
> concerns.

Guest drivers can use IOMMU functionality to restrict device access to
memory, if available from the transport. For example, a virtio-pci
driver implementation can program the IOMMU to allow read/write access
only to the vring and virtqueue buffer pages.

> The following proposal removes that requirement by introducing an alternative
> model where the interactions between the virtio driver and the virtio device are
> mediated by the hypervisor. This concept is applicable to both Type-1 and Type-2
> hypervisors. In the following write-up, the "host" thus refers either to the
> host OS or to the guest VM that executes the virtio device.
> 
> The main objective is to keep the memory of the VM that runs the driver isolated
> from the memory that runs the device, while still allowing zero-copy transfers
> between the two domains. The operations that control the exchange of the virtio
> buffers are handled by hypervisor code that sits between the device and the
> driver.
> 
> As opposed to the regular virtio model, the virtqueues allocated by the driver
> are not shared with the device directly. Instead, the hypervisor allocates a
> separate set of virtqueues that have the same sizes as the original ones and
> shares this second set with the device. These hypervisor-allocated virtqueues
> are referred as the "shadow virtqueues".
> 
> During device operation, the hypervisor copies the descriptors between the
> driver and the shadow virtqueues as the buffers cycle between the driver and the
> device.
> 
> Whenever the driver adds some buffers to the available ring, the hypervisor
> validates the descriptors and dynamically grants the I/O buffers to the host or
> VM that runs the device. The hypervisor then copies these descriptors to the
> shadow virtqueue's available ring. At the other end, when the device returns
> buffers to the shadow virtqueue's used ring, the hypervisor unmaps these buffers
> from the host's address space and copy the descriptor to the driver's used ring.
> 
> Although the virtio buffers can be allocated anywhere in the guest memory and
> are not necessarily page-aligned, the memory sharing granularity is constrained
> by the page size. So when a buffer is mapped to the host address space, the
> hypervisor may end up sharing more memory that what is strictly needed.
> 
> The cost of granting the memory dynamically as virtio transfers goes is
> significant, though. We measured up to 40% performance degradation when using
> this dynamic buffer granting mechanism.
> 
> We also compared this solution to other approaches that we have seen elsewhere.
> For instance, using the swiotlb mechanism along with the
> VIRTIO_F_ACCESS_PLATFORM feature bit to force a copy of the I/O buffers to a
> statically shared memory region. In that case, the same set of benchmarks shows
> an even bigger performance degradation, up to 60%, compared to the original
> virtio performance.

Did you try virtio-pci with an IOMMU? The advantage compared to both
your proposal and swiotlb is that workloads that reuse buffers have no
performance overhead because the IOMMU mappings remain in place across
virtqueue requests.
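
To illustrate why buffer reuse matters, the following Python sketch counts
mapping operations only (it says nothing about their actual cost). The names
MappingCache and per_request_ops are hypothetical, and requests are modeled
simply as lists of page numbers.

```python
class MappingCache:
    """Counts map operations when IOMMU mappings stay in place across requests."""

    def __init__(self):
        self.mapped = set()
        self.map_ops = 0

    def submit(self, pages):
        for p in pages:
            if p not in self.mapped:
                self.mapped.add(p)
                self.map_ops += 1


def per_request_ops(requests):
    """Map and unmap every page of every request, as with dynamic granting."""
    return sum(2 * len(pages) for pages in requests)
```

With three requests that all reuse pages 1 and 2, the persistent mappings need
2 map operations in total, whereas mapping and unmapping on every request
needs 12.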

I have CCed Jean-Philippe Brucker <jean-philippe@linaro.org> who
designed the virtio-iommu device.

Using an IOMMU can be slower than the approach you are proposing when
each request requires new mappings. That's because your approach
combines the virtqueue kick processing with the page granting whereas
programming an IOMMU with map/unmap commands is a separate vmexit from
the virtqueue kick. It's probably easier to make your approach faster in
the dynamic mappings case for this reason.

A page-table based IOMMU (one that doesn't require explicit map/unmap commands
because it reads mappings on-demand from a page table structure) might
perform better than one that needs to be programmed for each
map/unmap operation. It still needs a kick (vmexit) for invalidation but
it might be possible for a design of this type to avoid vmexits in the
common case.

> 
> Although the shadow virtqueue concept looks fairly simple, there is still one
> point that has not been covered yet: indirect descriptors.
> 
> To support indirect descriptors, the following two options were considered
> initially:
> 
>   1. Grant the indirect descriptor as-is to the host while it is on the used
>      ring. This introduces a security issue because a compromised guest OS can
>      modify the indirect descriptor after it has been pushed to the available
>      ring. This would cause the device to fault while trying to access any
>      arbitrary memory that was not actually granted.
> 
>      Note that in the shadow virtqueue model, there is no need for the device to
>      validate the descriptors in the available rings, because the hypervisor
>      already performed such checks before granting the memory.

Assuming that the driver can trust the device isn't possible in all use
cases. Hardware VIRTIO device implementations, VDUSE
(https://docs.kernel.org/userspace-api/vduse.html), and Confidential
Computing are 3 use cases where the device is untrusted. If you make the
assumption then it's important to clearly mark the code so it won't be
reused in a context where that would be a security problem.

> 
>   2. Follow the same logic that is used for the "normal" descriptors and
>      introduce shadow indirect descriptors. This would require the hypervisor to
>      provision a memory pool to allocate these shadow indirect descriptors and
>      determining the size of this pool may not be trivial.
> 
>      Additionally, indirect descriptors can be as large as the driver wants them
>      to be, something that can cause the hypervisor to copy an arbitrary large
>      amount of data.

I agree that it's unfortunate that indirect descriptors would require
some kind of dynamic memory in the hypervisor. However, the statement
about indirect descriptor size is incorrect. They are limited by Queue
Size:

  VIRTIO 1.2 2.7.5.3.1 Driver Requirements: Indirect Descriptors

  A driver MUST NOT create a descriptor chain longer than the Queue Size
  of the device.
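
Given that limit, a rough upper bound on a shadow indirect descriptor pool can
be computed. The sketch below assumes the split virtqueue layout (16 bytes per
descriptor) and the pessimistic case where every ring entry is an indirect
descriptor with a maximal table; the function names are hypothetical.

```python
DESC_SIZE = 16  # bytes per descriptor in the split virtqueue layout


def max_indirect_table_bytes(queue_size):
    """Largest indirect table a conforming driver may create."""
    return queue_size * DESC_SIZE


def worst_case_pool_bytes(queue_size):
    """Pool size if all ring entries are indirect with maximal tables."""
    return queue_size * max_indirect_table_bytes(queue_size)
```

For a Queue Size of 256 this gives 4096 bytes per table and 1 MiB per
virtqueue in the worst case.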

> 
> An alternative approach consists in introducing a new virtio feature bit. This
> feature bit, when set by the device, instructs the driver to allocate indirect
> descriptors using dedicated memory pages. These pages shall hold no other data
> than the indirect descriptors. Since a correct virtio driver implementation does
> not modify an indirect descriptor once it has been pushed to the device, the
> pages where the indirect descriptors lies can later on be remapped as read-only
> in the guest address space.
> 
> This allows the hypervisor to validate the content of the indirect descriptor,
> grant it to the host (along with all the buffers referenced by this descriptor)
> and remap the indirect descriptor read-only in the guest address space as long
> as it is granted to the host (i.e. until the indirect descriptor is returned
> through the used ring).

That sounds very slow (2 page table updates per request).

> The present proposal has some obvious drawbacks, but we believe that memory
> protection will not come for free. We know that there are other folks out there
> which try to address this issue of memory sharing between VMs, so we would be
> pleased to hear what you guys think about this approach.
> 
> Additionally, we would like to know whether a feature bit similar to the one
> that was discussed here could be considered for addition to the virtio standard.

Memory isolation is hard to do efficiently. It would be great to discuss
your proposal more with the VIRTIO community and then send a spec patch
for detailed review and voting.

One thing I didn't see in your proposal was a copying vs zero-copy
threshold. Maybe it helps to look at the size of requests and copy data
instead of granting pages when descriptors are small? On the other hand,
a 4 KB page size means that many descriptors won't be larger than 4 KB
anyway due to guest physical memory fragmentation. This is basically a
hybrid of swiotlb and your proposal - zero-copy when it pays off,
copying when it's cheap.
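
A minimal sketch of such a hybrid policy is shown below. The threshold value is
purely hypothetical (half of an assumed 4 KiB page) and would have to be tuned
by measurement; the names choose_path and split_request are also made up for
illustration.

```python
PAGE_SIZE = 4096                 # assumed page size
COPY_THRESHOLD = PAGE_SIZE // 2  # hypothetical cut-off, to be tuned


def choose_path(length, threshold=COPY_THRESHOLD):
    """Return "copy" for small buffers and "grant" for large ones."""
    return "copy" if length <= threshold else "grant"


def split_request(lengths, threshold=COPY_THRESHOLD):
    """Partition the buffer lengths of one request by transfer path."""
    plan = {"copy": [], "grant": []}
    for n in lengths:
        plan[choose_path(n, threshold)].append(n)
    return plan
```

With this cut-off, buffers of 64 and 1500 bytes would be bounced through a
shared region while buffers of 4096 and 65536 bytes would be granted.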

As I mentioned, I think IOMMUs are worth investigating, in particular
for the case where mappings are rarely changed. They are fast in that
case.

By the way, KVM Forum is coming up in September 2022 in Dublin, Ireland
where Linux Plumbers Conference, LinuxCon Europe, Open Source Summit
Europe, and other conferences are also taking place. That's a good venue
to meet with others interested in VIRTIO and discuss your idea.

Stefan


* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Afsa, Baptiste @ 2022-07-15 14:18 UTC
  To: Stefan Hajnoczi
  Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker,
	Michael S. Tsirkin

> Did you try virtio-pci with an IOMMU? The advantage compared to both
> your proposal and swiotlb is that workloads that reuse buffers have no
> performance overhead because the IOMMU mappings remain in place across
> virtqueue requests.
>
> I have CCed Jean-Philippe Brucker <jean-philippe@linaro.org> who
> designed the virtio-iommu device.
>
> Using an IOMMU can be slower than the approach you are proposing when
> each request requires new mappings. That's because your approach
> combines the virtqueue kick processing with the page granting whereas
> programming an IOMMU with map/unmap commands is a separate vmexit from
> the virtqueue kick. It's probably easier to make your approach faster in
> the dynamic mappings case for this reason.
>
> A page-table based IOMMU (doesn't require explicit map/unamp commands
> because it reads mappings on-demand from a page table structure) might
> perform better than one that needs to be programmed for each each
> map/unmap operation. It still needs a kick (vmexit) for invalidation but
> it might be possible for a design of this type to avoid vmexits in the
> common case.

We considered it at some point with the idea of moving the memory granting to
the driver's side but we did not go in this direction for the reason that you
mentioned, plus the fact that doing it from the hypervisor allows us to validate
the descriptors while granting the memory and moving them to the shadow queue.

However, we did not consider workloads where the buffers are reused. I am
curious to see how much this is used in practice and the sort of gain we are
looking at.

Thank you for pointing this out, this looks like something we should investigate
further.

> > Although the shadow virtqueue concept looks fairly simple, there is still one
> > point that has not been covered yet: indirect descriptors.
> >
> > To support indirect descriptors, the following two options were considered
> > initially:
> >
> >   1. Grant the indirect descriptor as-is to the host while it is on the used
> >      ring. This introduces a security issue because a compromised guest OS can
> >      modify the indirect descriptor after it has been pushed to the available
> >      ring. This would cause the device to fault while trying to access any
> >      arbitrary memory that was not actually granted.
> >
> >      Note that in the shadow virtqueue model, there is no need for the device to
> >      validate the descriptors in the available rings, because the hypervisor
> >      already performed such checks before granting the memory.
>
> Assuming that the driver can trust the device isn't possible in all use
> cases. Hardware VIRTIO device implementations, VDUSE
> (https://docs.kernel.org/userspace-api/vduse.html), and Confidential
> Computing are 3 use cases where the device is untrusted. If you make the
> assumption then it's important to clearly mark the code so it won't be
> reused in a context where that would be a security problem.
>
> >
> >   2. Follow the same logic that is used for the "normal" descriptors and
> >      introduce shadow indirect descriptors. This would require the hypervisor to
> >      provision a memory pool to allocate these shadow indirect descriptors and
> >      determining the size of this pool may not be trivial.
> >
> >      Additionally, indirect descriptors can be as large as the driver wants them
> >      to be, something that can cause the hypervisor to copy an arbitrary large
> >      amount of data.
>
> I agree that it's unfortunate that indirect descriptors would require
> some kind of dynamic memory in the hypervisor. However, the statement
> about indirect descriptor size is incorrect. They are limited by Queue
> Size:
>
>   VIRTIO 1.2 2.7.5.3.1 Driver Requirements: Indirect Descriptors
>
>   A driver MUST NOT create a descriptor chain longer than the Queue Size
>   of the device.
>
> >
> > An alternative approach consists in introducing a new virtio feature bit. This
> > feature bit, when set by the device, instructs the driver to allocate indirect
> > descriptors using dedicated memory pages. These pages shall hold no other data
> > than the indirect descriptors. Since a correct virtio driver implementation does
> > not modify an indirect descriptor once it has been pushed to the device, the
> > pages where the indirect descriptors lies can later on be remapped as read-only
> > in the guest address space.
> >
> > This allows the hypervisor to validate the content of the indirect descriptor,
> > grant it to the host (along with all the buffers referenced by this descriptor)
> > and remap the indirect descriptor read-only in the guest address space as long
> > as it is granted to the host (i.e. until the indirect descriptor is returned
> > through the used ring).
>
> That sounds very slow (2 page table updates per request).

Yes it is. As I mentioned, overall this approach is about 40% slower on the set
of benchmarks we used when compared to running virtio devices without memory
isolation.

> > The present proposal has some obvious drawbacks, but we believe that memory
> > protection will not come for free. We know that there are other folks out there
> > which try to address this issue of memory sharing between VMs, so we would be
> > pleased to hear what you guys think about this approach.
> >
> > Additionally, we would like to know whether a feature bit similar to the one
> > that was discussed here could be considered for addition to the virtio standard.
>
> Memory isolation is hard to do efficiently. It would be great to discuss
> your proposal more with the VIRTIO community and then send a spec patch
> for detailed review and voting.

This is exactly why we are here!

I can indeed prepare a spec patch but before doing that I wanted to get some
feedback on this approach and see if anyone in the community would see any
reason not to introduce such a feature bit.

> One thing I didn't see in your proposal was a copying vs zero-copy
> threshold. Maybe it helps to look at the size of requests and copy data
> instead of granting pages when descriptors are small? On the other hand,
> a 4 KB page size means that many descriptors won't be larger than 4 KB
> anyway due to guest physical memory fragmentation. This is basically
> hybrid of swiotlb and your proposal - zero-copy when it pays off,
> copying when it's cheap.
>
> As I mentioned, I think IOMMUs are worth investigating, in particular
> for the case where mappings are rarely changed. They are fast in that
> case.

I agree there is most likely a point up to which copying is cheaper. However, we
have not considered this in our initial investigation.

Thank you Stefan for taking the time to read this proposal and for your
feedback.

Baptiste

* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Stefan Hajnoczi @ 2022-07-15 16:50 UTC
  To: Afsa, Baptiste
  Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker,
	Michael S. Tsirkin


On Fri, Jul 15, 2022 at 02:18:32PM +0000, Afsa, Baptiste wrote:
> > One thing I didn't see in your proposal was a copying vs zero-copy
> > threshold. Maybe it helps to look at the size of requests and copy data
> > instead of granting pages when descriptors are small? On the other hand,
> > a 4 KB page size means that many descriptors won't be larger than 4 KB
> > anyway due to guest physical memory fragmentation. This is basically
> > hybrid of swiotlb and your proposal - zero-copy when it pays off,
> > copying when it's cheap.
> >
> > As I mentioned, I think IOMMUs are worth investigating, in particular
> > for the case where mappings are rarely changed. They are fast in that
> > case.
> 
> I agree there is much likely a point until which copying is cheaper. However, we
> have not considered this in our initial investigation.
> 
> Thank you Stefan for taking the time to read this proposal and for your
> feedback.

I will be away until August so I probably won't participate in the
discussion.

One thing I forgot to say is that relying on a single benchmark may not
be representative of real workloads. Two factors we touched on are
descriptor size and page reuse. It seems worth evaluating solutions
based on multiple benchmarks that vary these factors.

Stefan


* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Afsa, Baptiste @ 2022-09-09  8:52 UTC
  To: Stefan Hajnoczi
  Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker,
	Michael S. Tsirkin

Hello,

I ran some benchmarks to compare the performance achieved by the swiotlb
approach and our dynamic memory granting solution while using different buffer
sizes. Unsurprisingly, the swiotlb approach performs much better when the
buffers are small. In fact, for small buffers the performance is on par with
the original configuration where the entire memory is shared. Of course these
results are specific to the platform I used and the system workload (e.g. CPU
utilization, cache utilization).

At the moment we are not planning to add a mechanism that would decide
dynamically between copying and granting the buffers based on their sizes, but
this experiment showed us that devices that use small packets would benefit
from going through the swiotlb. So we are considering making this configurable
on a per-device basis in our solution.

We have also experimented with the use of a virtual IOMMU on the guest side and
we have a few concerns with this option.

If we add a virtual IOMMU, we can see mapping commands being issued as virtio
buffers are exchanged between the device and the driver. However the kernel
controls the mappings from DMA addresses to physical addresses. In theory, we
could remap the memory in the host address space to "implement" these mappings
but we have some additional constraints that make this approach problematic.

Our solution runs on systems where the physical IOMMU does not support address
translation. So we rely on having an identity mapping between the guest address
space and the physical address space to allow the guest OS to initiate DMA
transactions. If the memory that we import for virtio buffers uses translated
addresses, these buffers cannot be used in DMA transactions.

We also have an issue with letting the driver control the exported memory
through an IOMMU. If we do this, we need to consider what will happen if the
guest unmaps a virtio buffer while it is in use on the device side.

Although it looks possible to recover from such a scenario in the case of a
device doing CPU accesses to the shared memory, things get more complicated if
we start considering that the buffer may be involved in a DMA transaction.

In some previous projects, we have learned that the ability for the hardware
device and/or its associated driver to recover from an aborted transaction is
not something that we can rely upon in the general case.

For this reason, in our typical memory granting scenarios, we usually "lock" the
shared memory regions to prevent the exporter from revoking the mappings until
the importer says it is ok to do so.
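
The locking rule described above can be modeled as a small grant table. This is
a sketch under assumptions, not a description of any existing hypervisor
interface: the class GrantTable and its methods export, lock, unlock and revoke
are hypothetical names.

```python
class GrantTable:
    """Toy grant table where the importer can pin a grant against revocation."""

    def __init__(self):
        self._grants = {}   # grant id -> [frozenset of pages, lock count]
        self._next_id = 0

    def export(self, pages):
        """Exporter shares a set of pages and gets a grant id back."""
        gid = self._next_id
        self._next_id += 1
        self._grants[gid] = [frozenset(pages), 0]
        return gid

    def lock(self, gid):
        """Importer pins the grant, e.g. before starting a DMA transaction."""
        self._grants[gid][1] += 1

    def unlock(self, gid):
        """Importer signals that it no longer uses the memory."""
        if self._grants[gid][1] == 0:
            raise RuntimeError("grant is not locked")
        self._grants[gid][1] -= 1

    def revoke(self, gid):
        """Exporter withdraws the grant; refused while the importer holds a lock."""
        if self._grants[gid][1] > 0:
            raise PermissionError("grant is locked by the importer")
        del self._grants[gid]

    def is_exported(self, gid):
        return gid in self._grants
```

In this model a revoke() issued while a transfer is in flight fails instead of
pulling the memory out from under the device, which is the property the
locking is meant to provide.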

Note that locking the mappings could be applied here as well. In this case, we
would still use this concept of shadow virtqueues and the hypervisor would be
responsible for locking/unlocking the virtio buffers as they cycle between the
device and the driver. This design is likely to be slower than the original
implementation as the cost of locking the mappings is significant (i.e. an extra
page table walk to validate the memory regions).

As we discussed in this thread, there are a few options available to enable
virtio in configurations where the VM address spaces are isolated. I think they
all have different trade-offs. Our approach certainly has some drawbacks but it
also addresses some specific considerations that are relevant in our use
case. Different configurations will probably require different solutions to this
question.

What would be the next steps to go forward with adding a new feature bit such as
the one I discussed in my original email? Should we prepare a patch on the
specification and post it here for further discussions?

Baptiste


* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Stefan Hajnoczi @ 2022-09-09 16:05 UTC
  To: Afsa, Baptiste
  Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker,
	Michael S. Tsirkin


On Fri, Sep 09, 2022 at 08:52:02AM +0000, Afsa, Baptiste wrote:
> Hello,
> 
> I ran some benchmarks to compare the performances achieved by the swiotlb
> approach and our dynamic memory granting solution while using different buffer
> sizes. Without any surprise, the swiotlb approach performs much better when the
> buffers are small. Actually for small buffers, the performances are on par with
> the original configuration where the entire memory is shared. Of course these
> results are specific to the platform I used and the system workload (e.g. CPU
> utilization, caches utilization).
> 
> At the moment we are not planning to add a mechanism that would take the
> decision between copying or granting the buffers dynamically based on their
> sizes, but this experience showed us that devices that uses small packets would
> benefits from going through the swiotlb. So we are considering making this
> configurable on a per-device basis in our solution.
> 
> We have also experimented with the use of a virtual IOMMU on the guest side and
> we have a few concerns with this option.
> 
> If we add a virtual IOMMU, we can see mapping commands being issued as virtio
> buffers are exchanged between the device and the driver. However the kernel
> controls the mappings from DMA addresses to physical addresses. In theory, we
> could remap the memory in the host address space to "implement" these mappings
> but we have some additional constraints that make this approach problematic.
> 
> Our solution runs on systems where the physical IOMMU does not support address
> translation. So we rely on having an identity mapping between the guest address
> space and the physical address space to allow the guest OS to initiate DMA
> transactions. If the memory that we import for virtio buffers uses translated
> addresses, these buffers cannot be used in DMA transactions.
> 
> We also have an issue with letting the driver control the exported memory
> through an IOMMU. If we do this, we need to consider what will happen if the
> guest unmaps a virtio buffer while it is in use on the device side.
> 
> Although it looks possible to recover from such a scenario in the case of a
> device doing CPU accesses to the shared memory, things get more complicated if
> we start considering that the buffer may be involved in a DMA transaction.
> 
> In some previous projects, we have learned that the ability for the hardware
> device and/or its associated driver to recover from an aborted transaction is
> not something that we can rely upon in the general case.
> 
> For this reason, in our typical memory granting scenarios, we usually "lock" the
> shared memory regions to prevent the exporter from revoking the mappings until
> the importer says it is ok to do so.
> 
> Note that locking the mappings could be applied here as well. In this case, we
> would still use this concept of shadow virtqueues and the hypervisor would be
> responsible for locking/unlocking the virtio buffers as they cycle between the
> device and the driver. This design is likely to be slower than the original
> implementation as the cost of locking the mappings is significant (i.e. an extra
> page table walk to validate the memory regions).
> 
> As we discussed in this thread, there are a few options available to enable
> virtio in configurations where the VM address spaces are isolated. I think they
> all have different trade-offs. Our approach certainly have some drawbacks but it
> also addresses some specific considerations that are relevant in our use
> case. Different configurations will probably require different solutions to this
> question.
> 
> What would be the next steps to go forward with adding a new feature bit such as
> the one I discussed in my original email? Should we prepare a patch on the
> specification and post it here for further discussions?

Hi Baptiste,
Thanks for sharing your findings. As the next step please send your
VIRTIO spec change proposal to the mailing list so it can be discussed
in detail.

Example commands for sending spec patch emails are here:
https://github.com/oasis-tcs/virtio-spec#providing-feedback

Thanks,
Stefan
