* [virtio-dev] VM memory protection and zero-copy transfers.
From: Afsa, Baptiste @ 2022-07-08 13:56 UTC
To: virtio-dev@lists.oasis-open.org
Hello everyone,
The traditional virtio model relies on the ability of the host to access the
entire memory of the guest VM. Virtio is also used in system configurations
where the devices are not provided by the host (which may not exist as such in
the case of a Type-1 hypervisor) but by another, unprivileged guest VM. In such
a configuration, the requirement to share the guest VM's memory raises security
concerns.
The following proposal removes that requirement by introducing an alternative
model where the interactions between the virtio driver and the virtio device are
mediated by the hypervisor. This concept is applicable to both Type-1 and Type-2
hypervisors. In the following write-up, the "host" thus refers either to the
host OS or to the guest VM that executes the virtio device.
The main objective is to keep the memory of the VM that runs the driver isolated
from the memory of the host that runs the device, while still allowing zero-copy
transfers between the two domains. The operations that control the exchange of
the virtio buffers are handled by hypervisor code that sits between the device
and the driver.
As opposed to the regular virtio model, the virtqueues allocated by the driver
are not shared with the device directly. Instead, the hypervisor allocates a
separate set of virtqueues that have the same sizes as the original ones and
shares this second set with the device. These hypervisor-allocated virtqueues
are referred to as the "shadow virtqueues".
During device operation, the hypervisor copies the descriptors between the
driver and the shadow virtqueues as the buffers cycle between the driver and the
device.
Whenever the driver adds some buffers to the available ring, the hypervisor
validates the descriptors and dynamically grants the I/O buffers to the host or
VM that runs the device. The hypervisor then copies these descriptors to the
shadow virtqueue's available ring. At the other end, when the device returns
buffers to the shadow virtqueue's used ring, the hypervisor unmaps these buffers
from the host's address space and copies the descriptor to the driver's used ring.
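The buffer cycle described above can be modelled with a small sketch. This is
only a toy illustration under stated assumptions, not the implementation
discussed in this proposal: the names Desc, Mediator, on_driver_avail and
on_device_used are hypothetical, the page size is assumed to be 4096 bytes, and
a per-page grant count is assumed so that two buffers sharing a page do not
unmap each other.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class Desc:
    addr: int    # guest-physical address of the buffer
    length: int  # buffer length in bytes


@dataclass
class Mediator:
    """Toy model of the hypervisor code that sits between driver and device."""
    guest_mem_size: int
    granted: dict = field(default_factory=dict)       # page number -> grant count
    shadow_avail: list = field(default_factory=list)  # what the device sees
    driver_used: list = field(default_factory=list)   # what the driver sees

    def _pages(self, d: Desc) -> range:
        return range(d.addr // PAGE_SIZE, (d.addr + d.length - 1) // PAGE_SIZE + 1)

    def on_driver_avail(self, d: Desc) -> None:
        # Validate before granting: the buffer must lie inside guest memory.
        if d.addr < 0 or d.length <= 0 or d.addr + d.length > self.guest_mem_size:
            raise ValueError("descriptor does not describe valid guest memory")
        for page in self._pages(d):
            self.granted[page] = self.granted.get(page, 0) + 1
        self.shadow_avail.append(d)  # copy to the shadow available ring

    def on_device_used(self, d: Desc) -> None:
        self.shadow_avail.remove(d)  # raises ValueError if never made available
        for page in self._pages(d):
            self.granted[page] -= 1
            if self.granted[page] == 0:
                del self.granted[page]  # last user gone: unmap from the host
        self.driver_used.append(d)  # copy to the driver's used ring
```

In this model the device only ever sees descriptors that passed validation, and
a page stays mapped in the host only while at least one buffer on it is in
flight.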
Although the virtio buffers can be allocated anywhere in the guest memory and
are not necessarily page-aligned, the memory sharing granularity is constrained
by the page size. So when a buffer is mapped to the host address space, the
hypervisor may end up sharing more memory than what is strictly needed.
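To make the granularity point concrete, here is a minimal sketch of how much
memory ends up shared for a given buffer. The 4096-byte page size and the
function names shared_span and overshare are assumptions made for the
illustration.

```python
PAGE_SIZE = 4096  # assumed page size


def shared_span(addr: int, length: int, page_size: int = PAGE_SIZE) -> tuple:
    """Return (base of first shared page, total bytes shared) for a buffer."""
    if length <= 0:
        raise ValueError("buffer length must be positive")
    first = addr & ~(page_size - 1)                 # round start down to a page
    last = (addr + length - 1) & ~(page_size - 1)   # page holding the last byte
    return first, last - first + page_size


def overshare(addr: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes exposed to the host beyond what the buffer itself needs."""
    _, shared = shared_span(addr, length, page_size)
    return shared - length
```

For example, a 32-byte buffer at 0x1ff0 straddles a page boundary, so two full
pages are shared and 8160 bytes are exposed in addition to the buffer itself.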
The cost of granting the memory dynamically as virtio transfers proceed is
significant, though. We measured up to 40% performance degradation when using
this dynamic buffer granting mechanism.
We also compared this solution to other approaches that we have seen elsewhere.
One of them uses the swiotlb mechanism along with the
VIRTIO_F_ACCESS_PLATFORM feature bit to force a copy of the I/O buffers to a
statically shared memory region. In that case, the same set of benchmarks shows
an even bigger performance degradation, up to 60%, compared to the original
virtio performance.
Although the shadow virtqueue concept looks fairly simple, there is still one
point that has not been covered yet: indirect descriptors.
To support indirect descriptors, the following two options were considered
initially:
1. Grant the indirect descriptor as-is to the host while it is on the used
ring. This introduces a security issue because a compromised guest OS can
modify the indirect descriptor after it has been pushed to the available
ring. This could cause the device to fault while trying to access
arbitrary memory that was not actually granted.
Note that in the shadow virtqueue model, there is no need for the device to
validate the descriptors in the available rings, because the hypervisor
already performed such checks before granting the memory.
2. Follow the same logic that is used for the "normal" descriptors and
introduce shadow indirect descriptors. This would require the hypervisor to
provision a memory pool to allocate these shadow indirect descriptors and
determining the size of this pool may not be trivial.
Additionally, indirect descriptors can be as large as the driver wants them
to be, something that can cause the hypervisor to copy an arbitrarily large
amount of data.
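On the size concern in option 2: as the reply later in this thread points out,
VIRTIO 1.2 section 2.7.5.3.1 forbids descriptor chains longer than the Queue
Size, so a hypervisor could refuse oversized indirect tables instead of copying
them. The sketch below assumes the split-ring layout, where each descriptor
occupies 16 bytes; the function names are hypothetical, and rejecting lengths
that are not a multiple of 16 is a choice made for this sketch.

```python
VIRTQ_DESC_SIZE = 16  # split-ring descriptor: le64 addr, le32 len, le16 flags, le16 next


def max_indirect_table_bytes(queue_size: int) -> int:
    """Upper bound on the bytes copied for one shadow indirect table."""
    return queue_size * VIRTQ_DESC_SIZE


def indirect_entry_count(table_len: int, queue_size: int) -> int:
    """Validate an indirect table length and return its number of entries."""
    if table_len <= 0 or table_len % VIRTQ_DESC_SIZE != 0:
        raise ValueError("indirect table length must be a positive multiple of 16")
    entries = table_len // VIRTQ_DESC_SIZE
    if entries > queue_size:
        raise ValueError("indirect table is longer than the Queue Size")
    return entries
```

With a Queue Size of 256, the hypervisor would never copy more than 4096 bytes
per indirect table under this check, which also bounds the pool needed for
shadow indirect descriptors.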
An alternative approach consists in introducing a new virtio feature bit. This
feature bit, when set by the device, instructs the driver to allocate indirect
descriptors using dedicated memory pages. These pages shall hold no other data
than the indirect descriptors. Since a correct virtio driver implementation does
not modify an indirect descriptor once it has been pushed to the device, the
pages where the indirect descriptors lie can later on be remapped as read-only
in the guest address space.
This allows the hypervisor to validate the content of the indirect descriptor,
grant it to the host (along with all the buffers referenced by this descriptor)
and remap the indirect descriptor read-only in the guest address space as long
as it is granted to the host (i.e. until the indirect descriptor is returned
through the used ring).
The present proposal has some obvious drawbacks, but we believe that memory
protection will not come for free. We know that there are other folks out there
who are trying to address this issue of memory sharing between VMs, so we would
be pleased to hear what you guys think about this approach.
Additionally, we would like to know whether a feature bit similar to the one
that was discussed here could be considered for addition to the virtio standard.
Looking forward to hearing from you.
Baptiste
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Stefan Hajnoczi @ 2022-07-12 13:49 UTC
To: Afsa, Baptiste
Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker, Michael S. Tsirkin

On Fri, Jul 08, 2022 at 01:56:31PM +0000, Afsa, Baptiste wrote:
> Hello everyone,
>
> The traditional virtio model relies on the ability for the host to access the
> entire memory of the guest VM.

The VIRTIO device model (virtqueues, configuration space, feature
negotiation, etc.) does not rely on shared memory access between the
device and the driver. There is a shared memory resource in the device
model that some devices use, but that's the only thing that requires
shared memory.

It's the virtio-pci, virtio-mmio, etc. transports and their use of the
vring layout that require shared memory access.

This might seem pedantic but there's a practical reason for making the
distinction. It should be possible to have a virtio-tcp or other message
passing transport for VIRTIO one day. Correctly layered drivers will
work regardless of whether the underlying transport relies on shared
memory or message passing.

> Virtio is also used in system configurations
> where the devices are not featured by the host (which may not exist as such in
> the case of a Type-1 hypervisor) but by another, unprivileged guest VM. In such
> a configuration, the guest VM memory sharing requirement would raise security
> concerns.

Guest drivers can use IOMMU functionality to restrict device access to
memory, if available from the transport. For example, a virtio-pci
driver implementation can program the IOMMU to allow read/write access
only to the vring and virtqueue buffer pages.

> The following proposal removes that requirement by introducing an alternative
> model where the interactions between the virtio driver and the virtio device are
> mediated by the hypervisor. This concept is applicable to both Type-1 and Type-2
> hypervisors. In the following write-up, the "host" thus refers either to the
> host OS or to the guest VM that executes the virtio device.
>
> The main objective is to keep the memory of the VM that runs the driver isolated
> from the memory that runs the device, while still allowing zero-copy transfers
> between the two domains. The operations that control the exchange of the virtio
> buffers are handled by hypervisor code that sits between the device and the
> driver.
>
> As opposed to the regular virtio model, the virtqueues allocated by the driver
> are not shared with the device directly. Instead, the hypervisor allocates a
> separate set of virtqueues that have the same sizes as the original ones and
> shares this second set with the device. These hypervisor-allocated virtqueues
> are referred as the "shadow virtqueues".
>
> During device operation, the hypervisor copies the descriptors between the
> driver and the shadow virtqueues as the buffers cycle between the driver and the
> device.
>
> Whenever the driver adds some buffers to the available ring, the hypervisor
> validates the descriptors and dynamically grants the I/O buffers to the host or
> VM that runs the device. The hypervisor then copies these descriptors to the
> shadow virtqueue's available ring. At the other end, when the device returns
> buffers to the shadow virtqueue's used ring, the hypervisor unmaps these buffers
> from the host's address space and copy the descriptor to the driver's used ring.
>
> Although the virtio buffers can be allocated anywhere in the guest memory and
> are not necessarily page-aligned, the memory sharing granularity is constrained
> by the page size. So when a buffer is mapped to the host address space, the
> hypervisor may end up sharing more memory that what is strictly needed.
>
> The cost of granting the memory dynamically as virtio transfers goes is
> significant, though. We measured up to 40% performance degradation when using
> this dynamic buffer granting mechanism.
>
> We also compared this solution to other approaches that we have seen elsewhere.
> For instance, using the swiotlb mechanism along with the
> VIRTIO_F_ACCESS_PLATFORM feature bit to force a copy of the I/O buffers to a
> statically shared memory region. In that case, the same set of benchmarks shows
> an even bigger performance degradation, up to 60%, compared to the original
> virtio performance.

Did you try virtio-pci with an IOMMU? The advantage compared to both
your proposal and swiotlb is that workloads that reuse buffers have no
performance overhead because the IOMMU mappings remain in place across
virtqueue requests.

I have CCed Jean-Philippe Brucker <jean-philippe@linaro.org> who
designed the virtio-iommu device.

Using an IOMMU can be slower than the approach you are proposing when
each request requires new mappings. That's because your approach
combines the virtqueue kick processing with the page granting whereas
programming an IOMMU with map/unmap commands is a separate vmexit from
the virtqueue kick. It's probably easier to make your approach faster in
the dynamic mappings case for this reason.

A page-table based IOMMU (doesn't require explicit map/unmap commands
because it reads mappings on-demand from a page table structure) might
perform better than one that needs to be programmed for each
map/unmap operation. It still needs a kick (vmexit) for invalidation but
it might be possible for a design of this type to avoid vmexits in the
common case.

>
> Although the shadow virtqueue concept looks fairly simple, there is still one
> point that has not been covered yet: indirect descriptors.
>
> To support indirect descriptors, the following two options were considered
> initially:
>
> 1. Grant the indirect descriptor as-is to the host while it is on the used
> ring. This introduces a security issue because a compromised guest OS can
> modify the indirect descriptor after it has been pushed to the available
> ring. This would cause the device to fault while trying to access any
> arbitrary memory that was not actually granted.
>
> Note that in the shadow virtqueue model, there is no need for the device to
> validate the descriptors in the available rings, because the hypervisor
> already performed such checks before granting the memory.

Assuming that the driver can trust the device isn't possible in all use
cases. Hardware VIRTIO device implementations, VDUSE
(https://docs.kernel.org/userspace-api/vduse.html), and Confidential
Computing are 3 use cases where the device is untrusted. If you make the
assumption then it's important to clearly mark the code so it won't be
reused in a context where that would be a security problem.

>
> 2. Follow the same logic that is used for the "normal" descriptors and
> introduce shadow indirect descriptors. This would require the hypervisor to
> provision a memory pool to allocate these shadow indirect descriptors and
> determining the size of this pool may not be trivial.
>
> Additionally, indirect descriptors can be as large as the driver wants them
> to be, something that can cause the hypervisor to copy an arbitrary large
> amount of data.

I agree that it's unfortunate that indirect descriptors would require
some kind of dynamic memory in the hypervisor. However, the statement
about indirect descriptor size is incorrect. They are limited by Queue
Size:

  VIRTIO 1.2 2.7.5.3.1 Driver Requirements: Indirect Descriptors

  A driver MUST NOT create a descriptor chain longer than the Queue Size
  of the device.

>
> An alternative approach consists in introducing a new virtio feature bit. This
> feature bit, when set by the device, instructs the driver to allocate indirect
> descriptors using dedicated memory pages. These pages shall hold no other data
> than the indirect descriptors. Since a correct virtio driver implementation does
> not modify an indirect descriptor once it has been pushed to the device, the
> pages where the indirect descriptors lies can later on be remapped as read-only
> in the guest address space.
>
> This allows the hypervisor to validate the content of the indirect descriptor,
> grant it to the host (along with all the buffers referenced by this descriptor)
> and remap the indirect descriptor read-only in the guest address space as long
> as it is granted to the host (i.e. until the indirect descriptor is returned
> through the used ring).

That sounds very slow (2 page table updates per request).

> The present proposal has some obvious drawbacks, but we believe that memory
> protection will not come for free. We know that there are other folks out there
> which try to address this issue of memory sharing between VMs, so we would be
> pleased to hear what you guys think about this approach.
>
> Additionally, we would like to know whether a feature bit similar to the one
> that was discussed here could be considered for addition to the virtio standard.

Memory isolation is hard to do efficiently. It would be great to discuss
your proposal more with the VIRTIO community and then send a spec patch
for detailed review and voting.

One thing I didn't see in your proposal was a copying vs zero-copy
threshold. Maybe it helps to look at the size of requests and copy data
instead of granting pages when descriptors are small? On the other hand,
a 4 KB page size means that many descriptors won't be larger than 4 KB
anyway due to guest physical memory fragmentation. This is basically
a hybrid of swiotlb and your proposal - zero-copy when it pays off,
copying when it's cheap.

As I mentioned, I think IOMMUs are worth investigating, in particular
for the case where mappings are rarely changed. They are fast in that
case.

By the way, KVM Forum is coming up in September 2022 in Dublin, Ireland
where Linux Plumbers Conference, LinuxCon Europe, Open Source Summit
Europe, and other conferences are also taking place. That's a good venue
to meet with others interested in VIRTIO and discuss your idea.

Stefan
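The copy vs zero-copy threshold suggested in the message above could look
roughly like the following sketch. The threshold value and the names
COPY_THRESHOLD and plan_transfers are hypothetical; the right cut-off would
have to come from measurements on the target platform, and nothing in the
thread states one.

```python
COPY_THRESHOLD = 2048  # hypothetical tunable in bytes, not a measured value


def plan_transfers(lengths, threshold=COPY_THRESHOLD):
    """Split descriptor indices into (copy, grant) lists by buffer length.

    Buffers shorter than the threshold are bounced through a statically
    shared region (swiotlb-style); the others are granted page-wise.
    """
    copy, grant = [], []
    for index, length in enumerate(lengths):
        (copy if length < threshold else grant).append(index)
    return copy, grant
```

A per-descriptor decision like this is one way to read the "hybrid" idea; a
coarser per-device switch would be a simpler variant of the same policy.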
* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Afsa, Baptiste @ 2022-07-15 14:18 UTC
To: Stefan Hajnoczi
Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker, Michael S. Tsirkin

> Did you try virtio-pci with an IOMMU? The advantage compared to both
> your proposal and swiotlb is that workloads that reuse buffers have no
> performance overhead because the IOMMU mappings remain in place across
> virtqueue requests.
>
> I have CCed Jean-Philippe Brucker <jean-philippe@linaro.org> who
> designed the virtio-iommu device.
>
> Using an IOMMU can be slower than the approach you are proposing when
> each request requires new mappings. That's because your approach
> combines the virtqueue kick processing with the page granting whereas
> programming an IOMMU with map/unmap commands is a separate vmexit from
> the virtqueue kick. It's probably easier to make your approach faster in
> the dynamic mappings case for this reason.
>
> A page-table based IOMMU (doesn't require explicit map/unamp commands
> because it reads mappings on-demand from a page table structure) might
> perform better than one that needs to be programmed for each each
> map/unmap operation. It still needs a kick (vmexit) for invalidation but
> it might be possible for a design of this type to avoid vmexits in the
> common case.

We considered it at some point with the idea of moving the memory granting to
the driver's side, but we did not go in this direction for the reason that you
mentioned, plus the fact that doing it from the hypervisor allows us to validate
the descriptors while granting the memory and moving them to the shadow queue.

However, we did not consider workloads where the buffers are reused. I am
curious to see how much this is used in practice and the sort of gain we are
looking at. Thank you for pointing this out; this looks like something we should
investigate further.

> > Although the shadow virtqueue concept looks fairly simple, there is still one
> > point that has not been covered yet: indirect descriptors.
> >
> > To support indirect descriptors, the following two options were considered
> > initially:
> >
> > 1. Grant the indirect descriptor as-is to the host while it is on the used
> > ring. This introduces a security issue because a compromised guest OS can
> > modify the indirect descriptor after it has been pushed to the available
> > ring. This would cause the device to fault while trying to access any
> > arbitrary memory that was not actually granted.
> >
> > Note that in the shadow virtqueue model, there is no need for the device to
> > validate the descriptors in the available rings, because the hypervisor
> > already performed such checks before granting the memory.
>
> Assuming that the driver can trust the device isn't possible in all use
> cases. Hardware VIRTIO device implementations, VDUSE
> (https://docs.kernel.org/userspace-api/vduse.html), and Confidential
> Computing are 3 use cases where the device is untrusted. If you make the
> assumption then it's important to clearly mark the code so it won't be
> reused in a context where that would be a security problem.
>
> >
> > 2. Follow the same logic that is used for the "normal" descriptors and
> > introduce shadow indirect descriptors. This would require the hypervisor to
> > provision a memory pool to allocate these shadow indirect descriptors and
> > determining the size of this pool may not be trivial.
> >
> > Additionally, indirect descriptors can be as large as the driver wants them
> > to be, something that can cause the hypervisor to copy an arbitrary large
> > amount of data.
>
> I agree that it's unfortunate that indirect descriptors would require
> some kind of dynamic memory in the hypervisor. However, the statement
> about indirect descriptor size is incorrect. They are limited by Queue
> Size:
>
>   VIRTIO 1.2 2.7.5.3.1 Driver Requirements: Indirect Descriptors
>
>   A driver MUST NOT create a descriptor chain longer than the Queue Size
>   of the device.
>
> >
> > An alternative approach consists in introducing a new virtio feature bit. This
> > feature bit, when set by the device, instructs the driver to allocate indirect
> > descriptors using dedicated memory pages. These pages shall hold no other data
> > than the indirect descriptors. Since a correct virtio driver implementation does
> > not modify an indirect descriptor once it has been pushed to the device, the
> > pages where the indirect descriptors lies can later on be remapped as read-only
> > in the guest address space.
> >
> > This allows the hypervisor to validate the content of the indirect descriptor,
> > grant it to the host (along with all the buffers referenced by this descriptor)
> > and remap the indirect descriptor read-only in the guest address space as long
> > as it is granted to the host (i.e. until the indirect descriptor is returned
> > through the used ring).
>
> That sounds very slow (2 page table updates per request).

Yes it is. As I mentioned, overall this approach is about 40% slower on the set
of benchmarks we used when compared to running virtio devices without memory
isolation.

> > The present proposal has some obvious drawbacks, but we believe that memory
> > protection will not come for free. We know that there are other folks out there
> > which try to address this issue of memory sharing between VMs, so we would be
> > pleased to hear what you guys think about this approach.
> >
> > Additionally, we would like to know whether a feature bit similar to the one
> > that was discussed here could be considered for addition to the virtio standard.
>
> Memory isolation is hard to do efficiently. It would be great to discuss
> your proposal more with the VIRTIO community and then send a spec patch
> for detailed review and voting.

This is exactly why we are here! I can indeed prepare a spec patch, but before
doing that I wanted to get some feedback on this approach and see if anyone in
the community would see any reason not to introduce such a feature bit.

> One thing I didn't see in your proposal was a copying vs zero-copy
> threshold. Maybe it helps to look at the size of requests and copy data
> instead of granting pages when descriptors are small? On the other hand,
> a 4 KB page size means that many descriptors won't be larger than 4 KB
> anyway due to guest physical memory fragmentation. This is basically
> hybrid of swiotlb and your proposal - zero-copy when it pays off,
> copying when it's cheap.
>
> As I mentioned, I think IOMMUs are worth investigating, in particular
> for the case where mappings are rarely changed. They are fast in that
> case.

I agree there is most likely a point up to which copying is cheaper. However, we
have not considered this in our initial investigation.

Thank you Stefan for taking the time to read this proposal and for your
feedback.

Baptiste
* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Stefan Hajnoczi @ 2022-07-15 16:50 UTC
To: Afsa, Baptiste
Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker, Michael S. Tsirkin

On Fri, Jul 15, 2022 at 02:18:32PM +0000, Afsa, Baptiste wrote:
> > One thing I didn't see in your proposal was a copying vs zero-copy
> > threshold. Maybe it helps to look at the size of requests and copy data
> > instead of granting pages when descriptors are small? On the other hand,
> > a 4 KB page size means that many descriptors won't be larger than 4 KB
> > anyway due to guest physical memory fragmentation. This is basically
> > hybrid of swiotlb and your proposal - zero-copy when it pays off,
> > copying when it's cheap.
> >
> > As I mentioned, I think IOMMUs are worth investigating, in particular
> > for the case where mappings are rarely changed. They are fast in that
> > case.
>
> I agree there is much likely a point until which copying is cheaper. However, we
> have not considered this in our initial investigation.
>
> Thank you Stefan for taking the time to read this proposal and for your
> feedback.

I will be away until August so I probably won't participate in the
discussion.

One thing I forgot to say is that relying on a single benchmark may not
be representative of real workloads. Two factors we touched on are
descriptor size and page reuse. It seems worth evaluating solutions
based on multiple benchmarks that vary these factors.

Stefan
* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Afsa, Baptiste @ 2022-09-09 8:52 UTC
To: Stefan Hajnoczi
Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker, Michael S. Tsirkin

Hello,

I ran some benchmarks to compare the performance achieved by the swiotlb
approach and our dynamic memory granting solution while using different buffer
sizes. Unsurprisingly, the swiotlb approach performs much better when the
buffers are small. In fact, for small buffers, the performance is on par with
the original configuration where the entire memory is shared. Of course these
results are specific to the platform I used and the system workload (e.g. CPU
utilization, cache utilization).

At the moment we are not planning to add a mechanism that would decide
dynamically between copying and granting the buffers based on their sizes, but
this experiment showed us that devices that use small packets would benefit
from going through the swiotlb. So we are considering making this configurable
on a per-device basis in our solution.

We have also experimented with the use of a virtual IOMMU on the guest side and
we have a few concerns with this option.

If we add a virtual IOMMU, we can see mapping commands being issued as virtio
buffers are exchanged between the device and the driver. However, the kernel
controls the mappings from DMA addresses to physical addresses. In theory, we
could remap the memory in the host address space to "implement" these mappings,
but we have some additional constraints that make this approach problematic.

Our solution runs on systems where the physical IOMMU does not support address
translation. So we rely on having an identity mapping between the guest address
space and the physical address space to allow the guest OS to initiate DMA
transactions. If the memory that we import for virtio buffers uses translated
addresses, these buffers cannot be used in DMA transactions.

We also have an issue with letting the driver control the exported memory
through an IOMMU. If we do this, we need to consider what will happen if the
guest unmaps a virtio buffer while it is in use on the device side.

Although it looks possible to recover from such a scenario in the case of a
device doing CPU accesses to the shared memory, things get more complicated if
we start considering that the buffer may be involved in a DMA transaction.

In some previous projects, we have learned that the ability of the hardware
device and/or its associated driver to recover from an aborted transaction is
not something that we can rely upon in the general case.

For this reason, in our typical memory granting scenarios, we usually "lock" the
shared memory regions to prevent the exporter from revoking the mappings until
the importer says it is ok to do so.

Note that locking the mappings could be applied here as well. In this case, we
would still use this concept of shadow virtqueues and the hypervisor would be
responsible for locking/unlocking the virtio buffers as they cycle between the
device and the driver. This design is likely to be slower than the original
implementation as the cost of locking the mappings is significant (i.e. an extra
page table walk to validate the memory regions).

As we discussed in this thread, there are a few options available to enable
virtio in configurations where the VM address spaces are isolated. I think they
all have different trade-offs. Our approach certainly has some drawbacks but it
also addresses some specific considerations that are relevant in our use
case. Different configurations will probably require different solutions to this
question.

What would be the next steps to go forward with adding a new feature bit such as
the one I discussed in my original email? Should we prepare a patch to the
specification and post it here for further discussion?

Baptiste
* Re: [virtio-dev] VM memory protection and zero-copy transfers.
From: Stefan Hajnoczi @ 2022-09-09 16:05 UTC
To: Afsa, Baptiste
Cc: virtio-dev@lists.oasis-open.org, Jean-Philippe Brucker, Michael S. Tsirkin

On Fri, Sep 09, 2022 at 08:52:02AM +0000, Afsa, Baptiste wrote:
> Hello,
>
> I ran some benchmarks to compare the performances achieved by the swiotlb
> approach and our dynamic memory granting solution while using different buffer
> sizes. Without any surprise, the swiotlb approach performs much better when the
> buffers are small. Actually for small buffers, the performances are on par with
> the original configuration where the entire memory is shared. Of course these
> results are specific to the platform I used and the system workload (e.g. CPU
> utilization, caches utilization).
>
> At the moment we are not planning to add a mechanism that would take the
> decision between copying or granting the buffers dynamically based on their
> sizes, but this experience showed us that devices that uses small packets would
> benefits from going through the swiotlb. So we are considering making this
> configurable on a per-device basis in our solution.
>
> We have also experimented with the use of a virtual IOMMU on the guest side and
> we have a few concerns with this option.
>
> If we add a virtual IOMMU, we can see mapping commands being issued as virtio
> buffers are exchanged between the device and the driver. However the kernel
> controls the mappings from DMA addresses to physical addresses. In theory, we
> could remap the memory in the host address space to "implement" these mappings
> but we have some additional constraints that make this approach problematic.
>
> Our solution runs on systems where the physical IOMMU does not support address
> translation. So we rely on having an identity mapping between the guest address
> space and the physical address space to allow the guest OS to initiate DMA
> transactions. If the memory that we import for virtio buffers uses translated
> addresses, these buffers cannot be used in DMA transactions.
>
> We also have an issue with letting the driver control the exported memory
> through an IOMMU. If we do this, we need to consider what will happen if the
> guest unmaps a virtio buffer while it is in use on the device side.
>
> Although it looks possible to recover from such a scenario in the case of a
> device doing CPU accesses to the shared memory, things get more complicated if
> we start considering that the buffer may be involved in a DMA transaction.
>
> In some previous projects, we have learned that the ability for the hardware
> device and/or its associated driver to recover from an aborted transaction is
> not something that we can rely upon in the general case.
>
> For this reason, in our typical memory granting scenarios, we usually "lock" the
> shared memory regions to prevent the exporter from revoking the mappings until
> the importer says it is ok to do so.
>
> Note that locking the mappings could be applied here as well. In this case, we
> would still use this concept of shadow virtqueues and the hypervisor would be
> responsible for locking/unlocking the virtio buffers as they cycle between the
> device and the driver. This design is likely to be slower than the original
> implementation as the cost of locking the mappings is significant (i.e. an extra
> page table walk to validate the memory regions).
>
> As we discussed in this thread, there are a few options available to enable
> virtio in configurations where the VM address spaces are isolated. I think they
> all have different trade-offs. Our approach certainly have some drawbacks but it
> also addresses some specific considerations that are relevant in our use
> case. Different configurations will probably require different solutions to this
> question.
>
> What would be the next steps to go forward with adding a new feature bit such as
> the one I discussed in my original email? Should we prepare a patch on the
> specification and post it here for further discussions?

Hi Baptiste,
Thanks for sharing your findings. As the next step please send your
VIRTIO spec change proposal to the mailing list so it can be discussed
in detail.

Example commands for sending spec patch emails are here:
https://github.com/oasis-tcs/virtio-spec#providing-feedback

Thanks,
Stefan