Re: [virtio-comment] [PATCH] Introduce VIRTIO_F_ISOLATE_INDIRECT_DESC feature

Discussion of the VIRTIO specification
 help / color / mirror / Atom feed

From: Stefan Hajnoczi <stefanha@redhat.com>
To: "Afsa, Baptiste" <Baptiste.Afsa@harman.com>
Cc: Eugenio Perez Martin <eperezma@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"virtio-comment@lists.oasis-open.org"
	<virtio-comment@lists.oasis-open.org>,
	Christian Schoenebeck <qemu_oss@crudebyte.com>
Subject: Re: [virtio-comment] [PATCH] Introduce VIRTIO_F_ISOLATE_INDIRECT_DESC feature
Date: Mon, 27 Feb 2023 10:45:45 -0500	[thread overview]
Message-ID: <Y/zQKS7WD227Qz2y@fedora> (raw)
In-Reply-To: <1677509635629.14571@harman.com>

[-- Attachment #1: Type: text/plain, Size: 6133 bytes --]

On Mon, Feb 27, 2023 at 02:53:55PM +0000, Afsa, Baptiste wrote:
> > Hi Baptiste,
> >
> > It is not the current way of working for shadow virtqueue in qemu, but
> > it is doable.
> >
> > QEMU offers a new address space to the device, including all the guest
> > memory and the shadow vrings in qemu memory. By default, qemu
> > translates each descriptor to this new address space and copies it in
> > the shadow vring, making sure it's a valid guest address.
> >
> > Net CVQ is a special case, as qemu copies also the buffer so a
> > malicious guest will not be able to change it and make vdpa device and
> > qemu's knowledge of it out of sync. This is done in a region mapped at
> > device start for this purpose for performance and simplicity, not for
> > each descriptor.
> >
> > It is technically possible to develop the copy of the buffer content
> > in a memory region too. Once a suitable size is agreed (or given by a
> > parameter in cmdline, for example), SVQ already supports things like
> > backoff in case this region is full. To malloc and map this region
> > dynamically is also possible.
> >
> 
> Hello Eugenio,
> 
> Thanks for the insight. It does have some similarities with what we have done on
> our side.
> 
> We have also tried the approach of doing a copy of the buffer content into a
> shared memory region but we did that on the guest side using things like the
> swiotlb or restricted DMA pools. This works well when buffers are small, but
> when they get larger, we get better performance by granting the memory
> dynamically.
> 
> > > This new feature bit helps supporting indirect
> > > descriptors in this context.
> > >
> > > When an indirect descriptor is added to a virtqueue, we need to "share" the
> > > indirect descriptor table in addition to the buffers themselves. To do this we
> > > have two options:
> > >
> > >   - Share a shadow copy of the table with the device but this requires some kind
> > >     of dynamic memory allocation in the hypervisor.
> > >
> > >   - Grant the indirect descriptor table from the guest as-is. Note that in our
> > >     use case we do not translate buffer addresses when importing them into the
> > >     device address space.
> > >
> >
> > The plan with indirect descriptors is similar to CVQ buffer treatment:
> > To preallocate a big enough chunk of memory able to hold a copy of the
> > indirect table and then translate each descriptor. The current dynamic
> > allocation scheme is very simple but it should work as long as there
> > is enough memory. As commented before, backoff is already available
> > for other cases.
> 
> Ok. This is one of the options we considered initially, but that we rejected
> because we wanted to avoid dynamic memory allocation.
> 
> However, we are reconsidering this idea now. We initially overlooked the fact
> that the spec limits the size of an indirect descriptor table to the queue
> size. Having this allows us to figure out the maximum size for a single indirect
> descriptor table and the total size needed for the memory pool that will store
> the copies of the indirect descriptor tables.
> 
> The issue that we have now, is that this limitation does not seem to be enforced
> in Linux virtio drivers today. I came across:
> 
> https://github.com/oasis-tcs/virtio-spec/issues/122
> 
> which looks like a good base for us to build upon, but I'm not sure what is the
> status with this issue. Do you know whether there is any plan regarding this?

CCing Christian regarding extending queue size limits.

> > Going a step forward with current qemu's SVQ, a feature that would
> > help is to allow the device to translate buffers addresses with a
> > different IOVA from vring and indirect table. But this is far from
> > standard POV, since it does not have even an address space concept.
> 
> I'm not sure to see what this would bring compared to the current design.
> 
> > Do you think both modes can converge?
> 
> I think so since we are now thinking of using a copy of the indirect descriptor
> table. This would eliminate the need for any additional feature bit.
> 
> > > The issue with the second option is that we need to ensure that the driver will
> > > not modify the indirect descriptor table while it is on the device side.
> > > Otherwise the device could attempt to access the buffers using the new addresses
> > > which would not correspond to the one the hypervisor used when granting memory.
> > >
> > > Since this is something that a correct driver implementation would not do, this
> > > led to the idea of remapping the table read-only while it is used by the device.
> > > Because an indirect descriptor table may not fill an entire page, we needed a
> > > way to ensure that the OS would not re-use the rest of these pages for other
> > > purposes while we have made them read-only.
> > >
> >
> > How do you guarantee it for vring itself?
> 
> For the vrings we do exactly what you described for QEMU. The virtqueues that
> the device sees are not the virtqueues allocated by the driver but the shadow
> copies managed by the hypervisor. If the driver touches the descriptors after
> they have been copied to the shadow virtqueue, the hypervisor will ignore these
> changes.
> 
> Thanks.
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

next prev parent reply	other threads:[~2023-02-27 15:45 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-13  7:45 [virtio-comment] [PATCH] Introduce VIRTIO_F_ISOLATE_INDIRECT_DESC feature Baptiste Afsa
2023-01-13 12:46 ` Michael S. Tsirkin
2023-01-17 15:19   ` Afsa, Baptiste
2023-01-17 18:27     ` Eugenio Perez Martin
2023-02-27 14:53       ` Afsa, Baptiste
2023-02-27 15:45         ` Stefan Hajnoczi [this message]
     [not found]           ` <2244126.gP0zCk8Q6A@silver>
2023-02-27 17:41             ` [virtio-comment] Re: VIRTIO_RING_F_INDIRECT_SIZE status Michael S. Tsirkin
     [not found]               ` <2494182.5W6NY9sLyD@silver>
2023-02-28 12:05                 ` Michael S. Tsirkin
     [not found] ` <6380471.4BWXO1n1mU@silver>
     [not found]   ` <Y/9Z5fphn34/HSKs@fedora>
     [not found]     ` <2458440.T3bEdP9vpG@silver>
2023-03-06 16:27       ` Stefan Hajnoczi
     [not found]   ` <20230301095017-mutt-send-email-mst@kernel.org>
     [not found]     ` <2812377.Px9Efocobp@silver>
2023-03-06 17:41       ` Michael S. Tsirkin
2023-03-06 20:46         ` Stefan Hajnoczi
2023-03-06 21:50           ` Michael S. Tsirkin
2023-03-07 12:40             ` Christian Schoenebeck
2023-03-13 11:48               ` Christian Schoenebeck
2023-03-13 13:06                 ` Michael S. Tsirkin
2023-03-13 13:48                   ` Christian Schoenebeck
2023-03-13 13:54                     ` Michael S. Tsirkin
2023-03-07 13:26             ` Stefan Hajnoczi
2023-03-07 16:47               ` Michael S. Tsirkin
2023-03-07 19:35                 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y/zQKS7WD227Qz2y@fedora \
    --to=stefanha@redhat.com \
    --cc=Baptiste.Afsa@harman.com \
    --cc=eperezma@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu_oss@crudebyte.com \
    --cc=virtio-comment@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox