From: Eugenio Perez Martin <eperezma@redhat.com>
To: Sahil <icegambit91@gmail.com>
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
	 Sahil Siddiq <sahilcdq@proton.me>
Subject: Re: [RFC v3 3/3] vhost: Allocate memory for packed vring
Date: Tue, 13 Aug 2024 08:53:55 +0200	[thread overview]
Message-ID: <CAJaqyWf3Vv6LvCHvRtpdZFQrhVHMOUTdzhJGj7PkbVDYeKox_w@mail.gmail.com>
In-Reply-To: <8454459.NyiUUSuA9g@valdaarhun>

On Mon, Aug 12, 2024 at 9:32 PM Sahil <icegambit91@gmail.com> wrote:
>
> Hi,
>
> On Monday, August 12, 2024 12:01:00 PM GMT+5:30 you wrote:
> > On Sun, Aug 11, 2024 at 7:20 PM Sahil <icegambit91@gmail.com> wrote:
> > > On Wednesday, August 7, 2024 9:52:10 PM GMT+5:30 Eugenio Perez Martin wrote:
> > > > On Fri, Aug 2, 2024 at 1:22 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
> > > > > [...]
> > > > > @@ -726,17 +738,30 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > > > >      svq->vring.num = virtio_queue_get_num(vdev,
> > > > >      virtio_get_queue_index(vq));
> > > > >      svq->num_free = svq->vring.num;
> > > > >
> > > > > -    svq->vring.desc = mmap(NULL, vhost_svq_driver_area_size(svq),
> > > > > -                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > -                           -1, 0);
> > > > > -    desc_size = sizeof(vring_desc_t) * svq->vring.num;
> > > > > -    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> > > > > -    svq->vring.used = mmap(NULL, vhost_svq_device_area_size(svq),
> > > > > -                           PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > -                           -1, 0);
> > > > > -    svq->desc_state = g_new0(SVQDescState, svq->vring.num);
> > > > > -    svq->desc_next = g_new0(uint16_t, svq->vring.num);
> > > > > -    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > > > > +    svq->is_packed = virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED);
> > > > > +
> > > > > +    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED)) {
> > > > > +        svq->vring_packed.vring.desc = mmap(NULL, vhost_svq_memory_packed(svq),
> > > > > +                                          PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > +                                          -1, 0);
> > > > > +        desc_size = sizeof(struct vring_packed_desc) * svq->vring.num;
> > > > > +        svq->vring_packed.vring.driver = (void *)((char *)svq->vring_packed.vring.desc + desc_size);
> > > > > +        svq->vring_packed.vring.device = (void *)((char *)svq->vring_packed.vring.driver +
> > > > > +                                     sizeof(struct vring_packed_desc_event));
> > > >
> > > > This is a great start but it will be problematic when you start
> > > > mapping the areas to the vdpa device. The driver area should be read
> > > > only for the device, but it is placed in the same page as a RW one.
> > > >
> > > > More on this later.
> > > >
> > > > > +    } else {
> > > > > +        svq->vring.desc = mmap(NULL, vhost_svq_driver_area_size(svq),
> > > > > +                               PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > +                               -1, 0);
> > > > > +        desc_size = sizeof(vring_desc_t) * svq->vring.num;
> > > > > +        svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> > > > > +        svq->vring.used = mmap(NULL, vhost_svq_device_area_size(svq),
> > > > > +                               PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
> > > > > +                               -1, 0);
> > > > > +    }
> > > >
> > > > I think it will be beneficial to avoid "if (packed)" conditionals on
> > > > the exposed functions that give information about the memory maps.
> > > > These need to be replicated at
> > > > hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_rings.
> > > >
> > > > However, the current one depends on the driver area to live in the
> > > > same page as the descriptor area, so it is not suitable for this.
> > >
> > > I haven't really understood this.
> > >
> > > In split vqs the descriptor, driver and device areas are mapped to RW pages.
> > > In vhost_vdpa.c:vhost_vdpa_svq_map_rings, the regions are mapped with
> > > the appropriate "perm" field that sets the R/W permissions in the DMAMap
> > > object. Is this problematic for the split vq format, given that the avail
> > > ring is mapped to a RW page in "vhost_svq_start" anyway?
> > >
> >
> > OK, so maybe the word "map" is misleading here. The pages need to be
> > allocated for the QEMU process with both PROT_READ | PROT_WRITE, as
> > QEMU needs to write into them.
> >
> > They are mapped to the device with vhost_vdpa_dma_map, and the last
> > bool parameter indicates whether the device needs write permissions or
> > not. You can see how hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_ring
> > checks the needle's permissions for this; those permissions are set in
> > hw/virtio/vhost-vdpa.c:vhost_vdpa_svq_map_rings, which is the function
> > that needs to check the mappings' permissions.
> >
>
> I think I have understood what's going on in "vhost_vdpa_svq_map_rings",
> "vhost_vdpa_svq_map_ring" and "vhost_vdpa_dma_map". But from what I
> can tell, the driver area is getting mapped to an iova which is read-only
> for vhost_vdpa. Please let me know where I am going wrong.
>

You're not going wrong there. The device does not need to write into
this area, so we map it read only.
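
For the packed case, one way to avoid the shared-page problem I
mentioned earlier is to give each area its own page-aligned mapping,
so each of them can later be mapped to the device with its own
permissions. A rough, untested sketch, reusing the names from your
patch:

    desc_size = sizeof(struct vring_packed_desc) * svq->vring.num;

    /* Descriptor ring in its own mapping: the device needs to write
     * into it with packed vqs */
    svq->vring_packed.vring.desc = mmap(NULL,
            ROUND_UP(desc_size, qemu_real_host_page_size()),
            PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
            -1, 0);

    /* Driver event suppression in a separate mapping, so it can be
     * mapped read-only for the device */
    svq->vring_packed.vring.driver = mmap(NULL,
            ROUND_UP(sizeof(struct vring_packed_desc_event),
                     qemu_real_host_page_size()),
            PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS,
            -1, 0);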

> Consider the following implementation in hw/virtio/vhost-vdpa.c:
>
> > size_t device_size = vhost_svq_device_area_size(svq);
> > size_t driver_size = vhost_svq_driver_area_size(svq);
>
> The driver size includes the descriptor area and the driver area. For
> packed vq, the driver area is the "driver event suppression" structure
> which should be read-only for the device according to the virtio spec
> (section 2.8.10) [1].
>
> > size_t avail_offset;
> > bool ok;
> >
> > vhost_svq_get_vring_addr(svq, &svq_addr);
>
> Over here "svq_addr.desc_user_addr" will point to the descriptor area
> while "svq_addr.avail_user_addr" will point to the driver area/driver
> event suppression structure.
>
> > driver_region = (DMAMap) {
> >     .translated_addr = svq_addr.desc_user_addr,
> >     .size = driver_size - 1,
> >     .perm = IOMMU_RO,
> > };
>
> This region points to the descriptor area, and its size encompasses the
> driver area as well, with RO permission.
>
> > ok = vhost_vdpa_svq_map_ring(v, &driver_region, errp);
>
> The above function checks the value of needle->perm and sees that it is RO.
> It then calls "vhost_vdpa_dma_map" with the following arguments:
>
> > r = vhost_vdpa_dma_map(v->shared, v->address_space_id, needle->iova,
> >                                                needle->size + 1,
> >                                                (void *)(uintptr_t)needle->translated_addr,
> >                                                needle->perm == IOMMU_RO);
>
> Since needle->size includes the driver area as well, the driver area will be
> mapped to a RO page in the device's address space, right?
>

Yes, the device does not need to write into the descriptor area in the
supported split virtqueue case, so the descriptor area is also mapped
RO at the moment.

This changes in the packed virtqueue case, so we need to map it RW there.
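
In vhost_vdpa_svq_map_rings, something along these lines could work
for packed vqs, assuming the descriptor ring and the driver event
suppression area live in separate host pages. Just a sketch; desc_size
and driver_size stand for whatever per-area size helpers we end up
with, they don't exist today:

    /* With packed vqs the device writes used descriptors back into
     * the descriptor ring itself, so it needs RW there; RO is enough
     * in the split case. */
    desc_region = (DMAMap) {
        .translated_addr = svq_addr.desc_user_addr,
        .size = desc_size - 1,
        .perm = svq->is_packed ? IOMMU_RW : IOMMU_RO,
    };

    /* The driver event suppression area is only read by the device */
    driver_region = (DMAMap) {
        .translated_addr = svq_addr.avail_user_addr,
        .size = driver_size - 1,
        .perm = IOMMU_RO,
    };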

> > if (unlikely(!ok)) {
> >     error_prepend(errp, "Cannot create vq driver region: ");
> >     return false;
> > }
> > addr->desc_user_addr = driver_region.iova;
> > avail_offset = svq_addr.avail_user_addr - svq_addr.desc_user_addr;
> > addr->avail_user_addr = driver_region.iova + avail_offset;
>
> I think "addr->desc_user_addr" and "addr->avail_user_addr" will both be
> mapped to a RO page in the device's address space.
>
> > device_region = (DMAMap) {
> >     .translated_addr = svq_addr.used_user_addr,
> >     .size = device_size - 1,
> >     .perm = IOMMU_RW,
> > };
>
> The device area/device event suppression structure on the other hand will
> be mapped to a RW page.
>
> I also think there are other issues with the current state of the patch. According
> to the virtio spec (section 2.8.10) [1], the "device event suppression" structure
> needs to be write-only for the device but is mapped to a RW page.
>

Yes. To be honest, I'm not sure all IOMMUs support write-only maps.

> Another concern I have is regarding the driver area size for packed vq. In
> "hw/virtio/vhost-shadow-virtqueue.c" of the current patch:
> > size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
> > {
> >     size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
> >     size_t avail_size = offsetof(vring_avail_t, ring[svq->vring.num]) +
> >                                                               sizeof(uint16_t);
> >
> >     return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size());
> > }
> >
> > [...]
> >
> > size_t vhost_svq_memory_packed(const VhostShadowVirtqueue *svq)
> > {
> >     size_t desc_size = sizeof(struct vring_packed_desc) * svq->num_free;
> >     size_t driver_event_suppression = sizeof(struct vring_packed_desc_event);
> >     size_t device_event_suppression = sizeof(struct vring_packed_desc_event);
> >
> >     return ROUND_UP(desc_size + driver_event_suppression + device_event_suppression,
> >                     qemu_real_host_page_size());
> > }
>
> The size returned by "vhost_svq_driver_area_size" might not be the actual
> driver area size for packed vqs, which is given by desc_size +
> driver_event_suppression, right? Will this have to be changed too?
>

Yes, you're right this needs to be changed too.
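
Something like the following could be a starting point, with per-area
sizes split out so each mapping can get the right device permissions.
The function names are only illustrative, they don't exist in the tree:

    size_t vhost_svq_descriptor_area_size_packed(const VhostShadowVirtqueue *svq)
    {
        /* Descriptor ring only; RW for the device with packed vqs */
        size_t desc_size = sizeof(struct vring_packed_desc) * svq->vring.num;

        return ROUND_UP(desc_size, qemu_real_host_page_size());
    }

    size_t vhost_svq_driver_area_size_packed(const VhostShadowVirtqueue *svq)
    {
        /* Driver event suppression structure only; read-only for the
         * device */
        return ROUND_UP(sizeof(struct vring_packed_desc_event),
                        qemu_real_host_page_size());
    }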

> Thanks,
> Sahil
>
> [1] https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-720008
>
>
>




Thread overview: 30+ messages
2024-08-02 11:21 [RFC v3 0/3] Add packed virtqueue to shadow virtqueue Sahil Siddiq
2024-08-02 11:21 ` [RFC v3 1/3] vhost: Introduce packed vq and add buffer elements Sahil Siddiq
2024-08-07 16:40   ` Eugenio Perez Martin
2024-08-02 11:21 ` [RFC v3 2/3] vhost: Data structure changes to support packed vqs Sahil Siddiq
2024-08-02 11:21 ` [RFC v3 3/3] vhost: Allocate memory for packed vring Sahil Siddiq
2024-08-07 16:22   ` Eugenio Perez Martin
2024-08-11 15:37     ` Sahil
2024-08-11 17:20     ` Sahil
2024-08-12  6:31       ` Eugenio Perez Martin
2024-08-12 19:32     ` Sahil
2024-08-13  6:53       ` Eugenio Perez Martin [this message]
2024-08-21 12:19         ` Sahil
2024-08-27 15:30           ` Eugenio Perez Martin
2024-08-30 10:20             ` Sahil
2024-08-30 10:48               ` Eugenio Perez Martin
2024-09-08 19:46                 ` Sahil
2024-09-09 12:34                   ` Eugenio Perez Martin
2024-09-11 19:36                     ` Sahil
2024-09-12  9:54                       ` Eugenio Perez Martin
2024-09-16  4:34                         ` Sahil
2024-09-24  5:31                           ` Sahil
2024-09-24 10:46                             ` Eugenio Perez Martin
2024-09-30  5:34                               ` Sahil
2024-10-28  5:37                                 ` Sahil Siddiq
2024-10-28  8:10                                   ` Eugenio Perez Martin
2024-10-31  5:10                                     ` Sahil Siddiq
2024-11-13  5:10                                   ` Sahil Siddiq
2024-11-13 11:30                                     ` Eugenio Perez Martin
2024-12-05 20:38                                       ` Sahil Siddiq
2024-08-07 16:41 ` [RFC v3 0/3] Add packed virtqueue to shadow virtqueue Eugenio Perez Martin
