From: Eugenio Perez Martin <eperezma@redhat.com>
To: Sahil Siddiq <icegambit91@gmail.com>
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
sahilcdq@proton.me
Subject: Re: [RFC v5 3/7] vhost: Forward descriptors to device via packed SVQ
Date: Mon, 14 Apr 2025 17:07:32 +0200
Message-ID: <CAJaqyWdJCaP79Pvy1YgkrZC4p7HBsg1U5MDmQR-LLDFmrBGSzg@mail.gmail.com>
In-Reply-To: <5f970447-2547-4ce2-8d27-420540d5896b@gmail.com>
On Mon, Apr 14, 2025 at 11:38 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>
> Hi,
>
> On 3/28/25 1:21 PM, Eugenio Perez Martin wrote:
> > On Thu, Mar 27, 2025 at 7:42 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
> >> On 3/26/25 1:33 PM, Eugenio Perez Martin wrote:
> >>> On Mon, Mar 24, 2025 at 3:14 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
> >>>> On 3/24/25 7:29 PM, Sahil Siddiq wrote:
> >>>>> Implement the insertion of available buffers in the descriptor area of
> >>>>> packed shadow virtqueues. It takes into account descriptor chains, but
> >>>>> does not consider indirect descriptors.
> >>>>>
> >>>>> Enable the packed SVQ to forward the descriptors to the device.
> >>>>>
> >>>>> Signed-off-by: Sahil Siddiq <sahilcdq@proton.me>
> >>>>> ---
> >>>>> [...]
> >>>>> +
> >>>>> +/**
> >>>>> + * Write descriptors to SVQ packed vring
> >>>>> + *
> >>>>> + * @svq: The shadow virtqueue
> >>>>> + * @out_sg: The iovec to the guest
> >>>>> + * @out_num: Outgoing iovec length
> >>>>> + * @in_sg: The iovec from the guest
> >>>>> + * @in_num: Incoming iovec length
> >>>>> + * @sgs: Cache for hwaddr
> >>>>> + * @head: Saves current free_head
> >>>>> + */
> >>>>> +static void vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> >>>>> + const struct iovec *out_sg, size_t out_num,
> >>>>> + const struct iovec *in_sg, size_t in_num,
> >>>>> + hwaddr *sgs, unsigned *head)
> >>>>> +{
> >>>>> + uint16_t id, curr, i, head_flags = 0, head_idx;
> >>>>> + size_t num = out_num + in_num;
> >>>>> + unsigned n;
> >>>>> +
> >>>>> + struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
> >>>>> +
> >>>>> + head_idx = svq->vring_packed.next_avail_idx;
> >>>>
> >>>> Since "svq->vring_packed.next_avail_idx" is part of QEMU internals and not
> >>>> stored in guest memory, no endianness conversion is required here, right?
> >>>>
> >>>
> >>> Right!
> >>
> >> Understood.
> >>
> >>>>> + i = head_idx;
> >>>>> + id = svq->free_head;
> >>>>> + curr = id;
> >>>>> + *head = id;
> >>>>
> >>>> Should head be the buffer id or the idx of the descriptor ring where the
> >>>> first descriptor of a descriptor chain is inserted?
> >>>>
> >>>
> >>> The buffer id of the *last* descriptor of a chain. See "2.8.6 Next
> >>> Flag: Descriptor Chaining" at [1].
> >>
> >> Ah, yes. The second half of my question is incorrect.
> >>
> >> The tail descriptor of the chain includes the buffer id. In this implementation,
> >> we place the same tail buffer id in the other descriptors of the chain as well,
> >> since the id is ignored there anyway [1].
> >>
> >> The explanation below frames my query better.
> >>
> >>>>> + /* Write descriptors to SVQ packed vring */
> >>>>> + for (n = 0; n < num; n++) {
> >>>>> + uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
> >>>>> + (n < out_num ? 0 : VRING_DESC_F_WRITE) |
> >>>>> + (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
> >>>>> + if (i == head_idx) {
> >>>>> + head_flags = flags;
> >>>>> + } else {
> >>>>> + descs[i].flags = flags;
> >>>>> + }
> >>>>> +
> >>>>> + descs[i].addr = cpu_to_le64(sgs[n]);
> >>>>> + descs[i].id = id;
> >>>>> + if (n < out_num) {
> >>>>> + descs[i].len = cpu_to_le32(out_sg[n].iov_len);
> >>>>> + } else {
> >>>>> + descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
> >>>>> + }
> >>>>> +
> >>>>> + curr = cpu_to_le16(svq->desc_next[curr]);
> >>>>> +
> >>>>> + if (++i >= svq->vring_packed.vring.num) {
> >>>>> + i = 0;
> >>>>> + svq->vring_packed.avail_used_flags ^=
> >>>>> + 1 << VRING_PACKED_DESC_F_AVAIL |
> >>>>> + 1 << VRING_PACKED_DESC_F_USED;
> >>>>> + }
> >>>>> + }
> >>>>>
> >>>>> + if (i <= head_idx) {
> >>>>> + svq->vring_packed.avail_wrap_counter ^= 1;
> >>>>> + }
> >>>>> +
> >>>>> + svq->vring_packed.next_avail_idx = i;
> >>>>> + svq->shadow_avail_idx = i;
> >>>>> + svq->free_head = curr;
> >>>>> +
> >>>>> + /*
> >>>>> + * A driver MUST NOT make the first descriptor in the list
> >>>>> + * available before all subsequent descriptors comprising
> >>>>> + * the list are made available.
> >>>>> + */
> >>>>> + smp_wmb();
> >>>>> + svq->vring_packed.vring.desc[head_idx].flags = head_flags;
> >>>>> }
> >>>>>
> >>>>> [...]
> >>>>> @@ -258,13 +356,22 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> >>>>> return -EINVAL;
> >>>>> }
> >>>>>
> >>>>> - vhost_svq_add_split(svq, out_sg, out_num, in_sg,
> >>>>> - in_num, sgs, &qemu_head);
> >>>>> + if (svq->is_packed) {
> >>>>> + vhost_svq_add_packed(svq, out_sg, out_num, in_sg,
> >>>>> + in_num, sgs, &qemu_head);
> >>>>> + } else {
> >>>>> + vhost_svq_add_split(svq, out_sg, out_num, in_sg,
> >>>>> + in_num, sgs, &qemu_head);
> >>>>> + }
> >>>>>
> >>>>> svq->num_free -= ndescs;
> >>>>> svq->desc_state[qemu_head].elem = elem;
> >>>>> svq->desc_state[qemu_head].ndescs = ndescs;
> >>>>
> >>>> *head in vhost_svq_add_packed() is stored in "qemu_head" here.
> >>>>
> >>>
> >>> Sorry I don't get this, can you expand?
> >>
> >> Sure. In vhost_svq_add(), after the descriptors have been added
> >> (either using vhost_svq_add_split or vhost_svq_add_packed),
> >> VirtQueueElement elem and ndescs are both saved in the
> >> svq->desc_state array. "elem" and "ndescs" are later used when
> >> the guest consumes used descriptors from the device in
> >> vhost_svq_get_buf_(split|packed).
> >>
> >> For split vqs, the index of svq->desc_state where elem and ndescs are
> >> saved matches the index in the descriptor ring where the head of
> >> the descriptor chain is placed.
> >>
> >> In vhost_svq_add_split:
> >>
> >> *head = svq->free_head;
> >> [...]
> >> avail_idx = svq->shadow_avail_idx & (svq->vring.num - 1);
> >> avail->ring[avail_idx] = cpu_to_le16(*head);
> >>
> >> "qemu_head" in vhost_svq_add gets its value from "*head" in
> >> vhost_svq_add_split:
> >>
> >> svq->desc_state[qemu_head].elem = elem;
> >> svq->desc_state[qemu_head].ndescs = ndescs;
> >>
> >> For packed vq, something similar has to be done. My approach was
> >> to have the index of svq->desc_state match the buffer id placed in
> >> the tail descriptor of the chain.
> >>
> >> The entire chain is written to the descriptor ring in the loop
> >> in vhost_svq_add_packed. I am not sure if the index of
> >> svq->desc_state should be the buffer id or if it should be a
> >> descriptor index ("head_idx" or the index corresponding to the
> >> tail of the chain).
> >>
> >
> > I think both approaches should be valid. My advice is to follow
> > Linux's code and let it be the tail descriptor id. This descriptor id
> > is pushed to and popped from vq->free_head in a stack-like fashion.
> >
> > In addition to that, Linux also sets the same id on all the chain
> > elements. I think this is useful when dealing with bad devices. In
> > particular,
>
> Understood. So far, I have implemented this so it matches the
> implementation in Linux.
>
> > QEMU's packed vq implementation looked at the first
> > descriptor's id, which is incorrect behavior.
>
> Are you referring to:
>
> 1. svq->desc_state[qemu_head].elem = elem (in vhost_svq_add()), and
> 2. *head = id (in vhost_svq_add_packed())
>
I meant that it used to take the first descriptor's id by mistake. It
was fixed in commit 33abfea23959 ("hw/virtio: Fix obtain the buffer id
from the last descriptor"). It is better to set the buffer id in all
the descriptors of the chain, so that in the nested VM case a QEMU
that does not contain that fix can still work with this version.
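
To illustrate the point, here is a minimal, untested sketch (plain C,
simplified descriptor struct, host-endian fields, made-up names and
values; not part of the patch). Because the driver writes the same id
into every descriptor of the chain, a device that wrongly takes the id
from the first descriptor resolves the same desc_state entry as one
that correctly takes it from the last:

#include <stdint.h>
#include <stdio.h>

#define VRING_DESC_F_NEXT 1

/* Simplified packed descriptor, host-endian fields only */
struct packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

int main(void)
{
    struct packed_desc ring[4] = { { 0 } };
    uint16_t buffer_id = 7;     /* id popped from the free_head stack */
    unsigned chain_len = 3;

    /* Driver side: write the same id into every element of the chain */
    for (unsigned n = 0; n < chain_len; n++) {
        ring[n].addr = 0x1000 * (n + 1);
        ring[n].len = 64;
        ring[n].id = buffer_id;
        ring[n].flags = (n + 1 == chain_len) ? 0 : VRING_DESC_F_NEXT;
    }

    /* Spec-compliant device: takes the id from the last descriptor */
    uint16_t id_from_tail = ring[chain_len - 1].id;
    /* Pre-fix QEMU: took the id from the first descriptor */
    uint16_t id_from_head = ring[0].id;

    /* Both index the same desc_state entry, so the buffer is still
     * completed correctly even against the buggy device. */
    printf("tail id %u, head id %u\n",
           (unsigned)id_from_tail, (unsigned)id_from_head);
    return 0;
}

If only the tail descriptor carried the id, the head descriptor would
still hold whatever stale id was written there on a previous use of
that ring slot, and the buggy device would report that stale id in the
used element, completing the wrong desc_state entry.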