From: Jonah Palmer <jonah.palmer@oracle.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, raphael@enfabrica.net,
kwolf@redhat.com, hreitz@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, fam@euphon.net, stefanha@redhat.com,
qemu-block@nongnu.org, schalla@marvell.com, leiyang@redhat.com,
virtio-fs@lists.linux.dev, si-wei.liu@oracle.com,
boris.ostrovsky@oracle.com
Subject: Re: [RFC v2 1/5] virtio: Initialize sequence variables
Date: Fri, 5 Apr 2024 09:58:46 -0400
Message-ID: <c1e78d58-53bb-4091-b775-3d4a004a8bfd@oracle.com>
In-Reply-To: <CAJaqyWf1=v7V-SzmbRCQo-DzuOKCQK2u5E9XVFV877Z5choVig@mail.gmail.com>
On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:
> On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>>
>>
>> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
>>> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>
>>>>
>>>>
>>>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
>>>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>>>>>
>>>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
>>>>>> structures. A VirtQueue's sequence variables are initialized when a
>>>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
>>>>>> variable is initialized when a VirtQueueElement is being initialized.
>>>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
>>>>>>
>>>>>> A VirtQueue's used_seq_idx represents the next expected index in a
>>>>>> sequence of VirtQueueElements to be processed (put on the used ring).
>>>>>> The next VirtQueueElement added to the used ring must match this
>>>>>> sequence number before additional elements can be safely added to the
>>>>>> used ring. It's also particularly useful for helping find the number of
>>>>>> new elements added to the used ring.
>>>>>>
>>>>>> A VirtQueue's current_seq_idx represents the current sequence index.
>>>>>> This value is essentially a counter: it is assigned to a new
>>>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
>>>>>> sequence number can range from 0 to 65,535.
>>>>>>
>>>>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
>>>>>> the VirtQueueElement when it was created. This value must match with the
>>>>>> VirtQueue's used_seq_idx before the element can be put on the used ring
>>>>>> by the device.
>>>>>>
>>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>>>>> ---
>>>>>>   hw/virtio/virtio.c         | 18 ++++++++++++++++++
>>>>>>   include/hw/virtio/virtio.h |  1 +
>>>>>>   2 files changed, 19 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>>>>> index fb6b4ccd83..069d96df99 100644
>>>>>> --- a/hw/virtio/virtio.c
>>>>>> +++ b/hw/virtio/virtio.c
>>>>>> @@ -132,6 +132,10 @@ struct VirtQueue
>>>>>>      uint16_t used_idx;
>>>>>>      bool used_wrap_counter;
>>>>>>
>>>>>> +    /* In-Order sequence indices */
>>>>>> +    uint16_t used_seq_idx;
>>>>>> +    uint16_t current_seq_idx;
>>>>>> +
>>>>>
>>>>> I'm having a hard time understanding the difference between these and
>>>>> last_avail_idx and used_idx. It seems to me if we replace them
>>>>> everything will work? What am I missing?
>>>>>
>>>>
>>>> For used_seq_idx, it works like used_idx; the difference is when their
>>>> values get updated, specifically in the split VQ case.
>>>>
>>>> As you know, for the split VQ case, the used_idx is updated during
>>>> virtqueue_split_flush. However, imagine a batch of elements coming in
>>>> where virtqueue_split_fill is called multiple times before
>>>> virtqueue_split_flush. We want to make sure we write these elements to
>>>> the used ring in order, and we'll know their order based on used_seq_idx.
>>>>
>>>> Alternatively, I thought about replicating the logic for the packed VQ
>>>> case (where this used_seq_idx isn't used) where we start looking at
>>>> vq->used_elems[vq->used_idx] and iterate through until we find a used
>>>> element, but I wasn't sure how to handle the case where elements get
>>>> used (written to the used ring) and new elements get put in used_elems
>>>> before the used_idx is updated, since this search would require us to
>>>> always start at index vq->used_idx.
>>>>
>>>> For example, say, of three elements getting filled (elem0 - elem2),
>>>> elem1 and elem0 come back first (vq->used_idx = 0):
>>>>
>>>> elem1 - not in-order
>>>> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
>>>> in-order, write elem0 and elem1 to used ring, mark elements as
>>>> used
>>>>
>>>> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
>>>> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
>>>> (elem1) and iterate to vq->used_idx + 2 (elem2)?
>>>>
>>>> Hmm... now that I'm thinking about it, maybe for the split VQ case we
>>>> could continue looking through the vq->used_elems array until we find an
>>>> unused element... but then again how would we (1) know if the element is
>>>> in-order and (2) know when to stop searching?
>>>>
>>>
>>> Ok I think I understand the problem now. It is aggravated if we add
>>> chained descriptors to the mix.
>>>
>>> We know that the order of used descriptors must be the exact same as
>>> the order they were made available, leaving in-order batching aside.
>>> What if we fill vq->used_elems at virtqueue_pop, and then virtqueue_push
>>> just marks them as used somehow? Two booleans (or a flag) would do for a
>>> first iteration.
>>>
>>> If we go with this approach I think used_elems should be renamed actually.
>>>
>>
>> If I'm understanding correctly, I don't think adding newly created
>> elements to vq->used_elems at virtqueue_pop will do much for us.
>
> By knowing what descriptor id must go in each position of the used ring.
>
> Following your example, let's say avail_idx is 10 at that moment.
> Then, the driver makes available the three elements you mention, so:
> used_elems[10] = elem0
> used_elems[11] = elem1
> used_elems[12] = elem2
>
> Now the device uses elem1. virtqueue_push can search linearly for
> elem->index in the used_elems[used_idx]...used_elems[avail_idx] range.
> As the device is misbehaving, there's no need to optimize this behavior.
> used_elems[11].index == elem->index, so we mark it as used somehow.
> Let's say we add a boolean to VirtQueueElement.
>
> virtqueue_flush can scan used_elems[used_idx]...used_elems[avail_idx]
> looking for used elements. At this moment used_elems[used_idx] is not
> used so it stops there.
>
> Now elem0 is pushed. It is the first one in the
> used_elems[used_idx]...used_elems[avail_idx] range, so we can write it
> to the used ring at fill. used_idx++. We use the rest of the
> descriptors until we find one in used_elems that is not used, which is
> used_elems[12].
>
> After that virtqueue_flush is called. At its scanning, used_elems[10]
> is used, so it writes it to the used ring. After that, used_elems[11]
> is also used, so it is written also. used_elems[12] is not used, so it
> stops there.
>
> Finally, elem2 is pushed, so used_elems[12] is written.
> virtqueue_flush detects it, so it writes it to the guest's used ring.
>
> Let me know what you think. I find it simpler than declaring new
> indexes, but I may be wrong.
>
I think I see what you're getting at, but I have a few clarifying
questions about your proposal.
So you're proposing to add entries to used_elems at virtqueue_pop, ok.
avail_idx = 10, then the driver makes some new entries (elems) available
in the avail ring:
used_elems[10] = elem0
used_elems[11] = elem1
used_elems[12] = elem2
At this point, avail_idx = 13, used_idx = 10.
elem1 gets used first, ok.
Now, if I'm understanding correctly, you're saying that in
virtqueue_push explicitly (not virtqueue_fill/virtqueue_flush), we scan
used_elems[used_idx] - used_elems[avail_idx] to find used_elems[i].index
== elem->index and mark it as used, e.g. used_elems[i].used = true.
Okay, now used_elems[11] has been marked as used.
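
Just to make sure we're reading this the same way, here's a rough sketch
of that push-side scan as I understand it (the 'used' boolean is the
hypothetical flag from your proposal, and the modulo indexing is my own
assumption, not actual code):

    uint16_t i;

    /* Linear scan from used_idx to avail_idx looking for the element
     * that was just pushed, then mark it with the hypothetical 'used'
     * flag.
     */
    for (i = vq->used_idx; i != vq->shadow_avail_idx; i++) {
        VirtQueueElement *e = &vq->used_elems[i % vq->vring.num];
        if (e->index == elem->index) {
            e->used = true;
            break;
        }
    }
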
Now we make it to virtqueue_fill. What's the role you want
virtqueue_fill to take here (if any)?
You say that after we mark this element as used, we go to
virtqueue_flush and scan for used elements in used_elems[used_idx] -
used_elems[avail_idx]. Of course, the first element in this scan will be
used_elems[used_idx], which currently shows as unused, so we're done with
this element for now.
So, what exactly is the role of virtqueue_flush here? I'm inferring that
you want virtqueue_flush (for split VQs) to do both the writing to the
used ring (normally done in virtqueue_fill) and the updating of used_idx
(normally done in virtqueue_flush). Is this correct?
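
In other words, my reading of the flush role would be something like this
sketch (again, 'used' is the hypothetical flag, and the actual write to
the guest's used ring is elided):

    /* Write out the longest in-order prefix of used elements, bumping
     * used_idx once per element written.
     */
    while (vq->used_elems[vq->used_idx % vq->vring.num].used) {
        VirtQueueElement *e = &vq->used_elems[vq->used_idx % vq->vring.num];
        /* ...write e->index and e->len into the guest's used ring... */
        e->used = false;
        vq->used_idx++;
    }
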
Next, elem0 gets used second.
Again, in virtqueue_push we scan used_elems[used_idx] -
used_elems[avail_idx] to find used_elems[i].index == elem->index and
mark it as used. Okay, used_elems[10] has been marked as used.
Then you say, "It is the first one in the used_elems[used_idx] -
used_elems[avail_idx] range, so we can write it to the used ring at
fill. used_idx++. We use the rest of the descriptors until we find one in
used_elems that is not used, which is used_elems[12]."
This, to me, sounds like "in virtqueue_fill, when we find an in-order
element (e.g. used_elems[used_idx].index == elem->index), write it to the
used ring AND increment used_idx. Keep writing elements to the used ring
while used_elems[used_idx].used == true and, for each element written,
increment used_idx."
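
That is, roughly (same hypothetical 'used' flag, write elided):

    /* When the pushed element is the in-order head, write it out along
     * with any successors that were already marked used.
     */
    if (vq->used_elems[vq->used_idx % vq->vring.num].index == elem->index) {
        do {
            /* ...write used_elems[used_idx] to the used ring... */
            vq->used_elems[vq->used_idx % vq->vring.num].used = false;
            vq->used_idx++;
        } while (vq->used_elems[vq->used_idx % vq->vring.num].used);
    }
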
This is a bit confusing to me since next you say "After that,
virtqueue_flush is called. At its scanning, used_elems[10] is used, so
it writes it to the used ring. After that, used_elems[11] is also used,
so it is written also. used_elems[12] is not used, so it stops there."
This sounds very similar to what you proposed for virtqueue_fill, except
it looks like you're also saying to do this in virtqueue_flush, hence my
confusion.
If you wouldn't mind, could you clarify the roles of virtqueue_fill and
virtqueue_flush here for me? Thanks :)!
> This makes it difficult to actually batch used descriptors. My
> proposal is to address it in another series, by delegating it to the
> caller and recovering proper usage of the virtqueue_push(idx) parameter.
> The caller can specify either to batch as usual, or to delegate the
> automatic (and potentially inefficient) ordering I'm proposing here.
>
Just to be clear, for this series, you'd like me to implement a solution
that does *not* consider the case where virtqueue_fill is called
multiple times before virtqueue_flush (and to put a solution for this in
a separate series)?
Are we not concerned that we might shoot ourselves in the foot here by
implementing a process that may not work well for a batching solution,
especially when we have an almost-working solution for batching and
non-batching cases?
>> We
>> could just keep adding processed elements to vq->used_elems at
>> virtqueue_fill but instead of:
>>
>> vq->used_elems[seq_idx].in_num = elem->in_num;
>> vq->used_elems[seq_idx].out_num = elem->out_num;
>>
>> We could do:
>>
>> vq->used_elems[seq_idx].in_num = 1;
>> vq->used_elems[seq_idx].out_num = 1;
>>
>> We'd use in_num and out_num as separate flags. in_num could indicate if
>> this element has been written to the used ring while out_num could
>> indicate if this element has been flushed (1 for no, 0 for yes). In
>> other words, when we go to write to the used ring, start at index
>> vq->used_idx and iterate through the used elements.
>>
>> If a used element's in_num and out_num == 0, then this element is
>> invalid (not yet processed) and we stop the search.
>>
>> If a used element's in_num and out_num == 1, then this element is valid,
>> written to the used ring, in_num is set to 0, and the search continues.
>>
>> Lastly, if a used element's in_num == 0 but out_num == 1, then this
>> element has already been written to the used ring but not yet flushed,
>> so ignore this element and continue searching.
>>
>> There should never be a case where in_num == 1 and out_num == 0.
>>
>> However, this would probably mean that before (or right after) we
>> actually perform the flush we'll have to iterate through the used_elems
>> array one more time and set their out_num's to 0 to indicate the element
>> has been flushed.
>>
>> Again, this is just for the batched split VQ case where we have to keep
>> track of elements that have been written but not flushed and elements
>> that have been written and flushed, given that we don't know which
>> elements have actually been written to the used ring until the used_idx
>> is updated.
>>
>> This approach appears more costly though if we're really trying to avoid
>> having this new used_seq_idx VirtQueue member.
>>
>>>> In any case, the use of this variable could be seen as an optimization
>>>> as its value will tell us where to start looking in vq->used_elems
>>>> instead of always starting at vq->used_idx.
>>>>
>>>> If this is like a one-shot scenario where one element gets written and
>>>> then flushed after, then yes in this case used_seq_idx == used_idx.
>>>>
>>>> ------
>>>>
>>>> For current_seq_idx, this is pretty much just a counter. Every new
>>>> VirtQueueElement created from virtqueue_pop is given a number and the
>>>> counter is incremented. Like grabbing a ticket number and waiting for
>>>> your number to be called. The next person to grab a ticket number will
>>>> be your number + 1.
>>>>
>>>
>>> So it's like last_avail_idx, isn't it?
>>>
>>
>> For the split VQ case, pretty much. Though if we hit this case in
>> virtqueue_split_pop, we may get into some trouble:
>>
>> if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
>>     goto done;
>> }
>>
>
> That's a fatal mistake and the device breaks, vdev->broken = true. It
> cannot be used anymore from that point on, because of all the checks
> of that variable.
>
> Does that solve the problem?
>
Ah, it does. My apologies, I should've recognized this would result in
the device breaking.
>> However for the packed VQ case, last_avail_idx might not work in the way
>> we'd need it to for this implementation. In virtqueue_packed_pop, we see
>> this:
>>
>> elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
>> vq->last_avail_idx += elem->ndescs;
>>
>> It would appear as though last_avail_idx is incremented by the total
>> number of descriptors associated with the element, which can be greater
>> than 1. This implementation increments by 1 for each element.
>>
>> Actually... It's good you mentioned this because I think my packed VQ
>> implementation is wrong. For packed VQs, vq->used_idx is incremented by
>> the total number of descriptors in the flushed elements and not
>> necessarily the number of elements being flushed like in the split VQ
>> case. I'm adding elements to vq->used_elems in a per-element sequence
>> rather than going by the number of descriptors an element holds, which
>> should be the case for packed VQs.
>>
>
> If you keep your proposed index, I think you can increase it by one per
> head, as heads are the entries that are written in both cases.
> used_idx should increment properly already.
>
> If you move to my proposal, both should increase by elem->ndescs as
> you suggest here.
>
Ack! Thanks!
>>>> Let me know if I'm making any sense. Thanks :)
>>>>
>>>> Jonah
>>>>
>>>>>>      /* Last used index value we have signalled on */
>>>>>>      uint16_t signalled_used;
>>>>>>
>>>>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>>>>>>          elem->in_sg[i] = iov[out_num + i];
>>>>>>      }
>>>>>>
>>>>>> +    /* Assign sequence index for in-order processing */
>>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>>>> +    }
>>>>>> +
>>>>>>      vq->inuse++;
>>>>>>
>>>>>>      trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
>>>>>>      vq->shadow_avail_idx = vq->last_avail_idx;
>>>>>>      vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>>>>>>
>>>>>> +    /* Assign sequence index for in-order processing */
>>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
>>>>>> +        elem->seq_idx = vq->current_seq_idx++;
>>>>>> +    }
>>>>>> +
>>>>>>      trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>>>>>>  done:
>>>>>>      address_space_cache_destroy(&indirect_desc_cache);
>>>>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
>>>>>>      vdev->vq[i].notification = true;
>>>>>>      vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
>>>>>>      vdev->vq[i].inuse = 0;
>>>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>>>      virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
>>>>>>  }
>>>>>>
>>>>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>>>>>>      vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
>>>>>>      vdev->vq[i].handle_output = handle_output;
>>>>>>      vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
>>>>>> +    vdev->vq[i].used_seq_idx = 0;
>>>>>> +    vdev->vq[i].current_seq_idx = 0;
>>>>>>
>>>>>>      return &vdev->vq[i];
>>>>>>  }
>>>>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
>>>>>> index b3c74a1bca..910b2a3427 100644
>>>>>> --- a/include/hw/virtio/virtio.h
>>>>>> +++ b/include/hw/virtio/virtio.h
>>>>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
>>>>>>      hwaddr *out_addr;
>>>>>>      struct iovec *in_sg;
>>>>>>      struct iovec *out_sg;
>>>>>> +    uint16_t seq_idx;
>>>>>>  } VirtQueueElement;
>>>>>>
>>>>>>  #define VIRTIO_QUEUE_MAX 1024
>>>>>> --
>>>>>> 2.39.3
>>>>>>
>>>>>
>>>>
>>>
>>
>