From: Jonah Palmer <jonah.palmer@oracle.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, raphael@enfabrica.net,
kwolf@redhat.com, hreitz@redhat.com, jasowang@redhat.com,
pbonzini@redhat.com, fam@euphon.net, stefanha@redhat.com,
qemu-block@nongnu.org, schalla@marvell.com, leiyang@redhat.com,
virtio-fs@lists.linux.dev, si-wei.liu@oracle.com,
boris.ostrovsky@oracle.com
Subject: Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support
Date: Mon, 25 Mar 2024 12:52:16 -0400 [thread overview]
Message-ID: <5b8494df-c15c-4d18-ad4a-74b5584429ce@oracle.com> (raw)
In-Reply-To: <CAJaqyWf-oS_Y7EgO7DrVxMj5Roe=yjbbU3tka=Yj3St1ALCvnw@mail.gmail.com>
On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>>
>> The goal of these patches is to add support to a variety of virtio and
>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
>> indicates that all buffers are used by the device in the same order in
>> which they were made available by the driver.
>>
>> These patches attempt to implement a generalized, non-device-specific
>> solution to support this feature.
>>
>> The core feature behind this solution is a buffer mechanism in the form
>> of GLib's GHashTable. The decision behind using a hash table was to
>> leverage their ability for quick lookup, insertion, and removal
>> operations. Given that our keys are simply numbers of an ordered
>> sequence, a hash table seemed like the best choice for a buffer
>> mechanism.
>>
>> ---------------------
>>
>> The strategy behind this implementation is as follows:
>>
>> We know that buffers that are popped from the available ring and enqueued
>> for further processing will always done in the same order in which they
>> were made available by the driver. Given this, we can note their order
>> by assigning the resulting VirtQueueElement a key. This key is a number
>> in a sequence that represents the order in which they were popped from
>> the available ring, relative to the other VirtQueueElements.
>>
>> For example, given 3 "elements" that were popped from the available
>> ring, we assign a key value to them which represents their order (elem0
>> is popped first, then elem1, then lastly elem2):
>>
>> elem2 -- elem1 -- elem0 ---> Enqueue for processing
>> (key: 2) (key: 1) (key: 0)
>>
>> Then these elements are enqueued for further processing by the host.
>>
>> While most devices will return these completed elements in the same
>> order in which they were enqueued, some devices may not (e.g.
>> virtio-blk). To guarantee that these elements are put on the used ring
>> in the same order in which they were enqueued, we can use a buffering
>> mechanism that keeps track of the next expected sequence number of an
>> element.
>>
>> In other words, if the completed element does not have a key value that
>> matches the next expected sequence number, then we know this element is
>> not in-order and we must stash it away in a hash table until an order
>> can be made. The element's key value is used as the key for placing it
>> in the hash table.
>>
>> If the completed element has a key value that matches the next expected
>> sequence number, then we know this element is in-order and we can push
>> it on the used ring. Then we increment the next expected sequence number
>> and check if the hash table contains an element at this key location.
>>
>> If so, we retrieve this element, push it to the used ring, delete the
>> key-value pair from the hash table, increment the next expected sequence
>> number, and check the hash table again for an element at this new key
>> location. This process is repeated until we're unable to find an element
>> in the hash table to continue the order.
>>
>> So, for example, say the 3 elements we enqueued were completed in the
>> following order: elem1, elem2, elem0. The next expected sequence number
>> is 0:
>>
>> exp-seq-num = 0:
>>
>> elem1 --> elem1.key == exp-seq-num ? --> No, stash it
>> (key: 1) |
>> |
>> v
>> ================
>> |key: 1 - elem1|
>> ================
>> ---------------------
>> exp-seq-num = 0:
>>
>> elem2 --> elem2.key == exp-seq-num ? --> No, stash it
>> (key: 2) |
>> |
>> v
>> ================
>> |key: 1 - elem1|
>> |--------------|
>> |key: 2 - elem2|
>> ================
>> ---------------------
>> exp-seq-num = 0:
>>
>> elem0 --> elem0.key == exp-seq-num ? --> Yes, push to used ring
>> (key: 0)
>>
>> exp-seq-num = 1:
>>
>> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>> remove elem from table
>> |
>> v
>> ================
>> |key: 2 - elem2|
>> ================
>>
>> exp-seq-num = 2:
>>
>> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>> remove elem from table
>> |
>> v
>> ================
>> | *empty* |
>> ================
>>
>> exp-seq-num = 3:
>>
>> lookup(table, exp-seq-num) != NULL ? --> No, done
>> ---------------------
>>
>
> I think to use a hashtable to handle this has an important drawback:
> it hurts performance on the devices that are using right in-order
> because of hash calculus, to benefit devices that are using it badly
> by using descriptors out of order. We should use data structs that are
> as free as possible for the first, and we don't care to worse the
> experience of the devices that enable in_order and they shouldn't.
>
Right, because if descriptors are coming in in-order, we still search
the (empty) hash table.
Hmm... what if we introduced a flag to see if we actually should bother
searching the hash table? That way we avoid the cost of searching when
we really don't need to.
> So I suggest reusing vq->used_elems array vq. At each used descriptor
> written in the used ring, you know the next head is elem->index +
> elem->ndescs, so you can check if that element has been filled or not.
> If used, it needs to be flushed too. If not used, just return.
>
> Of course virtqueue_flush also needs to take this into account.
>
> What do you think, does it make sense to you?
>
I'm having a bit of trouble understanding the suggestion here. Would you
mind elaborating a bit more for me on this?
For example, say elem0, elem1, and elem2 were enqueued in-order (elem0
being first, elem2 last) and then elem2 finishes first, elem1 second,
and elem0 third. Given that these elements finish out-of-order, how
would you handle these out-of-order elements using your suggestion?
Thanks :)
> Thanks!
>
>
>> Jonah Palmer (8):
>> virtio: Define InOrderVQElement
>> virtio: Create/destroy/reset VirtQueue In-Order hash table
>> virtio: Define order variables
>> virtio: Implement in-order handling for virtio devices
>> virtio-net: in-order handling
>> vhost-svq: in-order handling
>> vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
>> virtio: Add VIRTIO_F_IN_ORDER property definition
>>
>> hw/block/vhost-user-blk.c | 1 +
>> hw/net/vhost_net.c | 2 +
>> hw/net/virtio-net.c | 6 +-
>> hw/scsi/vhost-scsi.c | 1 +
>> hw/scsi/vhost-user-scsi.c | 1 +
>> hw/virtio/vhost-shadow-virtqueue.c | 15 ++++-
>> hw/virtio/vhost-user-fs.c | 1 +
>> hw/virtio/vhost-user-vsock.c | 1 +
>> hw/virtio/virtio.c | 103 ++++++++++++++++++++++++++++-
>> include/hw/virtio/virtio.h | 20 +++++-
>> net/vhost-vdpa.c | 1 +
>> 11 files changed, 145 insertions(+), 7 deletions(-)
>>
>> --
>> 2.39.3
>>
>
next prev parent reply other threads:[~2024-03-25 16:53 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-21 15:57 [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
2024-03-21 15:57 ` [RFC 1/8] virtio: Define InOrderVQElement Jonah Palmer
2024-03-22 9:45 ` Eugenio Perez Martin
2024-03-25 17:08 ` Jonah Palmer
2024-03-25 19:12 ` Eugenio Perez Martin
2024-03-21 15:57 ` [RFC 2/8] virtio: Create/destroy/reset VirtQueue In-Order hash table Jonah Palmer
2024-03-21 15:57 ` [RFC 3/8] virtio: Define order variables Jonah Palmer
2024-03-21 15:57 ` [RFC 4/8] virtio: Implement in-order handling for virtio devices Jonah Palmer
2024-03-22 10:46 ` Eugenio Perez Martin
2024-03-25 17:34 ` Jonah Palmer
2024-03-25 19:45 ` Eugenio Perez Martin
2024-03-21 15:57 ` [RFC 5/8] virtio-net: in-order handling Jonah Palmer
2024-03-21 15:57 ` [RFC 6/8] vhost-svq: " Jonah Palmer
2024-03-21 15:57 ` [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
2024-03-22 10:47 ` Eugenio Perez Martin
2024-03-21 15:57 ` [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
2024-03-22 10:48 ` Eugenio Perez Martin
2024-03-21 19:48 ` [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support Dongli Zhang
2024-03-21 21:25 ` Jonah Palmer
2024-03-22 11:18 ` Eugenio Perez Martin
2024-03-25 16:52 ` Jonah Palmer [this message]
2024-03-25 20:33 ` Eugenio Perez Martin
2024-03-26 16:49 ` Jonah Palmer
2024-03-26 18:34 ` Eugenio Perez Martin
2024-03-26 19:01 ` Jonah Palmer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5b8494df-c15c-4d18-ad4a-74b5584429ce@oracle.com \
--to=jonah.palmer@oracle.com \
--cc=boris.ostrovsky@oracle.com \
--cc=eperezma@redhat.com \
--cc=fam@euphon.net \
--cc=hreitz@redhat.com \
--cc=jasowang@redhat.com \
--cc=kwolf@redhat.com \
--cc=leiyang@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=raphael@enfabrica.net \
--cc=schalla@marvell.com \
--cc=si-wei.liu@oracle.com \
--cc=stefanha@redhat.com \
--cc=virtio-fs@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).