From: Tiwei Bie <tiwei.bie@intel.com>
To: Jason Wang <jasowang@redhat.com>
Cc: mst@redhat.com, virtualization@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
wexu@redhat.com, jfreimann@redhat.com
Subject: Re: [RFC v4 3/5] virtio_ring: add packed ring support
Date: Sat, 19 May 2018 10:29:38 +0800 [thread overview]
Message-ID: <20180519022938.GA18888@debian> (raw)
In-Reply-To: <1a661df0-8ca9-b31d-9c17-8684d608a33a@redhat.com>
On Sat, May 19, 2018 at 09:12:30AM +0800, Jason Wang wrote:
> On 2018年05月18日 22:33, Tiwei Bie wrote:
> > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:
> > > On 2018年05月18日 19:29, Tiwei Bie wrote:
> > > > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 22:33, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > > > > > [...]
> > > > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > > > + unsigned int id, void **ctx)
> > > > > > > > > > > > +{
> > > > > > > > > > > > + struct vring_packed_desc *desc;
> > > > > > > > > > > > + unsigned int i, j;
> > > > > > > > > > > > +
> > > > > > > > > > > > + /* Clear data ptr. */
> > > > > > > > > > > > + vq->desc_state[id].data = NULL;
> > > > > > > > > > > > +
> > > > > > > > > > > > + i = head;
> > > > > > > > > > > > +
> > > > > > > > > > > > + for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > > > > > + desc = &vq->vring_packed.desc[i];
> > > > > > > > > > > > + vring_unmap_one_packed(vq, desc);
> > > > > > > > > > > As mentioned in previous discussion, this probably won't work for the case
> > > > > > > > > > > of out of order completion since it depends on the information in the
> > > > > > > > > > > descriptor ring. We probably need to extend ctx to record such information.
> > > > > > > > > > Above code doesn't depend on the information in the descriptor
> > > > > > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > > Tiwei Bie
> > > > > > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > > > > > vring_unmap_one_packed() still depends on the content of descriptor ring?
> > > > > > > > >
> > > > > > > > I got your point now. I think it makes sense to reserve
> > > > > > > > the bits of the addr field. Driver shouldn't try to get
> > > > > > > > addrs from the descriptors when cleanup the descriptors
> > > > > > > > no matter whether we support out-of-order or not.
> > > > > > > Maybe I was wrong, but I remember spec mentioned something like this.
> > > > > > You're right. Spec mentioned this. I was just repeating
> > > > > > the spec to emphasize that it does make sense. :)
> > > > > >
> > > > > > > > But combining it with the out-of-order support, it will
> > > > > > > > mean that the driver still needs to maintain a desc/ctx
> > > > > > > > list that is very similar to the desc ring in the split
> > > > > > > > ring. I'm not quite sure whether it's something we want.
> > > > > > > > If it is true, I'll do it. So do you think we also want
> > > > > > > > to maintain such a desc/ctx list for packed ring?
> > > > > > > To make it work for OOO backends I think we need something like this
> > > > > > > (hardware NIC drivers are usually have something like this).
> > > > > > Which hardware NIC drivers have this?
> > > > > It's quite common I think, e.g driver track e.g dma addr and page frag
> > > > > somewhere. e.g the ring->rx_info in mlx4 driver.
> > > > It seems that I had a misunderstanding on your
> > > > previous comments. I know it's quite common for
> > > > drivers to track e.g. DMA addrs somewhere (and
> > > > I think one reason behind this is that they want
> > > > to reuse the bits of addr field).
> > > Yes, we may want this for virtio-net as well in the future.
> > >
> > > > But tracking
> > > > addrs somewhere doesn't means supporting OOO.
> > > > I thought you were saying it's quite common for
> > > > hardware NIC drivers to support OOO (i.e. NICs
> > > > will return the descriptors OOO):
> > > >
> > > > I'm not familiar with mlx4, maybe I'm wrong.
> > > > I just had a quick glance. And I found below
> > > > comments in mlx4_en_process_rx_cq():
> > > >
> > > > ```
> > > > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> > > > * descriptor offset can be deduced from the CQE index instead of
> > > > * reading 'cqe->index' */
> > > > index = cq->mcq.cons_index & ring->size_mask;
> > > > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
> > > > ```
> > > >
> > > > It seems that although they have a completion
> > > > queue, they are still using the ring in order.
> > > I guess so (at least from the above bits). Git grep -i "out of order" in
> > > drivers/net gives some hints. Looks like there're few deivces do this.
> > >
> > > > I guess maybe storage device may want OOO.
> > > Right, some iSCSI did.
> > >
> > > But tracking them elsewhere is not only for OOO.
> > >
> > > Spec said:
> > >
> > > for element address
> > >
> > > "
> > > In a used descriptor, Element Address is unused.
> > > "
> > >
> > > for Next flag:
> > >
> > > "
> > > For example, if descriptors are used in the same order in which they are
> > > made available, this will result in
> > > the used descriptor overwriting the first available descriptor in the list,
> > > the used descriptor for the next list
> > > overwriting the first available descriptor in the next list, etc.
> > > "
> > >
> > > for in order completion:
> > >
> > > "
> > > This will result in the used descriptor overwriting the first available
> > > descriptor in the batch, the used descriptor
> > > for the next batch overwriting the first available descriptor in the next
> > > batch, etc.
> > > "
> > >
> > > So:
> > >
> > > - It's an alignment to the spec
> > > - device may (or should) overwrite the descriptor make also make address
> > > field useless.
> > You didn't get my point...
>
> I don't hope so.
>
> > I agreed driver should track the DMA addrs or some
> > other necessary things from the very beginning. And
> > I also repeated the spec to emphasize that it does
> > make sense. And I'd like to do that.
> >
> > What I was saying is that, to support OOO, we may
> > need to manage these context (which saves DMA addrs
> > etc) via a list which is similar to the desc list
> > maintained via `next` in split ring instead of an
> > array whose elements always can be indexed directly.
>
> My point is these context is a must (not only for OOO).
Yeah, and I have the exactly same point after you
pointed that I shouldn't get the addrs from descs.
I do think it makes sense. I'll do it in the next
version. I don't have any doubt about it. All my
questions are about the OOO, instead of whether we
should save context or not. It just seems that you
thought I don't want to do it, and were trying to
convince me that I should do it.
>
> >
> > The desc ring in split ring is an array, but its
> > free entries are managed as list via next. I was
> > just wondering, do we want to manage such a list
> > because of OOO. It's just a very simple question
> > that I want to hear your opinion... (It doesn't
> > means anything, e.g. It doesn't mean I don't want
> > to support OOO. It's just a simple question...)
>
> So the question is yes. But I admit I don't have better idea other than what
> you propose here (something like split ring which is a little bit sad).
> Maybe Michael had.
Yeah, that's why I asked this question. It will
make the packed ring a bit similar to split ring
at least in the driver part. So I want to draw
your attention on this to make sure that we're
on the same page.
Best regards,
Tiwei Bie
>
> Thanks
>
> >
> > Best regards,
> > Tiwei Bie
> >
> > > Thanks
> > >
> > > > Best regards,
> > > > Tiwei Bie
> > > >
> > > > > Thanks
> > > > >
> > > > > > > Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is
> > > > > > > much more simpler to be started with.
> > > > > > +1
> > > > > >
> > > > > > Best regards,
> > > > > > Tiwei Bie
>
next prev parent reply other threads:[~2018-05-19 2:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-16 8:37 [RFC v4 0/5] virtio: support packed ring Tiwei Bie
2018-05-16 8:37 ` [RFC v4 1/5] virtio: add packed ring definitions Tiwei Bie
2018-05-16 8:37 ` [RFC v4 2/5] virtio_ring: support creating packed ring Tiwei Bie
2018-05-16 8:37 ` [RFC v4 3/5] virtio_ring: add packed ring support Tiwei Bie
2018-05-16 11:50 ` Jason Wang
2018-05-16 12:39 ` Tiwei Bie
2018-05-16 12:51 ` Jason Wang
2018-05-16 13:45 ` Tiwei Bie
2018-05-16 14:05 ` Jason Wang
2018-05-16 14:33 ` Tiwei Bie
2018-05-17 12:01 ` Jason Wang
2018-05-18 11:29 ` Tiwei Bie
2018-05-18 13:17 ` Jason Wang
2018-05-18 14:33 ` Tiwei Bie
2018-05-19 1:12 ` Jason Wang
2018-05-19 2:29 ` Tiwei Bie [this message]
2018-05-21 2:30 ` Jason Wang
2018-05-21 2:39 ` Tiwei Bie
2018-05-16 8:37 ` [RFC v4 4/5] virtio_ring: add event idx support in packed ring Tiwei Bie
2018-05-16 12:17 ` Jason Wang
2018-05-16 12:58 ` Tiwei Bie
2018-05-16 13:31 ` Jason Wang
2018-05-16 8:37 ` [RFC v4 5/5] virtio_ring: enable " Tiwei Bie
2018-05-16 10:15 ` Sergei Shtylyov
2018-05-16 10:21 ` Tiwei Bie
2018-05-16 11:42 ` Sergei Shtylyov
2018-05-16 12:26 ` Tiwei Bie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180519022938.GA18888@debian \
--to=tiwei.bie@intel.com \
--cc=jasowang@redhat.com \
--cc=jfreimann@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=virtualization@lists.linux-foundation.org \
--cc=wexu@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).