netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tiwei Bie <tiwei.bie@intel.com>
To: Jason Wang <jasowang@redhat.com>
Cc: mst@redhat.com, virtualization@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	wexu@redhat.com, jfreimann@redhat.com
Subject: Re: [RFC v4 3/5] virtio_ring: add packed ring support
Date: Sat, 19 May 2018 10:29:38 +0800	[thread overview]
Message-ID: <20180519022938.GA18888@debian> (raw)
In-Reply-To: <1a661df0-8ca9-b31d-9c17-8684d608a33a@redhat.com>

On Sat, May 19, 2018 at 09:12:30AM +0800, Jason Wang wrote:
> On 2018年05月18日 22:33, Tiwei Bie wrote:
> > On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:
> > > On 2018年05月18日 19:29, Tiwei Bie wrote:
> > > > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 22:33, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > > > > > [...]
> > > > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > > > +			      unsigned int id, void **ctx)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +	struct vring_packed_desc *desc;
> > > > > > > > > > > > +	unsigned int i, j;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	/* Clear data ptr. */
> > > > > > > > > > > > +	vq->desc_state[id].data = NULL;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	i = head;
> > > > > > > > > > > > +
> > > > > > > > > > > > +	for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > > > > > +		desc = &vq->vring_packed.desc[i];
> > > > > > > > > > > > +		vring_unmap_one_packed(vq, desc);
> > > > > > > > > > > As mentioned in previous discussion, this probably won't work for the case
> > > > > > > > > > > of out of order completion since it depends on the information in the
> > > > > > > > > > > descriptor ring. We probably need to extend ctx to record such information.
> > > > > > > > > > Above code doesn't depend on the information in the descriptor
> > > > > > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > > > > > 
> > > > > > > > > > Best regards,
> > > > > > > > > > Tiwei Bie
> > > > > > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > > > > > vring_unmap_one_packed() still depends on the content of descriptor ring?
> > > > > > > > > 
> > > > > > > > I got your point now. I think it makes sense to reserve
> > > > > > > > the bits of the addr field. Driver shouldn't try to get
> > > > > > > > addrs from the descriptors when cleanup the descriptors
> > > > > > > > no matter whether we support out-of-order or not.
> > > > > > > Maybe I was wrong, but I remember spec mentioned something like this.
> > > > > > You're right. Spec mentioned this. I was just repeating
> > > > > > the spec to emphasize that it does make sense. :)
> > > > > > 
> > > > > > > > But combining it with the out-of-order support, it will
> > > > > > > > mean that the driver still needs to maintain a desc/ctx
> > > > > > > > list that is very similar to the desc ring in the split
> > > > > > > > ring. I'm not quite sure whether it's something we want.
> > > > > > > > If it is true, I'll do it. So do you think we also want
> > > > > > > > to maintain such a desc/ctx list for packed ring?
> > > > > > > To make it work for OOO backends I think we need something like this
> > > > > > > (hardware NIC drivers are usually have something like this).
> > > > > > Which hardware NIC drivers have this?
> > > > > It's quite common I think, e.g driver track e.g dma addr and page frag
> > > > > somewhere. e.g the ring->rx_info in mlx4 driver.
> > > > It seems that I had a misunderstanding on your
> > > > previous comments. I know it's quite common for
> > > > drivers to track e.g. DMA addrs somewhere (and
> > > > I think one reason behind this is that they want
> > > > to reuse the bits of addr field).
> > > Yes, we may want this for virtio-net as well in the future.
> > > 
> > > >    But tracking
> > > > addrs somewhere doesn't means supporting OOO.
> > > > I thought you were saying it's quite common for
> > > > hardware NIC drivers to support OOO (i.e. NICs
> > > > will return the descriptors OOO):
> > > > 
> > > > I'm not familiar with mlx4, maybe I'm wrong.
> > > > I just had a quick glance. And I found below
> > > > comments in mlx4_en_process_rx_cq():
> > > > 
> > > > ```
> > > > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> > > >    * descriptor offset can be deduced from the CQE index instead of
> > > >    * reading 'cqe->index' */
> > > > index = cq->mcq.cons_index & ring->size_mask;
> > > > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
> > > > ```
> > > > 
> > > > It seems that although they have a completion
> > > > queue, they are still using the ring in order.
> > > I guess so (at least from the above bits). Git grep -i "out of order" in
> > > drivers/net gives some hints. Looks like there're few deivces do this.
> > > 
> > > > I guess maybe storage device may want OOO.
> > > Right, some iSCSI did.
> > > 
> > > But tracking them elsewhere is not only for OOO.
> > > 
> > > Spec said:
> > > 
> > > for element address
> > > 
> > > "
> > > In a used descriptor, Element Address is unused.
> > > "
> > > 
> > > for Next flag:
> > > 
> > > "
> > > For example, if descriptors are used in the same order in which they are
> > > made available, this will result in
> > > the used descriptor overwriting the first available descriptor in the list,
> > > the used descriptor for the next list
> > > overwriting the first available descriptor in the next list, etc.
> > > "
> > > 
> > > for in order completion:
> > > 
> > > "
> > > This will result in the used descriptor overwriting the first available
> > > descriptor in the batch, the used descriptor
> > > for the next batch overwriting the first available descriptor in the next
> > > batch, etc.
> > > "
> > > 
> > > So:
> > > 
> > > - It's an alignment to the spec
> > > - device may (or should) overwrite the descriptor make also make address
> > > field useless.
> > You didn't get my point...
> 
> I don't hope so.
> 
> > I agreed driver should track the DMA addrs or some
> > other necessary things from the very beginning. And
> > I also repeated the spec to emphasize that it does
> > make sense. And I'd like to do that.
> > 
> > What I was saying is that, to support OOO, we may
> > need to manage these context (which saves DMA addrs
> > etc) via a list which is similar to the desc list
> > maintained via `next` in split ring instead of an
> > array whose elements always can be indexed directly.
> 
> My point is these context is a must (not only for OOO).

Yeah, and I have the exactly same point after you
pointed that I shouldn't get the addrs from descs.
I do think it makes sense. I'll do it in the next
version. I don't have any doubt about it. All my
questions are about the OOO, instead of whether we
should save context or not. It just seems that you
thought I don't want to do it, and were trying to
convince me that I should do it.

> 
> > 
> > The desc ring in split ring is an array, but its
> > free entries are managed as list via next. I was
> > just wondering, do we want to manage such a list
> > because of OOO. It's just a very simple question
> > that I want to hear your opinion... (It doesn't
> > means anything, e.g. It doesn't mean I don't want
> > to support OOO. It's just a simple question...)
> 
> So the question is yes. But I admit I don't have better idea other than what
> you propose here (something like split ring which is a little bit sad).
> Maybe Michael had.

Yeah, that's why I asked this question. It will
make the packed ring a bit similar to split ring
at least in the driver part. So I want to draw
your attention on this to make sure that we're
on the same page.

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > > Thanks
> > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > > 
> > > > > Thanks
> > > > > 
> > > > > > > Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is
> > > > > > > much more simpler to be started with.
> > > > > > +1
> > > > > > 
> > > > > > Best regards,
> > > > > > Tiwei Bie
> 

  reply	other threads:[~2018-05-19  2:29 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-16  8:37 [RFC v4 0/5] virtio: support packed ring Tiwei Bie
2018-05-16  8:37 ` [RFC v4 1/5] virtio: add packed ring definitions Tiwei Bie
2018-05-16  8:37 ` [RFC v4 2/5] virtio_ring: support creating packed ring Tiwei Bie
2018-05-16  8:37 ` [RFC v4 3/5] virtio_ring: add packed ring support Tiwei Bie
2018-05-16 11:50   ` Jason Wang
2018-05-16 12:39     ` Tiwei Bie
2018-05-16 12:51       ` Jason Wang
2018-05-16 13:45         ` Tiwei Bie
2018-05-16 14:05           ` Jason Wang
2018-05-16 14:33             ` Tiwei Bie
2018-05-17 12:01               ` Jason Wang
2018-05-18 11:29                 ` Tiwei Bie
2018-05-18 13:17                   ` Jason Wang
2018-05-18 14:33                     ` Tiwei Bie
2018-05-19  1:12                       ` Jason Wang
2018-05-19  2:29                         ` Tiwei Bie [this message]
2018-05-21  2:30                           ` Jason Wang
2018-05-21  2:39                             ` Tiwei Bie
2018-05-16  8:37 ` [RFC v4 4/5] virtio_ring: add event idx support in packed ring Tiwei Bie
2018-05-16 12:17   ` Jason Wang
2018-05-16 12:58     ` Tiwei Bie
2018-05-16 13:31       ` Jason Wang
2018-05-16  8:37 ` [RFC v4 5/5] virtio_ring: enable " Tiwei Bie
2018-05-16 10:15   ` Sergei Shtylyov
2018-05-16 10:21     ` Tiwei Bie
2018-05-16 11:42       ` Sergei Shtylyov
2018-05-16 12:26         ` Tiwei Bie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180519022938.GA18888@debian \
    --to=tiwei.bie@intel.com \
    --cc=jasowang@redhat.com \
    --cc=jfreimann@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=wexu@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).