All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH 4/6] vhost: add Tx zero copy
Date: Tue, 23 Aug 2016 22:31:47 +0800	[thread overview]
Message-ID: <20160823143147.GO30752@yliu-dev.sh.intel.com> (raw)
In-Reply-To: <8043aa6f-baa6-3680-dc07-0e535f1b9b2b@redhat.com>

On Tue, Aug 23, 2016 at 04:04:30PM +0200, Maxime Coquelin wrote:
> 
> 
> On 08/23/2016 10:10 AM, Yuanhan Liu wrote:
> >The basic idea of Tx zero copy is, instead of copying data from the
> >desc buf, here we let the mbuf reference the desc buf addr directly.
> >
> >Doing so, however, has one major issue: we can't update the used ring
> >at the end of rte_vhost_dequeue_burst. Because we don't do the copy
> >here, an update of the used ring would let the driver to reclaim the
> >desc buf. As a result, DPDK might reference a stale memory region.
> >
> >To update the used ring properly, this patch does several tricks:
> >
> >- when mbuf references a desc buf, refcnt is added by 1.
> >
> >  This is to pin lock the mbuf, so that a mbuf free from the DPDK
> >  won't actually free it, instead, refcnt is subtracted by 1.
> >
> >- We chain all those mbuf together (by tailq)
> >
> >  And we check it every time on the rte_vhost_dequeue_burst entrance,
> >  to see if the mbuf is freed (when refcnt equals to 1). If that
> >  happens, it means we are the last user of this mbuf and we are
> >  safe to update the used ring.
> >
> >- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
> >  right desc idx.
> >
> >Tx zero copy is introduced for performance reason, and some rough tests
> >show about 40% perfomance boost for packet size 1400B. FOr small packets,
> >(e.g. 64B), it actually slows a bit down. That is expected because this
> >patch introduces some extra works, and it outweighs the benefit from
> >saving few bytes copy.
> >
> >Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >---
> > lib/librte_vhost/vhost.c      |   2 +
> > lib/librte_vhost/vhost.h      |  21 ++++++
> > lib/librte_vhost/vhost_user.c |  41 +++++++++-
> > lib/librte_vhost/virtio_net.c | 169 +++++++++++++++++++++++++++++++++++++-----
> > 4 files changed, 214 insertions(+), 19 deletions(-)
> >
> ...
> 
> > rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
> >@@ -823,6 +943,30 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > 	if (unlikely(vq->enabled == 0))
> > 		return 0;
> >
> >+	if (dev->tx_zero_copy) {
> >+		struct zcopy_mbuf *zmbuf, *next;
> >+		int nr_updated = 0;
> >+
> >+		for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
> >+		     zmbuf != NULL; zmbuf = next) {
> >+			next = TAILQ_NEXT(zmbuf, next);
> >+
> >+			if (mbuf_is_consumed(zmbuf->mbuf)) {
> >+				used_idx = vq->last_used_idx++ & (vq->size - 1);
> >+				update_used_ring(dev, vq, used_idx,
> >+						 zmbuf->desc_idx);
> >+				nr_updated += 1;
> >+
> >+				TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> >+				rte_pktmbuf_free(zmbuf->mbuf);
> >+				put_zmbuf(zmbuf);
> >+				vq->nr_zmbuf -= 1;
> >+			}
> Shouldn't you break the loop here as soon as a mbuf is not consumed?

I have thought of that as well, as a micro optimization. But I was
wondering what if a heading mbuf is pin locked by the DPDK APP? Then
the whole chain would be blocked. This should be rare, but I think
we should think of the worst case.

Besides that, the performance boost I got is quite decent, that I think
we could drop this micro optimization.

> Indeed, they might not be consumed sequentially, and would cause
> last_used_idx to be incremented whereas it shouldn't.

I think the out of order used vring update won't be an issue here.
Well, there might be some problems for reconnect. The trick the
commit 0823c1cb0a73 ("vhost: workaround stale vring base") introduced
assumes that used vring will always be updated in order.

	--yliu

  reply	other threads:[~2016-08-23 14:22 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-23  8:10 [PATCH 0/6] vhost: add Tx zero copy support Yuanhan Liu
2016-08-23  8:10 ` [PATCH 1/6] vhost: simplify memory regions handling Yuanhan Liu
2016-08-23  9:17   ` Maxime Coquelin
2016-08-24  7:26   ` Xu, Qian Q
2016-08-24  7:40     ` Yuanhan Liu
2016-08-24  7:36       ` Xu, Qian Q
2016-08-23  8:10 ` [PATCH 2/6] vhost: get guest/host physical address mappings Yuanhan Liu
2016-08-23  9:58   ` Maxime Coquelin
2016-08-23 12:32     ` Yuanhan Liu
2016-08-23 13:25       ` Maxime Coquelin
2016-08-23 13:49         ` Yuanhan Liu
2016-08-23 14:05           ` Maxime Coquelin
2016-08-23  8:10 ` [PATCH 3/6] vhost: introduce last avail idx for Tx Yuanhan Liu
2016-08-23 12:27   ` Maxime Coquelin
2016-08-23  8:10 ` [PATCH 4/6] vhost: add Tx zero copy Yuanhan Liu
2016-08-23 14:04   ` Maxime Coquelin
2016-08-23 14:31     ` Yuanhan Liu [this message]
2016-08-23 15:40       ` Maxime Coquelin
2016-08-23  8:10 ` [PATCH 5/6] vhost: add a flag to enable " Yuanhan Liu
2016-09-06  9:00   ` Xu, Qian Q
2016-09-06  9:42     ` Xu, Qian Q
2016-09-06 10:02       ` Yuanhan Liu
2016-09-07  2:43         ` Xu, Qian Q
2016-09-06  9:55     ` Yuanhan Liu
2016-09-07 16:00       ` Thomas Monjalon
2016-09-08  7:21         ` Yuanhan Liu
2016-09-08  7:57           ` Thomas Monjalon
2016-08-23  8:10 ` [PATCH 6/6] examples/vhost: add an option " Yuanhan Liu
2016-08-23  9:31   ` Thomas Monjalon
2016-08-23 12:33     ` Yuanhan Liu
2016-08-23 14:14   ` Maxime Coquelin
2016-08-23 14:45     ` Yuanhan Liu
2016-08-23 14:18 ` [PATCH 0/6] vhost: add Tx zero copy support Maxime Coquelin
2016-08-23 14:42   ` Yuanhan Liu
2016-08-23 14:53     ` Yuanhan Liu
2016-08-23 16:41       ` Maxime Coquelin
2016-08-29  8:32 ` Xu, Qian Q
2016-08-29  8:57   ` Xu, Qian Q
2016-09-23  4:11     ` Yuanhan Liu
2016-10-09 15:20   ` Yuanhan Liu
2016-09-23  4:13 ` [PATCH v2 0/7] vhost: add dequeue " Yuanhan Liu
2016-09-23  4:13   ` [PATCH v2 1/7] vhost: simplify memory regions handling Yuanhan Liu
2016-09-23  4:13   ` [PATCH v2 2/7] vhost: get guest/host physical address mappings Yuanhan Liu
2016-09-26 20:17     ` Maxime Coquelin
2016-09-23  4:13   ` [PATCH v2 3/7] vhost: introduce last avail idx for dequeue Yuanhan Liu
2016-09-23  4:13   ` [PATCH v2 4/7] vhost: add dequeue zero copy Yuanhan Liu
2016-09-26 20:45     ` Maxime Coquelin
2016-10-06 14:37     ` Xu, Qian Q
2016-10-09  2:03       ` Yuanhan Liu
2016-10-10 10:12         ` Xu, Qian Q
2016-10-10 10:14           ` Maxime Coquelin
2016-10-10 10:22             ` Xu, Qian Q
2016-10-10 10:40               ` Xu, Qian Q
2016-10-10 11:48               ` Maxime Coquelin
2016-09-23  4:13   ` [PATCH v2 5/7] vhost: add a flag to enable " Yuanhan Liu
2016-09-26 20:57     ` Maxime Coquelin
2016-09-23  4:13   ` [PATCH v2 6/7] examples/vhost: add an option " Yuanhan Liu
2016-09-26 21:05     ` Maxime Coquelin
2016-09-23  4:13   ` [PATCH v2 7/7] net/vhost: " Yuanhan Liu
2016-09-26 21:05     ` Maxime Coquelin
2016-10-09  7:27   ` [PATCH v3 0/7] vhost: add dequeue zero copy support Yuanhan Liu
2016-10-09  7:27     ` [PATCH v3 1/7] vhost: simplify memory regions handling Yuanhan Liu
2016-10-09  7:27     ` [PATCH v3 2/7] vhost: get guest/host physical address mappings Yuanhan Liu
2016-11-29  3:10       ` linhaifeng
2016-11-29 13:14       ` linhaifeng
2016-10-09  7:27     ` [PATCH v3 3/7] vhost: introduce last avail idx for dequeue Yuanhan Liu
2016-10-09  7:27     ` [PATCH v3 4/7] vhost: add dequeue zero copy Yuanhan Liu
2016-10-09  7:27     ` [PATCH v3 5/7] vhost: add a flag to enable " Yuanhan Liu
2016-10-09  7:27     ` [PATCH v3 6/7] examples/vhost: add an option " Yuanhan Liu
2016-10-09  7:28     ` [PATCH v3 7/7] net/vhost: " Yuanhan Liu
2016-10-11 13:04     ` [PATCH v3 0/7] vhost: add dequeue zero copy support Xu, Qian Q
2016-10-12  7:48     ` Yuanhan Liu
2016-10-09 10:46 ` [PATCH 0/6] vhost: add Tx " linhaifeng
2016-10-10  8:03   ` Yuanhan Liu
2016-10-14  7:30     ` linhaifeng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160823143147.GO30752@yliu-dev.sh.intel.com \
    --to=yuanhan.liu@linux.intel.com \
    --cc=dev@dpdk.org \
    --cc=maxime.coquelin@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.