All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: yuanhan.liu@linux.intel.com, huawei.xie@intel.com, dev@dpdk.org,
	vkaplans@redhat.com, stephen@networkplumber.org
Subject: Re: [PATCH v3] vhost: Add indirect descriptors support to the TX path
Date: Fri, 23 Sep 2016 18:49:18 +0300	[thread overview]
Message-ID: <20160923184310-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <1474619303-16709-1-git-send-email-maxime.coquelin@redhat.com>

On Fri, Sep 23, 2016 at 10:28:23AM +0200, Maxime Coquelin wrote:
> Indirect descriptors are usually supported by virtio-net devices,
> allowing to dispatch a larger number of requests.
> 
> When the virtio device sends a packet using indirect descriptors,
> only one slot is used in the ring, even for large packets.
> 
> The main effect is to improve the 0% packet loss benchmark.
> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> zero loss.
> 
> On the downside, micro-benchmark using testpmd txonly in VM and
> rxonly on host shows a loss between 1 and 4%.i But depending on
> the needs, feature can be disabled at VM boot time by passing
> indirect_desc=off argument to vhost-user device in Qemu.

Even better, change guest pmd to only use indirect
descriptors when this makes sense (e.g. sufficiently
large packets).

I would be very interested to know when does it make
sense.

The feature is there, it's up to guest whether to
use it.


> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> Changes since v2:
> =================
>  - Revert back to not checking feature flag to be aligned with
> kernel implementation
>  - Ensure we don't have nested indirect descriptors
>  - Ensure the indirect desc address is valid, to protect against
> malicious guests
> 
> Changes since RFC:
> =================
>  - Enrich commit message with figures
>  - Rebased on top of dpdk-next-virtio's master
>  - Add feature check to ensure we don't receive an indirect desc
> if not supported by the virtio driver
> 
>  lib/librte_vhost/vhost.c      |  3 ++-
>  lib/librte_vhost/virtio_net.c | 41 +++++++++++++++++++++++++++++++----------
>  2 files changed, 33 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index 46095c3..30bb0ce 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -65,7 +65,8 @@
>  				(1ULL << VIRTIO_NET_F_CSUM)    | \
>  				(1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
>  				(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> -				(1ULL << VIRTIO_NET_F_GUEST_TSO6))
> +				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
> +				(1ULL << VIRTIO_RING_F_INDIRECT_DESC))
>  
>  uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
>  
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 8a151af..2e0a587 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -679,8 +679,8 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
>  }
>  
>  static inline int __attribute__((always_inline))
> -copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> -		  struct rte_mbuf *m, uint16_t desc_idx,
> +copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> +		  uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>  		  struct rte_mempool *mbuf_pool)
>  {
>  	struct vring_desc *desc;
> @@ -693,8 +693,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	/* A counter to avoid desc dead loop chain */
>  	uint32_t nr_desc = 1;
>  
> -	desc = &vq->desc[desc_idx];
> -	if (unlikely(desc->len < dev->vhost_hlen))
> +	desc = &descs[desc_idx];
> +	if (unlikely((desc->len < dev->vhost_hlen)) ||
> +			(desc->flags & VRING_DESC_F_INDIRECT))
>  		return -1;
>  
>  	desc_addr = gpa_to_vva(dev, desc->addr);
> @@ -711,7 +712,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	 */
>  	if (likely((desc->len == dev->vhost_hlen) &&
>  		   (desc->flags & VRING_DESC_F_NEXT) != 0)) {
> -		desc = &vq->desc[desc->next];
> +		desc = &descs[desc->next];
> +		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> +			return -1;
>  
>  		desc_addr = gpa_to_vva(dev, desc->addr);
>  		if (unlikely(!desc_addr))


Just to make sure, does this still allow a chain of
direct descriptors ending with an indirect one?
This is legal as per spec.

> @@ -747,10 +750,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  			if ((desc->flags & VRING_DESC_F_NEXT) == 0)
>  				break;
>  
> -			if (unlikely(desc->next >= vq->size ||
> -				     ++nr_desc > vq->size))
> +			if (unlikely(desc->next >= max_desc ||
> +				     ++nr_desc > max_desc))
> +				return -1;
> +			desc = &descs[desc->next];
> +			if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>  				return -1;
> -			desc = &vq->desc[desc->next];
>  
>  			desc_addr = gpa_to_vva(dev, desc->addr);
>  			if (unlikely(!desc_addr))
> @@ -878,19 +883,35 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>  	/* Prefetch descriptor index. */
>  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
>  	for (i = 0; i < count; i++) {
> +		struct vring_desc *desc;
> +		uint16_t sz, idx;
>  		int err;
>  
>  		if (likely(i + 1 < count))
>  			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>  
> +		if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
> +			desc = (struct vring_desc *)gpa_to_vva(dev,
> +					vq->desc[desc_indexes[i]].addr);
> +			if (unlikely(!desc))
> +				break;
> +
> +			rte_prefetch0(desc);
> +			sz = vq->desc[desc_indexes[i]].len / sizeof(*desc);
> +			idx = 0;
> +		} else {
> +			desc = vq->desc;
> +			sz = vq->size;
> +			idx = desc_indexes[i];
> +		}
> +
>  		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>  		if (unlikely(pkts[i] == NULL)) {
>  			RTE_LOG(ERR, VHOST_DATA,
>  				"Failed to allocate memory for mbuf.\n");
>  			break;
>  		}
> -		err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i],
> -					mbuf_pool);
> +		err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool);
>  		if (unlikely(err)) {
>  			rte_pktmbuf_free(pkts[i]);
>  			break;
> -- 
> 2.7.4

Something that I'm missing here: it's legal for guest
to add indirect descriptors for RX.
I don't see the handling of RX here though.
I think it's required for spec compliance.

-- 
MST

  reply	other threads:[~2016-09-23 15:49 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-23  8:28 [PATCH v3] vhost: Add indirect descriptors support to the TX path Maxime Coquelin
2016-09-23 15:49 ` Michael S. Tsirkin [this message]
2016-09-23 18:02   ` Maxime Coquelin
2016-09-23 18:06     ` Michael S. Tsirkin
2016-09-23 18:16       ` Maxime Coquelin
2016-09-23 18:22         ` Michael S. Tsirkin
2016-09-23 20:24           ` Stephen Hemminger
2016-09-26  3:03             ` Yuanhan Liu
2016-09-26 12:25               ` Michael S. Tsirkin
2016-09-26 13:04                 ` Yuanhan Liu
2016-09-27  4:15 ` Yuanhan Liu
2016-09-27  7:25   ` Maxime Coquelin
2016-09-27  8:42 ` [PATCH v4] " Maxime Coquelin
2016-09-27 12:18   ` Yuanhan Liu
2016-10-14  7:24   ` Wang, Zhihong
2016-10-14  7:34     ` Wang, Zhihong
2016-10-14 15:50     ` Maxime Coquelin
2016-10-17 11:23       ` Maxime Coquelin
2016-10-17 13:21         ` Yuanhan Liu
2016-10-17 14:14           ` Maxime Coquelin
2016-10-27  9:00             ` Wang, Zhihong
2016-10-27  9:10               ` Maxime Coquelin
2016-10-27  9:55                 ` Maxime Coquelin
2016-10-27 10:19                   ` Wang, Zhihong
2016-10-28  7:32                     ` Pierre Pfister (ppfister)
2016-10-28  7:58                       ` Maxime Coquelin
2016-11-01  8:15                         ` Yuanhan Liu
2016-11-01  9:39                           ` Thomas Monjalon
2016-11-02  2:44                             ` Yuanhan Liu
2016-10-27 10:33                 ` Yuanhan Liu
2016-10-27 10:35                   ` Maxime Coquelin
2016-10-27 10:46                     ` Yuanhan Liu
2016-10-28  0:49                       ` Wang, Zhihong
2016-10-28  7:42                         ` Maxime Coquelin
2016-10-31 10:01                           ` Wang, Zhihong
2016-11-02 10:51                             ` Maxime Coquelin
2016-11-03  8:11                               ` Maxime Coquelin
2016-11-04  6:18                                 ` Xu, Qian Q
2016-11-04  7:41                                   ` Maxime Coquelin
2016-11-04  7:20                                 ` Wang, Zhihong
2016-11-04  7:57                                   ` Maxime Coquelin
2016-11-04  7:59                                     ` Maxime Coquelin
2016-11-04 10:43                                       ` Wang, Zhihong
2016-11-04 11:22                                         ` Maxime Coquelin
2016-11-04 11:36                                           ` Yuanhan Liu
2016-11-04 11:39                                             ` Maxime Coquelin
2016-11-04 12:30                                           ` Wang, Zhihong
2016-11-04 12:54                                             ` Maxime Coquelin
2016-11-04 13:09                                               ` Wang, Zhihong
2016-11-08 10:51                                                 ` Wang, Zhihong
2016-10-27 10:53                   ` Maxime Coquelin
2016-10-28  6:05                     ` Xu, Qian Q

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160923184310-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=dev@dpdk.org \
    --cc=huawei.xie@intel.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=stephen@networkplumber.org \
    --cc=vkaplans@redhat.com \
    --cc=yuanhan.liu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.