All of lore.kernel.org
 help / color / mirror / Atom feed
From: Flavio Leitner <fbl@sysclose.org>
To: Tiwei Bie <tiwei.bie@intel.com>
Cc: dev@dpdk.org, Ilya Maximets <i.maximets@ovn.org>,
	Maxime Coquelin <maxime.coquelin@redhat.com>,
	Shahaf Shuler <shahafs@mellanox.com>,
	David Marchand <david.marchand@redhat.com>,
	Obrembski MichalX <michalx.obrembski@intel.com>,
	Stokes Ian <ian.stokes@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2] vhost: add support for large buffers.
Date: Thu, 10 Oct 2019 09:12:58 -0300	[thread overview]
Message-ID: <20191010091258.1881ac26@p50.lan> (raw)
In-Reply-To: <20191010051257.GA24244@___>


Hi Tiwei,

Thanks for the review. I will follow up with an improved patch.
fbl

On Thu, 10 Oct 2019 13:12:57 +0800
Tiwei Bie <tiwei.bie@intel.com> wrote:

> [PATCH v2] vhost: add support for large buffers.
> 
> There is a warning reported by devtools/check-git-log.sh
> The '.' at the end of the title should be dropped.
> 
> On Fri, Oct 04, 2019 at 05:10:08PM -0300, Flavio Leitner wrote:
> > The rte_vhost_dequeue_burst supports two ways of dequeuing data.
> > If the data fits into a buffer, then all data is copied and a
> > single linear buffer is returned. Otherwise it allocates
> > additional mbufs and chains them together to return a multiple
> > segments mbuf.
> > 
> > While that covers most use cases, it forces applications that
> > need to work with larger data sizes to support multiple segments
> > mbufs. The non-linear characteristic brings complexity and
> > performance implications to the application.
> > 
> > To resolve the issue, let the host provide during registration
> > if attaching an external buffer to pktmbuf is supported and if
> > only linear buffer are supported.
> > 
> > Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> > ---
> >  doc/guides/prog_guide/vhost_lib.rst |  35 ++++++++++
> >  lib/librte_vhost/rte_vhost.h        |   4 ++
> >  lib/librte_vhost/socket.c           |  10 +++
> >  lib/librte_vhost/vhost.c            |  22 ++++++
> >  lib/librte_vhost/vhost.h            |   4 ++
> >  lib/librte_vhost/virtio_net.c       | 103
> > ++++++++++++++++++++++++++-- 6 files changed, 172 insertions(+), 6
> > deletions(-)
> > 
> > - Changelog:
> >   V2:
> >     - Used rte_malloc() instead of another mempool as suggested by
> > Shahaf.
> >     - Added the documentation section.
> >     - Using driver registration to negotiate the features.
> >     - OvS PoC code:
> >       https://github.com/fleitner/ovs/commit/8fc197c40b1d4fda331686a7b919e9e2b670dda7
> > 
> > diff --git a/doc/guides/prog_guide/vhost_lib.rst
> > b/doc/guides/prog_guide/vhost_lib.rst index fc3ee4353..07e40e3c5
> > 100644 --- a/doc/guides/prog_guide/vhost_lib.rst
> > +++ b/doc/guides/prog_guide/vhost_lib.rst
> > @@ -117,6 +117,41 @@ The following is an overview of some key Vhost
> > API functions: Enabling this flag should only be done when the
> > calling application does not pre-fault the guest shared memory,
> > otherwise migration would fail. 
> > +  - ``RTE_VHOST_USER_LINEARBUF_SUPPORT``
> > +
> > +    Enabling this flag forces vhost dequeue function to only
> > provide linear
> > +    pktmbuf (no multi-segmented pktmbuf).
> > +
> > +    The vhost library by default provides a single pktmbuf for
> > given a
> > +    packet, but if for some reason the data doesn't fit into a
> > single
> > +    pktmbuf (e.g., TSO is enabled), the library will allocate
> > additional
> > +    pktmbufs from the same mempool and chain them together to
> > create a
> > +    multi-segmented pktmbuf.
> > +
> > +    However, the vhost application needs to support
> > multi-segmented format.
> > +    If the vhost application does not support that format and
> > requires large
> > +    buffers to be dequeue, this flag should be enabled to force
> > only linear
> > +    buffers (see RTE_VHOST_USER_EXTBUF_SUPPORT) or drop the packet.
> > +
> > +    It is disabled by default.
> > +
> > +  - ``RTE_VHOST_USER_EXTBUF_SUPPORT``
> > +
> > +    Enabling this flag allows vhost dequeue function to allocate
> > and attach
> > +    an external buffer to a pktmbuf if the pkmbuf doesn't provide
> > enough
> > +    space to store all data.
> > +
> > +    This is useful when the vhost application wants to support
> > large packets
> > +    but doesn't want to increase the default mempool object size
> > nor to
> > +    support multi-segmented mbufs (non-linear). In this case, a
> > fresh buffer
> > +    is allocated using rte_malloc() which gets attached to a
> > pktmbuf using
> > +    rte_pktmbuf_attach_extbuf().
> > +
> > +    See RTE_VHOST_USER_LINEARBUF_SUPPORT as well to disable
> > multi-segmented
> > +    mbufs for application that doesn't support chained mbufs.
> > +
> > +    It is disabled by default.
> > +
> >  * ``rte_vhost_driver_set_features(path, features)``
> >  
> >    This function sets the feature bits the vhost-user driver
> > supports. The diff --git a/lib/librte_vhost/rte_vhost.h
> > b/lib/librte_vhost/rte_vhost.h index 19474bca0..b821b5df4 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -30,6 +30,10 @@ extern "C" {
> >  #define RTE_VHOST_USER_DEQUEUE_ZERO_COPY	(1ULL << 2)
> >  #define RTE_VHOST_USER_IOMMU_SUPPORT	(1ULL << 3)
> >  #define RTE_VHOST_USER_POSTCOPY_SUPPORT		(1ULL << 4)
> > +/* support mbuf with external buffer attached */
> > +#define RTE_VHOST_USER_EXTBUF_SUPPORT	(1ULL << 5)
> > +/* support only linear buffers (no chained mbufs) */
> > +#define RTE_VHOST_USER_LINEARBUF_SUPPORT	(1ULL << 6)
> >  
> >  /** Protocol features. */
> >  #ifndef VHOST_USER_PROTOCOL_F_MQ
> > diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> > index 274988c4d..0ba610fda 100644
> > --- a/lib/librte_vhost/socket.c
> > +++ b/lib/librte_vhost/socket.c
> > @@ -40,6 +40,8 @@ struct vhost_user_socket {
> >  	bool dequeue_zero_copy;
> >  	bool iommu_support;
> >  	bool use_builtin_virtio_net;
> > +	bool extbuf;
> > +	bool linearbuf;
> >  
> >  	/*
> >  	 * The "supported_features" indicates the feature bits the
> > @@ -232,6 +234,12 @@ vhost_user_add_connection(int fd, struct
> > vhost_user_socket *vsocket) if (vsocket->dequeue_zero_copy)
> >  		vhost_enable_dequeue_zero_copy(vid);
> >  
> > +	if (vsocket->extbuf)
> > +		vhost_enable_extbuf(vid);
> > +
> > +	if (vsocket->linearbuf)
> > +		vhost_enable_linearbuf(vid);
> > +
> >  	RTE_LOG(INFO, VHOST_CONFIG, "new device, handle is %d\n",
> > vid); 
> >  	if (vsocket->notify_ops->new_connection) {
> > @@ -870,6 +878,8 @@ rte_vhost_driver_register(const char *path,
> > uint64_t flags) goto out_free;
> >  	}
> >  	vsocket->dequeue_zero_copy = flags &
> > RTE_VHOST_USER_DEQUEUE_ZERO_COPY;
> > +	vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT;
> > +	vsocket->linearbuf = flags &
> > RTE_VHOST_USER_LINEARBUF_SUPPORT;  
> 
> It could be problematic to use dequeue_zero_copy with
> above two features at the same time.
> We may want to reject such combination.
> 
> >  
> >  	/*
> >  	 * Set the supported features correctly for the builtin
> > vhost-user diff --git a/lib/librte_vhost/vhost.c
> > b/lib/librte_vhost/vhost.c index cea44df8c..77457f538 100644
> > --- a/lib/librte_vhost/vhost.c
> > +++ b/lib/librte_vhost/vhost.c
> > @@ -605,6 +605,28 @@ vhost_set_builtin_virtio_net(int vid, bool
> > enable) dev->flags &= ~VIRTIO_DEV_BUILTIN_VIRTIO_NET;
> >  }
> >  
> > +void
> > +vhost_enable_extbuf(int vid)
> > +{
> > +	struct virtio_net *dev = get_device(vid);
> > +
> > +	if (dev == NULL)
> > +		return;
> > +
> > +	dev->extbuf = 1;
> > +}
> > +
> > +void
> > +vhost_enable_linearbuf(int vid)
> > +{
> > +	struct virtio_net *dev = get_device(vid);
> > +
> > +	if (dev == NULL)
> > +		return;
> > +
> > +	dev->linearbuf = 1;
> > +}
> > +
> >  int
> >  rte_vhost_get_mtu(int vid, uint16_t *mtu)
> >  {
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index 5131a97a3..0346bd118 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -302,6 +302,8 @@ struct virtio_net {
> >  	rte_atomic16_t		broadcast_rarp;
> >  	uint32_t		nr_vring;
> >  	int			dequeue_zero_copy;
> > +	int			extbuf;
> > +	int			linearbuf;
> >  	struct vhost_virtqueue
> > *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2]; #define IF_NAME_SZ (PATH_MAX
> > > IFNAMSIZ ? PATH_MAX : IFNAMSIZ) char
> > > ifname[IF_NAME_SZ];
> > @@ -476,6 +478,8 @@ void vhost_attach_vdpa_device(int vid, int did);
> >  void vhost_set_ifname(int, const char *if_name, unsigned int
> > if_len); void vhost_enable_dequeue_zero_copy(int vid);
> >  void vhost_set_builtin_virtio_net(int vid, bool enable);
> > +void vhost_enable_extbuf(int vid);
> > +void vhost_enable_linearbuf(int vid);
> >  
> >  struct vhost_device_ops const *vhost_driver_callback_get(const
> > char *path); 
> > diff --git a/lib/librte_vhost/virtio_net.c
> > b/lib/librte_vhost/virtio_net.c index 5b85b832d..fca75161d 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -2,6 +2,7 @@
> >   * Copyright(c) 2010-2016 Intel Corporation
> >   */
> >  
> > +#include <assert.h>
> >  #include <stdint.h>
> >  #include <stdbool.h>
> >  #include <linux/virtio_net.h>
> > @@ -1289,6 +1290,96 @@ get_zmbuf(struct vhost_virtqueue *vq)
> >  	return NULL;
> >  }
> >  
> > +static void
> > +virtio_dev_extbuf_free(void *addr __rte_unused, void *opaque)
> > +{
> > +	rte_free(opaque);
> > +}
> > +
> > +static int
> > +virtio_dev_extbuf_alloc(struct rte_mbuf *pkt, uint16_t size)
> > +{
> > +	struct rte_mbuf_ext_shared_info *shinfo;
> > +	uint16_t buf_len;  
> 
> It also includes RTE_PKTMBUF_HEADROOM and sizeof(*shinfo).
> Is uint16_t big enough?
> 
> > +	rte_iova_t iova;
> > +	void *buf;
> > +
> > +	shinfo = NULL;
> > +	buf_len = size + RTE_PKTMBUF_HEADROOM;
> > +
> > +	/* Try to use pkt buffer to store shinfo to reduce the
> > amount of memory
> > +	 * required, otherwise store shinfo in the new buffer.
> > +	 */
> > +	if (rte_pktmbuf_tailroom(pkt) > sizeof(*shinfo))
> > +		shinfo = rte_pktmbuf_mtod(pkt,
> > +					  struct
> > rte_mbuf_ext_shared_info *);
> > +	else {
> > +		if (unlikely(buf_len + sizeof(shinfo) >
> > UINT16_MAX)) {  
> 
> Should be sizeof(*shinfo)
> 
> > +			RTE_LOG(ERR, VHOST_DATA,
> > +				"buffer size exceeded maximum.\n");
> > +			return -ENOSPC;
> > +		}
> > +
> > +		buf_len += sizeof(shinfo);  
> 
> Ditto.
> 
> > +	}
> > +
> > +	buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
> > +	if (unlikely(buf == NULL)) {
> > +		RTE_LOG(ERR, VHOST_DATA,
> > +				"Failed to allocate memory for
> > mbuf.\n");  
> 
> Looks better to just have 3 tabs before the string.
> 
> > +		return -ENOMEM;
> > +	}
> > +
> > +	/* initialize shinfo */
> > +	if (shinfo) {
> > +		shinfo->free_cb = virtio_dev_extbuf_free;
> > +		shinfo->fcb_opaque = buf;
> > +		rte_mbuf_ext_refcnt_set(shinfo, 1);
> > +	} else {
> > +		shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf,
> > &buf_len,
> > +
> > virtio_dev_extbuf_free, buf);
> > +		assert(shinfo);  
> 
> As a library, we shouldn't abort the process.
> We can just return the error.
> 
> > +	}
> > +
> > +	iova = rte_mempool_virt2iova(buf);  
> 
> Should be rte_malloc_virt2iova().
> 
> > +	rte_pktmbuf_attach_extbuf(pkt, buf, iova, buf_len, shinfo);
> > +	rte_pktmbuf_reset_headroom(pkt);
> > +	assert(pkt->ol_flags == EXT_ATTACHED_MBUF);  
> 
> Ditto.
> 
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Allocate a host supported pktmbuf.
> > + */
> > +static __rte_always_inline struct rte_mbuf *
> > +virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct
> > rte_mempool *mp,
> > +			 uint16_t data_len)
> > +{
> > +	struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
> > +
> > +	if (unlikely(pkt == NULL))
> > +		return NULL;
> > +
> > +	if (rte_pktmbuf_tailroom(pkt) >= data_len)
> > +		return pkt;
> > +
> > +	/* attach an external buffer if supported */
> > +	if (dev->extbuf && !virtio_dev_extbuf_alloc(pkt, data_len))
> > +		return pkt;
> > +
> > +	/* check if chained buffers are allowed */
> > +	if (!dev->linearbuf)
> > +		return pkt;
> > +
> > +	/* Data doesn't fit into the buffer and the host supports
> > +	 * only linear buffers
> > +	 */
> > +	rte_pktmbuf_free(pkt);
> > +
> > +	return NULL;
> > +}
> > +
> >  static __rte_noinline uint16_t
> >  virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue
> > *vq, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts,
> > uint16_t count) @@ -1343,21 +1434,21 @@ virtio_dev_tx_split(struct
> > virtio_net *dev, struct vhost_virtqueue *vq, for (i = 0; i < count;
> > i++) { struct buf_vector buf_vec[BUF_VECTOR_MAX];
> >  		uint16_t head_idx;
> > -		uint32_t dummy_len;
> > +		uint32_t buf_len;
> >  		uint16_t nr_vec = 0;
> >  		int err;
> >  
> >  		if (unlikely(fill_vec_buf_split(dev, vq,
> >  						vq->last_avail_idx
> > + i, &nr_vec, buf_vec,
> > -						&head_idx,
> > &dummy_len,
> > +						&head_idx,
> > &buf_len, VHOST_ACCESS_RO) < 0))
> >  			break;
> >  
> >  		if (likely(dev->dequeue_zero_copy == 0))
> >  			update_shadow_used_ring_split(vq,
> > head_idx, 0); 
> > -		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> > +		pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool,
> > buf_len); if (unlikely(pkts[i] == NULL)) {
> >  			RTE_LOG(ERR, VHOST_DATA,
> >  				"Failed to allocate memory for
> > mbuf.\n"); @@ -1451,14 +1542,14 @@ virtio_dev_tx_packed(struct
> > virtio_net *dev, struct vhost_virtqueue *vq, for (i = 0; i < count;
> > i++) { struct buf_vector buf_vec[BUF_VECTOR_MAX];
> >  		uint16_t buf_id;
> > -		uint32_t dummy_len;
> > +		uint32_t buf_len;
> >  		uint16_t desc_count, nr_vec = 0;
> >  		int err;
> >  
> >  		if (unlikely(fill_vec_buf_packed(dev, vq,
> >  						vq->last_avail_idx,
> > &desc_count, buf_vec, &nr_vec,
> > -						&buf_id,
> > &dummy_len,
> > +						&buf_id, &buf_len,
> >  						VHOST_ACCESS_RO) <
> > 0)) break;
> >  
> > @@ -1466,7 +1557,7 @@ virtio_dev_tx_packed(struct virtio_net *dev,
> > struct vhost_virtqueue *vq, update_shadow_used_ring_packed(vq,
> > buf_id, 0, desc_count);
> >  
> > -		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
> > +		pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool,
> > buf_len); if (unlikely(pkts[i] == NULL)) {
> >  			RTE_LOG(ERR, VHOST_DATA,
> >  				"Failed to allocate memory for
> > mbuf.\n"); -- 
> > 2.20.1
> >   


  reply	other threads:[~2019-10-10 12:13 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-01 22:19 [dpdk-dev] [PATCH] vhost: add support to large linear mbufs Flavio Leitner
2019-10-01 23:10 ` Flavio Leitner
2019-10-02  4:45 ` Shahaf Shuler
2019-10-02  8:04   ` David Marchand
2019-10-02  9:00     ` Shahaf Shuler
2019-10-02 12:58       ` Flavio Leitner
2019-10-02 17:50         ` Shahaf Shuler
2019-10-02 18:15           ` Flavio Leitner
2019-10-03 16:57             ` Ilya Maximets
2019-10-03 21:25               ` Flavio Leitner
2019-10-02  7:51 ` Maxime Coquelin
2019-10-04 20:10 ` [dpdk-dev] [PATCH v2] vhost: add support for large buffers Flavio Leitner
2019-10-06  4:47   ` Shahaf Shuler
2019-10-10  5:12   ` Tiwei Bie
2019-10-10 12:12     ` Flavio Leitner [this message]
2019-10-11 17:09   ` [dpdk-dev] [PATCH v3] " Flavio Leitner
2019-10-14  2:44     ` Tiwei Bie
2019-10-15 16:17     ` [dpdk-dev] [PATCH v4] " Flavio Leitner
2019-10-15 17:41       ` Ilya Maximets
2019-10-15 18:44         ` Flavio Leitner
2019-10-15 18:59       ` [dpdk-dev] [PATCH v5] " Flavio Leitner
2019-10-16 10:02         ` Maxime Coquelin
2019-10-16 11:13         ` Maxime Coquelin
2019-10-16 13:32           ` Ilya Maximets
2019-10-16 13:46             ` Maxime Coquelin
2019-10-16 14:02               ` Flavio Leitner
2019-10-16 14:08                 ` Ilya Maximets
2019-10-16 14:14                   ` Flavio Leitner
2019-10-16 14:05               ` Ilya Maximets
2019-10-29  9:02         ` David Marchand
2019-10-29 12:21           ` Flavio Leitner
2019-10-29 16:19             ` David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191010091258.1881ac26@p50.lan \
    --to=fbl@sysclose.org \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=i.maximets@ovn.org \
    --cc=ian.stokes@intel.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=michalx.obrembski@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tiwei.bie@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.