DPDK-dev Archive on lore.kernel.org

DPDK-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: Why IP_PIPELINE is faster than L2FWD
From: Xu, Qian Q @ 2016-12-23  1:34 UTC (permalink / raw)
  To: Royce Niu, Richardson, Bruce; +Cc: dev@dpdk.org, Dumitrescu, Cristian
In-Reply-To: <CAOwUCNvHieyrgNKJLGDSw2HCtUzpLyadfUzqEnp0x1A__-47og@mail.gmail.com>

As far as I know, L2FWD only uses 1 core for all RX/TX, for all queues, but for ip_pipeline, you may use more cores. 
A simple question, are you using 1core in ip_pipeline or l3fwd test? 

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Royce Niu
> Sent: Thursday, December 22, 2016 9:36 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: Royce Niu <royceniu@gmail.com>; dev@dpdk.org; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>
> Subject: Re: [dpdk-dev] Why IP_PIPELINE is faster than L2FWD
> 
> Dear Bruce,
> 
> Thanks for your kind explanation.
> 
> I will try to follow your suggestion and see the source code.
> 
> On Thu, Dec 22, 2016 at 9:25 PM, Bruce Richardson <
> bruce.richardson@intel.com> wrote:
> 
> > On Thu, Dec 22, 2016 at 08:48:50PM +0800, Royce Niu wrote:
> > > But, actually, L3FWD of IP_PIPELINE is also faster than stock L2FWD,
> > which
> > > also modifies mac addr. How can explain this?
> > >
> > > Actually, I want to know why IP_PIPELINE is much faster and I can
> > > learn from IP_PIPELINE and make our own program.
> > >
> > > But, the documentation of that is not detailed enough. if it is
> > > possible, could you tell me where is the key to boost? Thanks!
> > >
> >
> > Adding Cristian as IP Pipeline maintainer.
> >
> > A lot of tuning work went into IP Pipeline and the table and port
> > libraries it uses, so I'm not sure that there is just one or two key
> > changes which give it such good performance. L2 forward just hasn't
> > had the same level of tuning and, while performing well, is also
> > simplified to make it understandable as an example. Contrast the code
> > in l2fwd against equivalent vector code in l3fwd-lpm* files e.g.
> l3fwd_lpm_sse.h.
> > The latter is very high performing, the former is more readable.
> >
> > Regards,
> > /Bruce
> >
> > > On Thu, Dec 22, 2016 at 7:15 PM, Bruce Richardson <
> > > bruce.richardson@intel.com> wrote:
> > >
> > > > On Thu, Dec 22, 2016 at 12:18:12AM +0800, Royce Niu wrote:
> > > > > Hi all,
> > > > >
> > > > > I tested default L2FWD and IP_PIPELINE (pass-through). The
> > throughput of
> > > > > IP_PIPELINE is higher immensely.
> > > > >
> > > > > There are only two virtual NICs in KVM. The experiment is just
> > > > > moving packet from vNIC0  to vNIC1. I think the function is so
> > > > > simple. Why
> > L2FWD
> > > > > is much slower?
> > > > >
> > > > > How can I improve L2FWD, to make L2FWD faster?
> > > > >
> > > > Is IP_PIPELINE in passthrough mode modifying the packets? L2FWD
> > > > swaps the mac addresses on each packet as it processes them, which
> > > > can slow
> > it
> > > > down. L2FWD is also more an example of how the APIs work than
> > > > anything else. For fastest possible port-to-port forwarding,
> > > > testpmd should give the highest performance.
> > > >
> > > > /Bruce
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Royce
> >
> 
> 
> 
> --
> Regards,
> 
> Royce

^ permalink raw reply

* Re: Why IP_PIPELINE is faster than L2FWD
From: Royce Niu @ 2016-12-23  1:41 UTC (permalink / raw)
  To: Richardson, Bruce, Royce Niu, Xu, Qian Q
  Cc: Dumitrescu, Cristian, dev@dpdk.org
In-Reply-To: <82F45D86ADE5454A95A89742C8D1410E3B4B78B6@shsmsx102.ccr.corp.intel.com>

Yes. One core is assigned.

On Fri, 23 Dec 2016 at 9:34 AM, Xu, Qian Q <qian.q.xu@intel.com> wrote:

> As far as I know, L2FWD only uses 1 core for all RX/TX, for all queues,
> but for ip_pipeline, you may use more cores.
>
> A simple question, are you using 1core in ip_pipeline or l3fwd test?
>
>
>
> > -----Original Message-----
>
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Royce Niu
>
> > Sent: Thursday, December 22, 2016 9:36 PM
>
> > To: Richardson, Bruce <bruce.richardson@intel.com>
>
> > Cc: Royce Niu <royceniu@gmail.com>; dev@dpdk.org; Dumitrescu, Cristian
>
> > <cristian.dumitrescu@intel.com>
>
> > Subject: Re: [dpdk-dev] Why IP_PIPELINE is faster than L2FWD
>
> >
>
> > Dear Bruce,
>
> >
>
> > Thanks for your kind explanation.
>
> >
>
> > I will try to follow your suggestion and see the source code.
>
> >
>
> > On Thu, Dec 22, 2016 at 9:25 PM, Bruce Richardson <
>
> > bruce.richardson@intel.com> wrote:
>
> >
>
> > > On Thu, Dec 22, 2016 at 08:48:50PM +0800, Royce Niu wrote:
>
> > > > But, actually, L3FWD of IP_PIPELINE is also faster than stock L2FWD,
>
> > > which
>
> > > > also modifies mac addr. How can explain this?
>
> > > >
>
> > > > Actually, I want to know why IP_PIPELINE is much faster and I can
>
> > > > learn from IP_PIPELINE and make our own program.
>
> > > >
>
> > > > But, the documentation of that is not detailed enough. if it is
>
> > > > possible, could you tell me where is the key to boost? Thanks!
>
> > > >
>
> > >
>
> > > Adding Cristian as IP Pipeline maintainer.
>
> > >
>
> > > A lot of tuning work went into IP Pipeline and the table and port
>
> > > libraries it uses, so I'm not sure that there is just one or two key
>
> > > changes which give it such good performance. L2 forward just hasn't
>
> > > had the same level of tuning and, while performing well, is also
>
> > > simplified to make it understandable as an example. Contrast the code
>
> > > in l2fwd against equivalent vector code in l3fwd-lpm* files e.g.
>
> > l3fwd_lpm_sse.h.
>
> > > The latter is very high performing, the former is more readable.
>
> > >
>
> > > Regards,
>
> > > /Bruce
>
> > >
>
> > > > On Thu, Dec 22, 2016 at 7:15 PM, Bruce Richardson <
>
> > > > bruce.richardson@intel.com> wrote:
>
> > > >
>
> > > > > On Thu, Dec 22, 2016 at 12:18:12AM +0800, Royce Niu wrote:
>
> > > > > > Hi all,
>
> > > > > >
>
> > > > > > I tested default L2FWD and IP_PIPELINE (pass-through). The
>
> > > throughput of
>
> > > > > > IP_PIPELINE is higher immensely.
>
> > > > > >
>
> > > > > > There are only two virtual NICs in KVM. The experiment is just
>
> > > > > > moving packet from vNIC0  to vNIC1. I think the function is so
>
> > > > > > simple. Why
>
> > > L2FWD
>
> > > > > > is much slower?
>
> > > > > >
>
> > > > > > How can I improve L2FWD, to make L2FWD faster?
>
> > > > > >
>
> > > > > Is IP_PIPELINE in passthrough mode modifying the packets? L2FWD
>
> > > > > swaps the mac addresses on each packet as it processes them, which
>
> > > > > can slow
>
> > > it
>
> > > > > down. L2FWD is also more an example of how the APIs work than
>
> > > > > anything else. For fastest possible port-to-port forwarding,
>
> > > > > testpmd should give the highest performance.
>
> > > > >
>
> > > > > /Bruce
>
> > > > >
>
> > > >
>
> > > >
>
> > > >
>
> > > > --
>
> > > > Regards,
>
> > > >
>
> > > > Royce
>
> > >
>
> >
>
> >
>
> >
>
> > --
>
> > Regards,
>
> >
>
> > Royce
>
>

^ permalink raw reply

* Re: [PATCH 23/28] net/ixgbe: use eal I/O device memory read/write API
From: Jianbo Liu @ 2016-12-23  1:42 UTC (permalink / raw)
  To: Santosh Shukla
  Cc: Jerin Jacob, dev, Ananyev, Konstantin, Thomas Monjalon,
	Bruce Richardson, Jan Viktorin, Helin Zhang
In-Reply-To: <20161222123614.GA8868@santosh-Latitude-E5530-non-vPro>

Hi Santosh,

On 22 December 2016 at 20:36, Santosh Shukla
<santosh.shukla@caviumnetworks.com> wrote:
> Hi Jiangbo,
>
> On Thu, Dec 15, 2016 at 08:40:19PM -0800, Santosh Shukla wrote:
>> On Thu, Dec 15, 2016 at 04:37:12PM +0800, Jianbo Liu wrote:
>> > On 14 December 2016 at 09:55, Jerin Jacob
>> > <jerin.jacob@caviumnetworks.com> wrote:
>> > > From: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> > >
>> >
>> > memory barrier operation is put inside IXGBE_PCI_REG_READ/WRITE in
>> > your change, but I found rte_*mb is called before these macros in some
>> > places.
>> > Can you remove all these redundant calls? And please do the same
>> > checking for other drivers.
>> >
>>
>> Ok.
>>
>> Thinking of adding _relaxed_rd/wr style macro agnostic to arch for ixgbe case
>> in particular. Such that for those code incident:
>> x86 case> first default barrier + relaxed call.
>> arm case> first default barrier + relaxed call.
>>
>> Does that make sense to you? If so then will take care in v2.
>>
>> Santosh.
>
> We spend time looking at drivers code where double barrier
> may happen. Most of them are in driver init path,
> configuration/control path code. So keeping double
> barrier won't impact performance.
>
> We plan to replace only fast path code with _relaxed
> style API's. That way we won't impact each driver
> performance and we'll have the clean port.
>
> Does it make sense? Thought?
>

Yes, please continue your work.

^ permalink raw reply

* Re: [PATCH] acl: remove invalid test
From: Michal Miroslaw @ 2016-12-23  1:47 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev@dpdk.org
In-Reply-To: <2601191342CEEE43887BDE71AB9772583F0F2997@irsmsx105.ger.corp.intel.com>

On Mon, Dec 19, 2016 at 06:48:52PM +0000, Ananyev, Konstantin wrote:
> Hi Michal,
> 
> > -----Original Message-----
> > From: Michał Mirosław [mailto:mirq-linux@rere.qmqm.pl]
> > Sent: Wednesday, December 14, 2016 5:24 PM
> > To: dev@dpdk.org
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Subject: [PATCH] acl: remove invalid test
> > 
> > rte_acl_add_rules() has no way of checking rule size.
> > 
> > This was hidden because the test effectively checked that
> > adding a rule with userdata == 0 failed.
> I suppose that changes have to be inside:
> [PATCH v2] acl: allow zero verdict.

The 'allow zero verdict' patch depends on this one if we are to not have
a breaking tests inbetween. Otherwise, it is an independent change.

I guess I can merge them, though, if you prefer it that way.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH 12/13] i40e: return -errno when intr setup fails
From: Michał Mirosław @ 2016-12-23  1:55 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev
In-Reply-To: <9ab7c196-9e45-dfdc-92e6-d736ece10b4a@intel.com>

On Thu, Dec 22, 2016 at 03:45:35PM +0000, Ferruh Yigit wrote:
> On 12/13/2016 1:08 AM, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
> > ---
> >  drivers/net/i40e/i40e_ethdev.c               | 5 +++--
> >  lib/librte_eal/linuxapp/eal/eal_interrupts.c | 2 +-
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> > index 67778ba..39fbcfe 100644
> > --- a/drivers/net/i40e/i40e_ethdev.c
> > +++ b/drivers/net/i40e/i40e_ethdev.c
> > @@ -1692,8 +1692,9 @@ i40e_dev_start(struct rte_eth_dev *dev)
> >  	     !RTE_ETH_DEV_SRIOV(dev).active) &&
> >  	    dev->data->dev_conf.intr_conf.rxq != 0) {
> >  		intr_vector = dev->data->nb_rx_queues;
> > -		if (rte_intr_efd_enable(intr_handle, intr_vector))
> > -			return -1;
> > +		ret = rte_intr_efd_enable(intr_handle, intr_vector);
> > +		if (ret)
> > +			return ret;
> 
> What is the benefit of returning -errno instead of -1?

Information. Besides, all other error returns from i40e_dev_start return
negated error code (-1 happens to be -EPERM, which further confuses
the poor developer who's diagnosing the failure).

Best Regards,
Michał Mirosław

^ permalink raw reply

* [PATCH] doc: add known issue for uio_pci_generic in XL710
From: Jeff Guo @ 2016-12-23  2:35 UTC (permalink / raw)
  To: helin.zhang, jingjing.wu; +Cc: dev, jia.guo

When bind with uio_pci_generic in XL710, the result is failed.
uio_pci_generic is not supported by XL710.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 doc/guides/rel_notes/known_issues.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 3cd4237..3abcb64 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -640,3 +640,25 @@ I40e VF may not receive packets in the promiscuous mode
 
 **Driver/Module**:
    Poll Mode Driver (PMD).
+
+
+uio_pci_generic bind failed in XL710
+-------------------------------------------------------
+
+**Description**:
+   uio_pci_generic is not supported by XL710, since the errata of XL710 that the feature
+   about the Interrupt Status bit is not implemented.The errata is the item #71 from the
+   `xl710 controller spec  <http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-spec-update.html>`_.
+
+**Implication**:
+   When bind uio_pci_generic, the uio_pci_generic probe device and check the Interrupt
+   Status bit related feature of NICs , since it is not supported by XL710 so it return failed.
+
+**Resolution/Workaround**:
+   Do not bind to use uio_pci_generic in XL710 NICs.
+
+**Affected Environment/Platform**:
+   All.
+
+**Driver/Module**:
+   Poll Mode Driver (PMD).
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH v2 0/2] lpm6: speed improvement on delete rule
From: Paras Kumar @ 2016-12-23  6:07 UTC (permalink / raw)
  To: dev
In-Reply-To: <f6730b2f-0bab-6722-b9a1-e653354d9bcc@elyzion.net>

Hi,
  I have used this lpm6 speed improvement patch for BGPv6 by *Nikita 
Kozlov*. it works great for fixed prefix length however, when i use 
variable prefix length for routes after 2 iterations of addition and 
deletion of routes (30000 approx) lpm6 crashes in expand_rule due to 
stack overflow. The ext bit in rte_lpm6_tbl_entry comes out to be 1 
every time.

Regards
Paras

^ permalink raw reply

* Re: [PATCH 16/18] net/ixgbe: create consistent filter
From: Zhao1, Wei @ 2016-12-23  6:26 UTC (permalink / raw)
  To: Yigit, Ferruh, dev@dpdk.org; +Cc: Lu, Wenzhuo
In-Reply-To: <75f8b5f7-d640-6fca-8a94-613f92c527c1@intel.com>

Hi, Yigit

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, December 21, 2016 1:25 AM
> To: Zhao1, Wei <wei.zhao1@intel.com>; dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 16/18] net/ixgbe: create consistent filter
> 
> On 12/2/2016 10:43 AM, Wei Zhao wrote:
> > From: wei zhao1 <wei.zhao1@intel.com>
> >
> > This patch adds a function to create the flow directory filter.
> >
> > Signed-off-by: wei zhao1 <wei.zhao1@intel.com>
> > Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> 
> <...>
> 
> > +/**
> > + * Create or destroy a flow rule.
> > + * Theorically one rule can match more than one filters.
> > + * We will let it use the filter which it hitt first.
> > + * So, the sequence matters.
> > + */
> > +struct ixgbe_flow *
> > +ixgbe_flow_create(struct rte_eth_dev *dev,
> > +		  const struct rte_flow_attr *attr,
> > +		  const struct rte_flow_item pattern[],
> > +		  const struct rte_flow_action actions[],
> > +		  struct rte_flow_error *error)
> > +{
> > +	int ret;
> > +	struct rte_eth_ntuple_filter ntuple_filter;
> <...>
> > +	error->type = ret;
> 
> This also returns same ICC error, there are a few more same usage:
> 
> .../drivers/net/ixgbe/ixgbe_ethdev.c(9764): error #188: enumerated type
> mixed with another type
>         error->type = ret;
>                     ^
> 
> 
> 
Thank you for warning,  I will use ICC and gcc two kinds of tool to build my patch in v2 later. 

^ permalink raw reply

* [PATCH v2 0/7] virtio_user as an alternative exception path
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1480689075-66977-1-git-send-email-jianfeng.tan@intel.com>

v2: (Lots of them are from yuanhan's comment)
  - Add offloding feature.
  - Add multiqueue support.
  - Add a new patch to postpone the sending of driver ok notification.
  - Put fix patch ahead of the whole patch series.
  - Split original 0001 patch into 0003 and 0004 patches.
  - Remove the original vhost_internal design, just add those into
    struct virtio_user_dev for simplicity.
  - Reword "control" to "send_request".
  - Reword "host_features" to "device_features". 

In v16.07, we upstreamed a virtual device, virtio_user (with vhost-user
as the backend). The path to go with a vhost-kernel backend has been
dropped for bad performance comparing to vhost-user and code simplicity.

But after a second thought, virtio_user + vhost-kernel is a good 
candidate as an exceptional path, such as KNI, which exchanges packets
with kernel networking stack.
  - maintenance: vhost-net (kernel) is upstreamed and extensively used 
    kernel module. We don't need any out-of-tree module like KNI.
  - performance: as with KNI, this solution would use one or more
    kthreads to send/receive packets from user space DPDK applications,
    which has little impact on user space polling thread (except that
    it might enter into kernel space to wake up those kthreads if
    necessary).
  - features: vhost-net is born to be a networking solution, which has
    lots of networking related featuers, like multi queue, tso, multi-seg
    mbuf, etc.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Jianfeng Tan (7):
  net/virtio_user: fix wrongly set features
  net/virtio_user: postpone DRIVER OK notification
  net/virtio_user: move vhost user specific code
  net/virtio_user: abstract virtio user backend ops
  net/virtio_user: add vhost kernel support
  net/virtio_user: enable offloading
  net/virtio_user: enable multiqueue with vhost kernel

 drivers/net/virtio/Makefile                      |   1 +
 drivers/net/virtio/virtio_ethdev.c               |   3 +-
 drivers/net/virtio/virtio_user/vhost.h           |  51 +--
 drivers/net/virtio/virtio_user/vhost_kernel.c    | 476 +++++++++++++++++++++++
 drivers/net/virtio/virtio_user/vhost_user.c      |  86 ++--
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 122 +++---
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  16 +-
 drivers/net/virtio/virtio_user_ethdev.c          |   4 +-
 8 files changed, 642 insertions(+), 117 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c

-- 
2.7.4

^ permalink raw reply

* [PATCH v2 1/7] net/virtio_user: fix wrongly set features
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com>

Before the commit 86d59b21468a ("net/virtio: support LRO"), features
in virtio PMD, is decided and properly set at device initialization
and will not be changed. But afterward, features could be changed in
virtio_dev_configure(), and will be re-negotiated if it's changed.

In virtio_user, device features is obtained at driver probe phase
only once, but we did not store it. So the added feature bits in
re-negotiation will fail.

To fix it, we store it down, and will be used to feature negotiation
either at device initialization phase or device configure phase.

Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 31 +++++++++++-------------
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  5 +++-
 drivers/net/virtio/virtio_user_ethdev.c          |  4 +--
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index e239e0e..af5d5be 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -228,29 +228,26 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 	}
 
 	if (vhost_user_sock(dev->vhostfd, VHOST_USER_GET_FEATURES,
-			    &dev->features) < 0) {
+			    &dev->device_features) < 0) {
 		PMD_INIT_LOG(ERR, "get_features failed: %s", strerror(errno));
 		return -1;
 	}
 	if (dev->mac_specified)
-		dev->features |= (1ull << VIRTIO_NET_F_MAC);
+		dev->device_features |= (1ull << VIRTIO_NET_F_MAC);
 
-	if (!cq) {
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
-		/* Also disable features depends on VIRTIO_NET_F_CTRL_VQ */
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_RX);
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN);
-		dev->features &= ~(1ull << VIRTIO_NET_F_GUEST_ANNOUNCE);
-		dev->features &= ~(1ull << VIRTIO_NET_F_MQ);
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
-	} else {
-		/* vhost user backend does not need to know ctrl-q, so
-		 * actually we need add this bit into features. However,
-		 * DPDK vhost-user does send features with this bit, so we
-		 * check it instead of OR it for now.
+	if (cq) {
+		/* device does not really need to know anything about CQ,
+		 * so if necessary, we just claim to support CQ
 		 */
-		if (!(dev->features & (1ull << VIRTIO_NET_F_CTRL_VQ)))
-			PMD_INIT_LOG(INFO, "vhost does not support ctrl-q");
+		dev->device_features |= (1ull << VIRTIO_NET_F_CTRL_VQ);
+	} else {
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
+		/* Also disable features depends on VIRTIO_NET_F_CTRL_VQ */
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_RX);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_GUEST_ANNOUNCE);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_MQ);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
 	}
 
 	if (dev->max_queue_pairs > 1) {
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 33690b5..28fc788 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -46,7 +46,10 @@ struct virtio_user_dev {
 	uint32_t	max_queue_pairs;
 	uint32_t	queue_pairs;
 	uint32_t	queue_size;
-	uint64_t	features;
+	uint64_t	features; /* the negotiated features with driver,
+				   * and will be sync with device
+				   */
+	uint64_t	device_features; /* supported features by device */
 	uint8_t		status;
 	uint8_t		mac_addr[ETHER_ADDR_LEN];
 	char		path[PATH_MAX];
diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c
index 406beea..c5db719 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -117,7 +117,7 @@ virtio_user_get_features(struct virtio_hw *hw)
 {
 	struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-	return dev->features;
+	return dev->device_features;
 }
 
 static void
@@ -125,7 +125,7 @@ virtio_user_set_features(struct virtio_hw *hw, uint64_t features)
 {
 	struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-	dev->features = features;
+	dev->features = features & dev->device_features;
 }
 
 static uint8_t
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 2/7] net/virtio_user: postpone DRIVER OK notification
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com>

In driver probe phase, we obtain device information; and then virtio
driver will initialize device and stores info, like negotiated
features, in vhost_user layer; finally, vhost_user gets DRIVER_OK
notification from virtio driver, and then sync with backend device.

Previously, DRIVER_OK could be sent twice: 1. when ether layer invokes
eth_device_init to initialize device; 2. when user configures it with
different configuration from that of previous.

Since we can only depend on DRIVER_OK notification to sync with backend
device, we postpone it to virtio_dev_start when everything is settled.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_ethdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 079fd6c..d54b1bb 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1276,7 +1276,6 @@ virtio_init_device(struct rte_eth_dev *eth_dev, uint64_t req_features)
 	ret = virtio_alloc_queues(eth_dev);
 	if (ret < 0)
 		return ret;
-	vtpci_reinit_complete(hw);

 	if (pci_dev)
 		PMD_INIT_LOG(DEBUG, "port %d vendorID=0x%x deviceID=0x%x",
@@ -1474,6 +1473,8 @@ virtio_dev_start(struct rte_eth_dev *dev)
 	struct virtnet_tx *txvq __rte_unused;
 	struct virtio_hw *hw = dev->data->dev_private;

+	vtpci_reinit_complete(hw);
+
 	/* check if lsc interrupt feature is enabled */
 	if (dev->data->dev_conf.intr_conf.lsc) {
 		if (!(dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC)) {
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 3/7] net/virtio_user: move vhost user specific code
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com>

To support vhost kernel as the backend of net_virtio_user in coming
patches, we move vhost_user specific structs and macros into
vhost_user.c, and only keep common definitions in vhost.h.

Besides, remove VHOST_USER_MQ feature check, it will be added back
in following multiqueue patch.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost.h           | 36 ------------------------
 drivers/net/virtio/virtio_user/vhost_user.c      | 32 +++++++++++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  9 ------
 3 files changed, 32 insertions(+), 45 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost.h b/drivers/net/virtio/virtio_user/vhost.h
index 7adb55f..e54ac35 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -42,8 +42,6 @@
 #include "../virtio_logs.h"
 #include "../virtqueue.h"
 
-#define VHOST_MEMORY_MAX_NREGIONS 8
-
 struct vhost_vring_state {
 	unsigned int index;
 	unsigned int num;
@@ -105,40 +103,6 @@ struct vhost_memory_region {
 	uint64_t mmap_offset;
 };
 
-struct vhost_memory {
-	uint32_t nregions;
-	uint32_t padding;
-	struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
-};
-
-struct vhost_user_msg {
-	enum vhost_user_request request;
-
-#define VHOST_USER_VERSION_MASK     0x3
-#define VHOST_USER_REPLY_MASK       (0x1 << 2)
-	uint32_t flags;
-	uint32_t size; /* the following payload size */
-	union {
-#define VHOST_USER_VRING_IDX_MASK   0xff
-#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
-		uint64_t u64;
-		struct vhost_vring_state state;
-		struct vhost_vring_addr addr;
-		struct vhost_memory memory;
-	} payload;
-	int fds[VHOST_MEMORY_MAX_NREGIONS];
-} __attribute((packed));
-
-#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_msg, payload.u64)
-#define VHOST_USER_PAYLOAD_SIZE \
-	(sizeof(struct vhost_user_msg) - VHOST_USER_HDR_SIZE)
-
-/* The version of the protocol we support */
-#define VHOST_USER_VERSION    0x1
-
-#define VHOST_USER_F_PROTOCOL_FEATURES 30
-#define VHOST_USER_MQ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)
-
 int vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg);
 int vhost_user_setup(const char *path);
 int vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable);
diff --git a/drivers/net/virtio/virtio_user/vhost_user.c b/drivers/net/virtio/virtio_user/vhost_user.c
index 082e821..295ce16 100644
--- a/drivers/net/virtio/virtio_user/vhost_user.c
+++ b/drivers/net/virtio/virtio_user/vhost_user.c
@@ -42,6 +42,38 @@
 
 #include "vhost.h"
 
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION    0x1
+
+#define VHOST_MEMORY_MAX_NREGIONS 8
+struct vhost_memory {
+	uint32_t nregions;
+	uint32_t padding;
+	struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
+};
+
+struct vhost_user_msg {
+	enum vhost_user_request request;
+
+#define VHOST_USER_VERSION_MASK     0x3
+#define VHOST_USER_REPLY_MASK       (0x1 << 2)
+	uint32_t flags;
+	uint32_t size; /* the following payload size */
+	union {
+#define VHOST_USER_VRING_IDX_MASK   0xff
+#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
+		uint64_t u64;
+		struct vhost_vring_state state;
+		struct vhost_vring_addr addr;
+		struct vhost_memory memory;
+	} payload;
+	int fds[VHOST_MEMORY_MAX_NREGIONS];
+} __attribute((packed));
+
+#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_msg, payload.u64)
+#define VHOST_USER_PAYLOAD_SIZE \
+	(sizeof(struct vhost_user_msg) - VHOST_USER_HDR_SIZE)
+
 static int
 vhost_user_write(int fd, void *buf, int len, int *fds, int fd_num)
 {
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index af5d5be..66a8ffe 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -151,8 +151,6 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 	 * and VIRTIO_NET_F_MAC is stripped.
 	 */
 	features = dev->features;
-	if (dev->max_queue_pairs > 1)
-		features |= VHOST_USER_MQ;
 	features &= ~(1ull << VIRTIO_NET_F_MAC);
 	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_FEATURES, &features);
 	if (ret < 0)
@@ -250,13 +248,6 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
 	}
 
-	if (dev->max_queue_pairs > 1) {
-		if (!(dev->features & VHOST_USER_MQ)) {
-			PMD_INIT_LOG(ERR, "MQ not supported by the backend");
-			return -1;
-		}
-	}
-
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 4/7] net/virtio_user: abstract virtio user backend ops
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com>

Add a struct virtio_user_backend_ops to abstract three kinds of backend
operations:
  - setup, create the unix socket connection;
  - send_request, sync messages with backend;
  - enable_qp, enable some queue pair.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost.h           | 19 +++++--
 drivers/net/virtio/virtio_user/vhost_user.c      | 54 ++++++++-----------
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 66 +++++++++++++++++-------
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  7 +++
 4 files changed, 93 insertions(+), 53 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost.h b/drivers/net/virtio/virtio_user/vhost.h
index e54ac35..bd67133 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -103,8 +103,21 @@ struct vhost_memory_region {
 	uint64_t mmap_offset;
 };
 
-int vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg);
-int vhost_user_setup(const char *path);
-int vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable);
+struct virtio_user_dev;
 
+typedef int (*vhost_setup_t)(struct virtio_user_dev *dev);
+typedef int (*vhost_send_request_t)(struct virtio_user_dev *dev,
+				    enum vhost_user_request req,
+				    void *arg);
+typedef int (*vhost_enable_qp_t)(struct virtio_user_dev *dev,
+				 uint16_t pair_idx,
+				 int enable);
+
+struct virtio_user_backend_ops {
+	vhost_setup_t setup;
+	vhost_send_request_t send_request;
+	vhost_enable_qp_t enable_qp;
+};
+
+struct virtio_user_backend_ops ops_user;
 #endif
diff --git a/drivers/net/virtio/virtio_user/vhost_user.c b/drivers/net/virtio/virtio_user/vhost_user.c
index 295ce16..faea1b6 100644
--- a/drivers/net/virtio/virtio_user/vhost_user.c
+++ b/drivers/net/virtio/virtio_user/vhost_user.c
@@ -41,6 +41,7 @@
 #include <errno.h>
 
 #include "vhost.h"
+#include "virtio_user_dev.h"
 
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    0x1
@@ -255,24 +256,10 @@ prepare_vhost_memory_user(struct vhost_user_msg *msg, int fds[])
 
 static struct vhost_user_msg m;
 
-static const char * const vhost_msg_strings[] = {
-	[VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER",
-	[VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER",
-	[VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES",
-	[VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
-	[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
-	[VHOST_USER_SET_VRING_NUM] = "VHOST_USER_SET_VRING_NUM",
-	[VHOST_USER_SET_VRING_BASE] = "VHOST_USER_SET_VRING_BASE",
-	[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
-	[VHOST_USER_SET_VRING_ADDR] = "VHOST_USER_SET_VRING_ADDR",
-	[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
-	[VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE",
-	[VHOST_USER_SET_VRING_ENABLE] = "VHOST_USER_SET_VRING_ENABLE",
-	NULL,
-};
-
-int
-vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
+static int
+vhost_user_sock(struct virtio_user_dev *dev,
+		enum vhost_user_request req,
+		void *arg)
 {
 	struct vhost_user_msg msg;
 	struct vhost_vring_file *file = 0;
@@ -280,11 +267,9 @@ vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
 	int fds[VHOST_MEMORY_MAX_NREGIONS];
 	int fd_num = 0;
 	int i, len;
+	int vhostfd = dev->vhostfd;
 
 	RTE_SET_USED(m);
-	RTE_SET_USED(vhost_msg_strings);
-
-	PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
 
 	msg.request = req;
 	msg.flags = VHOST_USER_VERSION;
@@ -355,8 +340,8 @@ vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
 
 	len = VHOST_USER_HDR_SIZE + msg.size;
 	if (vhost_user_write(vhostfd, &msg, len, fds, fd_num) < 0) {
-		PMD_DRV_LOG(ERR, "%s failed: %s",
-			    vhost_msg_strings[req], strerror(errno));
+		PMD_DRV_LOG(ERR, "fail to write vhost user msg: %s",
+			    strerror(errno));
 		return -1;
 	}
 
@@ -403,15 +388,13 @@ vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
 
 /**
  * Set up environment to talk with a vhost user backend.
- * @param path
- *   - The path to vhost user unix socket file.
  *
  * @return
  *   - (-1) if fail to set up;
  *   - (>=0) if successful, and it is the fd to vhostfd.
  */
-int
-vhost_user_setup(const char *path)
+static int
+vhost_user_setup(struct virtio_user_dev *dev)
 {
 	int fd;
 	int flag;
@@ -429,7 +412,7 @@ vhost_user_setup(const char *path)
 
 	memset(&un, 0, sizeof(un));
 	un.sun_family = AF_UNIX;
-	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
+	snprintf(un.sun_path, sizeof(un.sun_path), "%s", dev->path);
 	if (connect(fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
 		PMD_DRV_LOG(ERR, "connect error, %s", strerror(errno));
 		close(fd);
@@ -439,8 +422,10 @@ vhost_user_setup(const char *path)
 	return fd;
 }
 
-int
-vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable)
+static int
+vhost_user_enable_queue_pair(struct virtio_user_dev *dev,
+			     uint16_t pair_idx,
+			     int enable)
 {
 	int i;
 
@@ -450,10 +435,15 @@ vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable)
 			.num   = enable,
 		};
 
-		if (vhost_user_sock(vhostfd,
-				    VHOST_USER_SET_VRING_ENABLE, &state))
+		if (vhost_user_sock(dev, VHOST_USER_SET_VRING_ENABLE, &state))
 			return -1;
 	}
 
 	return 0;
 }
+
+struct virtio_user_backend_ops ops_user = {
+	.setup = vhost_user_setup,
+	.send_request = vhost_user_sock,
+	.enable_qp = vhost_user_enable_queue_pair
+};
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 66a8ffe..a818c29 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -39,6 +39,9 @@
 #include <sys/mman.h>
 #include <unistd.h>
 #include <sys/eventfd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
 
 #include "vhost.h"
 #include "virtio_user_dev.h"
@@ -64,7 +67,7 @@ virtio_user_create_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	}
 	file.index = queue_sel;
 	file.fd = callfd;
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_CALL, &file);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_CALL, &file);
 	dev->callfds[queue_sel] = callfd;
 
 	return 0;
@@ -88,12 +91,12 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 
 	state.index = queue_sel;
 	state.num = vring->num;
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_NUM, &state);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_NUM, &state);
 
 	state.num = 0; /* no reservation */
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_BASE, &state);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_BASE, &state);
 
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_ADDR, &addr);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_ADDR, &addr);
 
 	/* Of all per virtqueue MSGs, make sure VHOST_USER_SET_VRING_KICK comes
 	 * lastly because vhost depends on this msg to judge if
@@ -106,7 +109,7 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	}
 	file.index = queue_sel;
 	file.fd = kickfd;
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_KICK, &file);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_KICK, &file);
 	dev->kickfds[queue_sel] = kickfd;
 
 	return 0;
@@ -147,18 +150,17 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 		goto error;
 
 	/* Step 1: set features
-	 * Make sure VHOST_USER_F_PROTOCOL_FEATURES is added if mq is enabled,
-	 * and VIRTIO_NET_F_MAC is stripped.
+	 * Strip VIRTIO_NET_F_MAC, as MAC address is handled in vdev init.
 	 */
 	features = dev->features;
 	features &= ~(1ull << VIRTIO_NET_F_MAC);
-	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_FEATURES, &features);
+	ret = dev->ops->send_request(dev, VHOST_USER_SET_FEATURES, &features);
 	if (ret < 0)
 		goto error;
 	PMD_DRV_LOG(INFO, "set features: %" PRIx64, features);
 
 	/* Step 2: share memory regions */
-	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_MEM_TABLE, NULL);
+	ret = dev->ops->send_request(dev, VHOST_USER_SET_MEM_TABLE, NULL);
 	if (ret < 0)
 		goto error;
 
@@ -169,7 +171,7 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 	/* Step 4: enable queues
 	 * we enable the 1st queue pair by default.
 	 */
-	vhost_user_enable_queue_pair(dev->vhostfd, 0, 1);
+	dev->ops->enable_qp(dev, 0, 1);
 
 	return 0;
 error:
@@ -179,7 +181,7 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 
 int virtio_user_stop_device(struct virtio_user_dev *dev)
 {
-	return vhost_user_sock(dev->vhostfd, VHOST_USER_RESET_OWNER, NULL);
+	return dev->ops->send_request(dev, VHOST_USER_RESET_OWNER, NULL);
 }
 
 static inline void
@@ -203,6 +205,36 @@ parse_mac(struct virtio_user_dev *dev, const char *mac)
 	}
 }
 
+static int
+is_vhost_user_by_type(const char *path)
+{
+	struct stat sb;
+
+	if (stat(path, &sb) == -1)
+		return 0;
+
+	return S_ISSOCK(sb.st_mode);
+}
+
+static int
+virtio_user_dev_setup(struct virtio_user_dev *dev)
+{
+	uint32_t i;
+
+	dev->vhostfd = -1;
+	for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
+		dev->kickfds[i] = -1;
+		dev->callfds[i] = -1;
+	}
+
+	if (is_vhost_user_by_type(dev->path)) {
+		dev->ops = &ops_user;
+		return dev->ops->setup(dev);
+	}
+
+	return -1;
+}
+
 int
 virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 		     int cq, int queue_size, const char *mac)
@@ -213,19 +245,17 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 	dev->queue_size = queue_size;
 	dev->mac_specified = 0;
 	parse_mac(dev, mac);
-	dev->vhostfd = -1;
 
-	dev->vhostfd = vhost_user_setup(dev->path);
-	if (dev->vhostfd < 0) {
+	if (virtio_user_dev_setup(dev) < 0) {
 		PMD_INIT_LOG(ERR, "backend set up fails");
 		return -1;
 	}
-	if (vhost_user_sock(dev->vhostfd, VHOST_USER_SET_OWNER, NULL) < 0) {
+	if (dev->ops->send_request(dev, VHOST_USER_SET_OWNER, NULL) < 0) {
 		PMD_INIT_LOG(ERR, "set_owner fails: %s", strerror(errno));
 		return -1;
 	}
 
-	if (vhost_user_sock(dev->vhostfd, VHOST_USER_GET_FEATURES,
+	if (dev->ops->send_request(dev, VHOST_USER_GET_FEATURES,
 			    &dev->device_features) < 0) {
 		PMD_INIT_LOG(ERR, "get_features failed: %s", strerror(errno));
 		return -1;
@@ -277,9 +307,9 @@ virtio_user_handle_mq(struct virtio_user_dev *dev, uint16_t q_pairs)
 	}
 
 	for (i = 0; i < q_pairs; ++i)
-		ret |= vhost_user_enable_queue_pair(dev->vhostfd, i, 1);
+		ret |= dev->ops->enable_qp(dev, i, 1);
 	for (i = q_pairs; i < dev->max_queue_pairs; ++i)
-		ret |= vhost_user_enable_queue_pair(dev->vhostfd, i, 0);
+		ret |= dev->ops->enable_qp(dev, i, 0);
 
 	dev->queue_pairs = q_pairs;
 
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 28fc788..503a496 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -37,9 +37,15 @@
 #include <limits.h>
 #include "../virtio_pci.h"
 #include "../virtio_ring.h"
+#include "vhost.h"
 
 struct virtio_user_dev {
+	/* for vhost_user backend */
 	int		vhostfd;
+
+	/* for vhost_kernel backend */
+
+	/* for both vhost_user and vhost_kernel */
 	int		callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
 	int		kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
 	int		mac_specified;
@@ -54,6 +60,7 @@ struct virtio_user_dev {
 	uint8_t		mac_addr[ETHER_ADDR_LEN];
 	char		path[PATH_MAX];
 	struct vring	vrings[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
+	struct virtio_user_backend_ops *ops;
 };
 
 int virtio_user_start_device(struct virtio_user_dev *dev);
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 5/7] net/virtio_user: add vhost kernel support
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com>

This patch add support vhost kernel as the backend for virtio_user.
Three main hook functions are added:
  - vhost_kernel_setup() to open char device, each vq pair needs one
    vhostfd;
  - vhost_kernel_ioctl() to communicate control messages with vhost
    kernel module;
  - vhost_kernel_enable_queue_pair() to open tap device and set it
    as the backend of corresonding vhost fd (that is to say, vq pair).

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/Makefile                      |   1 +
 drivers/net/virtio/virtio_user/vhost.h           |   2 +
 drivers/net/virtio/virtio_user/vhost_kernel.c    | 364 +++++++++++++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  21 +-
 drivers/net/virtio/virtio_user/virtio_user_dev.h |   4 +
 5 files changed, 388 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 97972a6..faeffb2 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@ endif
 
 ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user_ethdev.c
 endif
diff --git a/drivers/net/virtio/virtio_user/vhost.h b/drivers/net/virtio/virtio_user/vhost.h
index bd67133..ffab13a 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -120,4 +120,6 @@ struct virtio_user_backend_ops {
 };
 
 struct virtio_user_backend_ops ops_user;
+struct virtio_user_backend_ops ops_kernel;
+
 #endif
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
new file mode 100644
index 0000000..8984c5c
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -0,0 +1,364 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
+#include <string.h>
+#include <errno.h>
+
+#include <rte_memory.h>
+#include <rte_eal_memconfig.h>
+
+#include "vhost.h"
+#include "virtio_user_dev.h"
+
+struct vhost_memory_kernel {
+	uint32_t nregions;
+	uint32_t padding;
+	struct vhost_memory_region regions[0];
+};
+
+/* vhost kernel ioctls */
+#define VHOST_VIRTIO 0xAF
+#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
+#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
+#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory_kernel)
+#define VHOST_SET_LOG_BASE _IOW(VHOST_VIRTIO, 0x04, __u64)
+#define VHOST_SET_LOG_FD _IOW(VHOST_VIRTIO, 0x07, int)
+#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)
+#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)
+#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)
+#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
+#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
+#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
+
+/* TUN ioctls */
+#define TUNSETIFF     _IOW('T', 202, int)
+#define TUNGETFEATURES _IOR('T', 207, unsigned int)
+#define TUNSETOFFLOAD  _IOW('T', 208, unsigned int)
+#define TUNGETIFF      _IOR('T', 210, unsigned int)
+#define TUNSETSNDBUF   _IOW('T', 212, int)
+#define TUNGETVNETHDRSZ _IOR('T', 215, int)
+#define TUNSETVNETHDRSZ _IOW('T', 216, int)
+#define TUNSETQUEUE  _IOW('T', 217, int)
+#define TUNSETVNETLE _IOW('T', 220, int)
+#define TUNSETVNETBE _IOW('T', 222, int)
+
+/* TUNSETIFF ifr flags */
+#define IFF_TAP          0x0002
+#define IFF_NO_PI        0x1000
+#define IFF_ONE_QUEUE    0x2000
+#define IFF_VNET_HDR     0x4000
+#define IFF_MULTI_QUEUE  0x0100
+#define IFF_ATTACH_QUEUE 0x0200
+#define IFF_DETACH_QUEUE 0x0400
+
+/* Constants */
+#define TUN_DEF_SNDBUF	(1ull << 20)
+#define PATH_NET_TUN	"/dev/net/tun"
+#define VHOST_KERNEL_MAX_REGIONS	64
+
+static uint64_t vhost_req_user_to_kernel[] = {
+	[VHOST_USER_SET_OWNER] = VHOST_SET_OWNER,
+	[VHOST_USER_RESET_OWNER] = VHOST_RESET_OWNER,
+	[VHOST_USER_SET_FEATURES] = VHOST_SET_FEATURES,
+	[VHOST_USER_GET_FEATURES] = VHOST_GET_FEATURES,
+	[VHOST_USER_SET_VRING_CALL] = VHOST_SET_VRING_CALL,
+	[VHOST_USER_SET_VRING_NUM] = VHOST_SET_VRING_NUM,
+	[VHOST_USER_SET_VRING_BASE] = VHOST_SET_VRING_BASE,
+	[VHOST_USER_GET_VRING_BASE] = VHOST_GET_VRING_BASE,
+	[VHOST_USER_SET_VRING_ADDR] = VHOST_SET_VRING_ADDR,
+	[VHOST_USER_SET_VRING_KICK] = VHOST_SET_VRING_KICK,
+	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
+};
+
+/* By default, vhost kernel module allows 64 regions, but DPDK allows
+ * 256 segments. As a relief, below function merges those virtually
+ * adjacent memsegs into one region.
+ */
+static struct vhost_memory_kernel *
+prepare_vhost_memory_kernel(void)
+{
+	uint32_t i, j, k = 0;
+	struct rte_memseg *seg;
+	struct vhost_memory_region *mr;
+	struct vhost_memory_kernel *vm;
+
+	vm = malloc(sizeof(struct vhost_memory_kernel) +
+		    VHOST_KERNEL_MAX_REGIONS *
+		    sizeof(struct vhost_memory_region));
+
+	for (i = 0; i < RTE_MAX_MEMSEG; ++i) {
+		seg = &rte_eal_get_configuration()->mem_config->memseg[i];
+		if (!seg->addr)
+			break;
+
+		int new_region = 1;
+
+		for (j = 0; j < k; ++j) {
+			mr = &vm->regions[j];
+
+			if (mr->userspace_addr + mr->memory_size ==
+			    (uint64_t)seg->addr) {
+				mr->memory_size += seg->len;
+				new_region = 0;
+				break;
+			}
+
+			if ((uint64_t)seg->addr + seg->len ==
+			    mr->userspace_addr) {
+				mr->guest_phys_addr = (uint64_t)seg->addr;
+				mr->userspace_addr = (uint64_t)seg->addr;
+				mr->memory_size += seg->len;
+				new_region = 0;
+				break;
+			}
+		}
+
+		if (new_region == 0)
+			continue;
+
+		mr = &vm->regions[k++];
+		mr->guest_phys_addr = (uint64_t)seg->addr; /* use vaddr here! */
+		mr->userspace_addr = (uint64_t)seg->addr;
+		mr->memory_size = seg->len;
+		mr->mmap_offset = 0;
+
+		if (k >= VHOST_KERNEL_MAX_REGIONS) {
+			free(vm);
+			return NULL;
+		}
+	}
+
+	vm->nregions = k;
+	vm->padding = 0;
+	return vm;
+}
+
+static int
+vhost_kernel_ioctl(struct virtio_user_dev *dev,
+		   enum vhost_user_request req,
+		   void *arg)
+{
+	int i, ret = -1;
+	uint64_t req_kernel;
+	struct vhost_memory_kernel *vm = NULL;
+
+	req_kernel = vhost_req_user_to_kernel[req];
+
+	if (req_kernel == VHOST_SET_MEM_TABLE) {
+		vm = prepare_vhost_memory_kernel();
+		if (!vm)
+			return -1;
+		arg = (void *)vm;
+	}
+
+	/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
+	if (req_kernel == VHOST_SET_FEATURES)
+		*(uint64_t *)arg &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
+
+	for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
+		if (dev->vhostfds[i] < 0)
+			continue;
+
+		ret = ioctl(dev->vhostfds[i], req_kernel, arg);
+		if (ret < 0)
+			break;
+	}
+
+	if (vm)
+		free(vm);
+
+	return ret;
+}
+
+/**
+ * Set up environment to talk with a vhost kernel backend.
+ *
+ * @return
+ *   - (-1) if fail to set up;
+ *   - (>=0) if successful.
+ */
+static int
+vhost_kernel_setup(struct virtio_user_dev *dev)
+{
+	int vhostfd;
+	uint32_t q;
+
+	for (q = 0; q < dev->max_queue_pairs; ++q) {
+		vhostfd = open(dev->path, O_RDWR);
+		if (vhostfd < 0) {
+			PMD_DRV_LOG(ERR, "fail to open %s, %s",
+				    dev->path, strerror(errno));
+			return -1;
+		}
+
+		dev->vhostfds[q] = vhostfd;
+	}
+
+	return 0;
+}
+
+static int
+vhost_kernel_set_backend(int vhostfd, int tapfd)
+{
+	struct vhost_vring_file f;
+
+	f.fd = tapfd;
+	f.index = 0;
+	if (ioctl(vhostfd, VHOST_NET_SET_BACKEND, &f) < 0) {
+		PMD_DRV_LOG(ERR, "VHOST_NET_SET_BACKEND fails, %s",
+				strerror(errno));
+		return -1;
+	}
+
+	f.index = 1;
+	if (ioctl(vhostfd, VHOST_NET_SET_BACKEND, &f) < 0) {
+		PMD_DRV_LOG(ERR, "VHOST_NET_SET_BACKEND fails, %s",
+				strerror(errno));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
+			       uint16_t pair_idx,
+			       int enable)
+{
+	unsigned int features;
+	int sndbuf = TUN_DEF_SNDBUF;
+	struct ifreq ifr;
+	int hdr_size;
+	int vhostfd;
+	int tapfd;
+
+	vhostfd = dev->vhostfds[pair_idx];
+
+	if (!enable) {
+		if (dev->tapfds[pair_idx]) {
+			close(dev->tapfds[pair_idx]);
+			dev->tapfds[pair_idx] = -1;
+		}
+		return vhost_kernel_set_backend(vhostfd, -1);
+	} else if (dev->tapfds[pair_idx] >= 0) {
+		return 0;
+	}
+
+	if ((dev->features & (1ULL << VIRTIO_NET_F_MRG_RXBUF)) ||
+	    (dev->features & (1ULL << VIRTIO_F_VERSION_1)))
+		hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hdr_size = sizeof(struct virtio_net_hdr);
+
+	/* TODO:
+	 * 1. verify we can get/set vnet_hdr_len, tap_probe_vnet_hdr_len
+	 * 2. get number of memory regions from vhost module parameter
+	 * max_mem_regions, supported in newer version linux kernel
+	 */
+	tapfd = open(PATH_NET_TUN, O_RDWR);
+	if (tapfd < 0) {
+		PMD_DRV_LOG(ERR, "fail to open %s: %s",
+			    PATH_NET_TUN, strerror(errno));
+		return -1;
+	}
+
+	/* Construct ifr */
+	memset(&ifr, 0, sizeof(ifr));
+	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+
+	if (ioctl(tapfd, TUNGETFEATURES, &features) == -1) {
+		PMD_DRV_LOG(ERR, "TUNGETFEATURES failed: %s", strerror(errno));
+		goto error;
+	}
+	if (features & IFF_ONE_QUEUE)
+		ifr.ifr_flags |= IFF_ONE_QUEUE;
+
+	/* Let tap instead of vhost-net handle vnet header, as the latter does
+	 * not support offloading. And in this case, we should not set feature
+	 * bit VHOST_NET_F_VIRTIO_NET_HDR.
+	 */
+	if (features & IFF_VNET_HDR) {
+		ifr.ifr_flags |= IFF_VNET_HDR;
+	} else {
+		PMD_DRV_LOG(ERR, "TAP does not support IFF_VNET_HDR");
+		goto error;
+	}
+
+	if (dev->ifname)
+		strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
+	else
+		strncpy(ifr.ifr_name, "tap%d", IFNAMSIZ);
+	if (ioctl(tapfd, TUNSETIFF, (void *)&ifr) == -1) {
+		PMD_DRV_LOG(ERR, "TUNSETIFF failed: %s", strerror(errno));
+		goto error;
+	}
+
+	fcntl(tapfd, F_SETFL, O_NONBLOCK);
+
+	if (ioctl(tapfd, TUNSETVNETHDRSZ, &hdr_size) < 0) {
+		PMD_DRV_LOG(ERR, "TUNSETVNETHDRSZ failed: %s", strerror(errno));
+		goto error;
+	}
+
+	if (ioctl(tapfd, TUNSETSNDBUF, &sndbuf) < 0) {
+		PMD_DRV_LOG(ERR, "TUNSETSNDBUF failed: %s", strerror(errno));
+		goto error;
+	}
+
+	if (vhost_kernel_set_backend(vhostfd, tapfd) < 0)
+		goto error;
+
+	dev->tapfds[pair_idx] = tapfd;
+	if (!dev->ifname)
+		dev->ifname = strdup(ifr.ifr_name);
+
+	return 0;
+error:
+	return -1;
+}
+
+struct virtio_user_backend_ops ops_kernel = {
+	.setup = vhost_kernel_setup,
+	.send_request = vhost_kernel_ioctl,
+	.enable_qp = vhost_kernel_enable_queue_pair
+};
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index a818c29..c718b85 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -219,7 +219,7 @@ is_vhost_user_by_type(const char *path)
 static int
 virtio_user_dev_setup(struct virtio_user_dev *dev)
 {
-	uint32_t i;
+	uint32_t i, q;
 
 	dev->vhostfd = -1;
 	for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
@@ -227,12 +227,18 @@ virtio_user_dev_setup(struct virtio_user_dev *dev)
 		dev->callfds[i] = -1;
 	}
 
+	for (q = 0; q < VHOST_KERNEL_MAX_QUEUES; ++q) {
+		dev->vhostfds[q] = -1;
+		dev->tapfds[q] = -1;
+	}
+
 	if (is_vhost_user_by_type(dev->path)) {
 		dev->ops = &ops_user;
-		return dev->ops->setup(dev);
+	} else {
+		dev->ops = &ops_kernel;
 	}
 
-	return -1;
+	return dev->ops->setup(dev);
 }
 
 int
@@ -284,7 +290,9 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 void
 virtio_user_dev_uninit(struct virtio_user_dev *dev)
 {
-	uint32_t i;
+	uint32_t i, q;
+
+	dev->ops->send_request(dev, VHOST_USER_RESET_OWNER, NULL);
 
 	for (i = 0; i < dev->max_queue_pairs * 2; ++i) {
 		close(dev->callfds[i]);
@@ -292,6 +300,11 @@ virtio_user_dev_uninit(struct virtio_user_dev *dev)
 	}
 
 	close(dev->vhostfd);
+
+	for (q = 0; q < VHOST_KERNEL_MAX_QUEUES; ++q) {
+		close(dev->vhostfds[q]);
+		close(dev->tapfds[q]);
+	}
 }
 
 static uint8_t
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 503a496..148b2e6 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -44,6 +44,10 @@ struct virtio_user_dev {
 	int		vhostfd;
 
 	/* for vhost_kernel backend */
+	char		*ifname;
+#define VHOST_KERNEL_MAX_QUEUES		8
+	int		vhostfds[VHOST_KERNEL_MAX_QUEUES];
+	int		tapfds[VHOST_KERNEL_MAX_QUEUES];
 
 	/* for both vhost_user and vhost_kernel */
 	int		callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 6/7] net/virtio_user: enable offloading
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com>

When used with vhost kernel backend, we can offload at both directions.
  - From vhost kernel to virtio_user, the offload is enabled so that
    DPDK app can trust the flow is checksum-correct; and if DPDK app
    sends it through another port, the checksum needs to be
    recalculated or offloaded. It also applies to TSO.
  - From virtio_user to vhost_kernel, the offload is enabled so that
    kernel can trust the flow is L4-checksum-correct, no need to verify
    it; if kernel will consume it, DPDK app should make sure the
    l3-checksum is correctly set.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost_kernel.c | 61 ++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index 8984c5c..fb3c454 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -91,6 +91,13 @@ struct vhost_memory_kernel {
 #define IFF_ATTACH_QUEUE 0x0200
 #define IFF_DETACH_QUEUE 0x0400
 
+/* Features for GSO (TUNSETOFFLOAD). */
+#define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
+#define TUN_F_TSO4	0x02	/* I can handle TSO for IPv4 packets */
+#define TUN_F_TSO6	0x04	/* I can handle TSO for IPv6 packets */
+#define TUN_F_TSO_ECN	0x08	/* I can handle TSO with ECN bits. */
+#define TUN_F_UFO	0x10	/* I can handle UFO packets */
+
 /* Constants */
 #define TUN_DEF_SNDBUF	(1ull << 20)
 #define PATH_NET_TUN	"/dev/net/tun"
@@ -173,6 +180,28 @@ prepare_vhost_memory_kernel(void)
 	return vm;
 }
 
+/* with below features, vhost kernel does not need to do the checksum and TSO,
+ * these info will be passed to virtio_user through virtio net header.
+ */
+static const uint64_t guest_offloads_mask =
+	(1ULL << VIRTIO_NET_F_GUEST_CSUM) |
+	(1ULL << VIRTIO_NET_F_GUEST_TSO4) |
+	(1ULL << VIRTIO_NET_F_GUEST_TSO6) |
+	(1ULL << VIRTIO_NET_F_GUEST_ECN)  |
+	(1ULL << VIRTIO_NET_F_GUEST_UFO);
+
+/* with below features, when flows from virtio_user to vhost kernel
+ * (1) if flows goes up through the kernel networking stack, it does not need
+ * to verify checksum, which can save CPU cycles;
+ * (2) if flows goes through a Linux bridge and outside from an interface
+ * (kernel driver), checksum and TSO will be done by GSO in kernel or even
+ * offloaded into real physical device.
+ */
+static const uint64_t host_offloads_mask =
+	(1ULL << VIRTIO_NET_F_CSUM) |
+	(1ULL << VIRTIO_NET_F_HOST_TSO4) |
+	(1ULL << VIRTIO_NET_F_HOST_TSO6);
+
 static int
 vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		   enum vhost_user_request req,
@@ -191,10 +220,15 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		arg = (void *)vm;
 	}
 
-	/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
-	if (req_kernel == VHOST_SET_FEATURES)
+	if (req_kernel == VHOST_SET_FEATURES) {
+		/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
 		*(uint64_t *)arg &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
 
+		/* VHOST kernel does not know about below flags */
+		*(uint64_t *)arg &= ~guest_offloads_mask;
+		*(uint64_t *)arg &= ~host_offloads_mask;
+	}
+
 	for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
 		if (dev->vhostfds[i] < 0)
 			continue;
@@ -204,6 +238,15 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 			break;
 	}
 
+	if (!ret && req_kernel == VHOST_GET_FEATURES) {
+		/* with tap as the backend, all these features are supported
+		 * but not claimed by vhost-net, so we add them back when
+		 * reporting to upper layer.
+		 */
+		*((uint64_t *)arg) |= guest_offloads_mask;
+		*((uint64_t *)arg) |= host_offloads_mask;
+	}
+
 	if (vm)
 		free(vm);
 
@@ -271,6 +314,12 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 	int hdr_size;
 	int vhostfd;
 	int tapfd;
+	unsigned int offload =
+			TUN_F_CSUM |
+			TUN_F_TSO4 |
+			TUN_F_TSO6 |
+			TUN_F_TSO_ECN |
+			TUN_F_UFO;
 
 	vhostfd = dev->vhostfds[pair_idx];
 
@@ -345,6 +394,14 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 		goto error;
 	}
 
+	/* TODO: before set the offload capabilities, we'd better (1) check
+	 * negotiated features to see if necessary to offload; (2) query tap
+	 * to see if it supports the offload capabilities.
+	 */
+	if (ioctl(tapfd, TUNSETOFFLOAD, offload) != 0)
+		PMD_DRV_LOG(ERR, "TUNSETOFFLOAD ioctl() failed: %s",
+			   strerror(errno));
+
 	if (vhost_kernel_set_backend(vhostfd, tapfd) < 0)
 		goto error;
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 7/7] net/virtio_user: enable multiqueue with vhost kernel
From: Jianfeng Tan @ 2016-12-23  7:14 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1482477266-39199-1-git-send-email-jianfeng.tan@intel.com>

With vhost kernel, to enable multiqueue, we need backend device
in kernel support multiqueue feature. Specifically, with tap
as the backend, as linux/Documentation/networking/tuntap.txt shows,
we check if tap supports IFF_MULTI_QUEUE feature.

And for vhost kernel, each queue pair has a vhost fd, and with a tap
fd binding this vhost fd. All tap fds are set with the same tap
interface name.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost_kernel.c    | 67 +++++++++++++++++++++---
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  1 +
 2 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index fb3c454..67b382e 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -203,6 +203,29 @@ static const uint64_t host_offloads_mask =
 	(1ULL << VIRTIO_NET_F_HOST_TSO6);
 
 static int
+tap_supporte_mq(void)
+{
+	int tapfd;
+	unsigned int tap_features;
+
+	tapfd = open(PATH_NET_TUN, O_RDWR);
+	if (tapfd < 0) {
+		PMD_DRV_LOG(ERR, "fail to open %s: %s",
+			    PATH_NET_TUN, strerror(errno));
+		return -1;
+	}
+
+	if (ioctl(tapfd, TUNGETFEATURES, &tap_features) == -1) {
+		PMD_DRV_LOG(ERR, "TUNGETFEATURES failed: %s", strerror(errno));
+		close(tapfd);
+		return -1;
+	}
+
+	close(tapfd);
+	return tap_features & IFF_MULTI_QUEUE;
+}
+
+static int
 vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		   enum vhost_user_request req,
 		   void *arg)
@@ -210,6 +233,8 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 	int i, ret = -1;
 	uint64_t req_kernel;
 	struct vhost_memory_kernel *vm = NULL;
+	int vhostfd;
+	unsigned int queue_sel;
 
 	req_kernel = vhost_req_user_to_kernel[req];
 
@@ -229,13 +254,33 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		*(uint64_t *)arg &= ~host_offloads_mask;
 	}
 
-	for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
-		if (dev->vhostfds[i] < 0)
-			continue;
+	switch (req_kernel) {
+	case VHOST_SET_VRING_NUM:
+	case VHOST_SET_VRING_ADDR:
+	case VHOST_SET_VRING_BASE:
+	case VHOST_GET_VRING_BASE:
+	case VHOST_SET_VRING_KICK:
+	case VHOST_SET_VRING_CALL:
+		queue_sel = *(unsigned int *)arg;
+		vhostfd = dev->vhostfds[queue_sel / 2];
+		*(unsigned int *)arg = queue_sel % 2;
+		PMD_DRV_LOG(DEBUG, "vhostfd=%d, index=%u",
+			    vhostfd, *(unsigned int *)arg);
+		break;
+	default:
+		vhostfd = -1;
+	}
+	if (vhostfd == -1) {
+		for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
+			if (dev->vhostfds[i] < 0)
+				continue;
 
-		ret = ioctl(dev->vhostfds[i], req_kernel, arg);
-		if (ret < 0)
-			break;
+			ret = ioctl(dev->vhostfds[i], req_kernel, arg);
+			if (ret < 0)
+				break;
+		}
+	} else {
+		ret = ioctl(vhostfd, req_kernel, arg);
 	}
 
 	if (!ret && req_kernel == VHOST_GET_FEATURES) {
@@ -245,6 +290,12 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		 */
 		*((uint64_t *)arg) |= guest_offloads_mask;
 		*((uint64_t *)arg) |= host_offloads_mask;
+
+		/* vhost_kernel will not declare this feature, but it does
+		 * support multi-queue.
+		 */
+		if (tap_supporte_mq())
+			*(uint64_t *)arg |= (1ull << VIRTIO_NET_F_MQ);
 	}
 
 	if (vm)
@@ -320,6 +371,7 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 			TUN_F_TSO6 |
 			TUN_F_TSO_ECN |
 			TUN_F_UFO;
+	int req_mq = (dev->max_queue_pairs > 1);
 
 	vhostfd = dev->vhostfds[pair_idx];
 
@@ -373,6 +425,9 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 		goto error;
 	}
 
+	if (req_mq)
+		ifr.ifr_flags |= IFF_MULTI_QUEUE;
+
 	if (dev->ifname)
 		strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
 	else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index c718b85..241f5bb 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -93,6 +93,7 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	state.num = vring->num;
 	dev->ops->send_request(dev, VHOST_USER_SET_VRING_NUM, &state);
 
+	state.index = queue_sel;
 	state.num = 0; /* no reservation */
 	dev->ops->send_request(dev, VHOST_USER_SET_VRING_BASE, &state);
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/3] Fix iv sizes in crypto drivers capabilities
From: Arek Kusztal @ 2016-12-23  8:10 UTC (permalink / raw)
  To: dev
  Cc: fiona.trahe, pablo.de.lara.guarch, john.griffin, deepak.k.jain,
	declan.doherty, Arek Kusztal

This patchset fixes iv (initialization vector) size values in qat
and aesni gcm pmds to be conformant with nist SP800-38D.

Arek Kusztal (3):
  crypto/aesni_gcm: fix J0 padding bytes for GCM
  crypto/aesni_gcm: fix iv size in PMD capabilities
  crypto/qat: fix iv size in PMD capabilities

 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c     | 4 +++-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 4 ++--
 drivers/crypto/qat/qat_crypto.c              | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

-- 
2.1.0

^ permalink raw reply

* [PATCH 1/3] crypto/aesni_gcm: fix J0 padding bytes for GCM
From: Arek Kusztal @ 2016-12-23  8:10 UTC (permalink / raw)
  To: dev
  Cc: fiona.trahe, pablo.de.lara.guarch, john.griffin, deepak.k.jain,
	declan.doherty, Arek Kusztal
In-Reply-To: <1482480639-26935-1-git-send-email-arkadiuszx.kusztal@intel.com>

This commit fixes pre-counter block (J0) padding by clearing
four most significant bytes before setting initial counter value.

Fixes: b2bb3597470c ("crypto/aesni_gcm: move pre-counter block to driver")

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index dba5e15..af3d60f 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -40,6 +40,7 @@
 #include <rte_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
+#include <rte_byteorder.h>
 
 #include "aesni_gcm_pmd_private.h"
 
@@ -241,7 +242,8 @@ process_gcm_crypto_op(struct aesni_gcm_qp *qp, struct rte_crypto_sym_op *op,
 	 * to set BE LSB to 1, driver expects that 16B is allocated
 	 */
 	if (op->cipher.iv.length == 12) {
-		op->cipher.iv.data[15] = 1;
+		uint32_t *iv_padd = (uint32_t *)&op->cipher.iv.data[12];
+		*iv_padd = rte_bswap32(1);
 	}
 
 	if (op->auth.aad.length != 12 && op->auth.aad.length != 8 &&
-- 
2.1.0

^ permalink raw reply related

* [PATCH 2/3] crypto/aesni_gcm: fix iv size in PMD capabilities
From: Arek Kusztal @ 2016-12-23  8:10 UTC (permalink / raw)
  To: dev
  Cc: fiona.trahe, pablo.de.lara.guarch, john.griffin, deepak.k.jain,
	declan.doherty, Arek Kusztal
In-Reply-To: <1482480639-26935-1-git-send-email-arkadiuszx.kusztal@intel.com>

This patch sets iv size in aesni gcm PMD to 12 bytes to be
conformant with nist SP800-38D.

Fixes: eec136f3c54f ("aesni_gcm: add driver for AES-GCM crypto operations")

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
index e824d4b..c51f82a 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
@@ -77,8 +77,8 @@ static const struct rte_cryptodev_capabilities aesni_gcm_pmd_capabilities[] = {
 					.increment = 0
 				},
 				.iv_size = {
-					.min = 16,
-					.max = 16,
+					.min = 12,
+					.max = 12,
 					.increment = 0
 				}
 			}, }
-- 
2.1.0

^ permalink raw reply related

* [PATCH 3/3] crypto/qat: fix iv size in PMD capabilities
From: Arek Kusztal @ 2016-12-23  8:10 UTC (permalink / raw)
  To: dev
  Cc: fiona.trahe, pablo.de.lara.guarch, john.griffin, deepak.k.jain,
	declan.doherty, Arek Kusztal
In-Reply-To: <1482480639-26935-1-git-send-email-arkadiuszx.kusztal@intel.com>

This patch sets iv size in qat PMD to 12 bytes to be
conformant with nist SP800-38D.

Fixes: 26c2e4ad5ad4 ("cryptodev: add capabilities discovery")
---
 drivers/crypto/qat/qat_crypto.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/qat/qat_crypto.c b/drivers/crypto/qat/qat_crypto.c
index fa78c60..0b714ad 100644
--- a/drivers/crypto/qat/qat_crypto.c
+++ b/drivers/crypto/qat/qat_crypto.c
@@ -303,8 +303,8 @@ static const struct rte_cryptodev_capabilities qat_pmd_capabilities[] = {
 					.increment = 8
 				},
 				.iv_size = {
-					.min = 16,
-					.max = 16,
+					.min = 12,
+					.max = 12,
 					.increment = 0
 				}
 			}, }
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH 15/18] net/ixgbe: parse flow director filter
From: Adrien Mazarguil @ 2016-12-23  8:13 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Zhao1, Wei, dev@dpdk.org, Lu, Wenzhuo
In-Reply-To: <86cc3f7c-4488-9efa-48ce-82eb57ae572c@intel.com>

Hi,

On Thu, Dec 22, 2016 at 10:44:32AM +0000, Ferruh Yigit wrote:
> On 12/22/2016 9:19 AM, Zhao1, Wei wrote:
> > Hi, Yigit
> > 
> >> -----Original Message-----
> >> From: Yigit, Ferruh
> >> Sent: Wednesday, December 21, 2016 1:01 AM
> >> To: Zhao1, Wei <wei.zhao1@intel.com>; dev@dpdk.org
> >> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 15/18] net/ixgbe: parse flow director filter
> >>
> >> On 12/2/2016 10:43 AM, Wei Zhao wrote:
> >>> From: wei zhao1 <wei.zhao1@intel.com>
> >>>
> >>> check if the rule is a flow director rule, and get the flow director info.
> >>>
> >>> Signed-off-by: wei zhao1 <wei.zhao1@intel.com>
> >>> Signed-off-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
> >>> ---
> >>
> >> <...>
> >>
> >>> +	PATTERN_SKIP_VOID(rule, struct ixgbe_fdir_rule,
> >>> +			  RTE_FLOW_ERROR_TYPE_ITEM_NUM);
> >>> +	if (item->type != RTE_FLOW_ITEM_TYPE_ETH &&
> >>> +	    item->type != RTE_FLOW_ITEM_TYPE_IPV4 &&
> >>> +	    item->type != RTE_FLOW_ITEM_TYPE_IPV6 &&
> >>> +	    item->type != RTE_FLOW_ITEM_TYPE_UDP &&
> >>> +	    item->type != RTE_FLOW_ITEM_TYPE_VXLAN &&
> >>> +	    item->type != RTE_FLOW_ITEM_TYPE_NVGRE) {
> >>
> >> This gives build error [1], there are a few more same usage:
> >>
> >> .../drivers/net/ixgbe/ixgbe_ethdev.c:9238:17: error: comparison of constant
> >> 241 with expression of type 'const enum rte_flow_item_type' is always true
> >> [-Werror,-Wtautological-constant-out-of-range-compare]
> >>             item->type != RTE_FLOW_ITEM_TYPE_NVGRE) {
> >>
> >>
> >>
> > 
> > Ok, I will add two type definition RTE_FLOW_ITEM_TYPE_NVGRE and RTE_FLOW_ITEM_TYPE_E_TAG  into const enum rte_flow_item_type to eliminate this problem.
> > Thank you.
> > 
> 
> CC: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Thanks, the original message did not catch my attention.

> Yes, that is what right thing to do, since rte_flow patchset not merged
> yet, perhaps Adrien may want to include this as next version of his
> patchset?
> 
> What do you think Adrien?

I think by now the rte_flow series is automatically categorized as spam by
half the community already, and that new items such as these can be added
subsequently on their own, ideally before the entire ixgbe/i40e series.

I have a few comments regarding these new items if we want to make them part
of rte_flow.h (see definitions below).

Unfortunately, even though they are super convenient to use and expose in the
testpmd flow command, we need to avoid C-style bit-field definitions such as
these for protocol header matching because they are not endian-safe,
particularly multi-byte fields. Only meta-data that should be interpreted
with host endianness can use them (e.g. struct rte_flow_attr, struct
rte_flow_action_vf, etc):

 struct rte_flow_item_nvgre {
        uint32_t flags0:1; /**< 0 */
        uint32_t rsvd1:1; /**< 1 bit not defined */
        uint32_t flags1:2; /**< 2 bits, 1 0 */
        uint32_t rsvd0:9; /**< Reserved0 */
        uint32_t ver:3; /**< version */
        uint32_t protocol:16; /**< protocol type, 0x6558 */
        uint32_t tni:24; /**< tenant network ID or virtual subnet ID */
        uint32_t flow_id:8; /**< flow ID or Reserved */
 };

For an example how to avoid them, see struct ipv6_hdr definition in
rte_ip.h, where field vtc_flow is 32 bit but covers three protocol fields
and is considered big-endian (Nelio's endianness series [1] would be really
handy to eliminate confusion here). Also see struct rte_flow_item_vxlan,
which covers 24-bit fields using uint8_t arrays.

 struct rte_flow_item_e_tag {
        struct ether_addr dst; /**< Destination MAC. */
        struct ether_addr src; /**< Source MAC. */
        uint16_t e_tag_ethertype; /**< E-tag EtherType, 0x893F. */
        uint16_t e_pcp:3; /**<  E-PCP */
        uint16_t dei:1; /**< DEI */
        uint16_t in_e_cid_base:12; /**< Ingress E-CID base */
        uint16_t rsv:2; /**< reserved */
        uint16_t grp:2; /**< GRP */
        uint16_t e_cid_base:12; /**< E-CID base */
        uint16_t in_e_cid_ext:8; /**< Ingress E-CID extend */
        uint16_t e_cid_ext:8; /**< E-CID extend */
        uint16_t type; /**< MAC type. */
        unsigned int tags; /**< Number of 802.1Q/ad tags defined. */
        struct {
                uint16_t tpid; /**< Tag protocol identifier. */
                uint16_t tci; /**< Tag control information. */
        } tag[]; /**< 802.1Q/ad tag definitions, outermost first. */
 };

Besides the bit-field issue for this one, looks like it should be split, at
least the initial part is already covered by rte_flow_item_eth.

I do not know much about E-Tag (IEEE 802.1BR right?) but it sort of looks
like a 2.5 layer protocol reminiscent of VLAN.

tags and tag[] fields seem based on the VLAN definition of the original
rte_flow RFC that has since been replaced with stacked rte_flow_item_vlan
items, much easier to program for. Perhaps this can be relied on instead of
having e_tag implement its own variant.

As a protocol-matching item and E-Tag TCI being 6 bytes according to IEEE
802.1BR, sizeof(struct rte_flow_item_e_tag) should ideally return 6 as well
and would likely comprise three big-endian uint16_t fields. See how
PCP/DEI/VID VLAN fields are interpreted in testpmd [2].

Again, these concerns only stand if you intend to include these definitions
into rte_flow.h.

[1] http://dpdk.org/ml/archives/dev/2016-November/050060.html
[2] http://dpdk.org/ml/archives/dev/2016-December/052976.html

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply

* Example(Load_balancer) Tx flush bug
From: Maple @ 2016-12-23  8:23 UTC (permalink / raw)
  To: dev; +Cc: maintainers
In-Reply-To: <201612222107161151532@raisecom.com>

From: Maple <liujian@raisecom.com>
To: <dev@dpdk.org>
Cc: <thomas.monjalon@6wind.com>
Subject: [PATCH] Example(Load_balancer) Tx flush bug
Date: Thu, 22 Dec 2016 09:57:48 +0800
Message-Id: <1482371868-19669-1-git-send-email-liujian@raisecom.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <2016122122394164225248@raisecom.com>
References: <2016122122394164225248@raisecom.com>

We found a bug in use load_balancer example,and,This bug DPDK each version.
In IO tx flush, only flush port 0.
So,If I enable more than the Port,then,In addition to 0 port won't flush.

Signed-off-by: Maple <liujian@raisecom.com>
---
 a/examples/load_balancer/runtime.c | 667 ++++++++++++++++++++++++++++++++++++
 b/examples/load_balancer/runtime.c | 669 +++++++++++++++++++++++++++++++++++++
 2 files changed, 1336 insertions(+)
 create mode 100644 a/examples/load_balancer/runtime.c
 create mode 100644 b/examples/load_balancer/runtime.c

diff --git a/examples/load_balancer/runtime.c b/examples/load_balancer/runtime.c
index 9612392..3a2e900 100644
--- a/test/a/examples/load_balancer/runtime.c
+++ b/test/b/examples/load_balancer/runtime.c
@@ -418,9 +418,11 @@ app_lcore_io_tx(
 static inline void
 app_lcore_io_tx_flush(struct app_lcore_params_io *lp)
 {
+       uint8_t i;
        uint8_t port;

-       for (port = 0; port < lp->tx.n_nic_ports; port ++) {
+       port = lp->tx.nic_ports[0];
+       for (i = 0; i < lp->tx.n_nic_ports; i ++) {
                uint32_t n_pkts;

                if (likely((lp->tx.mbuf_out_flush[port] == 0) ||

^ permalink raw reply related

* [PATCH v2 0/3] Fix iv sizes in crypto drivers capabilities
From: Arek Kusztal @ 2016-12-23  8:24 UTC (permalink / raw)
  To: dev
  Cc: fiona.trahe, pablo.de.lara.guarch, john.griffin, deepak.k.jain,
	declan.doherty, Arek Kusztal

This patchset fixes iv (initialization vector) size values in qat
and aesni gcm pmds to be conformant with nist SP800-38D.

v2:
- added missing signed-off-by line

Arek Kusztal (3):
  crypto/aesni_gcm: fix J0 padding bytes for GCM
  crypto/aesni_gcm: fix iv size in PMD capabilities
  crypto/qat: fix iv size in PMD capabilities

 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c     | 4 +++-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 4 ++--
 drivers/crypto/qat/qat_crypto.c              | 4 ++--
 3 files changed, 7 insertions(+), 5 deletions(-)

-- 
2.1.0

^ permalink raw reply

* [PATCH v2 1/3] crypto/aesni_gcm: fix J0 padding bytes for GCM
From: Arek Kusztal @ 2016-12-23  8:24 UTC (permalink / raw)
  To: dev
  Cc: fiona.trahe, pablo.de.lara.guarch, john.griffin, deepak.k.jain,
	declan.doherty, Arek Kusztal
In-Reply-To: <1482481493-4369-1-git-send-email-arkadiuszx.kusztal@intel.com>

This commit fixes pre-counter block (J0) padding by clearing
four most significant bytes before setting initial counter value.

Fixes: b2bb3597470c ("crypto/aesni_gcm: move pre-counter block to driver")

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index dba5e15..af3d60f 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -40,6 +40,7 @@
 #include <rte_vdev.h>
 #include <rte_malloc.h>
 #include <rte_cpuflags.h>
+#include <rte_byteorder.h>
 
 #include "aesni_gcm_pmd_private.h"
 
@@ -241,7 +242,8 @@ process_gcm_crypto_op(struct aesni_gcm_qp *qp, struct rte_crypto_sym_op *op,
 	 * to set BE LSB to 1, driver expects that 16B is allocated
 	 */
 	if (op->cipher.iv.length == 12) {
-		op->cipher.iv.data[15] = 1;
+		uint32_t *iv_padd = (uint32_t *)&op->cipher.iv.data[12];
+		*iv_padd = rte_bswap32(1);
 	}
 
 	if (op->auth.aad.length != 12 && op->auth.aad.length != 8 &&
-- 
2.1.0

^ permalink raw reply related

* [PATCH v2 2/3] crypto/aesni_gcm: fix iv size in PMD capabilities
From: Arek Kusztal @ 2016-12-23  8:24 UTC (permalink / raw)
  To: dev
  Cc: fiona.trahe, pablo.de.lara.guarch, john.griffin, deepak.k.jain,
	declan.doherty, Arek Kusztal
In-Reply-To: <1482481493-4369-1-git-send-email-arkadiuszx.kusztal@intel.com>

This patch sets iv size in aesni gcm PMD to 12 bytes to be
conformant with nist SP800-38D.

Fixes: eec136f3c54f ("aesni_gcm: add driver for AES-GCM crypto operations")

Signed-off-by: Arek Kusztal <arkadiuszx.kusztal@intel.com>
---
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
index e824d4b..c51f82a 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
@@ -77,8 +77,8 @@ static const struct rte_cryptodev_capabilities aesni_gcm_pmd_capabilities[] = {
 					.increment = 0
 				},
 				.iv_size = {
-					.min = 16,
-					.max = 16,
+					.min = 12,
+					.max = 12,
 					.increment = 0
 				}
 			}, }
-- 
2.1.0

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox