DPDK-dev Archive on lore.kernel.org

* [PATCH v3 0/7] virtio_user as an alternative exception path
From: Jianfeng Tan @ 2017-01-04  3:59 UTC
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1480689075-66977-1-git-send-email-jianfeng.tan@intel.com>

v3:
  - Drop the patch that postpones sending DRIVER_OK; it is superseded by
    a bug fix that disables all virtqueues and re-initializes the device.
    (You might wonder why we do not simply send a RESET_OWNER message: in
     my tests, it causes a spinlock deadlock when the program is killed.)
  - Avoid a compile error on 32-bit systems caused by pointer conversion.
  - Fix a bug in patch "abstract virtio user backend ops": vhostfd was
    not properly assigned.
  - Fix a bug in v2 where MQ could not be used, which is related to
    stripping feature bits that vhost kernel does not recognize.
  - Update the release notes.

v2: (many of these address Yuanhan's comments)
  - Add offloading feature.
  - Add multiqueue support.
  - Add a new patch to postpone sending the DRIVER_OK notification.
  - Put the fix patch ahead of the whole patch series.
  - Split the original 0001 patch into the 0003 and 0004 patches.
  - Remove the original vhost_internal design; just add those fields into
    struct virtio_user_dev for simplicity.
  - Rename "control" to "send_request".
  - Rename "host_features" to "device_features".

In v16.07, we upstreamed a virtual device, virtio_user (with vhost-user
as the backend). The vhost-kernel backend was dropped at that time
because of its poor performance compared with vhost-user, and to keep
the code simple.

On second thought, however, virtio_user + vhost-kernel is a good
candidate for an exception path, like KNI, which exchanges packets
with the kernel networking stack:
  - maintenance: vhost-net (kernel) is an upstream, extensively used
    kernel module, so no out-of-tree module like KNI is needed.
  - performance: as with KNI, this solution would use one or more
    kthreads to send/receive packets to/from user space DPDK applications,
    which has little impact on the user space polling thread (except that
    it might need to enter kernel space to wake up those kthreads when
    necessary).
  - features: vhost-net is designed as a networking solution and has
    many networking-related features, such as multiqueue, TSO and
    multi-seg mbuf.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>


Jianfeng Tan (7):
  net/virtio_user: fix wrongly set features
  net/virtio_user: fix not properly reset device
  net/virtio_user: move vhost user specific code
  net/virtio_user: abstract virtio user backend ops
  net/virtio_user: add vhost kernel support
  net/virtio_user: enable offloading
  net/virtio_user: enable multiqueue with vhost kernel

 doc/guides/rel_notes/release_17_02.rst           |  20 +
 drivers/net/virtio/Makefile                      |   1 +
 drivers/net/virtio/virtio_user/vhost.h           |  51 +--
 drivers/net/virtio/virtio_user/vhost_kernel.c    | 487 +++++++++++++++++++++++
 drivers/net/virtio/virtio_user/vhost_user.c      |  97 +++--
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 138 ++++---
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  16 +-
 drivers/net/virtio/virtio_user_ethdev.c          |  19 +-
 8 files changed, 705 insertions(+), 124 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c

-- 
2.7.4

* [PATCH v3 3/7] net/virtio_user: move vhost user specific code
From: Jianfeng Tan @ 2017-01-04  3:59 UTC
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1483502366-140154-1-git-send-email-jianfeng.tan@intel.com>

To support vhost kernel as a backend of net_virtio_user in the upcoming
patches, move the vhost-user specific structs and macros into
vhost_user.c and keep only the common definitions in vhost.h.

Also, remove the VHOST_USER_MQ feature check.
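For reference, a simplified sketch of the vhost-user message framing that moves into vhost_user.c: a packed 12-byte header (request, flags, size) followed by `size` bytes of payload. The struct name, the trimmed payload union and the helper build_set_features() are hypothetical; only the version/reply bit layout and the SET_FEATURES request number (2 in the vhost-user protocol) follow the real definitions, and the driver's message additionally carries the full union and the fds array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VHOST_USER_VERSION      0x1
#define VHOST_USER_SET_FEATURES 2   /* request number in the vhost-user protocol */

/* Hypothetical, trimmed-down copy of struct vhost_user_msg: only the u64
 * payload is kept, and fds[] (sent as ancillary data) is omitted. */
struct vhost_user_msg_sketch {
	uint32_t request;   /* enum vhost_user_request in the driver */
	uint32_t flags;     /* bits 0-1: protocol version; bit 2: reply flag */
	uint32_t size;      /* number of payload bytes that follow the header */
	union {
		uint64_t u64;
	} payload;
} __attribute__((packed));

/* Header = everything before the payload, as VHOST_USER_HDR_SIZE computes it. */
#define HDR_SIZE_SKETCH offsetof(struct vhost_user_msg_sketch, payload.u64)

/* Fill a SET_FEATURES request and return the number of bytes to write. */
size_t build_set_features(struct vhost_user_msg_sketch *msg, uint64_t features)
{
	assert(msg != NULL);
	memset(msg, 0, sizeof(*msg));
	msg->request = VHOST_USER_SET_FEATURES;
	msg->flags = VHOST_USER_VERSION;
	msg->payload.u64 = features;
	msg->size = sizeof(msg->payload.u64);
	return HDR_SIZE_SKETCH + msg->size;
}
```

Because the struct is packed, the header has no padding, so the number of bytes written is exactly the header size plus `size` (12 + 8 = 20 for SET_FEATURES).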

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost.h           | 36 ------------------------
 drivers/net/virtio/virtio_user/vhost_user.c      | 32 +++++++++++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  9 ------
 3 files changed, 32 insertions(+), 45 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost.h b/drivers/net/virtio/virtio_user/vhost.h
index 7adb55f..e54ac35 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -42,8 +42,6 @@
 #include "../virtio_logs.h"
 #include "../virtqueue.h"
 
-#define VHOST_MEMORY_MAX_NREGIONS 8
-
 struct vhost_vring_state {
 	unsigned int index;
 	unsigned int num;
@@ -105,40 +103,6 @@ struct vhost_memory_region {
 	uint64_t mmap_offset;
 };
 
-struct vhost_memory {
-	uint32_t nregions;
-	uint32_t padding;
-	struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
-};
-
-struct vhost_user_msg {
-	enum vhost_user_request request;
-
-#define VHOST_USER_VERSION_MASK     0x3
-#define VHOST_USER_REPLY_MASK       (0x1 << 2)
-	uint32_t flags;
-	uint32_t size; /* the following payload size */
-	union {
-#define VHOST_USER_VRING_IDX_MASK   0xff
-#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
-		uint64_t u64;
-		struct vhost_vring_state state;
-		struct vhost_vring_addr addr;
-		struct vhost_memory memory;
-	} payload;
-	int fds[VHOST_MEMORY_MAX_NREGIONS];
-} __attribute((packed));
-
-#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_msg, payload.u64)
-#define VHOST_USER_PAYLOAD_SIZE \
-	(sizeof(struct vhost_user_msg) - VHOST_USER_HDR_SIZE)
-
-/* The version of the protocol we support */
-#define VHOST_USER_VERSION    0x1
-
-#define VHOST_USER_F_PROTOCOL_FEATURES 30
-#define VHOST_USER_MQ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)
-
 int vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg);
 int vhost_user_setup(const char *path);
 int vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable);
diff --git a/drivers/net/virtio/virtio_user/vhost_user.c b/drivers/net/virtio/virtio_user/vhost_user.c
index 082e821..295ce16 100644
--- a/drivers/net/virtio/virtio_user/vhost_user.c
+++ b/drivers/net/virtio/virtio_user/vhost_user.c
@@ -42,6 +42,38 @@
 
 #include "vhost.h"
 
+/* The version of the protocol we support */
+#define VHOST_USER_VERSION    0x1
+
+#define VHOST_MEMORY_MAX_NREGIONS 8
+struct vhost_memory {
+	uint32_t nregions;
+	uint32_t padding;
+	struct vhost_memory_region regions[VHOST_MEMORY_MAX_NREGIONS];
+};
+
+struct vhost_user_msg {
+	enum vhost_user_request request;
+
+#define VHOST_USER_VERSION_MASK     0x3
+#define VHOST_USER_REPLY_MASK       (0x1 << 2)
+	uint32_t flags;
+	uint32_t size; /* the following payload size */
+	union {
+#define VHOST_USER_VRING_IDX_MASK   0xff
+#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
+		uint64_t u64;
+		struct vhost_vring_state state;
+		struct vhost_vring_addr addr;
+		struct vhost_memory memory;
+	} payload;
+	int fds[VHOST_MEMORY_MAX_NREGIONS];
+} __attribute((packed));
+
+#define VHOST_USER_HDR_SIZE offsetof(struct vhost_user_msg, payload.u64)
+#define VHOST_USER_PAYLOAD_SIZE \
+	(sizeof(struct vhost_user_msg) - VHOST_USER_HDR_SIZE)
+
 static int
 vhost_user_write(int fd, void *buf, int len, int *fds, int fd_num)
 {
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index a38398b..8dd563a 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -151,8 +151,6 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 	 * VIRTIO_NET_F_MAC and VIRTIO_NET_F_CTRL_VQ is stripped.
 	 */
 	features = dev->features;
-	if (dev->max_queue_pairs > 1)
-		features |= VHOST_USER_MQ;
 	features &= ~(1ull << VIRTIO_NET_F_MAC);
 	features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
 	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_FEATURES, &features);
@@ -268,13 +266,6 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
 	}
 
-	if (dev->max_queue_pairs > 1) {
-		if (!(dev->features & VHOST_USER_MQ)) {
-			PMD_INIT_LOG(ERR, "MQ not supported by the backend");
-			return -1;
-		}
-	}
-
 	return 0;
 }
 
-- 
2.7.4

* [PATCH v3 4/7] net/virtio_user: abstract virtio user backend ops
From: Jianfeng Tan @ 2017-01-04  3:59 UTC
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1483502366-140154-1-git-send-email-jianfeng.tan@intel.com>

Add struct virtio_user_backend_ops to abstract three kinds of backend
operations:
  - setup: create the connection to the backend (a unix socket for
    vhost-user);
  - send_request: synchronize messages with the backend;
  - enable_qp: enable or disable a queue pair.
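The ops table lets the generic device code stay backend-agnostic: callers only go through dev->ops, so adding vhost kernel later means adding another table. A minimal sketch of the same pattern with a fake backend, so it runs without a vhost-user socket; all names here (dev_sketch, backend_ops_sketch, ops_fake, the REQ_* values) are hypothetical stand-ins, not the driver's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum req_sketch { REQ_SET_OWNER, REQ_GET_FEATURES };

struct dev_sketch;

/* Same shape as virtio_user_backend_ops: one hook per kind of operation. */
struct backend_ops_sketch {
	int (*setup)(struct dev_sketch *dev);
	int (*send_request)(struct dev_sketch *dev, enum req_sketch req, void *arg);
	int (*enable_qp)(struct dev_sketch *dev, uint16_t pair_idx, int enable);
};

struct dev_sketch {
	int vhostfd;
	uint16_t enabled_qps;  /* bitmap of enabled queue pairs (demo only) */
	const struct backend_ops_sketch *ops;
};

/* A fake backend that records state instead of talking to a socket. */
static int fake_setup(struct dev_sketch *dev)
{
	dev->vhostfd = 42;  /* pretend a connection was made */
	return 0;
}

static int fake_send_request(struct dev_sketch *dev, enum req_sketch req, void *arg)
{
	if (dev->vhostfd < 0)
		return -1;
	if (req == REQ_GET_FEATURES)
		*(uint64_t *)arg = 1ULL << 32;  /* e.g. VIRTIO_F_VERSION_1 */
	return 0;
}

static int fake_enable_qp(struct dev_sketch *dev, uint16_t pair_idx, int enable)
{
	if (enable)
		dev->enabled_qps |= (uint16_t)(1u << pair_idx);
	else
		dev->enabled_qps &= (uint16_t)~(1u << pair_idx);
	return 0;
}

static const struct backend_ops_sketch ops_fake = {
	.setup = fake_setup,
	.send_request = fake_send_request,
	.enable_qp = fake_enable_qp,
};

/* Generic init: picks a backend, then only ever calls through dev->ops. */
int dev_init_sketch(struct dev_sketch *dev, uint64_t *features)
{
	assert(dev != NULL && features != NULL);
	dev->vhostfd = -1;
	dev->enabled_qps = 0;
	dev->ops = &ops_fake;  /* the driver selects ops_user for a unix socket */

	if (dev->ops->setup(dev) < 0)
		return -1;
	if (dev->ops->send_request(dev, REQ_SET_OWNER, NULL) < 0)
		return -1;
	if (dev->ops->send_request(dev, REQ_GET_FEATURES, features) < 0)
		return -1;
	return dev->ops->enable_qp(dev, 0, 1);  /* first pair enabled by default */
}
```

The design choice mirrors the patch: backend selection happens once, at setup time, and everything after that is an indirect call.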

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost.h           | 17 +++++-
 drivers/net/virtio/virtio_user/vhost_user.c      | 65 +++++++++++---------
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 77 +++++++++++++++---------
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  5 ++
 4 files changed, 106 insertions(+), 58 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost.h b/drivers/net/virtio/virtio_user/vhost.h
index e54ac35..515e4fc 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -96,6 +96,8 @@ enum vhost_user_request {
 	VHOST_USER_MAX
 };
 
+extern const char * const vhost_msg_strings[VHOST_USER_MAX];
+
 struct vhost_memory_region {
 	uint64_t guest_phys_addr;
 	uint64_t memory_size; /* bytes */
@@ -103,8 +105,17 @@ struct vhost_memory_region {
 	uint64_t mmap_offset;
 };
 
-int vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg);
-int vhost_user_setup(const char *path);
-int vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable);
+struct virtio_user_dev;
+
+struct virtio_user_backend_ops {
+	int (*setup)(struct virtio_user_dev *dev);
+	int (*send_request)(struct virtio_user_dev *dev,
+			    enum vhost_user_request req,
+			    void *arg);
+	int (*enable_qp)(struct virtio_user_dev *dev,
+			 uint16_t pair_idx,
+			 int enable);
+};
 
+struct virtio_user_backend_ops ops_user;
 #endif
diff --git a/drivers/net/virtio/virtio_user/vhost_user.c b/drivers/net/virtio/virtio_user/vhost_user.c
index 295ce16..a9ca10f 100644
--- a/drivers/net/virtio/virtio_user/vhost_user.c
+++ b/drivers/net/virtio/virtio_user/vhost_user.c
@@ -41,6 +41,7 @@
 #include <errno.h>
 
 #include "vhost.h"
+#include "virtio_user_dev.h"
 
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION    0x1
@@ -255,24 +256,26 @@ prepare_vhost_memory_user(struct vhost_user_msg *msg, int fds[])
 
 static struct vhost_user_msg m;
 
-static const char * const vhost_msg_strings[] = {
-	[VHOST_USER_SET_OWNER] = "VHOST_USER_SET_OWNER",
-	[VHOST_USER_RESET_OWNER] = "VHOST_USER_RESET_OWNER",
-	[VHOST_USER_SET_FEATURES] = "VHOST_USER_SET_FEATURES",
-	[VHOST_USER_GET_FEATURES] = "VHOST_USER_GET_FEATURES",
-	[VHOST_USER_SET_VRING_CALL] = "VHOST_USER_SET_VRING_CALL",
-	[VHOST_USER_SET_VRING_NUM] = "VHOST_USER_SET_VRING_NUM",
-	[VHOST_USER_SET_VRING_BASE] = "VHOST_USER_SET_VRING_BASE",
-	[VHOST_USER_GET_VRING_BASE] = "VHOST_USER_GET_VRING_BASE",
-	[VHOST_USER_SET_VRING_ADDR] = "VHOST_USER_SET_VRING_ADDR",
-	[VHOST_USER_SET_VRING_KICK] = "VHOST_USER_SET_VRING_KICK",
-	[VHOST_USER_SET_MEM_TABLE] = "VHOST_USER_SET_MEM_TABLE",
-	[VHOST_USER_SET_VRING_ENABLE] = "VHOST_USER_SET_VRING_ENABLE",
+const char * const vhost_msg_strings[] = {
+	[VHOST_USER_SET_OWNER] = "VHOST_SET_OWNER",
+	[VHOST_USER_RESET_OWNER] = "VHOST_RESET_OWNER",
+	[VHOST_USER_SET_FEATURES] = "VHOST_SET_FEATURES",
+	[VHOST_USER_GET_FEATURES] = "VHOST_GET_FEATURES",
+	[VHOST_USER_SET_VRING_CALL] = "VHOST_SET_VRING_CALL",
+	[VHOST_USER_SET_VRING_NUM] = "VHOST_SET_VRING_NUM",
+	[VHOST_USER_SET_VRING_BASE] = "VHOST_SET_VRING_BASE",
+	[VHOST_USER_GET_VRING_BASE] = "VHOST_GET_VRING_BASE",
+	[VHOST_USER_SET_VRING_ADDR] = "VHOST_SET_VRING_ADDR",
+	[VHOST_USER_SET_VRING_KICK] = "VHOST_SET_VRING_KICK",
+	[VHOST_USER_SET_MEM_TABLE] = "VHOST_SET_MEM_TABLE",
+	[VHOST_USER_SET_VRING_ENABLE] = "VHOST_SET_VRING_ENABLE",
 	NULL,
 };
 
-int
-vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
+static int
+vhost_user_sock(struct virtio_user_dev *dev,
+		enum vhost_user_request req,
+		void *arg)
 {
 	struct vhost_user_msg msg;
 	struct vhost_vring_file *file = 0;
@@ -280,9 +283,9 @@ vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
 	int fds[VHOST_MEMORY_MAX_NREGIONS];
 	int fd_num = 0;
 	int i, len;
+	int vhostfd = dev->vhostfd;
 
 	RTE_SET_USED(m);
-	RTE_SET_USED(vhost_msg_strings);
 
 	PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
 
@@ -403,15 +406,13 @@ vhost_user_sock(int vhostfd, enum vhost_user_request req, void *arg)
 
 /**
  * Set up environment to talk with a vhost user backend.
- * @param path
- *   - The path to vhost user unix socket file.
  *
  * @return
- *   - (-1) if fail to set up;
- *   - (>=0) if successful, and it is the fd to vhostfd.
+ *   - (-1) on failure;
+ *   - (0) on success.
  */
-int
-vhost_user_setup(const char *path)
+static int
+vhost_user_setup(struct virtio_user_dev *dev)
 {
 	int fd;
 	int flag;
@@ -429,18 +430,21 @@ vhost_user_setup(const char *path)
 
 	memset(&un, 0, sizeof(un));
 	un.sun_family = AF_UNIX;
-	snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);
+	snprintf(un.sun_path, sizeof(un.sun_path), "%s", dev->path);
 	if (connect(fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
 		PMD_DRV_LOG(ERR, "connect error, %s", strerror(errno));
 		close(fd);
 		return -1;
 	}
 
-	return fd;
+	dev->vhostfd = fd;
+	return 0;
 }
 
-int
-vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable)
+static int
+vhost_user_enable_queue_pair(struct virtio_user_dev *dev,
+			     uint16_t pair_idx,
+			     int enable)
 {
 	int i;
 
@@ -450,10 +454,15 @@ vhost_user_enable_queue_pair(int vhostfd, uint16_t pair_idx, int enable)
 			.num   = enable,
 		};
 
-		if (vhost_user_sock(vhostfd,
-				    VHOST_USER_SET_VRING_ENABLE, &state))
+		if (vhost_user_sock(dev, VHOST_USER_SET_VRING_ENABLE, &state))
 			return -1;
 	}
 
 	return 0;
 }
+
+struct virtio_user_backend_ops ops_user = {
+	.setup = vhost_user_setup,
+	.send_request = vhost_user_sock,
+	.enable_qp = vhost_user_enable_queue_pair
+};
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 8dd563a..32039a1 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -39,6 +39,9 @@
 #include <sys/mman.h>
 #include <unistd.h>
 #include <sys/eventfd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
 
 #include "vhost.h"
 #include "virtio_user_dev.h"
@@ -64,7 +67,7 @@ virtio_user_create_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	}
 	file.index = queue_sel;
 	file.fd = callfd;
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_CALL, &file);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_CALL, &file);
 	dev->callfds[queue_sel] = callfd;
 
 	return 0;
@@ -88,12 +91,12 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 
 	state.index = queue_sel;
 	state.num = vring->num;
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_NUM, &state);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_NUM, &state);
 
 	state.num = 0; /* no reservation */
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_BASE, &state);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_BASE, &state);
 
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_ADDR, &addr);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_ADDR, &addr);
 
 	/* Of all per virtqueue MSGs, make sure VHOST_USER_SET_VRING_KICK comes
 	 * lastly because vhost depends on this msg to judge if
@@ -106,7 +109,7 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	}
 	file.index = queue_sel;
 	file.fd = kickfd;
-	vhost_user_sock(dev->vhostfd, VHOST_USER_SET_VRING_KICK, &file);
+	dev->ops->send_request(dev, VHOST_USER_SET_VRING_KICK, &file);
 	dev->kickfds[queue_sel] = kickfd;
 
 	return 0;
@@ -146,20 +149,19 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 	if (virtio_user_queue_setup(dev, virtio_user_create_queue) < 0)
 		goto error;
 
-	/* Step 1: set features
-	 * Make sure VHOST_USER_F_PROTOCOL_FEATURES is added if mq is enabled,
-	 * VIRTIO_NET_F_MAC and VIRTIO_NET_F_CTRL_VQ is stripped.
-	 */
+	/* Step 1: set features */
 	features = dev->features;
+	/* Strip VIRTIO_NET_F_MAC, as MAC address is handled in vdev init */
 	features &= ~(1ull << VIRTIO_NET_F_MAC);
+	/* Strip VIRTIO_NET_F_CTRL_VQ, as devices do not really need to know */
 	features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
-	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_FEATURES, &features);
+	ret = dev->ops->send_request(dev, VHOST_USER_SET_FEATURES, &features);
 	if (ret < 0)
 		goto error;
 	PMD_DRV_LOG(INFO, "set features: %" PRIx64, features);
 
 	/* Step 2: share memory regions */
-	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_MEM_TABLE, NULL);
+	ret = dev->ops->send_request(dev, VHOST_USER_SET_MEM_TABLE, NULL);
 	if (ret < 0)
 		goto error;
 
@@ -170,7 +172,7 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 	/* Step 4: enable queues
 	 * we enable the 1st queue pair by default.
 	 */
-	vhost_user_enable_queue_pair(dev->vhostfd, 0, 1);
+	dev->ops->enable_qp(dev, 0, 1);
 
 	return 0;
 error:
@@ -188,7 +190,7 @@ int virtio_user_stop_device(struct virtio_user_dev *dev)
 	}
 
 	for (i = 0; i < dev->max_queue_pairs; ++i)
-		vhost_user_enable_queue_pair(dev->vhostfd, i, 0);
+		dev->ops->enable_qp(dev, i, 0);
 
 	return 0;
 }
@@ -214,36 +216,57 @@ parse_mac(struct virtio_user_dev *dev, const char *mac)
 	}
 }
 
+static int
+is_vhost_user_by_type(const char *path)
+{
+	struct stat sb;
+
+	if (stat(path, &sb) == -1)
+		return 0;
+
+	return S_ISSOCK(sb.st_mode);
+}
+
+static int
+virtio_user_dev_setup(struct virtio_user_dev *dev)
+{
+	uint32_t i;
+
+	dev->vhostfd = -1;
+	for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
+		dev->kickfds[i] = -1;
+		dev->callfds[i] = -1;
+	}
+
+	if (is_vhost_user_by_type(dev->path)) {
+		dev->ops = &ops_user;
+		return dev->ops->setup(dev);
+	}
+
+	return -1;
+}
+
 int
 virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 		     int cq, int queue_size, const char *mac)
 {
-	uint32_t i;
-
 	snprintf(dev->path, PATH_MAX, "%s", path);
 	dev->max_queue_pairs = queues;
 	dev->queue_pairs = 1; /* mq disabled by default */
 	dev->queue_size = queue_size;
 	dev->mac_specified = 0;
 	parse_mac(dev, mac);
-	dev->vhostfd = -1;
-
-	for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
-		dev->kickfds[i] = -1;
-		dev->callfds[i] = -1;
-	}
 
-	dev->vhostfd = vhost_user_setup(dev->path);
-	if (dev->vhostfd < 0) {
+	if (virtio_user_dev_setup(dev) < 0) {
 		PMD_INIT_LOG(ERR, "backend set up fails");
 		return -1;
 	}
-	if (vhost_user_sock(dev->vhostfd, VHOST_USER_SET_OWNER, NULL) < 0) {
+	if (dev->ops->send_request(dev, VHOST_USER_SET_OWNER, NULL) < 0) {
 		PMD_INIT_LOG(ERR, "set_owner fails: %s", strerror(errno));
 		return -1;
 	}
 
-	if (vhost_user_sock(dev->vhostfd, VHOST_USER_GET_FEATURES,
+	if (dev->ops->send_request(dev, VHOST_USER_GET_FEATURES,
 			    &dev->device_features) < 0) {
 		PMD_INIT_LOG(ERR, "get_features failed: %s", strerror(errno));
 		return -1;
@@ -288,9 +311,9 @@ virtio_user_handle_mq(struct virtio_user_dev *dev, uint16_t q_pairs)
 	}
 
 	for (i = 0; i < q_pairs; ++i)
-		ret |= vhost_user_enable_queue_pair(dev->vhostfd, i, 1);
+		ret |= dev->ops->enable_qp(dev, i, 1);
 	for (i = q_pairs; i < dev->max_queue_pairs; ++i)
-		ret |= vhost_user_enable_queue_pair(dev->vhostfd, i, 0);
+		ret |= dev->ops->enable_qp(dev, i, 0);
 
 	dev->queue_pairs = q_pairs;
 
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 28fc788..9f2f82e 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -37,9 +37,13 @@
 #include <limits.h>
 #include "../virtio_pci.h"
 #include "../virtio_ring.h"
+#include "vhost.h"
 
 struct virtio_user_dev {
+	/* for vhost_user backend */
 	int		vhostfd;
+
+	/* for both vhost_user and vhost_kernel */
 	int		callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
 	int		kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
 	int		mac_specified;
@@ -54,6 +58,7 @@ struct virtio_user_dev {
 	uint8_t		mac_addr[ETHER_ADDR_LEN];
 	char		path[PATH_MAX];
 	struct vring	vrings[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
+	struct virtio_user_backend_ops *ops;
 };
 
 int virtio_user_start_device(struct virtio_user_dev *dev);
-- 
2.7.4

* [PATCH v3 5/7] net/virtio_user: add vhost kernel support
From: Jianfeng Tan @ 2017-01-04  3:59 UTC
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1483502366-140154-1-git-send-email-jianfeng.tan@intel.com>

This patch adds support for vhost kernel as a backend of virtio_user.
Three main hook functions are added:
  - vhost_kernel_setup() to open the vhost char device; each vq pair
    needs its own vhostfd;
  - vhost_kernel_ioctl() to communicate control messages with the vhost
    kernel module;
  - vhost_kernel_enable_queue_pair() to open a tap device and set it
    as the backend of the corresponding vhost fd (that is to say, vq pair).
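Because vhost_kernel_setup() opens one vhost fd per queue pair, every control message has to be replayed on each of them. A minimal sketch of that fan-out as done in vhost_kernel_ioctl(): unused slots (-1) are skipped and the loop stops at the first failure. The ioctl is injected through a function pointer (the hypothetical ioctl_fn and fake_ioctl) so the sketch runs without /dev/vhost-net, and MAX_QP_SKETCH is an assumed stand-in for VHOST_KERNEL_MAX_QUEUES, whose value is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QP_SKETCH 8  /* hypothetical; stands in for VHOST_KERNEL_MAX_QUEUES */

/* Stand-in for ioctl(2) so the sketch can run without kernel devices. */
typedef int (*ioctl_fn)(int fd, unsigned long req, void *arg);

/* Send one control request to every open vhost fd (one per queue pair).
 * Returns the last ioctl result, or -1 if no fd is open at all. */
int broadcast_request(const int vhostfds[MAX_QP_SKETCH], ioctl_fn do_ioctl,
		      unsigned long req, void *arg)
{
	int ret = -1;

	assert(do_ioctl != NULL);
	for (int i = 0; i < MAX_QP_SKETCH; ++i) {
		if (vhostfds[i] < 0)
			continue;          /* queue pair not set up */
		ret = do_ioctl(vhostfds[i], req, arg);
		if (ret < 0)
			break;             /* stop at the first failing pair */
	}
	return ret;
}

/* Demo ioctl: counts calls and fails only for fd 99. */
int fake_ioctl_calls;

int fake_ioctl(int fd, unsigned long req, void *arg)
{
	(void)req;
	(void)arg;
	fake_ioctl_calls++;
	return fd == 99 ? -1 : 0;
}
```

Note that an error on one queue pair leaves the earlier pairs already updated; neither this sketch nor the loop it mirrors rolls those back.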

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 doc/guides/rel_notes/release_17_02.rst           |  20 ++
 drivers/net/virtio/Makefile                      |   1 +
 drivers/net/virtio/virtio_user/vhost.h           |   2 +
 drivers/net/virtio/virtio_user/vhost_kernel.c    | 373 +++++++++++++++++++++++
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  21 +-
 drivers/net/virtio/virtio_user/virtio_user_dev.h |   6 +
 6 files changed, 420 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/virtio/virtio_user/vhost_kernel.c

diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 180af82..7354df5 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -52,6 +52,26 @@ New Features
   See the :ref:`Generic flow API <Generic_flow_API>` documentation for more
   information.
 
+* **virtio_user with vhost-kernel as another exception path.**
+
+  Previously, we upstreamed a virtual device, virtio_user with vhost-user
+  as the backend, as a way for IPC (Inter-Process Communication) and user
+  space container networking.
+
+  Virtio_user with vhost-kernel as the backend is a solution for an exception
+  path, such as KNI, which exchanges packets with the kernel networking stack.
+  This solution is promising in terms of:
+
+  * maintenance: vhost and vhost-net (kernel) are upstream, extensively
+    used kernel modules.
+  * features: vhost-net is designed as a networking solution and has
+    many networking-related features, such as multiqueue, TSO and multi-seg
+    mbuf.
+  * performance: similar to KNI, this solution uses one or more kernel
+    threads to send/receive packets to/from user space DPDK applications,
+    which has little impact on the user space polling thread (except that
+    it might need to enter kernel space to wake up those kthreads when
+    necessary).
 
 Resolved Issues
 ---------------
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 97972a6..faeffb2 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -60,6 +60,7 @@ endif
 
 ifeq ($(CONFIG_RTE_VIRTIO_USER),y)
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_user.c
+SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/vhost_kernel.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user/virtio_user_dev.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_user_ethdev.c
 endif
diff --git a/drivers/net/virtio/virtio_user/vhost.h b/drivers/net/virtio/virtio_user/vhost.h
index 515e4fc..5c983bd 100644
--- a/drivers/net/virtio/virtio_user/vhost.h
+++ b/drivers/net/virtio/virtio_user/vhost.h
@@ -118,4 +118,6 @@ struct virtio_user_backend_ops {
 };
 
 struct virtio_user_backend_ops ops_user;
+struct virtio_user_backend_ops ops_kernel;
+
 #endif
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
new file mode 100644
index 0000000..1e7cdef
--- /dev/null
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -0,0 +1,373 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
+#include <string.h>
+#include <errno.h>
+
+#include <rte_memory.h>
+#include <rte_eal_memconfig.h>
+
+#include "vhost.h"
+#include "virtio_user_dev.h"
+
+struct vhost_memory_kernel {
+	uint32_t nregions;
+	uint32_t padding;
+	struct vhost_memory_region regions[0];
+};
+
+/* vhost kernel ioctls */
+#define VHOST_VIRTIO 0xAF
+#define VHOST_GET_FEATURES _IOR(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_FEATURES _IOW(VHOST_VIRTIO, 0x00, __u64)
+#define VHOST_SET_OWNER _IO(VHOST_VIRTIO, 0x01)
+#define VHOST_RESET_OWNER _IO(VHOST_VIRTIO, 0x02)
+#define VHOST_SET_MEM_TABLE _IOW(VHOST_VIRTIO, 0x03, struct vhost_memory_kernel)
+#define VHOST_SET_LOG_BASE _IOW(VHOST_VIRTIO, 0x04, __u64)
+#define VHOST_SET_LOG_FD _IOW(VHOST_VIRTIO, 0x07, int)
+#define VHOST_SET_VRING_NUM _IOW(VHOST_VIRTIO, 0x10, struct vhost_vring_state)
+#define VHOST_SET_VRING_ADDR _IOW(VHOST_VIRTIO, 0x11, struct vhost_vring_addr)
+#define VHOST_SET_VRING_BASE _IOW(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+#define VHOST_GET_VRING_BASE _IOWR(VHOST_VIRTIO, 0x12, struct vhost_vring_state)
+#define VHOST_SET_VRING_KICK _IOW(VHOST_VIRTIO, 0x20, struct vhost_vring_file)
+#define VHOST_SET_VRING_CALL _IOW(VHOST_VIRTIO, 0x21, struct vhost_vring_file)
+#define VHOST_SET_VRING_ERR _IOW(VHOST_VIRTIO, 0x22, struct vhost_vring_file)
+#define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
+
+/* TUN ioctls */
+#define TUNSETIFF     _IOW('T', 202, int)
+#define TUNGETFEATURES _IOR('T', 207, unsigned int)
+#define TUNSETOFFLOAD  _IOW('T', 208, unsigned int)
+#define TUNGETIFF      _IOR('T', 210, unsigned int)
+#define TUNSETSNDBUF   _IOW('T', 212, int)
+#define TUNGETVNETHDRSZ _IOR('T', 215, int)
+#define TUNSETVNETHDRSZ _IOW('T', 216, int)
+#define TUNSETQUEUE  _IOW('T', 217, int)
+#define TUNSETVNETLE _IOW('T', 220, int)
+#define TUNSETVNETBE _IOW('T', 222, int)
+
+/* TUNSETIFF ifr flags */
+#define IFF_TAP          0x0002
+#define IFF_NO_PI        0x1000
+#define IFF_ONE_QUEUE    0x2000
+#define IFF_VNET_HDR     0x4000
+#define IFF_MULTI_QUEUE  0x0100
+#define IFF_ATTACH_QUEUE 0x0200
+#define IFF_DETACH_QUEUE 0x0400
+
+/* Constants */
+#define TUN_DEF_SNDBUF	(1ull << 20)
+#define PATH_NET_TUN	"/dev/net/tun"
+#define VHOST_KERNEL_MAX_REGIONS	64
+
+static uint64_t vhost_req_user_to_kernel[] = {
+	[VHOST_USER_SET_OWNER] = VHOST_SET_OWNER,
+	[VHOST_USER_RESET_OWNER] = VHOST_RESET_OWNER,
+	[VHOST_USER_SET_FEATURES] = VHOST_SET_FEATURES,
+	[VHOST_USER_GET_FEATURES] = VHOST_GET_FEATURES,
+	[VHOST_USER_SET_VRING_CALL] = VHOST_SET_VRING_CALL,
+	[VHOST_USER_SET_VRING_NUM] = VHOST_SET_VRING_NUM,
+	[VHOST_USER_SET_VRING_BASE] = VHOST_SET_VRING_BASE,
+	[VHOST_USER_GET_VRING_BASE] = VHOST_GET_VRING_BASE,
+	[VHOST_USER_SET_VRING_ADDR] = VHOST_SET_VRING_ADDR,
+	[VHOST_USER_SET_VRING_KICK] = VHOST_SET_VRING_KICK,
+	[VHOST_USER_SET_MEM_TABLE] = VHOST_SET_MEM_TABLE,
+};
+
+/* By default, the vhost kernel module allows 64 regions, but DPDK allows
+ * 256 segments. To mitigate this, the function below merges virtually
+ * adjacent memsegs into one region.
+ */
+static struct vhost_memory_kernel *
+prepare_vhost_memory_kernel(void)
+{
+	uint32_t i, j, k = 0;
+	struct rte_memseg *seg;
+	struct vhost_memory_region *mr;
+	struct vhost_memory_kernel *vm;
+
+	vm = malloc(sizeof(struct vhost_memory_kernel) +
+		    VHOST_KERNEL_MAX_REGIONS *
+		    sizeof(struct vhost_memory_region));
+
+	for (i = 0; i < RTE_MAX_MEMSEG; ++i) {
+		seg = &rte_eal_get_configuration()->mem_config->memseg[i];
+		if (!seg->addr)
+			break;
+
+		int new_region = 1;
+
+		for (j = 0; j < k; ++j) {
+			mr = &vm->regions[j];
+
+			if (mr->userspace_addr + mr->memory_size ==
+			    (uint64_t)(uintptr_t)seg->addr) {
+				mr->memory_size += seg->len;
+				new_region = 0;
+				break;
+			}
+
+			if ((uint64_t)(uintptr_t)seg->addr + seg->len ==
+			    mr->userspace_addr) {
+				mr->guest_phys_addr =
+					(uint64_t)(uintptr_t)seg->addr;
+				mr->userspace_addr =
+					(uint64_t)(uintptr_t)seg->addr;
+				mr->memory_size += seg->len;
+				new_region = 0;
+				break;
+			}
+		}
+
+		if (new_region == 0)
+			continue;
+
+		mr = &vm->regions[k++];
+		/* use vaddr here! */
+		mr->guest_phys_addr = (uint64_t)(uintptr_t)seg->addr;
+		mr->userspace_addr = (uint64_t)(uintptr_t)seg->addr;
+		mr->memory_size = seg->len;
+		mr->mmap_offset = 0;
+
+		if (k >= VHOST_KERNEL_MAX_REGIONS) {
+			free(vm);
+			return NULL;
+		}
+	}
+
+	vm->nregions = k;
+	vm->padding = 0;
+	return vm;
+}
+
+static int
+vhost_kernel_ioctl(struct virtio_user_dev *dev,
+		   enum vhost_user_request req,
+		   void *arg)
+{
+	int i, ret = -1;
+	uint64_t req_kernel;
+	struct vhost_memory_kernel *vm = NULL;
+
+	PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
+
+	req_kernel = vhost_req_user_to_kernel[req];
+
+	if (req_kernel == VHOST_SET_MEM_TABLE) {
+		vm = prepare_vhost_memory_kernel();
+		if (!vm)
+			return -1;
+		arg = (void *)vm;
+	}
+
+	/* Strip VIRTIO_F_IOMMU_PLATFORM: it does not work yet; cause unknown */
+	if (req_kernel == VHOST_SET_FEATURES)
+		*(uint64_t *)arg &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
+
+	for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
+		if (dev->vhostfds[i] < 0)
+			continue;
+
+		ret = ioctl(dev->vhostfds[i], req_kernel, arg);
+		if (ret < 0)
+			break;
+	}
+
+	if (vm)
+		free(vm);
+
+	if (ret < 0)
+		PMD_DRV_LOG(ERR, "%s failed: %s",
+			    vhost_msg_strings[req], strerror(errno));
+
+	return ret;
+}
+
+/**
+ * Set up environment to talk with a vhost kernel backend.
+ *
+ * @return
+ *   - (-1) on failure;
+ *   - (0) on success.
+ */
+static int
+vhost_kernel_setup(struct virtio_user_dev *dev)
+{
+	int vhostfd;
+	uint32_t i;
+
+	for (i = 0; i < dev->max_queue_pairs; ++i) {
+		vhostfd = open(dev->path, O_RDWR);
+		if (vhostfd < 0) {
+			PMD_DRV_LOG(ERR, "fail to open %s, %s",
+				    dev->path, strerror(errno));
+			return -1;
+		}
+
+		dev->vhostfds[i] = vhostfd;
+	}
+
+	return 0;
+}
+
+static int
+vhost_kernel_set_backend(int vhostfd, int tapfd)
+{
+	struct vhost_vring_file f;
+
+	f.fd = tapfd;
+	f.index = 0;
+	if (ioctl(vhostfd, VHOST_NET_SET_BACKEND, &f) < 0) {
+		PMD_DRV_LOG(ERR, "VHOST_NET_SET_BACKEND fails, %s",
+				strerror(errno));
+		return -1;
+	}
+
+	f.index = 1;
+	if (ioctl(vhostfd, VHOST_NET_SET_BACKEND, &f) < 0) {
+		PMD_DRV_LOG(ERR, "VHOST_NET_SET_BACKEND fails, %s",
+				strerror(errno));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
+			       uint16_t pair_idx,
+			       int enable)
+{
+	unsigned int tap_features;
+	int sndbuf = TUN_DEF_SNDBUF;
+	struct ifreq ifr;
+	int hdr_size;
+	int vhostfd;
+	int tapfd;
+
+	vhostfd = dev->vhostfds[pair_idx];
+
+	if (!enable) {
+		if (dev->tapfds[pair_idx]) {
+			close(dev->tapfds[pair_idx]);
+			dev->tapfds[pair_idx] = -1;
+		}
+		return vhost_kernel_set_backend(vhostfd, -1);
+	} else if (dev->tapfds[pair_idx] >= 0) {
+		return 0;
+	}
+
+	if ((dev->features & (1ULL << VIRTIO_NET_F_MRG_RXBUF)) ||
+	    (dev->features & (1ULL << VIRTIO_F_VERSION_1)))
+		hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hdr_size = sizeof(struct virtio_net_hdr);
+
+	/* TODO:
+	 * 1. verify we can get/set vnet_hdr_len, tap_probe_vnet_hdr_len
+	 * 2. get number of memory regions from vhost module parameter
+	 * max_mem_regions, supported in newer version linux kernel
+	 */
+	tapfd = open(PATH_NET_TUN, O_RDWR);
+	if (tapfd < 0) {
+		PMD_DRV_LOG(ERR, "fail to open %s: %s",
+			    PATH_NET_TUN, strerror(errno));
+		return -1;
+	}
+
+	/* Construct ifr */
+	memset(&ifr, 0, sizeof(ifr));
+	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
+
+	if (ioctl(tapfd, TUNGETFEATURES, &tap_features) == -1) {
+		PMD_DRV_LOG(ERR, "TUNGETFEATURES failed: %s", strerror(errno));
+		goto error;
+	}
+	if (tap_features & IFF_ONE_QUEUE)
+		ifr.ifr_flags |= IFF_ONE_QUEUE;
+
+	/* Let tap instead of vhost-net handle vnet header, as the latter does
+	 * not support offloading. And in this case, we should not set feature
+	 * bit VHOST_NET_F_VIRTIO_NET_HDR.
+	 */
+	if (tap_features & IFF_VNET_HDR) {
+		ifr.ifr_flags |= IFF_VNET_HDR;
+	} else {
+		PMD_DRV_LOG(ERR, "TAP does not support IFF_VNET_HDR");
+		goto error;
+	}
+
+	if (dev->ifname)
+		strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
+	else
+		strncpy(ifr.ifr_name, "tap%d", IFNAMSIZ);
+	if (ioctl(tapfd, TUNSETIFF, (void *)&ifr) == -1) {
+		PMD_DRV_LOG(ERR, "TUNSETIFF failed: %s", strerror(errno));
+		goto error;
+	}
+
+	fcntl(tapfd, F_SETFL, O_NONBLOCK);
+
+	if (ioctl(tapfd, TUNSETVNETHDRSZ, &hdr_size) < 0) {
+		PMD_DRV_LOG(ERR, "TUNSETVNETHDRSZ failed: %s", strerror(errno));
+		goto error;
+	}
+
+	if (ioctl(tapfd, TUNSETSNDBUF, &sndbuf) < 0) {
+		PMD_DRV_LOG(ERR, "TUNSETSNDBUF failed: %s", strerror(errno));
+		goto error;
+	}
+
+	if (vhost_kernel_set_backend(vhostfd, tapfd) < 0)
+		goto error;
+
+	dev->tapfds[pair_idx] = tapfd;
+	if (!dev->ifname)
+		dev->ifname = strdup(ifr.ifr_name);
+
+	return 0;
+error:
+	return -1;
+}
+
+struct virtio_user_backend_ops ops_kernel = {
+	.setup = vhost_kernel_setup,
+	.send_request = vhost_kernel_ioctl,
+	.enable_qp = vhost_kernel_enable_queue_pair
+};
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 32039a1..c40b77e 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -192,6 +192,9 @@ int virtio_user_stop_device(struct virtio_user_dev *dev)
 	for (i = 0; i < dev->max_queue_pairs; ++i)
 		dev->ops->enable_qp(dev, i, 0);
 
+	free(dev->ifname);
+	dev->ifname = NULL;
+
 	return 0;
 }
 
@@ -230,7 +233,7 @@ is_vhost_user_by_type(const char *path)
 static int
 virtio_user_dev_setup(struct virtio_user_dev *dev)
 {
-	uint32_t i;
+	uint32_t i, q;
 
 	dev->vhostfd = -1;
 	for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
@@ -238,12 +241,18 @@ virtio_user_dev_setup(struct virtio_user_dev *dev)
 		dev->callfds[i] = -1;
 	}
 
+	for (q = 0; q < VHOST_KERNEL_MAX_QUEUES; ++q) {
+		dev->vhostfds[q] = -1;
+		dev->tapfds[q] = -1;
+	}
+
 	if (is_vhost_user_by_type(dev->path)) {
 		dev->ops = &ops_user;
-		return dev->ops->setup(dev);
+	} else {
+		dev->ops = &ops_kernel;
 	}
 
-	return -1;
+	return dev->ops->setup(dev);
 }
 
 int
@@ -295,7 +304,13 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 void
 virtio_user_dev_uninit(struct virtio_user_dev *dev)
 {
+	uint32_t i;
+
+	virtio_user_stop_device(dev);
+
 	close(dev->vhostfd);
+	for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i)
+		close(dev->vhostfds[i]);
 }
 
 static uint8_t
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 9f2f82e..148b2e6 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -43,6 +43,12 @@ struct virtio_user_dev {
 	/* for vhost_user backend */
 	int		vhostfd;
 
+	/* for vhost_kernel backend */
+	char		*ifname;
+#define VHOST_KERNEL_MAX_QUEUES		8
+	int		vhostfds[VHOST_KERNEL_MAX_QUEUES];
+	int		tapfds[VHOST_KERNEL_MAX_QUEUES];
+
 	/* for both vhost_user and vhost_kernel */
 	int		callfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
 	int		kickfds[VIRTIO_MAX_VIRTQUEUES * 2 + 1];
-- 
2.7.4

^ permalink raw reply related
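Below is a standalone sketch of the segment-merging loop in prepare_vhost_memory_kernel() above. It uses simplified hypothetical types (seg_sketch, region_sketch) in place of DPDK's memseg and vhost_memory_kernel structures, so it is an illustration and not driver code.

A segment that ends where an existing region starts, or starts where one ends, extends that region. Any other segment opens a new region.

There is one deliberate difference from the patch. This sketch checks the limit before filling a new slot, so a layout using exactly max_regions regions is accepted. The patch increments k first and then rejects k >= VHOST_KERNEL_MAX_REGIONS, which appears to refuse a layout that uses every slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DPDK memory segment. */
struct seg_sketch {
	uint64_t addr; /* virtual address of the segment */
	uint64_t len;  /* length in bytes */
};

/* Hypothetical stand-in for one vhost-kernel memory region. */
struct region_sketch {
	uint64_t guest_phys_addr; /* set to the vaddr, as in the patch */
	uint64_t userspace_addr;
	uint64_t memory_size;
};

/* Merge contiguous segments into at most max_regions regions.
 * Returns the number of regions, or -1 if more would be needed. */
static int
merge_segments(const struct seg_sketch *segs, size_t nsegs,
	       struct region_sketch *regions, int max_regions)
{
	int k = 0;

	for (size_t s = 0; s < nsegs; s++) {
		const struct seg_sketch *seg = &segs[s];
		int merged = 0;

		for (int j = 0; j < k; j++) {
			struct region_sketch *mr = &regions[j];

			/* segment directly follows the region: append */
			if (mr->userspace_addr + mr->memory_size == seg->addr) {
				mr->memory_size += seg->len;
				merged = 1;
				break;
			}
			/* segment directly precedes the region: prepend */
			if (seg->addr + seg->len == mr->userspace_addr) {
				mr->guest_phys_addr = seg->addr;
				mr->userspace_addr = seg->addr;
				mr->memory_size += seg->len;
				merged = 1;
				break;
			}
		}
		if (merged)
			continue;

		if (k >= max_regions)
			return -1; /* no free region slot left */
		regions[k].guest_phys_addr = seg->addr;
		regions[k].userspace_addr = seg->addr;
		regions[k].memory_size = seg->len;
		k++;
	}
	return k;
}
```

As in the patch, a segment that bridges two existing regions only extends one of them. The regions are not coalesced afterwards.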

* [PATCH v3 6/7] net/virtio_user: enable offloading
From: Jianfeng Tan @ 2017-01-04  3:59 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1483502366-140154-1-git-send-email-jianfeng.tan@intel.com>

When used with the vhost kernel backend, we can offload in both directions.
  - From vhost kernel to virtio_user, offload is enabled so that the
    DPDK app can trust that the flow is checksum-correct. If the DPDK
    app sends it out through another port, the checksum needs to be
    recalculated or offloaded. The same applies to TSO.
  - From virtio_user to vhost kernel, offload is enabled so that the
    kernel can trust that the flow is L4-checksum-correct and does not
    need to verify it. If the kernel will consume the flow, the DPDK app
    should make sure the L3 checksum is correctly set.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost_kernel.c | 61 ++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index 1e7cdef..bdb4af2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -91,6 +91,13 @@ struct vhost_memory_kernel {
 #define IFF_ATTACH_QUEUE 0x0200
 #define IFF_DETACH_QUEUE 0x0400
 
+/* Features for GSO (TUNSETOFFLOAD). */
+#define TUN_F_CSUM	0x01	/* You can hand me unchecksummed packets. */
+#define TUN_F_TSO4	0x02	/* I can handle TSO for IPv4 packets */
+#define TUN_F_TSO6	0x04	/* I can handle TSO for IPv6 packets */
+#define TUN_F_TSO_ECN	0x08	/* I can handle TSO with ECN bits. */
+#define TUN_F_UFO	0x10	/* I can handle UFO packets */
+
 /* Constants */
 #define TUN_DEF_SNDBUF	(1ull << 20)
 #define PATH_NET_TUN	"/dev/net/tun"
@@ -176,6 +183,28 @@ prepare_vhost_memory_kernel(void)
 	return vm;
 }
 
+/* with below features, vhost kernel does not need to do the checksum and TSO,
+ * these info will be passed to virtio_user through virtio net header.
+ */
+#define VHOST_KERNEL_GUEST_OFFLOADS_MASK	\
+	((1ULL << VIRTIO_NET_F_GUEST_CSUM) |	\
+	 (1ULL << VIRTIO_NET_F_GUEST_TSO4) |	\
+	 (1ULL << VIRTIO_NET_F_GUEST_TSO6) |	\
+	 (1ULL << VIRTIO_NET_F_GUEST_ECN)  |	\
+	 (1ULL << VIRTIO_NET_F_GUEST_UFO))
+
+/* with below features, when flows from virtio_user to vhost kernel
+ * (1) if flows goes up through the kernel networking stack, it does not need
+ * to verify checksum, which can save CPU cycles;
+ * (2) if flows goes through a Linux bridge and outside from an interface
+ * (kernel driver), checksum and TSO will be done by GSO in kernel or even
+ * offloaded into real physical device.
+ */
+#define VHOST_KERNEL_HOST_OFFLOADS_MASK		\
+	((1ULL << VIRTIO_NET_F_HOST_TSO4) |	\
+	 (1ULL << VIRTIO_NET_F_HOST_TSO6) |	\
+	 (1ULL << VIRTIO_NET_F_CSUM))
+
 static int
 vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		   enum vhost_user_request req,
@@ -196,10 +225,15 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		arg = (void *)vm;
 	}
 
-	/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
-	if (req_kernel == VHOST_SET_FEATURES)
+	if (req_kernel == VHOST_SET_FEATURES) {
+		/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
 		*(uint64_t *)arg &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
 
+		/* VHOST kernel does not know about below flags */
+		*(uint64_t *)arg &= ~VHOST_KERNEL_GUEST_OFFLOADS_MASK;
+		*(uint64_t *)arg &= ~VHOST_KERNEL_HOST_OFFLOADS_MASK;
+	}
+
 	for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
 		if (dev->vhostfds[i] < 0)
 			continue;
@@ -209,6 +243,15 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 			break;
 	}
 
+	if (!ret && req_kernel == VHOST_GET_FEATURES) {
+		/* with tap as the backend, all these features are supported
+		 * but not claimed by vhost-net, so we add them back when
+		 * reporting to upper layer.
+		 */
+		*((uint64_t *)arg) |= VHOST_KERNEL_GUEST_OFFLOADS_MASK;
+		*((uint64_t *)arg) |= VHOST_KERNEL_HOST_OFFLOADS_MASK;
+	}
+
 	if (vm)
 		free(vm);
 
@@ -280,6 +323,12 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 	int hdr_size;
 	int vhostfd;
 	int tapfd;
+	unsigned int offload =
+			TUN_F_CSUM |
+			TUN_F_TSO4 |
+			TUN_F_TSO6 |
+			TUN_F_TSO_ECN |
+			TUN_F_UFO;
 
 	vhostfd = dev->vhostfds[pair_idx];
 
@@ -354,6 +403,14 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 		goto error;
 	}
 
+	/* TODO: before set the offload capabilities, we'd better (1) check
+	 * negotiated features to see if necessary to offload; (2) query tap
+	 * to see if it supports the offload capabilities.
+	 */
+	if (ioctl(tapfd, TUNSETOFFLOAD, offload) != 0)
+		PMD_DRV_LOG(ERR, "TUNSETOFFLOAD ioctl() failed: %s",
+			   strerror(errno));
+
 	if (vhost_kernel_set_backend(vhostfd, tapfd) < 0)
 		goto error;
 
-- 
2.7.4

^ permalink raw reply related
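The feature handling that patch 6/7 adds to vhost_kernel_ioctl() can be modelled as two pure functions:
  - Before VHOST_SET_FEATURES, strip the tap-handled offload bits and VIRTIO_F_IOMMU_PLATFORM.
  - After VHOST_GET_FEATURES, add the offload bits back.

This is a sketch, not driver code. The bit numbers are the ones assigned by the virtio specification, and the helper and mask names are hypothetical.

```c
#include <stdint.h>

/* Feature bit numbers from the virtio specification. */
#define VIRTIO_NET_F_CSUM        0
#define VIRTIO_NET_F_GUEST_CSUM  1
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_GUEST_ECN   9
#define VIRTIO_NET_F_GUEST_UFO   10
#define VIRTIO_NET_F_HOST_TSO4   11
#define VIRTIO_NET_F_HOST_TSO6   12
#define VIRTIO_F_VERSION_1       32
#define VIRTIO_F_IOMMU_PLATFORM  33

/* Same bit sets as VHOST_KERNEL_GUEST/HOST_OFFLOADS_MASK in the patch. */
#define GUEST_OFFLOADS_SKETCH                \
	((1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
	 (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
	 (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
	 (1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
	 (1ULL << VIRTIO_NET_F_GUEST_UFO))
#define HOST_OFFLOADS_SKETCH                 \
	((1ULL << VIRTIO_NET_F_HOST_TSO4) |  \
	 (1ULL << VIRTIO_NET_F_HOST_TSO6) |  \
	 (1ULL << VIRTIO_NET_F_CSUM))

/* Value handed to VHOST_SET_FEATURES: drop bits vhost-net does not know. */
static uint64_t
features_for_vhost_net(uint64_t driver_features)
{
	driver_features &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
	driver_features &= ~GUEST_OFFLOADS_SKETCH;
	driver_features &= ~HOST_OFFLOADS_SKETCH;
	return driver_features;
}

/* Value reported upward after VHOST_GET_FEATURES: tap performs the
 * offloads, so advertise them even though vhost-net does not. */
static uint64_t
features_for_driver(uint64_t vhost_net_features)
{
	return vhost_net_features | GUEST_OFFLOADS_SKETCH | HOST_OFFLOADS_SKETCH;
}
```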

* [PATCH v3 7/7] net/virtio_user: enable multiqueue with vhost kernel
From: Jianfeng Tan @ 2017-01-04  3:59 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan
In-Reply-To: <1483502366-140154-1-git-send-email-jianfeng.tan@intel.com>

To enable multiqueue with vhost kernel, the backend device in the
kernel must support the multiqueue feature. Specifically, with tap as
the backend, we check whether tap supports the IFF_MULTI_QUEUE feature,
as linux/Documentation/networking/tuntap.txt describes.

For vhost kernel, each queue pair has its own vhost fd, and a tap fd
is bound to that vhost fd. All tap fds are attached to the same tap
interface name.

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/vhost_kernel.c    | 69 +++++++++++++++++++++---
 drivers/net/virtio/virtio_user/virtio_user_dev.c |  1 +
 2 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index bdb4af2..023bdf8 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -206,6 +206,29 @@ prepare_vhost_memory_kernel(void)
 	 (1ULL << VIRTIO_NET_F_CSUM))
 
 static int
+tap_supporte_mq(void)
+{
+	int tapfd;
+	unsigned int tap_features;
+
+	tapfd = open(PATH_NET_TUN, O_RDWR);
+	if (tapfd < 0) {
+		PMD_DRV_LOG(ERR, "fail to open %s: %s",
+			    PATH_NET_TUN, strerror(errno));
+		return -1;
+	}
+
+	if (ioctl(tapfd, TUNGETFEATURES, &tap_features) == -1) {
+		PMD_DRV_LOG(ERR, "TUNGETFEATURES failed: %s", strerror(errno));
+		close(tapfd);
+		return -1;
+	}
+
+	close(tapfd);
+	return tap_features & IFF_MULTI_QUEUE;
+}
+
+static int
 vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		   enum vhost_user_request req,
 		   void *arg)
@@ -213,6 +236,8 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 	int i, ret = -1;
 	uint64_t req_kernel;
 	struct vhost_memory_kernel *vm = NULL;
+	int vhostfd;
+	unsigned int queue_sel;
 
 	PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
 
@@ -232,15 +257,37 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		/* VHOST kernel does not know about below flags */
 		*(uint64_t *)arg &= ~VHOST_KERNEL_GUEST_OFFLOADS_MASK;
 		*(uint64_t *)arg &= ~VHOST_KERNEL_HOST_OFFLOADS_MASK;
+
+		*(uint64_t *)arg &= ~(1ULL << VIRTIO_NET_F_MQ);
 	}
 
-	for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
-		if (dev->vhostfds[i] < 0)
-			continue;
+	switch (req_kernel) {
+	case VHOST_SET_VRING_NUM:
+	case VHOST_SET_VRING_ADDR:
+	case VHOST_SET_VRING_BASE:
+	case VHOST_GET_VRING_BASE:
+	case VHOST_SET_VRING_KICK:
+	case VHOST_SET_VRING_CALL:
+		queue_sel = *(unsigned int *)arg;
+		vhostfd = dev->vhostfds[queue_sel / 2];
+		*(unsigned int *)arg = queue_sel % 2;
+		PMD_DRV_LOG(DEBUG, "vhostfd=%d, index=%u",
+			    vhostfd, *(unsigned int *)arg);
+		break;
+	default:
+		vhostfd = -1;
+	}
+	if (vhostfd == -1) {
+		for (i = 0; i < VHOST_KERNEL_MAX_QUEUES; ++i) {
+			if (dev->vhostfds[i] < 0)
+				continue;
 
-		ret = ioctl(dev->vhostfds[i], req_kernel, arg);
-		if (ret < 0)
-			break;
+			ret = ioctl(dev->vhostfds[i], req_kernel, arg);
+			if (ret < 0)
+				break;
+		}
+	} else {
+		ret = ioctl(vhostfd, req_kernel, arg);
 	}
 
 	if (!ret && req_kernel == VHOST_GET_FEATURES) {
@@ -250,6 +297,12 @@ vhost_kernel_ioctl(struct virtio_user_dev *dev,
 		 */
 		*((uint64_t *)arg) |= VHOST_KERNEL_GUEST_OFFLOADS_MASK;
 		*((uint64_t *)arg) |= VHOST_KERNEL_HOST_OFFLOADS_MASK;
+
+		/* vhost_kernel will not declare this feature, but it does
+		 * support multi-queue.
+		 */
+		if (tap_supporte_mq())
+			*(uint64_t *)arg |= (1ull << VIRTIO_NET_F_MQ);
 	}
 
 	if (vm)
@@ -329,6 +382,7 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 			TUN_F_TSO6 |
 			TUN_F_TSO_ECN |
 			TUN_F_UFO;
+	int req_mq = (dev->max_queue_pairs > 1);
 
 	vhostfd = dev->vhostfds[pair_idx];
 
@@ -382,6 +436,9 @@ vhost_kernel_enable_queue_pair(struct virtio_user_dev *dev,
 		goto error;
 	}
 
+	if (req_mq)
+		ifr.ifr_flags |= IFF_MULTI_QUEUE;
+
 	if (dev->ifname)
 		strncpy(ifr.ifr_name, dev->ifname, IFNAMSIZ);
 	else
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index c40b77e..2d9d989 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -93,6 +93,7 @@ virtio_user_kick_queue(struct virtio_user_dev *dev, uint32_t queue_sel)
 	state.num = vring->num;
 	dev->ops->send_request(dev, VHOST_USER_SET_VRING_NUM, &state);
 
+	state.index = queue_sel;
 	state.num = 0; /* no reservation */
 	dev->ops->send_request(dev, VHOST_USER_SET_VRING_BASE, &state);
 
-- 
2.7.4

^ permalink raw reply related
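With one vhost-net fd per queue pair, patch 7/7 maps the global virtqueue index onto a fd and a ring index local to that fd. Even queues are RX and odd queues are TX, following virtio-net's layout. The sketch below isolates that mapping. The struct and function names are hypothetical, and unlike the patch it bounds-checks queue_sel.

The patch writes queue_sel % 2 back into the caller's argument in place. That appears to be why virtio_user_kick_queue() now has to reset state.index before sending VHOST_USER_SET_VRING_BASE.

```c
/* Mirrors VHOST_KERNEL_MAX_QUEUES in the patch (one vhost fd per pair). */
#define MAX_QUEUE_PAIRS_SKETCH 8

struct vring_target_sketch {
	int vhostfd;        /* vhost-net fd that owns this ring */
	unsigned int index; /* ring index within that fd: 0 = RX, 1 = TX */
};

/* Map a global virtqueue index (2N = RX of pair N, 2N+1 = TX of pair N)
 * to the owning vhost fd and its local ring index.
 * Returns 0 on success, -1 if queue_sel is out of range. */
static int
map_queue_sel_sketch(const int *vhostfds, unsigned int queue_sel,
		     struct vring_target_sketch *out)
{
	if (queue_sel / 2 >= MAX_QUEUE_PAIRS_SKETCH)
		return -1;

	out->vhostfd = vhostfds[queue_sel / 2];
	out->index = queue_sel % 2;
	return 0;
}
```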

* [PATCH v3 1/7] net/virtio_user: fix wrongly set features
From: Jianfeng Tan @ 2017-01-04  3:59 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan, stable
In-Reply-To: <1483502366-140154-1-git-send-email-jianfeng.tan@intel.com>

Before commit 86d59b21468a ("net/virtio: support LRO"), features in
the virtio PMD were decided and set at device initialization and never
changed afterwards. Since that commit, features can be changed in
virtio_dev_configure(), and are re-negotiated if they change.

In virtio_user, device features are obtained only once, at driver
probe, but we did not store them. As a result, re-negotiation that
adds feature bits fails.

To fix it, we store the device features and use them for feature
negotiation in both the device initialization and device configure
phases.

Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")

CC: stable@dpdk.org

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 34 +++++++++++-------------
 drivers/net/virtio/virtio_user/virtio_user_dev.h |  5 +++-
 drivers/net/virtio/virtio_user_ethdev.c          |  4 +--
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index e239e0e..0d7e17b 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -148,12 +148,13 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 
 	/* Step 1: set features
 	 * Make sure VHOST_USER_F_PROTOCOL_FEATURES is added if mq is enabled,
-	 * and VIRTIO_NET_F_MAC is stripped.
+	 * VIRTIO_NET_F_MAC and VIRTIO_NET_F_CTRL_VQ is stripped.
 	 */
 	features = dev->features;
 	if (dev->max_queue_pairs > 1)
 		features |= VHOST_USER_MQ;
 	features &= ~(1ull << VIRTIO_NET_F_MAC);
+	features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
 	ret = vhost_user_sock(dev->vhostfd, VHOST_USER_SET_FEATURES, &features);
 	if (ret < 0)
 		goto error;
@@ -228,29 +229,26 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 	}
 
 	if (vhost_user_sock(dev->vhostfd, VHOST_USER_GET_FEATURES,
-			    &dev->features) < 0) {
+			    &dev->device_features) < 0) {
 		PMD_INIT_LOG(ERR, "get_features failed: %s", strerror(errno));
 		return -1;
 	}
 	if (dev->mac_specified)
-		dev->features |= (1ull << VIRTIO_NET_F_MAC);
+		dev->device_features |= (1ull << VIRTIO_NET_F_MAC);
 
-	if (!cq) {
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
-		/* Also disable features depends on VIRTIO_NET_F_CTRL_VQ */
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_RX);
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN);
-		dev->features &= ~(1ull << VIRTIO_NET_F_GUEST_ANNOUNCE);
-		dev->features &= ~(1ull << VIRTIO_NET_F_MQ);
-		dev->features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
-	} else {
-		/* vhost user backend does not need to know ctrl-q, so
-		 * actually we need add this bit into features. However,
-		 * DPDK vhost-user does send features with this bit, so we
-		 * check it instead of OR it for now.
+	if (cq) {
+		/* device does not really need to know anything about CQ,
+		 * so if necessary, we just claim to support CQ
 		 */
-		if (!(dev->features & (1ull << VIRTIO_NET_F_CTRL_VQ)))
-			PMD_INIT_LOG(INFO, "vhost does not support ctrl-q");
+		dev->device_features |= (1ull << VIRTIO_NET_F_CTRL_VQ);
+	} else {
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_VQ);
+		/* Also disable features depends on VIRTIO_NET_F_CTRL_VQ */
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_RX);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_VLAN);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_GUEST_ANNOUNCE);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_MQ);
+		dev->device_features &= ~(1ull << VIRTIO_NET_F_CTRL_MAC_ADDR);
 	}
 
 	if (dev->max_queue_pairs > 1) {
diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.h b/drivers/net/virtio/virtio_user/virtio_user_dev.h
index 33690b5..28fc788 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.h
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.h
@@ -46,7 +46,10 @@ struct virtio_user_dev {
 	uint32_t	max_queue_pairs;
 	uint32_t	queue_pairs;
 	uint32_t	queue_size;
-	uint64_t	features;
+	uint64_t	features; /* the negotiated features with driver,
+				   * and will be sync with device
+				   */
+	uint64_t	device_features; /* supported features by device */
 	uint8_t		status;
 	uint8_t		mac_addr[ETHER_ADDR_LEN];
 	char		path[PATH_MAX];
diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c
index 8cb983c..4a5a227 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -117,7 +117,7 @@ virtio_user_get_features(struct virtio_hw *hw)
 {
 	struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-	return dev->features;
+	return dev->device_features;
 }
 
 static void
@@ -125,7 +125,7 @@ virtio_user_set_features(struct virtio_hw *hw, uint64_t features)
 {
 	struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-	dev->features = features;
+	dev->features = features & dev->device_features;
 }
 
 static uint8_t
-- 
2.7.4

^ permalink raw reply related
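The fix in patch 1/7 can be shown in a reduced model. The device's offer is stored once in device_features, and every negotiation is masked against it. A narrower first negotiation therefore no longer prevents a later virtio_dev_configure() from adding bits back.

The struct and functions are hypothetical stand-ins for virtio_user_dev and the get/set_features callbacks. The CTRL_* bits that the patch also clears when cq is off are omitted for brevity.

```c
#include <stdint.h>

#define VIRTIO_NET_F_MAC     5  /* virtio spec bit numbers */
#define VIRTIO_NET_F_CTRL_VQ 17

struct user_dev_sketch {
	uint64_t features;        /* negotiated with the driver */
	uint64_t device_features; /* offered by the device, stored at probe */
};

/* Probe: record what the backend offers, plus locally emulated bits. */
static void
user_probe_sketch(struct user_dev_sketch *dev, uint64_t backend_features,
		  int mac_specified, int cq)
{
	dev->device_features = backend_features;
	if (mac_specified)
		dev->device_features |= 1ULL << VIRTIO_NET_F_MAC;
	if (cq) /* the device need not know about the control queue */
		dev->device_features |= 1ULL << VIRTIO_NET_F_CTRL_VQ;
	else
		dev->device_features &= ~(1ULL << VIRTIO_NET_F_CTRL_VQ);
}

/* get_features always reports the full offer, never the last result. */
static uint64_t
user_get_features_sketch(const struct user_dev_sketch *dev)
{
	return dev->device_features;
}

/* set_features keeps only the bits the device actually offers. */
static void
user_set_features_sketch(struct user_dev_sketch *dev, uint64_t wanted)
{
	dev->features = wanted & dev->device_features;
}
```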

* [PATCH v3 2/7] net/virtio_user: fix not properly reset device
From: Jianfeng Tan @ 2017-01-04  3:59 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, ferruh.yigit, cunming.liang, Jianfeng Tan, stable
In-Reply-To: <1483502366-140154-1-git-send-email-jianfeng.tan@intel.com>

virtio_user is not properly reset when users call vtpci_reset(),
as it ignores VIRTIO_CONFIG_STATUS_RESET status in
virtio_user_set_status().

This might lead to initialization failure, as it starts to re-init
the device before sending the RESET message to the backend. Besides,
the previous callfds and kickfds are not closed.

To fix it, on reset we close those fds and disable the virtqueues if
the device was previously set to DRIVER_OK status. We also re-init
the fields in struct virtio_user_dev.

Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")
Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")

CC: stable@dpdk.org

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 drivers/net/virtio/virtio_user/virtio_user_dev.c | 26 ++++++++++++++++--------
 drivers/net/virtio/virtio_user_ethdev.c          | 15 ++++++++------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
index 0d7e17b..a38398b 100644
--- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
+++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
@@ -182,7 +182,17 @@ virtio_user_start_device(struct virtio_user_dev *dev)
 
 int virtio_user_stop_device(struct virtio_user_dev *dev)
 {
-	return vhost_user_sock(dev->vhostfd, VHOST_USER_RESET_OWNER, NULL);
+	uint32_t i;
+
+	for (i = 0; i < dev->max_queue_pairs * 2; ++i) {
+		close(dev->callfds[i]);
+		close(dev->kickfds[i]);
+	}
+
+	for (i = 0; i < dev->max_queue_pairs; ++i)
+		vhost_user_enable_queue_pair(dev->vhostfd, i, 0);
+
+	return 0;
 }
 
 static inline void
@@ -210,6 +220,8 @@ int
 virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 		     int cq, int queue_size, const char *mac)
 {
+	uint32_t i;
+
 	snprintf(dev->path, PATH_MAX, "%s", path);
 	dev->max_queue_pairs = queues;
 	dev->queue_pairs = 1; /* mq disabled by default */
@@ -218,6 +230,11 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 	parse_mac(dev, mac);
 	dev->vhostfd = -1;
 
+	for (i = 0; i < VIRTIO_MAX_VIRTQUEUES * 2 + 1; ++i) {
+		dev->kickfds[i] = -1;
+		dev->callfds[i] = -1;
+	}
+
 	dev->vhostfd = vhost_user_setup(dev->path);
 	if (dev->vhostfd < 0) {
 		PMD_INIT_LOG(ERR, "backend set up fails");
@@ -264,13 +281,6 @@ virtio_user_dev_init(struct virtio_user_dev *dev, char *path, int queues,
 void
 virtio_user_dev_uninit(struct virtio_user_dev *dev)
 {
-	uint32_t i;
-
-	for (i = 0; i < dev->max_queue_pairs * 2; ++i) {
-		close(dev->callfds[i]);
-		close(dev->kickfds[i]);
-	}
-
 	close(dev->vhostfd);
 }
 
diff --git a/drivers/net/virtio/virtio_user_ethdev.c b/drivers/net/virtio/virtio_user_ethdev.c
index 4a5a227..93f5b01 100644
--- a/drivers/net/virtio/virtio_user_ethdev.c
+++ b/drivers/net/virtio/virtio_user_ethdev.c
@@ -87,21 +87,24 @@ virtio_user_write_dev_config(struct virtio_hw *hw, size_t offset,
 }
 
 static void
-virtio_user_set_status(struct virtio_hw *hw, uint8_t status)
+virtio_user_reset(struct virtio_hw *hw)
 {
 	struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-	if (status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
-		virtio_user_start_device(dev);
-	dev->status = status;
+	if (dev->status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
+		virtio_user_stop_device(dev);
 }
 
 static void
-virtio_user_reset(struct virtio_hw *hw)
+virtio_user_set_status(struct virtio_hw *hw, uint8_t status)
 {
 	struct virtio_user_dev *dev = virtio_user_get_dev(hw);
 
-	virtio_user_stop_device(dev);
+	if (status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
+		virtio_user_start_device(dev);
+	else if (status == VIRTIO_CONFIG_STATUS_RESET)
+		virtio_user_reset(hw);
+	dev->status = status;
 }
 
 static uint8_t
-- 
2.7.4

^ permalink raw reply related
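The reordered virtio_user_set_status()/virtio_user_reset() logic from patch 2/7 can be reduced to counters so that the ordering can be checked. Reset only tears down queues if DRIVER_OK had been set, and the new status is recorded after that check.

The struct and function names are hypothetical. The status values are the usual virtio ones (RESET = 0x00, DRIVER_OK = 0x04).

```c
#include <stdint.h>

#define VIRTIO_CONFIG_STATUS_RESET     0x00
#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04

struct status_dev_sketch {
	uint8_t status;
	int starts; /* times the device was started */
	int stops;  /* times queues were torn down on reset */
};

static void
reset_sketch(struct status_dev_sketch *dev)
{
	/* Only tear down queues that were actually brought up. */
	if (dev->status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
		dev->stops++;
}

static void
set_status_sketch(struct status_dev_sketch *dev, uint8_t status)
{
	if (status & VIRTIO_CONFIG_STATUS_DRIVER_OK)
		dev->starts++;
	else if (status == VIRTIO_CONFIG_STATUS_RESET)
		reset_sketch(dev);
	/* Recorded last, so reset_sketch() still sees the old status. */
	dev->status = status;
}
```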

* Re: [PATCH] lib/librte_vhost: fix memory leak
From: Yuanhan Liu @ 2017-01-04  4:02 UTC (permalink / raw)
  To: Yong Wang; +Cc: dev
In-Reply-To: <1483502275-18482-1-git-send-email-wang.yong19@zte.com.cn>

On Tue, Jan 03, 2017 at 10:57:55PM -0500, Yong Wang wrote:
> In function vhost_new_device(), current code does not free 'dev'
> in "i == MAX_VHOST_DEVICE" condition statements. It will lead to a
> memory leak.

Nice catch!

Here are a few minor things you may want to pay attention to in future
contributions:

- a fix patch needs a Fixes line, like the following:

  Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")

- the prefix for the vhost lib is "vhost: ". And FYI, for PMD drivers it's
  'net/PMD_NAME', e.g. 'net/virtio'.

For your convenience, I have fixed both while applying. And thanks
for the fix.

Applied to dpdk-next-virtio.

	--yliu

^ permalink raw reply
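The leak discussed here follows a common pattern: allocate first, then search for a free slot, and return early when none is found. Below is a hypothetical sketch of the corrected shape. It is not the actual vhost_new_device() code, whose allocator, array and limit names may differ.

```c
#include <stdlib.h>

#define MAX_DEVICES_SKETCH 4 /* hypothetical stand-in for MAX_VHOST_DEVICE */

static void *devices_sketch[MAX_DEVICES_SKETCH];

/* Allocate a device and store it in the first free slot.
 * Returns the slot index, or -1 on failure. */
static int
new_device_sketch(void)
{
	void *dev = calloc(1, 64);
	int i;

	if (dev == NULL)
		return -1;

	for (i = 0; i < MAX_DEVICES_SKETCH; i++) {
		if (devices_sketch[i] == NULL)
			break;
	}
	if (i == MAX_DEVICES_SKETCH) {
		free(dev); /* without this, dev leaks when all slots are taken */
		return -1;
	}

	devices_sketch[i] = dev;
	return i;
}
```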

* Re: [PATCH v3 2/7] net/virtio_user: fix not properly reset device
From: Yuanhan Liu @ 2017-01-04  5:46 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, ferruh.yigit, cunming.liang, stable
In-Reply-To: <1483502366-140154-3-git-send-email-jianfeng.tan@intel.com>

On Wed, Jan 04, 2017 at 03:59:21AM +0000, Jianfeng Tan wrote:
> virtio_user is not properly reset when users call vtpci_reset(),
> as it ignores VIRTIO_CONFIG_STATUS_RESET status in
> virtio_user_set_status().
> 
> This might lead to initialization failure as it starts to re-init
> the device before sending RESET message to backend. Besides, previous
> callfds and kickfds are not closed.
> 
> To fix it, we add support to disable virtqueues when it's set to
> DRIVER OK status, and re-init fields in struct virtio_user_dev.
> 
> Fixes: e9efa4d93821 ("net/virtio-user: add new virtual PCI driver")
> Fixes: 37a7eb2ae816 ("net/virtio-user: add device emulation layer")
> 
> CC: stable@dpdk.org
> 
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

Note that, typically, there should be no empty line between 'Cc' and SoB.

> ---
>  drivers/net/virtio/virtio_user/virtio_user_dev.c | 26 ++++++++++++++++--------
>  drivers/net/virtio/virtio_user_ethdev.c          | 15 ++++++++------
>  2 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/virtio/virtio_user/virtio_user_dev.c b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> index 0d7e17b..a38398b 100644
> --- a/drivers/net/virtio/virtio_user/virtio_user_dev.c
> +++ b/drivers/net/virtio/virtio_user/virtio_user_dev.c
> @@ -182,7 +182,17 @@ virtio_user_start_device(struct virtio_user_dev *dev)
>  
>  int virtio_user_stop_device(struct virtio_user_dev *dev)

The function doesn't seem to be well named: "dev_stop" is what first
came to my mind when I saw it :/

Rename it to "xxx_reset_device"?

	--yliu

^ permalink raw reply

* Re: [PATCH v3 3/7] net/virtio_user: move vhost user specific code
From: Yuanhan Liu @ 2017-01-04  6:02 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, ferruh.yigit, cunming.liang
In-Reply-To: <1483502366-140154-4-git-send-email-jianfeng.tan@intel.com>

On Wed, Jan 04, 2017 at 03:59:22AM +0000, Jianfeng Tan wrote:
> To support vhost kernel as the backend of net_virtio_user in coming
> patches, we move vhost_user specific structs and macros into
> vhost_user.c, and only keep common definitions in vhost.h.
> 
> Besides, remove VHOST_USER_MQ feature check.

Again, I have to ask: why? You didn't only remove the check; you also
removed this feature setting, which seems to break MQ support?

	--yliu

^ permalink raw reply

* Re: [PATCH v3 4/7] net/virtio_user: abstract virtio user backend ops
From: Yuanhan Liu @ 2017-01-04  6:11 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, ferruh.yigit, cunming.liang
In-Reply-To: <1483502366-140154-5-git-send-email-jianfeng.tan@intel.com>

On Wed, Jan 04, 2017 at 03:59:23AM +0000, Jianfeng Tan wrote:
> +struct virtio_user_backend_ops ops_user;

Better to qualify it with "extern const" ...

	--yliu

^ permalink raw reply
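Some background on the "extern const" suggestion:
  - A plain "struct virtio_user_backend_ops ops_user;" in a header is a tentative definition in every file that includes it.
  - That can fail to link under -fno-common, which is the GCC default since GCC 10.
  - Declaring the object extern const in the header and defining it const in exactly one .c file avoids this, and also lets the table live in read-only memory.

Below is a sketch with hypothetical names, shown in one file for brevity.

```c
/* --- header part (sketch) --- */
struct backend_ops_sketch {
	int (*setup)(void *dev);
};

/* Declaration only: no storage is reserved in files that include it. */
extern const struct backend_ops_sketch ops_user_sketch;

/* --- exactly one .c file (sketch) --- */
static int
user_setup_sketch(void *dev)
{
	(void)dev; /* a real backend would open its socket here */
	return 0;
}

/* The single const definition; placed in read-only data. */
const struct backend_ops_sketch ops_user_sketch = {
	.setup = user_setup_sketch,
};
```

With this change, a field such as dev->ops would also need to be declared as a pointer to const before &ops_user can be assigned to it without a warning.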

* Re: [PATCH v3 5/7] net/virtio_user: add vhost kernel support
From: Yuanhan Liu @ 2017-01-04  6:13 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, ferruh.yigit, cunming.liang
In-Reply-To: <1483502366-140154-6-git-send-email-jianfeng.tan@intel.com>

On Wed, Jan 04, 2017 at 03:59:24AM +0000, Jianfeng Tan wrote:
> +static int
> +vhost_kernel_ioctl(struct virtio_user_dev *dev,
> +		   enum vhost_user_request req,
> +		   void *arg)
> +{
> +	int i, ret = -1;
> +	uint64_t req_kernel;
> +	struct vhost_memory_kernel *vm = NULL;
> +
> +	PMD_DRV_LOG(INFO, "%s", vhost_msg_strings[req]);
> +
> +	req_kernel = vhost_req_user_to_kernel[req];
> +
> +	if (req_kernel == VHOST_SET_MEM_TABLE) {
> +		vm = prepare_vhost_memory_kernel();
> +		if (!vm)
> +			return -1;
> +		arg = (void *)vm;
> +	}
> +
> +	/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
> +	if (req_kernel == VHOST_SET_FEATURES)
> +		*(uint64_t *)arg &= ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);

Did you miss my comments on the last version?

	--yliu

^ permalink raw reply

* Re: [PATCH v5 00/17] net/i40e: consistent filter API
From: Wu, Jingjing @ 2017-01-04  6:40 UTC (permalink / raw)
  To: Xing, Beilei, Zhang, Helin; +Cc: dev@dpdk.org
In-Reply-To: <1483500187-124740-1-git-send-email-beilei.xing@intel.com>



> -----Original Message-----
> From: Xing, Beilei
> Sent: Wednesday, January 4, 2017 11:23 AM
> To: Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Helin <helin.zhang@intel.com>
> Cc: dev@dpdk.org
> Subject: [PATCH v5 00/17] net/i40e: consistent filter API
> 
> The patch set depends on Adrien's Generic flow API(rte_flow).
> 
> The patches mainly finish following functions:
> 1) Store and restore all kinds of filters.
> 2) Parse all kinds of filters.
> 3) Add flow validate function.
> 4) Add flow create function.
> 5) Add flow destroy function.
> 6) Add flow flush function.
> 
> v5 changes:
>  Change some local variable name.
>  Add removing i40e_flow_list during device uninit.
>  Fix compile error when gcc compile option isn't '-O0'.
> 
> v4 changes:
>  Change I40E_TCI_MASK with 0xFFFF to align with testpmd.
>  Modify the stats show when restoring filters.
> 
> v3 changes:
>  Set the related cause pointer to a non-NULL value when error happens.
>  Change return value when error happens.
>  Modify filter_del parameter with key.
>  Malloc filter after checking when delete a filter.
>  Delete meaningless initialization.
>  Add return value when there's error.
>  Change global variable definition.
>  Modify some function declaration.
> 
> v2 changes:
>  Add i40e_flow.c, all flow ops are implemented in the file.
>  Change the whole implementation of all parse flow functions.
>  Update error info for all flow ops.
>  Add flow_list to store flows created.
> 
> Beilei Xing (17):
>   net/i40e: store ethertype filter
>   net/i40e: store tunnel filter
>   net/i40e: store flow director filter
>   net/i40e: restore ethertype filter
>   net/i40e: restore tunnel filter
>   net/i40e: restore flow director filter
>   net/i40e: add flow validate function
>   net/i40e: parse flow director filter
>   net/i40e: parse tunnel filter
>   net/i40e: add flow create function
>   net/i40e: add flow destroy function
>   net/i40e: destroy ethertype filter
>   net/i40e: destroy tunnel filter
>   net/i40e: destroy flow directory filter
>   net/i40e: add flow flush function
>   net/i40e: flush ethertype filters
>   net/i40e: flush tunnel filters
> 
>  drivers/net/i40e/Makefile      |    2 +
>  drivers/net/i40e/i40e_ethdev.c |  526 ++++++++++--
>  drivers/net/i40e/i40e_ethdev.h |  173 ++++
>  drivers/net/i40e/i40e_fdir.c   |  140 +++-
>  drivers/net/i40e/i40e_flow.c   | 1772 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 2547 insertions(+), 66 deletions(-)
>  create mode 100644 drivers/net/i40e/i40e_flow.c
> 

Acked-by: Jingjing Wu <jingjing.wu@intel.com>

Thanks
Jingjing

* Re: [PATCH v3 3/7] net/virtio_user: move vhost user specific code
From: Tan, Jianfeng @ 2017-01-04  6:46 UTC
  To: Yuanhan Liu; +Cc: dev@dpdk.org, Yigit, Ferruh, Liang, Cunming
In-Reply-To: <20170104060238.GG21228@yliu-dev.sh.intel.com>

Hi Yuanhan,

> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> Sent: Wednesday, January 4, 2017 2:03 PM
> To: Tan, Jianfeng
> Cc: dev@dpdk.org; Yigit, Ferruh; Liang, Cunming
> Subject: Re: [PATCH v3 3/7] net/virtio_user: move vhost user specific code
> 
> On Wed, Jan 04, 2017 at 03:59:22AM +0000, Jianfeng Tan wrote:
> > To support vhost kernel as the backend of net_virtio_user in coming
> > patches, we move vhost_user specific structs and macros into
> > vhost_user.c, and only keep common definitions in vhost.h.
> >
> > Besides, remove VHOST_USER_MQ feature check.
> 
> Again, I have to ask, why? You don't only remove the check, also, you
> removed this feature setting, which seems to break the MQ support?

I have answered it here:
http://dpdk.org/ml/archives/dev/2016-December/053520.html

To be clearer, VHOST_USER_MQ is a poorly defined macro: #define VHOST_USER_MQ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES),
which is a feature bit of the vhost-user protocol.

According to QEMU's docs/specs/vhost-user.txt, "If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is initialized in an enabled state."

However, our DPDK vhost library ignores this feature bit and always initializes the ring in a disabled state. And our virtio_user with the vhost-user backend does send VHOST_USER_SET_VRING_ENABLE to enable each queue pair.

So I think it's not necessary to add it back.

What do you think?

Thanks,
Jianfeng

> 
> 	--yliu

* Reply: Re: [PATCH] lib/librte_vhost: fix memory leak
From: wang.yong19 @ 2017-01-04  6:55 UTC
  To: Yuanhan Liu; +Cc: dev
In-Reply-To: <20170104040210.GE21228@yliu-dev.sh.intel.com>

> From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Date: 2017/01/04 12:02
> To: Yong Wang <wang.yong19@zte.com.cn>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] lib/librte_vhost: fix memory leak
> 
> On Tue, Jan 03, 2017 at 10:57:55PM -0500, Yong Wang wrote:
> > In function vhost_new_device(), the current code does not free 'dev'
> > in the "i == MAX_VHOST_DEVICE" condition. This leads to a
> > memory leak.
> 
> Nice catch!
> 
> Here are a few minor things you might need to pay attention to in future
> contributions:
> 
> - a fix patch needs a Fixes line, like the following
> 
>   Fixes: 45ca9c6f7bc6 ("vhost: get rid of linked list for devices")
> 
> - the prefix for the vhost lib is "vhost: ". And FYI, for PMD drivers, it's
>   'net/PMD_NAME', e.g. 'net/virtio'.
> 
> 
> For your convenience, I have fixed the two while applying. And thanks
> for the fix.
> 
> Applied to dpdk-next-virtio.
> 
>    --yliu

Thanks for your advice. 
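
For readers less familiar with this class of bug, here is a minimal sketch of the corrected error path. It is not the librte_vhost code: `calloc()`/`free()` stand in for DPDK's allocator, and `vhost_new_device_sketch()`, `struct virtio_net_sketch` and the four-slot table are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical table size; not the real MAX_VHOST_DEVICE value. */
#define MAX_VHOST_DEVICE_SKETCH 4

struct virtio_net_sketch {
	int vid;
};

static struct virtio_net_sketch *vhost_devices_sketch[MAX_VHOST_DEVICE_SKETCH];

/* Returns the new device id, or -1 on failure. */
static int vhost_new_device_sketch(void)
{
	struct virtio_net_sketch *dev;
	int i;

	dev = calloc(1, sizeof(*dev));
	if (dev == NULL)
		return -1;

	/* Look for a free slot in the device table. */
	for (i = 0; i < MAX_VHOST_DEVICE_SKETCH; i++) {
		if (vhost_devices_sketch[i] == NULL)
			break;
	}

	if (i == MAX_VHOST_DEVICE_SKETCH) {
		/* The fix: release 'dev' before bailing out, or it leaks. */
		free(dev);
		return -1;
	}

	vhost_devices_sketch[i] = dev;
	dev->vid = i;
	return i;
}
```

The point is simply that every early return taken after a successful allocation must release that allocation first.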

* Re: [PATCH v2 5/9] net/virtio: setup rxq interrupts
From: Tan, Jianfeng @ 2017-01-04  6:56 UTC
  To: Yuanhan Liu; +Cc: dev, stephen
In-Reply-To: <20161230062747.GI21789@yliu-dev.sh.intel.com>



On 12/30/2016 2:27 PM, Yuanhan Liu wrote:
> On Thu, Dec 29, 2016 at 07:30:39AM +0000, Jianfeng Tan wrote:
>> This patch mainly allocates a structure to store the queue/irq mapping,
>> and configures the queue/irq mapping down through PCI ops. It also creates
>> eventfds for each Rx queue and tells the kernel about the eventfd/intr
>> binding.
>>
>> Most importantly, unlike previous NICs (which usually implement
>> this logic in dev_start()), virtio's interrupt settings should be
>> configured down to QEMU before sending the DRIVER_OK notification.
> Isn't it obvious we have to have all driver stuff (including interrupt
> settings) configured properly before setting DRIVER_OK? :) So,
> it's meaningless to state the fact that virtio acts differently than other
> nics here on dev_start/stop.
>
>> Note: We only support 1:1 queue/irq mapping so far, which means each
>> rx queue has one exclusive interrupt (corresponding to irqfd in the
>> qemu/kvm) to get notified when packets are available on that queue.
> That means the "vectors=N" option has to be set correctly
> in QEMU, otherwise it won't work?

Yes, actually, the correct value should be "vectors>=N+1", with N
standing for the number of queue pairs. It's due to the hard-coded
mapping logic:
0 -> config irq
1 -> rxq0
2 -> rxq1
...
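
The hard-coded mapping above can be expressed as a couple of helpers. This is a sketch, assuming one interrupt vector per Rx queue plus vector 0 for config changes; the function names are hypothetical and not part of the virtio PMD.

```c
#include <stdint.h>

/* Hypothetical helper: vector 0 is reserved for the config interrupt,
 * so Rx queue i is bound to vector i + 1. */
static uint16_t rxq_to_msix_vector(uint16_t rxq_id)
{
	return (uint16_t)(rxq_id + 1);
}

/* Smallest QEMU "vectors=" value covering every Rx queue plus config. */
static uint32_t min_vectors_needed(uint16_t nb_queue_pairs)
{
	return (uint32_t)nb_queue_pairs + 1;
}

/* Nonzero if the vectors configured in QEMU are enough for this mapping. */
static int vectors_sufficient(uint32_t qemu_vectors, uint16_t nb_queue_pairs)
{
	return qemu_vectors >= min_vectors_needed(nb_queue_pairs);
}
```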

>   If so, you also have to doc it somewhere.

Agreed.

[...]
>> +
>> +	if (virtio_queues_bind_intr(dev) < 0) {
>> +		PMD_INIT_LOG(ERR, "Failed to bind queue/interrupt");
>> +		return -1;
> You have to free intr_handle->intr_vec, otherwise, memory leak occurs.

It's freed at dev_close(). Do you mean freeing and reallocating here? As
nr_rx_queues is not a changeable value, I don't see the necessity here.
Am I missing something?

Thanks,
Jianfeng

* Re: [PATCH v3 3/7] net/virtio_user: move vhost user specific code
From: Yuanhan Liu @ 2017-01-04  7:08 UTC
  To: Tan, Jianfeng; +Cc: dev@dpdk.org, Yigit, Ferruh, Liang, Cunming
In-Reply-To: <ED26CBA2FAD1BF48A8719AEF02201E3651107348@SHSMSX103.ccr.corp.intel.com>

On Wed, Jan 04, 2017 at 06:46:34AM +0000, Tan, Jianfeng wrote:
> Hi Yuanhan,
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > Sent: Wednesday, January 4, 2017 2:03 PM
> > To: Tan, Jianfeng
> > Cc: dev@dpdk.org; Yigit, Ferruh; Liang, Cunming
> > Subject: Re: [PATCH v3 3/7] net/virtio_user: move vhost user specific code
> > 
> > On Wed, Jan 04, 2017 at 03:59:22AM +0000, Jianfeng Tan wrote:
> > > To support vhost kernel as the backend of net_virtio_user in coming
> > > patches, we move vhost_user specific structs and macros into
> > > vhost_user.c, and only keep common definitions in vhost.h.
> > >
> > > Besides, remove VHOST_USER_MQ feature check.
> > 
> > Again, I have to ask, why? You don't only remove the check, also, you
> > removed this feature setting, which seems to break the MQ support?
> 
> I have answered it here:
> http://dpdk.org/ml/archives/dev/2016-December/053520.html

I thought we had reached some agreement :/

> 
> To be clearer, VHOST_USER_MQ is a poorly defined macro: #define VHOST_USER_MQ (1ULL << VHOST_USER_F_PROTOCOL_FEATURES),
> which is a feature bit of the vhost-user protocol.

Yes, it's misnamed again.

> According to QEMU's docs/specs/vhost-user.txt, "If VHOST_USER_F_PROTOCOL_FEATURES has not been negotiated, the ring is initialized in an enabled state."
> 
> However, our DPDK vhost library ignores this feature bit.
> It always initializes the ring in a disabled state. And our virtio_user with the vhost-user backend does send VHOST_USER_SET_VRING_ENABLE to enable each queue pair.

VHOST_USER_F_PROTOCOL_FEATURES is a fundamental feature for many
vhost-user extended features, including MQ. If it's not set, MQ
should not work.

It may still work in your case, because you assume that the
vhost backend supports the MQ feature (which is true nowadays, as
the feature has been there for quite a while). However, that was not an
assumption we could make when vhost-user MQ support was added. Such
feature bits (including PROTOCOL_F_MQ) make sure that we will not try
to enable MQ with an older vhost backend that doesn't support it.

Put simply, this feature is needed, and as the feature name states,
it's needed only for vhost-user.
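
The dependency described above can be sketched as two checks. The bit numbers follow the vhost-user specification (VHOST_USER_F_PROTOCOL_FEATURES is virtio feature bit 30, VHOST_USER_PROTOCOL_F_MQ is protocol feature bit 0); the helper functions are hypothetical and only illustrate the negotiation logic, not the virtio_user implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers as defined by the vhost-user protocol specification. */
#define VHOST_USER_F_PROTOCOL_FEATURES 30 /* virtio feature bit */
#define VHOST_USER_PROTOCOL_F_MQ       0  /* protocol feature bit */

static bool protocol_features_negotiated(uint64_t features)
{
	return (features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)) != 0;
}

/* Without PROTOCOL_FEATURES, rings start enabled; with it, they start
 * disabled until VHOST_USER_SET_VRING_ENABLE is sent. */
static bool vring_initially_enabled(uint64_t features)
{
	return !protocol_features_negotiated(features);
}

/* MQ is only safe if PROTOCOL_FEATURES was negotiated AND the backend
 * reported PROTOCOL_F_MQ; an older backend fails the second check. */
static bool vhost_user_mq_usable(uint64_t features, uint64_t protocol_features)
{
	if (!protocol_features_negotiated(features))
		return false;
	return (protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_MQ)) != 0;
}
```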

	--yliu

> 
> So I think it's not necessary to add it back.
> 
> What do you think?
> 
> Thanks,
> Jianfeng
> 
> > 
> > 	--yliu

* Re: [PATCH v2 5/9] net/virtio: setup rxq interrupts
From: Yuanhan Liu @ 2017-01-04  7:22 UTC
  To: Tan, Jianfeng; +Cc: dev, stephen
In-Reply-To: <88093c9a-5bd8-5acb-aa87-4aa92c03a2df@intel.com>

On Wed, Jan 04, 2017 at 02:56:50PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 12/30/2016 2:27 PM, Yuanhan Liu wrote:
> >On Thu, Dec 29, 2016 at 07:30:39AM +0000, Jianfeng Tan wrote:
> >>This patch mainly allocates a structure to store the queue/irq mapping,
> >>and configures the queue/irq mapping down through PCI ops. It also creates
> >>eventfds for each Rx queue and tells the kernel about the eventfd/intr
> >>binding.
> >>
> >>Most importantly, unlike previous NICs (which usually implement
> >>this logic in dev_start()), virtio's interrupt settings should be
> >>configured down to QEMU before sending the DRIVER_OK notification.
> >Isn't it obvious we have to have all driver stuff (including interrupt
> >settings) configured properly before setting DRIVER_OK? :) So,
> >it's meaningless to state the fact that virtio acts differently than other
> >nics here on dev_start/stop.
> >
> >>Note: We only support 1:1 queue/irq mapping so far, which means each
> >>rx queue has one exclusive interrupt (corresponding to irqfd in the
> >>qemu/kvm) to get notified when packets are available on that queue.
> >That means the "vectors=N" option has to be set correctly
> >in QEMU, otherwise it won't work?
> 
> Yes, actually, the correct value should be "vectors>=N+1", with N standing

Yeah, and it's a typo.

> for the number of queue pairs. It's due to the hard-coded mapping logic:
> 0 -> config irq
> 1 -> rxq0
> 2 -> rxq1
> ...
> 
> >  If so, you also have to doc it somewhere.
> 
> Agreed.
> 
> [...]
> >>+
> >>+	if (virtio_queues_bind_intr(dev) < 0) {
> >>+		PMD_INIT_LOG(ERR, "Failed to bind queue/interrupt");
> >>+		return -1;
> >You have to free intr_handle->intr_vec, otherwise, memory leak occurs.
> 
> It's freed at dev_close(). Do you mean freeing and reallocating here? As

The typical way is to free the resources that have been allocated when
an error happens.

> nr_rx_queues is not a changeable value, I don't see the necessity here. Am I
> missing something?

No. nb_rx_queues does change when people reconfigure the queue number.
However, the maximum number of queues virtio supports does not change. You
could use that number for allocation.

	--yliu
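
A minimal sketch of the pattern suggested here: allocate once, sized by the maximum queue count, and undo the allocation on the error path. `struct intr_handle_sketch` stands in for `struct rte_intr_handle`, and `bind_intr_sketch()` is a hypothetical placeholder for `virtio_queues_bind_intr()` whose `fail` argument simulates an error.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the intr_vec field of struct rte_intr_handle. */
struct intr_handle_sketch {
	int *intr_vec;
};

/* Hypothetical stand-in for virtio_queues_bind_intr(). */
static int bind_intr_sketch(struct intr_handle_sketch *h, int fail)
{
	(void)h;
	return fail ? -1 : 0;
}

static int setup_rxq_intr_sketch(struct intr_handle_sketch *h,
				 unsigned int max_rx_queues, int fail_bind)
{
	/* Size by the maximum queue count, which does not change when the
	 * application reconfigures nb_rx_queues, so one allocation suffices. */
	if (h->intr_vec == NULL) {
		h->intr_vec = calloc(max_rx_queues, sizeof(*h->intr_vec));
		if (h->intr_vec == NULL)
			return -1;
	}

	if (bind_intr_sketch(h, fail_bind) < 0) {
		/* Release what was allocated before reporting the error. */
		free(h->intr_vec);
		h->intr_vec = NULL;
		return -1;
	}
	return 0;
}
```

Resetting the pointer to NULL after freeing keeps a later cleanup path (such as one in dev_close()) from freeing it twice.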

* Re: [PATCH v2 5/7] net/virtio_user: add vhost kernel support
From: Tan, Jianfeng @ 2017-01-04  7:22 UTC
  To: Yuanhan Liu; +Cc: dev, ferruh.yigit, cunming.liang
In-Reply-To: <20161226074437.GD19288@yliu-dev.sh.intel.com>

Sorry, I forgot to reply to this comment.

On 12/26/2016 3:44 PM, Yuanhan Liu wrote:
> [...]
>> +
>> +	/* Does not work when VIRTIO_F_IOMMU_PLATFORM now, why? */
> Because this feature needs vhost IOTLB support from the device
> emulation. The patches for QEMU haven't been merged yet, but they have
> been around for a while.

Yes, and it needs help from QEMU.

>
> Since we don't have the support yet, for sure it won't work. But
> I'm wondering why you have to disable it explicitly?

Here we do not have QEMU. The frontend driver talks with vhost-net through
virtio_user_dev, and both ends claim to support VIRTIO_F_IOMMU_PLATFORM.
So this feature bit will be negotiated if we don't explicitly disable
it. In my previous test, it failed in vhost_init_device_iotlb() of the
vhost kernel module. My guess is that, for this feature, the vhost kernel
module cannot work independently without the help of QEMU.

Thanks,
Jianfeng
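
Explicitly disabling the feature amounts to masking its bit out of the device features before negotiation. A sketch, assuming VIRTIO_F_IOMMU_PLATFORM is feature bit 33 as in the virtio specification; the helper names are hypothetical, not the virtio_user API.

```c
#include <stdint.h>

/* Feature bit number from the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Hypothetical helper: drop features that cannot work without QEMU's
 * IOTLB support before they can be negotiated. */
static uint64_t strip_unsupported_features(uint64_t device_features)
{
	return device_features & ~(1ULL << VIRTIO_F_IOMMU_PLATFORM);
}

/* Negotiated set = features offered by both sides, after stripping. */
static uint64_t negotiate_features(uint64_t driver_features,
				   uint64_t device_features)
{
	return driver_features & strip_unsupported_features(device_features);
}
```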

* [PATCH v5 0/8] Add MACsec offload support for ixgbe
From: Tiwei Bie @ 2017-01-04  7:21 UTC
  To: dev
  Cc: adrien.mazarguil, wenzhuo.lu, john.mcnamara, olivier.matz,
	thomas.monjalon, konstantin.ananyev, helin.zhang, wei.dai,
	xiao.w.wang
In-Reply-To: <1482939691-34855-1-git-send-email-tiwei.bie@intel.com>

This patch set adds MACsec offload support for ixgbe.
testpmd is also updated to support MACsec commands.

v2:
- Update the documents for testpmd;
- Update the release notes;
- Reuse the functions provided by base code;

v3:
- Add the missing parts of MACsec mbuf flag and reorganize the patch set;
- Add an ethdev event type for MACsec;
- Advertise the MACsec offload capabilities based on the mac type;
- Minor fixes and improvements;

v4:
- Reserve bits in mbuf and ethdev for PMD specific API;
- Use the reserved bits in PMD specific API;

v5:
- Add MACsec offload in the NIC feature list;
- Minor improvements on comments;

Tiwei Bie (8):
  mbuf: reserve a Tx offload flag for PMD-specific API
  ethdev: reserve an event type for PMD-specific API
  ethdev: reserve capability flags for PMD-specific API
  net/ixgbe: add MACsec offload support
  app/testpmd: add MACsec offload commands
  doc: add ixgbe specific APIs
  doc: update the release notes for the reserved flags
  doc: add MACsec offload into NIC feature list

 app/test-pmd/cmdline.c                      | 389 ++++++++++++++++++++++
 app/test-pmd/macfwd.c                       |   7 +
 app/test-pmd/macswap.c                      |   7 +
 app/test-pmd/testpmd.h                      |   2 +
 app/test-pmd/txonly.c                       |   7 +
 doc/guides/nics/features/default.ini        |   1 +
 doc/guides/nics/features/ixgbe.ini          |   1 +
 doc/guides/rel_notes/release_17_02.rst      |  18 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  32 ++
 drivers/net/ixgbe/ixgbe_ethdev.c            | 481 +++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h            |  45 +++
 drivers/net/ixgbe/ixgbe_rxtx.c              |   5 +
 drivers/net/ixgbe/rte_pmd_ixgbe.h           | 122 +++++++
 drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  11 +
 lib/librte_ether/rte_ethdev.h               |   4 +
 lib/librte_mbuf/rte_mbuf.c                  |   2 +
 lib/librte_mbuf/rte_mbuf.h                  |   5 +
 17 files changed, 1134 insertions(+), 5 deletions(-)

-- 
2.7.4

* [PATCH v5 1/8] mbuf: reserve a Tx offload flag for PMD-specific API
From: Tiwei Bie @ 2017-01-04  7:21 UTC
  To: dev
  Cc: adrien.mazarguil, wenzhuo.lu, john.mcnamara, olivier.matz,
	thomas.monjalon, konstantin.ananyev, helin.zhang, wei.dai,
	xiao.w.wang
In-Reply-To: <1483514502-32841-1-git-send-email-tiwei.bie@intel.com>

Reserve a Tx offload flag in mbuf that a PMD can use to define
its own Tx offload flag when implementing a PMD-specific API.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 lib/librte_mbuf/rte_mbuf.c | 2 ++
 lib/librte_mbuf/rte_mbuf.h | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 63f43c8..15c4f68 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -404,6 +404,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
 	case PKT_TX_TUNNEL_GRE: return "PKT_TX_TUNNEL_GRE";
 	case PKT_TX_TUNNEL_IPIP: return "PKT_TX_TUNNEL_IPIP";
 	case PKT_TX_TUNNEL_GENEVE: return "PKT_TX_TUNNEL_GENEVE";
+	case PKT_TX_RESERVED_0: return "PKT_TX_RESERVED_0";
 	default: return NULL;
 	}
 }
@@ -434,6 +435,7 @@ rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen)
 		  "PKT_TX_TUNNEL_NONE" },
 		{ PKT_TX_TUNNEL_GENEVE, PKT_TX_TUNNEL_MASK,
 		  "PKT_TX_TUNNEL_NONE" },
+		{ PKT_TX_RESERVED_0, PKT_TX_RESERVED_0, NULL },
 	};
 	const char *name;
 	unsigned int i;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index ead7c6e..6168a6d 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -182,6 +182,11 @@ extern "C" {
 /* add new TX flags here */
 
 /**
+ * Reserved Tx offload flag for PMD-specific API.
+ */
+#define PKT_TX_RESERVED_0     (0x1ULL << 44)
+
+/**
  * Bits 45:48 used for the tunnel type.
  * When doing Tx offload like TSO or checksum, the HW needs to configure the
  * tunnel type into the HW descriptors.
-- 
2.7.4
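
A sketch of how a PMD might give the reserved bit a driver-specific meaning and test it on Tx. `PKT_TX_MYPMD_FEATURE` and `struct mbuf_sketch` are hypothetical stand-ins (the latter for the `ol_flags` field of `struct rte_mbuf`); only the value of `PKT_TX_RESERVED_0` comes from the patch above.

```c
#include <stdint.h>

/* Value taken from the rte_mbuf.h hunk above. */
#define PKT_TX_RESERVED_0 (0x1ULL << 44)

/* Hypothetical PMD-side alias giving the reserved bit a meaning. */
#define PKT_TX_MYPMD_FEATURE PKT_TX_RESERVED_0

/* Stand-in for the ol_flags field of struct rte_mbuf. */
struct mbuf_sketch {
	uint64_t ol_flags;
};

/* Application side: request the PMD-specific Tx offload. */
static void request_mypmd_feature(struct mbuf_sketch *m)
{
	m->ol_flags |= PKT_TX_MYPMD_FEATURE;
}

/* PMD Tx path: check whether this packet asked for it. */
static int wants_mypmd_feature(const struct mbuf_sketch *m)
{
	return (m->ol_flags & PKT_TX_MYPMD_FEATURE) != 0;
}
```

Because the same bit may mean different things to different PMDs, an application should presumably set it only on ports whose driver documents it.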

* [PATCH v5 2/8] ethdev: reserve an event type for PMD-specific API
From: Tiwei Bie @ 2017-01-04  7:21 UTC
  To: dev
  Cc: adrien.mazarguil, wenzhuo.lu, john.mcnamara, olivier.matz,
	thomas.monjalon, konstantin.ananyev, helin.zhang, wei.dai,
	xiao.w.wang
In-Reply-To: <1483514502-32841-1-git-send-email-tiwei.bie@intel.com>

Reserve an event type that a PMD can use to define its own
event type when implementing a PMD-specific API.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index fb51754..d465825 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -3044,6 +3044,8 @@ enum rte_eth_event_type {
 	RTE_ETH_EVENT_INTR_RESET,
 			/**< reset interrupt event, sent to VF on PF reset */
 	RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by PF */
+	RTE_ETH_EVENT_RESERVED_0,
+			/**< reserved event type for PMD-specific API */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };
 
-- 
2.7.4
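
The reserved event type lets a PMD raise its own event through the generic callback machinery, as the ixgbe MACsec patch in this series does with RTE_ETH_EVENT_IXGBE_MACSEC. The sketch below mimics that flow in plain C; the enum, callback table and function names are hypothetical and only loosely model `rte_eth_dev_callback_register()` and the PMD-side callback processing.

```c
/* Trimmed, hypothetical stand-in for enum rte_eth_event_type. */
enum eth_event_sketch {
	ETH_EVENT_INTR_LSC_SKETCH,
	ETH_EVENT_RESERVED_0_SKETCH, /* reserved for PMD-specific API */
	ETH_EVENT_MAX_SKETCH
};

/* Hypothetical PMD alias for the reserved event type. */
#define ETH_EVENT_MYPMD_SKETCH ETH_EVENT_RESERVED_0_SKETCH

typedef void (*event_cb_sketch)(int port, enum eth_event_sketch ev, void *arg);

/* One callback slot per event type (single port, for brevity). */
static event_cb_sketch callbacks_sketch[ETH_EVENT_MAX_SKETCH];

static void register_cb_sketch(enum eth_event_sketch ev, event_cb_sketch cb)
{
	callbacks_sketch[ev] = cb;
}

/* What the PMD's interrupt handler does when its event fires. */
static void process_event_sketch(int port, enum eth_event_sketch ev)
{
	if (callbacks_sketch[ev] != 0)
		callbacks_sketch[ev](port, ev, 0);
}

static int events_seen_sketch;

/* Example application callback: counts PMD-specific events. */
static void count_event_cb(int port, enum eth_event_sketch ev, void *arg)
{
	(void)port;
	(void)arg;
	if (ev == ETH_EVENT_MYPMD_SKETCH)
		events_seen_sketch++;
}
```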

* [PATCH v5 3/8] ethdev: reserve capability flags for PMD-specific API
From: Tiwei Bie @ 2017-01-04  7:21 UTC
  To: dev
  Cc: adrien.mazarguil, wenzhuo.lu, john.mcnamara, olivier.matz,
	thomas.monjalon, konstantin.ananyev, helin.zhang, wei.dai,
	xiao.w.wang
In-Reply-To: <1483514502-32841-1-git-send-email-tiwei.bie@intel.com>

Reserve a Tx capability flag and an Rx capability flag that a PMD
can use to define its own capability flags when implementing a
PMD-specific API.

Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index d465825..8800b39 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -857,6 +857,7 @@ struct rte_eth_conf {
 #define DEV_RX_OFFLOAD_TCP_LRO     0x00000010
 #define DEV_RX_OFFLOAD_QINQ_STRIP  0x00000020
 #define DEV_RX_OFFLOAD_OUTER_IPV4_CKSUM 0x00000040
+#define DEV_RX_OFFLOAD_RESERVED_0  0x00000080 /**< Used for PMD-specific API. */
 
 /**
  * TX offload capabilities of a device.
@@ -874,6 +875,7 @@ struct rte_eth_conf {
 #define DEV_TX_OFFLOAD_GRE_TNL_TSO      0x00000400    /**< Used for tunneling packet. */
 #define DEV_TX_OFFLOAD_IPIP_TNL_TSO     0x00000800    /**< Used for tunneling packet. */
 #define DEV_TX_OFFLOAD_GENEVE_TNL_TSO   0x00001000    /**< Used for tunneling packet. */
+#define DEV_TX_OFFLOAD_RESERVED_0  0x00002000 /**< Used for PMD-specific API. */
 
 /**
  * Ethernet device information
-- 
2.7.4
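
An application would check the reserved capability bits before relying on the PMD-specific offload. A sketch, assuming the ixgbe names used later in this series (DEV_RX_OFFLOAD_IXGBE_MACSEC_STRIP, DEV_TX_OFFLOAD_IXGBE_MACSEC_INSERT) alias the reserved values from this patch; `struct dev_info_sketch` is a hypothetical stand-in for the two capability fields of `struct rte_eth_dev_info`.

```c
#include <stdint.h>

/* Values taken from the rte_ethdev.h hunk above. */
#define DEV_RX_OFFLOAD_RESERVED_0 0x00000080
#define DEV_TX_OFFLOAD_RESERVED_0 0x00002000

/* Assumed PMD-side aliases for the reserved capability bits. */
#define DEV_RX_OFFLOAD_IXGBE_MACSEC_STRIP  DEV_RX_OFFLOAD_RESERVED_0
#define DEV_TX_OFFLOAD_IXGBE_MACSEC_INSERT DEV_TX_OFFLOAD_RESERVED_0

/* Stand-in for the capability fields of struct rte_eth_dev_info. */
struct dev_info_sketch {
	uint32_t rx_offload_capa;
	uint32_t tx_offload_capa;
};

/* Nonzero only if both the Rx and Tx halves of the offload are present. */
static int port_supports_macsec(const struct dev_info_sketch *info)
{
	return (info->rx_offload_capa & DEV_RX_OFFLOAD_IXGBE_MACSEC_STRIP) &&
	       (info->tx_offload_capa & DEV_TX_OFFLOAD_IXGBE_MACSEC_INSERT);
}
```

Since another PMD could reuse the same reserved bits for something else, a real check would presumably also confirm which driver the port uses.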

* [PATCH v5 4/8] net/ixgbe: add MACsec offload support
From: Tiwei Bie @ 2017-01-04  7:21 UTC
  To: dev
  Cc: adrien.mazarguil, wenzhuo.lu, john.mcnamara, olivier.matz,
	thomas.monjalon, konstantin.ananyev, helin.zhang, wei.dai,
	xiao.w.wang
In-Reply-To: <1483514502-32841-1-git-send-email-tiwei.bie@intel.com>

MACsec (or LinkSec) is a MAC-level encryption/authentication
scheme defined in IEEE 802.1AE that uses symmetric cryptography.
This commit adds MACsec offload support for ixgbe.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Wenzhuo Lu <wenzhuo.lu@intel.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c            | 481 +++++++++++++++++++++++++++-
 drivers/net/ixgbe/ixgbe_ethdev.h            |  45 +++
 drivers/net/ixgbe/ixgbe_rxtx.c              |   5 +
 drivers/net/ixgbe/rte_pmd_ixgbe.h           | 122 +++++++
 drivers/net/ixgbe/rte_pmd_ixgbe_version.map |  11 +
 5 files changed, 659 insertions(+), 5 deletions(-)
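
The MACsec xstats in the diff below are driven by a name/offset table, so exposing a new counter only needs one more table entry. A self-contained sketch of that pattern, assuming a trimmed-down hypothetical `struct macsec_stats_sketch` in place of `struct ixgbe_macsec_stats`; it reads values with `memcpy()` rather than the pointer cast the patch uses, which sidesteps alignment and strict-aliasing questions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down counterpart of struct ixgbe_macsec_stats. */
struct macsec_stats_sketch {
	uint64_t out_pkts_untagged;
	uint64_t in_pkts_ok;
};

struct xstats_name_off_sketch {
	const char *name;
	size_t offset;
};

/* One entry per exported counter: display name plus byte offset. */
static const struct xstats_name_off_sketch macsec_strings_sketch[] = {
	{"out_pkts_untagged",
	 offsetof(struct macsec_stats_sketch, out_pkts_untagged)},
	{"in_pkts_ok", offsetof(struct macsec_stats_sketch, in_pkts_ok)},
};

#define NB_MACSEC_STATS_SKETCH (sizeof(macsec_strings_sketch) / \
				sizeof(macsec_strings_sketch[0]))

/* Fetch the i-th counter using the table's byte offset. */
static uint64_t macsec_stat_value(const struct macsec_stats_sketch *s,
				  size_t i)
{
	uint64_t v;

	memcpy(&v, (const char *)s + macsec_strings_sketch[i].offset,
	       sizeof(v));
	return v;
}
```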

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index ec2edad..653ddd2 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -231,6 +231,7 @@ static int ixgbe_dev_rss_reta_query(struct rte_eth_dev *dev,
 			uint16_t reta_size);
 static void ixgbe_dev_link_status_print(struct rte_eth_dev *dev);
 static int ixgbe_dev_lsc_interrupt_setup(struct rte_eth_dev *dev);
+static int ixgbe_dev_macsec_interrupt_setup(struct rte_eth_dev *dev);
 static int ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev);
 static int ixgbe_dev_interrupt_action(struct rte_eth_dev *dev,
@@ -745,6 +746,51 @@ static const struct rte_ixgbe_xstats_name_off rte_ixgbe_stats_strings[] = {
 #define IXGBE_NB_HW_STATS (sizeof(rte_ixgbe_stats_strings) / \
 			   sizeof(rte_ixgbe_stats_strings[0]))
 
+/* MACsec statistics */
+static const struct rte_ixgbe_xstats_name_off rte_ixgbe_macsec_strings[] = {
+	{"out_pkts_untagged", offsetof(struct ixgbe_macsec_stats,
+		out_pkts_untagged)},
+	{"out_pkts_encrypted", offsetof(struct ixgbe_macsec_stats,
+		out_pkts_encrypted)},
+	{"out_pkts_protected", offsetof(struct ixgbe_macsec_stats,
+		out_pkts_protected)},
+	{"out_octets_encrypted", offsetof(struct ixgbe_macsec_stats,
+		out_octets_encrypted)},
+	{"out_octets_protected", offsetof(struct ixgbe_macsec_stats,
+		out_octets_protected)},
+	{"in_pkts_untagged", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_untagged)},
+	{"in_pkts_badtag", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_badtag)},
+	{"in_pkts_nosci", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_nosci)},
+	{"in_pkts_unknownsci", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_unknownsci)},
+	{"in_octets_decrypted", offsetof(struct ixgbe_macsec_stats,
+		in_octets_decrypted)},
+	{"in_octets_validated", offsetof(struct ixgbe_macsec_stats,
+		in_octets_validated)},
+	{"in_pkts_unchecked", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_unchecked)},
+	{"in_pkts_delayed", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_delayed)},
+	{"in_pkts_late", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_late)},
+	{"in_pkts_ok", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_ok)},
+	{"in_pkts_invalid", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_invalid)},
+	{"in_pkts_notvalid", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_notvalid)},
+	{"in_pkts_unusedsa", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_unusedsa)},
+	{"in_pkts_notusingsa", offsetof(struct ixgbe_macsec_stats,
+		in_pkts_notusingsa)},
+};
+
+#define IXGBE_NB_MACSEC_STATS (sizeof(rte_ixgbe_macsec_strings) / \
+			   sizeof(rte_ixgbe_macsec_strings[0]))
+
 /* Per-queue statistics */
 static const struct rte_ixgbe_xstats_name_off rte_ixgbe_rxq_strings[] = {
 	{"mbuf_allocation_errors", offsetof(struct ixgbe_hw_stats, rnbc)},
@@ -2367,6 +2413,7 @@ ixgbe_dev_start(struct rte_eth_dev *dev)
 		/* check if lsc interrupt is enabled */
 		if (dev->data->dev_conf.intr_conf.lsc != 0)
 			ixgbe_dev_lsc_interrupt_setup(dev);
+		ixgbe_dev_macsec_interrupt_setup(dev);
 	} else {
 		rte_intr_callback_unregister(intr_handle,
 					     ixgbe_dev_interrupt_handler, dev);
@@ -2557,6 +2604,7 @@ ixgbe_dev_close(struct rte_eth_dev *dev)
 static void
 ixgbe_read_stats_registers(struct ixgbe_hw *hw,
 			   struct ixgbe_hw_stats *hw_stats,
+			   struct ixgbe_macsec_stats *macsec_stats,
 			   uint64_t *total_missed_rx, uint64_t *total_qbrc,
 			   uint64_t *total_qprc, uint64_t *total_qprdc)
 {
@@ -2726,6 +2774,40 @@ ixgbe_read_stats_registers(struct ixgbe_hw *hw,
 	/* Flow Director Stats registers */
 	hw_stats->fdirmatch += IXGBE_READ_REG(hw, IXGBE_FDIRMATCH);
 	hw_stats->fdirmiss += IXGBE_READ_REG(hw, IXGBE_FDIRMISS);
+
+	/* MACsec Stats registers */
+	macsec_stats->out_pkts_untagged += IXGBE_READ_REG(hw, IXGBE_LSECTXUT);
+	macsec_stats->out_pkts_encrypted +=
+		IXGBE_READ_REG(hw, IXGBE_LSECTXPKTE);
+	macsec_stats->out_pkts_protected +=
+		IXGBE_READ_REG(hw, IXGBE_LSECTXPKTP);
+	macsec_stats->out_octets_encrypted +=
+		IXGBE_READ_REG(hw, IXGBE_LSECTXOCTE);
+	macsec_stats->out_octets_protected +=
+		IXGBE_READ_REG(hw, IXGBE_LSECTXOCTP);
+	macsec_stats->in_pkts_untagged += IXGBE_READ_REG(hw, IXGBE_LSECRXUT);
+	macsec_stats->in_pkts_badtag += IXGBE_READ_REG(hw, IXGBE_LSECRXBAD);
+	macsec_stats->in_pkts_nosci += IXGBE_READ_REG(hw, IXGBE_LSECRXNOSCI);
+	macsec_stats->in_pkts_unknownsci +=
+		IXGBE_READ_REG(hw, IXGBE_LSECRXUNSCI);
+	macsec_stats->in_octets_decrypted +=
+		IXGBE_READ_REG(hw, IXGBE_LSECRXOCTD);
+	macsec_stats->in_octets_validated +=
+		IXGBE_READ_REG(hw, IXGBE_LSECRXOCTV);
+	macsec_stats->in_pkts_unchecked += IXGBE_READ_REG(hw, IXGBE_LSECRXUNCH);
+	macsec_stats->in_pkts_delayed += IXGBE_READ_REG(hw, IXGBE_LSECRXDELAY);
+	macsec_stats->in_pkts_late += IXGBE_READ_REG(hw, IXGBE_LSECRXLATE);
+	for (i = 0; i < 2; i++) {
+		macsec_stats->in_pkts_ok +=
+			IXGBE_READ_REG(hw, IXGBE_LSECRXOK(i));
+		macsec_stats->in_pkts_invalid +=
+			IXGBE_READ_REG(hw, IXGBE_LSECRXINV(i));
+		macsec_stats->in_pkts_notvalid +=
+			IXGBE_READ_REG(hw, IXGBE_LSECRXNV(i));
+	}
+	macsec_stats->in_pkts_unusedsa += IXGBE_READ_REG(hw, IXGBE_LSECRXUNSA);
+	macsec_stats->in_pkts_notusingsa +=
+		IXGBE_READ_REG(hw, IXGBE_LSECRXNUSA);
 }
 
 /*
@@ -2738,6 +2820,9 @@ ixgbe_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 			IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ixgbe_hw_stats *hw_stats =
 			IXGBE_DEV_PRIVATE_TO_STATS(dev->data->dev_private);
+	struct ixgbe_macsec_stats *macsec_stats =
+			IXGBE_DEV_PRIVATE_TO_MACSEC_STATS(
+				dev->data->dev_private);
 	uint64_t total_missed_rx, total_qbrc, total_qprc, total_qprdc;
 	unsigned i;
 
@@ -2746,8 +2831,8 @@ ixgbe_dev_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 	total_qprc = 0;
 	total_qprdc = 0;
 
-	ixgbe_read_stats_registers(hw, hw_stats, &total_missed_rx, &total_qbrc,
-			&total_qprc, &total_qprdc);
+	ixgbe_read_stats_registers(hw, hw_stats, macsec_stats, &total_missed_rx,
+			&total_qbrc, &total_qprc, &total_qprdc);
 
 	if (stats == NULL)
 		return;
@@ -2799,7 +2884,7 @@ ixgbe_dev_stats_reset(struct rte_eth_dev *dev)
 /* This function calculates the number of xstats based on the current config */
 static unsigned
 ixgbe_xstats_calc_num(void) {
-	return IXGBE_NB_HW_STATS +
+	return IXGBE_NB_HW_STATS + IXGBE_NB_MACSEC_STATS +
 		(IXGBE_NB_RXQ_PRIO_STATS * IXGBE_NB_RXQ_PRIO_VALUES) +
 		(IXGBE_NB_TXQ_PRIO_STATS * IXGBE_NB_TXQ_PRIO_VALUES);
 }
@@ -2826,6 +2911,15 @@ static int ixgbe_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev,
 			count++;
 		}
 
+		/* MACsec Stats */
+		for (i = 0; i < IXGBE_NB_MACSEC_STATS; i++) {
+			snprintf(xstats_names[count].name,
+				sizeof(xstats_names[count].name),
+				"%s",
+				rte_ixgbe_macsec_strings[i].name);
+			count++;
+		}
+
 		/* RX Priority Stats */
 		for (stat = 0; stat < IXGBE_NB_RXQ_PRIO_STATS; stat++) {
 			for (i = 0; i < IXGBE_NB_RXQ_PRIO_VALUES; i++) {
@@ -2875,6 +2969,9 @@ ixgbe_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
 			IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct ixgbe_hw_stats *hw_stats =
 			IXGBE_DEV_PRIVATE_TO_STATS(dev->data->dev_private);
+	struct ixgbe_macsec_stats *macsec_stats =
+			IXGBE_DEV_PRIVATE_TO_MACSEC_STATS(
+				dev->data->dev_private);
 	uint64_t total_missed_rx, total_qbrc, total_qprc, total_qprdc;
 	unsigned i, stat, count = 0;
 
@@ -2888,8 +2985,8 @@ ixgbe_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
 	total_qprc = 0;
 	total_qprdc = 0;
 
-	ixgbe_read_stats_registers(hw, hw_stats, &total_missed_rx, &total_qbrc,
-				   &total_qprc, &total_qprdc);
+	ixgbe_read_stats_registers(hw, hw_stats, macsec_stats, &total_missed_rx,
+			&total_qbrc, &total_qprc, &total_qprdc);
 
 	/* If this is a reset xstats is NULL, and we have cleared the
 	 * registers by reading them.
@@ -2905,6 +3002,13 @@ ixgbe_dev_xstats_get(struct rte_eth_dev *dev, struct rte_eth_xstat *xstats,
 		count++;
 	}
 
+	/* MACsec Stats */
+	for (i = 0; i < IXGBE_NB_MACSEC_STATS; i++) {
+		xstats[count].value = *(uint64_t *)(((char *)macsec_stats) +
+				rte_ixgbe_macsec_strings[i].offset);
+		count++;
+	}
+
 	/* RX Priority Stats */
 	for (stat = 0; stat < IXGBE_NB_RXQ_PRIO_STATS; stat++) {
 		for (i = 0; i < IXGBE_NB_RXQ_PRIO_VALUES; i++) {
@@ -2932,6 +3036,9 @@ ixgbe_dev_xstats_reset(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw_stats *stats =
 			IXGBE_DEV_PRIVATE_TO_STATS(dev->data->dev_private);
+	struct ixgbe_macsec_stats *macsec_stats =
+			IXGBE_DEV_PRIVATE_TO_MACSEC_STATS(
+				dev->data->dev_private);
 
 	unsigned count = ixgbe_xstats_calc_num();
 
@@ -2940,6 +3047,7 @@ ixgbe_dev_xstats_reset(struct rte_eth_dev *dev)
 
 	/* Reset software totals */
 	memset(stats, 0, sizeof(*stats));
+	memset(macsec_stats, 0, sizeof(*macsec_stats));
 }
 
 static void
@@ -3072,6 +3180,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	    !RTE_ETH_DEV_SRIOV(dev).active)
 		dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_TCP_LRO;
 
+	if (hw->mac.type == ixgbe_mac_82599EB ||
+	    hw->mac.type == ixgbe_mac_X540)
+		dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_IXGBE_MACSEC_STRIP;
+
 	if (hw->mac.type == ixgbe_mac_X550 ||
 	    hw->mac.type == ixgbe_mac_X550EM_x ||
 	    hw->mac.type == ixgbe_mac_X550EM_a)
@@ -3085,6 +3197,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 		DEV_TX_OFFLOAD_SCTP_CKSUM  |
 		DEV_TX_OFFLOAD_TCP_TSO;
 
+	if (hw->mac.type == ixgbe_mac_82599EB ||
+	    hw->mac.type == ixgbe_mac_X540)
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_IXGBE_MACSEC_INSERT;
+
 	if (hw->mac.type == ixgbe_mac_X550 ||
 	    hw->mac.type == ixgbe_mac_X550EM_x ||
 	    hw->mac.type == ixgbe_mac_X550EM_a)
@@ -3382,6 +3498,28 @@ ixgbe_dev_rxq_interrupt_setup(struct rte_eth_dev *dev)
 	return 0;
 }
 
+/**
+ * It clears the interrupt causes and enables the interrupt.
+ * It will be called once only during nic initialized.
+ *
+ * @param dev
+ *  Pointer to struct rte_eth_dev.
+ *
+ * @return
+ *  - On success, zero.
+ *  - On failure, a negative value.
+ */
+static int
+ixgbe_dev_macsec_interrupt_setup(struct rte_eth_dev *dev)
+{
+	struct ixgbe_interrupt *intr =
+		IXGBE_DEV_PRIVATE_TO_INTR(dev->data->dev_private);
+
+	intr->mask |= IXGBE_EICR_LINKSEC;
+
+	return 0;
+}
+
 /*
  * It reads ICR and sets flag (IXGBE_EICR_LSC) for the link_update.
  *
@@ -3416,6 +3554,9 @@ ixgbe_dev_interrupt_get_status(struct rte_eth_dev *dev)
 	if (eicr & IXGBE_EICR_MAILBOX)
 		intr->flags |= IXGBE_FLAG_MAILBOX;
 
+	if (eicr & IXGBE_EICR_LINKSEC)
+		intr->flags |= IXGBE_FLAG_MACSEC;
+
 	if (hw->mac.type ==  ixgbe_mac_X550EM_x &&
 	    hw->phy.type == ixgbe_phy_x550em_ext_t &&
 	    (eicr & IXGBE_EICR_GPI_SDP0_X550EM_x))
@@ -3570,6 +3711,12 @@ ixgbe_dev_interrupt_delayed_handler(void *param)
 		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_LSC, NULL);
 	}
 
+	if (intr->flags & IXGBE_FLAG_MACSEC) {
+		_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_IXGBE_MACSEC,
+					      NULL);
+		intr->flags &= ~IXGBE_FLAG_MACSEC;
+	}
+
 	PMD_DRV_LOG(DEBUG, "enable intr in delayed handler S[%08x]", eicr);
 	ixgbe_enable_intr(dev);
 	rte_intr_enable(intr_handle);
@@ -7610,6 +7757,330 @@ ixgbevf_dev_interrupt_handler(__rte_unused struct rte_intr_handle *handle,
 	ixgbevf_dev_interrupt_action(dev);
 }
 
+/**
+ *  ixgbe_disable_sec_tx_path_generic - Stops the transmit data path
+ *  @hw: pointer to hardware structure
+ *
+ *  Stops the transmit data path and waits for the HW to internally empty
+ *  the Tx security block
+ **/
+int ixgbe_disable_sec_tx_path_generic(struct ixgbe_hw *hw)
+{
+#define IXGBE_MAX_SECTX_POLL 40
+
+	int i;
+	int sectxreg;
+
+	sectxreg = IXGBE_READ_REG(hw, IXGBE_SECTXCTRL);
+	sectxreg |= IXGBE_SECTXCTRL_TX_DIS;
+	IXGBE_WRITE_REG(hw, IXGBE_SECTXCTRL, sectxreg);
+	for (i = 0; i < IXGBE_MAX_SECTX_POLL; i++) {
+		sectxreg = IXGBE_READ_REG(hw, IXGBE_SECTXSTAT);
+		if (sectxreg & IXGBE_SECTXSTAT_SECTX_RDY)
+			break;
+		/* Use interrupt-safe sleep just in case */
+		usec_delay(1000);
+	}
+
+	/* For informational purposes only */
+	if (i >= IXGBE_MAX_SECTX_POLL)
+		PMD_DRV_LOG(DEBUG, "Tx unit being enabled before security "
+			 "path fully disabled.  Continuing with init.");
+
+	return IXGBE_SUCCESS;
+}
+
+/**
+ *  ixgbe_enable_sec_tx_path_generic - Enables the transmit data path
+ *  @hw: pointer to hardware structure
+ *
+ *  Enables the transmit data path.
+ **/
+int ixgbe_enable_sec_tx_path_generic(struct ixgbe_hw *hw)
+{
+	uint32_t sectxreg;
+
+	sectxreg = IXGBE_READ_REG(hw, IXGBE_SECTXCTRL);
+	sectxreg &= ~IXGBE_SECTXCTRL_TX_DIS;
+	IXGBE_WRITE_REG(hw, IXGBE_SECTXCTRL, sectxreg);
+	IXGBE_WRITE_FLUSH(hw);
+
+	return IXGBE_SUCCESS;
+}
+
+int
+rte_pmd_ixgbe_macsec_enable(uint8_t port, uint8_t en, uint8_t rp)
+{
+	struct ixgbe_hw *hw;
+	struct rte_eth_dev *dev;
+	uint32_t ctrl;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	/* Stop the data paths */
+	if (ixgbe_disable_sec_rx_path(hw) != IXGBE_SUCCESS)
+		return -ENOTSUP;
+	/*
+	 * Workaround:
+	 * The base code provides no Tx equivalent of
+	 * ixgbe_disable_sec_rx_path(), and DPDK must not
+	 * modify the base code, so call the local
+	 * implementation directly for now. Hardware support
+	 * has already been checked by
+	 * ixgbe_disable_sec_rx_path().
+	 */
+	ixgbe_disable_sec_tx_path_generic(hw);
+
+	/* Enable Ethernet CRC (required by MACsec offload) */
+	ctrl = IXGBE_READ_REG(hw, IXGBE_HLREG0);
+	ctrl |= IXGBE_HLREG0_TXCRCEN | IXGBE_HLREG0_RXCRCSTRP;
+	IXGBE_WRITE_REG(hw, IXGBE_HLREG0, ctrl);
+
+	/* Enable the TX and RX crypto engines */
+	ctrl = IXGBE_READ_REG(hw, IXGBE_SECTXCTRL);
+	ctrl &= ~IXGBE_SECTXCTRL_SECTX_DIS;
+	IXGBE_WRITE_REG(hw, IXGBE_SECTXCTRL, ctrl);
+
+	ctrl = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL);
+	ctrl &= ~IXGBE_SECRXCTRL_SECRX_DIS;
+	IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, ctrl);
+
+	ctrl = IXGBE_READ_REG(hw, IXGBE_SECTXMINIFG);
+	ctrl &= ~IXGBE_SECTX_MINSECIFG_MASK;
+	ctrl |= 0x3;
+	IXGBE_WRITE_REG(hw, IXGBE_SECTXMINIFG, ctrl);
+
+	/* Enable SA lookup */
+	ctrl = IXGBE_READ_REG(hw, IXGBE_LSECTXCTRL);
+	ctrl &= ~IXGBE_LSECTXCTRL_EN_MASK;
+	ctrl |= en ? IXGBE_LSECTXCTRL_AUTH_ENCRYPT :
+		     IXGBE_LSECTXCTRL_AUTH;
+	ctrl |= IXGBE_LSECTXCTRL_AISCI;
+	ctrl &= ~IXGBE_LSECTXCTRL_PNTHRSH_MASK;
+	ctrl |= IXGBE_MACSEC_PNTHRSH & IXGBE_LSECTXCTRL_PNTHRSH_MASK;
+	IXGBE_WRITE_REG(hw, IXGBE_LSECTXCTRL, ctrl);
+
+	ctrl = IXGBE_READ_REG(hw, IXGBE_LSECRXCTRL);
+	ctrl &= ~IXGBE_LSECRXCTRL_EN_MASK;
+	ctrl |= IXGBE_LSECRXCTRL_STRICT << IXGBE_LSECRXCTRL_EN_SHIFT;
+	ctrl &= ~IXGBE_LSECRXCTRL_PLSH;
+	if (rp)
+		ctrl |= IXGBE_LSECRXCTRL_RP;
+	else
+		ctrl &= ~IXGBE_LSECRXCTRL_RP;
+	IXGBE_WRITE_REG(hw, IXGBE_LSECRXCTRL, ctrl);
+
+	/* Start the data paths */
+	ixgbe_enable_sec_rx_path(hw);
+	/*
+	 * Workaround:
+	 * The base code provides no Tx equivalent of
+	 * ixgbe_enable_sec_rx_path(), and DPDK must not
+	 * modify the base code, so call the local
+	 * implementation directly for now.
+	 */
+	ixgbe_enable_sec_tx_path_generic(hw);
+
+	return 0;
+}
+
+int
+rte_pmd_ixgbe_macsec_disable(uint8_t port)
+{
+	struct ixgbe_hw *hw;
+	struct rte_eth_dev *dev;
+	uint32_t ctrl;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	/* Stop the data paths */
+	if (ixgbe_disable_sec_rx_path(hw) != IXGBE_SUCCESS)
+		return -ENOTSUP;
+	/*
+	 * Workaround:
+	 * The base code provides no Tx equivalent of
+	 * ixgbe_disable_sec_rx_path(), and DPDK must not
+	 * modify the base code, so call the local
+	 * implementation directly for now. Hardware support
+	 * has already been checked by
+	 * ixgbe_disable_sec_rx_path().
+	 */
+	ixgbe_disable_sec_tx_path_generic(hw);
+
+	/* Disable the TX and RX crypto engines */
+	ctrl = IXGBE_READ_REG(hw, IXGBE_SECTXCTRL);
+	ctrl |= IXGBE_SECTXCTRL_SECTX_DIS;
+	IXGBE_WRITE_REG(hw, IXGBE_SECTXCTRL, ctrl);
+
+	ctrl = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL);
+	ctrl |= IXGBE_SECRXCTRL_SECRX_DIS;
+	IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, ctrl);
+
+	/* Disable SA lookup */
+	ctrl = IXGBE_READ_REG(hw, IXGBE_LSECTXCTRL);
+	ctrl &= ~IXGBE_LSECTXCTRL_EN_MASK;
+	ctrl |= IXGBE_LSECTXCTRL_DISABLE;
+	IXGBE_WRITE_REG(hw, IXGBE_LSECTXCTRL, ctrl);
+
+	ctrl = IXGBE_READ_REG(hw, IXGBE_LSECRXCTRL);
+	ctrl &= ~IXGBE_LSECRXCTRL_EN_MASK;
+	ctrl |= IXGBE_LSECRXCTRL_DISABLE << IXGBE_LSECRXCTRL_EN_SHIFT;
+	IXGBE_WRITE_REG(hw, IXGBE_LSECRXCTRL, ctrl);
+
+	/* Start the data paths */
+	ixgbe_enable_sec_rx_path(hw);
+	/*
+	 * Workaround:
+	 * The base code provides no Tx equivalent of
+	 * ixgbe_enable_sec_rx_path(), and DPDK must not
+	 * modify the base code, so call the local
+	 * implementation directly for now.
+	 */
+	ixgbe_enable_sec_tx_path_generic(hw);
+
+	return 0;
+}
+
+int
+rte_pmd_ixgbe_macsec_config_txsc(uint8_t port, uint8_t *mac)
+{
+	struct ixgbe_hw *hw;
+	struct rte_eth_dev *dev;
+	uint32_t ctrl;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	ctrl = mac[0] | (mac[1] << 8) | (mac[2] << 16) | ((uint32_t)mac[3] << 24);
+	IXGBE_WRITE_REG(hw, IXGBE_LSECTXSCL, ctrl);
+
+	ctrl = mac[4] | (mac[5] << 8);
+	IXGBE_WRITE_REG(hw, IXGBE_LSECTXSCH, ctrl);
+
+	return 0;
+}
+
+int
+rte_pmd_ixgbe_macsec_config_rxsc(uint8_t port, uint8_t *mac, uint16_t pi)
+{
+	struct ixgbe_hw *hw;
+	struct rte_eth_dev *dev;
+	uint32_t ctrl;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	ctrl = mac[0] | (mac[1] << 8) | (mac[2] << 16) | ((uint32_t)mac[3] << 24);
+	IXGBE_WRITE_REG(hw, IXGBE_LSECRXSCL, ctrl);
+
+	pi = rte_cpu_to_be_16(pi);
+	ctrl = mac[4] | (mac[5] << 8) | ((uint32_t)pi << 16);
+	IXGBE_WRITE_REG(hw, IXGBE_LSECRXSCH, ctrl);
+
+	return 0;
+}
+
+int
+rte_pmd_ixgbe_macsec_select_txsa(uint8_t port, uint8_t idx, uint8_t an,
+				 uint32_t pn, uint8_t *key)
+{
+	struct ixgbe_hw *hw;
+	struct rte_eth_dev *dev;
+	uint32_t ctrl, i;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	if (idx != 0 && idx != 1)
+		return -EINVAL;
+
+	if (an >= 4)
+		return -EINVAL;
+
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	/* Set the PN and key */
+	pn = rte_cpu_to_be_32(pn);
+	if (idx == 0) {
+		IXGBE_WRITE_REG(hw, IXGBE_LSECTXPN0, pn);
+
+		for (i = 0; i < 4; i++) {
+			ctrl = (key[i * 4 + 0] <<  0) |
+			       (key[i * 4 + 1] <<  8) |
+			       (key[i * 4 + 2] << 16) |
+			       ((uint32_t)key[i * 4 + 3] << 24);
+			IXGBE_WRITE_REG(hw, IXGBE_LSECTXKEY0(i), ctrl);
+		}
+	} else {
+		IXGBE_WRITE_REG(hw, IXGBE_LSECTXPN1, pn);
+
+		for (i = 0; i < 4; i++) {
+			ctrl = (key[i * 4 + 0] <<  0) |
+			       (key[i * 4 + 1] <<  8) |
+			       (key[i * 4 + 2] << 16) |
+			       ((uint32_t)key[i * 4 + 3] << 24);
+			IXGBE_WRITE_REG(hw, IXGBE_LSECTXKEY1(i), ctrl);
+		}
+	}
+
+	/* Set AN and select the SA */
+	ctrl = (an << idx * 2) | (idx << 4);
+	IXGBE_WRITE_REG(hw, IXGBE_LSECTXSA, ctrl);
+
+	return 0;
+}
+
+int
+rte_pmd_ixgbe_macsec_select_rxsa(uint8_t port, uint8_t idx, uint8_t an,
+				 uint32_t pn, uint8_t *key)
+{
+	struct ixgbe_hw *hw;
+	struct rte_eth_dev *dev;
+	uint32_t ctrl, i;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port, -ENODEV);
+
+	dev = &rte_eth_devices[port];
+	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	if (idx != 0 && idx != 1)
+		return -EINVAL;
+
+	if (an >= 4)
+		return -EINVAL;
+
+	/* Set the PN */
+	pn = rte_cpu_to_be_32(pn);
+	IXGBE_WRITE_REG(hw, IXGBE_LSECRXPN(idx), pn);
+
+	/* Set the key */
+	for (i = 0; i < 4; i++) {
+		ctrl = (key[i * 4 + 0] <<  0) |
+		       (key[i * 4 + 1] <<  8) |
+		       (key[i * 4 + 2] << 16) |
+		       ((uint32_t)key[i * 4 + 3] << 24);
+		IXGBE_WRITE_REG(hw, IXGBE_LSECRXKEY(idx, i), ctrl);
+	}
+
+	/* Set the AN and validate the SA */
+	ctrl = an | (1 << 2);
+	IXGBE_WRITE_REG(hw, IXGBE_LSECRXSA(idx), ctrl);
+
+	return 0;
+}
+
 RTE_PMD_REGISTER_PCI(net_ixgbe, rte_ixgbe_pmd.pci_drv);
 RTE_PMD_REGISTER_PCI_TABLE(net_ixgbe, pci_id_ixgbe_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ixgbe, "* igb_uio | uio_pci_generic | vfio");
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index 80350c2..ffced1c 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -43,6 +43,7 @@
 #define IXGBE_FLAG_NEED_LINK_UPDATE (uint32_t)(1 << 0)
 #define IXGBE_FLAG_MAILBOX          (uint32_t)(1 << 1)
 #define IXGBE_FLAG_PHY_INTERRUPT    (uint32_t)(1 << 2)
+#define IXGBE_FLAG_MACSEC           (uint32_t)(1 << 3)
 
 /*
  * Defines that were not part of ixgbe_type.h as they are not used by the
@@ -130,6 +131,10 @@
 #define IXGBE_MISC_VEC_ID               RTE_INTR_VEC_ZERO_OFFSET
 #define IXGBE_RX_VEC_START              RTE_INTR_VEC_RXTX_OFFSET
 
+#define IXGBE_SECTX_MINSECIFG_MASK      0x0000000F
+
+#define IXGBE_MACSEC_PNTHRSH            0xFFFFFE00
+
 /*
  * Information about the fdir mode.
  */
@@ -265,11 +270,44 @@ struct ixgbe_filter_info {
 };
 
 /*
+ * Statistics counters collected for the MACsec offload
+ */
+struct ixgbe_macsec_stats {
+	/* TX port statistics */
+	uint64_t out_pkts_untagged;
+	uint64_t out_pkts_encrypted;
+	uint64_t out_pkts_protected;
+	uint64_t out_octets_encrypted;
+	uint64_t out_octets_protected;
+
+	/* RX port statistics */
+	uint64_t in_pkts_untagged;
+	uint64_t in_pkts_badtag;
+	uint64_t in_pkts_nosci;
+	uint64_t in_pkts_unknownsci;
+	uint64_t in_octets_decrypted;
+	uint64_t in_octets_validated;
+
+	/* RX SC statistics */
+	uint64_t in_pkts_unchecked;
+	uint64_t in_pkts_delayed;
+	uint64_t in_pkts_late;
+
+	/* RX SA statistics */
+	uint64_t in_pkts_ok;
+	uint64_t in_pkts_invalid;
+	uint64_t in_pkts_notvalid;
+	uint64_t in_pkts_unusedsa;
+	uint64_t in_pkts_notusingsa;
+};
+
+/*
  * Structure to store private data for each driver instance (for each port).
  */
 struct ixgbe_adapter {
 	struct ixgbe_hw             hw;
 	struct ixgbe_hw_stats       stats;
+	struct ixgbe_macsec_stats   macsec_stats;
 	struct ixgbe_hw_fdir_info   fdir;
 	struct ixgbe_interrupt      intr;
 	struct ixgbe_stat_mapping_registers stat_mappings;
@@ -300,6 +338,9 @@ struct ixgbe_adapter {
 #define IXGBE_DEV_PRIVATE_TO_STATS(adapter) \
 	(&((struct ixgbe_adapter *)adapter)->stats)
 
+#define IXGBE_DEV_PRIVATE_TO_MACSEC_STATS(adapter) \
+	(&((struct ixgbe_adapter *)adapter)->macsec_stats)
+
 #define IXGBE_DEV_PRIVATE_TO_INTR(adapter) \
 	(&((struct ixgbe_adapter *)adapter)->intr)
 
@@ -445,4 +486,8 @@ uint32_t ixgbe_convert_vm_rx_mask_to_val(uint16_t rx_mask, uint32_t orig_val);
 
 int ixgbe_fdir_ctrl_func(struct rte_eth_dev *dev,
 			enum rte_filter_op filter_op, void *arg);
+
+int ixgbe_disable_sec_tx_path_generic(struct ixgbe_hw *hw);
+
+int ixgbe_enable_sec_tx_path_generic(struct ixgbe_hw *hw);
 #endif /* _IXGBE_ETHDEV_H_ */
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index b2d9f45..db9fc18 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -79,12 +79,15 @@
 #include "base/ixgbe_common.h"
 #include "ixgbe_rxtx.h"
 
+#include "rte_pmd_ixgbe.h"
+
 /* Bit Mask to indicate what bits required for building TX context */
 #define IXGBE_TX_OFFLOAD_MASK (			 \
 		PKT_TX_VLAN_PKT |		 \
 		PKT_TX_IP_CKSUM |		 \
 		PKT_TX_L4_MASK |		 \
 		PKT_TX_TCP_SEG |		 \
+		PKT_TX_IXGBE_MACSEC |		 \
 		PKT_TX_OUTER_IP_CKSUM)
 
 #if 1
@@ -519,6 +522,8 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 		cmdtype |= IXGBE_ADVTXD_DCMD_TSE;
 	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
 		cmdtype |= (1 << IXGBE_ADVTXD_OUTERIPCS_SHIFT);
+	if (ol_flags & PKT_TX_IXGBE_MACSEC)
+		cmdtype |= IXGBE_ADVTXD_MAC_LINKSEC;
 	return cmdtype;
 }
 
diff --git a/drivers/net/ixgbe/rte_pmd_ixgbe.h b/drivers/net/ixgbe/rte_pmd_ixgbe.h
index c2fb826..0c28ea2 100644
--- a/drivers/net/ixgbe/rte_pmd_ixgbe.h
+++ b/drivers/net/ixgbe/rte_pmd_ixgbe.h
@@ -42,6 +42,28 @@
 #include <rte_ethdev.h>
 
 /**
+ * If these flags are advertised by the PMD, the NIC supports MACsec
+ * offload. Incoming MACsec traffic is offloaded transparently once the
+ * application has configured MACsec offload correctly. To offload MACsec
+ * for a packet on transmit, the application sets the PKT_TX_IXGBE_MACSEC
+ * flag in that packet's mbuf.
+ */
+#define DEV_RX_OFFLOAD_IXGBE_MACSEC_STRIP	DEV_RX_OFFLOAD_RESERVED_0
+#define DEV_TX_OFFLOAD_IXGBE_MACSEC_INSERT	DEV_TX_OFFLOAD_RESERVED_0
+
+/**
+ * This event occurs when the PN counter of a MACsec connection
+ * reaches the exhaustion threshold.
+ */
+#define RTE_ETH_EVENT_IXGBE_MACSEC		RTE_ETH_EVENT_RESERVED_0
+
+/**
+ * MACsec offload flag. The application must set this flag in an mbuf
+ * to enable MACsec offload for that packet on transmit.
+ */
+#define PKT_TX_IXGBE_MACSEC			PKT_TX_RESERVED_0
+
+/**
  * Set the VF MAC address.
  *
  * @param port
@@ -183,6 +205,106 @@ int
 rte_pmd_ixgbe_set_vf_vlan_stripq(uint8_t port, uint16_t vf, uint8_t on);
 
 /**
+ * Enable MACsec offload.
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param en
+ *    1 - Enable encryption (encrypt and add integrity signature).
+ *    0 - Disable encryption (only add integrity signature).
+ * @param rp
+ *    1 - Enable replay protection.
+ *    0 - Disable replay protection.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ *   - (-ENOTSUP) if hardware doesn't support this feature.
+ */
+int rte_pmd_ixgbe_macsec_enable(uint8_t port, uint8_t en, uint8_t rp);
+
+/**
+ * Disable MACsec offload.
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ *   - (-ENOTSUP) if hardware doesn't support this feature.
+ */
+int rte_pmd_ixgbe_macsec_disable(uint8_t port);
+
+/**
+ * Configure Tx SC (Secure Channel).
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param mac
+ *   The MAC address on the local side.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ */
+int rte_pmd_ixgbe_macsec_config_txsc(uint8_t port, uint8_t *mac);
+
+/**
+ * Configure Rx SC (Secure Channel).
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param mac
+ *   The MAC address on the remote side.
+ * @param pi
+ *   The PI (port identifier) on the remote side.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ */
+int rte_pmd_ixgbe_macsec_config_rxsc(uint8_t port, uint8_t *mac, uint16_t pi);
+
+/**
+ * Enable Tx SA (Secure Association).
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param idx
+ *   The SA to be enabled (0 or 1).
+ * @param an
+ *   The association number on the local side.
+ * @param pn
+ *   The packet number on the local side.
+ * @param key
+ *   The 128-bit (16-byte) key on the local side.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+int rte_pmd_ixgbe_macsec_select_txsa(uint8_t port, uint8_t idx, uint8_t an,
+		uint32_t pn, uint8_t *key);
+
+/**
+ * Enable Rx SA (Secure Association).
+ *
+ * @param port
+ *   The port identifier of the Ethernet device.
+ * @param idx
+ *   The SA to be enabled (0 or 1).
+ * @param an
+ *   The association number on the remote side.
+ * @param pn
+ *   The packet number on the remote side.
+ * @param key
+ *   The 128-bit (16-byte) key on the remote side.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENODEV) if *port* invalid.
+ *   - (-EINVAL) if bad parameter.
+ */
+int rte_pmd_ixgbe_macsec_select_rxsa(uint8_t port, uint8_t idx, uint8_t an,
+		uint32_t pn, uint8_t *key);
+
+/**
  * Response sent back to ixgbe driver from user app after callback
  */
 enum rte_pmd_ixgbe_mb_event_rsp {
diff --git a/drivers/net/ixgbe/rte_pmd_ixgbe_version.map b/drivers/net/ixgbe/rte_pmd_ixgbe_version.map
index 92434f3..6d68934 100644
--- a/drivers/net/ixgbe/rte_pmd_ixgbe_version.map
+++ b/drivers/net/ixgbe/rte_pmd_ixgbe_version.map
@@ -15,3 +15,14 @@ DPDK_16.11 {
 	rte_pmd_ixgbe_set_vf_vlan_insert;
 	rte_pmd_ixgbe_set_vf_vlan_stripq;
 } DPDK_2.0;
+
+DPDK_17.02 {
+	global:
+
+	rte_pmd_ixgbe_macsec_enable;
+	rte_pmd_ixgbe_macsec_disable;
+	rte_pmd_ixgbe_macsec_config_txsc;
+	rte_pmd_ixgbe_macsec_config_rxsc;
+	rte_pmd_ixgbe_macsec_select_txsa;
+	rte_pmd_ixgbe_macsec_select_rxsa;
+} DPDK_16.11;
-- 
2.7.4
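
The two SC helpers in this patch, rte_pmd_ixgbe_macsec_config_txsc() and rte_pmd_ixgbe_macsec_config_rxsc(), store a 6-byte MAC address as a low and a high 32-bit register value with the first byte in the least significant position; the Rx variant also places the byte-swapped port identifier in the upper 16 bits of the high word. The standalone sketch below reproduces that packing so the layout can be checked off-target. `pack_sc_low()`, `pack_sc_high()` and `swap16()` are hypothetical names that do not exist in the driver, and `swap16()` assumes a little-endian host, where it behaves like `rte_cpu_to_be_16()`.

```c
#include <stdint.h>

/* Hypothetical helper: MAC bytes 0..3 as written to the low SC register
 * (IXGBE_LSECTXSCL / IXGBE_LSECRXSCL), byte 0 least significant.
 * The uint32_t casts keep the shifts out of the sign bit of int. */
static uint32_t pack_sc_low(const uint8_t mac[6])
{
	return (uint32_t)mac[0] | ((uint32_t)mac[1] << 8) |
	       ((uint32_t)mac[2] << 16) | ((uint32_t)mac[3] << 24);
}

/* Stand-in for rte_cpu_to_be_16() on a little-endian host (assumption). */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical helper: MAC bytes 4..5 in the low half of the high SC
 * register; for the Rx SC the byte-swapped port identifier fills the
 * upper 16 bits. Passing pi = 0 gives the Tx SC layout. */
static uint32_t pack_sc_high(const uint8_t mac[6], uint16_t pi)
{
	return (uint32_t)mac[4] | ((uint32_t)mac[5] << 8) |
	       ((uint32_t)swap16(pi) << 16);
}
```

With this layout, the MAC 00:1b:21:aa:bb:cc yields 0xaa211b00 for the low register and 0x0000ccbb for the Tx high register; with a port identifier of 1, the Rx high register becomes 0x0100ccbb.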

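rte_pmd_ixgbe_macsec_select_txsa() and rte_pmd_ixgbe_macsec_select_rxsa() split a 16-byte key into four 32-bit words and encode the association number into the SA registers. The sketch below mirrors that encoding for reference. `pack_key()`, `txsa_select()` and `rxsa_value()` are hypothetical helpers, and the meaning of each bit is inferred from the driver code and its comments ("Set AN and select the SA", "Set the AN and validate the SA") rather than from a datasheet.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical helper: split a 128-bit key into the four words written to
 * IXGBE_LSECTXKEY0/1(i) or IXGBE_LSECRXKEY(idx, i), with key[i * 4] in the
 * least significant byte of word i. */
static void pack_key(const uint8_t key[16], uint32_t words[4])
{
	for (unsigned int i = 0; i < 4; i++)
		words[i] = (uint32_t)key[i * 4 + 0] |
			   ((uint32_t)key[i * 4 + 1] << 8) |
			   ((uint32_t)key[i * 4 + 2] << 16) |
			   ((uint32_t)key[i * 4 + 3] << 24);
}

/* Hypothetical helper: value written to IXGBE_LSECTXSA. The AN goes into
 * the 2-bit field for SA idx, and bit 4 selects SA idx. */
static uint32_t txsa_select(uint8_t idx, uint8_t an)
{
	/* Same range checks as the driver, which returns -EINVAL instead. */
	assert(idx <= 1 && an < 4);
	return ((uint32_t)an << (idx * 2)) | ((uint32_t)idx << 4);
}

/* Hypothetical helper: value written to IXGBE_LSECRXSA(idx): the AN in
 * bits 1:0 plus bit 2, which the driver comment calls validating the SA. */
static uint32_t rxsa_value(uint8_t an)
{
	assert(an < 4);
	return (uint32_t)an | (1u << 2);
}
```

For example, txsa_select(1, 3) yields 0x1c: AN 3 in bits 3:2 and bit 4 set to select SA 1, while rxsa_value(1) yields 0x5.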