* [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5)
@ 2015-10-05 17:52 Adrien Mazarguil
From: Adrien Mazarguil @ 2015-10-05 17:52 UTC
To: dev
This PMD adds basic support for the Mellanox ConnectX-4 (mlx5) families of
10/25/40/50/100 Gb/s adapters through the Verbs framework.
Its design is very similar to that of mlx4, from which most of its code is
borrowed, except that the code is split across several files instead of
being kept in a single huge one.
It is disabled by default because it depends on libibverbs.
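For reference, enabling the driver should only require flipping the option
this series adds to the build configuration. The fragment below is a sketch
based on the defaults introduced in patch 01; it assumes libibverbs and its
headers are already installed on the build host.

```
# config/common_linuxapp (config/common_bsdapp carries the same options)
CONFIG_RTE_LIBRTE_MLX5_PMD=y
CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
```

Setting CONFIG_RTE_LIBRTE_MLX5_DEBUG=y (or passing DEBUG=1 to make) builds
the driver with -pedantic and keeps assertions enabled.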
Adrien Mazarguil (13):
mlx5: new poll-mode driver for Mellanox ConnectX-4 adapters
mlx5: add non-scattered TX and RX support
mlx5: add MAC handling
mlx5: add device configure/start/stop
mlx5: add support for scattered RX and TX buffers
mlx5: add MTU configuration support
mlx5: add software counters and related callbacks
mlx5: add promiscuous and allmulticast RX modes
mlx5: add link update device operation
mlx5: add flow control device operations
mlx5: add VLAN filtering
mlx5: add checksum offloading support
doc: add mlx5 documentation and release notes for version 2.2
MAINTAINERS | 4 +
config/common_bsdapp | 9 +
config/common_linuxapp | 9 +
doc/guides/nics/mlx5.rst | 308 +++++++++
doc/guides/rel_notes/release_2_2.rst | 8 +
drivers/net/Makefile | 1 +
drivers/net/mlx5/Makefile | 128 ++++
drivers/net/mlx5/mlx5.c | 553 +++++++++++++++
drivers/net/mlx5/mlx5.h | 215 ++++++
drivers/net/mlx5/mlx5_defs.h | 85 +++
drivers/net/mlx5/mlx5_ethdev.c | 843 +++++++++++++++++++++++
drivers/net/mlx5/mlx5_mac.c | 497 ++++++++++++++
drivers/net/mlx5/mlx5_rxmode.c | 327 +++++++++
drivers/net/mlx5/mlx5_rxq.c | 1067 +++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.c | 1008 +++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 197 ++++++
drivers/net/mlx5/mlx5_stats.c | 144 ++++
drivers/net/mlx5/mlx5_trigger.c | 153 +++++
drivers/net/mlx5/mlx5_txq.c | 513 ++++++++++++++
drivers/net/mlx5/mlx5_utils.h | 166 +++++
drivers/net/mlx5/mlx5_vlan.c | 166 +++++
drivers/net/mlx5/rte_pmd_mlx5_version.map | 3 +
mk/rte.app.mk | 5 +
23 files changed, 6409 insertions(+)
create mode 100644 doc/guides/nics/mlx5.rst
create mode 100644 drivers/net/mlx5/Makefile
create mode 100644 drivers/net/mlx5/mlx5.c
create mode 100644 drivers/net/mlx5/mlx5.h
create mode 100644 drivers/net/mlx5/mlx5_defs.h
create mode 100644 drivers/net/mlx5/mlx5_ethdev.c
create mode 100644 drivers/net/mlx5/mlx5_mac.c
create mode 100644 drivers/net/mlx5/mlx5_rxmode.c
create mode 100644 drivers/net/mlx5/mlx5_rxq.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.h
create mode 100644 drivers/net/mlx5/mlx5_stats.c
create mode 100644 drivers/net/mlx5/mlx5_trigger.c
create mode 100644 drivers/net/mlx5/mlx5_txq.c
create mode 100644 drivers/net/mlx5/mlx5_utils.h
create mode 100644 drivers/net/mlx5/mlx5_vlan.c
create mode 100644 drivers/net/mlx5/rte_pmd_mlx5_version.map
--
2.1.0
* [PATCH 01/13] mlx5: new poll-mode driver for Mellanox ConnectX-4 adapters
@ 2015-10-05 17:52 ` Adrien Mazarguil
From: Adrien Mazarguil @ 2015-10-05 17:52 UTC
To: dev
In its current state, this driver implements the bare minimum needed to
initialize itself and Mellanox ConnectX-4 adapters, and does nothing else
(no RX/TX for instance). It is based on the mlx4 driver and is disabled by
default because it depends on libibverbs.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>
---
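Note for reviewers: the per-adapter slot lookup done by mlx5_dev_idx() in
this patch may be easier to follow in isolation. The block below is a
minimal standalone sketch of the same logic, not the driver code itself.
struct pci_addr, struct dev_slot and slot_lookup() are hypothetical
stand-ins for struct rte_pci_addr, the mlx5_dev[] table and mlx5_dev_idx().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_pci_addr. */
struct pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Hypothetical stand-in for one mlx5_dev[] entry. */
struct dev_slot {
	struct pci_addr pci_addr; /* Associated PCI address. */
	uint32_t ports; /* Physical ports bit-field, 0 when unused. */
};

/*
 * Return the index of the entry whose PCI address matches addr.
 * Otherwise return the first entry with no ports set (a free slot),
 * or -1 when the table is full.
 */
static int
slot_lookup(const struct dev_slot *table, unsigned int n,
	    const struct pci_addr *addr)
{
	unsigned int i;
	int ret = -1;

	assert(table != NULL);
	assert(addr != NULL);
	for (i = 0; i != n; ++i) {
		if ((table[i].pci_addr.domain == addr->domain) &&
		    (table[i].pci_addr.bus == addr->bus) &&
		    (table[i].pci_addr.devid == addr->devid) &&
		    (table[i].pci_addr.function == addr->function))
			return (int)i;
		/* Remember the first unused entry as a fallback. */
		if ((table[i].ports == 0) && (ret == -1))
			ret = (int)i;
	}
	return ret;
}
```

As in the patch, an address match takes precedence over a free slot, so
probing the same PCI device twice yields the same index. The caller treats
-1 as "no more adapters can be supported".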
MAINTAINERS | 4 +
config/common_bsdapp | 6 +
config/common_linuxapp | 6 +
drivers/net/Makefile | 1 +
drivers/net/mlx5/Makefile | 109 +++++++
drivers/net/mlx5/mlx5.c | 496 ++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5.h | 154 ++++++++++
drivers/net/mlx5/mlx5_defs.h | 53 ++++
drivers/net/mlx5/mlx5_ethdev.c | 420 +++++++++++++++++++++++++
drivers/net/mlx5/mlx5_mac.c | 150 +++++++++
drivers/net/mlx5/mlx5_utils.h | 149 +++++++++
drivers/net/mlx5/rte_pmd_mlx5_version.map | 3 +
mk/rte.app.mk | 5 +
13 files changed, 1556 insertions(+)
create mode 100644 drivers/net/mlx5/Makefile
create mode 100644 drivers/net/mlx5/mlx5.c
create mode 100644 drivers/net/mlx5/mlx5.h
create mode 100644 drivers/net/mlx5/mlx5_defs.h
create mode 100644 drivers/net/mlx5/mlx5_ethdev.c
create mode 100644 drivers/net/mlx5/mlx5_mac.c
create mode 100644 drivers/net/mlx5/mlx5_utils.h
create mode 100644 drivers/net/mlx5/rte_pmd_mlx5_version.map
diff --git a/MAINTAINERS b/MAINTAINERS
index 080a8e8..9d11055 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -255,6 +255,10 @@ M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
F: drivers/net/mlx4/
F: doc/guides/nics/mlx4.rst
+Mellanox mlx5
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: drivers/net/mlx5/
+
RedHat virtio
M: Huawei Xie <huawei.xie@intel.com>
M: Changchun Ouyang <changchun.ouyang@intel.com>
diff --git a/config/common_bsdapp b/config/common_bsdapp
index b37dcf4..1e6885f 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -214,6 +214,12 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
+# Compile burst-oriented Mellanox ConnectX-4 (MLX5) PMD
+#
+CONFIG_RTE_LIBRTE_MLX5_PMD=n
+CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+
+#
# Compile burst-oriented Broadcom PMD driver
#
CONFIG_RTE_LIBRTE_BNX2X_PMD=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7da7ba7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -212,6 +212,12 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
+# Compile burst-oriented Mellanox ConnectX-4 (MLX5) PMD
+#
+CONFIG_RTE_LIBRTE_MLX5_PMD=n
+CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+
+#
# Compile burst-oriented Broadcom PMD driver
#
CONFIG_RTE_LIBRTE_BNX2X_PMD=n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..6da1ce2 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
+DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
new file mode 100644
index 0000000..6e63073
--- /dev/null
+++ b/drivers/net/mlx5/Makefile
@@ -0,0 +1,109 @@
+# BSD LICENSE
+#
+# Copyright 2015 6WIND S.A.
+# Copyright 2015 Mellanox.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS)$(CONFIG_RTE_BUILD_SHARED_LIB),yy)
+all:
+ @echo 'MLX5: Not supported in a combined shared library'
+ @false
+endif
+
+# Library name.
+LIB = librte_pmd_mlx5.a
+
+# Sources.
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
+
+# Dependencies.
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_mempool
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=gnu99 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I.
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -libverbs
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual
+
+EXPORT_MAP := rte_pmd_mlx5_version.map
+LIBABIVER := 1
+
+# DEBUG which is usually provided on the command-line may enable
+# CONFIG_RTE_LIBRTE_MLX5_DEBUG.
+ifeq ($(DEBUG),1)
+CONFIG_RTE_LIBRTE_MLX5_DEBUG := y
+endif
+
+# User-defined CFLAGS.
+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DEBUG),y)
+CFLAGS += -pedantic -UNDEBUG -DPEDANTIC
+else
+CFLAGS += -DNDEBUG -UPEDANTIC
+endif
+
+include $(RTE_SDK)/mk/rte.lib.mk
+
+# Generate and clean-up mlx5_autoconf.h.
+
+export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
+export AUTO_CONFIG_CFLAGS = -Wno-error
+
+ifndef V
+AUTOCONF_OUTPUT := >/dev/null
+endif
+
+mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
+ $Q $(RM) -f -- '$@'
+ $Q sh -- '$<' '$@' \
+ RSS_SUPPORT \
+ infiniband/verbs.h \
+ enum IBV_EXP_DEVICE_UD_RSS $(AUTOCONF_OUTPUT)
+ $Q sh -- '$<' '$@' \
+ HAVE_EXP_QUERY_DEVICE \
+ infiniband/verbs.h \
+ type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+
+mlx5.o: mlx5_autoconf.h
+
+clean_mlx5: FORCE
+ $Q rm -f -- mlx5_autoconf.h
+
+clean: clean_mlx5
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
new file mode 100644
index 0000000..f6fe8a6
--- /dev/null
+++ b/drivers/net/mlx5/mlx5.c
@@ -0,0 +1,496 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <unistd.h>
+#include <string.h>
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <net/if.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+#include "mlx5_autoconf.h"
+
+/**
+ * DPDK callback to close the device.
+ *
+ * Destroy all queues and objects, free memory.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+static void
+mlx5_dev_close(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+
+ priv_lock(priv);
+ DEBUG("%p: closing device \"%s\"",
+ (void *)dev,
+ ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+ if (priv->pd != NULL) {
+ assert(priv->ctx != NULL);
+ claim_zero(ibv_dealloc_pd(priv->pd));
+ claim_zero(ibv_close_device(priv->ctx));
+ } else
+ assert(priv->ctx == NULL);
+ priv_unlock(priv);
+ memset(priv, 0, sizeof(*priv));
+}
+
+static const struct eth_dev_ops mlx5_dev_ops = {
+ .dev_close = mlx5_dev_close,
+};
+
+static struct {
+ struct rte_pci_addr pci_addr; /* associated PCI address */
+ uint32_t ports; /* physical ports bitfield. */
+} mlx5_dev[32];
+
+/**
+ * Get device index in mlx5_dev[] from PCI bus address.
+ *
+ * @param[in] pci_addr
+ * PCI bus address to look for.
+ *
+ * @return
+ * mlx5_dev[] index on success, -1 on failure.
+ */
+static int
+mlx5_dev_idx(struct rte_pci_addr *pci_addr)
+{
+ unsigned int i;
+ int ret = -1;
+
+ assert(pci_addr != NULL);
+ for (i = 0; (i != RTE_DIM(mlx5_dev)); ++i) {
+ if ((mlx5_dev[i].pci_addr.domain == pci_addr->domain) &&
+ (mlx5_dev[i].pci_addr.bus == pci_addr->bus) &&
+ (mlx5_dev[i].pci_addr.devid == pci_addr->devid) &&
+ (mlx5_dev[i].pci_addr.function == pci_addr->function))
+ return i;
+ if ((mlx5_dev[i].ports == 0) && (ret == -1))
+ ret = i;
+ }
+ return ret;
+}
+
+static struct eth_driver mlx5_driver;
+
+/**
+ * DPDK callback to register a PCI device.
+ *
+ * This function creates an Ethernet device for each port of a given
+ * PCI device.
+ *
+ * @param[in] pci_drv
+ * PCI driver structure (mlx5_driver).
+ * @param[in] pci_dev
+ * PCI device information.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+static int
+mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+ struct ibv_device **list;
+ struct ibv_device *ibv_dev;
+ int err = 0;
+ struct ibv_context *attr_ctx = NULL;
+ struct ibv_device_attr device_attr;
+ unsigned int vf;
+ int idx;
+ int i;
+
+ (void)pci_drv;
+ assert(pci_drv == &mlx5_driver.pci_drv);
+ /* Get mlx5_dev[] index. */
+ idx = mlx5_dev_idx(&pci_dev->addr);
+ if (idx == -1) {
+ ERROR("this driver cannot support any more adapters");
+ return -ENOMEM;
+ }
+ DEBUG("using driver device index %d", idx);
+
+ /* Save PCI address. */
+ mlx5_dev[idx].pci_addr = pci_dev->addr;
+ list = ibv_get_device_list(&i);
+ if (list == NULL) {
+ assert(errno);
+ if (errno == ENOSYS) {
+ WARN("cannot list devices, is ib_uverbs loaded?");
+ return 0;
+ }
+ return -errno;
+ }
+ assert(i >= 0);
+ /*
+ * For each listed device, check related sysfs entry against
+ * the provided PCI ID.
+ */
+ while (i != 0) {
+ struct rte_pci_addr pci_addr;
+
+ --i;
+ DEBUG("checking device \"%s\"", list[i]->name);
+ if (mlx5_ibv_device_to_pci_addr(list[i], &pci_addr))
+ continue;
+ if ((pci_dev->addr.domain != pci_addr.domain) ||
+ (pci_dev->addr.bus != pci_addr.bus) ||
+ (pci_dev->addr.devid != pci_addr.devid) ||
+ (pci_dev->addr.function != pci_addr.function))
+ continue;
+ vf = ((pci_dev->id.device_id ==
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4VF) ||
+ (pci_dev->id.device_id ==
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF));
+ INFO("PCI information matches, using device \"%s\" (VF: %s)",
+ list[i]->name, (vf ? "true" : "false"));
+ attr_ctx = ibv_open_device(list[i]);
+ err = errno;
+ break;
+ }
+ if (attr_ctx == NULL) {
+ ibv_free_device_list(list);
+ switch (err) {
+ case 0:
+ WARN("cannot access device, is mlx5_ib loaded?");
+ return 0;
+ case EINVAL:
+ WARN("cannot use device, are drivers up to date?");
+ return 0;
+ }
+ assert(err > 0);
+ return -err;
+ }
+ ibv_dev = list[i];
+
+ DEBUG("device opened");
+ if (ibv_query_device(attr_ctx, &device_attr))
+ goto error;
+ INFO("%u port(s) detected", device_attr.phys_port_cnt);
+
+ for (i = 0; i < device_attr.phys_port_cnt; i++) {
+ uint32_t port = i + 1; /* ports are indexed from one */
+ uint32_t test = (1 << i);
+ struct ibv_context *ctx = NULL;
+ struct ibv_port_attr port_attr;
+ struct ibv_pd *pd = NULL;
+ struct priv *priv = NULL;
+ struct rte_eth_dev *eth_dev;
+#ifdef HAVE_EXP_QUERY_DEVICE
+ struct ibv_exp_device_attr exp_device_attr;
+#endif /* HAVE_EXP_QUERY_DEVICE */
+ struct ether_addr mac;
+
+#ifdef HAVE_EXP_QUERY_DEVICE
+ exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
+#ifdef RSS_SUPPORT
+ exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
+#endif /* RSS_SUPPORT */
+#endif /* HAVE_EXP_QUERY_DEVICE */
+
+ DEBUG("using port %u (%08" PRIx32 ")", port, test);
+
+ ctx = ibv_open_device(ibv_dev);
+ if (ctx == NULL)
+ goto port_error;
+
+ /* Check port status. */
+ err = ibv_query_port(ctx, port, &port_attr);
+ if (err) {
+ ERROR("port query failed: %s", strerror(err));
+ goto port_error;
+ }
+ if (port_attr.state != IBV_PORT_ACTIVE)
+ WARN("bad state for port %d: \"%s\" (%d)",
+ port, ibv_port_state_str(port_attr.state),
+ port_attr.state);
+
+ /* Allocate protection domain. */
+ pd = ibv_alloc_pd(ctx);
+ if (pd == NULL) {
+ ERROR("PD allocation failure");
+ err = ENOMEM;
+ goto port_error;
+ }
+
+ mlx5_dev[idx].ports |= test;
+
+ /* from rte_ethdev.c */
+ priv = rte_zmalloc("ethdev private structure",
+ sizeof(*priv),
+ RTE_CACHE_LINE_SIZE);
+ if (priv == NULL) {
+ ERROR("priv allocation failure");
+ err = ENOMEM;
+ goto port_error;
+ }
+
+ priv->ctx = ctx;
+ priv->device_attr = device_attr;
+ priv->port = port;
+ priv->pd = pd;
+ priv->mtu = ETHER_MTU;
+#ifdef HAVE_EXP_QUERY_DEVICE
+ if (ibv_exp_query_device(ctx, &exp_device_attr)) {
+ ERROR("ibv_exp_query_device() failed");
+ goto port_error;
+ }
+#ifdef RSS_SUPPORT
+ if ((exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_QPG) &&
+ (exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_UD_RSS) &&
+ (exp_device_attr.comp_mask &
+ IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ) &&
+ (exp_device_attr.max_rss_tbl_sz > 0)) {
+ priv->hw_qpg = 1;
+ priv->hw_rss = 1;
+ priv->max_rss_tbl_sz = exp_device_attr.max_rss_tbl_sz;
+ } else {
+ priv->hw_qpg = 0;
+ priv->hw_rss = 0;
+ priv->max_rss_tbl_sz = 0;
+ }
+ priv->hw_tss = !!(exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_UD_TSS);
+ DEBUG("device flags: %s%s%s",
+ (priv->hw_qpg ? "IBV_DEVICE_QPG " : ""),
+ (priv->hw_tss ? "IBV_DEVICE_TSS " : ""),
+ (priv->hw_rss ? "IBV_DEVICE_RSS " : ""));
+ if (priv->hw_rss)
+ DEBUG("maximum RSS indirection table size: %u",
+ exp_device_attr.max_rss_tbl_sz);
+#endif /* RSS_SUPPORT */
+
+ priv->hw_csum =
+ ((exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_RX_CSUM_TCP_UDP_PKT) &&
+ (exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_RX_CSUM_IP_PKT));
+ DEBUG("checksum offloading is %ssupported",
+ (priv->hw_csum ? "" : "not "));
+
+ priv->hw_csum_l2tun = !!(exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_VXLAN_SUPPORT);
+ DEBUG("L2 tunnel checksum offloads are %ssupported",
+ (priv->hw_csum_l2tun ? "" : "not "));
+
+#endif /* HAVE_EXP_QUERY_DEVICE */
+
+ priv->vf = vf;
+ /* Configure the first MAC address by default. */
+ if (priv_get_mac(priv, &mac.addr_bytes)) {
+ ERROR("cannot get MAC address, is mlx5_en loaded?"
+ " (errno: %s)", strerror(errno));
+ goto port_error;
+ }
+ INFO("port %u MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
+ priv->port,
+ mac.addr_bytes[0], mac.addr_bytes[1],
+ mac.addr_bytes[2], mac.addr_bytes[3],
+ mac.addr_bytes[4], mac.addr_bytes[5]);
+ /* Register MAC and broadcast addresses. */
+ claim_zero(priv_mac_addr_add(priv, 0,
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ mac.addr_bytes));
+ claim_zero(priv_mac_addr_add(priv, 1,
+ &(const uint8_t [ETHER_ADDR_LEN])
+ { "\xff\xff\xff\xff\xff\xff" }));
+#ifndef NDEBUG
+ {
+ char ifname[IF_NAMESIZE];
+
+ if (priv_get_ifname(priv, &ifname) == 0)
+ DEBUG("port %u ifname is \"%s\"",
+ priv->port, ifname);
+ else
+ DEBUG("port %u ifname is unknown", priv->port);
+ }
+#endif
+ /* Get actual MTU if possible. */
+ priv_get_mtu(priv, &priv->mtu);
+ DEBUG("port %u MTU is %u", priv->port, priv->mtu);
+
+ /* from rte_ethdev.c */
+ {
+ char name[RTE_ETH_NAME_MAX_LEN];
+
+ snprintf(name, sizeof(name), "%s port %u",
+ ibv_get_device_name(ibv_dev), port);
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI);
+ }
+ if (eth_dev == NULL) {
+ ERROR("can not allocate rte ethdev");
+ err = ENOMEM;
+ goto port_error;
+ }
+
+ eth_dev->data->dev_private = priv;
+ eth_dev->pci_dev = pci_dev;
+ eth_dev->driver = &mlx5_driver;
+ eth_dev->data->rx_mbuf_alloc_failed = 0;
+ eth_dev->data->mtu = ETHER_MTU;
+
+ priv->dev = eth_dev;
+ eth_dev->dev_ops = &mlx5_dev_ops;
+ eth_dev->data->mac_addrs = priv->mac;
+
+ /* Bring Ethernet device up. */
+ DEBUG("forcing Ethernet interface up");
+ priv_set_flags(priv, ~IFF_UP, IFF_UP);
+ continue;
+
+port_error:
+ rte_free(priv);
+ if (pd)
+ claim_zero(ibv_dealloc_pd(pd));
+ if (ctx)
+ claim_zero(ibv_close_device(ctx));
+ break;
+ }
+
+ /*
+ * XXX if something went wrong in the loop above, there is a resource
+ * leak (ctx, pd, priv, dpdk ethdev) but we can do nothing about it as
+ * long as the dpdk does not provide a way to deallocate an ethdev and a
+ * way to enumerate the registered ethdevs to free the previous ones.
+ */
+
+ /* no port found, complain */
+ if (!mlx5_dev[idx].ports) {
+ err = ENODEV;
+ goto error;
+ }
+
+error:
+ if (attr_ctx)
+ claim_zero(ibv_close_device(attr_ctx));
+ if (list)
+ ibv_free_device_list(list);
+ assert(err >= 0);
+ return -err;
+}
+
+static const struct rte_pci_id mlx5_pci_id_map[] = {
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4VF,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4LX,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = 0
+ }
+};
+
+static struct eth_driver mlx5_driver = {
+ .pci_drv = {
+ .name = MLX5_DRIVER_NAME,
+ .id_table = mlx5_pci_id_map,
+ .devinit = mlx5_pci_devinit,
+ },
+ .dev_private_size = sizeof(struct priv)
+};
+
+/**
+ * Driver initialization routine.
+ */
+static int
+rte_mlx5_pmd_init(const char *name, const char *args)
+{
+ (void)name;
+ (void)args;
+ /*
+ * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+ * huge pages. Calling ibv_fork_init() during init allows
+ * applications to use fork() safely for purposes other than
+ * using this PMD, which is not supported in forked processes.
+ */
+ setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+ ibv_fork_init();
+ rte_eal_pci_register(&mlx5_driver.pci_drv);
+ return 0;
+}
+
+static struct rte_driver rte_mlx5_driver = {
+ .type = PMD_PDEV,
+ .name = MLX5_DRIVER_NAME,
+ .init = rte_mlx5_pmd_init,
+};
+
+PMD_REGISTER_DRIVER(rte_mlx5_driver)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
new file mode 100644
index 0000000..ef7975d
--- /dev/null
+++ b/drivers/net/mlx5/mlx5.h
@@ -0,0 +1,154 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_H_
+#define RTE_PMD_MLX5_H_
+
+#include <stddef.h>
+#include <stdint.h>
+#include <limits.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <linux/if.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_spinlock.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5_autoconf.h"
+#include "mlx5_defs.h"
+
+enum {
+ PCI_VENDOR_ID_MELLANOX = 0x15b3,
+};
+
+enum {
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
+};
+
+struct priv {
+ struct rte_eth_dev *dev; /* Ethernet device. */
+ struct ibv_context *ctx; /* Verbs context. */
+ struct ibv_device_attr device_attr; /* Device properties. */
+ struct ibv_pd *pd; /* Protection Domain. */
+ /*
+ * MAC addresses array and configuration bit-field.
+ * An extra entry that cannot be modified by the DPDK is reserved
+ * for broadcast frames (destination MAC address ff:ff:ff:ff:ff:ff).
+ */
+ struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES];
+ BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
+ /* VLAN filters. */
+ struct {
+ unsigned int enabled:1; /* If enabled. */
+ unsigned int id:12; /* VLAN ID (0-4095). */
+ } vlan_filter[MLX5_MAX_VLAN_IDS]; /* VLAN filters table. */
+ /* Device properties. */
+ uint16_t mtu; /* Configured MTU. */
+ uint8_t port; /* Physical port number. */
+ unsigned int started:1; /* Device started, flows enabled. */
+ unsigned int promisc:1; /* Device in promiscuous mode. */
+ unsigned int allmulti:1; /* Device receives all multicast packets. */
+ unsigned int hw_qpg:1; /* QP groups are supported. */
+ unsigned int hw_tss:1; /* TSS is supported. */
+ unsigned int hw_rss:1; /* RSS is supported. */
+ unsigned int hw_csum:1; /* Checksum offload is supported. */
+ unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
+ unsigned int rss:1; /* RSS is enabled. */
+ unsigned int vf:1; /* This is a VF device. */
+ unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
+ rte_spinlock_t lock; /* Lock for control functions. */
+};
+
+/**
+ * Lock private structure to protect it from concurrent access in the
+ * control path.
+ *
+ * @param priv
+ * Pointer to private structure.
+ */
+static inline void
+priv_lock(struct priv *priv)
+{
+ rte_spinlock_lock(&priv->lock);
+}
+
+/**
+ * Unlock private structure.
+ *
+ * @param priv
+ * Pointer to private structure.
+ */
+static inline void
+priv_unlock(struct priv *priv)
+{
+ rte_spinlock_unlock(&priv->lock);
+}
+
+/* mlx5_ethdev.c */
+
+int priv_get_ifname(const struct priv *, char (*)[IF_NAMESIZE]);
+int priv_ifreq(const struct priv *, int req, struct ifreq *);
+int priv_get_mtu(struct priv *, uint16_t *);
+int priv_set_flags(struct priv *, unsigned int, unsigned int);
+int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
+ struct rte_pci_addr *);
+
+/* mlx5_mac.c */
+
+int priv_get_mac(struct priv *, uint8_t (*)[ETHER_ADDR_LEN]);
+int priv_mac_addr_add(struct priv *, unsigned int,
+ const uint8_t (*)[ETHER_ADDR_LEN]);
+
+#endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
new file mode 100644
index 0000000..987ddcf
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -0,0 +1,53 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_DEFS_H_
+#define RTE_PMD_MLX5_DEFS_H_
+
+/* Reported driver name. */
+#define MLX5_DRIVER_NAME "librte_pmd_mlx5"
+
+/*
+ * Maximum number of simultaneous MAC addresses supported.
+ *
+ * According to ConnectX's Programmer Reference Manual:
+ * The L2 Address Match is implemented by comparing a MAC/VLAN combination
+ * of 128 MAC addresses and 127 VLAN values, comprising 128x127 possible
+ * L2 addresses.
+ */
+#define MLX5_MAX_MAC_ADDRESSES 128
+
+/* Maximum number of simultaneous VLAN filters supported. See above. */
+#define MLX5_MAX_VLAN_IDS 127
+
+#endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
new file mode 100644
index 0000000..b6c7d7a
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -0,0 +1,420 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <dirent.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <linux/if.h>
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_atomic.h>
+#include <rte_ethdev.h>
+#include <rte_mbuf.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+
+/**
+ * Get interface name from private structure.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param[out] ifname
+ * Interface name output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
+{
+ DIR *dir;
+ struct dirent *dent;
+ unsigned int dev_type = 0;
+ unsigned int dev_port_prev = ~0u;
+ char match[IF_NAMESIZE] = "";
+
+ {
+ MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
+
+ dir = opendir(path);
+ if (dir == NULL)
+ return -1;
+ }
+ while ((dent = readdir(dir)) != NULL) {
+ char *name = dent->d_name;
+ FILE *file;
+ unsigned int dev_port;
+ int r;
+
+ if ((name[0] == '.') &&
+ ((name[1] == '\0') ||
+ ((name[1] == '.') && (name[2] == '\0'))))
+ continue;
+
+ MKSTR(path, "%s/device/net/%s/%s",
+ priv->ctx->device->ibdev_path, name,
+ (dev_type ? "dev_id" : "dev_port"));
+
+ file = fopen(path, "rb");
+ if (file == NULL) {
+ if (errno != ENOENT)
+ continue;
+ /*
+ * Switch to dev_id when dev_port does not exist as
+ * is the case with Linux kernel versions < 3.15.
+ */
+try_dev_id:
+ match[0] = '\0';
+ if (dev_type)
+ break;
+ dev_type = 1;
+ dev_port_prev = ~0u;
+ rewinddir(dir);
+ continue;
+ }
+ r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
+ fclose(file);
+ if (r != 1)
+ continue;
+ /*
+ * Switch to dev_id when dev_port returns the same value for
+ * all ports. May happen when using a MOFED release older than
+ * 3.0 with a Linux kernel >= 3.15.
+ */
+ if (dev_port == dev_port_prev)
+ goto try_dev_id;
+ dev_port_prev = dev_port;
+ if (dev_port == (priv->port - 1u))
+ snprintf(match, sizeof(match), "%s", name);
+ }
+ closedir(dir);
+ if (match[0] == '\0')
+ return -1;
+ strncpy(*ifname, match, sizeof(*ifname));
+ return 0;
+}
+
+/**
+ * Read from sysfs entry.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param[in] entry
+ * Entry name relative to sysfs path.
+ * @param[out] buf
+ * Data output buffer.
+ * @param size
+ * Buffer size.
+ *
+ * @return
+ * Buffer size on success, -1 on failure and errno is set.
+ */
+static int
+priv_sysfs_read(const struct priv *priv, const char *entry,
+ char *buf, size_t size)
+{
+ char ifname[IF_NAMESIZE];
+ FILE *file;
+ int ret;
+ int err;
+
+ if (priv_get_ifname(priv, &ifname))
+ return -1;
+
+ MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+ ifname, entry);
+
+ file = fopen(path, "rb");
+ if (file == NULL)
+ return -1;
+ ret = fread(buf, 1, size, file);
+ err = errno;
+ if (((size_t)ret < size) && (ferror(file)))
+ ret = -1;
+ else
+ ret = size;
+ fclose(file);
+ errno = err;
+ return ret;
+}
+
+/**
+ * Write to sysfs entry.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param[in] entry
+ * Entry name relative to sysfs path.
+ * @param[in] buf
+ * Data buffer.
+ * @param size
+ * Buffer size.
+ *
+ * @return
+ * Buffer size on success, -1 on failure and errno is set.
+ */
+static int
+priv_sysfs_write(const struct priv *priv, const char *entry,
+ char *buf, size_t size)
+{
+ char ifname[IF_NAMESIZE];
+ FILE *file;
+ int ret;
+ int err;
+
+ if (priv_get_ifname(priv, &ifname))
+ return -1;
+
+ MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+ ifname, entry);
+
+ file = fopen(path, "wb");
+ if (file == NULL)
+ return -1;
+ ret = fwrite(buf, 1, size, file);
+ err = errno;
+ if (((size_t)ret < size) || (ferror(file)))
+ ret = -1;
+ else
+ ret = size;
+ fclose(file);
+ errno = err;
+ return ret;
+}
+
+/**
+ * Get unsigned long sysfs property.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param[in] name
+ * Entry name relative to sysfs path.
+ * @param[out] value
+ * Value output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+static int
+priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
+{
+ int ret;
+ unsigned long value_ret;
+ char value_str[32];
+
+ ret = priv_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
+ if (ret == -1) {
+ DEBUG("cannot read %s value from sysfs: %s",
+ name, strerror(errno));
+ return -1;
+ }
+ value_str[ret] = '\0';
+ errno = 0;
+ value_ret = strtoul(value_str, NULL, 0);
+ if (errno) {
+ DEBUG("invalid %s value `%s': %s", name, value_str,
+ strerror(errno));
+ return -1;
+ }
+ *value = value_ret;
+ return 0;
+}
+
+/**
+ * Set unsigned long sysfs property.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param[in] name
+ * Entry name relative to sysfs path.
+ * @param value
+ * Value to set.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+static int
+priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
+{
+ int ret;
+ MKSTR(value_str, "%lu", value);
+
+ ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
+ if (ret == -1) {
+ DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
+ name, value_str, value, strerror(errno));
+ return -1;
+ }
+ return 0;
+}
+
+/**
+ * Perform ifreq ioctl() on associated Ethernet device.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param req
+ * Request number to pass to ioctl().
+ * @param[out] ifr
+ * Interface request structure output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
+{
+ int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+ int ret = -1;
+
+ if (sock == -1)
+ return ret;
+ if (priv_get_ifname(priv, &ifr->ifr_name) == 0)
+ ret = ioctl(sock, req, ifr);
+ close(sock);
+ return ret;
+}
+
+/**
+ * Get device MTU.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param[out] mtu
+ * MTU value output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_get_mtu(struct priv *priv, uint16_t *mtu)
+{
+ unsigned long ulong_mtu;
+
+ if (priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu) == -1)
+ return -1;
+ *mtu = ulong_mtu;
+ return 0;
+}
+
+/**
+ * Set device flags.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param keep
+ * Bitmask for flags that must remain untouched.
+ * @param flags
+ * Bitmask for flags to modify.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
+{
+ unsigned long tmp;
+
+ if (priv_get_sysfs_ulong(priv, "flags", &tmp) == -1)
+ return -1;
+ tmp &= keep;
+ tmp |= flags;
+ return priv_set_sysfs_ulong(priv, "flags", tmp);
+}
+
+/**
+ * Get PCI information from struct ibv_device.
+ *
+ * @param device
+ * Pointer to struct ibv_device.
+ * @param[out] pci_addr
+ * PCI bus address output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+mlx5_ibv_device_to_pci_addr(const struct ibv_device *device,
+ struct rte_pci_addr *pci_addr)
+{
+ FILE *file;
+ char line[32];
+ MKSTR(path, "%s/device/uevent", device->ibdev_path);
+
+ file = fopen(path, "rb");
+ if (file == NULL)
+ return -1;
+ while (fgets(line, sizeof(line), file) == line) {
+ size_t len = strlen(line);
+ int ret;
+
+ /* Truncate long lines. */
+ if (len == (sizeof(line) - 1))
+ while (line[(len - 1)] != '\n') {
+ ret = fgetc(file);
+ if (ret == EOF)
+ break;
+ line[(len - 1)] = ret;
+ }
+ /* Extract information. */
+ if (sscanf(line,
+ "PCI_SLOT_NAME="
+ "%" SCNx16 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
+ &pci_addr->domain,
+ &pci_addr->bus,
+ &pci_addr->devid,
+ &pci_addr->function) == 4) {
+ fclose(file);
+ return 0;
+ }
+ }
+ fclose(file);
+ return -1;
+}
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
new file mode 100644
index 0000000..f7e1cf6
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -0,0 +1,150 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <assert.h>
+#include <stdint.h>
+#include <string.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <netinet/in.h>
+#include <linux/if.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+
+/**
+ * Get MAC address by querying netdevice.
+ *
+ * @param[in] priv
+ * struct priv for the requested device.
+ * @param[out] mac
+ * MAC address output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
+{
+ struct ifreq request;
+
+ if (priv_ifreq(priv, SIOCGIFHWADDR, &request))
+ return -1;
+ memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+ return 0;
+}
+
+/**
+ * Unregister a MAC address.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param mac_index
+ * MAC address index.
+ */
+static void
+priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
+{
+ assert(mac_index < RTE_DIM(priv->mac));
+ if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
+ return;
+ BITFIELD_RESET(priv->mac_configured, mac_index);
+}
+
+/**
+ * Register a MAC address.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param mac_index
+ * MAC address index to use.
+ * @param mac
+ * MAC address to register.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
+ const uint8_t (*mac)[ETHER_ADDR_LEN])
+{
+ unsigned int i;
+
+ assert(mac_index < RTE_DIM(priv->mac));
+ /* First, make sure this address isn't already configured. */
+ for (i = 0; (i != RTE_DIM(priv->mac)); ++i) {
+ /* Skip this index, it's going to be reconfigured. */
+ if (i == mac_index)
+ continue;
+ if (!BITFIELD_ISSET(priv->mac_configured, i))
+ continue;
+ if (memcmp(priv->mac[i].addr_bytes, *mac, sizeof(*mac)))
+ continue;
+ /* Address already configured elsewhere, return with error. */
+ return EADDRINUSE;
+ }
+ if (BITFIELD_ISSET(priv->mac_configured, mac_index))
+ priv_mac_addr_del(priv, mac_index);
+ priv->mac[mac_index] = (struct ether_addr){
+ {
+ (*mac)[0], (*mac)[1], (*mac)[2],
+ (*mac)[3], (*mac)[4], (*mac)[5]
+ }
+ };
+ BITFIELD_SET(priv->mac_configured, mac_index);
+ return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
new file mode 100644
index 0000000..cc6aab6
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -0,0 +1,149 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_UTILS_H_
+#define RTE_PMD_MLX5_UTILS_H_
+
+#include <stddef.h>
+#include <stdio.h>
+#include <limits.h>
+#include <assert.h>
+#include <errno.h>
+
+#include "mlx5_defs.h"
+
+/* Bit-field manipulation. */
+#define BITFIELD_DECLARE(bf, type, size) \
+ type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
+ !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
+#define BITFIELD_DEFINE(bf, type, size) \
+ BITFIELD_DECLARE((bf), type, (size)) = { 0 }
+#define BITFIELD_SET(bf, b) \
+ (assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+ (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
+ ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_RESET(bf, b) \
+ (assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+ (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
+ ~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_ISSET(bf, b) \
+ (assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+ !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
+ ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
+
+/* Save and restore errno around argument evaluation. */
+#define ERRNO_SAFE(x) ((errno = (int []){ errno, ((x), 0) }[0]))
+
+/*
+ * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
+ * manner.
+ */
+#define PMD_DRV_LOG_STRIP(a, b) a
+#define PMD_DRV_LOG_OPAREN (
+#define PMD_DRV_LOG_CPAREN )
+#define PMD_DRV_LOG_COMMA ,
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+ const char *n = s;
+
+ while (*n)
+ if (*(n++) == '/')
+ s = n;
+ return s;
+}
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
+ */
+#ifndef NDEBUG
+
+#define PMD_DRV_LOG___(level, ...) \
+ ERRNO_SAFE(RTE_LOG(level, PMD, __VA_ARGS__))
+#define PMD_DRV_LOG__(level, ...) \
+ PMD_DRV_LOG___(level, "%s:%u: %s(): " __VA_ARGS__)
+#define PMD_DRV_LOG_(level, s, ...) \
+ PMD_DRV_LOG__(level, \
+ s "\n" PMD_DRV_LOG_COMMA \
+ pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
+ __LINE__ PMD_DRV_LOG_COMMA \
+ __func__, \
+ __VA_ARGS__)
+
+#else /* NDEBUG */
+
+#define PMD_DRV_LOG___(level, ...) \
+ ERRNO_SAFE(RTE_LOG(level, PMD, MLX5_DRIVER_NAME ": " __VA_ARGS__))
+#define PMD_DRV_LOG__(level, ...) \
+ PMD_DRV_LOG___(level, __VA_ARGS__)
+#define PMD_DRV_LOG_(level, s, ...) \
+ PMD_DRV_LOG__(level, s "\n", __VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* Generic printf()-like logging macro with automatic line feed. */
+#define PMD_DRV_LOG(level, ...) \
+ PMD_DRV_LOG_(level, \
+ __VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+ PMD_DRV_LOG_CPAREN)
+
+/*
+ * Like assert(), DEBUG() becomes a no-op and claim_zero() does not perform
+ * any check when debugging is disabled.
+ */
+#ifndef NDEBUG
+
+#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+
+#else /* NDEBUG */
+
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
+#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
+#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
+
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+ char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
+ \
+ snprintf(name, sizeof(name), __VA_ARGS__)
+
+#endif /* RTE_PMD_MLX5_UTILS_H_ */
diff --git a/drivers/net/mlx5/rte_pmd_mlx5_version.map b/drivers/net/mlx5/rte_pmd_mlx5_version.map
new file mode 100644
index 0000000..ad607bb
--- /dev/null
+++ b/drivers/net/mlx5/rte_pmd_mlx5_version.map
@@ -0,0 +1,3 @@
+DPDK_2.2 {
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 9e1909e..724efa7 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -104,6 +104,10 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -libverbs
endif # ! CONFIG_RTE_BUILD_SHARED_LIBS
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -libverbs
+endif # ! CONFIG_RTE_BUILD_SHARED_LIB
+
_LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += -lz
_LDLIBS-y += --start-group
@@ -137,6 +141,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += -lrte_pmd_fm10k
_LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += -lrte_pmd_ixgbe
_LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += -lrte_pmd_e1000
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5
_LDLIBS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += -lrte_pmd_mpipe -lgxio
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_RING) += -lrte_pmd_ring
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
--
2.1.0
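
Note on MKSTR() from mlx5_utils.h in the patch above: it calls snprintf()
twice, first with a NULL buffer to measure the formatted length, then to
fill a variable-length stack array of exactly that size. The sketch below
repeats the macro as posted and wraps it in sysfs_entry_path(), a
hypothetical helper that is not part of this patch, to show how the sysfs
paths passed to fopen() in mlx5_ethdev.c are put together. Since the macro
expands to a declaration followed by a statement, it has to be used at
block scope and cannot be the body of an unbraced if.

```c
#include <stdio.h>
#include <string.h>

/* Same shape as MKSTR() in mlx5_utils.h: the first snprintf() only
 * measures, the second one fills a stack buffer of the right size. */
#define MKSTR(name, ...) \
	char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
	\
	snprintf(name, sizeof(name), __VA_ARGS__)

/*
 * Hypothetical helper (not a driver symbol): build
 * "<ibdev_path>/device/net/<ifname>/<entry>" and copy it to out.
 * Returns 0 on success, -1 when out is too small.
 */
static int
sysfs_entry_path(char *out, size_t out_size, const char *ibdev_path,
		 const char *ifname, const char *entry)
{
	MKSTR(path, "%s/device/net/%s/%s", ibdev_path, ifname, entry);

	/* sizeof(path) is evaluated at run time and includes the '\0'. */
	if (sizeof(path) > out_size)
		return -1;
	memcpy(out, path, sizeof(path));
	return 0;
}
```

Called with an interface name such as "eth2" and the entry "mtu" (both
purely illustrative), this fills out with ibdev_path followed by
/device/net/eth2/mtu. The stack array disappears when the enclosing block
ends, which is why the driver only uses such strings locally.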
* [PATCH 02/13] mlx5: add non-scattered TX and RX support
From: Adrien Mazarguil @ 2015-10-05 17:52 UTC
To: dev
RSS implementation with parent/child QPs comes from mlx4 and is temporary.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
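
Note for readers of rxq_alloc_elts() in this patch: each receive WR stores
its own index and the distance between the mbuf header and its data in the
64-bit wr_id field, so the mbuf can later be recovered from sge->addr
alone, and the WRs are chained through their next pointers with the last
one set to NULL. What follows is a minimal sketch of that bookkeeping,
assuming stand-in types instead of the Verbs and DPDK ones. Only the
wr_id_t layout is taken from the patch; sketch_wr, sketch_elt, sketch_buf
and both helper functions are hypothetical names, and _Static_assert is
the C11 spelling of the size check that wr_id_t_check() performs with a
negative array size.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as wr_id_t in mlx5.h. */
typedef union {
	struct {
		uint32_t id;
		uint16_t offset;
	} data;
	uint64_t raw;
} wr_id_t;

/* C11 equivalent of the compile-time check in wr_id_t_check(). */
_Static_assert(sizeof(wr_id_t) == sizeof(uint64_t),
	       "wr_id_t must fit in a 64-bit WR ID");

/* Stand-in for struct ibv_recv_wr (assumption, only two fields). */
struct sketch_wr {
	uint64_t wr_id;
	struct sketch_wr *next;
};

/* Stand-in for struct rxq_elt; addr plays the role of sge.addr. */
struct sketch_elt {
	struct sketch_wr wr;
	uintptr_t addr;
};

/* Stand-in for an mbuf: a header followed by its data area. */
struct sketch_buf {
	char header[64];
	char data[256];
};

/* Store index and header-to-data offset in each WR ID, then chain WRs. */
static void
sketch_fill(struct sketch_elt *elts, struct sketch_buf *bufs, unsigned int n)
{
	unsigned int i;

	for (i = 0; i != n; ++i) {
		wr_id_t id = { .raw = 0 };

		id.data.id = i;
		id.data.offset = (uint16_t)offsetof(struct sketch_buf, data);
		elts[i].wr.wr_id = id.raw;
		elts[i].addr = (uintptr_t)bufs[i].data;
		/* The last WR terminates the list. */
		elts[i].wr.next = (i + 1 != n) ? &elts[i + 1].wr : NULL;
	}
}

/* Recover the buffer header from an element, as the error path does. */
static struct sketch_buf *
sketch_elt_to_buf(const struct sketch_elt *elt)
{
	wr_id_t id = { .raw = elt->wr.wr_id };

	return (struct sketch_buf *)(elt->addr - id.data.offset);
}
```

The patch itself sets every next pointer inside the loop and overwrites
the last one afterwards, which indexes element (i - 1) and therefore
relies on elts_n being nonzero. The sketch terminates the list inside the
loop instead, so it also tolerates n == 0.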
config/common_bsdapp | 3 +
config/common_linuxapp | 3 +
drivers/net/mlx5/Makefile | 15 +
drivers/net/mlx5/mlx5.c | 40 +++
drivers/net/mlx5/mlx5.h | 25 ++
drivers/net/mlx5/mlx5_defs.h | 24 ++
drivers/net/mlx5/mlx5_rxq.c | 682 ++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.c | 495 ++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 161 ++++++++++
drivers/net/mlx5/mlx5_txq.c | 512 +++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_utils.h | 11 +
11 files changed, 1971 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_rxq.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.h
create mode 100644 drivers/net/mlx5/mlx5_txq.c
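
Note on the mlx5_dev_close() changes in this patch: the burst callbacks are
pointed at removed_rx_burst/removed_tx_burst before any queue is freed, so
a burst call issued after close reaches a stub rather than freed memory
(the XXX comments acknowledge that a burst already running can still
race). The ordering can be sketched as follows, assuming stand-in types:
fake_dev, fake_queue, removed_burst and fake_dev_close are illustrative
names rather than driver symbols, plain free() replaces rxq_cleanup() and
rte_free(), and the stub returning 0 packets is an assumption about what
the removed_*_burst functions do.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct rxq (assumption). */
struct fake_queue {
	int dummy;
};

typedef uint16_t (*burst_fn)(void *queue, void **pkts, uint16_t n);

/* Stand-in for the relevant parts of rte_eth_dev and struct priv. */
struct fake_dev {
	burst_fn rx_burst;
	unsigned int rxqs_n;
	struct fake_queue **rxqs;
};

/* Stub installed on close: never touches the queue, reports 0 packets. */
static uint16_t
removed_burst(void *queue, void **pkts, uint16_t n)
{
	(void)queue;
	(void)pkts;
	(void)n;
	return 0;
}

static void
fake_dev_close(struct fake_dev *dev)
{
	unsigned int i;

	/* Redirect the data path first, release resources afterwards. */
	dev->rx_burst = removed_burst;
	if (dev->rxqs == NULL)
		return;
	for (i = 0; i != dev->rxqs_n; ++i) {
		struct fake_queue *q = dev->rxqs[i];

		if (q == NULL)
			continue;
		/* Clear the slot before freeing what it pointed to. */
		dev->rxqs[i] = NULL;
		free(q);
	}
	dev->rxqs_n = 0;
	dev->rxqs = NULL;
}
```

As in the patch, the array of queue pointers is not freed here because it
is assumed to belong to the caller; only the queues it refers to are
released.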
diff --git a/config/common_bsdapp b/config/common_bsdapp
index 1e6885f..3b50ff9 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -218,6 +218,9 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
CONFIG_RTE_LIBRTE_MLX5_PMD=n
CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N=4
+CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
+CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
#
# Compile burst-oriented Broadcom PMD driver
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7da7ba7..eed8fc0 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -216,6 +216,9 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
CONFIG_RTE_LIBRTE_MLX5_PMD=n
CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N=4
+CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
+CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
#
# Compile burst-oriented Broadcom PMD driver
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 6e63073..7b9c57b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -42,6 +42,9 @@ LIB = librte_pmd_mlx5.a
# Sources.
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
@@ -79,6 +82,18 @@ else
CFLAGS += -DNDEBUG -UPEDANTIC
endif
+ifdef CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
+CFLAGS += -DMLX5_PMD_SGE_WR_N=$(CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N)
+endif
+
+ifdef CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE
+CFLAGS += -DMLX5_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE)
+endif
+
+ifdef CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE
+CFLAGS += -DMLX5_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE)
+endif
+
include $(RTE_SDK)/mk/rte.lib.mk
# Generate and clean-up mlx5_autoconf.h.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f6fe8a6..31ce5ec 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -63,6 +63,7 @@
#include "mlx5.h"
#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
#include "mlx5_autoconf.h"
/**
@@ -77,11 +78,46 @@ static void
mlx5_dev_close(struct rte_eth_dev *dev)
{
struct priv *priv = dev->data->dev_private;
+ void *tmp;
+ unsigned int i;
priv_lock(priv);
DEBUG("%p: closing device \"%s\"",
(void *)dev,
((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+ /* Prevent crashes when queues are still in use. */
+ dev->rx_pkt_burst = removed_rx_burst;
+ dev->tx_pkt_burst = removed_tx_burst;
+ if (priv->rxqs != NULL) {
+ /* XXX race condition if mlx5_rx_burst() is still running. */
+ usleep(1000);
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ tmp = (*priv->rxqs)[i];
+ if (tmp == NULL)
+ continue;
+ (*priv->rxqs)[i] = NULL;
+ rxq_cleanup(tmp);
+ rte_free(tmp);
+ }
+ priv->rxqs_n = 0;
+ priv->rxqs = NULL;
+ }
+ if (priv->txqs != NULL) {
+ /* XXX race condition if mlx5_tx_burst() is still running. */
+ usleep(1000);
+ for (i = 0; (i != priv->txqs_n); ++i) {
+ tmp = (*priv->txqs)[i];
+ if (tmp == NULL)
+ continue;
+ (*priv->txqs)[i] = NULL;
+ txq_cleanup(tmp);
+ rte_free(tmp);
+ }
+ priv->txqs_n = 0;
+ priv->txqs = NULL;
+ }
+ if (priv->rss)
+ rxq_cleanup(&priv->rxq_parent);
if (priv->pd != NULL) {
assert(priv->ctx != NULL);
claim_zero(ibv_dealloc_pd(priv->pd));
@@ -94,6 +130,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
static const struct eth_dev_ops mlx5_dev_ops = {
.dev_close = mlx5_dev_close,
+ .rx_queue_setup = mlx5_rx_queue_setup,
+ .tx_queue_setup = mlx5_tx_queue_setup,
+ .rx_queue_release = mlx5_rx_queue_release,
+ .tx_queue_release = mlx5_tx_queue_release,
};
static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ef7975d..7e60e70 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -63,6 +63,7 @@
#endif
#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
#include "mlx5_autoconf.h"
#include "mlx5_defs.h"
@@ -108,9 +109,33 @@ struct priv {
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
+ /* RX/TX queues. */
+ struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
+ unsigned int rxqs_n; /* RX queues array size. */
+ unsigned int txqs_n; /* TX queues array size. */
+ struct rxq *(*rxqs)[]; /* RX queues. */
+ struct txq *(*txqs)[]; /* TX queues. */
rte_spinlock_t lock; /* Lock for control functions. */
};
+/* Work Request ID data type (64 bit). */
+typedef union {
+ struct {
+ uint32_t id;
+ uint16_t offset;
+ } data;
+ uint64_t raw;
+} wr_id_t;
+
+/* Compile-time check. */
+static inline void wr_id_t_check(void)
+{
+ wr_id_t check[1 + (2 * -!(sizeof(wr_id_t) == sizeof(uint64_t)))];
+
+ (void)check;
+ (void)wr_id_t_check;
+}
+
/**
* Lock private structure to protect it from concurrent access in the
* control path.
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 987ddcf..4f13a4e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -50,4 +50,28 @@
/* Maximum number of simultaneous VLAN filters supported. See above. */
#define MLX5_MAX_VLAN_IDS 127
+/* Request a send completion once every 64 sends (possibly fewer). */
+#define MLX5_PMD_TX_PER_COMP_REQ 64
+
+/* Maximum number of Scatter/Gather Elements per Work Request. */
+#ifndef MLX5_PMD_SGE_WR_N
+#define MLX5_PMD_SGE_WR_N 4
+#endif
+
+/* Maximum size for inline data. */
+#ifndef MLX5_PMD_MAX_INLINE
+#define MLX5_PMD_MAX_INLINE 0
+#endif
+
+/*
+ * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
+ * from which buffers are to be transmitted must be mapped by this driver
+ * to its own Memory Region (MR). This is a slow operation.
+ *
+ * This value is always 1 for RX queues.
+ */
+#ifndef MLX5_PMD_TX_MP_CACHE
+#define MLX5_PMD_TX_MP_CACHE 8
+#endif
+
#endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
new file mode 100644
index 0000000..01cc649
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -0,0 +1,682 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <assert.h>
+#include <errno.h>
+#include <string.h>
+#include <stdint.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_utils.h"
+#include "mlx5_defs.h"
+
+/**
+ * Allocate RX queue elements.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param elts_n
+ * Number of elements to allocate.
+ * @param[in] pool
+ * If not NULL, fetch buffers from this array instead of allocating them
+ * with rte_pktmbuf_alloc().
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
+{
+ unsigned int i;
+ struct rxq_elt (*elts)[elts_n] =
+ rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
+ rxq->socket);
+ int ret = 0;
+
+ if (elts == NULL) {
+ ERROR("%p: can't allocate packets array", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ /* For each WR (packet). */
+ for (i = 0; (i != elts_n); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct ibv_recv_wr *wr = &elt->wr;
+ struct ibv_sge *sge = &(*elts)[i].sge;
+ struct rte_mbuf *buf;
+
+ if (pool != NULL) {
+ buf = *(pool++);
+ assert(buf != NULL);
+ rte_pktmbuf_reset(buf);
+ } else
+ buf = rte_pktmbuf_alloc(rxq->mp);
+ if (buf == NULL) {
+ assert(pool == NULL);
+ ERROR("%p: empty mbuf pool", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ /* Configure WR. Work request ID contains its own index in
+ * the elts array and the offset between SGE buffer header and
+ * its data. */
+ WR_ID(wr->wr_id).id = i;
+ WR_ID(wr->wr_id).offset =
+ (((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
+ (uintptr_t)buf);
+ wr->next = &(*elts)[(i + 1)].wr;
+ wr->sg_list = sge;
+ wr->num_sge = 1;
+ /* Headroom is reserved by rte_pktmbuf_alloc(). */
+ assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+ /* Buffer is supposed to be empty. */
+ assert(rte_pktmbuf_data_len(buf) == 0);
+ assert(rte_pktmbuf_pkt_len(buf) == 0);
+ /* sge->addr must be able to store a pointer. */
+ assert(sizeof(sge->addr) >= sizeof(uintptr_t));
+ /* SGE keeps its headroom. */
+ sge->addr = (uintptr_t)
+ ((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
+ sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
+ sge->lkey = rxq->mr->lkey;
+ /* Redundant check for tailroom. */
+ assert(sge->length == rte_pktmbuf_tailroom(buf));
+ /* Make sure elts index and SGE mbuf pointer can be deduced
+ * from WR ID. */
+ if ((WR_ID(wr->wr_id).id != i) ||
+ ((void *)((uintptr_t)sge->addr -
+ WR_ID(wr->wr_id).offset) != buf)) {
+ ERROR("%p: cannot store index and offset in WR ID",
+ (void *)rxq);
+ sge->addr = 0;
+ rte_pktmbuf_free(buf);
+ ret = EOVERFLOW;
+ goto error;
+ }
+ }
+ /* The last WR pointer must be NULL. */
+ (*elts)[(i - 1)].wr.next = NULL;
+ DEBUG("%p: allocated and configured %u single-segment WRs",
+ (void *)rxq, elts_n);
+ rxq->elts_n = elts_n;
+ rxq->elts_head = 0;
+ rxq->elts.no_sp = elts;
+ assert(ret == 0);
+ return 0;
+error:
+ if (elts != NULL) {
+ assert(pool == NULL);
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct rte_mbuf *buf;
+
+ if (elt->sge.addr == 0)
+ continue;
+ assert(WR_ID(elt->wr.wr_id).id == i);
+ buf = (void *)((uintptr_t)elt->sge.addr -
+ WR_ID(elt->wr.wr_id).offset);
+ rte_pktmbuf_free_seg(buf);
+ }
+ rte_free(elts);
+ }
+ DEBUG("%p: failed, freed everything", (void *)rxq);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * Free RX queue elements.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+static void
+rxq_free_elts(struct rxq *rxq)
+{
+ unsigned int i;
+ unsigned int elts_n = rxq->elts_n;
+ struct rxq_elt (*elts)[elts_n] = rxq->elts.no_sp;
+
+ DEBUG("%p: freeing WRs", (void *)rxq);
+ rxq->elts_n = 0;
+ rxq->elts.no_sp = NULL;
+ if (elts == NULL)
+ return;
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct rte_mbuf *buf;
+
+ if (elt->sge.addr == 0)
+ continue;
+ assert(WR_ID(elt->wr.wr_id).id == i);
+ buf = (void *)((uintptr_t)elt->sge.addr -
+ WR_ID(elt->wr.wr_id).offset);
+ rte_pktmbuf_free_seg(buf);
+ }
+ rte_free(elts);
+}
+
+/**
+ * Clean up an RX queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_cleanup(struct rxq *rxq)
+{
+ struct ibv_exp_release_intf_params params;
+
+ DEBUG("cleaning up %p", (void *)rxq);
+ rxq_free_elts(rxq);
+ if (rxq->if_qp != NULL) {
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ assert(rxq->qp != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+ rxq->if_qp,
+ &params));
+ }
+ if (rxq->if_cq != NULL) {
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ assert(rxq->cq != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+ rxq->if_cq,
+ &params));
+ }
+ if (rxq->qp != NULL) {
+ claim_zero(ibv_destroy_qp(rxq->qp));
+ }
+ if (rxq->cq != NULL)
+ claim_zero(ibv_destroy_cq(rxq->cq));
+ if (rxq->rd != NULL) {
+ struct ibv_exp_destroy_res_domain_attr attr = {
+ .comp_mask = 0,
+ };
+
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ claim_zero(ibv_exp_destroy_res_domain(rxq->priv->ctx,
+ rxq->rd,
+ &attr));
+ }
+ if (rxq->mr != NULL)
+ claim_zero(ibv_dereg_mr(rxq->mr));
+ memset(rxq, 0, sizeof(*rxq));
+}
+
+/**
+ * Allocate a Queue Pair.
+ * The QP is a raw packet QP created within resource domain rd.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param cq
+ * Completion queue to associate with QP.
+ * @param desc
+ * Number of descriptors in QP (hint only).
+ *
+ * @return
+ * QP pointer or NULL in case of error.
+ */
+static struct ibv_qp *
+rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
+ struct ibv_exp_res_domain *rd)
+{
+ struct ibv_exp_qp_init_attr attr = {
+ /* CQ to be associated with the send queue. */
+ .send_cq = cq,
+ /* CQ to be associated with the receive queue. */
+ .recv_cq = cq,
+ .cap = {
+ /* Max number of outstanding WRs. */
+ .max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+ priv->device_attr.max_qp_wr :
+ desc),
+ /* Max number of scatter/gather elements in a WR. */
+ .max_recv_sge = ((priv->device_attr.max_sge <
+ MLX5_PMD_SGE_WR_N) ?
+ priv->device_attr.max_sge :
+ MLX5_PMD_SGE_WR_N),
+ },
+ .qp_type = IBV_QPT_RAW_PACKET,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
+ .pd = priv->pd,
+ .res_domain = rd,
+ };
+
+ return ibv_exp_create_qp(priv->ctx, &attr);
+}
+
+#ifdef RSS_SUPPORT
+
+/**
+ * Allocate an RSS Queue Pair.
+ * The QP is a raw packet QP created within resource domain rd.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param cq
+ * Completion queue to associate with QP.
+ * @param desc
+ * Number of descriptors in QP (hint only).
+ * @param parent
+ * If nonzero, create a parent QP, otherwise a child.
+ *
+ * @return
+ * QP pointer or NULL in case of error.
+ */
+static struct ibv_qp *
+rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
+ int parent, struct ibv_exp_res_domain *rd)
+{
+ struct ibv_exp_qp_init_attr attr = {
+ /* CQ to be associated with the send queue. */
+ .send_cq = cq,
+ /* CQ to be associated with the receive queue. */
+ .recv_cq = cq,
+ .cap = {
+ /* Max number of outstanding WRs. */
+ .max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+ priv->device_attr.max_qp_wr :
+ desc),
+ /* Max number of scatter/gather elements in a WR. */
+ .max_recv_sge = ((priv->device_attr.max_sge <
+ MLX5_PMD_SGE_WR_N) ?
+ priv->device_attr.max_sge :
+ MLX5_PMD_SGE_WR_N),
+ },
+ .qp_type = IBV_QPT_RAW_PACKET,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
+ IBV_EXP_QP_INIT_ATTR_QPG),
+ .pd = priv->pd,
+ .res_domain = rd,
+ };
+
+ if (parent) {
+ attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
+ /* TSS isn't necessary. */
+ attr.qpg.parent_attrib.tss_child_count = 0;
+ attr.qpg.parent_attrib.rss_child_count = priv->rxqs_n;
+ DEBUG("initializing parent RSS queue");
+ } else {
+ attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
+ attr.qpg.qpg_parent = priv->rxq_parent.qp;
+ DEBUG("initializing child RSS queue");
+ }
+ return ibv_exp_create_qp(priv->ctx, &attr);
+}
+
+#endif /* RSS_SUPPORT */
+
+/**
+ * Configure an RX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Threshold parameters (currently ignored).
+ * @param mp
+ * Memory pool for buffer allocations.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
+ unsigned int socket, const struct rte_eth_rxconf *conf,
+ struct rte_mempool *mp)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct rxq tmpl = {
+ .priv = priv,
+ .mp = mp,
+ .socket = socket
+ };
+ struct ibv_exp_qp_attr mod;
+ union {
+ struct ibv_exp_query_intf_params params;
+ struct ibv_exp_cq_init_attr cq;
+ struct ibv_exp_res_domain_init_attr rd;
+ } attr;
+ enum ibv_exp_query_intf_status status;
+ struct ibv_recv_wr *bad_wr;
+ struct rte_mbuf *buf;
+ int ret = 0;
+ int parent = (rxq == &priv->rxq_parent);
+
+ (void)conf; /* Thresholds configuration (ignored). */
+ /*
+ * If this is a parent queue, hardware must support RSS and
+ * RSS must be enabled.
+ */
+ assert((!parent) || ((priv->hw_rss) && (priv->rss)));
+ if (parent) {
+ /* Even if unused, ibv_exp_create_cq() requires at least one
+ * descriptor. */
+ desc = 1;
+ goto skip_mr;
+ }
+ if ((desc == 0) || (desc % MLX5_PMD_SGE_WR_N)) {
+ ERROR("%p: invalid number of RX descriptors (must be a"
+ " multiple of %d)", (void *)dev, MLX5_PMD_SGE_WR_N);
+ return EINVAL;
+ }
+ /* Get mbuf length. */
+ buf = rte_pktmbuf_alloc(mp);
+ if (buf == NULL) {
+ ERROR("%p: unable to allocate mbuf", (void *)dev);
+ return ENOMEM;
+ }
+ tmpl.mb_len = buf->buf_len;
+ assert((rte_pktmbuf_headroom(buf) +
+ rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
+ assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
+ rte_pktmbuf_free(buf);
+ /* Use the entire RX mempool as the memory region. */
+ tmpl.mr = ibv_reg_mr(priv->pd,
+ (void *)mp->elt_va_start,
+ (mp->elt_va_end - mp->elt_va_start),
+ (IBV_ACCESS_LOCAL_WRITE |
+ IBV_ACCESS_REMOTE_WRITE));
+ if (tmpl.mr == NULL) {
+ ret = EINVAL;
+ ERROR("%p: MR creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+skip_mr:
+ attr.rd = (struct ibv_exp_res_domain_init_attr){
+ .comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
+ IBV_EXP_RES_DOMAIN_MSG_MODEL),
+ .thread_model = IBV_EXP_THREAD_SINGLE,
+ .msg_model = IBV_EXP_MSG_HIGH_BW,
+ };
+ tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
+ if (tmpl.rd == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: RD creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.cq = (struct ibv_exp_cq_init_attr){
+ .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
+ .res_domain = tmpl.rd,
+ };
+ tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+ if (tmpl.cq == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: CQ creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ DEBUG("priv->device_attr.max_qp_wr is %d",
+ priv->device_attr.max_qp_wr);
+ DEBUG("priv->device_attr.max_sge is %d",
+ priv->device_attr.max_sge);
+#ifdef RSS_SUPPORT
+ if (priv->rss)
+ tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
+ tmpl.rd);
+ else
+#endif /* RSS_SUPPORT */
+ tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+ if (tmpl.qp == NULL) {
+ ret = (errno ? errno : EINVAL);
+ ERROR("%p: QP creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ mod = (struct ibv_exp_qp_attr){
+ /* Move the QP to this state. */
+ .qp_state = IBV_QPS_INIT,
+ /* Primary port number. */
+ .port_num = priv->port
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &mod,
+ (IBV_EXP_QP_STATE |
+#ifdef RSS_SUPPORT
+ (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
+#endif /* RSS_SUPPORT */
+ IBV_EXP_QP_PORT));
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ /* Allocate descriptors for RX queues, except for the RSS parent. */
+ if (parent)
+ goto skip_alloc;
+ ret = rxq_alloc_elts(&tmpl, desc, NULL);
+ if (ret) {
+ ERROR("%p: RXQ allocation failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ ret = ibv_post_recv(tmpl.qp,
+ &(*tmpl.elts.no_sp)[0].wr,
+ &bad_wr);
+ if (ret) {
+ ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+ (void *)dev,
+ (void *)bad_wr,
+ strerror(ret));
+ goto error;
+ }
+skip_alloc:
+ mod = (struct ibv_exp_qp_attr){
+ .qp_state = IBV_QPS_RTR
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ /* Save port ID. */
+ tmpl.port_id = dev->data->port_id;
+ DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_CQ,
+ .obj = tmpl.cq,
+ };
+ tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_cq == NULL) {
+ ret = EINVAL;
+ ERROR("%p: CQ interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_QP_BURST,
+ .obj = tmpl.qp,
+ };
+ tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_qp == NULL) {
+ ret = EINVAL;
+ ERROR("%p: QP interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ /* Clean up rxq in case we're reinitializing it. */
+ DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
+ rxq_cleanup(rxq);
+ *rxq = tmpl;
+ DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
+ assert(ret == 0);
+ return 0;
+error:
+ rxq_cleanup(&tmpl);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * DPDK callback to configure an RX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param idx
+ * RX queue index.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Threshold parameters.
+ * @param mp
+ * Memory pool for buffer allocations.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+ unsigned int socket, const struct rte_eth_rxconf *conf,
+ struct rte_mempool *mp)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct rxq *rxq;
+ int ret;
+
+ priv_lock(priv);
+ DEBUG("%p: configuring queue %u for %u descriptors",
+ (void *)dev, idx, desc);
+ if (idx >= priv->rxqs_n) {
+ ERROR("%p: queue index out of range (%u >= %u)",
+ (void *)dev, idx, priv->rxqs_n);
+ priv_unlock(priv);
+ return -EOVERFLOW;
+ }
+ /* Only dereference the queue array once idx is known to be valid. */
+ rxq = (*priv->rxqs)[idx];
+ if (rxq != NULL) {
+ DEBUG("%p: reusing already allocated queue index %u (%p)",
+ (void *)dev, idx, (void *)rxq);
+ if (priv->started) {
+ priv_unlock(priv);
+ return -EEXIST;
+ }
+ (*priv->rxqs)[idx] = NULL;
+ rxq_cleanup(rxq);
+ } else {
+ rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+ if (rxq == NULL) {
+ ERROR("%p: unable to allocate queue index %u",
+ (void *)dev, idx);
+ priv_unlock(priv);
+ return -ENOMEM;
+ }
+ }
+ ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
+ if (ret)
+ rte_free(rxq);
+ else {
+ DEBUG("%p: adding RX queue %p to list",
+ (void *)dev, (void *)rxq);
+ (*priv->rxqs)[idx] = rxq;
+ /* Update receive callback. */
+ dev->rx_pkt_burst = mlx5_rx_burst;
+ }
+ priv_unlock(priv);
+ return -ret;
+}
+
+/**
+ * DPDK callback to release an RX queue.
+ *
+ * @param dpdk_rxq
+ * Generic RX queue pointer.
+ */
+void
+mlx5_rx_queue_release(void *dpdk_rxq)
+{
+ struct rxq *rxq = (struct rxq *)dpdk_rxq;
+ struct priv *priv;
+ unsigned int i;
+
+ if (rxq == NULL)
+ return;
+ priv = rxq->priv;
+ priv_lock(priv);
+ assert(rxq != &priv->rxq_parent);
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] == rxq) {
+ DEBUG("%p: removing RX queue %p from list",
+ (void *)priv->dev, (void *)rxq);
+ (*priv->rxqs)[i] = NULL;
+ break;
+ }
+ rxq_cleanup(rxq);
+ rte_free(rxq);
+ priv_unlock(priv);
+}
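
Note on the WR ID round trip used by rxq_alloc_elts() and rxq_free_elts():
the element index and the distance between the mbuf header and its data
are stored in the 64-bit wr_id, so the mbuf pointer can be recomputed
from sge.addr alone. The following is a minimal standalone sketch of that
idea, not the driver's code. The union layout, struct buf_sketch and all
three function names are assumptions made for illustration; the real
WR_ID() macro is defined elsewhere in the series and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's WR_ID() layout (assumption). */
union wr_id_sketch {
	uint64_t raw;
	struct {
		uint32_t id;     /* Index in the elements array. */
		uint32_t offset; /* Data address minus header address. */
	} data;
};

/* Hypothetical buffer: a header immediately followed by its data room. */
struct buf_sketch {
	uint32_t len;
	uint8_t room[256];
};

/* Pack the element index and the header-to-data offset into a 64-bit ID. */
static uint64_t
wr_id_pack(uint32_t id, const struct buf_sketch *buf, const uint8_t *data)
{
	union wr_id_sketch w = { .raw = 0 };

	w.data.id = id;
	w.data.offset = (uint32_t)((uintptr_t)data - (uintptr_t)buf);
	return w.raw;
}

/* Recover the buffer header from the data address stored in the SGE. */
static struct buf_sketch *
wr_id_to_buf(uint64_t raw, uintptr_t sge_addr)
{
	union wr_id_sketch w = { .raw = raw };

	return (struct buf_sketch *)(sge_addr - w.data.offset);
}

/* Recover the element index. */
static uint32_t
wr_id_to_index(uint64_t raw)
{
	union wr_id_sketch w = { .raw = raw };

	return w.data.id;
}
```

The sketch assumes the offset always fits in 32 bits. The patch does not
rely on that: rxq_alloc_elts() reads the values back after storing them
and fails with EOVERFLOW if the mbuf pointer cannot be recovered.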
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
new file mode 100644
index 0000000..40bddf0
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -0,0 +1,495 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdlib.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_prefetch.h>
+#include <rte_common.h>
+#include <rte_branch_prediction.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_defs.h"
+
+/**
+ * Manage TX completions.
+ *
+ * When sending a burst, mlx5_tx_burst() posts several WRs.
+ * To improve performance, a completion event is only required once every
+ * MLX5_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
+ * for other WRs, but this information would not be used anyway.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ *
+ * @return
+ * 0 on success, -1 on failure.
+ */
+static int
+txq_complete(struct txq *txq)
+{
+ unsigned int elts_comp = txq->elts_comp;
+ unsigned int elts_tail = txq->elts_tail;
+ const unsigned int elts_n = txq->elts_n;
+ int wcs_n;
+
+ if (unlikely(elts_comp == 0))
+ return 0;
+#ifdef DEBUG_SEND
+ DEBUG("%p: processing %u work requests completions",
+ (void *)txq, elts_comp);
+#endif
+ wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
+ if (unlikely(wcs_n == 0))
+ return 0;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("%p: poll_cnt() failed (wcs_n=%d)",
+ (void *)txq, wcs_n);
+ return -1;
+ }
+ elts_comp -= wcs_n;
+ assert(elts_comp <= txq->elts_comp);
+ /*
+ * Assume WC status is successful as nothing can be done about it
+ * anyway.
+ */
+ elts_tail += wcs_n * txq->elts_comp_cd_init;
+ if (elts_tail >= elts_n)
+ elts_tail -= elts_n;
+ txq->elts_tail = elts_tail;
+ txq->elts_comp = elts_comp;
+ return 0;
+}
+
+/**
+ * Get Memory Region (MR) <-> Memory Pool (MP) association from txq->mp2mr[].
+ * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is full,
+ * remove an entry first.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param[in] mp
+ * Memory Pool for which a Memory Region lkey must be returned.
+ *
+ * @return
+ * mr->lkey on success, (uint32_t)-1 on failure.
+ */
+static uint32_t
+txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
+{
+ unsigned int i;
+ struct ibv_mr *mr;
+
+ for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+ if (unlikely(txq->mp2mr[i].mp == NULL)) {
+ /* Unknown MP, add a new MR for it. */
+ break;
+ }
+ if (txq->mp2mr[i].mp == mp) {
+ assert(txq->mp2mr[i].lkey != (uint32_t)-1);
+ assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);
+ return txq->mp2mr[i].lkey;
+ }
+ }
+ /* Add a new entry, register MR first. */
+ DEBUG("%p: discovered new memory pool %p", (void *)txq, (void *)mp);
+ mr = ibv_reg_mr(txq->priv->pd,
+ (void *)mp->elt_va_start,
+ (mp->elt_va_end - mp->elt_va_start),
+ (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
+ if (unlikely(mr == NULL)) {
+ DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
+ (void *)txq);
+ return (uint32_t)-1;
+ }
+ if (unlikely(i == RTE_DIM(txq->mp2mr))) {
+ /* Table is full, remove oldest entry. */
+ DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
+ (void *)txq);
+ --i;
+ claim_zero(ibv_dereg_mr(txq->mp2mr[0].mr));
+ memmove(&txq->mp2mr[0], &txq->mp2mr[1],
+ (sizeof(txq->mp2mr) - sizeof(txq->mp2mr[0])));
+ }
+ /* Store the new entry. */
+ txq->mp2mr[i].mp = mp;
+ txq->mp2mr[i].mr = mr;
+ txq->mp2mr[i].lkey = mr->lkey;
+ DEBUG("%p: new MR lkey for MP %p: 0x%08" PRIx32,
+ (void *)txq, (void *)mp, txq->mp2mr[i].lkey);
+ return txq->mp2mr[i].lkey;
+}
+
+/**
+ * DPDK callback for TX.
+ *
+ * @param dpdk_txq
+ * Generic pointer to TX queue structure.
+ * @param[in] pkts
+ * Packets to transmit.
+ * @param pkts_n
+ * Number of packets in array.
+ *
+ * @return
+ * Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ struct txq *txq = (struct txq *)dpdk_txq;
+ unsigned int elts_head = txq->elts_head;
+ const unsigned int elts_tail = txq->elts_tail;
+ const unsigned int elts_n = txq->elts_n;
+ unsigned int elts_comp_cd = txq->elts_comp_cd;
+ unsigned int elts_comp = 0;
+ unsigned int i;
+ unsigned int max;
+ int err;
+
+ assert(elts_comp_cd != 0);
+ txq_complete(txq);
+ max = (elts_n - (elts_head - elts_tail));
+ if (max > elts_n)
+ max -= elts_n;
+ assert(max >= 1);
+ assert(max <= elts_n);
+ /* Always leave one free entry in the ring. */
+ --max;
+ if (max == 0)
+ return 0;
+ if (max > pkts_n)
+ max = pkts_n;
+ for (i = 0; (i != max); ++i) {
+ struct rte_mbuf *buf = pkts[i];
+ unsigned int elts_head_next =
+ (((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
+ struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
+ struct txq_elt *elt = &(*txq->elts)[elts_head];
+ unsigned int segs = NB_SEGS(buf);
+ uint32_t send_flags = 0;
+
+ /* Clean up old buffer. */
+ if (likely(elt->buf != NULL)) {
+ struct rte_mbuf *tmp = elt->buf;
+
+ /* Faster than rte_pktmbuf_free(). */
+ do {
+ struct rte_mbuf *next = NEXT(tmp);
+
+ rte_pktmbuf_free_seg(tmp);
+ tmp = next;
+ } while (tmp != NULL);
+ }
+ /* Request TX completion. */
+ if (unlikely(--elts_comp_cd == 0)) {
+ elts_comp_cd = txq->elts_comp_cd_init;
+ ++elts_comp;
+ send_flags |= IBV_EXP_QP_BURST_SIGNALED;
+ }
+ if (likely(segs == 1)) {
+ uintptr_t addr;
+ uint32_t length;
+ uint32_t lkey;
+
+ /* Retrieve buffer information. */
+ addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ length = DATA_LEN(buf);
+ /* Retrieve Memory Region key for this memory pool. */
+ lkey = txq_mp2mr(txq, buf->pool);
+ if (unlikely(lkey == (uint32_t)-1)) {
+ /* MR does not exist. */
+ DEBUG("%p: unable to get MP <-> MR"
+ " association", (void *)txq);
+ /* Clean up TX element. */
+ elt->buf = NULL;
+ goto stop;
+ }
+ /* Update element. */
+ elt->buf = buf;
+ if (txq->priv->vf)
+ rte_prefetch0((volatile void *)
+ (uintptr_t)addr);
+ RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+ /* Put packet into send queue. */
+#if MLX5_PMD_MAX_INLINE > 0
+ if (length <= txq->max_inline)
+ err = txq->if_qp->send_pending_inline
+ (txq->qp,
+ (void *)addr,
+ length,
+ send_flags);
+ else
+#endif
+ err = txq->if_qp->send_pending
+ (txq->qp,
+ addr,
+ length,
+ lkey,
+ send_flags);
+ if (unlikely(err))
+ goto stop;
+ } else {
+ DEBUG("%p: TX scattered buffers support not"
+ " compiled in", (void *)txq);
+ goto stop;
+ }
+ elts_head = elts_head_next;
+ }
+stop:
+ /* Take a shortcut if nothing must be sent. */
+ if (unlikely(i == 0))
+ return 0;
+ /* Ring QP doorbell. */
+ err = txq->if_qp->send_flush(txq->qp);
+ if (unlikely(err)) {
+ /* A nonzero value is not supposed to be returned.
+ * Nothing can be done about it. */
+ DEBUG("%p: send_flush() failed with error %d",
+ (void *)txq, err);
+ }
+ txq->elts_head = elts_head;
+ txq->elts_comp += elts_comp;
+ txq->elts_comp_cd = elts_comp_cd;
+ return i;
+}
+
+/**
+ * DPDK callback for RX.
+ *
+ * @param dpdk_rxq
+ * Generic pointer to RX queue structure.
+ * @param[out] pkts
+ * Array to store received packets.
+ * @param pkts_n
+ * Maximum number of packets in array.
+ *
+ * @return
+ * Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ struct rxq *rxq = (struct rxq *)dpdk_rxq;
+ struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+ const unsigned int elts_n = rxq->elts_n;
+ unsigned int elts_head = rxq->elts_head;
+ struct ibv_sge sges[pkts_n];
+ unsigned int i;
+ unsigned int pkts_ret = 0;
+ int ret;
+
+ for (i = 0; (i != pkts_n); ++i) {
+ struct rxq_elt *elt = &(*elts)[elts_head];
+ struct ibv_recv_wr *wr = &elt->wr;
+ uint64_t wr_id = wr->wr_id;
+ unsigned int len;
+ struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
+ WR_ID(wr_id).offset);
+ struct rte_mbuf *rep;
+ uint32_t flags;
+
+ /* Sanity checks. */
+ assert(WR_ID(wr_id).id < rxq->elts_n);
+ assert(wr->sg_list == &elt->sge);
+ assert(wr->num_sge == 1);
+ assert(elts_head < rxq->elts_n);
+ assert(rxq->elts_head < rxq->elts_n);
+ ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
+ &flags);
+ if (unlikely(ret < 0)) {
+ struct ibv_wc wc;
+ int wcs_n;
+
+ DEBUG("rxq=%p, poll_length_flags() failed (ret=%d)",
+ (void *)rxq, ret);
+ /* ibv_poll_cq() must be used in case of failure. */
+ wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
+ if (unlikely(wcs_n == 0))
+ break;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
+ (void *)rxq, wcs_n);
+ break;
+ }
+ assert(wcs_n == 1);
+ if (unlikely(wc.status != IBV_WC_SUCCESS)) {
+ /* Whatever, just repost the offending WR. */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
+ " completion status (%d): %s",
+ (void *)rxq, wc.wr_id, wc.status,
+ ibv_wc_status_str(wc.status));
+ /* Add SGE to array for repost. */
+ sges[i] = elt->sge;
+ goto repost;
+ }
+ ret = wc.byte_len;
+ }
+ if (ret == 0)
+ break;
+ len = ret;
+ /*
+ * Fetch initial bytes of packet descriptor into a
+ * cacheline while allocating rep.
+ */
+ rte_prefetch0(seg);
+ rep = __rte_mbuf_raw_alloc(rxq->mp);
+ if (unlikely(rep == NULL)) {
+ /* Unable to allocate a replacement mbuf,
+ * repost WR. */
+ /* Add SGE to array for repost. */
+ sges[i] = elt->sge;
+ DEBUG("rxq=%p, wr_id=%" PRIu32 ":"
+ " can't allocate a new mbuf",
+ (void *)rxq, WR_ID(wr_id).id);
+ /* Increment out of memory counters. */
+ ++rxq->priv->dev->data->rx_mbuf_alloc_failed;
+ goto repost;
+ }
+
+ /* Reconfigure sge to use rep instead of seg. */
+ elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
+ assert(elt->sge.lkey == rxq->mr->lkey);
+ WR_ID(wr->wr_id).offset =
+ (((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
+ (uintptr_t)rep);
+ assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
+
+ /* Add SGE to array for repost. */
+ sges[i] = elt->sge;
+
+ /* Update seg information. */
+ SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
+ NB_SEGS(seg) = 1;
+ PORT(seg) = rxq->port_id;
+ NEXT(seg) = NULL;
+ PKT_LEN(seg) = len;
+ DATA_LEN(seg) = len;
+
+ /* Return packet. */
+ *(pkts++) = seg;
+ ++pkts_ret;
+repost:
+ if (++elts_head >= elts_n)
+ elts_head = 0;
+ continue;
+ }
+ if (unlikely(i == 0))
+ return 0;
+ /* Repost WRs. */
+#ifdef DEBUG_RECV
+ DEBUG("%p: reposting %u WRs", (void *)rxq, i);
+#endif
+ ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
+ if (unlikely(ret)) {
+ /* Inability to repost WRs is fatal. */
+ DEBUG("%p: recv_burst(): failed (ret=%d)",
+ (void *)rxq->priv,
+ ret);
+ abort();
+ }
+ rxq->elts_head = elts_head;
+ return pkts_ret;
+}
+
+/**
+ * Dummy DPDK callback for TX.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_txq
+ * Generic pointer to TX queue structure.
+ * @param[in] pkts
+ * Packets to transmit.
+ * @param pkts_n
+ * Number of packets in array.
+ *
+ * @return
+ * Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+removed_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ (void)dpdk_txq;
+ (void)pkts;
+ (void)pkts_n;
+ return 0;
+}
+
+/**
+ * Dummy DPDK callback for RX.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_rxq
+ * Generic pointer to RX queue structure.
+ * @param[out] pkts
+ * Array to store received packets.
+ * @param pkts_n
+ * Maximum number of packets in array.
+ *
+ * @return
+ * Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ (void)dpdk_rxq;
+ (void)pkts;
+ (void)pkts_n;
+ return 0;
+}
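
Note on the MP to MR table kept by txq_mp2mr(): it is a small array in
insertion order, scanned linearly, where the first empty slot ends the
scan and a full table drops its oldest entry (slot 0) by shifting the
rest down. The following is a minimal standalone sketch of that
lookup-or-insert scheme, not the driver's code. CACHE_N, struct
cache_entry, struct cache and cache_get() are hypothetical names, a plain
integer stands in for the lkey, and the eviction counter stands in for
the ibv_dereg_mr() call on the dropped entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_N 4 /* Hypothetical size, stands in for MLX5_PMD_TX_MP_CACHE. */

struct cache_entry {
	const void *key; /* Stands in for the mempool pointer. */
	uint32_t value;  /* Stands in for mr->lkey. */
};

struct cache {
	struct cache_entry e[CACHE_N]; /* Oldest first, zeroed when empty. */
	unsigned int evicted;          /* Evictions so far (sketch only). */
};

/* Return the value cached for key, inserting (key, fresh) if missing. */
static uint32_t
cache_get(struct cache *c, const void *key, uint32_t fresh)
{
	unsigned int i;

	for (i = 0; i != CACHE_N; ++i) {
		if (c->e[i].key == NULL)
			break; /* First free slot, key is unknown. */
		if (c->e[i].key == key)
			return c->e[i].value;
	}
	if (i == CACHE_N) {
		/* Table is full: release slot 0 (the oldest), then shift. */
		++c->evicted;
		memmove(&c->e[0], &c->e[1], sizeof(c->e) - sizeof(c->e[0]));
		--i;
	}
	/* Store the new entry in the first free (or freed) slot. */
	c->e[i].key = key;
	c->e[i].value = fresh;
	return fresh;
}
```

With a zeroed struct cache, four distinct keys fill the table, a fifth
evicts the first one, and looking the first key up again inserts it anew
at the end. A linear scan is reasonable here on the assumption that a TX
queue sees only a handful of mempools.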
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
new file mode 100644
index 0000000..08a4911
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -0,0 +1,161 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_RXTX_H_
+#define RTE_PMD_MLX5_RXTX_H_
+
+#include <stdint.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5.h"
+#include "mlx5_defs.h"
+
+/* RX element. */
+struct rxq_elt {
+ struct ibv_recv_wr wr; /* Work Request. */
+ struct ibv_sge sge; /* Scatter/Gather Element. */
+ /* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+};
+
+struct priv;
+
+/* RX queue descriptor. */
+struct rxq {
+ struct priv *priv; /* Back pointer to private data. */
+ struct rte_mempool *mp; /* Memory Pool for allocations. */
+ struct ibv_mr *mr; /* Memory Region (for mp). */
+ struct ibv_cq *cq; /* Completion Queue. */
+ struct ibv_qp *qp; /* Queue Pair. */
+ struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+ struct ibv_exp_cq_family *if_cq; /* CQ interface. */
+ /*
+ * Each VLAN ID requires a separate flow steering rule.
+ */
+ BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
+ struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
+ unsigned int port_id; /* Port ID for incoming packets. */
+ unsigned int elts_n; /* (*elts)[] length. */
+ unsigned int elts_head; /* Current index in (*elts)[]. */
+ union {
+ struct rxq_elt (*no_sp)[]; /* RX elements. */
+ } elts;
+ uint32_t mb_len; /* Length of a mp-issued mbuf. */
+ unsigned int socket; /* CPU socket ID for allocations. */
+ struct ibv_exp_res_domain *rd; /* Resource Domain. */
+};
+
+/* TX element. */
+struct txq_elt {
+ struct rte_mbuf *buf;
+};
+
+/* Linear buffer type. It is used when transmitting buffers with too many
+ * segments that do not fit the hardware queue (see max_send_sge).
+ * Extra segments are copied (linearized) in such buffers, replacing the
+ * last SGE during TX.
+ * The size is arbitrary but large enough to hold a jumbo frame with
+ * 8 segments considering mbuf.buf_len is about 2048 bytes. */
+typedef uint8_t linear_t[16384];
+
+/* TX queue descriptor. */
+struct txq {
+ struct priv *priv; /* Back pointer to private data. */
+ struct {
+ struct rte_mempool *mp; /* Cached Memory Pool. */
+ struct ibv_mr *mr; /* Memory Region (for mp). */
+ uint32_t lkey; /* mr->lkey */
+ } mp2mr[MLX5_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
+ struct ibv_cq *cq; /* Completion Queue. */
+ struct ibv_qp *qp; /* Queue Pair. */
+ struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+ struct ibv_exp_cq_family *if_cq; /* CQ interface. */
+#if MLX5_PMD_MAX_INLINE > 0
+ uint32_t max_inline; /* Max inline send size <= MLX5_PMD_MAX_INLINE. */
+#endif
+ unsigned int elts_n; /* (*elts)[] length. */
+ struct txq_elt (*elts)[]; /* TX elements. */
+ unsigned int elts_head; /* Current index in (*elts)[]. */
+ unsigned int elts_tail; /* First element awaiting completion. */
+ unsigned int elts_comp; /* Number of completion requests. */
+ unsigned int elts_comp_cd; /* Countdown for next completion request. */
+ unsigned int elts_comp_cd_init; /* Initial value for countdown. */
+ linear_t (*elts_linear)[]; /* Linearized buffers. */
+ struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
+ unsigned int socket; /* CPU socket ID for allocations. */
+ struct ibv_exp_res_domain *rd; /* Resource Domain. */
+};
+
+/* mlx5_rxq.c */
+
+void rxq_cleanup(struct rxq *);
+int rxq_setup(struct rte_eth_dev *, struct rxq *, uint16_t, unsigned int,
+ const struct rte_eth_rxconf *, struct rte_mempool *);
+int mlx5_rx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
+ const struct rte_eth_rxconf *, struct rte_mempool *);
+void mlx5_rx_queue_release(void *);
+
+/* mlx5_txq.c */
+
+void txq_cleanup(struct txq *);
+int mlx5_tx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
+ const struct rte_eth_txconf *);
+void mlx5_tx_queue_release(void *);
+
+/* mlx5_rxtx.c */
+
+uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t removed_tx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
+
+#endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
new file mode 100644
index 0000000..2bae61f
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -0,0 +1,512 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <assert.h>
+#include <errno.h>
+#include <string.h>
+#include <stdint.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_autoconf.h"
+#include "mlx5_defs.h"
+
+/**
+ * Allocate TX queue elements.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param elts_n
+ * Number of elements to allocate.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+txq_alloc_elts(struct txq *txq, unsigned int elts_n)
+{
+ unsigned int i;
+ struct txq_elt (*elts)[elts_n] =
+ rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
+ linear_t (*elts_linear)[elts_n] =
+ rte_calloc_socket("TXQ", 1, sizeof(*elts_linear), 0,
+ txq->socket);
+ struct ibv_mr *mr_linear = NULL;
+ int ret = 0;
+
+ if ((elts == NULL) || (elts_linear == NULL)) {
+ ERROR("%p: can't allocate packets array", (void *)txq);
+ ret = ENOMEM;
+ goto error;
+ }
+ mr_linear =
+ ibv_reg_mr(txq->priv->pd, elts_linear, sizeof(*elts_linear),
+ (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
+ if (mr_linear == NULL) {
+ ERROR("%p: unable to configure MR, ibv_reg_mr() failed",
+ (void *)txq);
+ ret = EINVAL;
+ goto error;
+ }
+ for (i = 0; (i != elts_n); ++i) {
+ struct txq_elt *elt = &(*elts)[i];
+
+ elt->buf = NULL;
+ }
+ DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
+ txq->elts_n = elts_n;
+ txq->elts = elts;
+ txq->elts_head = 0;
+ txq->elts_tail = 0;
+ txq->elts_comp = 0;
+ /* Request send completion every MLX5_PMD_TX_PER_COMP_REQ packets or
+ * at least 4 times per ring. */
+ txq->elts_comp_cd_init =
+ ((MLX5_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
+ MLX5_PMD_TX_PER_COMP_REQ : (elts_n / 4));
+ txq->elts_comp_cd = txq->elts_comp_cd_init;
+ txq->elts_linear = elts_linear;
+ txq->mr_linear = mr_linear;
+ assert(ret == 0);
+ return 0;
+error:
+ if (mr_linear != NULL)
+ claim_zero(ibv_dereg_mr(mr_linear));
+
+ rte_free(elts_linear);
+ rte_free(elts);
+
+ DEBUG("%p: failed, freed everything", (void *)txq);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * Free TX queue elements.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ */
+static void
+txq_free_elts(struct txq *txq)
+{
+ unsigned int i;
+ unsigned int elts_n = txq->elts_n;
+ struct txq_elt (*elts)[elts_n] = txq->elts;
+ linear_t (*elts_linear)[elts_n] = txq->elts_linear;
+ struct ibv_mr *mr_linear = txq->mr_linear;
+
+ DEBUG("%p: freeing WRs", (void *)txq);
+ txq->elts_n = 0;
+ txq->elts = NULL;
+ txq->elts_linear = NULL;
+ txq->mr_linear = NULL;
+ if (mr_linear != NULL)
+ claim_zero(ibv_dereg_mr(mr_linear));
+
+ rte_free(elts_linear);
+ if (elts == NULL)
+ return;
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct txq_elt *elt = &(*elts)[i];
+
+ if (elt->buf == NULL)
+ continue;
+ rte_pktmbuf_free(elt->buf);
+ }
+ rte_free(elts);
+}
+
+/**
+ * Clean up a TX queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ */
+void
+txq_cleanup(struct txq *txq)
+{
+ struct ibv_exp_release_intf_params params;
+ size_t i;
+
+ DEBUG("cleaning up %p", (void *)txq);
+ txq_free_elts(txq);
+ if (txq->if_qp != NULL) {
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ assert(txq->qp != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+ txq->if_qp,
+ &params));
+ }
+ if (txq->if_cq != NULL) {
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ assert(txq->cq != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+ txq->if_cq,
+ &params));
+ }
+ if (txq->qp != NULL)
+ claim_zero(ibv_destroy_qp(txq->qp));
+ if (txq->cq != NULL)
+ claim_zero(ibv_destroy_cq(txq->cq));
+ if (txq->rd != NULL) {
+ struct ibv_exp_destroy_res_domain_attr attr = {
+ .comp_mask = 0,
+ };
+
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ claim_zero(ibv_exp_destroy_res_domain(txq->priv->ctx,
+ txq->rd,
+ &attr));
+ }
+ for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+ if (txq->mp2mr[i].mp == NULL)
+ break;
+ assert(txq->mp2mr[i].mr != NULL);
+ claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
+ }
+ memset(txq, 0, sizeof(*txq));
+}
+
+/**
+ * Configure a TX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Threshold parameters.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
+ unsigned int socket, const struct rte_eth_txconf *conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct txq tmpl = {
+ .priv = priv,
+ .socket = socket
+ };
+ union {
+ struct ibv_exp_query_intf_params params;
+ struct ibv_exp_qp_init_attr init;
+ struct ibv_exp_res_domain_init_attr rd;
+ struct ibv_exp_cq_init_attr cq;
+ struct ibv_exp_qp_attr mod;
+ } attr;
+ enum ibv_exp_query_intf_status status;
+ int ret = 0;
+
+ (void)conf; /* Thresholds configuration (ignored). */
+ if ((desc == 0) || (desc % MLX5_PMD_SGE_WR_N)) {
+ ERROR("%p: invalid number of TX descriptors (must be a"
+ " multiple of %d)", (void *)dev, MLX5_PMD_SGE_WR_N);
+ return EINVAL;
+ }
+ desc /= MLX5_PMD_SGE_WR_N;
+ /* MRs will be registered in mp2mr[] later. */
+ attr.rd = (struct ibv_exp_res_domain_init_attr){
+ .comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
+ IBV_EXP_RES_DOMAIN_MSG_MODEL),
+ .thread_model = IBV_EXP_THREAD_SINGLE,
+ .msg_model = IBV_EXP_MSG_HIGH_BW,
+ };
+ tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
+ if (tmpl.rd == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: RD creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.cq = (struct ibv_exp_cq_init_attr){
+ .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
+ .res_domain = tmpl.rd,
+ };
+ tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+ if (tmpl.cq == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: CQ creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ DEBUG("priv->device_attr.max_qp_wr is %d",
+ priv->device_attr.max_qp_wr);
+ DEBUG("priv->device_attr.max_sge is %d",
+ priv->device_attr.max_sge);
+ attr.init = (struct ibv_exp_qp_init_attr){
+ /* CQ to be associated with the send queue. */
+ .send_cq = tmpl.cq,
+ /* CQ to be associated with the receive queue. */
+ .recv_cq = tmpl.cq,
+ .cap = {
+ /* Max number of outstanding WRs. */
+ .max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
+ priv->device_attr.max_qp_wr :
+ desc),
+ /* Max number of scatter/gather elements in a WR. */
+ .max_send_sge = ((priv->device_attr.max_sge <
+ MLX5_PMD_SGE_WR_N) ?
+ priv->device_attr.max_sge :
+ MLX5_PMD_SGE_WR_N),
+#if MLX5_PMD_MAX_INLINE > 0
+ .max_inline_data = MLX5_PMD_MAX_INLINE,
+#endif
+ },
+ .qp_type = IBV_QPT_RAW_PACKET,
+ /* Do *NOT* enable this, completion events are managed per
+ * TX burst. */
+ .sq_sig_all = 0,
+ .pd = priv->pd,
+ .res_domain = tmpl.rd,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
+ };
+ tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
+ if (tmpl.qp == NULL) {
+ ret = (errno ? errno : EINVAL);
+ ERROR("%p: QP creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+#if MLX5_PMD_MAX_INLINE > 0
+ /* ibv_exp_create_qp() updates this value. */
+ tmpl.max_inline = attr.init.cap.max_inline_data;
+#endif
+ attr.mod = (struct ibv_exp_qp_attr){
+ /* Move the QP to this state. */
+ .qp_state = IBV_QPS_INIT,
+ /* Primary port number. */
+ .port_num = priv->port
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod,
+ (IBV_EXP_QP_STATE | IBV_EXP_QP_PORT));
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ ret = txq_alloc_elts(&tmpl, desc);
+ if (ret) {
+ ERROR("%p: TXQ allocation failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.mod = (struct ibv_exp_qp_attr){
+ .qp_state = IBV_QPS_RTR
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.mod.qp_state = IBV_QPS_RTS;
+ ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_CQ,
+ .obj = tmpl.cq,
+ };
+ tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_cq == NULL) {
+ ret = EINVAL;
+ ERROR("%p: CQ interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_QP_BURST,
+ .obj = tmpl.qp,
+ };
+ tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_qp == NULL) {
+ ret = EINVAL;
+ ERROR("%p: QP interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ /* Clean up txq in case we're reinitializing it. */
+ DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
+ txq_cleanup(txq);
+ *txq = tmpl;
+ DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
+ assert(ret == 0);
+ return 0;
+error:
+ txq_cleanup(&tmpl);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param idx
+ * TX queue index.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Threshold parameters.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+ unsigned int socket, const struct rte_eth_txconf *conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct txq *txq = (idx < priv->txqs_n) ? (*priv->txqs)[idx] : NULL;
+ int ret;
+
+ priv_lock(priv);
+ DEBUG("%p: configuring queue %u for %u descriptors",
+ (void *)dev, idx, desc);
+ if (idx >= priv->txqs_n) {
+ ERROR("%p: queue index out of range (%u >= %u)",
+ (void *)dev, idx, priv->txqs_n);
+ priv_unlock(priv);
+ return -EOVERFLOW;
+ }
+ if (txq != NULL) {
+ DEBUG("%p: reusing already allocated queue index %u (%p)",
+ (void *)dev, idx, (void *)txq);
+ if (priv->started) {
+ priv_unlock(priv);
+ return -EEXIST;
+ }
+ (*priv->txqs)[idx] = NULL;
+ txq_cleanup(txq);
+ } else {
+ txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+ if (txq == NULL) {
+ ERROR("%p: unable to allocate queue index %u",
+ (void *)dev, idx);
+ priv_unlock(priv);
+ return -ENOMEM;
+ }
+ }
+ ret = txq_setup(dev, txq, desc, socket, conf);
+ if (ret)
+ rte_free(txq);
+ else {
+ DEBUG("%p: adding TX queue %p to list",
+ (void *)dev, (void *)txq);
+ (*priv->txqs)[idx] = txq;
+ /* Update send callback. */
+ dev->tx_pkt_burst = mlx5_tx_burst;
+ }
+ priv_unlock(priv);
+ return -ret;
+}
+
+/**
+ * DPDK callback to release a TX queue.
+ *
+ * @param dpdk_txq
+ * Generic TX queue pointer.
+ */
+void
+mlx5_tx_queue_release(void *dpdk_txq)
+{
+ struct txq *txq = (struct txq *)dpdk_txq;
+ struct priv *priv;
+ unsigned int i;
+
+ if (txq == NULL)
+ return;
+ priv = txq->priv;
+ priv_lock(priv);
+ for (i = 0; (i != priv->txqs_n); ++i)
+ if ((*priv->txqs)[i] == txq) {
+ DEBUG("%p: removing TX queue %p from list",
+ (void *)priv->dev, (void *)txq);
+ (*priv->txqs)[i] = NULL;
+ break;
+ }
+ txq_cleanup(txq);
+ rte_free(txq);
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index cc6aab6..e48e6b6 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -140,10 +140,21 @@ pmd_drv_log_basename(const char *s)
#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
+/* Convenience macros for accessing mbuf fields. */
+#define NEXT(m) ((m)->next)
+#define DATA_LEN(m) ((m)->data_len)
+#define PKT_LEN(m) ((m)->pkt_len)
+#define DATA_OFF(m) ((m)->data_off)
+#define SET_DATA_OFF(m, o) ((m)->data_off = (o))
+#define NB_SEGS(m) ((m)->nb_segs)
+#define PORT(m) ((m)->port)
+
/* Allocate a buffer on the stack and fill it with a printf format string. */
#define MKSTR(name, ...) \
char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
\
snprintf(name, sizeof(name), __VA_ARGS__)
+#define WR_ID(o) (((wr_id_t *)&(o))->data)
+
#endif /* RTE_PMD_MLX5_UTILS_H_ */
--
2.1.0
* [PATCH 03/13] mlx5: add MAC handling
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
2015-10-05 17:52 ` [PATCH 01/13] mlx5: new poll-mode driver for Mellanox ConnectX-4 adapters Adrien Mazarguil
2015-10-05 17:52 ` [PATCH 02/13] mlx5: add non-scattered TX and RX support Adrien Mazarguil
@ 2015-10-05 17:52 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 04/13] mlx5: add device configure/start/stop Adrien Mazarguil
` (10 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:52 UTC (permalink / raw)
To: dev
This commit adds the MAC flow steering rules that the RX path requires,
along with the related callbacks to add and remove MAC addresses.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
---
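The rule fan-out in rxq_mac_addr_add() can be modelled without libibverbs.
The sketch below is only an illustration of that logic, not the driver code:
struct sketch_state, SKETCH_MAX_VLAN, SKETCH_NO_VLAN and the fail_at test hook
are hypothetical names, and a bool array stands in for the ibv_flow pointers
kept in mac_flow[][]. It assumes, as the patch does, that one rule is
installed per enabled VLAN filter, that a single untagged rule stored in
slot 0 is used when no filter is enabled, and that a failure removes the
rules already installed for lower indices.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_VLAN 4u   /* hypothetical; the driver uses MLX5_MAX_VLAN_IDS */
#define SKETCH_NO_VLAN (-1u) /* same "-1" convention as rxq_add_flow() */

/* Hypothetical stand-ins for the VLAN filter table and one mac_flow[] row. */
struct sketch_state {
	bool vlan_enabled[SKETCH_MAX_VLAN];
	bool flow[SKETCH_MAX_VLAN]; /* true where a rule is installed */
	unsigned int fail_at;       /* test hook: index whose creation fails */
};

/* Install one rule; an untagged rule is stored in slot 0, as in the patch. */
static int
sketch_add_flow(struct sketch_state *s, unsigned int vlan_index)
{
	if (vlan_index == s->fail_at)
		return EINVAL;
	if (vlan_index == SKETCH_NO_VLAN)
		vlan_index = 0;
	assert(vlan_index < SKETCH_MAX_VLAN);
	assert(!s->flow[vlan_index]);
	s->flow[vlan_index] = true;
	return 0;
}

/* One rule per enabled VLAN filter, or a single untagged rule if none. */
static int
sketch_mac_add(struct sketch_state *s)
{
	unsigned int i;
	unsigned int vlans = 0;

	for (i = 0; i != SKETCH_MAX_VLAN; ++i) {
		int ret;

		if (!s->vlan_enabled[i])
			continue;
		ret = sketch_add_flow(s, i);
		if (!ret) {
			++vlans;
			continue;
		}
		/* Failure: remove the rules installed for lower indices. */
		while (i != 0)
			if (s->vlan_enabled[--i])
				s->flow[i] = false;
		return ret;
	}
	if (!vlans)
		return sketch_add_flow(s, SKETCH_NO_VLAN);
	return 0;
}

/* Number of rules currently installed. */
static unsigned int
sketch_count(const struct sketch_state *s)
{
	unsigned int i;
	unsigned int n = 0;

	for (i = 0; i != SKETCH_MAX_VLAN; ++i)
		n += s->flow[i];
	return n;
}
```

Setting fail_at to SKETCH_MAX_VLAN disables the failure hook, since no
index passed to sketch_add_flow() ever equals it. Like the internal helpers in
this patch, the sketch returns positive errno values; only the DPDK callbacks
negate them.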
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/mlx5/mlx5.h | 5 +
drivers/net/mlx5/mlx5_mac.c | 347 ++++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxq.c | 10 ++
4 files changed, 365 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 31ce5ec..fb306d4 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -134,6 +134,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
.tx_queue_release = mlx5_tx_queue_release,
+ .mac_addr_remove = mlx5_mac_addr_remove,
+ .mac_addr_add = mlx5_mac_addr_add,
};
static struct {
@@ -390,7 +392,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
claim_zero(priv_mac_addr_add(priv, 0,
(const uint8_t (*)[ETHER_ADDR_LEN])
mac.addr_bytes));
- claim_zero(priv_mac_addr_add(priv, 1,
+ claim_zero(priv_mac_addr_add(priv, (RTE_DIM(priv->mac) - 1),
&(const uint8_t [ETHER_ADDR_LEN])
{ "\xff\xff\xff\xff\xff\xff" }));
#ifndef NDEBUG
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 7e60e70..3e0c11e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -173,7 +173,12 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
/* mlx5_mac.c */
int priv_get_mac(struct priv *, uint8_t (*)[ETHER_ADDR_LEN]);
+void rxq_mac_addrs_del(struct rxq *);
+void mlx5_mac_addr_remove(struct rte_eth_dev *, uint32_t);
+int rxq_mac_addrs_add(struct rxq *);
int priv_mac_addr_add(struct priv *, unsigned int,
const uint8_t (*)[ETHER_ADDR_LEN]);
+void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
+ uint32_t);
#endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index f7e1cf6..f01faf0 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -65,6 +65,8 @@
#include "mlx5.h"
#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_defs.h"
/**
* Get MAC address by querying netdevice.
@@ -89,8 +91,86 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
}
/**
+ * Delete flow steering rule.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index.
+ * @param vlan_index
+ * VLAN index.
+ */
+static void
+rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+{
+#ifndef NDEBUG
+ struct priv *priv = rxq->priv;
+ const uint8_t (*mac)[ETHER_ADDR_LEN] =
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ priv->mac[mac_index].addr_bytes;
+#endif
+ assert(rxq->mac_flow[mac_index][vlan_index] != NULL);
+ DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
+ " (VLAN ID %" PRIu16 ")",
+ (void *)rxq,
+ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
+ mac_index, priv->vlan_filter[vlan_index].id);
+ claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
+ rxq->mac_flow[mac_index][vlan_index] = NULL;
+}
+
+/**
+ * Unregister a MAC address from a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index.
+ */
+static void
+rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+{
+ struct priv *priv = rxq->priv;
+ unsigned int i;
+ unsigned int vlans = 0;
+
+ assert(mac_index < RTE_DIM(priv->mac));
+ if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
+ return;
+ for (i = 0; (i != RTE_DIM(priv->vlan_filter)); ++i) {
+ if (!priv->vlan_filter[i].enabled)
+ continue;
+ rxq_del_flow(rxq, mac_index, i);
+ vlans++;
+ }
+ if (!vlans) {
+ rxq_del_flow(rxq, mac_index, 0);
+ }
+ BITFIELD_RESET(rxq->mac_configured, mac_index);
+}
+
+/**
+ * Unregister all MAC addresses from a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_mac_addrs_del(struct rxq *rxq)
+{
+ struct priv *priv = rxq->priv;
+ unsigned int i;
+
+ for (i = 0; (i != RTE_DIM(priv->mac)); ++i)
+ rxq_mac_addr_del(rxq, i);
+}
+
+/**
* Unregister a MAC address.
*
+ * In RSS mode, the MAC address is unregistered from the parent queue,
+ * otherwise it is unregistered from each queue directly.
+ *
* @param priv
* Pointer to private structure.
* @param mac_index
@@ -99,15 +179,217 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
static void
priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
{
+ unsigned int i;
+
assert(mac_index < RTE_DIM(priv->mac));
if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
return;
+ if (priv->rss) {
+ rxq_mac_addr_del(&priv->rxq_parent, mac_index);
+ goto end;
+ }
+ for (i = 0; (i != priv->dev->data->nb_rx_queues); ++i)
+ rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+end:
BITFIELD_RESET(priv->mac_configured, mac_index);
}
/**
+ * DPDK callback to remove a MAC address.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param index
+ * MAC address index.
+ */
+void
+mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+ struct priv *priv = dev->data->dev_private;
+
+ priv_lock(priv);
+ DEBUG("%p: removing MAC address from index %" PRIu32,
+ (void *)dev, index);
+ /* Last array entry is reserved for broadcast. */
+ if (index >= (RTE_DIM(priv->mac) - 1))
+ goto end;
+ priv_mac_addr_del(priv, index);
+end:
+ priv_unlock(priv);
+}
+
+/**
+ * Add single flow steering rule.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index to register.
+ * @param vlan_index
+ * VLAN index. Use -1 for a flow without VLAN.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
+{
+ struct ibv_flow *flow;
+ struct priv *priv = rxq->priv;
+ const uint8_t (*mac)[ETHER_ADDR_LEN] =
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ priv->mac[mac_index].addr_bytes;
+
+ /* Allocate flow specification on the stack. */
+ struct __attribute__((packed)) {
+ struct ibv_flow_attr attr;
+ struct ibv_flow_spec_eth spec;
+ } data;
+ struct ibv_flow_attr *attr = &data.attr;
+ struct ibv_flow_spec_eth *spec = &data.spec;
+
+ assert(mac_index < RTE_DIM(priv->mac));
+ assert((vlan_index < RTE_DIM(priv->vlan_filter)) || (vlan_index == -1u));
+ /*
+ * No padding must be inserted by the compiler between attr and spec.
+ * This layout is expected by libibverbs.
+ */
+ assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
+ *attr = (struct ibv_flow_attr){
+ .type = IBV_FLOW_ATTR_NORMAL,
+ .num_of_specs = 1,
+ .port = priv->port,
+ .flags = 0
+ };
+ *spec = (struct ibv_flow_spec_eth){
+ .type = IBV_FLOW_SPEC_ETH,
+ .size = sizeof(*spec),
+ .val = {
+ .dst_mac = {
+ (*mac)[0], (*mac)[1], (*mac)[2],
+ (*mac)[3], (*mac)[4], (*mac)[5]
+ },
+ .vlan_tag = ((vlan_index != -1u) ?
+ htons(priv->vlan_filter[vlan_index].id) :
+ 0),
+ },
+ .mask = {
+ .dst_mac = "\xff\xff\xff\xff\xff\xff",
+ .vlan_tag = ((vlan_index != -1u) ? htons(0xfff) : 0),
+ }
+ };
+ DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
+ " (VLAN %s %" PRIu16 ")",
+ (void *)rxq,
+ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
+ mac_index,
+ ((vlan_index != -1u) ? "ID" : "index"),
+ ((vlan_index != -1u) ? priv->vlan_filter[vlan_index].id : -1u));
+ /* Create related flow. */
+ errno = 0;
+ flow = ibv_create_flow(rxq->qp, attr);
+ if (flow == NULL) {
+ /* It's not clear whether errno is always set in this case. */
+ ERROR("%p: flow configuration failed, errno=%d: %s",
+ (void *)rxq, errno,
+ (errno ? strerror(errno) : "Unknown error"));
+ if (errno)
+ return errno;
+ return EINVAL;
+ }
+ if (vlan_index == -1u)
+ vlan_index = 0;
+ assert(rxq->mac_flow[mac_index][vlan_index] == NULL);
+ rxq->mac_flow[mac_index][vlan_index] = flow;
+ return 0;
+}
+
+/**
+ * Register a MAC address in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index to register.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+{
+ struct priv *priv = rxq->priv;
+ unsigned int i;
+ unsigned int vlans = 0;
+ int ret;
+
+ assert(mac_index < RTE_DIM(priv->mac));
+ if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
+ rxq_mac_addr_del(rxq, mac_index);
+ /* Fill VLAN specifications. */
+ for (i = 0; (i != RTE_DIM(priv->vlan_filter)); ++i) {
+ if (!priv->vlan_filter[i].enabled)
+ continue;
+ /* Create related flow. */
+ ret = rxq_add_flow(rxq, mac_index, i);
+ if (!ret) {
+ vlans++;
+ continue;
+ }
+ /* Failure, rollback. */
+ while (i != 0)
+ if (priv->vlan_filter[--i].enabled)
+ rxq_del_flow(rxq, mac_index, i);
+ assert(ret > 0);
+ return ret;
+ }
+ /* In case there is no VLAN filter. */
+ if (!vlans) {
+ ret = rxq_add_flow(rxq, mac_index, -1);
+ if (ret)
+ return ret;
+ }
+ BITFIELD_SET(rxq->mac_configured, mac_index);
+ return 0;
+}
+
+/**
+ * Register all MAC addresses in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_mac_addrs_add(struct rxq *rxq)
+{
+ struct priv *priv = rxq->priv;
+ unsigned int i;
+ int ret;
+
+ for (i = 0; (i != RTE_DIM(priv->mac)); ++i) {
+ if (!BITFIELD_ISSET(priv->mac_configured, i))
+ continue;
+ ret = rxq_mac_addr_add(rxq, i);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ rxq_mac_addr_del(rxq, --i);
+ assert(ret > 0);
+ return ret;
+ }
+ return 0;
+}
+
+/**
* Register a MAC address.
*
+ * In RSS mode, the MAC address is registered in the parent queue,
+ * otherwise it is registered in each queue directly.
+ *
* @param priv
* Pointer to private structure.
* @param mac_index
@@ -123,6 +405,7 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
const uint8_t (*mac)[ETHER_ADDR_LEN])
{
unsigned int i;
+ int ret;
assert(mac_index < RTE_DIM(priv->mac));
/* First, make sure this address isn't already configured. */
@@ -145,6 +428,70 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
(*mac)[3], (*mac)[4], (*mac)[5]
}
};
+ /* If device isn't started, this is all we need to do. */
+ if (!priv->started) {
+#ifndef NDEBUG
+ /* Verify that all queues have this index disabled. */
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ assert(!BITFIELD_ISSET
+ ((*priv->rxqs)[i]->mac_configured, mac_index));
+ }
+#endif
+ goto end;
+ }
+ if (priv->rss) {
+ ret = rxq_mac_addr_add(&priv->rxq_parent, mac_index);
+ if (ret)
+ return ret;
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ ret = rxq_mac_addr_add((*priv->rxqs)[i], mac_index);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ if ((*priv->rxqs)[(--i)] != NULL)
+ rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+ return ret;
+ }
+end:
BITFIELD_SET(priv->mac_configured, mac_index);
return 0;
}
+
+/**
+ * DPDK callback to add a MAC address.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param mac_addr
+ * MAC address to register.
+ * @param index
+ * MAC address index.
+ * @param vmdq
+ * VMDq pool index to associate address with (ignored).
+ */
+void
+mlx5_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+ uint32_t index, uint32_t vmdq)
+{
+ struct priv *priv = dev->data->dev_private;
+
+ (void)vmdq;
+ priv_lock(priv);
+ DEBUG("%p: adding MAC address at index %" PRIu32,
+ (void *)dev, index);
+ /* Last array entry is reserved for broadcast. */
+ if (index >= (RTE_DIM(priv->mac) - 1))
+ goto end;
+ priv_mac_addr_add(priv, index,
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ mac_addr->addr_bytes);
+end:
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 01cc649..8450fe3 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -248,6 +248,7 @@ rxq_cleanup(struct rxq *rxq)
&params));
}
if (rxq->qp != NULL) {
+ rxq_mac_addrs_del(rxq);
claim_zero(ibv_destroy_qp(rxq->qp));
}
if (rxq->cq != NULL)
@@ -515,6 +516,15 @@ skip_mr:
(void *)dev, strerror(ret));
goto error;
}
+ if ((parent) || (!priv->rss)) {
+ /* Configure MAC and broadcast addresses. */
+ ret = rxq_mac_addrs_add(&tmpl);
+ if (ret) {
+ ERROR("%p: QP flow attachment failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ }
/* Allocate descriptors for RX queues, except for the RSS parent. */
if (parent)
goto skip_alloc;
--
2.1.0
* [PATCH 04/13] mlx5: add device configure/start/stop
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (2 preceding siblings ...)
2015-10-05 17:52 ` [PATCH 03/13] mlx5: add MAC handling Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 05/13] mlx5: add support for scattered RX and TX buffers Adrien Mazarguil
` (9 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev; +Cc: Francesco Santoro
This commit adds the remaining callbacks needed to make mlx5 usable.
As in mlx4, device start and stop are implemented on top of MAC RX flows.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Francesco Santoro <francesco.santoro@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
---
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 4 ++
drivers/net/mlx5/mlx5.h | 7 ++
drivers/net/mlx5/mlx5_ethdev.c | 148 ++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_trigger.c | 145 +++++++++++++++++++++++++++++++++++++++
5 files changed, 305 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_trigger.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 7b9c57b..028c22c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -45,6 +45,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fb306d4..56cea7c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -129,7 +129,11 @@ mlx5_dev_close(struct rte_eth_dev *dev)
}
static const struct eth_dev_ops mlx5_dev_ops = {
+ .dev_configure = mlx5_dev_configure,
+ .dev_start = mlx5_dev_start,
+ .dev_stop = mlx5_dev_stop,
.dev_close = mlx5_dev_close,
+ .dev_infos_get = mlx5_dev_infos_get,
.rx_queue_setup = mlx5_rx_queue_setup,
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3e0c11e..3f4f8fd 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -167,6 +167,8 @@ int priv_get_ifname(const struct priv *, char (*)[IF_NAMESIZE]);
int priv_ifreq(const struct priv *, int req, struct ifreq *);
int priv_get_mtu(struct priv *, uint16_t *);
int priv_set_flags(struct priv *, unsigned int, unsigned int);
+int mlx5_dev_configure(struct rte_eth_dev *);
+void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
@@ -181,4 +183,9 @@ int priv_mac_addr_add(struct priv *, unsigned int,
void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
uint32_t);
+/* mlx5_trigger.c */
+
+int mlx5_dev_start(struct rte_eth_dev *);
+void mlx5_dev_stop(struct rte_eth_dev *);
+
#endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index b6c7d7a..6b13cec 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -32,6 +32,7 @@
*/
#include <stddef.h>
+#include <assert.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
@@ -58,6 +59,7 @@
#endif
#include "mlx5.h"
+#include "mlx5_rxtx.h"
#include "mlx5_utils.h"
/**
@@ -370,6 +372,152 @@ priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
}
/**
+ * Ethernet device configuration.
+ *
+ * Prepare the driver for a given number of TX and RX queues.
+ * Allocate parent RSS queue when several RX queues are requested.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+dev_configure(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int rxqs_n = dev->data->nb_rx_queues;
+ unsigned int txqs_n = dev->data->nb_tx_queues;
+ unsigned int tmp;
+ int ret;
+
+ priv->rxqs = (void *)dev->data->rx_queues;
+ priv->txqs = (void *)dev->data->tx_queues;
+ if (txqs_n != priv->txqs_n) {
+ INFO("%p: TX queues number update: %u -> %u",
+ (void *)dev, priv->txqs_n, txqs_n);
+ priv->txqs_n = txqs_n;
+ }
+ if (rxqs_n == priv->rxqs_n)
+ return 0;
+ INFO("%p: RX queues number update: %u -> %u",
+ (void *)dev, priv->rxqs_n, rxqs_n);
+ /* If RSS is enabled, disable it first. */
+ if (priv->rss) {
+ unsigned int i;
+
+ /* Only if there are no remaining child RX queues. */
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ return EINVAL;
+ rxq_cleanup(&priv->rxq_parent);
+ priv->rss = 0;
+ priv->rxqs_n = 0;
+ }
+ if (rxqs_n <= 1) {
+ /* Nothing else to do. */
+ priv->rxqs_n = rxqs_n;
+ return 0;
+ }
+ /* Allocate a new RSS parent queue if supported by hardware. */
+ if (!priv->hw_rss) {
+ ERROR("%p: only a single RX queue can be configured when"
+ " hardware doesn't support RSS",
+ (void *)dev);
+ return EINVAL;
+ }
+ /* Fail if hardware doesn't support that many RSS queues. */
+ if (rxqs_n >= priv->max_rss_tbl_sz) {
+ ERROR("%p: only %u RX queues can be configured for RSS",
+ (void *)dev, priv->max_rss_tbl_sz);
+ return EINVAL;
+ }
+ priv->rss = 1;
+ tmp = priv->rxqs_n;
+ priv->rxqs_n = rxqs_n;
+ ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, NULL, NULL);
+ if (!ret)
+ return 0;
+ /* Failure, rollback. */
+ priv->rss = 0;
+ priv->rxqs_n = tmp;
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * DPDK callback for Ethernet device configuration.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_configure(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret;
+
+ priv_lock(priv);
+ ret = dev_configure(dev);
+ assert(ret >= 0);
+ priv_unlock(priv);
+ return -ret;
+}
+
+/**
+ * DPDK callback to get information about the device.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[out] info
+ * Info structure output buffer.
+ */
+void
+mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int max;
+ char ifname[IF_NAMESIZE];
+
+ priv_lock(priv);
+ /* FIXME: we should ask the device for these values. */
+ info->min_rx_bufsize = 32;
+ info->max_rx_pktlen = 65536;
+ /*
+ * Since we need one CQ per QP, the limit is the minimum number
+ * between the two values.
+ */
+ max = ((priv->device_attr.max_cq > priv->device_attr.max_qp) ?
+ priv->device_attr.max_qp : priv->device_attr.max_cq);
+ /* max_rx_queues is uint16_t, clamp max to avoid truncation. */
+ if (max >= 65535)
+ max = 65535;
+ info->max_rx_queues = max;
+ info->max_tx_queues = max;
+ /* Last array entry is reserved for broadcast. */
+ info->max_mac_addrs = (RTE_DIM(priv->mac) - 1);
+ info->rx_offload_capa =
+ (priv->hw_csum ?
+ (DEV_RX_OFFLOAD_IPV4_CKSUM |
+ DEV_RX_OFFLOAD_UDP_CKSUM |
+ DEV_RX_OFFLOAD_TCP_CKSUM) :
+ 0);
+ info->tx_offload_capa =
+ (priv->hw_csum ?
+ (DEV_TX_OFFLOAD_IPV4_CKSUM |
+ DEV_TX_OFFLOAD_UDP_CKSUM |
+ DEV_TX_OFFLOAD_TCP_CKSUM) :
+ 0);
+ if (priv_get_ifname(priv, &ifname) == 0)
+ info->if_index = if_nametoindex(ifname);
+ priv_unlock(priv);
+}
+
+/**
* Get PCI information from struct ibv_device.
*
* @param device
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
new file mode 100644
index 0000000..bcec957
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -0,0 +1,145 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_utils.h"
+
+/**
+ * DPDK callback to start the device.
+ *
+ * Simulate device start by attaching all configured flows.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_start(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i = 0;
+ unsigned int r;
+ struct rxq *rxq;
+
+ priv_lock(priv);
+ if (priv->started) {
+ priv_unlock(priv);
+ return 0;
+ }
+ DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
+ priv->started = 1;
+ if (priv->rss) {
+ rxq = &priv->rxq_parent;
+ r = 1;
+ } else {
+ rxq = (*priv->rxqs)[0];
+ r = priv->rxqs_n;
+ }
+ /* Iterate only once when RSS is enabled. */
+ do {
+ int ret;
+
+ /* Ignore nonexistent RX queues. */
+ if (rxq == NULL)
+ continue;
+ ret = rxq_mac_addrs_add(rxq);
+ if (!ret)
+ continue;
+ WARN("%p: QP flow attachment failed: %s",
+ (void *)dev, strerror(ret));
+ /* Rollback. */
+ while (i != 0) {
+ rxq = (*priv->rxqs)[--i];
+ if (rxq != NULL) {
+ rxq_mac_addrs_del(rxq);
+ }
+ }
+ priv->started = 0;
+ return -ret;
+ } while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+ priv_unlock(priv);
+ return 0;
+}
+
+/**
+ * DPDK callback to stop the device.
+ *
+ * Simulate device stop by detaching all configured flows.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_dev_stop(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i = 0;
+ unsigned int r;
+ struct rxq *rxq;
+
+ priv_lock(priv);
+ if (!priv->started) {
+ priv_unlock(priv);
+ return;
+ }
+ DEBUG("%p: detaching flows from all RX queues", (void *)dev);
+ priv->started = 0;
+ if (priv->rss) {
+ rxq = &priv->rxq_parent;
+ r = 1;
+ } else {
+ rxq = (*priv->rxqs)[0];
+ r = priv->rxqs_n;
+ }
+ /* Iterate only once when RSS is enabled. */
+ do {
+ /* Ignore nonexistent RX queues. */
+ if (rxq == NULL)
+ continue;
+ rxq_mac_addrs_del(rxq);
+ } while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+ priv_unlock(priv);
+}
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 05/13] mlx5: add support for scattered RX and TX buffers
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (3 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 04/13] mlx5: add device configure/start/stop Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 06/13] mlx5: add MTU configuration support Adrien Mazarguil
` (8 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
A dedicated RX callback is added to handle scattered buffers. For better
performance, it is only used when jumbo frames are enabled and MTU is larger
than a single mbuf.
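As a rough illustration of that decision, the sketch below mirrors the check added to rxq_setup() in this patch. The demo_* names are hypothetical, and the headroom (128) and SGE count (4) are assumed values standing in for RTE_PKTMBUF_HEADROOM and MLX5_PMD_SGE_WR_N. When scattered mode is selected, the requested descriptors are divided among WRs since each WR then carries several segments.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_HEADROOM 128u /* Assumed value of RTE_PKTMBUF_HEADROOM. */
#define DEMO_SGE_WR_N 4u   /* Assumed value of MLX5_PMD_SGE_WR_N. */

/* Outcome of the RX mode selection for one queue. */
struct demo_rx_layout {
	int sp;           /* Nonzero when scattered RX is needed. */
	unsigned int wrs; /* Number of work requests to allocate. */
};

/*
 * Scattered RX is only needed when jumbo frames are enabled and the
 * largest expected packet does not fit in the data room of one mbuf.
 */
static struct demo_rx_layout
demo_rx_pick_layout(int jumbo_frame, uint32_t max_rx_pkt_len,
		    uint32_t mb_len, unsigned int desc)
{
	struct demo_rx_layout layout = { 0, desc };

	if (jumbo_frame && (max_rx_pkt_len > (mb_len - DEMO_HEADROOM))) {
		layout.sp = 1;
		/* Each WR now owns DEMO_SGE_WR_N segments. */
		layout.wrs = desc / DEMO_SGE_WR_N;
	}
	return layout;
}
```

For example, with 2176-byte mbufs (2048 bytes of data room) and 512 descriptors, a 9000-byte maximum packet length with jumbo frames enabled selects scattered mode with 128 WRs, while a 1518-byte limit keeps the single-segment path with 512 WRs.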
On the TX path, scattered buffers are also handled in a separate function.
When there are more than MLX5_PMD_SGE_WR_N segments in a given mbuf, the
remaining segments are linearized in the last SGE entry.
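The linearization rule can be sketched independently of Verbs. The code below is an illustration under assumptions, not the patch's tx_burst_sg(): the demo_* types are hypothetical simplifications of mbuf segments and SGEs, and the sizes (4 SGEs, a 64-byte linear buffer) are arbitrary. When a packet has more segments than SGEs, the first N - 1 segments are referenced directly and the rest are copied into a linear buffer referenced by the last SGE. In this sketch the returned count includes that last SGE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SGE_N 4u      /* Assumed number of SGEs per work request. */
#define DEMO_LINEAR_SZ 64u /* Assumed size of the linear buffer. */

/* Hypothetical simplified packet segment (stands in for an mbuf). */
struct demo_seg {
	const unsigned char *data;
	unsigned int len;
	struct demo_seg *next;
};

/* Hypothetical simplified scatter/gather element. */
struct demo_sge {
	const unsigned char *addr;
	unsigned int length;
};

/*
 * Fill sges[] from a chain of segs segments.
 * Returns the number of SGEs used, or 0 if the extra segments do not
 * fit in the linear buffer.
 */
static unsigned int
demo_fill_sges(struct demo_seg *seg, unsigned int segs,
	       struct demo_sge sges[DEMO_SGE_N],
	       unsigned char linear[DEMO_LINEAR_SZ])
{
	unsigned int direct = segs;
	unsigned int size = 0;
	unsigned int j;

	/* Too many segments: keep the last SGE for linearized data. */
	if (segs > DEMO_SGE_N)
		direct = DEMO_SGE_N - 1;
	/* Reference the first segments directly, without copying. */
	for (j = 0; j != direct; ++j) {
		sges[j].addr = seg->data;
		sges[j].length = seg->len;
		seg = seg->next;
	}
	if (direct == segs)
		return direct;
	/* Copy all remaining segments into the linear buffer. */
	for (; seg != NULL; seg = seg->next) {
		if (seg->len > (DEMO_LINEAR_SZ - size))
			return 0;
		memcpy(&linear[size], seg->data, seg->len);
		size += seg->len;
	}
	sges[direct].addr = linear;
	sges[direct].length = size;
	/* Count the SGE that carries the linearized data. */
	return direct + 1;
}
```

The copy only happens on the uncommon path where a packet exceeds the SGE limit, so packets with few segments are still sent without touching their payload.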
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5_rxq.c | 175 +++++++++++++++++++-
drivers/net/mlx5/mlx5_rxtx.c | 376 +++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 10 ++
3 files changed, 557 insertions(+), 4 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8450fe3..1eddfc7 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -65,6 +65,153 @@
#include "mlx5_defs.h"
/**
+ * Allocate RX queue elements with scattered packets support.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param elts_n
+ * Number of elements to allocate.
+ * @param[in] pool
+ * If not NULL, fetch buffers from this array instead of allocating them
+ * with rte_pktmbuf_alloc().
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
+ struct rte_mbuf **pool)
+{
+ unsigned int i;
+ struct rxq_elt_sp (*elts)[elts_n] =
+ rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
+ rxq->socket);
+ int ret = 0;
+
+ if (elts == NULL) {
+ ERROR("%p: can't allocate packets array", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ /* For each WR (packet). */
+ for (i = 0; (i != elts_n); ++i) {
+ unsigned int j;
+ struct rxq_elt_sp *elt = &(*elts)[i];
+ struct ibv_recv_wr *wr = &elt->wr;
+ struct ibv_sge (*sges)[RTE_DIM(elt->sges)] = &elt->sges;
+
+ /* These two arrays must have the same size. */
+ assert(RTE_DIM(elt->sges) == RTE_DIM(elt->bufs));
+ /* Configure WR. */
+ wr->wr_id = i;
+ wr->next = &(*elts)[(i + 1)].wr;
+ wr->sg_list = &(*sges)[0];
+ wr->num_sge = RTE_DIM(*sges);
+ /* For each SGE (segment). */
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ struct ibv_sge *sge = &(*sges)[j];
+ struct rte_mbuf *buf;
+
+ if (pool != NULL) {
+ buf = *(pool++);
+ assert(buf != NULL);
+ rte_pktmbuf_reset(buf);
+ } else
+ buf = rte_pktmbuf_alloc(rxq->mp);
+ if (buf == NULL) {
+ assert(pool == NULL);
+ ERROR("%p: empty mbuf pool", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ elt->bufs[j] = buf;
+ /* Headroom is reserved by rte_pktmbuf_alloc(). */
+ assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+ /* Buffer is supposed to be empty. */
+ assert(rte_pktmbuf_data_len(buf) == 0);
+ assert(rte_pktmbuf_pkt_len(buf) == 0);
+ /* sge->addr must be able to store a pointer. */
+ assert(sizeof(sge->addr) >= sizeof(uintptr_t));
+ if (j == 0) {
+ /* The first SGE keeps its headroom. */
+ sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ sge->length = (buf->buf_len -
+ RTE_PKTMBUF_HEADROOM);
+ } else {
+ /* Subsequent SGEs lose theirs. */
+ assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+ SET_DATA_OFF(buf, 0);
+ sge->addr = (uintptr_t)buf->buf_addr;
+ sge->length = buf->buf_len;
+ }
+ sge->lkey = rxq->mr->lkey;
+ /* Redundant check for tailroom. */
+ assert(sge->length == rte_pktmbuf_tailroom(buf));
+ }
+ }
+ /* The last WR pointer must be NULL. */
+ (*elts)[(i - 1)].wr.next = NULL;
+ DEBUG("%p: allocated and configured %u WRs (%zu segments)",
+ (void *)rxq, elts_n, (elts_n * RTE_DIM((*elts)[0].sges)));
+ rxq->elts_n = elts_n;
+ rxq->elts_head = 0;
+ rxq->elts.sp = elts;
+ assert(ret == 0);
+ return 0;
+error:
+ if (elts != NULL) {
+ assert(pool == NULL);
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ unsigned int j;
+ struct rxq_elt_sp *elt = &(*elts)[i];
+
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ struct rte_mbuf *buf = elt->bufs[j];
+
+ if (buf != NULL)
+ rte_pktmbuf_free_seg(buf);
+ }
+ }
+ rte_free(elts);
+ }
+ DEBUG("%p: failed, freed everything", (void *)rxq);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * Free RX queue elements with scattered packets support.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+static void
+rxq_free_elts_sp(struct rxq *rxq)
+{
+ unsigned int i;
+ unsigned int elts_n = rxq->elts_n;
+ struct rxq_elt_sp (*elts)[elts_n] = rxq->elts.sp;
+
+ DEBUG("%p: freeing WRs", (void *)rxq);
+ rxq->elts_n = 0;
+ rxq->elts.sp = NULL;
+ if (elts == NULL)
+ return;
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ unsigned int j;
+ struct rxq_elt_sp *elt = &(*elts)[i];
+
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ struct rte_mbuf *buf = elt->bufs[j];
+
+ if (buf != NULL)
+ rte_pktmbuf_free_seg(buf);
+ }
+ }
+ rte_free(elts);
+}
+
+/**
* Allocate RX queue elements.
*
* @param rxq
@@ -224,7 +371,10 @@ rxq_cleanup(struct rxq *rxq)
struct ibv_exp_release_intf_params params;
DEBUG("cleaning up %p", (void *)rxq);
- rxq_free_elts(rxq);
+ if (rxq->sp)
+ rxq_free_elts_sp(rxq);
+ else
+ rxq_free_elts(rxq);
if (rxq->if_qp != NULL) {
assert(rxq->priv != NULL);
assert(rxq->priv->ctx != NULL);
@@ -445,6 +595,15 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
rte_pktmbuf_free(buf);
+ /* Enable scattered packets support for this queue if necessary. */
+ if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
+ (dev->data->dev_conf.rxmode.max_rx_pkt_len >
+ (tmpl.mb_len - RTE_PKTMBUF_HEADROOM))) {
+ tmpl.sp = 1;
+ desc /= MLX5_PMD_SGE_WR_N;
+ }
+ DEBUG("%p: %s scattered packets support (%u WRs)",
+ (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
/* Use the entire RX mempool as the memory region. */
tmpl.mr = ibv_reg_mr(priv->pd,
(void *)mp->elt_va_start,
@@ -528,14 +687,19 @@ skip_mr:
/* Allocate descriptors for RX queues, except for the RSS parent. */
if (parent)
goto skip_alloc;
- ret = rxq_alloc_elts(&tmpl, desc, NULL);
+ if (tmpl.sp)
+ ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
+ else
+ ret = rxq_alloc_elts(&tmpl, desc, NULL);
if (ret) {
ERROR("%p: RXQ allocation failed: %s",
(void *)dev, strerror(ret));
goto error;
}
ret = ibv_post_recv(tmpl.qp,
- &(*tmpl.elts.no_sp)[0].wr,
+ (tmpl.sp ?
+ &(*tmpl.elts.sp)[0].wr :
+ &(*tmpl.elts.no_sp)[0].wr),
&bad_wr);
if (ret) {
ERROR("%p: ibv_post_recv() failed for WR %p: %s",
@@ -655,7 +819,10 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
(void *)dev, (void *)rxq);
(*priv->rxqs)[idx] = rxq;
/* Update receive callback. */
- dev->rx_pkt_burst = mlx5_rx_burst;
+ if (rxq->sp)
+ dev->rx_pkt_burst = mlx5_rx_burst_sp;
+ else
+ dev->rx_pkt_burst = mlx5_rx_burst;
}
priv_unlock(priv);
return -ret;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 40bddf0..5042011 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -173,6 +173,154 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
return txq->mp2mr[i].lkey;
}
+#if MLX5_PMD_SGE_WR_N > 1
+
+/**
+ * Copy scattered mbuf contents to a single linear buffer.
+ *
+ * @param[out] linear
+ * Linear output buffer.
+ * @param[in] buf
+ * Scattered input buffer.
+ *
+ * @return
+ * Number of bytes copied to the output buffer or 0 if not large enough.
+ */
+static unsigned int
+linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
+{
+ unsigned int size = 0;
+ unsigned int offset;
+
+ do {
+ unsigned int len = DATA_LEN(buf);
+
+ offset = size;
+ size += len;
+ if (unlikely(size > sizeof(*linear)))
+ return 0;
+ memcpy(&(*linear)[offset],
+ rte_pktmbuf_mtod(buf, uint8_t *),
+ len);
+ buf = NEXT(buf);
+ } while (buf != NULL);
+ return size;
+}
+
+/**
+ * Handle scattered buffers for mlx5_tx_burst().
+ *
+ * @param txq
+ * TX queue structure.
+ * @param segs
+ * Number of segments in buf.
+ * @param elt
+ * TX queue element to fill.
+ * @param[in] buf
+ * Buffer to process.
+ * @param elts_head
+ * Index of the linear buffer to use if necessary (normally txq->elts_head).
+ * @param[out] sges
+ * Array filled with SGEs on success.
+ *
+ * @return
+ * A structure containing the processed packet size in bytes and the
+ * number of SGEs. Both fields are set to (unsigned int)-1 in case of
+ * failure.
+ */
+static struct tx_burst_sg_ret {
+ unsigned int length;
+ unsigned int num;
+}
+tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
+ struct rte_mbuf *buf, unsigned int elts_head,
+ struct ibv_sge (*sges)[MLX5_PMD_SGE_WR_N])
+{
+ unsigned int sent_size = 0;
+ unsigned int j;
+ int linearize = 0;
+
+ /* When there are too many segments, extra segments are
+ * linearized in the last SGE. */
+ if (unlikely(segs > RTE_DIM(*sges))) {
+ segs = (RTE_DIM(*sges) - 1);
+ linearize = 1;
+ }
+ /* Update element. */
+ elt->buf = buf;
+ /* Register segments as SGEs. */
+ for (j = 0; (j != segs); ++j) {
+ struct ibv_sge *sge = &(*sges)[j];
+ uint32_t lkey;
+
+ /* Retrieve Memory Region key for this memory pool. */
+ lkey = txq_mp2mr(txq, buf->pool);
+ if (unlikely(lkey == (uint32_t)-1)) {
+ /* MR does not exist. */
+ DEBUG("%p: unable to get MP <-> MR association",
+ (void *)txq);
+ /* Clean up TX element. */
+ elt->buf = NULL;
+ goto stop;
+ }
+ /* Update SGE. */
+ sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ if (txq->priv->vf)
+ rte_prefetch0((volatile void *)
+ (uintptr_t)sge->addr);
+ sge->length = DATA_LEN(buf);
+ sge->lkey = lkey;
+ sent_size += sge->length;
+ buf = NEXT(buf);
+ }
+ /* If buf is not NULL here and is not going to be linearized,
+ * nb_segs is not valid. */
+ assert(j == segs);
+ assert((buf == NULL) || (linearize));
+ /* Linearize extra segments. */
+ if (linearize) {
+ struct ibv_sge *sge = &(*sges)[segs];
+ linear_t *linear = &(*txq->elts_linear)[elts_head];
+ unsigned int size = linearize_mbuf(linear, buf);
+
+ assert(segs == (RTE_DIM(*sges) - 1));
+ if (size == 0) {
+ /* Invalid packet. */
+ DEBUG("%p: packet too large to be linearized.",
+ (void *)txq);
+ /* Clean up TX element. */
+ elt->buf = NULL;
+ goto stop;
+ }
+ /* If MLX5_PMD_SGE_WR_N is 1, free mbuf immediately. */
+ if (RTE_DIM(*sges) == 1) {
+ do {
+ struct rte_mbuf *next = NEXT(buf);
+
+ rte_pktmbuf_free_seg(buf);
+ buf = next;
+ } while (buf != NULL);
+ elt->buf = NULL;
+ }
+ /* Update SGE. */
+ sge->addr = (uintptr_t)&(*linear)[0];
+ sge->length = size;
+ sge->lkey = txq->mr_linear->lkey;
+ sent_size += size;
+ }
+ return (struct tx_burst_sg_ret){
+ .length = sent_size,
+ .num = segs,
+ };
+stop:
+ return (struct tx_burst_sg_ret){
+ .length = -1,
+ .num = -1,
+ };
+}
+
+#endif /* MLX5_PMD_SGE_WR_N > 1 */
+
/**
* DPDK callback for TX.
*
@@ -282,9 +430,28 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
if (unlikely(err))
goto stop;
} else {
+#if MLX5_PMD_SGE_WR_N > 1
+ struct ibv_sge sges[MLX5_PMD_SGE_WR_N];
+ struct tx_burst_sg_ret ret;
+
+ ret = tx_burst_sg(txq, segs, elt, buf, elts_head,
+ &sges);
+ if (ret.length == (unsigned int)-1)
+ goto stop;
+ RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+ /* Put SG list into send queue. */
+ err = txq->if_qp->send_pending_sg_list
+ (txq->qp,
+ sges,
+ ret.num,
+ send_flags);
+ if (unlikely(err))
+ goto stop;
+#else /* MLX5_PMD_SGE_WR_N > 1 */
DEBUG("%p: TX scattered buffers support not"
" compiled in", (void *)txq);
goto stop;
+#endif /* MLX5_PMD_SGE_WR_N > 1 */
}
elts_head = elts_head_next;
}
@@ -307,8 +474,215 @@ stop:
}
/**
+ * DPDK callback for RX with scattered packets support.
+ *
+ * @param dpdk_rxq
+ * Generic pointer to RX queue structure.
+ * @param[out] pkts
+ * Array to store received packets.
+ * @param pkts_n
+ * Maximum number of packets in array.
+ *
+ * @return
+ * Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ struct rxq *rxq = (struct rxq *)dpdk_rxq;
+ struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
+ const unsigned int elts_n = rxq->elts_n;
+ unsigned int elts_head = rxq->elts_head;
+ struct ibv_recv_wr head;
+ struct ibv_recv_wr **next = &head.next;
+ struct ibv_recv_wr *bad_wr;
+ unsigned int i;
+ unsigned int pkts_ret = 0;
+ int ret;
+
+ if (unlikely(!rxq->sp))
+ return mlx5_rx_burst(dpdk_rxq, pkts, pkts_n);
+ if (unlikely(elts == NULL)) /* See RTE_DEV_CMD_SET_MTU. */
+ return 0;
+ for (i = 0; (i != pkts_n); ++i) {
+ struct rxq_elt_sp *elt = &(*elts)[elts_head];
+ struct ibv_recv_wr *wr = &elt->wr;
+ uint64_t wr_id = wr->wr_id;
+ unsigned int len;
+ unsigned int pkt_buf_len;
+ struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
+ struct rte_mbuf **pkt_buf_next = &pkt_buf;
+ unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
+ unsigned int j = 0;
+ uint32_t flags;
+
+ /* Sanity checks. */
+#ifdef NDEBUG
+ (void)wr_id;
+#endif
+ assert(wr_id < rxq->elts_n);
+ assert(wr->sg_list == elt->sges);
+ assert(wr->num_sge == RTE_DIM(elt->sges));
+ assert(elts_head < rxq->elts_n);
+ assert(rxq->elts_head < rxq->elts_n);
+ ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
+ &flags);
+ if (unlikely(ret < 0)) {
+ struct ibv_wc wc;
+ int wcs_n;
+
+ DEBUG("rxq=%p, poll_length() failed (ret=%d)",
+ (void *)rxq, ret);
+ /* ibv_poll_cq() must be used in case of failure. */
+ wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
+ if (unlikely(wcs_n == 0))
+ break;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
+ (void *)rxq, wcs_n);
+ break;
+ }
+ assert(wcs_n == 1);
+ if (unlikely(wc.status != IBV_WC_SUCCESS)) {
+ /* Whatever, just repost the offending WR. */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
+ " completion status (%d): %s",
+ (void *)rxq, wc.wr_id, wc.status,
+ ibv_wc_status_str(wc.status));
+ /* Link completed WRs together for repost. */
+ *next = wr;
+ next = &wr->next;
+ goto repost;
+ }
+ ret = wc.byte_len;
+ }
+ if (ret == 0)
+ break;
+ len = ret;
+ pkt_buf_len = len;
+ /* Link completed WRs together for repost. */
+ *next = wr;
+ next = &wr->next;
+ /*
+ * Replace spent segments with new ones, concatenate and
+ * return them as pkt_buf.
+ */
+ while (1) {
+ struct ibv_sge *sge = &elt->sges[j];
+ struct rte_mbuf *seg = elt->bufs[j];
+ struct rte_mbuf *rep;
+ unsigned int seg_tailroom;
+
+ /*
+ * Fetch initial bytes of packet descriptor into a
+ * cacheline while allocating rep.
+ */
+ rte_prefetch0(seg);
+ rep = __rte_mbuf_raw_alloc(rxq->mp);
+ if (unlikely(rep == NULL)) {
+ /*
+ * Unable to allocate a replacement mbuf,
+ * repost WR.
+ */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
+ " can't allocate a new mbuf",
+ (void *)rxq, wr_id);
+ if (pkt_buf != NULL) {
+ *pkt_buf_next = NULL;
+ rte_pktmbuf_free(pkt_buf);
+ }
+ /* Increment out of memory counters. */
+ ++rxq->priv->dev->data->rx_mbuf_alloc_failed;
+ goto repost;
+ }
+#ifndef NDEBUG
+ /* Poison user-modifiable fields in rep. */
+ NEXT(rep) = (void *)((uintptr_t)-1);
+ SET_DATA_OFF(rep, 0xdead);
+ DATA_LEN(rep) = 0xd00d;
+ PKT_LEN(rep) = 0xdeadd00d;
+ NB_SEGS(rep) = 0x2a;
+ PORT(rep) = 0x2a;
+ rep->ol_flags = -1;
+#endif
+ assert(rep->buf_len == seg->buf_len);
+ assert(rep->buf_len == rxq->mb_len);
+ /* Reconfigure sge to use rep instead of seg. */
+ assert(sge->lkey == rxq->mr->lkey);
+ sge->addr = ((uintptr_t)rep->buf_addr + seg_headroom);
+ elt->bufs[j] = rep;
+ ++j;
+ /* Update pkt_buf if it's the first segment, or link
+ * seg to the previous one and update pkt_buf_next. */
+ *pkt_buf_next = seg;
+ pkt_buf_next = &NEXT(seg);
+ /* Update seg information. */
+ seg_tailroom = (seg->buf_len - seg_headroom);
+ assert(sge->length == seg_tailroom);
+ SET_DATA_OFF(seg, seg_headroom);
+ if (likely(len <= seg_tailroom)) {
+ /* Last segment. */
+ DATA_LEN(seg) = len;
+ PKT_LEN(seg) = len;
+ /* Sanity check. */
+ assert(rte_pktmbuf_headroom(seg) ==
+ seg_headroom);
+ assert(rte_pktmbuf_tailroom(seg) ==
+ (seg_tailroom - len));
+ break;
+ }
+ DATA_LEN(seg) = seg_tailroom;
+ PKT_LEN(seg) = seg_tailroom;
+ /* Sanity check. */
+ assert(rte_pktmbuf_headroom(seg) == seg_headroom);
+ assert(rte_pktmbuf_tailroom(seg) == 0);
+ /* Fix len and clear headroom for next segments. */
+ len -= seg_tailroom;
+ seg_headroom = 0;
+ }
+ /* Update head and tail segments. */
+ *pkt_buf_next = NULL;
+ assert(pkt_buf != NULL);
+ assert(j != 0);
+ NB_SEGS(pkt_buf) = j;
+ PORT(pkt_buf) = rxq->port_id;
+ PKT_LEN(pkt_buf) = pkt_buf_len;
+
+ /* Return packet. */
+ *(pkts++) = pkt_buf;
+ ++pkts_ret;
+repost:
+ if (++elts_head >= elts_n)
+ elts_head = 0;
+ continue;
+ }
+ if (unlikely(i == 0))
+ return 0;
+ *next = NULL;
+ /* Repost WRs. */
+#ifdef DEBUG_RECV
+ DEBUG("%p: reposting %d WRs", (void *)rxq, i);
+#endif
+ ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
+ if (unlikely(ret)) {
+ /* Inability to repost WRs is fatal. */
+ DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
+ (void *)rxq->priv,
+ (void *)bad_wr,
+ strerror(ret));
+ abort();
+ }
+ rxq->elts_head = elts_head;
+ return pkts_ret;
+}
+
+/**
* DPDK callback for RX.
*
+ * The following function is the same as mlx5_rx_burst_sp(), except it doesn't
+ * manage scattered packets. Improves performance when MRU is lower than the
+ * size of the first segment.
+ *
* @param dpdk_rxq
* Generic pointer to RX queue structure.
* @param[out] pkts
@@ -331,6 +705,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
unsigned int pkts_ret = 0;
int ret;
+ if (unlikely(rxq->sp))
+ return mlx5_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
for (i = 0; (i != pkts_n); ++i) {
struct rxq_elt *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 08a4911..44170d7 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -60,6 +60,13 @@
#include "mlx5.h"
#include "mlx5_defs.h"
+/* RX element (scattered packets). */
+struct rxq_elt_sp {
+ struct ibv_recv_wr wr; /* Work Request. */
+ struct ibv_sge sges[MLX5_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
+ struct rte_mbuf *bufs[MLX5_PMD_SGE_WR_N]; /* SGEs buffers. */
+};
+
/* RX element. */
struct rxq_elt {
struct ibv_recv_wr wr; /* Work Request. */
@@ -87,8 +94,10 @@ struct rxq {
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
union {
+ struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
struct rxq_elt (*no_sp)[]; /* RX elements. */
} elts;
+ unsigned int sp:1; /* Use scattered RX elements. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
unsigned int socket; /* CPU socket ID for allocations. */
struct ibv_exp_res_domain *rd; /* Resource Domain. */
@@ -154,6 +163,7 @@ void mlx5_tx_queue_release(void *);
/* mlx5_rxtx.c */
uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t mlx5_rx_burst_sp(void *, struct rte_mbuf **, uint16_t);
uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t);
uint16_t removed_tx_burst(void *, struct rte_mbuf **, uint16_t);
uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 06/13] mlx5: add MTU configuration support
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (4 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 05/13] mlx5: add support for scattered RX and TX buffers Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 07/13] mlx5: add software counters and related callbacks Adrien Mazarguil
` (7 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
Depending on the MTU and whether jumbo frames are enabled, RX queues may
switch between SG and non-SG modes for better performance.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_ethdev.c | 101 +++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxq.c | 181 +++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 1 +
5 files changed, 285 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 56cea7c..5bef742 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -140,6 +140,7 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.tx_queue_release = mlx5_tx_queue_release,
.mac_addr_remove = mlx5_mac_addr_remove,
.mac_addr_add = mlx5_mac_addr_add,
+ .mtu_set = mlx5_dev_set_mtu,
};
static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3f4f8fd..14f55ba 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -169,6 +169,7 @@ int priv_get_mtu(struct priv *, uint16_t *);
int priv_set_flags(struct priv *, unsigned int, unsigned int);
int mlx5_dev_configure(struct rte_eth_dev *);
void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
+int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6b13cec..4621282 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -347,6 +347,23 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
}
/**
+ * Set device MTU.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param mtu
+ * MTU value to set.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+static int
+priv_set_mtu(struct priv *priv, uint16_t mtu)
+{
+ return priv_set_sysfs_ulong(priv, "mtu", mtu);
+}
+
+/**
* Set device flags.
*
* @param priv
@@ -518,6 +535,90 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
}
/**
+ * DPDK callback to change the MTU.
+ *
+ * Setting the MTU affects hardware MRU (packets larger than the MTU cannot be
+ * received). Use this as a hint to enable/disable scattered packets support
+ * and improve performance when not needed.
+ * Since failure is not an option, reconfiguring queues on the fly is not
+ * recommended.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param mtu
+ * New MTU.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret = 0;
+ unsigned int i;
+ uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
+ mlx5_rx_burst;
+
+ priv_lock(priv);
+ /* Set kernel interface MTU first. */
+ if (priv_set_mtu(priv, mtu)) {
+ ret = errno;
+ WARN("cannot set port %u MTU to %u: %s", priv->port, mtu,
+ strerror(ret));
+ goto out;
+ } else
+ DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
+ priv->mtu = mtu;
+ /* Temporarily replace RX handler with a fake one, assuming it has not
+ * been copied elsewhere. */
+ dev->rx_pkt_burst = removed_rx_burst;
+ /* Make sure everyone has left mlx5_rx_burst() and uses
+ * removed_rx_burst() instead. */
+ rte_wmb();
+ usleep(1000);
+ /* Reconfigure each RX queue. */
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ struct rxq *rxq = (*priv->rxqs)[i];
+ unsigned int max_frame_len;
+ int sp;
+
+ if (rxq == NULL)
+ continue;
+ /* Calculate new maximum frame length according to MTU and
+ * toggle scattered support (sp) if necessary. */
+ max_frame_len = (priv->mtu + ETHER_HDR_LEN +
+ (ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN));
+ sp = (max_frame_len > (rxq->mb_len - RTE_PKTMBUF_HEADROOM));
+ /* Provide new values to rxq_setup(). */
+ dev->data->dev_conf.rxmode.jumbo_frame = sp;
+ dev->data->dev_conf.rxmode.max_rx_pkt_len = max_frame_len;
+ ret = rxq_rehash(dev, rxq);
+ if (ret) {
+ /* Force SP RX if that queue requires it and abort. */
+ if (rxq->sp)
+ rx_func = mlx5_rx_burst_sp;
+ break;
+ }
+ /* Reenable non-RSS queue attributes. No need to check
+ * for errors at this stage. */
+ if (!priv->rss) {
+ rxq_mac_addrs_add(rxq);
+ }
+ /* Scattered burst function takes priority. */
+ if (rxq->sp)
+ rx_func = mlx5_rx_burst_sp;
+ }
+ /* Burst functions can now be called again. */
+ rte_wmb();
+ dev->rx_pkt_burst = rx_func;
+out:
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
+
+/**
* Get PCI information from struct ibv_device.
*
* @param device
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 1eddfc7..401b575 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -526,6 +526,187 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
#endif /* RSS_SUPPORT */
/**
+ * Reconfigure a RX queue with new parameters.
+ *
+ * rxq_rehash() reuses the queue's mbufs instead of allocating new ones, as
+ * allocating from the wrong thread (e.g. a control one) may corrupt the pool.
+ * In case of failure, the queue is left untouched.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param rxq
+ * RX queue pointer.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
+{
+ struct priv *priv = rxq->priv;
+ struct rxq tmpl = *rxq;
+ unsigned int mbuf_n;
+ unsigned int desc_n;
+ struct rte_mbuf **pool;
+ unsigned int i, k;
+ struct ibv_exp_qp_attr mod;
+ struct ibv_recv_wr *bad_wr;
+ int err;
+ int parent = (rxq == &priv->rxq_parent);
+
+ if (parent) {
+ ERROR("%p: cannot rehash parent queue %p",
+ (void *)dev, (void *)rxq);
+ return EINVAL;
+ }
+ DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
+ /* Number of descriptors and mbufs currently allocated. */
+ desc_n = (tmpl.elts_n * (tmpl.sp ? MLX5_PMD_SGE_WR_N : 1));
+ mbuf_n = desc_n;
+ /* Enable scattered packets support for this queue if necessary. */
+ if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
+ (dev->data->dev_conf.rxmode.max_rx_pkt_len >
+ (tmpl.mb_len - RTE_PKTMBUF_HEADROOM))) {
+ tmpl.sp = 1;
+ desc_n /= MLX5_PMD_SGE_WR_N;
+ } else
+ tmpl.sp = 0;
+ DEBUG("%p: %s scattered packets support (%u WRs)",
+ (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc_n);
+ /* If scatter mode is the same as before, nothing to do. */
+ if (tmpl.sp == rxq->sp) {
+ DEBUG("%p: nothing to do", (void *)dev);
+ return 0;
+ }
+ /* Remove attached flows if RSS is disabled (no parent queue). */
+ if (!priv->rss) {
+ rxq_mac_addrs_del(&tmpl);
+ /* Update original queue in case of failure. */
+ memcpy(rxq->mac_configured, tmpl.mac_configured,
+ sizeof(rxq->mac_configured));
+ memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+ }
+ /* From now on, any failure will render the queue unusable.
+ * Reinitialize QP. */
+ mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
+ err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+ if (err) {
+ ERROR("%p: cannot reset QP: %s", (void *)dev, strerror(err));
+ assert(err > 0);
+ return err;
+ }
+ err = ibv_resize_cq(tmpl.cq, desc_n);
+ if (err) {
+ ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
+ assert(err > 0);
+ return err;
+ }
+ mod = (struct ibv_exp_qp_attr){
+ /* Move the QP to this state. */
+ .qp_state = IBV_QPS_INIT,
+ /* Primary port number. */
+ .port_num = priv->port
+ };
+ err = ibv_exp_modify_qp(tmpl.qp, &mod,
+ (IBV_EXP_QP_STATE |
+#ifdef RSS_SUPPORT
+ (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
+#endif /* RSS_SUPPORT */
+ IBV_EXP_QP_PORT));
+ if (err) {
+ ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+ (void *)dev, strerror(err));
+ assert(err > 0);
+ return err;
+ }
+ /* Reconfigure flows. Do not care for errors. */
+ if (!priv->rss) {
+ rxq_mac_addrs_add(&tmpl);
+ /* Update original queue in case of failure. */
+ memcpy(rxq->mac_configured, tmpl.mac_configured,
+ sizeof(rxq->mac_configured));
+ memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+ }
+ /* Allocate pool. */
+ pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
+ if (pool == NULL) {
+ ERROR("%p: cannot allocate memory", (void *)dev);
+ return ENOBUFS;
+ }
+ /* Snatch mbufs from original queue. */
+ k = 0;
+ if (rxq->sp) {
+ struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
+
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt_sp *elt = &(*elts)[i];
+ unsigned int j;
+
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ assert(elt->bufs[j] != NULL);
+ pool[k++] = elt->bufs[j];
+ }
+ }
+ } else {
+ struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct rte_mbuf *buf = (void *)
+ ((uintptr_t)elt->sge.addr -
+ WR_ID(elt->wr.wr_id).offset);
+
+ assert(WR_ID(elt->wr.wr_id).id == i);
+ pool[k++] = buf;
+ }
+ }
+ assert(k == mbuf_n);
+ tmpl.elts_n = 0;
+ tmpl.elts.sp = NULL;
+ assert((void *)&tmpl.elts.sp == (void *)&tmpl.elts.no_sp);
+ err = ((tmpl.sp) ?
+ rxq_alloc_elts_sp(&tmpl, desc_n, pool) :
+ rxq_alloc_elts(&tmpl, desc_n, pool));
+ if (err) {
+ ERROR("%p: cannot reallocate WRs, aborting", (void *)dev);
+ rte_free(pool);
+ assert(err > 0);
+ return err;
+ }
+ assert(tmpl.elts_n == desc_n);
+ assert(tmpl.elts.sp != NULL);
+ rte_free(pool);
+ /* Clean up original data. */
+ rxq->elts_n = 0;
+ rte_free(rxq->elts.sp);
+ rxq->elts.sp = NULL;
+ /* Post WRs. */
+ err = ibv_post_recv(tmpl.qp,
+ (tmpl.sp ?
+ &(*tmpl.elts.sp)[0].wr :
+ &(*tmpl.elts.no_sp)[0].wr),
+ &bad_wr);
+ if (err) {
+ ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+ (void *)dev,
+ (void *)bad_wr,
+ strerror(err));
+ goto skip_rtr;
+ }
+ mod = (struct ibv_exp_qp_attr){
+ .qp_state = IBV_QPS_RTR
+ };
+ err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+ if (err)
+ ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+ (void *)dev, strerror(err));
+skip_rtr:
+ *rxq = tmpl;
+ assert(err >= 0);
+ return err;
+}
+
+/**
* Configure a RX queue.
*
* @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 44170d7..0a2e650 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -147,6 +147,7 @@ struct txq {
/* mlx5_rxq.c */
void rxq_cleanup(struct rxq *);
+int rxq_rehash(struct rte_eth_dev *, struct rxq *);
int rxq_setup(struct rte_eth_dev *, struct rxq *, uint16_t, unsigned int,
const struct rte_eth_rxconf *, struct rte_mempool *);
int mlx5_rx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
--
2.1.0
* [PATCH 07/13] mlx5: add software counters and related callbacks
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (5 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 06/13] mlx5: add MTU configuration support Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 08/13] mlx5: add promiscuous and allmulticast RX modes Adrien Mazarguil
` (6 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
Hardware counters are not supported yet.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
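Note: the aggregation done by mlx5_stats_get() and the index-preserving
reset done by mlx5_stats_reset() can be pictured with the standalone sketch
below. It is an illustration only. struct q_stats, struct dev_stats,
stats_collect() and stats_clear() are hypothetical names modeled on the
structures in this patch, and QUEUE_STAT_CNTRS is an assumed stand-in for
RTE_ETHDEV_QUEUE_STAT_CNTRS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_STAT_CNTRS 16 /* stands in for RTE_ETHDEV_QUEUE_STAT_CNTRS */

/* Hypothetical per-queue counters, modeled on struct mlx5_rxq_stats. */
struct q_stats {
	unsigned int idx; /* Mapping index into the per-queue arrays. */
	uint64_t packets;
	uint64_t bytes;
	uint64_t dropped;
};

/* Hypothetical device-level totals, modeled on struct rte_eth_stats. */
struct dev_stats {
	uint64_t packets;
	uint64_t bytes;
	uint64_t errors;
	uint64_t q_packets[QUEUE_STAT_CNTRS];
	uint64_t q_bytes[QUEUE_STAT_CNTRS];
	uint64_t q_errors[QUEUE_STAT_CNTRS];
};

/* Sum software counters of all configured queues. Unconfigured (NULL)
 * slots are skipped; queues whose index exceeds the per-queue array size
 * still contribute to the device totals. */
static void
stats_collect(struct q_stats *queues[], unsigned int n, struct dev_stats *out)
{
	struct dev_stats tmp = {0};
	unsigned int i;

	for (i = 0; i != n; ++i) {
		const struct q_stats *q = queues[i];

		if (q == NULL)
			continue;
		if (q->idx < QUEUE_STAT_CNTRS) {
			tmp.q_packets[q->idx] += q->packets;
			tmp.q_bytes[q->idx] += q->bytes;
			tmp.q_errors[q->idx] += q->dropped;
		}
		tmp.packets += q->packets;
		tmp.bytes += q->bytes;
		tmp.errors += q->dropped;
	}
	*out = tmp;
}

/* Clear all counters of a queue while keeping its mapping index. */
static void
stats_clear(struct q_stats *q)
{
	unsigned int idx = q->idx;

	*q = (struct q_stats){ .idx = idx };
}
```

Building the result in a temporary and copying it out at the end, as the
patch does, means the caller never observes a partially filled structure.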
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5.h | 5 ++
drivers/net/mlx5/mlx5_defs.h | 8 +++
drivers/net/mlx5/mlx5_rxq.c | 1 +
drivers/net/mlx5/mlx5_rxtx.c | 43 +++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 21 ++++++
drivers/net/mlx5/mlx5_stats.c | 144 ++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_txq.c | 1 +
9 files changed, 226 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_stats.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 028c22c..88b361c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -48,6 +48,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
# Dependencies.
DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 5bef742..262b458 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -133,6 +133,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.dev_start = mlx5_dev_start,
.dev_stop = mlx5_dev_stop,
.dev_close = mlx5_dev_close,
+ .stats_get = mlx5_stats_get,
+ .stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
.rx_queue_setup = mlx5_rx_queue_setup,
.tx_queue_setup = mlx5_tx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 14f55ba..261593e 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -184,6 +184,11 @@ int priv_mac_addr_add(struct priv *, unsigned int,
void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
uint32_t);
+/* mlx5_stats.c */
+
+void mlx5_stats_get(struct rte_eth_dev *, struct rte_eth_stats *);
+void mlx5_stats_reset(struct rte_eth_dev *);
+
/* mlx5_trigger.c */
int mlx5_dev_start(struct rte_eth_dev *);
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 4f13a4e..79de609 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -74,4 +74,12 @@
#define MLX5_PMD_TX_MP_CACHE 8
#endif
+/*
+ * If defined, only use software counters. The PMD will never ask the hardware
+ * for these, and many of them won't be available.
+ */
+#ifndef MLX5_PMD_SOFT_COUNTERS
+#define MLX5_PMD_SOFT_COUNTERS 1
+#endif
+
#endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 401b575..a7d5081 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -996,6 +996,7 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
if (ret)
rte_free(rxq);
else {
+ rxq->stats.idx = idx;
DEBUG("%p: adding RX queue %p to list",
(void *)dev, (void *)rxq);
(*priv->rxqs)[idx] = rxq;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5042011..960a3e5 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -367,6 +367,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
struct txq_elt *elt = &(*txq->elts)[elts_head];
unsigned int segs = NB_SEGS(buf);
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ unsigned int sent_size = 0;
+#endif
uint32_t send_flags = 0;
/* Clean up old buffer. */
@@ -429,6 +432,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
send_flags);
if (unlikely(err))
goto stop;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ sent_size += length;
+#endif
} else {
#if MLX5_PMD_SGE_WR_N > 1
struct ibv_sge sges[MLX5_PMD_SGE_WR_N];
@@ -447,6 +453,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
send_flags);
if (unlikely(err))
goto stop;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ sent_size += ret.length;
+#endif
#else /* MLX5_PMD_SGE_WR_N > 1 */
DEBUG("%p: TX scattered buffers support not"
" compiled in", (void *)txq);
@@ -454,11 +463,19 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
#endif /* MLX5_PMD_SGE_WR_N > 1 */
}
elts_head = elts_head_next;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment sent bytes counter. */
+ txq->stats.obytes += sent_size;
+#endif
}
stop:
/* Take a shortcut if nothing must be sent. */
if (unlikely(i == 0))
return 0;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment sent packets counter. */
+ txq->stats.opackets += i;
+#endif
/* Ring QP doorbell. */
err = txq->if_qp->send_flush(txq->qp);
if (unlikely(err)) {
@@ -549,6 +566,10 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
" completion status (%d): %s",
(void *)rxq, wc.wr_id, wc.status,
ibv_wc_status_str(wc.status));
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment dropped packets counter. */
+ ++rxq->stats.idropped;
+#endif
/* Link completed WRs together for repost. */
*next = wr;
next = &wr->next;
@@ -592,6 +613,7 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
rte_pktmbuf_free(pkt_buf);
}
/* Increment out of memory counters. */
+ ++rxq->stats.rx_nombuf;
++rxq->priv->dev->data->rx_mbuf_alloc_failed;
goto repost;
}
@@ -651,6 +673,10 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Return packet. */
*(pkts++) = pkt_buf;
++pkts_ret;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment bytes counter. */
+ rxq->stats.ibytes += pkt_buf_len;
+#endif
repost:
if (++elts_head >= elts_n)
elts_head = 0;
@@ -673,6 +699,10 @@ repost:
abort();
}
rxq->elts_head = elts_head;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment packets counter. */
+ rxq->stats.ipackets += pkts_ret;
+#endif
return pkts_ret;
}
@@ -747,6 +777,10 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
" completion status (%d): %s",
(void *)rxq, wc.wr_id, wc.status,
ibv_wc_status_str(wc.status));
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment dropped packets counter. */
+ ++rxq->stats.idropped;
+#endif
/* Add SGE to array for repost. */
sges[i] = elt->sge;
goto repost;
@@ -771,6 +805,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
" can't allocate a new mbuf",
(void *)rxq, WR_ID(wr_id).id);
/* Increment out of memory counters. */
+ ++rxq->stats.rx_nombuf;
++rxq->priv->dev->data->rx_mbuf_alloc_failed;
goto repost;
}
@@ -797,6 +832,10 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Return packet. */
*(pkts++) = seg;
++pkts_ret;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment bytes counter. */
+ rxq->stats.ibytes += len;
+#endif
repost:
if (++elts_head >= elts_n)
elts_head = 0;
@@ -817,6 +856,10 @@ repost:
abort();
}
rxq->elts_head = elts_head;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment packets counter. */
+ rxq->stats.ipackets += pkts_ret;
+#endif
return pkts_ret;
}
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 0a2e650..b37843b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -60,6 +60,25 @@
#include "mlx5.h"
#include "mlx5_defs.h"
+struct mlx5_rxq_stats {
+ unsigned int idx; /**< Mapping index. */
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ uint64_t ipackets; /**< Total of successfully received packets. */
+ uint64_t ibytes; /**< Total of successfully received bytes. */
+#endif
+ uint64_t idropped; /**< Total of packets dropped when RX ring full. */
+ uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
+};
+
+struct mlx5_txq_stats {
+ unsigned int idx; /**< Mapping index. */
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ uint64_t opackets; /**< Total of successfully sent packets. */
+ uint64_t obytes; /**< Total of successfully sent bytes. */
+#endif
+ uint64_t odropped; /**< Total of packets not sent when TX ring full. */
+};
+
/* RX element (scattered packets). */
struct rxq_elt_sp {
struct ibv_recv_wr wr; /* Work Request. */
@@ -99,6 +118,7 @@ struct rxq {
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
+ struct mlx5_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
struct ibv_exp_res_domain *rd; /* Resource Domain. */
};
@@ -138,6 +158,7 @@ struct txq {
unsigned int elts_comp; /* Number of completion requests. */
unsigned int elts_comp_cd; /* Countdown for next completion request. */
unsigned int elts_comp_cd_init; /* Initial value for countdown. */
+ struct mlx5_txq_stats stats; /* TX queue counters. */
linear_t (*elts_linear)[]; /* Linearized buffers. */
struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
unsigned int socket; /* CPU socket ID for allocations. */
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
new file mode 100644
index 0000000..a51e945
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -0,0 +1,144 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_defs.h"
+
+/**
+ * DPDK callback to get device statistics.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[out] stats
+ * Stats structure output buffer.
+ */
+void
+mlx5_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct rte_eth_stats tmp = {0};
+ unsigned int i;
+ unsigned int idx;
+
+ priv_lock(priv);
+ /* Add software counters. */
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ struct rxq *rxq = (*priv->rxqs)[i];
+
+ if (rxq == NULL)
+ continue;
+ idx = rxq->stats.idx;
+ if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.q_ipackets[idx] += rxq->stats.ipackets;
+ tmp.q_ibytes[idx] += rxq->stats.ibytes;
+#endif
+ tmp.q_errors[idx] += (rxq->stats.idropped +
+ rxq->stats.rx_nombuf);
+ }
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.ipackets += rxq->stats.ipackets;
+ tmp.ibytes += rxq->stats.ibytes;
+#endif
+ tmp.ierrors += rxq->stats.idropped;
+ tmp.rx_nombuf += rxq->stats.rx_nombuf;
+ }
+ for (i = 0; (i != priv->txqs_n); ++i) {
+ struct txq *txq = (*priv->txqs)[i];
+
+ if (txq == NULL)
+ continue;
+ idx = txq->stats.idx;
+ if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.q_opackets[idx] += txq->stats.opackets;
+ tmp.q_obytes[idx] += txq->stats.obytes;
+#endif
+ tmp.q_errors[idx] += txq->stats.odropped;
+ }
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.opackets += txq->stats.opackets;
+ tmp.obytes += txq->stats.obytes;
+#endif
+ tmp.oerrors += txq->stats.odropped;
+ }
+#ifndef MLX5_PMD_SOFT_COUNTERS
+ /* FIXME: retrieve and add hardware counters. */
+#endif
+ *stats = tmp;
+ priv_unlock(priv);
+}
+
+/**
+ * DPDK callback to clear device statistics.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_stats_reset(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ unsigned int idx;
+
+ priv_lock(priv);
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ idx = (*priv->rxqs)[i]->stats.idx;
+ (*priv->rxqs)[i]->stats =
+ (struct mlx5_rxq_stats){ .idx = idx };
+ }
+ for (i = 0; (i != priv->txqs_n); ++i) {
+ if ((*priv->txqs)[i] == NULL)
+ continue;
+ idx = (*priv->txqs)[i]->stats.idx;
+ (*priv->txqs)[i]->stats =
+ (struct mlx5_txq_stats){ .idx = idx };
+ }
+#ifndef MLX5_PMD_SOFT_COUNTERS
+ /* FIXME: reset hardware counters. */
+#endif
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 2bae61f..a53b128 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -472,6 +472,7 @@ mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
if (ret)
rte_free(txq);
else {
+ txq->stats.idx = idx;
DEBUG("%p: adding TX queue %p to list",
(void *)dev, (void *)txq);
(*priv->txqs)[idx] = txq;
--
2.1.0
* [PATCH 08/13] mlx5: add promiscuous and allmulticast RX modes
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (6 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 07/13] mlx5: add software counters and related callbacks Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 09/13] mlx5: add link update device operation Adrien Mazarguil
` (5 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
These modes require special non-MAC flows.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
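Note: when RSS is disabled, mlx5_promiscuous_enable() attaches one flow per
RX queue and undoes the queues already processed if any of them fails. The
standalone sketch below illustrates that all-or-nothing pattern only. struct
queue, queue_enable(), queue_disable() and enable_all() are hypothetical
stand-ins that do not call Verbs, and unlike the DPDK callback (which returns
void) enable_all() reports the error to its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rxq; only what the sketch needs. */
struct queue {
	int flow_on; /* Nonzero once the catch-all flow is attached. */
	int fail;    /* Test knob: make queue_enable() fail for this queue. */
};

/* Attach the flow to one queue. Returns 0 or a positive errno value. */
static int
queue_enable(struct queue *q)
{
	if (q->flow_on)
		return EBUSY;
	if (q->fail)
		return EINVAL;
	q->flow_on = 1;
	return 0;
}

/* Detach the flow from one queue; harmless if it was never attached. */
static void
queue_disable(struct queue *q)
{
	q->flow_on = 0;
}

/* Enable the mode on every configured queue, or on none of them.
 * Returns 0 on success, a positive errno value on failure. */
static int
enable_all(struct queue *queues[], unsigned int n)
{
	unsigned int i;

	for (i = 0; i != n; ++i) {
		int ret;

		if (queues[i] == NULL)
			continue;
		ret = queue_enable(queues[i]);
		if (!ret)
			continue;
		/* Failure: roll back the queues already processed. */
		while (i != 0)
			if (queues[--i] != NULL)
				queue_disable(queues[i]);
		return ret;
	}
	return 0;
}
```

The device-level flag (priv->promisc in the patch) is only set once every
queue has succeeded, so a failed attempt leaves the port in its previous
state.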
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 4 +
drivers/net/mlx5/mlx5.h | 11 ++
drivers/net/mlx5/mlx5_ethdev.c | 4 +
drivers/net/mlx5/mlx5_rxmode.c | 327 ++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxq.c | 12 ++
drivers/net/mlx5/mlx5_rxtx.h | 2 +
drivers/net/mlx5/mlx5_trigger.c | 8 +
8 files changed, 369 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_rxmode.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 88b361c..4d25d9c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -48,6 +48,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxmode.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
# Dependencies.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 262b458..7e60d6a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -133,6 +133,10 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.dev_start = mlx5_dev_start,
.dev_stop = mlx5_dev_stop,
.dev_close = mlx5_dev_close,
+ .promiscuous_enable = mlx5_promiscuous_enable,
+ .promiscuous_disable = mlx5_promiscuous_disable,
+ .allmulticast_enable = mlx5_allmulticast_enable,
+ .allmulticast_disable = mlx5_allmulticast_disable,
.stats_get = mlx5_stats_get,
.stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 261593e..40dc1f9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -184,6 +184,17 @@ int priv_mac_addr_add(struct priv *, unsigned int,
void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
uint32_t);
+/* mlx5_rxmode.c */
+
+int rxq_promiscuous_enable(struct rxq *);
+void mlx5_promiscuous_enable(struct rte_eth_dev *);
+void rxq_promiscuous_disable(struct rxq *);
+void mlx5_promiscuous_disable(struct rte_eth_dev *);
+int rxq_allmulticast_enable(struct rxq *);
+void mlx5_allmulticast_enable(struct rte_eth_dev *);
+void rxq_allmulticast_disable(struct rxq *);
+void mlx5_allmulticast_disable(struct rte_eth_dev *);
+
/* mlx5_stats.c */
void mlx5_stats_get(struct rte_eth_dev *, struct rte_eth_stats *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 4621282..0e95d26 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -604,6 +604,10 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
* for errors at this stage. */
if (!priv->rss) {
rxq_mac_addrs_add(rxq);
+ if (priv->promisc)
+ rxq_promiscuous_enable(rxq);
+ if (priv->allmulti)
+ rxq_allmulticast_enable(rxq);
}
/* Scattered burst function takes priority. */
if (rxq->sp)
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
new file mode 100644
index 0000000..b4e5493
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -0,0 +1,327 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <errno.h>
+#include <string.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_utils.h"
+
+/**
+ * Enable promiscuous mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_promiscuous_enable(struct rxq *rxq)
+{
+ struct ibv_flow *flow;
+ struct ibv_flow_attr attr = {
+ .type = IBV_FLOW_ATTR_ALL_DEFAULT,
+ .num_of_specs = 0,
+ .port = rxq->priv->port,
+ .flags = 0
+ };
+
+ if (rxq->priv->vf)
+ return 0;
+ DEBUG("%p: enabling promiscuous mode", (void *)rxq);
+ if (rxq->promisc_flow != NULL)
+ return EBUSY;
+ errno = 0;
+ flow = ibv_create_flow(rxq->qp, &attr);
+ if (flow == NULL) {
+ /* It's not clear whether errno is always set in this case. */
+ ERROR("%p: flow configuration failed, errno=%d: %s",
+ (void *)rxq, errno,
+ (errno ? strerror(errno) : "Unknown error"));
+ if (errno)
+ return errno;
+ return EINVAL;
+ }
+ rxq->promisc_flow = flow;
+ DEBUG("%p: promiscuous mode enabled", (void *)rxq);
+ return 0;
+}
+
+/**
+ * DPDK callback to enable promiscuous mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_promiscuous_enable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ int ret;
+
+ priv_lock(priv);
+ if (priv->promisc) {
+ priv_unlock(priv);
+ return;
+ }
+ /* If device isn't started, this is all we need to do. */
+ if (!priv->started)
+ goto end;
+ if (priv->rss) {
+ ret = rxq_promiscuous_enable(&priv->rxq_parent);
+ if (ret) {
+ priv_unlock(priv);
+ return;
+ }
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ ret = rxq_promiscuous_enable((*priv->rxqs)[i]);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ if ((*priv->rxqs)[--i] != NULL)
+ rxq_promiscuous_disable((*priv->rxqs)[i]);
+ priv_unlock(priv);
+ return;
+ }
+end:
+ priv->promisc = 1;
+ priv_unlock(priv);
+}
+
+/**
+ * Disable promiscuous mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_promiscuous_disable(struct rxq *rxq)
+{
+ if (rxq->priv->vf)
+ return;
+ DEBUG("%p: disabling promiscuous mode", (void *)rxq);
+ if (rxq->promisc_flow == NULL)
+ return;
+ claim_zero(ibv_destroy_flow(rxq->promisc_flow));
+ rxq->promisc_flow = NULL;
+ DEBUG("%p: promiscuous mode disabled", (void *)rxq);
+}
+
+/**
+ * DPDK callback to disable promiscuous mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_promiscuous_disable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+
+ priv_lock(priv);
+ if (!priv->promisc) {
+ priv_unlock(priv);
+ return;
+ }
+ if (priv->rss) {
+ rxq_promiscuous_disable(&priv->rxq_parent);
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ rxq_promiscuous_disable((*priv->rxqs)[i]);
+end:
+ priv->promisc = 0;
+ priv_unlock(priv);
+}
+
+/**
+ * Enable allmulti mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_allmulticast_enable(struct rxq *rxq)
+{
+ struct ibv_flow *flow;
+ struct ibv_flow_attr attr = {
+ .type = IBV_FLOW_ATTR_MC_DEFAULT,
+ .num_of_specs = 0,
+ .port = rxq->priv->port,
+ .flags = 0
+ };
+
+ DEBUG("%p: enabling allmulticast mode", (void *)rxq);
+ if (rxq->allmulti_flow != NULL)
+ return EBUSY;
+ errno = 0;
+ flow = ibv_create_flow(rxq->qp, &attr);
+ if (flow == NULL) {
+ /* It's not clear whether errno is always set in this case. */
+ ERROR("%p: flow configuration failed, errno=%d: %s",
+ (void *)rxq, errno,
+ (errno ? strerror(errno) : "Unknown error"));
+ if (errno)
+ return errno;
+ return EINVAL;
+ }
+ rxq->allmulti_flow = flow;
+ DEBUG("%p: allmulticast mode enabled", (void *)rxq);
+ return 0;
+}
+
+/**
+ * DPDK callback to enable allmulti mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_allmulticast_enable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ int ret;
+
+ priv_lock(priv);
+ if (priv->allmulti) {
+ priv_unlock(priv);
+ return;
+ }
+ /* If device isn't started, this is all we need to do. */
+ if (!priv->started)
+ goto end;
+ if (priv->rss) {
+ ret = rxq_allmulticast_enable(&priv->rxq_parent);
+ if (ret) {
+ priv_unlock(priv);
+ return;
+ }
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ ret = rxq_allmulticast_enable((*priv->rxqs)[i]);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ if ((*priv->rxqs)[--i] != NULL)
+ rxq_allmulticast_disable((*priv->rxqs)[i]);
+ priv_unlock(priv);
+ return;
+ }
+end:
+ priv->allmulti = 1;
+ priv_unlock(priv);
+}
+
+/**
+ * Disable allmulti mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_allmulticast_disable(struct rxq *rxq)
+{
+ DEBUG("%p: disabling allmulticast mode", (void *)rxq);
+ if (rxq->allmulti_flow == NULL)
+ return;
+ claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
+ rxq->allmulti_flow = NULL;
+ DEBUG("%p: allmulticast mode disabled", (void *)rxq);
+}
+
+/**
+ * DPDK callback to disable allmulti mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_allmulticast_disable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+
+ priv_lock(priv);
+ if (!priv->allmulti) {
+ priv_unlock(priv);
+ return;
+ }
+ if (priv->rss) {
+ rxq_allmulticast_disable(&priv->rxq_parent);
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ rxq_allmulticast_disable((*priv->rxqs)[i]);
+end:
+ priv->allmulti = 0;
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a7d5081..d44bb10 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -398,6 +398,8 @@ rxq_cleanup(struct rxq *rxq)
 &params));
}
if (rxq->qp != NULL) {
+ rxq_promiscuous_disable(rxq);
+ rxq_allmulticast_disable(rxq);
rxq_mac_addrs_del(rxq);
claim_zero(ibv_destroy_qp(rxq->qp));
}
@@ -580,8 +582,12 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
}
/* Remove attached flows if RSS is disabled (no parent queue). */
if (!priv->rss) {
+ rxq_allmulticast_disable(&tmpl);
+ rxq_promiscuous_disable(&tmpl);
rxq_mac_addrs_del(&tmpl);
/* Update original queue in case of failure. */
+ rxq->allmulti_flow = tmpl.allmulti_flow;
+ rxq->promisc_flow = tmpl.promisc_flow;
memcpy(rxq->mac_configured, tmpl.mac_configured,
sizeof(rxq->mac_configured));
memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
@@ -622,7 +628,13 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
/* Reconfigure flows. Do not care for errors. */
if (!priv->rss) {
rxq_mac_addrs_add(&tmpl);
+ if (priv->promisc)
+ rxq_promiscuous_enable(&tmpl);
+ if (priv->allmulti)
+ rxq_allmulticast_enable(&tmpl);
/* Update original queue in case of failure. */
+ rxq->allmulti_flow = tmpl.allmulti_flow;
+ rxq->promisc_flow = tmpl.promisc_flow;
memcpy(rxq->mac_configured, tmpl.mac_configured,
sizeof(rxq->mac_configured));
memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index b37843b..228dff6 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -109,6 +109,8 @@ struct rxq {
*/
BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
+ struct ibv_flow *promisc_flow; /* Promiscuous flow. */
+ struct ibv_flow *allmulti_flow; /* Allmulticast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index bcec957..fbc977c 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -86,6 +86,10 @@ mlx5_dev_start(struct rte_eth_dev *dev)
if (rxq == NULL)
continue;
ret = rxq_mac_addrs_add(rxq);
+ if (!ret && priv->promisc)
+ ret = rxq_promiscuous_enable(rxq);
+ if (!ret && priv->allmulti)
+ ret = rxq_allmulticast_enable(rxq);
if (!ret)
continue;
WARN("%p: QP flow attachment failed: %s",
@@ -94,6 +98,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
while (i != 0) {
rxq = (*priv->rxqs)[--i];
if (rxq != NULL) {
+ rxq_allmulticast_disable(rxq);
+ rxq_promiscuous_disable(rxq);
rxq_mac_addrs_del(rxq);
}
}
@@ -139,6 +145,8 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
/* Ignore nonexistent RX queues. */
if (rxq == NULL)
continue;
+ rxq_allmulticast_disable(rxq);
+ rxq_promiscuous_disable(rxq);
rxq_mac_addrs_del(rxq);
} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
priv_unlock(priv);
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 09/13] mlx5: add link update device operation
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (7 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 08/13] mlx5: add promiscuous and allmulticast RX modes Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 10/13] mlx5: add flow control device operations Adrien Mazarguil
` (4 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
Link information is retrieved using ethtool ioctls.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_ethdev.c | 71 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7e60d6a..cb66186 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -137,6 +137,7 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.promiscuous_disable = mlx5_promiscuous_disable,
.allmulticast_enable = mlx5_allmulticast_enable,
.allmulticast_disable = mlx5_allmulticast_disable,
+ .link_update = mlx5_link_update,
.stats_get = mlx5_stats_get,
.stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 40dc1f9..3078f13 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -169,6 +169,7 @@ int priv_get_mtu(struct priv *, uint16_t *);
int priv_set_flags(struct priv *, unsigned int, unsigned int);
int mlx5_dev_configure(struct rte_eth_dev *);
void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
+int mlx5_link_update(struct rte_eth_dev *, int);
int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 0e95d26..a665725 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -45,6 +45,8 @@
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/if.h>
+#include <linux/ethtool.h>
+#include <linux/sockios.h>
/* DPDK headers don't like -pedantic. */
#ifdef PEDANTIC
@@ -535,6 +537,75 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
}
/**
+ * DPDK callback to retrieve physical link information (unlocked version).
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param wait_to_complete
+ * Wait for request completion (ignored).
+ */
+static int
+mlx5_link_update_unlocked(struct rte_eth_dev *dev, int wait_to_complete)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct ethtool_cmd edata = {
+ .cmd = ETHTOOL_GSET
+ };
+ struct ifreq ifr;
+ struct rte_eth_link dev_link;
+ int link_speed = 0;
+
+ (void)wait_to_complete;
+ if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
+ WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
+ return -1;
+ }
+ memset(&dev_link, 0, sizeof(dev_link));
+ dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
+ (ifr.ifr_flags & IFF_RUNNING));
+ ifr.ifr_data = &edata;
+ if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+ WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
+ strerror(errno));
+ return -1;
+ }
+ link_speed = ethtool_cmd_speed(&edata);
+ if (link_speed == -1)
+ dev_link.link_speed = 0;
+ else
+ dev_link.link_speed = link_speed;
+ dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
+ ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
+ if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) {
+ /* Link status changed. */
+ dev->data->dev_link = dev_link;
+ return 0;
+ }
+ /* Link status is still the same. */
+ return -1;
+}
+
+/**
+ * DPDK callback to retrieve physical link information.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param wait_to_complete
+ * Wait for request completion (ignored).
+ */
+int
+mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret;
+
+ priv_lock(priv);
+ ret = mlx5_link_update_unlocked(dev, wait_to_complete);
+ priv_unlock(priv);
+ return ret;
+}
+
+/**
* DPDK callback to change the MTU.
*
* Setting the MTU affects hardware MRU (packets larger than the MTU cannot be
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 10/13] mlx5: add flow control device operations
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (8 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 09/13] mlx5: add link update device operation Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 11/13] mlx5: add VLAN filtering Adrien Mazarguil
` (3 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
Like most other device control operations, these are handled by the related
kernel network device through syscalls.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
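The translation between ethtool pause parameters and flow control modes done in
the diff below can be sketched as a pair of pure functions. This is only an
illustration under stated assumptions: enum fc_mode and struct pause_bits are
hypothetical stand-ins for the DPDK RTE_FC_* modes and for the rx_pause and
tx_pause fields of struct ethtool_pauseparam, and the numeric values are chosen
here so that "full" equals RX combined with TX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the RTE_FC_* modes; values are an assumption. */
enum fc_mode {
	FC_NONE = 0,
	FC_RX_PAUSE = 1,
	FC_TX_PAUSE = 2,
	FC_FULL = 3,
};

/* Hypothetical stand-in for the rx_pause/tx_pause ethtool fields. */
struct pause_bits {
	uint32_t rx_pause;
	uint32_t tx_pause;
};

/* "Get" direction: pause bits to flow control mode. */
static enum fc_mode
pause_to_mode(struct pause_bits p)
{
	if (p.rx_pause && p.tx_pause)
		return FC_FULL;
	if (p.rx_pause)
		return FC_RX_PAUSE;
	if (p.tx_pause)
		return FC_TX_PAUSE;
	return FC_NONE;
}

/* "Set" direction: flow control mode to pause bits. */
static struct pause_bits
mode_to_pause(enum fc_mode mode)
{
	struct pause_bits p;

	assert((unsigned int)mode <= FC_FULL);
	p.rx_pause = ((mode == FC_FULL) || (mode == FC_RX_PAUSE));
	p.tx_pause = ((mode == FC_FULL) || (mode == FC_TX_PAUSE));
	return p;
}
```

The two functions are inverses of each other for the four modes, which is the
property the get and set callbacks need in order to stay consistent.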
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5.h | 2 +
drivers/net/mlx5/mlx5_ethdev.c | 99 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 103 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index cb66186..b182d0b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -145,6 +145,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
.tx_queue_release = mlx5_tx_queue_release,
+ .flow_ctrl_get = mlx5_dev_get_flow_ctrl,
+ .flow_ctrl_set = mlx5_dev_set_flow_ctrl,
.mac_addr_remove = mlx5_mac_addr_remove,
.mac_addr_add = mlx5_mac_addr_add,
.mtu_set = mlx5_dev_set_mtu,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3078f13..1d488fb 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -171,6 +171,8 @@ int mlx5_dev_configure(struct rte_eth_dev *);
void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
int mlx5_link_update(struct rte_eth_dev *, int);
int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
+int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *, struct rte_eth_fc_conf *);
+int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *, struct rte_eth_fc_conf *);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index a665725..181a877 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -694,6 +694,105 @@ out:
}
/**
+ * DPDK callback to get flow control status.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[out] fc_conf
+ * Flow control output buffer.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct ifreq ifr;
+ struct ethtool_pauseparam ethpause = {
+ .cmd = ETHTOOL_GPAUSEPARAM
+ };
+ int ret;
+
+ ifr.ifr_data = &ethpause;
+ priv_lock(priv);
+ if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+ ret = errno;
+ WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
+ " failed: %s",
+ strerror(ret));
+ goto out;
+ }
+
+ fc_conf->autoneg = ethpause.autoneg;
+ if (ethpause.rx_pause && ethpause.tx_pause)
+ fc_conf->mode = RTE_FC_FULL;
+ else if (ethpause.rx_pause)
+ fc_conf->mode = RTE_FC_RX_PAUSE;
+ else if (ethpause.tx_pause)
+ fc_conf->mode = RTE_FC_TX_PAUSE;
+ else
+ fc_conf->mode = RTE_FC_NONE;
+ ret = 0;
+
+out:
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
+
+/**
+ * DPDK callback to modify flow control parameters.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[in] fc_conf
+ * Flow control parameters.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct ifreq ifr;
+ struct ethtool_pauseparam ethpause = {
+ .cmd = ETHTOOL_SPAUSEPARAM
+ };
+ int ret;
+
+ ifr.ifr_data = &ethpause;
+ ethpause.autoneg = fc_conf->autoneg;
+ if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+ (fc_conf->mode & RTE_FC_RX_PAUSE))
+ ethpause.rx_pause = 1;
+ else
+ ethpause.rx_pause = 0;
+
+ if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+ (fc_conf->mode & RTE_FC_TX_PAUSE))
+ ethpause.tx_pause = 1;
+ else
+ ethpause.tx_pause = 0;
+
+ priv_lock(priv);
+ if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+ ret = errno;
+ WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
+ " failed: %s",
+ strerror(ret));
+ goto out;
+ }
+ ret = 0;
+
+out:
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
+
+/**
* Get PCI information from struct ibv_device.
*
* @param device
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 11/13] mlx5: add VLAN filtering
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (9 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 10/13] mlx5: add flow control device operations Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 12/13] mlx5: add checksum offloading support Adrien Mazarguil
` (2 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
All MAC RX flows must be updated with VLAN information when configuring a
VLAN filter.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
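The slot lookup performed at the top of vlan_filter_set() in the diff below can
be sketched on its own. This is a simplified illustration, not the driver code:
struct vlan_slot and vlan_slot_find() are hypothetical names, and the table is
passed explicitly instead of living in the private device structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter entry; the driver keeps a similar table in priv. */
struct vlan_slot {
	unsigned int enabled:1;
	uint16_t id;
};

/*
 * Return the index to use for vlan_id: the enabled entry that already holds
 * this ID if there is one, otherwise an unused entry, otherwise -1 when the
 * table is full.
 */
static int
vlan_slot_find(const struct vlan_slot *tab, size_t n, uint16_t vlan_id)
{
	int unused = -1;
	size_t i;

	assert(tab != NULL);
	for (i = 0; i != n; ++i) {
		if (!tab[i].enabled) {
			/* Unused index, remember it. */
			unused = (int)i;
			continue;
		}
		if (tab[i].id == vlan_id)
			return (int)i;
	}
	return unused;
}
```

The patch reports the "table full" case as ENOMEM. Once a slot is chosen, the
MAC flows of every RX queue are removed and added again so that they pick up
the new VLAN configuration.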
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 4 ++
drivers/net/mlx5/mlx5_vlan.c | 166 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 172 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_vlan.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 4d25d9c..8b1e32b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxmode.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_vlan.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
# Dependencies.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index b182d0b..47070f8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -141,6 +141,7 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.stats_get = mlx5_stats_get,
.stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
+ .vlan_filter_set = mlx5_vlan_filter_set,
.rx_queue_setup = mlx5_rx_queue_setup,
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1d488fb..459dc3d 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -203,6 +203,10 @@ void mlx5_allmulticast_disable(struct rte_eth_dev *);
void mlx5_stats_get(struct rte_eth_dev *, struct rte_eth_stats *);
void mlx5_stats_reset(struct rte_eth_dev *);
+/* mlx5_vlan.c */
+
+int mlx5_vlan_filter_set(struct rte_eth_dev *, uint16_t, int);
+
/* mlx5_trigger.c */
int mlx5_dev_start(struct rte_eth_dev *);
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
new file mode 100644
index 0000000..60fe06b
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -0,0 +1,166 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <errno.h>
+#include <assert.h>
+#include <stdint.h>
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5.h"
+
+/**
+ * Configure a VLAN filter.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param vlan_id
+ * VLAN ID to filter.
+ * @param on
+ * Toggle filter.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ unsigned int j = -1;
+
+ DEBUG("%p: %s VLAN filter ID %" PRIu16,
+ (void *)dev, (on ? "enable" : "disable"), vlan_id);
+ for (i = 0; (i != RTE_DIM(priv->vlan_filter)); ++i) {
+ if (!priv->vlan_filter[i].enabled) {
+ /* Unused index, remember it. */
+ j = i;
+ continue;
+ }
+ if (priv->vlan_filter[i].id != vlan_id)
+ continue;
+ /* This VLAN ID is already known, use its index. */
+ j = i;
+ break;
+ }
+ /* Check if there's room for another VLAN filter. */
+ if (j == (unsigned int)-1)
+ return ENOMEM;
+ /*
+ * VLAN filters apply to all configured MAC addresses, flow
+ * specifications must be reconfigured accordingly.
+ */
+ priv->vlan_filter[j].id = vlan_id;
+ if ((on) && (!priv->vlan_filter[j].enabled)) {
+ /*
+ * Filter is disabled, enable it.
+ * Rehashing flows in all RX queues is necessary.
+ */
+ if (priv->rss)
+ rxq_mac_addrs_del(&priv->rxq_parent);
+ else
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ rxq_mac_addrs_del((*priv->rxqs)[i]);
+ priv->vlan_filter[j].enabled = 1;
+ if (priv->started) {
+ if (priv->rss)
+ rxq_mac_addrs_add(&priv->rxq_parent);
+ else
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ rxq_mac_addrs_add((*priv->rxqs)[i]);
+ }
+ }
+ } else if ((!on) && (priv->vlan_filter[j].enabled)) {
+ /*
+ * Filter is enabled, disable it.
+ * Rehashing flows in all RX queues is necessary.
+ */
+ if (priv->rss)
+ rxq_mac_addrs_del(&priv->rxq_parent);
+ else
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ rxq_mac_addrs_del((*priv->rxqs)[i]);
+ priv->vlan_filter[j].enabled = 0;
+ if (priv->started) {
+ if (priv->rss)
+ rxq_mac_addrs_add(&priv->rxq_parent);
+ else
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ rxq_mac_addrs_add((*priv->rxqs)[i]);
+ }
+ }
+ }
+ return 0;
+}
+
+/**
+ * DPDK callback to configure a VLAN filter.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param vlan_id
+ * VLAN ID to filter.
+ * @param on
+ * Toggle filter.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret;
+
+ priv_lock(priv);
+ ret = vlan_filter_set(dev, vlan_id, on);
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 12/13] mlx5: add checksum offloading support
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (10 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 11/13] mlx5: add VLAN filtering Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-05 17:53 ` [PATCH 13/13] doc: add mlx5 documentation and release notes for version 2.2 Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
This is the same implementation as mlx4.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
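The TRANSPOSE() macro added to mlx5_utils.h by this patch moves a single flag
bit from one position to another without branching. The sketch below shows how
it turns "checksum OK" completion bits into "checksum bad" offload bits by
transposing the inverted flags. The flag values are hypothetical and chosen
only for illustration; they are not the IBV_EXP_CQ_* or PKT_RX_* values. The
macro assumes both flags are single bits (powers of two).

```c
#include <assert.h>
#include <stdint.h>

/* Same macro as added to mlx5_utils.h by this patch. */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Hypothetical single-bit flag values, for illustration only. */
#define CQ_IP_CSUM_OK   UINT32_C(0x10)
#define CQ_L4_CSUM_OK   UINT32_C(0x01)
#define RX_IP_CKSUM_BAD UINT32_C(0x02)
#define RX_L4_CKSUM_BAD UINT32_C(0x08)

/* Set a "bad" bit for every "OK" bit that is missing from cq_flags. */
static uint32_t
csum_ol_flags(uint32_t cq_flags)
{
	/* TRANSPOSE() is only meaningful for single-bit flags. */
	assert((CQ_IP_CSUM_OK & (CQ_IP_CSUM_OK - 1)) == 0);
	assert((CQ_L4_CSUM_OK & (CQ_L4_CSUM_OK - 1)) == 0);
	return TRANSPOSE(~cq_flags, CQ_IP_CSUM_OK, RX_IP_CKSUM_BAD) |
	       TRANSPOSE(~cq_flags, CQ_L4_CSUM_OK, RX_L4_CKSUM_BAD);
}
```

Since both flags are constants, the compiler can reduce each TRANSPOSE() use to
a mask and a shift, which is why this form suits the RX burst path.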
drivers/net/mlx5/mlx5_rxq.c | 14 +++++++
drivers/net/mlx5/mlx5_rxtx.c | 94 +++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 2 +
drivers/net/mlx5/mlx5_utils.h | 6 +++
4 files changed, 116 insertions(+)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index d44bb10..8cfad17 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -565,6 +565,15 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
/* Number of descriptors and mbufs currently allocated. */
desc_n = (tmpl.elts_n * (tmpl.sp ? MLX5_PMD_SGE_WR_N : 1));
mbuf_n = desc_n;
+ /* Toggle RX checksum offload if hardware supports it. */
+ if (priv->hw_csum) {
+ tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ rxq->csum = tmpl.csum;
+ }
+ if (priv->hw_csum_l2tun) {
+ tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ rxq->csum_l2tun = tmpl.csum_l2tun;
+ }
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -788,6 +797,11 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
rte_pktmbuf_free(buf);
+ /* Toggle RX checksum offload if hardware supports it. */
+ if (priv->hw_csum)
+ tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ if (priv->hw_csum_l2tun)
+ tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 960a3e5..668aff0 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -390,6 +390,17 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
++elts_comp;
send_flags |= IBV_EXP_QP_BURST_SIGNALED;
}
+ /* Enable HW checksum offload when the mbuf requests it. */
+ if (buf->ol_flags &
+ (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
+ send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
+ /* HW does not support checksum offloads at arbitrary
+ * offsets but automatically recognizes the packet
+ * type. For inner L3/L4 checksums, only VXLAN (UDP)
+ * tunnels are currently supported. */
+ if (RTE_ETH_IS_TUNNEL_PKT(buf->packet_type))
+ send_flags |= IBV_EXP_QP_BURST_TUNNEL;
+ }
if (likely(segs == 1)) {
uintptr_t addr;
uint32_t length;
@@ -491,6 +502,85 @@ stop:
}
/**
+ * Translate RX completion flags to packet type.
+ *
+ * @param flags
+ * RX completion flags returned by poll_length_flags().
+ *
+ * @return
+ * Packet type for struct rte_mbuf.
+ */
+static inline uint32_t
+rxq_cq_to_pkt_type(uint32_t flags)
+{
+ uint32_t pkt_type;
+
+ if (flags & IBV_EXP_CQ_RX_TUNNEL_PACKET)
+ pkt_type =
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV4_PACKET,
+ RTE_PTYPE_L3_IPV4) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV6_PACKET,
+ RTE_PTYPE_L3_IPV6) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV4_PACKET,
+ RTE_PTYPE_INNER_L3_IPV4) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV6_PACKET,
+ RTE_PTYPE_INNER_L3_IPV6);
+ else
+ pkt_type =
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV4_PACKET,
+ RTE_PTYPE_L3_IPV4) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV6_PACKET,
+ RTE_PTYPE_L3_IPV6);
+ return pkt_type;
+}
+
+/**
+ * Translate RX completion flags to offload flags.
+ *
+ * @param[in] rxq
+ * Pointer to RX queue structure.
+ * @param flags
+ * RX completion flags returned by poll_length_flags().
+ *
+ * @return
+ * Offload flags (ol_flags) for struct rte_mbuf.
+ */
+static inline uint32_t
+rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
+{
+ uint32_t ol_flags = 0;
+
+ if (rxq->csum)
+ ol_flags |=
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
+ /*
+ * PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD are used in place
+ * of PKT_RX_EIP_CKSUM_BAD because the latter is not functional
+ * (its value is 0).
+ */
+ if ((flags & IBV_EXP_CQ_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
+ ol_flags |=
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
+ return ol_flags;
+}
+
+/**
* DPDK callback for RX with scattered packets support.
*
* @param dpdk_rxq
@@ -669,6 +759,8 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NB_SEGS(pkt_buf) = j;
PORT(pkt_buf) = rxq->port_id;
PKT_LEN(pkt_buf) = pkt_buf_len;
+ pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
+ pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
/* Return packet. */
*(pkts++) = pkt_buf;
@@ -828,6 +920,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NEXT(seg) = NULL;
PKT_LEN(seg) = len;
DATA_LEN(seg) = len;
+ seg->packet_type = rxq_cq_to_pkt_type(flags);
+ seg->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
/* Return packet. */
*(pkts++) = seg;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 228dff6..0eb1e98 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -119,6 +119,8 @@ struct rxq {
struct rxq_elt (*no_sp)[]; /* RX elements. */
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
+ unsigned int csum:1; /* Enable checksum offloading. */
+ unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx5_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index e48e6b6..8ff075b 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -149,6 +149,12 @@ pmd_drv_log_basename(const char *s)
#define NB_SEGS(m) ((m)->nb_segs)
#define PORT(m) ((m)->port)
+/* Transpose flags. Useful to convert IBV to DPDK flags. */
+#define TRANSPOSE(val, from, to) \
+ (((from) >= (to)) ? \
+ (((val) & (from)) / ((from) / (to))) : \
+ (((val) & (from)) * ((to) / (from))))
+
/* Allocate a buffer on the stack and fill it with a printf format string. */
#define MKSTR(name, ...) \
char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 13/13] doc: add mlx5 documentation and release notes for version 2.2
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (11 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 12/13] mlx5: add checksum offloading support Adrien Mazarguil
@ 2015-10-05 17:53 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-05 17:53 UTC (permalink / raw)
To: dev
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
doc/guides/nics/mlx5.rst | 308 +++++++++++++++++++++++++++++++++++
doc/guides/rel_notes/release_2_2.rst | 8 +
2 files changed, 316 insertions(+)
create mode 100644 doc/guides/nics/mlx5.rst
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
new file mode 100644
index 0000000..fdb621c
--- /dev/null
+++ b/doc/guides/nics/mlx5.rst
@@ -0,0 +1,308 @@
+.. BSD LICENSE
+ Copyright 2015 6WIND S.A.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+ * Neither the name of 6WIND S.A. nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MLX5 poll mode driver
+=====================
+
+The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support for
+**Mellanox ConnectX-4 EN** and **Mellanox ConnectX-4 Lx EN** families of
+10/25/40/50/100 Gb/s adapters as well as their virtual functions (VF) in
+SR-IOV context.
+
+Information and documentation about these adapters can be found on the
+`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the
+`Mellanox community <http://community.mellanox.com/welcome>`__.
+
+There is also a `section dedicated to this poll mode driver
+<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__.
+
+.. note::
+
+ Due to external dependencies, this driver is disabled by default. It must
+ be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and
+ recompiling DPDK.
+
+.. warning::
+
+ ``CONFIG_RTE_BUILD_COMBINE_LIBS`` with ``CONFIG_RTE_BUILD_SHARED_LIB``
+ is not supported and thus the compilation will fail with this configuration.
+
+Implementation details
+----------------------
+
+Besides its dependency on libibverbs (that implies libmlx5 and associated
+kernel support), librte_pmd_mlx5 relies heavily on system calls for control
+operations such as querying/updating the MTU and flow control parameters.
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resource allocations are handled by the kernel,
+combined with hardware specifications that allow it to handle virtual memory
+addresses directly, ensures that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+This capability allows the PMD to coexist with kernel network interfaces
+which remain functional, although they stop receiving unicast packets as
+long as they share the same MAC address.
+
+Enabling librte_pmd_mlx5 causes DPDK applications to be linked against
+libibverbs.
+
+Configuration
+-------------
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+These options can be modified in the ``.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**)
+
+ Toggle compilation of librte_pmd_mlx5 itself.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**)
+
+ Toggle debugging code and stricter compilation flags. Enabling this option
+ adds additional run-time checks and debugging messages at the cost of
+ lower performance.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**)
+
+ Number of scatter/gather elements (SGEs) per work request (WR). Lowering
+ this number improves performance but also limits the ability to receive
+ scattered packets (packets that do not fit a single mbuf). The default
+ value is a safe tradeoff.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)
+
+ Amount of data to be inlined during TX operations. Improves latency but
+ lowers throughput.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**)
+
+ Maximum number of cached memory pools (MPs) per TX queue. Each MP from
+ which buffers are to be transmitted must be associated with memory regions
+ (MRs). This is a slow operation that must be cached.
+
+ This value is always 1 for RX queues since they use a single MP.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- librte_pmd_mlx5 brings kernel network interfaces up during initialization
+ because it is affected by their state. Forcing them down prevents packet
+ reception.
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+Prerequisites
+-------------
+
+This driver relies on external libraries and kernel drivers for resource
+allocation and initialization. The following dependencies are not part of
+DPDK and must be installed separately:
+
+- **libibverbs**
+
+ User space Verbs framework used by librte_pmd_mlx5. This library provides
+ a generic interface between the kernel and low-level user space drivers
+ such as libmlx5.
+
+ It allows slow and privileged operations (context initialization, hardware
+ resource allocations) to be managed by the kernel and fast operations to
+ never leave user space.
+
+- **libmlx5**
+
+ Low-level user space driver library for Mellanox ConnectX-4 devices;
+ it is automatically loaded by libibverbs.
+
+ This library basically implements send/receive calls to the hardware
+ queues.
+
+- **Kernel modules** (mlnx-ofed-kernel)
+
+ They provide the kernel-side Verbs API and low level device drivers that
+ manage actual hardware initialization and resource sharing with user
+ space processes.
+
+ Unlike most other PMDs, these modules must remain loaded and bound to
+ their devices:
+
+ - mlx5_core: hardware driver managing Mellanox ConnectX-4 devices and
+ related Ethernet kernel network devices.
+ - mlx5_ib: InfiniBand device driver.
+ - ib_uverbs: user space driver for Verbs (entry point for libibverbs).
+
+- **Firmware update**
+
+ Mellanox OFED releases include firmware updates for ConnectX-4 adapters.
+
+ Because each release provides new features, these updates must be applied to
+ match the kernel modules and libraries they come with.
+
+.. note::
+
+ Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
+ licensed.
+
+Getting Mellanox OFED
+~~~~~~~~~~~~~~~~~~~~~
+
+While these libraries and kernel modules are available on OpenFabrics
+Alliance's `website <https://www.openfabrics.org/>`__ and provided by package
+managers on most distributions, this PMD requires Ethernet extensions that
+may not be supported at the moment (this is a work in progress).
+
+`Mellanox OFED
+<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__
+includes the necessary support and should be used in the meantime. For DPDK,
+only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are
+required from that distribution.
+
+.. note::
+
+ Several versions of Mellanox OFED are available. Installing the version
+ this DPDK release was developed and tested against is strongly
+ recommended. Please check the `prerequisites`_.
+
+Usage example
+-------------
+
+This section demonstrates how to launch **testpmd** with Mellanox ConnectX-4
+devices managed by librte_pmd_mlx5.
+
+#. Load the kernel modules:
+
+ .. code-block:: console
+
+ modprobe -a ib_uverbs mlx5_core mlx5_ib
+
+ .. note::
+
+ User space I/O kernel modules (uio and igb_uio) are not used and do
+ not have to be loaded.
+
+#. Make sure Ethernet interfaces are in working order and linked to kernel
+ verbs. Related sysfs entries should be present:
+
+ .. code-block:: console
+
+ ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
+
+ Example output:
+
+ .. code-block:: console
+
+ eth30
+ eth31
+ eth32
+ eth33
+
+#. Optionally, retrieve their PCI bus addresses for whitelisting:
+
+ .. code-block:: console
+
+ {
+ for intf in eth2 eth3 eth4 eth5;
+ do
+ (cd "/sys/class/net/${intf}/device/" && pwd -P);
+ done;
+ } |
+ sed -n 's,.*/\(.*\),-w \1,p'
+
+ Example output:
+
+ .. code-block:: console
+
+ -w 0000:05:00.1
+ -w 0000:06:00.0
+ -w 0000:06:00.1
+ -w 0000:05:00.0
+
+#. Request huge pages:
+
+ .. code-block:: console
+
+ echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd with basic parameters:
+
+ .. code-block:: console
+
+ testpmd -c 0xff00 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i
+
+ Example output:
+
+ .. code-block:: console
+
+ [...]
+ EAL: PCI device 0000:05:00.0 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe
+ EAL: PCI device 0000:05:00.1 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff
+ EAL: PCI device 0000:06:00.0 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa
+ EAL: PCI device 0000:06:00.1 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb
+ Interactive-mode selected
+ Configuring Port 0 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2
+ Port 0: E4:1D:2D:E7:0C:FE
+ Configuring Port 1 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2
+ Port 1: E4:1D:2D:E7:0C:FF
+ Configuring Port 2 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2
+ Port 2: E4:1D:2D:E7:0C:FA
+ Configuring Port 3 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2
+ Port 3: E4:1D:2D:E7:0C:FB
+ Checking link statuses...
+ Port 0 Link Up - speed 40000 Mbps - full-duplex
+ Port 1 Link Up - speed 40000 Mbps - full-duplex
+ Port 2 Link Up - speed 10000 Mbps - full-duplex
+ Port 3 Link Up - speed 10000 Mbps - full-duplex
+ Done
+ testpmd>
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 5687676..e562030 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -4,6 +4,14 @@ DPDK Release 2.2
New Features
------------
+* **Added support for Mellanox ConnectX-4 adapters (mlx5).**
+
+ The mlx5 poll-mode driver implements support for Mellanox ConnectX-4 EN
+ and Mellanox ConnectX-4 Lx EN families of 10/25/40/50/100 Gb/s adapters.
+
+ Like mlx4, this PMD is only available for Linux and is disabled by default
+ due to external dependencies (libibverbs and libmlx5).
+
Resolved Issues
---------------
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5)
2015-10-05 17:52 [PATCH 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (12 preceding siblings ...)
2015-10-05 17:53 ` [PATCH 13/13] doc: add mlx5 documentation and release notes for version 2.2 Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 01/13] mlx5: new poll-mode driver for Mellanox ConnectX-4 adapters Adrien Mazarguil
` (13 more replies)
13 siblings, 14 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
This PMD adds basic support for Mellanox ConnectX-4 (mlx5) families of
10/25/40/50/100 Gb/s adapters through the Verbs framework.
Its design is very similar to that of mlx4 from which most of its code is
borrowed without the mistake of putting it all in a single huge file.
It is disabled by default due to its dependency on libibverbs.
Changes in v2:
- Removed useless port inactive warning.
- Simplified code by replacing configured MAC addresses RX queue bit-field
with flow pointer checks.
- Replaced allmulti/promisc status bits with request bits to fix
inconsistencies when restoring these modes.
- Updated comments about maximum number of MAC addresses and VLAN filters.
- Improved performance with better prefetching.
- Fixed deadlock in case of error during port start.
- Simplified VLAN filtering configuration storage using a basic list instead
of a table (with holes).
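The last item (a basic list instead of a table with holes) can be
illustrated with a minimal standalone sketch. It is not the code from this
series: the names vlan_list, vlan_list_add(), vlan_list_del() and the
VLAN_FILTER_MAX capacity are hypothetical and chosen for illustration only.
The idea shown is that entries stay densely packed, so removal moves the
last entry into the freed position instead of leaving a gap to skip later.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity, for illustration only. */
#define VLAN_FILTER_MAX 8

/* Densely packed list of VLAN IDs: ids[0..n-1] are always valid. */
struct vlan_list {
	unsigned int n;                /* number of IDs currently stored */
	uint16_t ids[VLAN_FILTER_MAX]; /* no holes between 0 and n - 1 */
};

/* Add a VLAN ID. Adding an ID that is already present is a no-op. */
static int
vlan_list_add(struct vlan_list *l, uint16_t id)
{
	unsigned int i;

	assert(l != NULL);
	for (i = 0; (i != l->n); ++i)
		if (l->ids[i] == id)
			return 0;
	if (l->n == VLAN_FILTER_MAX)
		return -ENOMEM;
	l->ids[l->n++] = id;
	return 0;
}

/* Remove a VLAN ID by moving the last entry into the freed position. */
static int
vlan_list_del(struct vlan_list *l, uint16_t id)
{
	unsigned int i;

	assert(l != NULL);
	for (i = 0; (i != l->n); ++i) {
		if (l->ids[i] != id)
			continue;
		l->ids[i] = l->ids[--l->n];
		return 0;
	}
	return -ENOENT;
}
```

With this layout, code that walks the configured filters only iterates over
the first n entries and never has to test whether a slot is in use. The
tradeoff is that entry order is not preserved across removals.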
Adrien Mazarguil (13):
mlx5: new poll-mode driver for Mellanox ConnectX-4 adapters
mlx5: add non-scattered TX and RX support
mlx5: add MAC handling
mlx5: add device configure/start/stop
mlx5: add support for scattered RX and TX buffers
mlx5: add MTU configuration support
mlx5: add software counters and related callbacks
mlx5: add promiscuous and allmulticast RX modes
mlx5: add link update device operation
mlx5: add flow control device operations
mlx5: add VLAN filtering
mlx5: add checksum offloading support
doc: add mlx5 documentation and release notes for version 2.2
MAINTAINERS | 4 +
config/common_bsdapp | 9 +
config/common_linuxapp | 9 +
doc/guides/nics/mlx5.rst | 308 +++++++++
doc/guides/rel_notes/release_2_2.rst | 9 +
drivers/net/Makefile | 1 +
drivers/net/mlx5/Makefile | 128 ++++
drivers/net/mlx5/mlx5.c | 553 +++++++++++++++
drivers/net/mlx5/mlx5.h | 212 ++++++
drivers/net/mlx5/mlx5_defs.h | 78 +++
drivers/net/mlx5/mlx5_ethdev.c | 844 +++++++++++++++++++++++
drivers/net/mlx5/mlx5_mac.c | 464 +++++++++++++
drivers/net/mlx5/mlx5_rxmode.c | 311 +++++++++
drivers/net/mlx5/mlx5_rxq.c | 1064 +++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.c | 1009 +++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 194 ++++++
drivers/net/mlx5/mlx5_stats.c | 144 ++++
drivers/net/mlx5/mlx5_trigger.c | 154 +++++
drivers/net/mlx5/mlx5_txq.c | 513 ++++++++++++++
drivers/net/mlx5/mlx5_utils.h | 166 +++++
drivers/net/mlx5/mlx5_vlan.c | 156 +++++
drivers/net/mlx5/rte_pmd_mlx5_version.map | 3 +
mk/rte.app.mk | 5 +
23 files changed, 6338 insertions(+)
create mode 100644 doc/guides/nics/mlx5.rst
create mode 100644 drivers/net/mlx5/Makefile
create mode 100644 drivers/net/mlx5/mlx5.c
create mode 100644 drivers/net/mlx5/mlx5.h
create mode 100644 drivers/net/mlx5/mlx5_defs.h
create mode 100644 drivers/net/mlx5/mlx5_ethdev.c
create mode 100644 drivers/net/mlx5/mlx5_mac.c
create mode 100644 drivers/net/mlx5/mlx5_rxmode.c
create mode 100644 drivers/net/mlx5/mlx5_rxq.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.h
create mode 100644 drivers/net/mlx5/mlx5_stats.c
create mode 100644 drivers/net/mlx5/mlx5_trigger.c
create mode 100644 drivers/net/mlx5/mlx5_txq.c
create mode 100644 drivers/net/mlx5/mlx5_utils.h
create mode 100644 drivers/net/mlx5/mlx5_vlan.c
create mode 100644 drivers/net/mlx5/rte_pmd_mlx5_version.map
--
2.1.0
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v2 01/13] mlx5: new poll-mode driver for Mellanox ConnectX-4 adapters
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 02/13] mlx5: add non-scattered TX and RX support Adrien Mazarguil
` (12 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
In its current state, this driver implements the bare minimum to initialize
itself and Mellanox ConnectX-4 adapters without doing anything else
(no RX/TX for instance). It is disabled by default since it is based on the
mlx4 driver and also depends on libibverbs.
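As an illustration of the per-adapter bookkeeping done during
initialization, the lookup performed by mlx5_dev_idx() in the patch below
can be sketched in isolation. This is a simplified sketch, not the driver
code: struct pci_addr, struct dev_slot and dev_slot_find() are hypothetical
stand-ins for the DPDK types, assuming a slot counts as free while its
ports bit-field is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI bus address. */
struct pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* One table entry per adapter; ports == 0 means the slot is unused. */
struct dev_slot {
	struct pci_addr addr;
	uint32_t ports;
};

/*
 * Return the index of the slot whose address matches, otherwise the
 * index of the first unused slot, otherwise -1 when the table is full.
 */
static int
dev_slot_find(const struct dev_slot *tbl, size_t n,
	      const struct pci_addr *addr)
{
	size_t i;
	int ret = -1;

	assert(tbl != NULL);
	assert(addr != NULL);
	for (i = 0; (i != n); ++i) {
		if ((tbl[i].addr.domain == addr->domain) &&
		    (tbl[i].addr.bus == addr->bus) &&
		    (tbl[i].addr.devid == addr->devid) &&
		    (tbl[i].addr.function == addr->function))
			return (int)i;
		/* Remember the first free slot in case nothing matches. */
		if ((tbl[i].ports == 0) && (ret == -1))
			ret = (int)i;
	}
	return ret;
}
```

A single pass is enough because an exact match returns immediately, while
the first free index is only used when the loop ends without a match.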
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Or Ami <ora@mellanox.com>
---
MAINTAINERS | 4 +
config/common_bsdapp | 6 +
config/common_linuxapp | 6 +
drivers/net/Makefile | 1 +
drivers/net/mlx5/Makefile | 109 +++++++
drivers/net/mlx5/mlx5.c | 496 ++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5.h | 147 +++++++++
drivers/net/mlx5/mlx5_defs.h | 43 +++
drivers/net/mlx5/mlx5_ethdev.c | 420 +++++++++++++++++++++++++
drivers/net/mlx5/mlx5_mac.c | 150 +++++++++
drivers/net/mlx5/mlx5_utils.h | 149 +++++++++
drivers/net/mlx5/rte_pmd_mlx5_version.map | 3 +
mk/rte.app.mk | 5 +
13 files changed, 1539 insertions(+)
create mode 100644 drivers/net/mlx5/Makefile
create mode 100644 drivers/net/mlx5/mlx5.c
create mode 100644 drivers/net/mlx5/mlx5.h
create mode 100644 drivers/net/mlx5/mlx5_defs.h
create mode 100644 drivers/net/mlx5/mlx5_ethdev.c
create mode 100644 drivers/net/mlx5/mlx5_mac.c
create mode 100644 drivers/net/mlx5/mlx5_utils.h
create mode 100644 drivers/net/mlx5/rte_pmd_mlx5_version.map
diff --git a/MAINTAINERS b/MAINTAINERS
index 080a8e8..9d11055 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -255,6 +255,10 @@ M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
F: drivers/net/mlx4/
F: doc/guides/nics/mlx4.rst
+Mellanox mlx5
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: drivers/net/mlx5/
+
RedHat virtio
M: Huawei Xie <huawei.xie@intel.com>
M: Changchun Ouyang <changchun.ouyang@intel.com>
diff --git a/config/common_bsdapp b/config/common_bsdapp
index b37dcf4..1e6885f 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -214,6 +214,12 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
+# Compile burst-oriented Mellanox ConnectX-4 (MLX5) PMD
+#
+CONFIG_RTE_LIBRTE_MLX5_PMD=n
+CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+
+#
# Compile burst-oriented Broadcom PMD driver
#
CONFIG_RTE_LIBRTE_BNX2X_PMD=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 0de43d5..7da7ba7 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -212,6 +212,12 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
+# Compile burst-oriented Mellanox ConnectX-4 (MLX5) PMD
+#
+CONFIG_RTE_LIBRTE_MLX5_PMD=n
+CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+
+#
# Compile burst-oriented Broadcom PMD driver
#
CONFIG_RTE_LIBRTE_BNX2X_PMD=n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 5ebf963..6da1ce2 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
+DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += pcap
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
new file mode 100644
index 0000000..6e63073
--- /dev/null
+++ b/drivers/net/mlx5/Makefile
@@ -0,0 +1,109 @@
+# BSD LICENSE
+#
+# Copyright 2015 6WIND S.A.
+# Copyright 2015 Mellanox.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in
+# the documentation and/or other materials provided with the
+# distribution.
+# * Neither the name of 6WIND S.A. nor the names of its
+# contributors may be used to endorse or promote products derived
+# from this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+ifeq ($(CONFIG_RTE_BUILD_COMBINE_LIBS)$(CONFIG_RTE_BUILD_SHARED_LIB),yy)
+all:
+ @echo 'MLX5: Not supported in a combined shared library'
+ @false
+endif
+
+# Library name.
+LIB = librte_pmd_mlx5.a
+
+# Sources.
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
+
+# Dependencies.
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_mempool
+
+# Basic CFLAGS.
+CFLAGS += -O3
+CFLAGS += -std=gnu99 -Wall -Wextra
+CFLAGS += -g
+CFLAGS += -I.
+CFLAGS += -D_XOPEN_SOURCE=600
+CFLAGS += $(WERROR_FLAGS)
+LDLIBS += -libverbs
+
+# A few warnings cannot be avoided in external headers.
+CFLAGS += -Wno-error=cast-qual
+
+EXPORT_MAP := rte_pmd_mlx5_version.map
+LIBABIVER := 1
+
+# DEBUG which is usually provided on the command-line may enable
+# CONFIG_RTE_LIBRTE_MLX5_DEBUG.
+ifeq ($(DEBUG),1)
+CONFIG_RTE_LIBRTE_MLX5_DEBUG := y
+endif
+
+# User-defined CFLAGS.
+ifeq ($(CONFIG_RTE_LIBRTE_MLX5_DEBUG),y)
+CFLAGS += -pedantic -UNDEBUG -DPEDANTIC
+else
+CFLAGS += -DNDEBUG -UPEDANTIC
+endif
+
+include $(RTE_SDK)/mk/rte.lib.mk
+
+# Generate and clean-up mlx5_autoconf.h.
+
+export CC CFLAGS CPPFLAGS EXTRA_CFLAGS EXTRA_CPPFLAGS
+export AUTO_CONFIG_CFLAGS = -Wno-error
+
+ifndef V
+AUTOCONF_OUTPUT := >/dev/null
+endif
+
+mlx5_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
+ $Q $(RM) -f -- '$@'
+ $Q sh -- '$<' '$@' \
+ RSS_SUPPORT \
+ infiniband/verbs.h \
+ enum IBV_EXP_DEVICE_UD_RSS $(AUTOCONF_OUTPUT)
+ $Q sh -- '$<' '$@' \
+ HAVE_EXP_QUERY_DEVICE \
+ infiniband/verbs.h \
+ type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+
+mlx5.o: mlx5_autoconf.h
+
+clean_mlx5: FORCE
+ $Q rm -f -- mlx5_autoconf.h
+
+clean: clean_mlx5
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
new file mode 100644
index 0000000..6df486b
--- /dev/null
+++ b/drivers/net/mlx5/mlx5.c
@@ -0,0 +1,496 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <unistd.h>
+#include <string.h>
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <net/if.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_pci.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+#include "mlx5_autoconf.h"
+
+/**
+ * DPDK callback to close the device.
+ *
+ * Destroy all queues and objects, free memory.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+static void
+mlx5_dev_close(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+
+ priv_lock(priv);
+ DEBUG("%p: closing device \"%s\"",
+ (void *)dev,
+ ((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+ if (priv->pd != NULL) {
+ assert(priv->ctx != NULL);
+ claim_zero(ibv_dealloc_pd(priv->pd));
+ claim_zero(ibv_close_device(priv->ctx));
+ } else
+ assert(priv->ctx == NULL);
+ priv_unlock(priv);
+ memset(priv, 0, sizeof(*priv));
+}
+
+static const struct eth_dev_ops mlx5_dev_ops = {
+ .dev_close = mlx5_dev_close,
+};
+
+static struct {
+ struct rte_pci_addr pci_addr; /* associated PCI address */
+ uint32_t ports; /* physical ports bitfield. */
+} mlx5_dev[32];
+
+/**
+ * Get device index in mlx5_dev[] from PCI bus address.
+ *
+ * @param[in] pci_addr
+ * PCI bus address to look for.
+ *
+ * @return
+ * mlx5_dev[] index on success, -1 on failure.
+ */
+static int
+mlx5_dev_idx(struct rte_pci_addr *pci_addr)
+{
+ unsigned int i;
+ int ret = -1;
+
+ assert(pci_addr != NULL);
+ for (i = 0; (i != RTE_DIM(mlx5_dev)); ++i) {
+ if ((mlx5_dev[i].pci_addr.domain == pci_addr->domain) &&
+ (mlx5_dev[i].pci_addr.bus == pci_addr->bus) &&
+ (mlx5_dev[i].pci_addr.devid == pci_addr->devid) &&
+ (mlx5_dev[i].pci_addr.function == pci_addr->function))
+ return i;
+ if ((mlx5_dev[i].ports == 0) && (ret == -1))
+ ret = i;
+ }
+ return ret;
+}
+
+static struct eth_driver mlx5_driver;
+
+/**
+ * DPDK callback to register a PCI device.
+ *
+ * This function creates an Ethernet device for each port of a given
+ * PCI device.
+ *
+ * @param[in] pci_drv
+ * PCI driver structure (mlx5_driver).
+ * @param[in] pci_dev
+ * PCI device information.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+static int
+mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
+{
+ struct ibv_device **list;
+ struct ibv_device *ibv_dev;
+ int err = 0;
+ struct ibv_context *attr_ctx = NULL;
+ struct ibv_device_attr device_attr;
+ unsigned int vf;
+ int idx;
+ int i;
+
+ (void)pci_drv;
+ assert(pci_drv == &mlx5_driver.pci_drv);
+ /* Get mlx5_dev[] index. */
+ idx = mlx5_dev_idx(&pci_dev->addr);
+ if (idx == -1) {
+ ERROR("this driver cannot support any more adapters");
+ return -ENOMEM;
+ }
+ DEBUG("using driver device index %d", idx);
+
+ /* Save PCI address. */
+ mlx5_dev[idx].pci_addr = pci_dev->addr;
+ list = ibv_get_device_list(&i);
+ if (list == NULL) {
+ assert(errno);
+ if (errno == ENOSYS) {
+ WARN("cannot list devices, is ib_uverbs loaded?");
+ return 0;
+ }
+ return -errno;
+ }
+ assert(i >= 0);
+ /*
+ * For each listed device, check related sysfs entry against
+ * the provided PCI ID.
+ */
+ while (i != 0) {
+ struct rte_pci_addr pci_addr;
+
+ --i;
+ DEBUG("checking device \"%s\"", list[i]->name);
+ if (mlx5_ibv_device_to_pci_addr(list[i], &pci_addr))
+ continue;
+ if ((pci_dev->addr.domain != pci_addr.domain) ||
+ (pci_dev->addr.bus != pci_addr.bus) ||
+ (pci_dev->addr.devid != pci_addr.devid) ||
+ (pci_dev->addr.function != pci_addr.function))
+ continue;
+ vf = ((pci_dev->id.device_id ==
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4VF) ||
+ (pci_dev->id.device_id ==
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF));
+ INFO("PCI information matches, using device \"%s\" (VF: %s)",
+ list[i]->name, (vf ? "true" : "false"));
+ attr_ctx = ibv_open_device(list[i]);
+ err = errno;
+ break;
+ }
+ if (attr_ctx == NULL) {
+ ibv_free_device_list(list);
+ switch (err) {
+ case 0:
+ WARN("cannot access device, is mlx5_ib loaded?");
+ return 0;
+ case EINVAL:
+ WARN("cannot use device, are drivers up to date?");
+ return 0;
+ }
+ assert(err > 0);
+ return -err;
+ }
+ ibv_dev = list[i];
+
+ DEBUG("device opened");
+ if (ibv_query_device(attr_ctx, &device_attr))
+ goto error;
+ INFO("%u port(s) detected", device_attr.phys_port_cnt);
+
+ for (i = 0; i < device_attr.phys_port_cnt; i++) {
+ uint32_t port = i + 1; /* ports are indexed from one */
+ uint32_t test = (1 << i);
+ struct ibv_context *ctx = NULL;
+ struct ibv_port_attr port_attr;
+ struct ibv_pd *pd = NULL;
+ struct priv *priv = NULL;
+ struct rte_eth_dev *eth_dev;
+#ifdef HAVE_EXP_QUERY_DEVICE
+ struct ibv_exp_device_attr exp_device_attr;
+#endif /* HAVE_EXP_QUERY_DEVICE */
+ struct ether_addr mac;
+
+#ifdef HAVE_EXP_QUERY_DEVICE
+ exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
+#ifdef RSS_SUPPORT
+ exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
+#endif /* RSS_SUPPORT */
+#endif /* HAVE_EXP_QUERY_DEVICE */
+
+ DEBUG("using port %u (%08" PRIx32 ")", port, test);
+
+ ctx = ibv_open_device(ibv_dev);
+ if (ctx == NULL)
+ goto port_error;
+
+ /* Check port status. */
+ err = ibv_query_port(ctx, port, &port_attr);
+ if (err) {
+ ERROR("port query failed: %s", strerror(err));
+ goto port_error;
+ }
+ if (port_attr.state != IBV_PORT_ACTIVE)
+ DEBUG("port %d is not active: \"%s\" (%d)",
+ port, ibv_port_state_str(port_attr.state),
+ port_attr.state);
+
+ /* Allocate protection domain. */
+ pd = ibv_alloc_pd(ctx);
+ if (pd == NULL) {
+ ERROR("PD allocation failure");
+ err = ENOMEM;
+ goto port_error;
+ }
+
+ mlx5_dev[idx].ports |= test;
+
+ /* from rte_ethdev.c */
+ priv = rte_zmalloc("ethdev private structure",
+ sizeof(*priv),
+ RTE_CACHE_LINE_SIZE);
+ if (priv == NULL) {
+ ERROR("priv allocation failure");
+ err = ENOMEM;
+ goto port_error;
+ }
+
+ priv->ctx = ctx;
+ priv->device_attr = device_attr;
+ priv->port = port;
+ priv->pd = pd;
+ priv->mtu = ETHER_MTU;
+#ifdef HAVE_EXP_QUERY_DEVICE
+ if (ibv_exp_query_device(ctx, &exp_device_attr)) {
+ ERROR("ibv_exp_query_device() failed");
+ goto port_error;
+ }
+#ifdef RSS_SUPPORT
+ if ((exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_QPG) &&
+ (exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_UD_RSS) &&
+ (exp_device_attr.comp_mask &
+ IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ) &&
+ (exp_device_attr.max_rss_tbl_sz > 0)) {
+ priv->hw_qpg = 1;
+ priv->hw_rss = 1;
+ priv->max_rss_tbl_sz = exp_device_attr.max_rss_tbl_sz;
+ } else {
+ priv->hw_qpg = 0;
+ priv->hw_rss = 0;
+ priv->max_rss_tbl_sz = 0;
+ }
+ priv->hw_tss = !!(exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_UD_TSS);
+ DEBUG("device flags: %s%s%s",
+ (priv->hw_qpg ? "IBV_DEVICE_QPG " : ""),
+ (priv->hw_tss ? "IBV_DEVICE_TSS " : ""),
+ (priv->hw_rss ? "IBV_DEVICE_RSS " : ""));
+ if (priv->hw_rss)
+ DEBUG("maximum RSS indirection table size: %u",
+ exp_device_attr.max_rss_tbl_sz);
+#endif /* RSS_SUPPORT */
+
+ priv->hw_csum =
+ ((exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_RX_CSUM_TCP_UDP_PKT) &&
+ (exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_RX_CSUM_IP_PKT));
+ DEBUG("checksum offloading is %ssupported",
+ (priv->hw_csum ? "" : "not "));
+
+ priv->hw_csum_l2tun = !!(exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_VXLAN_SUPPORT);
+ DEBUG("L2 tunnel checksum offloads are %ssupported",
+ (priv->hw_csum_l2tun ? "" : "not "));
+
+#endif /* HAVE_EXP_QUERY_DEVICE */
+
+ priv->vf = vf;
+ /* Configure the first MAC address by default. */
+ if (priv_get_mac(priv, &mac.addr_bytes)) {
+ ERROR("cannot get MAC address, is mlx5_en loaded?"
+ " (errno: %s)", strerror(errno));
+ goto port_error;
+ }
+ INFO("port %u MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
+ priv->port,
+ mac.addr_bytes[0], mac.addr_bytes[1],
+ mac.addr_bytes[2], mac.addr_bytes[3],
+ mac.addr_bytes[4], mac.addr_bytes[5]);
+ /* Register MAC and broadcast addresses. */
+ claim_zero(priv_mac_addr_add(priv, 0,
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ mac.addr_bytes));
+ claim_zero(priv_mac_addr_add(priv, 1,
+ &(const uint8_t [ETHER_ADDR_LEN])
+ { "\xff\xff\xff\xff\xff\xff" }));
+#ifndef NDEBUG
+ {
+ char ifname[IF_NAMESIZE];
+
+ if (priv_get_ifname(priv, &ifname) == 0)
+ DEBUG("port %u ifname is \"%s\"",
+ priv->port, ifname);
+ else
+ DEBUG("port %u ifname is unknown", priv->port);
+ }
+#endif
+ /* Get actual MTU if possible. */
+ priv_get_mtu(priv, &priv->mtu);
+ DEBUG("port %u MTU is %u", priv->port, priv->mtu);
+
+ /* from rte_ethdev.c */
+ {
+ char name[RTE_ETH_NAME_MAX_LEN];
+
+ snprintf(name, sizeof(name), "%s port %u",
+ ibv_get_device_name(ibv_dev), port);
+ eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_PCI);
+ }
+ if (eth_dev == NULL) {
+ ERROR("cannot allocate rte ethdev");
+ err = ENOMEM;
+ goto port_error;
+ }
+
+ eth_dev->data->dev_private = priv;
+ eth_dev->pci_dev = pci_dev;
+ eth_dev->driver = &mlx5_driver;
+ eth_dev->data->rx_mbuf_alloc_failed = 0;
+ eth_dev->data->mtu = ETHER_MTU;
+
+ priv->dev = eth_dev;
+ eth_dev->dev_ops = &mlx5_dev_ops;
+ eth_dev->data->mac_addrs = priv->mac;
+
+ /* Bring Ethernet device up. */
+ DEBUG("forcing Ethernet interface up");
+ priv_set_flags(priv, ~IFF_UP, IFF_UP);
+ continue;
+
+port_error:
+ rte_free(priv);
+ if (pd)
+ claim_zero(ibv_dealloc_pd(pd));
+ if (ctx)
+ claim_zero(ibv_close_device(ctx));
+ break;
+ }
+
+ /*
+ * XXX if something went wrong in the loop above, there is a resource
+ * leak (ctx, pd, priv, DPDK ethdev) but we can do nothing about it as
+ * long as the DPDK does not provide a way to deallocate an ethdev and a
+ * way to enumerate the registered ethdevs to free the previous ones.
+ */
+
+ /* no port found, complain */
+ if (!mlx5_dev[idx].ports) {
+ err = ENODEV;
+ goto error;
+ }
+
+error:
+ if (attr_ctx)
+ claim_zero(ibv_close_device(attr_ctx));
+ if (list)
+ ibv_free_device_list(list);
+ assert(err >= 0);
+ return -err;
+}
+
+static const struct rte_pci_id mlx5_pci_id_map[] = {
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4VF,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4LX,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = PCI_VENDOR_ID_MELLANOX,
+ .device_id = PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF,
+ .subsystem_vendor_id = PCI_ANY_ID,
+ .subsystem_device_id = PCI_ANY_ID
+ },
+ {
+ .vendor_id = 0
+ }
+};
+
+static struct eth_driver mlx5_driver = {
+ .pci_drv = {
+ .name = MLX5_DRIVER_NAME,
+ .id_table = mlx5_pci_id_map,
+ .devinit = mlx5_pci_devinit,
+ },
+ .dev_private_size = sizeof(struct priv)
+};
+
+/**
+ * Driver initialization routine.
+ */
+static int
+rte_mlx5_pmd_init(const char *name, const char *args)
+{
+ (void)name;
+ (void)args;
+ /*
+ * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+ * huge pages. Calling ibv_fork_init() during init allows
+ * applications to use fork() safely for purposes other than
+ * using this PMD, which is not supported in forked processes.
+ */
+ setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+ ibv_fork_init();
+ rte_eal_pci_register(&mlx5_driver.pci_drv);
+ return 0;
+}
+
+static struct rte_driver rte_mlx5_driver = {
+ .type = PMD_PDEV,
+ .name = MLX5_DRIVER_NAME,
+ .init = rte_mlx5_pmd_init,
+};
+
+PMD_REGISTER_DRIVER(rte_mlx5_driver)
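
Note on mlx5_pci_id_map[] above: the table ends with an entry whose vendor_id
is 0, so it can be scanned without a separate length. The following is only a
minimal sketch of such a scan. struct pci_id, id_map and pci_id_match() are
hypothetical names made up for illustration; the matching code in the EAL is
not part of this patch and may work differently. The IDs are the ones defined
in mlx5.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_pci_id, reduced to two fields. */
struct pci_id {
	uint16_t vendor_id;
	uint16_t device_id;
};

/* IDs taken from mlx5.h; the entry with vendor_id 0 ends the table. */
static const struct pci_id id_map[] = {
	{ 0x15b3, 0x1013 }, /* ConnectX-4 */
	{ 0x15b3, 0x1014 }, /* ConnectX-4 VF */
	{ 0x15b3, 0x1015 }, /* ConnectX-4 Lx */
	{ 0x15b3, 0x1016 }, /* ConnectX-4 Lx VF */
	{ 0, 0 }
};

/* Return 1 when (vendor, device) is listed in the table, 0 otherwise. */
static int
pci_id_match(const struct pci_id *table, uint16_t vendor, uint16_t device)
{
	assert(table != NULL);
	for (; table->vendor_id != 0; ++table)
		if ((table->vendor_id == vendor) &&
		    (table->device_id == device))
			return 1;
	return 0;
}
```

With this layout, supporting another adapter only requires a new entry before
the terminator.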
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
new file mode 100644
index 0000000..21db3cd
--- /dev/null
+++ b/drivers/net/mlx5/mlx5.h
@@ -0,0 +1,147 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_H_
+#define RTE_PMD_MLX5_H_
+
+#include <stddef.h>
+#include <stdint.h>
+#include <limits.h>
+#include <net/if.h>
+#include <netinet/in.h>
+#include <linux/if.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_spinlock.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5_autoconf.h"
+#include "mlx5_defs.h"
+
+enum {
+ PCI_VENDOR_ID_MELLANOX = 0x15b3,
+};
+
+enum {
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4 = 0x1013,
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4VF = 0x1014,
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4LX = 0x1015,
+ PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF = 0x1016,
+};
+
+struct priv {
+ struct rte_eth_dev *dev; /* Ethernet device. */
+ struct ibv_context *ctx; /* Verbs context. */
+ struct ibv_device_attr device_attr; /* Device properties. */
+ struct ibv_pd *pd; /* Protection Domain. */
+ /*
+ * MAC addresses array and configuration bit-field.
+ * An extra entry that cannot be modified by the DPDK is reserved
+ * for broadcast frames (destination MAC address ff:ff:ff:ff:ff:ff).
+ */
+ struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES];
+ BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
+ /* Device properties. */
+ uint16_t mtu; /* Configured MTU. */
+ uint8_t port; /* Physical port number. */
+ unsigned int started:1; /* Device started, flows enabled. */
+ unsigned int hw_qpg:1; /* QP groups are supported. */
+ unsigned int hw_tss:1; /* TSS is supported. */
+ unsigned int hw_rss:1; /* RSS is supported. */
+ unsigned int hw_csum:1; /* Checksum offload is supported. */
+ unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
+ unsigned int rss:1; /* RSS is enabled. */
+ unsigned int vf:1; /* This is a VF device. */
+ unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
+ rte_spinlock_t lock; /* Lock for control functions. */
+};
+
+/**
+ * Lock private structure to protect it from concurrent access in the
+ * control path.
+ *
+ * @param priv
+ * Pointer to private structure.
+ */
+static inline void
+priv_lock(struct priv *priv)
+{
+ rte_spinlock_lock(&priv->lock);
+}
+
+/**
+ * Unlock private structure.
+ *
+ * @param priv
+ * Pointer to private structure.
+ */
+static inline void
+priv_unlock(struct priv *priv)
+{
+ rte_spinlock_unlock(&priv->lock);
+}
+
+/* mlx5_ethdev.c */
+
+int priv_get_ifname(const struct priv *, char (*)[IF_NAMESIZE]);
+int priv_ifreq(const struct priv *, int req, struct ifreq *);
+int priv_get_mtu(struct priv *, uint16_t *);
+int priv_set_flags(struct priv *, unsigned int, unsigned int);
+int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
+ struct rte_pci_addr *);
+
+/* mlx5_mac.c */
+
+int priv_get_mac(struct priv *, uint8_t (*)[ETHER_ADDR_LEN]);
+int priv_mac_addr_add(struct priv *, unsigned int,
+ const uint8_t (*)[ETHER_ADDR_LEN]);
+
+#endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
new file mode 100644
index 0000000..c66a74f
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -0,0 +1,43 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_DEFS_H_
+#define RTE_PMD_MLX5_DEFS_H_
+
+/* Reported driver name. */
+#define MLX5_DRIVER_NAME "librte_pmd_mlx5"
+
+/* Maximum number of simultaneous MAC addresses. */
+#define MLX5_MAX_MAC_ADDRESSES 128
+
+#endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
new file mode 100644
index 0000000..b6c7d7a
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -0,0 +1,420 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <dirent.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <linux/if.h>
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_atomic.h>
+#include <rte_ethdev.h>
+#include <rte_mbuf.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+
+/**
+ * Get interface name from private structure.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param[out] ifname
+ * Interface name output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
+{
+ DIR *dir;
+ struct dirent *dent;
+ unsigned int dev_type = 0;
+ unsigned int dev_port_prev = ~0u;
+ char match[IF_NAMESIZE] = "";
+
+ {
+ MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
+
+ dir = opendir(path);
+ if (dir == NULL)
+ return -1;
+ }
+ while ((dent = readdir(dir)) != NULL) {
+ char *name = dent->d_name;
+ FILE *file;
+ unsigned int dev_port;
+ int r;
+
+ if ((name[0] == '.') &&
+ ((name[1] == '\0') ||
+ ((name[1] == '.') && (name[2] == '\0'))))
+ continue;
+
+ MKSTR(path, "%s/device/net/%s/%s",
+ priv->ctx->device->ibdev_path, name,
+ (dev_type ? "dev_id" : "dev_port"));
+
+ file = fopen(path, "rb");
+ if (file == NULL) {
+ if (errno != ENOENT)
+ continue;
+ /*
+ * Switch to dev_id when dev_port does not exist as
+ * is the case with Linux kernel versions < 3.15.
+ */
+try_dev_id:
+ match[0] = '\0';
+ if (dev_type)
+ break;
+ dev_type = 1;
+ dev_port_prev = ~0u;
+ rewinddir(dir);
+ continue;
+ }
+ r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
+ fclose(file);
+ if (r != 1)
+ continue;
+ /*
+ * Switch to dev_id when dev_port returns the same value for
+ * all ports. May happen when using a MOFED release older than
+ * 3.0 with a Linux kernel >= 3.15.
+ */
+ if (dev_port == dev_port_prev)
+ goto try_dev_id;
+ dev_port_prev = dev_port;
+ if (dev_port == (priv->port - 1u))
+ snprintf(match, sizeof(match), "%s", name);
+ }
+ closedir(dir);
+ if (match[0] == '\0')
+ return -1;
+ strncpy(*ifname, match, sizeof(*ifname));
+ return 0;
+}
+
+/**
+ * Read from sysfs entry.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param[in] entry
+ * Entry name relative to sysfs path.
+ * @param[out] buf
+ * Data output buffer.
+ * @param size
+ * Buffer size.
+ *
+ * @return
+ * Buffer size on success, -1 on failure and errno is set.
+ */
+static int
+priv_sysfs_read(const struct priv *priv, const char *entry,
+ char *buf, size_t size)
+{
+ char ifname[IF_NAMESIZE];
+ FILE *file;
+ int ret;
+ int err;
+
+ if (priv_get_ifname(priv, &ifname))
+ return -1;
+
+ MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+ ifname, entry);
+
+ file = fopen(path, "rb");
+ if (file == NULL)
+ return -1;
+ ret = fread(buf, 1, size, file);
+ err = errno;
+ if (((size_t)ret < size) && (ferror(file)))
+ ret = -1;
+ else
+ ret = size;
+ fclose(file);
+ errno = err;
+ return ret;
+}
+
+/**
+ * Write to sysfs entry.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param[in] entry
+ * Entry name relative to sysfs path.
+ * @param[in] buf
+ * Data buffer.
+ * @param size
+ * Buffer size.
+ *
+ * @return
+ * Buffer size on success, -1 on failure and errno is set.
+ */
+static int
+priv_sysfs_write(const struct priv *priv, const char *entry,
+ char *buf, size_t size)
+{
+ char ifname[IF_NAMESIZE];
+ FILE *file;
+ int ret;
+ int err;
+
+ if (priv_get_ifname(priv, &ifname))
+ return -1;
+
+ MKSTR(path, "%s/device/net/%s/%s", priv->ctx->device->ibdev_path,
+ ifname, entry);
+
+ file = fopen(path, "wb");
+ if (file == NULL)
+ return -1;
+ ret = fwrite(buf, 1, size, file);
+ err = errno;
+ if (((size_t)ret < size) || (ferror(file)))
+ ret = -1;
+ else
+ ret = size;
+ fclose(file);
+ errno = err;
+ return ret;
+}
+
+/**
+ * Get unsigned long sysfs property.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param[in] name
+ * Entry name relative to sysfs path.
+ * @param[out] value
+ * Value output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+static int
+priv_get_sysfs_ulong(struct priv *priv, const char *name, unsigned long *value)
+{
+ int ret;
+ unsigned long value_ret;
+ char value_str[32];
+
+ ret = priv_sysfs_read(priv, name, value_str, (sizeof(value_str) - 1));
+ if (ret == -1) {
+ DEBUG("cannot read %s value from sysfs: %s",
+ name, strerror(errno));
+ return -1;
+ }
+ value_str[ret] = '\0';
+ errno = 0;
+ value_ret = strtoul(value_str, NULL, 0);
+ if (errno) {
+ DEBUG("invalid %s value `%s': %s", name, value_str,
+ strerror(errno));
+ return -1;
+ }
+ *value = value_ret;
+ return 0;
+}
+
+/**
+ * Set unsigned long sysfs property.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param[in] name
+ * Entry name relative to sysfs path.
+ * @param value
+ * Value to set.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+static int
+priv_set_sysfs_ulong(struct priv *priv, const char *name, unsigned long value)
+{
+ int ret;
+ MKSTR(value_str, "%lu", value);
+
+ ret = priv_sysfs_write(priv, name, value_str, (sizeof(value_str) - 1));
+ if (ret == -1) {
+ DEBUG("cannot write %s `%s' (%lu) to sysfs: %s",
+ name, value_str, value, strerror(errno));
+ return -1;
+ }
+ return 0;
+}
+
+/**
+ * Perform ifreq ioctl() on associated Ethernet device.
+ *
+ * @param[in] priv
+ * Pointer to private structure.
+ * @param req
+ * Request number to pass to ioctl().
+ * @param[out] ifr
+ * Interface request structure output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_ifreq(const struct priv *priv, int req, struct ifreq *ifr)
+{
+ int sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
+ int ret = -1;
+
+ if (sock == -1)
+ return ret;
+ if (priv_get_ifname(priv, &ifr->ifr_name) == 0)
+ ret = ioctl(sock, req, ifr);
+ close(sock);
+ return ret;
+}
+
+/**
+ * Get device MTU.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param[out] mtu
+ * MTU value output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_get_mtu(struct priv *priv, uint16_t *mtu)
+{
+ unsigned long ulong_mtu;
+
+ if (priv_get_sysfs_ulong(priv, "mtu", &ulong_mtu) == -1)
+ return -1;
+ *mtu = ulong_mtu;
+ return 0;
+}
+
+/**
+ * Set device flags.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param keep
+ * Bitmask for flags that must remain untouched.
+ * @param flags
+ * Bitmask for flags to set.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
+{
+ unsigned long tmp;
+
+ if (priv_get_sysfs_ulong(priv, "flags", &tmp) == -1)
+ return -1;
+ tmp &= keep;
+ tmp |= flags;
+ return priv_set_sysfs_ulong(priv, "flags", tmp);
+}
+
+/**
+ * Get PCI information from struct ibv_device.
+ *
+ * @param device
+ * Pointer to Verbs device structure.
+ * @param[out] pci_addr
+ * PCI bus address output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+mlx5_ibv_device_to_pci_addr(const struct ibv_device *device,
+ struct rte_pci_addr *pci_addr)
+{
+ FILE *file;
+ char line[32];
+ MKSTR(path, "%s/device/uevent", device->ibdev_path);
+
+ file = fopen(path, "rb");
+ if (file == NULL)
+ return -1;
+ while (fgets(line, sizeof(line), file) == line) {
+ size_t len = strlen(line);
+ int ret;
+
+ /* Truncate long lines. */
+ if (len == (sizeof(line) - 1))
+ while (line[(len - 1)] != '\n') {
+ ret = fgetc(file);
+ if (ret == EOF)
+ break;
+ line[(len - 1)] = ret;
+ }
+ /* Extract information. */
+ if (sscanf(line,
+ "PCI_SLOT_NAME="
+ "%" SCNx16 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8 "\n",
+ &pci_addr->domain,
+ &pci_addr->bus,
+ &pci_addr->devid,
+ &pci_addr->function) == 4) {
+ ret = 0;
+ break;
+ }
+ }
+ fclose(file);
+ return 0;
+}
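
Note on mlx5_ibv_device_to_pci_addr() above: as posted, the function returns 0
once the uevent file has been opened, whether or not a PCI_SLOT_NAME line was
found. The sketch below isolates the parsing step and reports a missing match
to the caller. It is not the driver's code: struct pci_addr and
parse_pci_slot_name() are hypothetical names, and the structure only mirrors
the four fields of struct rte_pci_addr used by the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct rte_pci_addr. */
struct pci_addr {
	uint16_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/*
 * Parse one uevent line. Return 0 and fill *addr when the line is a
 * PCI_SLOT_NAME entry, -1 otherwise (*addr is then left untouched).
 */
static int
parse_pci_slot_name(const char *line, struct pci_addr *addr)
{
	struct pci_addr tmp;

	assert(line != NULL);
	assert(addr != NULL);
	if (sscanf(line,
		   "PCI_SLOT_NAME="
		   "%" SCNx16 ":%" SCNx8 ":%" SCNx8 ".%" SCNx8,
		   &tmp.domain, &tmp.bus, &tmp.devid, &tmp.function) != 4)
		return -1;
	*addr = tmp;
	return 0;
}
```

A caller looping over fgets() could stop at the first line for which this
returns 0 and fail if the end of file is reached without a match.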
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
new file mode 100644
index 0000000..f7e1cf6
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -0,0 +1,150 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <assert.h>
+#include <stdint.h>
+#include <string.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <netinet/in.h>
+#include <linux/if.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+
+/**
+ * Get MAC address by querying netdevice.
+ *
+ * @param[in] priv
+ * struct priv for the requested device.
+ * @param[out] mac
+ * MAC address output buffer.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+int
+priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
+{
+ struct ifreq request;
+
+ if (priv_ifreq(priv, SIOCGIFHWADDR, &request))
+ return -1;
+ memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+ return 0;
+}
+
+/**
+ * Unregister a MAC address.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param mac_index
+ * MAC address index.
+ */
+static void
+priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
+{
+ assert(mac_index < RTE_DIM(priv->mac));
+ if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
+ return;
+ BITFIELD_RESET(priv->mac_configured, mac_index);
+}
+
+/**
+ * Register a MAC address.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param mac_index
+ * MAC address index to use.
+ * @param mac
+ * MAC address to register.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
+ const uint8_t (*mac)[ETHER_ADDR_LEN])
+{
+ unsigned int i;
+
+ assert(mac_index < RTE_DIM(priv->mac));
+ /* First, make sure this address isn't already configured. */
+ for (i = 0; (i != RTE_DIM(priv->mac)); ++i) {
+ /* Skip this index, it's going to be reconfigured. */
+ if (i == mac_index)
+ continue;
+ if (!BITFIELD_ISSET(priv->mac_configured, i))
+ continue;
+ if (memcmp(priv->mac[i].addr_bytes, *mac, sizeof(*mac)))
+ continue;
+ /* Address already configured elsewhere, return with error. */
+ return EADDRINUSE;
+ }
+ if (BITFIELD_ISSET(priv->mac_configured, mac_index))
+ priv_mac_addr_del(priv, mac_index);
+ priv->mac[mac_index] = (struct ether_addr){
+ {
+ (*mac)[0], (*mac)[1], (*mac)[2],
+ (*mac)[3], (*mac)[4], (*mac)[5]
+ }
+ };
+ BITFIELD_SET(priv->mac_configured, mac_index);
+ return 0;
+}
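
Note on priv_mac_addr_add() above: before storing an address, the function
rejects it with EADDRINUSE when another configured slot already holds it, and
it skips the target slot because that one is about to be overwritten. The
sketch below reproduces only this logic on a tiny table. struct mac_table,
MAC_SLOTS and mac_table_add() are hypothetical names, and a byte array replaces
the BITFIELD_*() macros for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6   /* Same value as ETHER_ADDR_LEN. */
#define MAC_SLOTS 4 /* Hypothetical; the driver uses 128 entries. */

struct mac_table {
	uint8_t addr[MAC_SLOTS][MAC_LEN];
	uint8_t used[MAC_SLOTS]; /* 1 when the slot is configured. */
};

/* Return 0 on success, an errno value on failure. */
static int
mac_table_add(struct mac_table *t, unsigned int idx,
	      const uint8_t (*mac)[MAC_LEN])
{
	unsigned int i;

	assert(idx < MAC_SLOTS);
	for (i = 0; i != MAC_SLOTS; ++i) {
		/* The target slot is going to be reconfigured, skip it. */
		if ((i == idx) || !t->used[i])
			continue;
		if (!memcmp(t->addr[i], *mac, MAC_LEN))
			return EADDRINUSE;
	}
	memcpy(t->addr[idx], *mac, MAC_LEN);
	t->used[idx] = 1;
	return 0;
}
```

Writing the same address again to the slot that already holds it succeeds,
while writing it to any other slot fails.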
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
new file mode 100644
index 0000000..cc6aab6
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -0,0 +1,149 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_UTILS_H_
+#define RTE_PMD_MLX5_UTILS_H_
+
+#include <stddef.h>
+#include <stdio.h>
+#include <limits.h>
+#include <assert.h>
+#include <errno.h>
+
+#include "mlx5_defs.h"
+
+/* Bit-field manipulation. */
+#define BITFIELD_DECLARE(bf, type, size) \
+ type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \
+ !!((size_t)(size) % (sizeof(type) * CHAR_BIT)))]
+#define BITFIELD_DEFINE(bf, type, size) \
+ BITFIELD_DECLARE((bf), type, (size)) = { 0 }
+#define BITFIELD_SET(bf, b) \
+ (assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+ (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] |= \
+ ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_RESET(bf, b) \
+ (assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+ (void)((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] &= \
+ ~((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT)))))
+#define BITFIELD_ISSET(bf, b) \
+ (assert((size_t)(b) < (sizeof(bf) * CHAR_BIT)), \
+ !!(((bf)[((b) / (sizeof((bf)[0]) * CHAR_BIT))] & \
+ ((size_t)1 << ((b) % (sizeof((bf)[0]) * CHAR_BIT))))))
+
+/* Save and restore errno around argument evaluation. */
+#define ERRNO_SAFE(x) ((errno = (int []){ errno, ((x), 0) }[0]))
+
+/*
+ * Helper macros to work around __VA_ARGS__ limitations in a C99 compliant
+ * manner.
+ */
+#define PMD_DRV_LOG_STRIP(a, b) a
+#define PMD_DRV_LOG_OPAREN (
+#define PMD_DRV_LOG_CPAREN )
+#define PMD_DRV_LOG_COMMA ,
+
+/* Return the file name part of a path. */
+static inline const char *
+pmd_drv_log_basename(const char *s)
+{
+ const char *n = s;
+
+ while (*n)
+ if (*(n++) == '/')
+ s = n;
+ return s;
+}
+
+/*
+ * When debugging is enabled (NDEBUG not defined), file, line and function
+ * information replace the driver name (MLX5_DRIVER_NAME) in log messages.
+ */
+#ifndef NDEBUG
+
+#define PMD_DRV_LOG___(level, ...) \
+ ERRNO_SAFE(RTE_LOG(level, PMD, __VA_ARGS__))
+#define PMD_DRV_LOG__(level, ...) \
+ PMD_DRV_LOG___(level, "%s:%u: %s(): " __VA_ARGS__)
+#define PMD_DRV_LOG_(level, s, ...) \
+ PMD_DRV_LOG__(level, \
+ s "\n" PMD_DRV_LOG_COMMA \
+ pmd_drv_log_basename(__FILE__) PMD_DRV_LOG_COMMA \
+ __LINE__ PMD_DRV_LOG_COMMA \
+ __func__, \
+ __VA_ARGS__)
+
+#else /* NDEBUG */
+
+#define PMD_DRV_LOG___(level, ...) \
+ ERRNO_SAFE(RTE_LOG(level, PMD, MLX5_DRIVER_NAME ": " __VA_ARGS__))
+#define PMD_DRV_LOG__(level, ...) \
+ PMD_DRV_LOG___(level, __VA_ARGS__)
+#define PMD_DRV_LOG_(level, s, ...) \
+ PMD_DRV_LOG__(level, s "\n", __VA_ARGS__)
+
+#endif /* NDEBUG */
+
+/* Generic printf()-like logging macro with automatic line feed. */
+#define PMD_DRV_LOG(level, ...) \
+ PMD_DRV_LOG_(level, \
+ __VA_ARGS__ PMD_DRV_LOG_STRIP PMD_DRV_LOG_OPAREN, \
+ PMD_DRV_LOG_CPAREN)
+
+/*
+ * Like assert(), DEBUG() becomes a no-op and claim_zero() does not perform
+ * any check when debugging is disabled.
+ */
+#ifndef NDEBUG
+
+#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
+#define claim_zero(...) assert((__VA_ARGS__) == 0)
+
+#else /* NDEBUG */
+
+#define DEBUG(...) (void)0
+#define claim_zero(...) (__VA_ARGS__)
+
+#endif /* NDEBUG */
+
+#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
+#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
+#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
+
+/* Allocate a buffer on the stack and fill it with a printf format string. */
+#define MKSTR(name, ...) \
+ char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
+ \
+ snprintf(name, sizeof(name), __VA_ARGS__)
+
+#endif /* RTE_PMD_MLX5_UTILS_H_ */
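
Note on the BITFIELD_*() macros above: BITFIELD_DECLARE() rounds the requested
number of bits up to a whole number of storage words, and the other macros
split a bit index into a word index and a bit offset. The sketch below shows
the same arithmetic with inline functions on uint32_t words. WORD_BITS,
BF_WORDS() and the bf_*() helpers are hypothetical names, not part of the
patch; unlike the macros, they take the bit count as an explicit argument.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Bits held by one storage word. */
#define WORD_BITS (sizeof(uint32_t) * CHAR_BIT)
/* Words needed to hold n bits, rounded up as in BITFIELD_DECLARE(). */
#define BF_WORDS(n) \
	(((size_t)(n) / WORD_BITS) + !!((size_t)(n) % WORD_BITS))

static void
bf_set(uint32_t *bf, size_t nbits, size_t b)
{
	assert(b < nbits);
	bf[b / WORD_BITS] |= (uint32_t)1 << (b % WORD_BITS);
}

static void
bf_reset(uint32_t *bf, size_t nbits, size_t b)
{
	assert(b < nbits);
	bf[b / WORD_BITS] &= ~((uint32_t)1 << (b % WORD_BITS));
}

static int
bf_isset(const uint32_t *bf, size_t nbits, size_t b)
{
	assert(b < nbits);
	return !!(bf[b / WORD_BITS] & ((uint32_t)1 << (b % WORD_BITS)));
}
```

Assuming 8-bit bytes, the 128 MAC slots of struct priv fit in four uint32_t
words, and bit 37 lands in word 1 at offset 5.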
diff --git a/drivers/net/mlx5/rte_pmd_mlx5_version.map b/drivers/net/mlx5/rte_pmd_mlx5_version.map
new file mode 100644
index 0000000..ad607bb
--- /dev/null
+++ b/drivers/net/mlx5/rte_pmd_mlx5_version.map
@@ -0,0 +1,3 @@
+DPDK_2.2 {
+ local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 9e1909e..724efa7 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -104,6 +104,10 @@ ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -libverbs
endif # ! CONFIG_RTE_BUILD_SHARED_LIBS
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),n)
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -libverbs
+endif # ! CONFIG_RTE_BUILD_SHARED_LIB
+
_LDLIBS-$(CONFIG_RTE_LIBRTE_BNX2X_PMD) += -lz
_LDLIBS-y += --start-group
@@ -137,6 +141,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += -lrte_pmd_fm10k
_LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += -lrte_pmd_ixgbe
_LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += -lrte_pmd_e1000
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
+_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5
_LDLIBS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += -lrte_pmd_mpipe -lgxio
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_RING) += -lrte_pmd_ring
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap
--
2.1.0
* [PATCH v2 02/13] mlx5: add non-scattered TX and RX support
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
RSS implementation with parent/child QPs comes from mlx4 and is temporary.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
config/common_bsdapp | 3 +
config/common_linuxapp | 3 +
drivers/net/mlx5/Makefile | 15 +
drivers/net/mlx5/mlx5.c | 40 +++
drivers/net/mlx5/mlx5.h | 25 ++
drivers/net/mlx5/mlx5_defs.h | 24 ++
drivers/net/mlx5/mlx5_rxq.c | 682 ++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.c | 496 ++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 156 ++++++++++
drivers/net/mlx5/mlx5_txq.c | 512 +++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_utils.h | 11 +
11 files changed, 1967 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_rxq.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.c
create mode 100644 drivers/net/mlx5/mlx5_rxtx.h
create mode 100644 drivers/net/mlx5/mlx5_txq.c
diff --git a/config/common_bsdapp b/config/common_bsdapp
index 1e6885f..3b50ff9 100644
--- a/config/common_bsdapp
+++ b/config/common_bsdapp
@@ -218,6 +218,9 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
CONFIG_RTE_LIBRTE_MLX5_PMD=n
CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N=4
+CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
+CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
#
# Compile burst-oriented Broadcom PMD driver
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 7da7ba7..eed8fc0 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -216,6 +216,9 @@ CONFIG_RTE_LIBRTE_MLX4_SOFT_COUNTERS=1
#
CONFIG_RTE_LIBRTE_MLX5_PMD=n
CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N=4
+CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE=0
+CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8
#
# Compile burst-oriented Broadcom PMD driver
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 6e63073..7b9c57b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -42,6 +42,9 @@ LIB = librte_pmd_mlx5.a
# Sources.
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
@@ -79,6 +82,18 @@ else
CFLAGS += -DNDEBUG -UPEDANTIC
endif
+ifdef CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N
+CFLAGS += -DMLX5_PMD_SGE_WR_N=$(CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N)
+endif
+
+ifdef CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE
+CFLAGS += -DMLX5_PMD_MAX_INLINE=$(CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE)
+endif
+
+ifdef CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE
+CFLAGS += -DMLX5_PMD_TX_MP_CACHE=$(CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE)
+endif
+
include $(RTE_SDK)/mk/rte.lib.mk
# Generate and clean-up mlx5_autoconf.h.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 6df486b..54bd6b9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -63,6 +63,7 @@
#include "mlx5.h"
#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
#include "mlx5_autoconf.h"
/**
@@ -77,11 +78,46 @@ static void
mlx5_dev_close(struct rte_eth_dev *dev)
{
struct priv *priv = dev->data->dev_private;
+ void *tmp;
+ unsigned int i;
priv_lock(priv);
DEBUG("%p: closing device \"%s\"",
(void *)dev,
((priv->ctx != NULL) ? priv->ctx->device->name : ""));
+ /* Prevent crashes when queues are still in use. */
+ dev->rx_pkt_burst = removed_rx_burst;
+ dev->tx_pkt_burst = removed_tx_burst;
+ if (priv->rxqs != NULL) {
+ /* XXX race condition if mlx5_rx_burst() is still running. */
+ usleep(1000);
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ tmp = (*priv->rxqs)[i];
+ if (tmp == NULL)
+ continue;
+ (*priv->rxqs)[i] = NULL;
+ rxq_cleanup(tmp);
+ rte_free(tmp);
+ }
+ priv->rxqs_n = 0;
+ priv->rxqs = NULL;
+ }
+ if (priv->txqs != NULL) {
+ /* XXX race condition if mlx5_tx_burst() is still running. */
+ usleep(1000);
+ for (i = 0; (i != priv->txqs_n); ++i) {
+ tmp = (*priv->txqs)[i];
+ if (tmp == NULL)
+ continue;
+ (*priv->txqs)[i] = NULL;
+ txq_cleanup(tmp);
+ rte_free(tmp);
+ }
+ priv->txqs_n = 0;
+ priv->txqs = NULL;
+ }
+ if (priv->rss)
+ rxq_cleanup(&priv->rxq_parent);
if (priv->pd != NULL) {
assert(priv->ctx != NULL);
claim_zero(ibv_dealloc_pd(priv->pd));
@@ -94,6 +130,10 @@ mlx5_dev_close(struct rte_eth_dev *dev)
static const struct eth_dev_ops mlx5_dev_ops = {
.dev_close = mlx5_dev_close,
+ .rx_queue_setup = mlx5_rx_queue_setup,
+ .tx_queue_setup = mlx5_tx_queue_setup,
+ .rx_queue_release = mlx5_rx_queue_release,
+ .tx_queue_release = mlx5_tx_queue_release,
};
static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 21db3cd..49978f5 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -63,6 +63,7 @@
#endif
#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
#include "mlx5_autoconf.h"
#include "mlx5_defs.h"
@@ -101,9 +102,33 @@ struct priv {
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
unsigned int max_rss_tbl_sz; /* Maximum number of RSS queues. */
+ /* RX/TX queues. */
+ struct rxq rxq_parent; /* Parent queue when RSS is enabled. */
+ unsigned int rxqs_n; /* RX queues array size. */
+ unsigned int txqs_n; /* TX queues array size. */
+ struct rxq *(*rxqs)[]; /* RX queues. */
+ struct txq *(*txqs)[]; /* TX queues. */
rte_spinlock_t lock; /* Lock for control functions. */
};
+/* Work Request ID data type (64 bit). */
+typedef union {
+ struct {
+ uint32_t id;
+ uint16_t offset;
+ } data;
+ uint64_t raw;
+} wr_id_t;
+
+/* Compile-time check. */
+static inline void wr_id_t_check(void)
+{
+ wr_id_t check[1 + (2 * -!(sizeof(wr_id_t) == sizeof(uint64_t)))];
+
+ (void)check;
+ (void)wr_id_t_check;
+}
+
/**
* Lock private structure to protect it from concurrent access in the
* control path.
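
Note on wr_id_t: the union packs an element index and a header-to-data
offset into the 64-bit work request ID, so that both the ring slot and the
mbuf pointer can be recovered from a completion. Below is a minimal sketch
under that reading. The helper names (wr_id_pack() and friends) are
hypothetical and only stand in for the WR_ID() accessor used by the driver;
the typedef-based size check is an alternative spelling of the same
negative-array-size trick.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the wr_id_t union added to mlx5.h. */
typedef union {
	struct {
		uint32_t id;     /* Index in the elts array. */
		uint16_t offset; /* Distance from mbuf header to its data. */
	} data;
	uint64_t raw;
} wr_id_t;

/* Compile-time check: the array size is -1 if the sizes differ. */
typedef char wr_id_t_size_check[(sizeof(wr_id_t) == sizeof(uint64_t)) ?
				1 : -1];

/* Hypothetical helper: build a raw 64-bit WR ID. */
static inline uint64_t
wr_id_pack(uint32_t id, uint16_t offset)
{
	wr_id_t w = { .raw = 0 };

	w.data.id = id;
	w.data.offset = offset;
	return w.raw;
}

/* Hypothetical helper: extract the elts index. */
static inline uint32_t
wr_id_index(uint64_t raw)
{
	wr_id_t w = { .raw = raw };

	return w.data.id;
}

/* Hypothetical helper: extract the header-to-data offset. */
static inline uint16_t
wr_id_offset(uint64_t raw)
{
	wr_id_t w = { .raw = raw };

	return w.data.offset;
}

/* Recover the buffer header address from the SGE address. */
static inline uintptr_t
wr_id_to_buf(uintptr_t sge_addr, uint64_t raw)
{
	assert(sge_addr >= wr_id_offset(raw));
	return sge_addr - wr_id_offset(raw);
}
```

The 16-bit offset field is why rxq_alloc_elts() re-reads the WR ID after
writing it: an offset that does not fit would otherwise be truncated
silently.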
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index c66a74f..c85be9c 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -40,4 +40,28 @@
/* Maximum number of simultaneous MAC addresses. */
#define MLX5_MAX_MAC_ADDRESSES 128
+/* Request send completion once in every 64 sends, might be less. */
+#define MLX5_PMD_TX_PER_COMP_REQ 64
+
+/* Maximum number of Scatter/Gather Elements per Work Request. */
+#ifndef MLX5_PMD_SGE_WR_N
+#define MLX5_PMD_SGE_WR_N 4
+#endif
+
+/* Maximum size for inline data. */
+#ifndef MLX5_PMD_MAX_INLINE
+#define MLX5_PMD_MAX_INLINE 0
+#endif
+
+/*
+ * Maximum number of cached Memory Pools (MPs) per TX queue. Each RTE MP
+ * from which buffers are to be transmitted will have to be mapped by this
+ * driver to their own Memory Region (MR). This is a slow operation.
+ *
+ * This value is always 1 for RX queues.
+ */
+#ifndef MLX5_PMD_TX_MP_CACHE
+#define MLX5_PMD_TX_MP_CACHE 8
+#endif
+
#endif /* RTE_PMD_MLX5_DEFS_H_ */
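
Note on MLX5_PMD_TX_PER_COMP_REQ: the TX path requests a completion only
once per batch of sends, and each polled completion then frees a whole batch
of ring slots. The sketch below models that bookkeeping in isolation. The
struct and function names are hypothetical, and it assumes, as the driver
appears to, that one polled completion never covers more than a full ring.

```c
#include <assert.h>

/* Hypothetical, simplified view of the TX completion state. */
struct comp_state {
	unsigned int cd_init; /* Sends per completion request. */
	unsigned int cd;      /* Countdown to the next signaled send. */
	unsigned int pending; /* Signaled sends not yet polled. */
};

/* Account for one send; return 1 if it must request a completion. */
static int
comp_next_send(struct comp_state *s)
{
	assert(s->cd != 0);
	if (--s->cd != 0)
		return 0;
	s->cd = s->cd_init;
	++s->pending;
	return 1;
}

/* Advance the ring tail: each polled completion covers cd_init sends. */
static unsigned int
comp_advance_tail(const struct comp_state *s, unsigned int tail,
		  unsigned int elts_n, unsigned int wcs_n)
{
	tail += wcs_n * s->cd_init;
	if (tail >= elts_n)
		tail -= elts_n;
	return tail;
}
```

With cd_init set to 64, a burst of 128 sends produces two signaled work
requests, and polling one completion moves the tail forward by 64 slots.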
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
new file mode 100644
index 0000000..01cc649
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -0,0 +1,682 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <assert.h>
+#include <errno.h>
+#include <string.h>
+#include <stdint.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_utils.h"
+#include "mlx5_defs.h"
+
+/**
+ * Allocate RX queue elements.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param elts_n
+ * Number of elements to allocate.
+ * @param[in] pool
+ * If not NULL, fetch buffers from this array instead of allocating them
+ * with rte_pktmbuf_alloc().
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
+{
+ unsigned int i;
+ struct rxq_elt (*elts)[elts_n] =
+ rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
+ rxq->socket);
+ int ret = 0;
+
+ if (elts == NULL) {
+ ERROR("%p: can't allocate packets array", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ /* For each WR (packet). */
+ for (i = 0; (i != elts_n); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct ibv_recv_wr *wr = &elt->wr;
+ struct ibv_sge *sge = &(*elts)[i].sge;
+ struct rte_mbuf *buf;
+
+ if (pool != NULL) {
+ buf = *(pool++);
+ assert(buf != NULL);
+ rte_pktmbuf_reset(buf);
+ } else
+ buf = rte_pktmbuf_alloc(rxq->mp);
+ if (buf == NULL) {
+ assert(pool == NULL);
+ ERROR("%p: empty mbuf pool", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ /* Configure WR. Work request ID contains its own index in
+ * the elts array and the offset between SGE buffer header and
+ * its data. */
+ WR_ID(wr->wr_id).id = i;
+ WR_ID(wr->wr_id).offset =
+ (((uintptr_t)buf->buf_addr + RTE_PKTMBUF_HEADROOM) -
+ (uintptr_t)buf);
+ wr->next = &(*elts)[(i + 1)].wr;
+ wr->sg_list = sge;
+ wr->num_sge = 1;
+ /* Headroom is reserved by rte_pktmbuf_alloc(). */
+ assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+ /* Buffer is supposed to be empty. */
+ assert(rte_pktmbuf_data_len(buf) == 0);
+ assert(rte_pktmbuf_pkt_len(buf) == 0);
+ /* sge->addr must be able to store a pointer. */
+ assert(sizeof(sge->addr) >= sizeof(uintptr_t));
+ /* SGE keeps its headroom. */
+ sge->addr = (uintptr_t)
+ ((uint8_t *)buf->buf_addr + RTE_PKTMBUF_HEADROOM);
+ sge->length = (buf->buf_len - RTE_PKTMBUF_HEADROOM);
+ sge->lkey = rxq->mr->lkey;
+ /* Redundant check for tailroom. */
+ assert(sge->length == rte_pktmbuf_tailroom(buf));
+ /* Make sure elts index and SGE mbuf pointer can be deduced
+ * from WR ID. */
+ if ((WR_ID(wr->wr_id).id != i) ||
+ ((void *)((uintptr_t)sge->addr -
+ WR_ID(wr->wr_id).offset) != buf)) {
+ ERROR("%p: cannot store index and offset in WR ID",
+ (void *)rxq);
+ sge->addr = 0;
+ rte_pktmbuf_free(buf);
+ ret = EOVERFLOW;
+ goto error;
+ }
+ }
+ /* The last WR pointer must be NULL. */
+ (*elts)[(i - 1)].wr.next = NULL;
+ DEBUG("%p: allocated and configured %u single-segment WRs",
+ (void *)rxq, elts_n);
+ rxq->elts_n = elts_n;
+ rxq->elts_head = 0;
+ rxq->elts.no_sp = elts;
+ assert(ret == 0);
+ return 0;
+error:
+ if (elts != NULL) {
+ assert(pool == NULL);
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct rte_mbuf *buf;
+
+ if (elt->sge.addr == 0)
+ continue;
+ assert(WR_ID(elt->wr.wr_id).id == i);
+ buf = (void *)((uintptr_t)elt->sge.addr -
+ WR_ID(elt->wr.wr_id).offset);
+ rte_pktmbuf_free_seg(buf);
+ }
+ rte_free(elts);
+ }
+ DEBUG("%p: failed, freed everything", (void *)rxq);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * Free RX queue elements.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+static void
+rxq_free_elts(struct rxq *rxq)
+{
+ unsigned int i;
+ unsigned int elts_n = rxq->elts_n;
+ struct rxq_elt (*elts)[elts_n] = rxq->elts.no_sp;
+
+ DEBUG("%p: freeing WRs", (void *)rxq);
+ rxq->elts_n = 0;
+ rxq->elts.no_sp = NULL;
+ if (elts == NULL)
+ return;
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct rte_mbuf *buf;
+
+ if (elt->sge.addr == 0)
+ continue;
+ assert(WR_ID(elt->wr.wr_id).id == i);
+ buf = (void *)((uintptr_t)elt->sge.addr -
+ WR_ID(elt->wr.wr_id).offset);
+ rte_pktmbuf_free_seg(buf);
+ }
+ rte_free(elts);
+}
+
+/**
+ * Clean up a RX queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_cleanup(struct rxq *rxq)
+{
+ struct ibv_exp_release_intf_params params;
+
+ DEBUG("cleaning up %p", (void *)rxq);
+ rxq_free_elts(rxq);
+ if (rxq->if_qp != NULL) {
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ assert(rxq->qp != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+ rxq->if_qp,
+ &params));
+ }
+ if (rxq->if_cq != NULL) {
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ assert(rxq->cq != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+ rxq->if_cq,
+ &params));
+ }
+ if (rxq->qp != NULL) {
+ claim_zero(ibv_destroy_qp(rxq->qp));
+ }
+ if (rxq->cq != NULL)
+ claim_zero(ibv_destroy_cq(rxq->cq));
+ if (rxq->rd != NULL) {
+ struct ibv_exp_destroy_res_domain_attr attr = {
+ .comp_mask = 0,
+ };
+
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ claim_zero(ibv_exp_destroy_res_domain(rxq->priv->ctx,
+ rxq->rd,
+ &attr));
+ }
+ if (rxq->mr != NULL)
+ claim_zero(ibv_dereg_mr(rxq->mr));
+ memset(rxq, 0, sizeof(*rxq));
+}
+
+/**
+ * Allocate a Queue Pair.
+ * Optionally setup inline receive if supported.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param cq
+ * Completion queue to associate with QP.
+ * @param desc
+ * Number of descriptors in QP (hint only).
+ *
+ * @return
+ * QP pointer or NULL in case of error.
+ */
+static struct ibv_qp *
+rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
+ struct ibv_exp_res_domain *rd)
+{
+ struct ibv_exp_qp_init_attr attr = {
+ /* CQ to be associated with the send queue. */
+ .send_cq = cq,
+ /* CQ to be associated with the receive queue. */
+ .recv_cq = cq,
+ .cap = {
+ /* Max number of outstanding WRs. */
+ .max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+ priv->device_attr.max_qp_wr :
+ desc),
+ /* Max number of scatter/gather elements in a WR. */
+ .max_recv_sge = ((priv->device_attr.max_sge <
+ MLX5_PMD_SGE_WR_N) ?
+ priv->device_attr.max_sge :
+ MLX5_PMD_SGE_WR_N),
+ },
+ .qp_type = IBV_QPT_RAW_PACKET,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
+ .pd = priv->pd,
+ .res_domain = rd,
+ };
+
+ return ibv_exp_create_qp(priv->ctx, &attr);
+}
+
+#ifdef RSS_SUPPORT
+
+/**
+ * Allocate a RSS Queue Pair.
+ * Optionally setup inline receive if supported.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param cq
+ * Completion queue to associate with QP.
+ * @param desc
+ * Number of descriptors in QP (hint only).
+ * @param parent
+ * If nonzero, create a parent QP, otherwise a child.
+ *
+ * @return
+ * QP pointer or NULL in case of error.
+ */
+static struct ibv_qp *
+rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
+ int parent, struct ibv_exp_res_domain *rd)
+{
+ struct ibv_exp_qp_init_attr attr = {
+ /* CQ to be associated with the send queue. */
+ .send_cq = cq,
+ /* CQ to be associated with the receive queue. */
+ .recv_cq = cq,
+ .cap = {
+ /* Max number of outstanding WRs. */
+ .max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
+ priv->device_attr.max_qp_wr :
+ desc),
+ /* Max number of scatter/gather elements in a WR. */
+ .max_recv_sge = ((priv->device_attr.max_sge <
+ MLX5_PMD_SGE_WR_N) ?
+ priv->device_attr.max_sge :
+ MLX5_PMD_SGE_WR_N),
+ },
+ .qp_type = IBV_QPT_RAW_PACKET,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
+ IBV_EXP_QP_INIT_ATTR_QPG),
+ .pd = priv->pd,
+ .res_domain = rd,
+ };
+
+ if (parent) {
+ attr.qpg.qpg_type = IBV_EXP_QPG_PARENT;
+ /* TSS isn't necessary. */
+ attr.qpg.parent_attrib.tss_child_count = 0;
+ attr.qpg.parent_attrib.rss_child_count = priv->rxqs_n;
+ DEBUG("initializing parent RSS queue");
+ } else {
+ attr.qpg.qpg_type = IBV_EXP_QPG_CHILD_RX;
+ attr.qpg.qpg_parent = priv->rxq_parent.qp;
+ DEBUG("initializing child RSS queue");
+ }
+ return ibv_exp_create_qp(priv->ctx, &attr);
+}
+
+#endif /* RSS_SUPPORT */
+
+/**
+ * Configure a RX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Thresholds parameters.
+ * @param mp
+ * Memory pool for buffer allocations.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
+ unsigned int socket, const struct rte_eth_rxconf *conf,
+ struct rte_mempool *mp)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct rxq tmpl = {
+ .priv = priv,
+ .mp = mp,
+ .socket = socket
+ };
+ struct ibv_exp_qp_attr mod;
+ union {
+ struct ibv_exp_query_intf_params params;
+ struct ibv_exp_cq_init_attr cq;
+ struct ibv_exp_res_domain_init_attr rd;
+ } attr;
+ enum ibv_exp_query_intf_status status;
+ struct ibv_recv_wr *bad_wr;
+ struct rte_mbuf *buf;
+ int ret = 0;
+ int parent = (rxq == &priv->rxq_parent);
+
+ (void)conf; /* Thresholds configuration (ignored). */
+ /*
+ * If this is a parent queue, hardware must support RSS and
+ * RSS must be enabled.
+ */
+ assert((!parent) || ((priv->hw_rss) && (priv->rss)));
+ if (parent) {
+ /* Even if unused, ibv_exp_create_cq() requires at least one
+ * descriptor. */
+ desc = 1;
+ goto skip_mr;
+ }
+ if ((desc == 0) || (desc % MLX5_PMD_SGE_WR_N)) {
+ ERROR("%p: invalid number of RX descriptors (must be a"
+ " multiple of %d)", (void *)dev, MLX5_PMD_SGE_WR_N);
+ return EINVAL;
+ }
+ /* Get mbuf length. */
+ buf = rte_pktmbuf_alloc(mp);
+ if (buf == NULL) {
+ ERROR("%p: unable to allocate mbuf", (void *)dev);
+ return ENOMEM;
+ }
+ tmpl.mb_len = buf->buf_len;
+ assert((rte_pktmbuf_headroom(buf) +
+ rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
+ assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
+ rte_pktmbuf_free(buf);
+ /* Use the entire RX mempool as the memory region. */
+ tmpl.mr = ibv_reg_mr(priv->pd,
+ (void *)mp->elt_va_start,
+ (mp->elt_va_end - mp->elt_va_start),
+ (IBV_ACCESS_LOCAL_WRITE |
+ IBV_ACCESS_REMOTE_WRITE));
+ if (tmpl.mr == NULL) {
+ ret = EINVAL;
+ ERROR("%p: MR creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+skip_mr:
+ attr.rd = (struct ibv_exp_res_domain_init_attr){
+ .comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
+ IBV_EXP_RES_DOMAIN_MSG_MODEL),
+ .thread_model = IBV_EXP_THREAD_SINGLE,
+ .msg_model = IBV_EXP_MSG_HIGH_BW,
+ };
+ tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
+ if (tmpl.rd == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: RD creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.cq = (struct ibv_exp_cq_init_attr){
+ .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
+ .res_domain = tmpl.rd,
+ };
+ tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+ if (tmpl.cq == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: CQ creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ DEBUG("priv->device_attr.max_qp_wr is %d",
+ priv->device_attr.max_qp_wr);
+ DEBUG("priv->device_attr.max_sge is %d",
+ priv->device_attr.max_sge);
+#ifdef RSS_SUPPORT
+ if (priv->rss)
+ tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
+ tmpl.rd);
+ else
+#endif /* RSS_SUPPORT */
+ tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
+ if (tmpl.qp == NULL) {
+ ret = (errno ? errno : EINVAL);
+ ERROR("%p: QP creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ mod = (struct ibv_exp_qp_attr){
+ /* Move the QP to this state. */
+ .qp_state = IBV_QPS_INIT,
+ /* Primary port number. */
+ .port_num = priv->port
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &mod,
+ (IBV_EXP_QP_STATE |
+#ifdef RSS_SUPPORT
+ (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
+#endif /* RSS_SUPPORT */
+ IBV_EXP_QP_PORT));
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ /* Allocate descriptors for RX queues, except for the RSS parent. */
+ if (parent)
+ goto skip_alloc;
+ ret = rxq_alloc_elts(&tmpl, desc, NULL);
+ if (ret) {
+ ERROR("%p: RXQ allocation failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ ret = ibv_post_recv(tmpl.qp,
+ &(*tmpl.elts.no_sp)[0].wr,
+ &bad_wr);
+ if (ret) {
+ ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+ (void *)dev,
+ (void *)bad_wr,
+ strerror(ret));
+ goto error;
+ }
+skip_alloc:
+ mod = (struct ibv_exp_qp_attr){
+ .qp_state = IBV_QPS_RTR
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ /* Save port ID. */
+ tmpl.port_id = dev->data->port_id;
+ DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_CQ,
+ .obj = tmpl.cq,
+ };
+ tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_cq == NULL) {
+ ret = EINVAL;
+ ERROR("%p: CQ interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_QP_BURST,
+ .obj = tmpl.qp,
+ };
+ tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_qp == NULL) {
+ ret = EINVAL;
+ ERROR("%p: QP interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ /* Clean up rxq in case we're reinitializing it. */
+ DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
+ rxq_cleanup(rxq);
+ *rxq = tmpl;
+ DEBUG("%p: rxq updated with %p", (void *)rxq, (void *)&tmpl);
+ assert(ret == 0);
+ return 0;
+error:
+ rxq_cleanup(&tmpl);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * DPDK callback to configure a RX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param idx
+ * RX queue index.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Thresholds parameters.
+ * @param mp
+ * Memory pool for buffer allocations.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+ unsigned int socket, const struct rte_eth_rxconf *conf,
+ struct rte_mempool *mp)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct rxq *rxq = (*priv->rxqs)[idx];
+ int ret;
+
+ priv_lock(priv);
+ DEBUG("%p: configuring queue %u for %u descriptors",
+ (void *)dev, idx, desc);
+ if (idx >= priv->rxqs_n) {
+ ERROR("%p: queue index out of range (%u >= %u)",
+ (void *)dev, idx, priv->rxqs_n);
+ priv_unlock(priv);
+ return -EOVERFLOW;
+ }
+ if (rxq != NULL) {
+ DEBUG("%p: reusing already allocated queue index %u (%p)",
+ (void *)dev, idx, (void *)rxq);
+ if (priv->started) {
+ priv_unlock(priv);
+ return -EEXIST;
+ }
+ (*priv->rxqs)[idx] = NULL;
+ rxq_cleanup(rxq);
+ } else {
+ rxq = rte_calloc_socket("RXQ", 1, sizeof(*rxq), 0, socket);
+ if (rxq == NULL) {
+ ERROR("%p: unable to allocate queue index %u",
+ (void *)dev, idx);
+ priv_unlock(priv);
+ return -ENOMEM;
+ }
+ }
+ ret = rxq_setup(dev, rxq, desc, socket, conf, mp);
+ if (ret)
+ rte_free(rxq);
+ else {
+ DEBUG("%p: adding RX queue %p to list",
+ (void *)dev, (void *)rxq);
+ (*priv->rxqs)[idx] = rxq;
+ /* Update receive callback. */
+ dev->rx_pkt_burst = mlx5_rx_burst;
+ }
+ priv_unlock(priv);
+ return -ret;
+}
+
+/**
+ * DPDK callback to release a RX queue.
+ *
+ * @param dpdk_rxq
+ * Generic RX queue pointer.
+ */
+void
+mlx5_rx_queue_release(void *dpdk_rxq)
+{
+ struct rxq *rxq = (struct rxq *)dpdk_rxq;
+ struct priv *priv;
+ unsigned int i;
+
+ if (rxq == NULL)
+ return;
+ priv = rxq->priv;
+ priv_lock(priv);
+ assert(rxq != &priv->rxq_parent);
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] == rxq) {
+ DEBUG("%p: removing RX queue %p from list",
+ (void *)priv->dev, (void *)rxq);
+ (*priv->rxqs)[i] = NULL;
+ break;
+ }
+ rxq_cleanup(rxq);
+ rte_free(rxq);
+ priv_unlock(priv);
+}
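
Note on rxq_setup(): the function builds the new queue in a local template
(tmpl) and only replaces the caller's structure once every step has
succeeded, so a failed reconfiguration leaves the existing queue untouched.
The sketch below shows that pattern with a hypothetical struct queue that
owns a single heap buffer; it is a simplified illustration, not the driver
code. Every path that jumps to the error label sets a positive errno value
first, which the caller relies on.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rxq: owns one heap buffer. */
struct queue {
	void *ring;
	unsigned int n;
};

/* Release resources and reset the structure for reuse. */
static void
queue_cleanup(struct queue *q)
{
	free(q->ring);
	memset(q, 0, sizeof(*q));
}

/* Return 0 on success, a positive errno value on failure. */
static int
queue_setup(struct queue *q, unsigned int n)
{
	struct queue tmpl = { .ring = NULL, .n = n };
	int ret;

	assert(q != NULL);
	if (n == 0) {
		ret = EINVAL;
		goto error;
	}
	tmpl.ring = calloc(n, sizeof(int));
	if (tmpl.ring == NULL) {
		ret = ENOMEM;
		goto error;
	}
	/* Everything succeeded: drop the old state, commit the new one. */
	queue_cleanup(q);
	*q = tmpl;
	return 0;
error:
	/* Only the template is torn down; *q is left as it was. */
	queue_cleanup(&tmpl);
	return ret;
}
```

The cleanup helper must tolerate a partially initialized structure, which is
why rxq_cleanup() tests each pointer before destroying the matching object.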
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
new file mode 100644
index 0000000..0f1e541
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -0,0 +1,496 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdlib.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_prefetch.h>
+#include <rte_common.h>
+#include <rte_branch_prediction.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_defs.h"
+
+/**
+ * Manage TX completions.
+ *
+ * When sending a burst, mlx5_tx_burst() posts several WRs.
+ * To improve performance, a completion event is only required once every
+ * MLX5_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
+ * for other WRs, but this information would not be used anyway.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ *
+ * @return
+ * 0 on success, -1 on failure.
+ */
+static int
+txq_complete(struct txq *txq)
+{
+ unsigned int elts_comp = txq->elts_comp;
+ unsigned int elts_tail = txq->elts_tail;
+ const unsigned int elts_n = txq->elts_n;
+ int wcs_n;
+
+ if (unlikely(elts_comp == 0))
+ return 0;
+#ifdef DEBUG_SEND
+ DEBUG("%p: processing %u work requests completions",
+ (void *)txq, elts_comp);
+#endif
+ wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
+ if (unlikely(wcs_n == 0))
+ return 0;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("%p: ibv_poll_cq() failed (wcs_n=%d)",
+ (void *)txq, wcs_n);
+ return -1;
+ }
+ elts_comp -= wcs_n;
+ assert(elts_comp <= txq->elts_comp);
+ /*
+ * Assume WC status is successful as nothing can be done about it
+ * anyway.
+ */
+ elts_tail += wcs_n * txq->elts_comp_cd_init;
+ if (elts_tail >= elts_n)
+ elts_tail -= elts_n;
+ txq->elts_tail = elts_tail;
+ txq->elts_comp = elts_comp;
+ return 0;
+}
+
+/**
+ * Get Memory Region (MR) <-> Memory Pool (MP) association from txq->mp2mr[].
+ * Add MP to txq->mp2mr[] if it's not registered yet. If mp2mr[] is full,
+ * remove an entry first.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param[in] mp
+ * Memory Pool for which a Memory Region lkey must be returned.
+ *
+ * @return
+ * mr->lkey on success, (uint32_t)-1 on failure.
+ */
+static uint32_t
+txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
+{
+ unsigned int i;
+ struct ibv_mr *mr;
+
+ for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+ if (unlikely(txq->mp2mr[i].mp == NULL)) {
+ /* Unknown MP, add a new MR for it. */
+ break;
+ }
+ if (txq->mp2mr[i].mp == mp) {
+ assert(txq->mp2mr[i].lkey != (uint32_t)-1);
+ assert(txq->mp2mr[i].mr->lkey == txq->mp2mr[i].lkey);
+ return txq->mp2mr[i].lkey;
+ }
+ }
+ /* Add a new entry, register MR first. */
+ DEBUG("%p: discovered new memory pool %p", (void *)txq, (void *)mp);
+ mr = ibv_reg_mr(txq->priv->pd,
+ (void *)mp->elt_va_start,
+ (mp->elt_va_end - mp->elt_va_start),
+ (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
+ if (unlikely(mr == NULL)) {
+ DEBUG("%p: unable to configure MR, ibv_reg_mr() failed.",
+ (void *)txq);
+ return (uint32_t)-1;
+ }
+ if (unlikely(i == RTE_DIM(txq->mp2mr))) {
+ /* Table is full, remove oldest entry. */
+ DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
+ (void *)txq);
+ --i;
+ claim_zero(ibv_dereg_mr(txq->mp2mr[0].mr));
+ memmove(&txq->mp2mr[0], &txq->mp2mr[1],
+ (sizeof(txq->mp2mr) - sizeof(txq->mp2mr[0])));
+ }
+ /* Store the new entry. */
+ txq->mp2mr[i].mp = mp;
+ txq->mp2mr[i].mr = mr;
+ txq->mp2mr[i].lkey = mr->lkey;
+ DEBUG("%p: new MR lkey for MP %p: 0x%08" PRIx32,
+ (void *)txq, (void *)mp, txq->mp2mr[i].lkey);
+ return txq->mp2mr[i].lkey;
+}
+
+/**
+ * DPDK callback for TX.
+ *
+ * @param dpdk_txq
+ * Generic pointer to TX queue structure.
+ * @param[in] pkts
+ * Packets to transmit.
+ * @param pkts_n
+ * Number of packets in array.
+ *
+ * @return
+ * Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ struct txq *txq = (struct txq *)dpdk_txq;
+ unsigned int elts_head = txq->elts_head;
+ const unsigned int elts_tail = txq->elts_tail;
+ const unsigned int elts_n = txq->elts_n;
+ unsigned int elts_comp_cd = txq->elts_comp_cd;
+ unsigned int elts_comp = 0;
+ unsigned int i;
+ unsigned int max;
+ int err;
+
+ assert(elts_comp_cd != 0);
+ txq_complete(txq);
+ max = (elts_n - (elts_head - elts_tail));
+ if (max > elts_n)
+ max -= elts_n;
+ assert(max >= 1);
+ assert(max <= elts_n);
+ /* Always leave one free entry in the ring. */
+ --max;
+ if (max == 0)
+ return 0;
+ if (max > pkts_n)
+ max = pkts_n;
+ for (i = 0; (i != max); ++i) {
+ struct rte_mbuf *buf = pkts[i];
+ unsigned int elts_head_next =
+ (((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
+ struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
+ struct txq_elt *elt = &(*txq->elts)[elts_head];
+ unsigned int segs = NB_SEGS(buf);
+ uint32_t send_flags = 0;
+
+ /* Clean up old buffer. */
+ if (likely(elt->buf != NULL)) {
+ struct rte_mbuf *tmp = elt->buf;
+
+ /* Faster than rte_pktmbuf_free(). */
+ do {
+ struct rte_mbuf *next = NEXT(tmp);
+
+ rte_pktmbuf_free_seg(tmp);
+ tmp = next;
+ } while (tmp != NULL);
+ }
+ /* Request TX completion. */
+ if (unlikely(--elts_comp_cd == 0)) {
+ elts_comp_cd = txq->elts_comp_cd_init;
+ ++elts_comp;
+ send_flags |= IBV_EXP_QP_BURST_SIGNALED;
+ }
+ if (likely(segs == 1)) {
+ uintptr_t addr;
+ uint32_t length;
+ uint32_t lkey;
+
+ /* Retrieve buffer information. */
+ addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ length = DATA_LEN(buf);
+ /* Retrieve Memory Region key for this memory pool. */
+ lkey = txq_mp2mr(txq, buf->pool);
+ if (unlikely(lkey == (uint32_t)-1)) {
+ /* MR does not exist. */
+ DEBUG("%p: unable to get MP <-> MR"
+ " association", (void *)txq);
+ /* Clean up TX element. */
+ elt->buf = NULL;
+ goto stop;
+ }
+ /* Update element. */
+ elt->buf = buf;
+ if (txq->priv->vf)
+ rte_prefetch0((volatile void *)
+ (uintptr_t)addr);
+ RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+ /* Put packet into send queue. */
+#if MLX5_PMD_MAX_INLINE > 0
+ if (length <= txq->max_inline)
+ err = txq->if_qp->send_pending_inline
+ (txq->qp,
+ (void *)addr,
+ length,
+ send_flags);
+ else
+#endif
+ err = txq->if_qp->send_pending
+ (txq->qp,
+ addr,
+ length,
+ lkey,
+ send_flags);
+ if (unlikely(err))
+ goto stop;
+ } else {
+ DEBUG("%p: TX scattered buffers support not"
+ " compiled in", (void *)txq);
+ goto stop;
+ }
+ elts_head = elts_head_next;
+ }
+stop:
+ /* Take a shortcut if nothing must be sent. */
+ if (unlikely(i == 0))
+ return 0;
+ /* Ring QP doorbell. */
+ err = txq->if_qp->send_flush(txq->qp);
+ if (unlikely(err)) {
+ /* A nonzero value is not supposed to be returned.
+ * Nothing can be done about it. */
+ DEBUG("%p: send_flush() failed with error %d",
+ (void *)txq, err);
+ }
+ txq->elts_head = elts_head;
+ txq->elts_comp += elts_comp;
+ txq->elts_comp_cd = elts_comp_cd;
+ return i;
+}
+
+/**
+ * DPDK callback for RX.
+ *
+ * @param dpdk_rxq
+ * Generic pointer to RX queue structure.
+ * @param[out] pkts
+ * Array to store received packets.
+ * @param pkts_n
+ * Maximum number of packets in array.
+ *
+ * @return
+ * Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ struct rxq *rxq = (struct rxq *)dpdk_rxq;
+ struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+ const unsigned int elts_n = rxq->elts_n;
+ unsigned int elts_head = rxq->elts_head;
+ struct ibv_sge sges[pkts_n];
+ unsigned int i;
+ unsigned int pkts_ret = 0;
+ int ret;
+
+ for (i = 0; (i != pkts_n); ++i) {
+ struct rxq_elt *elt = &(*elts)[elts_head];
+ struct ibv_recv_wr *wr = &elt->wr;
+ uint64_t wr_id = wr->wr_id;
+ unsigned int len;
+ struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
+ WR_ID(wr_id).offset);
+ struct rte_mbuf *rep;
+ uint32_t flags;
+
+ /* Sanity checks. */
+ assert(WR_ID(wr_id).id < rxq->elts_n);
+ assert(wr->sg_list == &elt->sge);
+ assert(wr->num_sge == 1);
+ assert(elts_head < rxq->elts_n);
+ assert(rxq->elts_head < rxq->elts_n);
+ /*
+ * Fetch initial bytes of packet descriptor into a
+ * cacheline while allocating rep.
+ */
+ rte_prefetch0(seg);
+ rte_prefetch0(&seg->cacheline1);
+ ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
+ &flags);
+ if (unlikely(ret < 0)) {
+ struct ibv_wc wc;
+ int wcs_n;
+
+ DEBUG("rxq=%p, poll_length_flags() failed (ret=%d)",
+ (void *)rxq, ret);
+ /* ibv_poll_cq() must be used in case of failure. */
+ wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
+ if (unlikely(wcs_n == 0))
+ break;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
+ (void *)rxq, wcs_n);
+ break;
+ }
+ assert(wcs_n == 1);
+ if (unlikely(wc.status != IBV_WC_SUCCESS)) {
+ /* Whatever, just repost the offending WR. */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
+ " completion status (%d): %s",
+ (void *)rxq, wc.wr_id, wc.status,
+ ibv_wc_status_str(wc.status));
+ /* Add SGE to array for repost. */
+ sges[i] = elt->sge;
+ goto repost;
+ }
+ ret = wc.byte_len;
+ }
+ if (ret == 0)
+ break;
+ len = ret;
+ rep = __rte_mbuf_raw_alloc(rxq->mp);
+ if (unlikely(rep == NULL)) {
+ /*
+ * Unable to allocate a replacement mbuf,
+ * repost WR with its current SGE.
+ */
+ DEBUG("rxq=%p, wr_id=%" PRIu32 ":"
+ " can't allocate a new mbuf",
+ (void *)rxq, WR_ID(wr_id).id);
+ ++rxq->priv->dev->data->rx_mbuf_alloc_failed;
+ sges[i] = elt->sge;
+ goto repost;
+ }
+
+ /* Reconfigure sge to use rep instead of seg. */
+ elt->sge.addr = (uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM;
+ assert(elt->sge.lkey == rxq->mr->lkey);
+ WR_ID(wr->wr_id).offset =
+ (((uintptr_t)rep->buf_addr + RTE_PKTMBUF_HEADROOM) -
+ (uintptr_t)rep);
+ assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
+
+ /* Add SGE to array for repost. */
+ sges[i] = elt->sge;
+
+ /* Update seg information. */
+ SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
+ NB_SEGS(seg) = 1;
+ PORT(seg) = rxq->port_id;
+ NEXT(seg) = NULL;
+ PKT_LEN(seg) = len;
+ DATA_LEN(seg) = len;
+
+ /* Return packet. */
+ *(pkts++) = seg;
+ ++pkts_ret;
+repost:
+ if (++elts_head >= elts_n)
+ elts_head = 0;
+ continue;
+ }
+ if (unlikely(i == 0))
+ return 0;
+ /* Repost WRs. */
+#ifdef DEBUG_RECV
+ DEBUG("%p: reposting %u WRs", (void *)rxq, i);
+#endif
+ ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
+ if (unlikely(ret)) {
+ /* Inability to repost WRs is fatal. */
+ DEBUG("%p: recv_burst(): failed (ret=%d)",
+ (void *)rxq->priv,
+ ret);
+ abort();
+ }
+ rxq->elts_head = elts_head;
+ return pkts_ret;
+}
+
+/**
+ * Dummy DPDK callback for TX.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_txq
+ * Generic pointer to TX queue structure.
+ * @param[in] pkts
+ * Packets to transmit.
+ * @param pkts_n
+ * Number of packets in array.
+ *
+ * @return
+ * Number of packets successfully transmitted (<= pkts_n).
+ */
+uint16_t
+removed_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ (void)dpdk_txq;
+ (void)pkts;
+ (void)pkts_n;
+ return 0;
+}
+
+/**
+ * Dummy DPDK callback for RX.
+ *
+ * This function is used to temporarily replace the real callback during
+ * unsafe control operations on the queue, or in case of error.
+ *
+ * @param dpdk_rxq
+ * Generic pointer to RX queue structure.
+ * @param[out] pkts
+ * Array to store received packets.
+ * @param pkts_n
+ * Maximum number of packets in array.
+ *
+ * @return
+ * Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ (void)dpdk_rxq;
+ (void)pkts;
+ (void)pkts_n;
+ return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
new file mode 100644
index 0000000..1459317
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -0,0 +1,156 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef RTE_PMD_MLX5_RXTX_H_
+#define RTE_PMD_MLX5_RXTX_H_
+
+#include <stdint.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5.h"
+#include "mlx5_defs.h"
+
+/* RX element. */
+struct rxq_elt {
+ struct ibv_recv_wr wr; /* Work Request. */
+ struct ibv_sge sge; /* Scatter/Gather Element. */
+ /* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+};
+
+struct priv;
+
+/* RX queue descriptor. */
+struct rxq {
+ struct priv *priv; /* Back pointer to private data. */
+ struct rte_mempool *mp; /* Memory Pool for allocations. */
+ struct ibv_mr *mr; /* Memory Region (for mp). */
+ struct ibv_cq *cq; /* Completion Queue. */
+ struct ibv_qp *qp; /* Queue Pair. */
+ struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+ struct ibv_exp_cq_family *if_cq; /* CQ interface. */
+ unsigned int port_id; /* Port ID for incoming packets. */
+ unsigned int elts_n; /* (*elts)[] length. */
+ unsigned int elts_head; /* Current index in (*elts)[]. */
+ union {
+ struct rxq_elt (*no_sp)[]; /* RX elements. */
+ } elts;
+ uint32_t mb_len; /* Length of a mp-issued mbuf. */
+ unsigned int socket; /* CPU socket ID for allocations. */
+ struct ibv_exp_res_domain *rd; /* Resource Domain. */
+};
+
+/* TX element. */
+struct txq_elt {
+ struct rte_mbuf *buf;
+};
+
+/* Linear buffer type. It is used when transmitting buffers with too many
+ * segments that do not fit the hardware queue (see max_send_sge).
+ * Extra segments are copied (linearized) in such buffers, replacing the
+ * last SGE during TX.
+ * The size is arbitrary but large enough to hold a jumbo frame with
+ * 8 segments considering mbuf.buf_len is about 2048 bytes. */
+typedef uint8_t linear_t[16384];
+
+/* TX queue descriptor. */
+struct txq {
+ struct priv *priv; /* Back pointer to private data. */
+ struct {
+ struct rte_mempool *mp; /* Cached Memory Pool. */
+ struct ibv_mr *mr; /* Memory Region (for mp). */
+ uint32_t lkey; /* mr->lkey */
+ } mp2mr[MLX5_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
+ struct ibv_cq *cq; /* Completion Queue. */
+ struct ibv_qp *qp; /* Queue Pair. */
+ struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+ struct ibv_exp_cq_family *if_cq; /* CQ interface. */
+#if MLX5_PMD_MAX_INLINE > 0
+ uint32_t max_inline; /* Max inline send size <= MLX5_PMD_MAX_INLINE. */
+#endif
+ unsigned int elts_n; /* (*elts)[] length. */
+ struct txq_elt (*elts)[]; /* TX elements. */
+ unsigned int elts_head; /* Current index in (*elts)[]. */
+ unsigned int elts_tail; /* First element awaiting completion. */
+ unsigned int elts_comp; /* Number of completion requests. */
+ unsigned int elts_comp_cd; /* Countdown for next completion request. */
+ unsigned int elts_comp_cd_init; /* Initial value for countdown. */
+ linear_t (*elts_linear)[]; /* Linearized buffers. */
+ struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
+ unsigned int socket; /* CPU socket ID for allocations. */
+ struct ibv_exp_res_domain *rd; /* Resource Domain. */
+};
+
+/* mlx5_rxq.c */
+
+void rxq_cleanup(struct rxq *);
+int rxq_setup(struct rte_eth_dev *, struct rxq *, uint16_t, unsigned int,
+ const struct rte_eth_rxconf *, struct rte_mempool *);
+int mlx5_rx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
+ const struct rte_eth_rxconf *, struct rte_mempool *);
+void mlx5_rx_queue_release(void *);
+
+/* mlx5_txq.c */
+
+void txq_cleanup(struct txq *);
+int mlx5_tx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
+ const struct rte_eth_txconf *);
+void mlx5_tx_queue_release(void *);
+
+/* mlx5_rxtx.c */
+
+uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t removed_tx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
+
+#endif /* RTE_PMD_MLX5_RXTX_H_ */
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
new file mode 100644
index 0000000..2bae61f
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -0,0 +1,512 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <assert.h>
+#include <errno.h>
+#include <string.h>
+#include <stdint.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_mbuf.h>
+#include <rte_malloc.h>
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_autoconf.h"
+#include "mlx5_defs.h"
+
+/**
+ * Allocate TX queue elements.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param elts_n
+ * Number of elements to allocate.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+txq_alloc_elts(struct txq *txq, unsigned int elts_n)
+{
+ unsigned int i;
+ struct txq_elt (*elts)[elts_n] =
+ rte_calloc_socket("TXQ", 1, sizeof(*elts), 0, txq->socket);
+ linear_t (*elts_linear)[elts_n] =
+ rte_calloc_socket("TXQ", 1, sizeof(*elts_linear), 0,
+ txq->socket);
+ struct ibv_mr *mr_linear = NULL;
+ int ret = 0;
+
+ if ((elts == NULL) || (elts_linear == NULL)) {
+ ERROR("%p: can't allocate packets array", (void *)txq);
+ ret = ENOMEM;
+ goto error;
+ }
+ mr_linear =
+ ibv_reg_mr(txq->priv->pd, elts_linear, sizeof(*elts_linear),
+ (IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE));
+ if (mr_linear == NULL) {
+ ERROR("%p: unable to configure MR, ibv_reg_mr() failed",
+ (void *)txq);
+ ret = EINVAL;
+ goto error;
+ }
+ for (i = 0; (i != elts_n); ++i) {
+ struct txq_elt *elt = &(*elts)[i];
+
+ elt->buf = NULL;
+ }
+ DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
+ txq->elts_n = elts_n;
+ txq->elts = elts;
+ txq->elts_head = 0;
+ txq->elts_tail = 0;
+ txq->elts_comp = 0;
+ /* Request send completion every MLX5_PMD_TX_PER_COMP_REQ packets or
+ * at least 4 times per ring. */
+ txq->elts_comp_cd_init =
+ ((MLX5_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
+ MLX5_PMD_TX_PER_COMP_REQ : (elts_n / 4));
+ txq->elts_comp_cd = txq->elts_comp_cd_init;
+ txq->elts_linear = elts_linear;
+ txq->mr_linear = mr_linear;
+ assert(ret == 0);
+ return 0;
+error:
+ if (mr_linear != NULL)
+ claim_zero(ibv_dereg_mr(mr_linear));
+
+ rte_free(elts_linear);
+ rte_free(elts);
+
+ DEBUG("%p: failed, freed everything", (void *)txq);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * Free TX queue elements.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ */
+static void
+txq_free_elts(struct txq *txq)
+{
+ unsigned int i;
+ unsigned int elts_n = txq->elts_n;
+ struct txq_elt (*elts)[elts_n] = txq->elts;
+ linear_t (*elts_linear)[elts_n] = txq->elts_linear;
+ struct ibv_mr *mr_linear = txq->mr_linear;
+
+ DEBUG("%p: freeing WRs", (void *)txq);
+ txq->elts_n = 0;
+ txq->elts = NULL;
+ txq->elts_linear = NULL;
+ txq->mr_linear = NULL;
+ if (mr_linear != NULL)
+ claim_zero(ibv_dereg_mr(mr_linear));
+
+ rte_free(elts_linear);
+ if (elts == NULL)
+ return;
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct txq_elt *elt = &(*elts)[i];
+
+ if (elt->buf == NULL)
+ continue;
+ rte_pktmbuf_free(elt->buf);
+ }
+ rte_free(elts);
+}
+
+/**
+ * Clean up a TX queue.
+ *
+ * Destroy objects, free allocated memory and reset the structure for reuse.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ */
+void
+txq_cleanup(struct txq *txq)
+{
+ struct ibv_exp_release_intf_params params;
+ size_t i;
+
+ DEBUG("cleaning up %p", (void *)txq);
+ txq_free_elts(txq);
+ if (txq->if_qp != NULL) {
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ assert(txq->qp != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+ txq->if_qp,
+ &params));
+ }
+ if (txq->if_cq != NULL) {
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ assert(txq->cq != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+ txq->if_cq,
+ &params));
+ }
+ if (txq->qp != NULL)
+ claim_zero(ibv_destroy_qp(txq->qp));
+ if (txq->cq != NULL)
+ claim_zero(ibv_destroy_cq(txq->cq));
+ if (txq->rd != NULL) {
+ struct ibv_exp_destroy_res_domain_attr attr = {
+ .comp_mask = 0,
+ };
+
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ claim_zero(ibv_exp_destroy_res_domain(txq->priv->ctx,
+ txq->rd,
+ &attr));
+ }
+ for (i = 0; (i != RTE_DIM(txq->mp2mr)); ++i) {
+ if (txq->mp2mr[i].mp == NULL)
+ break;
+ assert(txq->mp2mr[i].mr != NULL);
+ claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
+ }
+ memset(txq, 0, sizeof(*txq));
+}
+
+/**
+ * Configure a TX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Thresholds parameters.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
+ unsigned int socket, const struct rte_eth_txconf *conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct txq tmpl = {
+ .priv = priv,
+ .socket = socket
+ };
+ union {
+ struct ibv_exp_query_intf_params params;
+ struct ibv_exp_qp_init_attr init;
+ struct ibv_exp_res_domain_init_attr rd;
+ struct ibv_exp_cq_init_attr cq;
+ struct ibv_exp_qp_attr mod;
+ } attr;
+ enum ibv_exp_query_intf_status status;
+ int ret = 0;
+
+ (void)conf; /* Thresholds configuration (ignored). */
+ if ((desc == 0) || (desc % MLX5_PMD_SGE_WR_N)) {
+ ERROR("%p: invalid number of TX descriptors (must be a"
+ " multiple of %d)", (void *)dev, MLX5_PMD_SGE_WR_N);
+ return EINVAL;
+ }
+ desc /= MLX5_PMD_SGE_WR_N;
+ /* MRs will be registered in mp2mr[] later. */
+ attr.rd = (struct ibv_exp_res_domain_init_attr){
+ .comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
+ IBV_EXP_RES_DOMAIN_MSG_MODEL),
+ .thread_model = IBV_EXP_THREAD_SINGLE,
+ .msg_model = IBV_EXP_MSG_HIGH_BW,
+ };
+ tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
+ if (tmpl.rd == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: RD creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.cq = (struct ibv_exp_cq_init_attr){
+ .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
+ .res_domain = tmpl.rd,
+ };
+ tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
+ if (tmpl.cq == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: CQ creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ DEBUG("priv->device_attr.max_qp_wr is %d",
+ priv->device_attr.max_qp_wr);
+ DEBUG("priv->device_attr.max_sge is %d",
+ priv->device_attr.max_sge);
+ attr.init = (struct ibv_exp_qp_init_attr){
+ /* CQ to be associated with the send queue. */
+ .send_cq = tmpl.cq,
+ /* CQ to be associated with the receive queue. */
+ .recv_cq = tmpl.cq,
+ .cap = {
+ /* Max number of outstanding WRs. */
+ .max_send_wr = ((priv->device_attr.max_qp_wr < desc) ?
+ priv->device_attr.max_qp_wr :
+ desc),
+ /* Max number of scatter/gather elements in a WR. */
+ .max_send_sge = ((priv->device_attr.max_sge <
+ MLX5_PMD_SGE_WR_N) ?
+ priv->device_attr.max_sge :
+ MLX5_PMD_SGE_WR_N),
+#if MLX5_PMD_MAX_INLINE > 0
+ .max_inline_data = MLX5_PMD_MAX_INLINE,
+#endif
+ },
+ .qp_type = IBV_QPT_RAW_PACKET,
+ /* Do *NOT* enable this, completion events are managed per
+ * TX burst. */
+ .sq_sig_all = 0,
+ .pd = priv->pd,
+ .res_domain = tmpl.rd,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
+ };
+ tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
+ if (tmpl.qp == NULL) {
+ ret = (errno ? errno : EINVAL);
+ ERROR("%p: QP creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+#if MLX5_PMD_MAX_INLINE > 0
+ /* ibv_exp_create_qp() updates this value. */
+ tmpl.max_inline = attr.init.cap.max_inline_data;
+#endif
+ attr.mod = (struct ibv_exp_qp_attr){
+ /* Move the QP to this state. */
+ .qp_state = IBV_QPS_INIT,
+ /* Primary port number. */
+ .port_num = priv->port
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod,
+ (IBV_EXP_QP_STATE | IBV_EXP_QP_PORT));
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ ret = txq_alloc_elts(&tmpl, desc);
+ if (ret) {
+ ERROR("%p: TXQ allocation failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.mod = (struct ibv_exp_qp_attr){
+ .qp_state = IBV_QPS_RTR
+ };
+ ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.mod.qp_state = IBV_QPS_RTS;
+ ret = ibv_exp_modify_qp(tmpl.qp, &attr.mod, IBV_EXP_QP_STATE);
+ if (ret) {
+ ERROR("%p: QP state to IBV_QPS_RTS failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_CQ,
+ .obj = tmpl.cq,
+ };
+ tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_cq == NULL) {
+ ret = EINVAL;
+ ERROR("%p: CQ interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_QP_BURST,
+ .obj = tmpl.qp,
+ };
+ tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_qp == NULL) {
+ ret = EINVAL;
+ ERROR("%p: QP interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ /* Clean up txq in case we're reinitializing it. */
+ DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
+ txq_cleanup(txq);
+ *txq = tmpl;
+ DEBUG("%p: txq updated with %p", (void *)txq, (void *)&tmpl);
+ assert(ret == 0);
+ return 0;
+error:
+ txq_cleanup(&tmpl);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * DPDK callback to configure a TX queue.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param idx
+ * TX queue index.
+ * @param desc
+ * Number of descriptors to configure in queue.
+ * @param socket
+ * NUMA socket on which memory must be allocated.
+ * @param[in] conf
+ * Thresholds parameters.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
+ unsigned int socket, const struct rte_eth_txconf *conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct txq *txq = ((idx < priv->txqs_n) ? (*priv->txqs)[idx] : NULL);
+ int ret;
+
+ priv_lock(priv);
+ DEBUG("%p: configuring queue %u for %u descriptors",
+ (void *)dev, idx, desc);
+ if (idx >= priv->txqs_n) {
+ ERROR("%p: queue index out of range (%u >= %u)",
+ (void *)dev, idx, priv->txqs_n);
+ priv_unlock(priv);
+ return -EOVERFLOW;
+ }
+ if (txq != NULL) {
+ DEBUG("%p: reusing already allocated queue index %u (%p)",
+ (void *)dev, idx, (void *)txq);
+ if (priv->started) {
+ priv_unlock(priv);
+ return -EEXIST;
+ }
+ (*priv->txqs)[idx] = NULL;
+ txq_cleanup(txq);
+ } else {
+ txq = rte_calloc_socket("TXQ", 1, sizeof(*txq), 0, socket);
+ if (txq == NULL) {
+ ERROR("%p: unable to allocate queue index %u",
+ (void *)dev, idx);
+ priv_unlock(priv);
+ return -ENOMEM;
+ }
+ }
+ ret = txq_setup(dev, txq, desc, socket, conf);
+ if (ret)
+ rte_free(txq);
+ else {
+ DEBUG("%p: adding TX queue %p to list",
+ (void *)dev, (void *)txq);
+ (*priv->txqs)[idx] = txq;
+ /* Update send callback. */
+ dev->tx_pkt_burst = mlx5_tx_burst;
+ }
+ priv_unlock(priv);
+ return -ret;
+}
+
+/**
+ * DPDK callback to release a TX queue.
+ *
+ * @param dpdk_txq
+ * Generic TX queue pointer.
+ */
+void
+mlx5_tx_queue_release(void *dpdk_txq)
+{
+ struct txq *txq = (struct txq *)dpdk_txq;
+ struct priv *priv;
+ unsigned int i;
+
+ if (txq == NULL)
+ return;
+ priv = txq->priv;
+ priv_lock(priv);
+ for (i = 0; (i != priv->txqs_n); ++i)
+ if ((*priv->txqs)[i] == txq) {
+ DEBUG("%p: removing TX queue %p from list",
+ (void *)priv->dev, (void *)txq);
+ (*priv->txqs)[i] = NULL;
+ break;
+ }
+ txq_cleanup(txq);
+ rte_free(txq);
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index cc6aab6..e48e6b6 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -140,10 +140,21 @@ pmd_drv_log_basename(const char *s)
#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
+/* Convenience macros for accessing mbuf fields. */
+#define NEXT(m) ((m)->next)
+#define DATA_LEN(m) ((m)->data_len)
+#define PKT_LEN(m) ((m)->pkt_len)
+#define DATA_OFF(m) ((m)->data_off)
+#define SET_DATA_OFF(m, o) ((m)->data_off = (o))
+#define NB_SEGS(m) ((m)->nb_segs)
+#define PORT(m) ((m)->port)
+
/* Allocate a buffer on the stack and fill it with a printf format string. */
#define MKSTR(name, ...) \
char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
\
snprintf(name, sizeof(name), __VA_ARGS__)
+#define WR_ID(o) (((wr_id_t *)&(o))->data)
+
#endif /* RTE_PMD_MLX5_UTILS_H_ */
--
2.1.0
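For readers following the TX path in this patch, the mempool-to-MR
translation table used by txq_mp2mr() can be pictured with the minimal
sketch below. It is not the driver code: the sketch_* names, the table size
of 4 and the eviction counter are assumptions made for illustration, and
the counter merely stands in for the ibv_dereg_mr() call. Entries are kept
in insertion order, a NULL pool marks the first free slot, and when the
table is full the oldest entry (index 0) is dropped before the others are
shifted down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size; the driver uses MLX5_PMD_TX_MP_CACHE. */
#define SKETCH_CACHE_N 4

struct sketch_entry {
	const void *pool; /* Lookup key, NULL marks the first free slot. */
	uint32_t lkey;    /* Cached value associated with the pool. */
};

struct sketch_cache {
	struct sketch_entry e[SKETCH_CACHE_N];
	unsigned int evictions; /* Stands in for ibv_dereg_mr() calls. */
};

/*
 * Return the key cached for "pool". If the pool is unknown, store
 * "new_lkey" for it, evicting the oldest entry when the table is full.
 */
static uint32_t
sketch_lookup(struct sketch_cache *c, const void *pool, uint32_t new_lkey)
{
	unsigned int i;

	for (i = 0; i != SKETCH_CACHE_N; ++i) {
		if (c->e[i].pool == NULL)
			break; /* Unknown pool, free slot found. */
		if (c->e[i].pool == pool)
			return c->e[i].lkey; /* Cache hit. */
	}
	if (i == SKETCH_CACHE_N) {
		/* Table is full: release entry 0, shift the rest down. */
		++c->evictions;
		memmove(&c->e[0], &c->e[1], sizeof(c->e) - sizeof(c->e[0]));
		--i;
	}
	c->e[i].pool = pool;
	c->e[i].lkey = new_lkey;
	return new_lkey;
}
```

The linear scan is deliberate in this sketch: with a handful of entries and
most packets coming from the same pool, the first comparison usually hits.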
* [PATCH v2 03/13] mlx5: add MAC handling
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 01/13] mlx5: new poll-mode driver for Mellanox ConnectX-4 adapters Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 02/13] mlx5: add non-scattered TX and RX support Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 04/13] mlx5: add device configure/start/stop Adrien Mazarguil
` (10 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
This commit adds support for MAC flow steering rules, which are mandatory for
the RX path, as well as the related callbacks to add/remove MAC addresses.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
---
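As a reading aid for the removal path in mlx5_mac.c, here is a minimal
sketch of how a MAC index is unregistered: from the parent queue only in
RSS mode, from every RX queue otherwise, with a bitfield tracking which
indexes are configured. All sketch_* names, the table sizes and the boolean
"flow" slots are assumptions made for illustration; in the driver the slots
hold flow handles released through ibv_destroy_flow().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAC_MAX 4 /* Hypothetical MAC table size. */
#define SKETCH_RXQ_MAX 2 /* Hypothetical number of RX queues. */

struct sketch_rxq {
	bool mac_flow[SKETCH_MAC_MAX]; /* true: a steering rule exists. */
};

struct sketch_priv {
	uint32_t mac_configured; /* One bit per configured MAC index. */
	bool rss;                /* RSS mode: rules live on the parent. */
	struct sketch_rxq parent;
	struct sketch_rxq rxqs[SKETCH_RXQ_MAX];
};

/* Drop the rule for one MAC index on one queue (no-op if absent). */
static void
sketch_rxq_mac_del(struct sketch_rxq *rxq, unsigned int idx)
{
	assert(idx < SKETCH_MAC_MAX);
	rxq->mac_flow[idx] = false;
}

/* Unregister a MAC index from the parent queue or from all queues. */
static void
sketch_priv_mac_del(struct sketch_priv *priv, unsigned int idx)
{
	unsigned int i;

	assert(idx < SKETCH_MAC_MAX);
	if (!(priv->mac_configured & (UINT32_C(1) << idx)))
		return; /* Nothing registered at this index. */
	if (priv->rss) {
		sketch_rxq_mac_del(&priv->parent, idx);
	} else {
		for (i = 0; i != SKETCH_RXQ_MAX; ++i)
			sketch_rxq_mac_del(&priv->rxqs[i], idx);
	}
	priv->mac_configured &= ~(UINT32_C(1) << idx);
}
```

Checking the bitfield first keeps repeated removals of the same index
harmless, which is the property the early return in priv_mac_addr_del()
relies on.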
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/mlx5/mlx5.h | 5 +
drivers/net/mlx5/mlx5_mac.c | 282 +++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxq.c | 10 ++
drivers/net/mlx5/mlx5_rxtx.h | 2 +
5 files changed, 302 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 54bd6b9..a241c28 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -134,6 +134,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
.tx_queue_release = mlx5_tx_queue_release,
+ .mac_addr_remove = mlx5_mac_addr_remove,
+ .mac_addr_add = mlx5_mac_addr_add,
};
static struct {
@@ -390,7 +392,7 @@ mlx5_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
claim_zero(priv_mac_addr_add(priv, 0,
(const uint8_t (*)[ETHER_ADDR_LEN])
mac.addr_bytes));
- claim_zero(priv_mac_addr_add(priv, 1,
+ claim_zero(priv_mac_addr_add(priv, (RTE_DIM(priv->mac) - 1),
&(const uint8_t [ETHER_ADDR_LEN])
{ "\xff\xff\xff\xff\xff\xff" }));
#ifndef NDEBUG
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 49978f5..6ab31ad 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -166,7 +166,12 @@ int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
/* mlx5_mac.c */
int priv_get_mac(struct priv *, uint8_t (*)[ETHER_ADDR_LEN]);
+void rxq_mac_addrs_del(struct rxq *);
+void mlx5_mac_addr_remove(struct rte_eth_dev *, uint32_t);
+int rxq_mac_addrs_add(struct rxq *);
int priv_mac_addr_add(struct priv *, unsigned int,
const uint8_t (*)[ETHER_ADDR_LEN]);
+void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
+ uint32_t);
#endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index f7e1cf6..262d7c6 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -65,6 +65,8 @@
#include "mlx5.h"
#include "mlx5_utils.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_defs.h"
/**
* Get MAC address by querying netdevice.
@@ -89,8 +91,69 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
}
/**
+ * Delete MAC flow steering rule.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index.
+ */
+static void
+rxq_del_mac_flow(struct rxq *rxq, unsigned int mac_index)
+{
+#ifndef NDEBUG
+ const uint8_t (*mac)[ETHER_ADDR_LEN] =
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ rxq->priv->mac[mac_index].addr_bytes;
+#endif
+
+ assert(mac_index < RTE_DIM(rxq->mac_flow));
+ if (rxq->mac_flow[mac_index] == NULL)
+ return;
+ DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
+ (void *)rxq,
+ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
+ mac_index);
+ claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
+ rxq->mac_flow[mac_index] = NULL;
+}
+
+/**
+ * Unregister a MAC address from a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index.
+ */
+static void
+rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+{
+ assert(mac_index < RTE_DIM(rxq->mac_flow));
+ rxq_del_mac_flow(rxq, mac_index);
+}
+
+/**
+ * Unregister all MAC addresses from a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_mac_addrs_del(struct rxq *rxq)
+{
+ unsigned int i;
+
+ for (i = 0; (i != RTE_DIM(rxq->mac_flow)); ++i)
+ rxq_mac_addr_del(rxq, i);
+}
+
+/**
* Unregister a MAC address.
*
+ * In RSS mode, the MAC address is unregistered from the parent queue,
+ * otherwise it is unregistered from each queue directly.
+ *
* @param priv
* Pointer to private structure.
* @param mac_index
@@ -99,15 +162,179 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
static void
priv_mac_addr_del(struct priv *priv, unsigned int mac_index)
{
+ unsigned int i;
+
assert(mac_index < RTE_DIM(priv->mac));
if (!BITFIELD_ISSET(priv->mac_configured, mac_index))
return;
+ if (priv->rss) {
+ rxq_mac_addr_del(&priv->rxq_parent, mac_index);
+ goto end;
+ }
+ for (i = 0; (i != priv->dev->data->nb_rx_queues); ++i)
+ rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+end:
BITFIELD_RESET(priv->mac_configured, mac_index);
}
/**
+ * DPDK callback to remove a MAC address.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param index
+ * MAC address index.
+ */
+void
+mlx5_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
+{
+ struct priv *priv = dev->data->dev_private;
+
+ priv_lock(priv);
+ DEBUG("%p: removing MAC address from index %" PRIu32,
+ (void *)dev, index);
+ /* Last array entry is reserved for broadcast. */
+ if (index >= (RTE_DIM(priv->mac) - 1))
+ goto end;
+ priv_mac_addr_del(priv, index);
+end:
+ priv_unlock(priv);
+}
+
+/**
+ * Add MAC flow steering rule.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index to register.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index)
+{
+ struct ibv_flow *flow;
+ struct priv *priv = rxq->priv;
+ const uint8_t (*mac)[ETHER_ADDR_LEN] =
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ priv->mac[mac_index].addr_bytes;
+ struct __attribute__((packed)) {
+ struct ibv_flow_attr attr;
+ struct ibv_flow_spec_eth spec;
+ } data;
+ struct ibv_flow_attr *attr = &data.attr;
+ struct ibv_flow_spec_eth *spec = &data.spec;
+
+ assert(mac_index < RTE_DIM(rxq->mac_flow));
+ if (rxq->mac_flow[mac_index] != NULL)
+ return 0;
+ /*
+ * No padding must be inserted by the compiler between attr and spec.
+ * This layout is expected by libibverbs.
+ */
+ assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
+ *attr = (struct ibv_flow_attr){
+ .type = IBV_FLOW_ATTR_NORMAL,
+ .num_of_specs = 1,
+ .port = priv->port,
+ .flags = 0
+ };
+ *spec = (struct ibv_flow_spec_eth){
+ .type = IBV_FLOW_SPEC_ETH,
+ .size = sizeof(*spec),
+ .val = {
+ .dst_mac = {
+ (*mac)[0], (*mac)[1], (*mac)[2],
+ (*mac)[3], (*mac)[4], (*mac)[5]
+ },
+ },
+ .mask = {
+ .dst_mac = "\xff\xff\xff\xff\xff\xff",
+ },
+ };
+ DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
+ (void *)rxq,
+ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
+ mac_index);
+ /* Create related flow. */
+ errno = 0;
+ flow = ibv_create_flow(rxq->qp, attr);
+ if (flow == NULL) {
+ /* It's not clear whether errno is always set in this case. */
+ ERROR("%p: flow configuration failed, errno=%d: %s",
+ (void *)rxq, errno,
+ (errno ? strerror(errno) : "Unknown error"));
+ if (errno)
+ return errno;
+ return EINVAL;
+ }
+ rxq->mac_flow[mac_index] = flow;
+ return 0;
+}
+
+/**
+ * Register a MAC address in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index to register.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+{
+ int ret;
+
+ assert(mac_index < RTE_DIM(rxq->mac_flow));
+ ret = rxq_add_mac_flow(rxq, mac_index);
+ if (ret)
+ return ret;
+ return 0;
+}
+
+/**
+ * Register all MAC addresses in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_mac_addrs_add(struct rxq *rxq)
+{
+ struct priv *priv = rxq->priv;
+ unsigned int i;
+ int ret;
+
+ assert(RTE_DIM(priv->mac) == RTE_DIM(rxq->mac_flow));
+ for (i = 0; (i != RTE_DIM(priv->mac)); ++i) {
+ if (!BITFIELD_ISSET(priv->mac_configured, i))
+ continue;
+ ret = rxq_mac_addr_add(rxq, i);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ rxq_mac_addr_del(rxq, --i);
+ assert(ret > 0);
+ return ret;
+ }
+ return 0;
+}
+
+/**
* Register a MAC address.
*
+ * In RSS mode, the MAC address is registered in the parent queue,
+ * otherwise it is registered in each queue directly.
+ *
* @param priv
* Pointer to private structure.
* @param mac_index
@@ -123,6 +350,7 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
const uint8_t (*mac)[ETHER_ADDR_LEN])
{
unsigned int i;
+ int ret;
assert(mac_index < RTE_DIM(priv->mac));
/* First, make sure this address isn't already configured. */
@@ -145,6 +373,60 @@ priv_mac_addr_add(struct priv *priv, unsigned int mac_index,
(*mac)[3], (*mac)[4], (*mac)[5]
}
};
+ /* If device isn't started, this is all we need to do. */
+ if (!priv->started)
+ goto end;
+ if (priv->rss) {
+ ret = rxq_mac_addr_add(&priv->rxq_parent, mac_index);
+ if (ret)
+ return ret;
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ ret = rxq_mac_addr_add((*priv->rxqs)[i], mac_index);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ if ((*priv->rxqs)[--i] != NULL)
+ rxq_mac_addr_del((*priv->rxqs)[i], mac_index);
+ return ret;
+ }
+end:
BITFIELD_SET(priv->mac_configured, mac_index);
return 0;
}
+
+/**
+ * DPDK callback to add a MAC address.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param mac_addr
+ * MAC address to register.
+ * @param index
+ * MAC address index.
+ * @param vmdq
+ * VMDq pool index to associate address with (ignored).
+ */
+void
+mlx5_mac_addr_add(struct rte_eth_dev *dev, struct ether_addr *mac_addr,
+ uint32_t index, uint32_t vmdq)
+{
+ struct priv *priv = dev->data->dev_private;
+
+ (void)vmdq;
+ priv_lock(priv);
+ DEBUG("%p: adding MAC address at index %" PRIu32,
+ (void *)dev, index);
+ /* Last array entry is reserved for broadcast. */
+ if (index >= (RTE_DIM(priv->mac) - 1))
+ goto end;
+ priv_mac_addr_add(priv, index,
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ mac_addr->addr_bytes);
+end:
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 01cc649..8450fe3 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -248,6 +248,7 @@ rxq_cleanup(struct rxq *rxq)
&params));
}
if (rxq->qp != NULL) {
+ rxq_mac_addrs_del(rxq);
claim_zero(ibv_destroy_qp(rxq->qp));
}
if (rxq->cq != NULL)
@@ -515,6 +516,15 @@ skip_mr:
(void *)dev, strerror(ret));
goto error;
}
+ if ((parent) || (!priv->rss)) {
+ /* Configure MAC and broadcast addresses. */
+ ret = rxq_mac_addrs_add(&tmpl);
+ if (ret) {
+ ERROR("%p: QP flow attachment failed: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ }
/* Allocate descriptors for RX queues, except for the RSS parent. */
if (parent)
goto skip_alloc;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 1459317..3733d3e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -78,6 +78,8 @@ struct rxq {
struct ibv_qp *qp; /* Queue Pair. */
struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
struct ibv_exp_cq_family *if_cq; /* CQ interface. */
+ /* MAC flow steering rules. */
+ struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES];
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
--
2.1.0
* [PATCH v2 04/13] mlx5: add device configure/start/stop
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (2 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 03/13] mlx5: add MAC handling Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 05/13] mlx5: add support for scattered RX and TX buffers Adrien Mazarguil
` (9 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev; +Cc: Francesco Santoro
This commit adds the remaining callbacks (device configure, start, stop and
information retrieval) needed to make mlx5 usable.
Like mlx4, device start and stop are implemented on top of MAC RX flows.
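The start/stop scheme described above can be sketched as follows. This is a
minimal, self-contained illustration and not the driver code: every sketch_*
name is hypothetical, the flow attach/detach helpers merely stand in for
rxq_mac_addrs_add() and rxq_mac_addrs_del(), and the RSS parent queue case
and locking are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_QUEUES 4 /* hypothetical limit, not from the patch */

/* Hypothetical stand-in for struct rxq: only tracks flow attachment. */
struct sketch_rxq {
	int flows_attached; /* 1 while MAC flows are attached */
	int fail_attach;    /* knob to force an attachment failure */
};

/* Hypothetical stand-in for struct priv. */
struct sketch_priv {
	int started;
	unsigned int rxqs_n;
	struct sketch_rxq *rxqs[SKETCH_MAX_QUEUES]; /* entries may be NULL */
};

/* Stand-in for rxq_mac_addrs_add(): 0 on success, errno value on failure. */
static int
sketch_flows_attach(struct sketch_rxq *rxq)
{
	if (rxq->fail_attach)
		return EINVAL;
	rxq->flows_attached = 1;
	return 0;
}

/* Stand-in for rxq_mac_addrs_del(). */
static void
sketch_flows_detach(struct sketch_rxq *rxq)
{
	rxq->flows_attached = 0;
}

/* "Start" the device: attach flows to each queue, undo all on failure. */
static int
sketch_dev_start(struct sketch_priv *priv)
{
	unsigned int i;

	if (priv->started)
		return 0;
	assert(priv->rxqs_n <= SKETCH_MAX_QUEUES);
	for (i = 0; i != priv->rxqs_n; ++i) {
		int ret;

		if (priv->rxqs[i] == NULL)
			continue; /* ignore nonexistent queues */
		ret = sketch_flows_attach(priv->rxqs[i]);
		if (ret == 0)
			continue;
		/* Roll back the queues processed so far. */
		while (i != 0) {
			--i;
			if (priv->rxqs[i] != NULL)
				sketch_flows_detach(priv->rxqs[i]);
		}
		return -ret; /* negative errno, as DPDK callbacks expect */
	}
	priv->started = 1;
	return 0;
}

/* "Stop" the device: detach flows so that no traffic is steered to it. */
static void
sketch_dev_stop(struct sketch_priv *priv)
{
	unsigned int i;

	if (!priv->started)
		return;
	priv->started = 0;
	for (i = 0; i != priv->rxqs_n; ++i)
		if (priv->rxqs[i] != NULL)
			sketch_flows_detach(priv->rxqs[i]);
}
```

The point of the sketch is that a failed start leaves no queue with flows
attached, so the device is either fully started or fully stopped.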
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Francesco Santoro <francesco.santoro@6wind.com>
Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
---
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 4 ++
drivers/net/mlx5/mlx5.h | 7 ++
drivers/net/mlx5/mlx5_ethdev.c | 148 ++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_trigger.c | 146 +++++++++++++++++++++++++++++++++++++++
5 files changed, 306 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_trigger.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 7b9c57b..028c22c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -45,6 +45,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxq.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_txq.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index a241c28..aafa70b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -129,7 +129,11 @@ mlx5_dev_close(struct rte_eth_dev *dev)
}
static const struct eth_dev_ops mlx5_dev_ops = {
+ .dev_configure = mlx5_dev_configure,
+ .dev_start = mlx5_dev_start,
+ .dev_stop = mlx5_dev_stop,
.dev_close = mlx5_dev_close,
+ .dev_infos_get = mlx5_dev_infos_get,
.rx_queue_setup = mlx5_rx_queue_setup,
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 6ab31ad..3f47a15 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -160,6 +160,8 @@ int priv_get_ifname(const struct priv *, char (*)[IF_NAMESIZE]);
int priv_ifreq(const struct priv *, int req, struct ifreq *);
int priv_get_mtu(struct priv *, uint16_t *);
int priv_set_flags(struct priv *, unsigned int, unsigned int);
+int mlx5_dev_configure(struct rte_eth_dev *);
+void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
@@ -174,4 +176,9 @@ int priv_mac_addr_add(struct priv *, unsigned int,
void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
uint32_t);
+/* mlx5_trigger.c */
+
+int mlx5_dev_start(struct rte_eth_dev *);
+void mlx5_dev_stop(struct rte_eth_dev *);
+
#endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index b6c7d7a..6b13cec 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -32,6 +32,7 @@
*/
#include <stddef.h>
+#include <assert.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
@@ -58,6 +59,7 @@
#endif
#include "mlx5.h"
+#include "mlx5_rxtx.h"
#include "mlx5_utils.h"
/**
@@ -370,6 +372,152 @@ priv_set_flags(struct priv *priv, unsigned int keep, unsigned int flags)
}
/**
+ * Ethernet device configuration.
+ *
+ * Prepare the driver for a given number of TX and RX queues.
+ * Allocate parent RSS queue when several RX queues are requested.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+dev_configure(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int rxqs_n = dev->data->nb_rx_queues;
+ unsigned int txqs_n = dev->data->nb_tx_queues;
+ unsigned int tmp;
+ int ret;
+
+ priv->rxqs = (void *)dev->data->rx_queues;
+ priv->txqs = (void *)dev->data->tx_queues;
+ if (txqs_n != priv->txqs_n) {
+ INFO("%p: TX queues number update: %u -> %u",
+ (void *)dev, priv->txqs_n, txqs_n);
+ priv->txqs_n = txqs_n;
+ }
+ if (rxqs_n == priv->rxqs_n)
+ return 0;
+ INFO("%p: RX queues number update: %u -> %u",
+ (void *)dev, priv->rxqs_n, rxqs_n);
+ /* If RSS is enabled, disable it first. */
+ if (priv->rss) {
+ unsigned int i;
+
+ /* Only if there are no remaining child RX queues. */
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ return EINVAL;
+ rxq_cleanup(&priv->rxq_parent);
+ priv->rss = 0;
+ priv->rxqs_n = 0;
+ }
+ if (rxqs_n <= 1) {
+ /* Nothing else to do. */
+ priv->rxqs_n = rxqs_n;
+ return 0;
+ }
+ /* Allocate a new RSS parent queue if supported by hardware. */
+ if (!priv->hw_rss) {
+ ERROR("%p: only a single RX queue can be configured when"
+ " hardware doesn't support RSS",
+ (void *)dev);
+ return EINVAL;
+ }
+ /* Fail if hardware doesn't support that many RSS queues. */
+ if (rxqs_n >= priv->max_rss_tbl_sz) {
+ ERROR("%p: only %u RX queues can be configured for RSS",
+ (void *)dev, priv->max_rss_tbl_sz);
+ return EINVAL;
+ }
+ priv->rss = 1;
+ tmp = priv->rxqs_n;
+ priv->rxqs_n = rxqs_n;
+ ret = rxq_setup(dev, &priv->rxq_parent, 0, 0, NULL, NULL);
+ if (!ret)
+ return 0;
+ /* Failure, rollback. */
+ priv->rss = 0;
+ priv->rxqs_n = tmp;
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * DPDK callback for Ethernet device configuration.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_configure(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret;
+
+ priv_lock(priv);
+ ret = dev_configure(dev);
+ assert(ret >= 0);
+ priv_unlock(priv);
+ return -ret;
+}
+
+/**
+ * DPDK callback to get information about the device.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[out] info
+ * Info structure output buffer.
+ */
+void
+mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int max;
+ char ifname[IF_NAMESIZE];
+
+ priv_lock(priv);
+ /* FIXME: we should ask the device for these values. */
+ info->min_rx_bufsize = 32;
+ info->max_rx_pktlen = 65536;
+ /*
+ * Since we need one CQ per QP, the limit is the minimum number
+ * between the two values.
+ */
+ max = ((priv->device_attr.max_cq > priv->device_attr.max_qp) ?
+ priv->device_attr.max_qp : priv->device_attr.max_cq);
+ /* Clamp max to 65535 since max_rx_queues is uint16_t. */
+ if (max >= 65535)
+ max = 65535;
+ info->max_rx_queues = max;
+ info->max_tx_queues = max;
+ /* Last array entry is reserved for broadcast. */
+ info->max_mac_addrs = (RTE_DIM(priv->mac) - 1);
+ info->rx_offload_capa =
+ (priv->hw_csum ?
+ (DEV_RX_OFFLOAD_IPV4_CKSUM |
+ DEV_RX_OFFLOAD_UDP_CKSUM |
+ DEV_RX_OFFLOAD_TCP_CKSUM) :
+ 0);
+ info->tx_offload_capa =
+ (priv->hw_csum ?
+ (DEV_TX_OFFLOAD_IPV4_CKSUM |
+ DEV_TX_OFFLOAD_UDP_CKSUM |
+ DEV_TX_OFFLOAD_TCP_CKSUM) :
+ 0);
+ if (priv_get_ifname(priv, &ifname) == 0)
+ info->if_index = if_nametoindex(ifname);
+ priv_unlock(priv);
+}
+
+/**
* Get PCI information from struct ibv_device.
*
* @param device
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
new file mode 100644
index 0000000..f5d965f
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -0,0 +1,146 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ether.h>
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_utils.h"
+
+/**
+ * DPDK callback to start the device.
+ *
+ * Simulate device start by attaching all configured flows.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_start(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i = 0;
+ unsigned int r;
+ struct rxq *rxq;
+
+ priv_lock(priv);
+ if (priv->started) {
+ priv_unlock(priv);
+ return 0;
+ }
+ DEBUG("%p: attaching configured flows to all RX queues", (void *)dev);
+ priv->started = 1;
+ if (priv->rss) {
+ rxq = &priv->rxq_parent;
+ r = 1;
+ } else {
+ rxq = (*priv->rxqs)[0];
+ r = priv->rxqs_n;
+ }
+ /* Iterate only once when RSS is enabled. */
+ do {
+ int ret;
+
+ /* Ignore nonexistent RX queues. */
+ if (rxq == NULL)
+ continue;
+ ret = rxq_mac_addrs_add(rxq);
+ if (!ret)
+ continue;
+ WARN("%p: QP flow attachment failed: %s",
+ (void *)dev, strerror(ret));
+ /* Rollback. */
+ while (i != 0) {
+ rxq = (*priv->rxqs)[--i];
+ if (rxq != NULL) {
+ rxq_mac_addrs_del(rxq);
+ }
+ }
+ priv->started = 0;
+ priv_unlock(priv);
+ return -ret;
+ } while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+ priv_unlock(priv);
+ return 0;
+}
+
+/**
+ * DPDK callback to stop the device.
+ *
+ * Simulate device stop by detaching all configured flows.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_dev_stop(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i = 0;
+ unsigned int r;
+ struct rxq *rxq;
+
+ priv_lock(priv);
+ if (!priv->started) {
+ priv_unlock(priv);
+ return;
+ }
+ DEBUG("%p: detaching flows from all RX queues", (void *)dev);
+ priv->started = 0;
+ if (priv->rss) {
+ rxq = &priv->rxq_parent;
+ r = 1;
+ } else {
+ rxq = (*priv->rxqs)[0];
+ r = priv->rxqs_n;
+ }
+ /* Iterate only once when RSS is enabled. */
+ do {
+ /* Ignore nonexistent RX queues. */
+ if (rxq == NULL)
+ continue;
+ rxq_mac_addrs_del(rxq);
+ } while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
+ priv_unlock(priv);
+}
--
2.1.0
* [PATCH v2 05/13] mlx5: add support for scattered RX and TX buffers
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (3 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 04/13] mlx5: add device configure/start/stop Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 06/13] mlx5: add MTU configuration support Adrien Mazarguil
` (8 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
A dedicated RX callback is added to handle scattered buffers. For better
performance, it is only used when jumbo frames are enabled and the maximum
RX packet length exceeds the data room of a single mbuf.
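The selection rule can be sketched as below. This is only an illustration
under assumptions: the sketch_* names and both constants are hypothetical
placeholders for RTE_PKTMBUF_HEADROOM and MLX5_PMD_SGE_WR_N, whose real
values come from the DPDK and driver headers.

```c
#include <assert.h>

/* Hypothetical values standing in for the real header constants. */
#define SKETCH_HEADROOM 128u /* placeholder for RTE_PKTMBUF_HEADROOM */
#define SKETCH_SGE_WR_N 4u   /* placeholder for MLX5_PMD_SGE_WR_N */

/* Hypothetical subset of the RX mode configuration. */
struct sketch_rx_conf {
	int jumbo_frame;
	unsigned int max_rx_pkt_len;
};

/*
 * Return 1 when the scattered RX path is needed, 0 otherwise.
 * In the scattered case, each WR carries SKETCH_SGE_WR_N segments, so the
 * number of WRs stored in *desc is divided accordingly.
 */
static int
sketch_rx_needs_sp(const struct sketch_rx_conf *conf, unsigned int mb_len,
		   unsigned int *desc)
{
	assert(mb_len >= SKETCH_HEADROOM);
	if (conf->jumbo_frame &&
	    (conf->max_rx_pkt_len > (mb_len - SKETCH_HEADROOM))) {
		*desc /= SKETCH_SGE_WR_N;
		return 1;
	}
	return 0;
}
```

With these placeholder values, a 2176-byte mbuf leaves 2048 bytes of data
room, so a 9000-byte maximum packet length selects the scattered path while
a 1518-byte one does not.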
On the TX path, scattered buffers are also handled in a separate function.
When there are more than MLX5_PMD_SGE_WR_N segments in a given mbuf, the
remaining segments are linearized in the last SGE entry.
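A rough sketch of this TX behavior follows, assuming a simplified segment
chain instead of struct rte_mbuf and a plain address/length pair instead of
struct ibv_sge. All sketch_* names and the SGE count are hypothetical, and
memory region keys are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SGE_N 4u /* placeholder for MLX5_PMD_SGE_WR_N */

/* Hypothetical stand-in for one mbuf segment. */
struct sketch_seg {
	const unsigned char *data;
	size_t len;
	struct sketch_seg *next;
};

/* Hypothetical stand-in for struct ibv_sge (no lkey). */
struct sketch_sge {
	const unsigned char *addr;
	size_t length;
};

/* Copy a segment chain into out; return bytes copied, 0 if it won't fit. */
static size_t
sketch_linearize(unsigned char *out, size_t cap, const struct sketch_seg *seg)
{
	size_t size = 0;

	while (seg != NULL) {
		if (seg->len > (cap - size))
			return 0;
		if (seg->len != 0)
			memcpy(&out[size], seg->data, seg->len);
		size += seg->len;
		seg = seg->next;
	}
	return size;
}

/*
 * Fill sges from a chain of segs segments. When the chain is longer than
 * SKETCH_SGE_N, the first SKETCH_SGE_N - 1 segments are referenced directly
 * and the rest are copied into linear, which becomes the last SGE.
 * Return the number of SGEs used, or 0 on failure.
 */
static unsigned int
sketch_fill_sges(struct sketch_sge sges[SKETCH_SGE_N],
		 const struct sketch_seg *seg, unsigned int segs,
		 unsigned char *linear, size_t linear_cap)
{
	unsigned int direct = segs;
	unsigned int j;
	size_t size;

	if (segs > SKETCH_SGE_N)
		direct = SKETCH_SGE_N - 1;
	for (j = 0; j != direct; ++j) {
		assert(seg != NULL);
		sges[j].addr = seg->data;
		sges[j].length = seg->len;
		seg = seg->next;
	}
	if (direct == segs)
		return direct; /* everything fit, nothing to linearize */
	size = sketch_linearize(linear, linear_cap, seg);
	if (size == 0)
		return 0; /* packet too large to be linearized */
	sges[direct].addr = linear;
	sges[direct].length = size;
	return direct + 1;
}
```

For a six-segment packet and four SGEs, the first three segments are sent in
place and the last three are copied into the linear buffer, trading one copy
for the ability to send packets with more segments than SGEs.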
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5_rxq.c | 175 +++++++++++++++++++-
drivers/net/mlx5/mlx5_rxtx.c | 376 +++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 10 ++
3 files changed, 557 insertions(+), 4 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8450fe3..1eddfc7 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -65,6 +65,153 @@
#include "mlx5_defs.h"
/**
+ * Allocate RX queue elements with scattered packets support.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param elts_n
+ * Number of elements to allocate.
+ * @param[in] pool
+ * If not NULL, fetch buffers from this array instead of allocating them
+ * with rte_pktmbuf_alloc().
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
+ struct rte_mbuf **pool)
+{
+ unsigned int i;
+ struct rxq_elt_sp (*elts)[elts_n] =
+ rte_calloc_socket("RXQ elements", 1, sizeof(*elts), 0,
+ rxq->socket);
+ int ret = 0;
+
+ if (elts == NULL) {
+ ERROR("%p: can't allocate packets array", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ /* For each WR (packet). */
+ for (i = 0; (i != elts_n); ++i) {
+ unsigned int j;
+ struct rxq_elt_sp *elt = &(*elts)[i];
+ struct ibv_recv_wr *wr = &elt->wr;
+ struct ibv_sge (*sges)[RTE_DIM(elt->sges)] = &elt->sges;
+
+ /* These two arrays must have the same size. */
+ assert(RTE_DIM(elt->sges) == RTE_DIM(elt->bufs));
+ /* Configure WR. */
+ wr->wr_id = i;
+ wr->next = &(*elts)[(i + 1)].wr;
+ wr->sg_list = &(*sges)[0];
+ wr->num_sge = RTE_DIM(*sges);
+ /* For each SGE (segment). */
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ struct ibv_sge *sge = &(*sges)[j];
+ struct rte_mbuf *buf;
+
+ if (pool != NULL) {
+ buf = *(pool++);
+ assert(buf != NULL);
+ rte_pktmbuf_reset(buf);
+ } else
+ buf = rte_pktmbuf_alloc(rxq->mp);
+ if (buf == NULL) {
+ assert(pool == NULL);
+ ERROR("%p: empty mbuf pool", (void *)rxq);
+ ret = ENOMEM;
+ goto error;
+ }
+ elt->bufs[j] = buf;
+ /* Headroom is reserved by rte_pktmbuf_alloc(). */
+ assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+ /* Buffer is supposed to be empty. */
+ assert(rte_pktmbuf_data_len(buf) == 0);
+ assert(rte_pktmbuf_pkt_len(buf) == 0);
+ /* sge->addr must be able to store a pointer. */
+ assert(sizeof(sge->addr) >= sizeof(uintptr_t));
+ if (j == 0) {
+ /* The first SGE keeps its headroom. */
+ sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ sge->length = (buf->buf_len -
+ RTE_PKTMBUF_HEADROOM);
+ } else {
+ /* Subsequent SGEs lose theirs. */
+ assert(DATA_OFF(buf) == RTE_PKTMBUF_HEADROOM);
+ SET_DATA_OFF(buf, 0);
+ sge->addr = (uintptr_t)buf->buf_addr;
+ sge->length = buf->buf_len;
+ }
+ sge->lkey = rxq->mr->lkey;
+ /* Redundant check for tailroom. */
+ assert(sge->length == rte_pktmbuf_tailroom(buf));
+ }
+ }
+ /* The last WR pointer must be NULL. */
+ (*elts)[(i - 1)].wr.next = NULL;
+ DEBUG("%p: allocated and configured %u WRs (%zu segments)",
+ (void *)rxq, elts_n, (elts_n * RTE_DIM((*elts)[0].sges)));
+ rxq->elts_n = elts_n;
+ rxq->elts_head = 0;
+ rxq->elts.sp = elts;
+ assert(ret == 0);
+ return 0;
+error:
+ if (elts != NULL) {
+ assert(pool == NULL);
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ unsigned int j;
+ struct rxq_elt_sp *elt = &(*elts)[i];
+
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ struct rte_mbuf *buf = elt->bufs[j];
+
+ if (buf != NULL)
+ rte_pktmbuf_free_seg(buf);
+ }
+ }
+ rte_free(elts);
+ }
+ DEBUG("%p: failed, freed everything", (void *)rxq);
+ assert(ret > 0);
+ return ret;
+}
+
+/**
+ * Free RX queue elements with scattered packets support.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+static void
+rxq_free_elts_sp(struct rxq *rxq)
+{
+ unsigned int i;
+ unsigned int elts_n = rxq->elts_n;
+ struct rxq_elt_sp (*elts)[elts_n] = rxq->elts.sp;
+
+ DEBUG("%p: freeing WRs", (void *)rxq);
+ rxq->elts_n = 0;
+ rxq->elts.sp = NULL;
+ if (elts == NULL)
+ return;
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ unsigned int j;
+ struct rxq_elt_sp *elt = &(*elts)[i];
+
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ struct rte_mbuf *buf = elt->bufs[j];
+
+ if (buf != NULL)
+ rte_pktmbuf_free_seg(buf);
+ }
+ }
+ rte_free(elts);
+}
+
+/**
* Allocate RX queue elements.
*
* @param rxq
@@ -224,7 +371,10 @@ rxq_cleanup(struct rxq *rxq)
struct ibv_exp_release_intf_params params;
DEBUG("cleaning up %p", (void *)rxq);
- rxq_free_elts(rxq);
+ if (rxq->sp)
+ rxq_free_elts_sp(rxq);
+ else
+ rxq_free_elts(rxq);
if (rxq->if_qp != NULL) {
assert(rxq->priv != NULL);
assert(rxq->priv->ctx != NULL);
@@ -445,6 +595,15 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
rte_pktmbuf_free(buf);
+ /* Enable scattered packets support for this queue if necessary. */
+ if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
+ (dev->data->dev_conf.rxmode.max_rx_pkt_len >
+ (tmpl.mb_len - RTE_PKTMBUF_HEADROOM))) {
+ tmpl.sp = 1;
+ desc /= MLX5_PMD_SGE_WR_N;
+ }
+ DEBUG("%p: %s scattered packets support (%u WRs)",
+ (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc);
/* Use the entire RX mempool as the memory region. */
tmpl.mr = ibv_reg_mr(priv->pd,
(void *)mp->elt_va_start,
@@ -528,14 +687,19 @@ skip_mr:
/* Allocate descriptors for RX queues, except for the RSS parent. */
if (parent)
goto skip_alloc;
- ret = rxq_alloc_elts(&tmpl, desc, NULL);
+ if (tmpl.sp)
+ ret = rxq_alloc_elts_sp(&tmpl, desc, NULL);
+ else
+ ret = rxq_alloc_elts(&tmpl, desc, NULL);
if (ret) {
ERROR("%p: RXQ allocation failed: %s",
(void *)dev, strerror(ret));
goto error;
}
ret = ibv_post_recv(tmpl.qp,
- &(*tmpl.elts.no_sp)[0].wr,
+ (tmpl.sp ?
+ &(*tmpl.elts.sp)[0].wr :
+ &(*tmpl.elts.no_sp)[0].wr),
&bad_wr);
if (ret) {
ERROR("%p: ibv_post_recv() failed for WR %p: %s",
@@ -655,7 +819,10 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
(void *)dev, (void *)rxq);
(*priv->rxqs)[idx] = rxq;
/* Update receive callback. */
- dev->rx_pkt_burst = mlx5_rx_burst;
+ if (rxq->sp)
+ dev->rx_pkt_burst = mlx5_rx_burst_sp;
+ else
+ dev->rx_pkt_burst = mlx5_rx_burst;
}
priv_unlock(priv);
return -ret;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 0f1e541..ed6faa1 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -173,6 +173,154 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
return txq->mp2mr[i].lkey;
}
+#if MLX5_PMD_SGE_WR_N > 1
+
+/**
+ * Copy scattered mbuf contents to a single linear buffer.
+ *
+ * @param[out] linear
+ * Linear output buffer.
+ * @param[in] buf
+ * Scattered input buffer.
+ *
+ * @return
+ * Number of bytes copied to the output buffer or 0 if not large enough.
+ */
+static unsigned int
+linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
+{
+ unsigned int size = 0;
+ unsigned int offset;
+
+ do {
+ unsigned int len = DATA_LEN(buf);
+
+ offset = size;
+ size += len;
+ if (unlikely(size > sizeof(*linear)))
+ return 0;
+ memcpy(&(*linear)[offset],
+ rte_pktmbuf_mtod(buf, uint8_t *),
+ len);
+ buf = NEXT(buf);
+ } while (buf != NULL);
+ return size;
+}
+
+/**
+ * Handle scattered buffers for mlx5_tx_burst().
+ *
+ * @param txq
+ * TX queue structure.
+ * @param segs
+ * Number of segments in buf.
+ * @param elt
+ * TX queue element to fill.
+ * @param[in] buf
+ * Buffer to process.
+ * @param elts_head
+ * Index of the linear buffer to use if necessary (normally txq->elts_head).
+ * @param[out] sges
+ * Array filled with SGEs on success.
+ *
+ * @return
+ * A structure containing the processed packet size in bytes and the
+ * number of SGEs. Both fields are set to (unsigned int)-1 in case of
+ * failure.
+ */
+static struct tx_burst_sg_ret {
+ unsigned int length;
+ unsigned int num;
+}
+tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
+ struct rte_mbuf *buf, unsigned int elts_head,
+ struct ibv_sge (*sges)[MLX5_PMD_SGE_WR_N])
+{
+ unsigned int sent_size = 0;
+ unsigned int j;
+ int linearize = 0;
+
+ /* When there are too many segments, extra segments are
+ * linearized in the last SGE. */
+ if (unlikely(segs > RTE_DIM(*sges))) {
+ segs = (RTE_DIM(*sges) - 1);
+ linearize = 1;
+ }
+ /* Update element. */
+ elt->buf = buf;
+ /* Register segments as SGEs. */
+ for (j = 0; (j != segs); ++j) {
+ struct ibv_sge *sge = &(*sges)[j];
+ uint32_t lkey;
+
+ /* Retrieve Memory Region key for this memory pool. */
+ lkey = txq_mp2mr(txq, buf->pool);
+ if (unlikely(lkey == (uint32_t)-1)) {
+ /* MR does not exist. */
+ DEBUG("%p: unable to get MP <-> MR association",
+ (void *)txq);
+ /* Clean up TX element. */
+ elt->buf = NULL;
+ goto stop;
+ }
+ /* Update SGE. */
+ sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ if (txq->priv->vf)
+ rte_prefetch0((volatile void *)
+ (uintptr_t)sge->addr);
+ sge->length = DATA_LEN(buf);
+ sge->lkey = lkey;
+ sent_size += sge->length;
+ buf = NEXT(buf);
+ }
+ /* If buf is not NULL here and is not going to be linearized,
+ * nb_segs is not valid. */
+ assert(j == segs);
+ assert((buf == NULL) || (linearize));
+ /* Linearize extra segments. */
+ if (linearize) {
+ struct ibv_sge *sge = &(*sges)[segs];
+ linear_t *linear = &(*txq->elts_linear)[elts_head];
+ unsigned int size = linearize_mbuf(linear, buf);
+
+ assert(segs == (RTE_DIM(*sges) - 1));
+ if (size == 0) {
+ /* Invalid packet. */
+ DEBUG("%p: packet too large to be linearized.",
+ (void *)txq);
+ /* Clean up TX element. */
+ elt->buf = NULL;
+ goto stop;
+ }
+ /* If MLX5_PMD_SGE_WR_N is 1, free mbuf immediately. */
+ if (RTE_DIM(*sges) == 1) {
+ do {
+ struct rte_mbuf *next = NEXT(buf);
+
+ rte_pktmbuf_free_seg(buf);
+ buf = next;
+ } while (buf != NULL);
+ elt->buf = NULL;
+ }
+ /* Update SGE. */
+ sge->addr = (uintptr_t)&(*linear)[0];
+ sge->length = size;
+ sge->lkey = txq->mr_linear->lkey;
+ sent_size += size;
+ }
+ return (struct tx_burst_sg_ret){
+ .length = sent_size,
+ .num = segs,
+ };
+stop:
+ return (struct tx_burst_sg_ret){
+ .length = -1,
+ .num = -1,
+ };
+}
+
+#endif /* MLX5_PMD_SGE_WR_N > 1 */
+
/**
* DPDK callback for TX.
*
@@ -282,9 +430,28 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
if (unlikely(err))
goto stop;
} else {
+#if MLX5_PMD_SGE_WR_N > 1
+ struct ibv_sge sges[MLX5_PMD_SGE_WR_N];
+ struct tx_burst_sg_ret ret;
+
+ ret = tx_burst_sg(txq, segs, elt, buf, elts_head,
+ &sges);
+ if (ret.length == (unsigned int)-1)
+ goto stop;
+ RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
+ /* Put SG list into send queue. */
+ err = txq->if_qp->send_pending_sg_list
+ (txq->qp,
+ sges,
+ ret.num,
+ send_flags);
+ if (unlikely(err))
+ goto stop;
+#else /* MLX5_PMD_SGE_WR_N > 1 */
DEBUG("%p: TX scattered buffers support not"
" compiled in", (void *)txq);
goto stop;
+#endif /* MLX5_PMD_SGE_WR_N > 1 */
}
elts_head = elts_head_next;
}
@@ -307,8 +474,215 @@ stop:
}
/**
+ * DPDK callback for RX with scattered packets support.
+ *
+ * @param dpdk_rxq
+ * Generic pointer to RX queue structure.
+ * @param[out] pkts
+ * Array to store received packets.
+ * @param pkts_n
+ * Maximum number of packets in array.
+ *
+ * @return
+ * Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+ struct rxq *rxq = (struct rxq *)dpdk_rxq;
+ struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
+ const unsigned int elts_n = rxq->elts_n;
+ unsigned int elts_head = rxq->elts_head;
+ struct ibv_recv_wr head;
+ struct ibv_recv_wr **next = &head.next;
+ struct ibv_recv_wr *bad_wr;
+ unsigned int i;
+ unsigned int pkts_ret = 0;
+ int ret;
+
+ if (unlikely(!rxq->sp))
+ return mlx5_rx_burst(dpdk_rxq, pkts, pkts_n);
+ if (unlikely(elts == NULL)) /* See mlx5_dev_set_mtu(). */
+ return 0;
+ for (i = 0; (i != pkts_n); ++i) {
+ struct rxq_elt_sp *elt = &(*elts)[elts_head];
+ struct ibv_recv_wr *wr = &elt->wr;
+ uint64_t wr_id = wr->wr_id;
+ unsigned int len;
+ unsigned int pkt_buf_len;
+ struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
+ struct rte_mbuf **pkt_buf_next = &pkt_buf;
+ unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
+ unsigned int j = 0;
+ uint32_t flags;
+
+ /* Sanity checks. */
+#ifdef NDEBUG
+ (void)wr_id;
+#endif
+ assert(wr_id < rxq->elts_n);
+ assert(wr->sg_list == elt->sges);
+ assert(wr->num_sge == RTE_DIM(elt->sges));
+ assert(elts_head < rxq->elts_n);
+ assert(rxq->elts_head < rxq->elts_n);
+ ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
+ &flags);
+ if (unlikely(ret < 0)) {
+ struct ibv_wc wc;
+ int wcs_n;
+
+ DEBUG("rxq=%p, poll_length_flags() failed (ret=%d)",
+ (void *)rxq, ret);
+ /* ibv_poll_cq() must be used in case of failure. */
+ wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
+ if (unlikely(wcs_n == 0))
+ break;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
+ (void *)rxq, wcs_n);
+ break;
+ }
+ assert(wcs_n == 1);
+ if (unlikely(wc.status != IBV_WC_SUCCESS)) {
+ /* Whatever, just repost the offending WR. */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
+ " completion status (%d): %s",
+ (void *)rxq, wc.wr_id, wc.status,
+ ibv_wc_status_str(wc.status));
+ /* Link completed WRs together for repost. */
+ *next = wr;
+ next = &wr->next;
+ goto repost;
+ }
+ ret = wc.byte_len;
+ }
+ if (ret == 0)
+ break;
+ len = ret;
+ pkt_buf_len = len;
+ /* Link completed WRs together for repost. */
+ *next = wr;
+ next = &wr->next;
+ /*
+ * Replace spent segments with new ones, concatenate and
+ * return them as pkt_buf.
+ */
+ while (1) {
+ struct ibv_sge *sge = &elt->sges[j];
+ struct rte_mbuf *seg = elt->bufs[j];
+ struct rte_mbuf *rep;
+ unsigned int seg_tailroom;
+
+ /*
+ * Fetch initial bytes of packet descriptor into a
+ * cacheline while allocating rep.
+ */
+ rte_prefetch0(seg);
+ rep = __rte_mbuf_raw_alloc(rxq->mp);
+ if (unlikely(rep == NULL)) {
+ /*
+ * Unable to allocate a replacement mbuf,
+ * repost WR.
+ */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
+ " can't allocate a new mbuf",
+ (void *)rxq, wr_id);
+ if (pkt_buf != NULL) {
+ *pkt_buf_next = NULL;
+ rte_pktmbuf_free(pkt_buf);
+ }
+ /* Increment out of memory counters. */
+ ++rxq->priv->dev->data->rx_mbuf_alloc_failed;
+ goto repost;
+ }
+#ifndef NDEBUG
+ /* Poison user-modifiable fields in rep. */
+ NEXT(rep) = (void *)((uintptr_t)-1);
+ SET_DATA_OFF(rep, 0xdead);
+ DATA_LEN(rep) = 0xd00d;
+ PKT_LEN(rep) = 0xdeadd00d;
+ NB_SEGS(rep) = 0x2a;
+ PORT(rep) = 0x2a;
+ rep->ol_flags = -1;
+#endif
+ assert(rep->buf_len == seg->buf_len);
+ assert(rep->buf_len == rxq->mb_len);
+ /* Reconfigure sge to use rep instead of seg. */
+ assert(sge->lkey == rxq->mr->lkey);
+ sge->addr = ((uintptr_t)rep->buf_addr + seg_headroom);
+ elt->bufs[j] = rep;
+ ++j;
+ /* Update pkt_buf if it's the first segment, or link
+ * seg to the previous one and update pkt_buf_next. */
+ *pkt_buf_next = seg;
+ pkt_buf_next = &NEXT(seg);
+ /* Update seg information. */
+ seg_tailroom = (seg->buf_len - seg_headroom);
+ assert(sge->length == seg_tailroom);
+ SET_DATA_OFF(seg, seg_headroom);
+ if (likely(len <= seg_tailroom)) {
+ /* Last segment. */
+ DATA_LEN(seg) = len;
+ PKT_LEN(seg) = len;
+ /* Sanity check. */
+ assert(rte_pktmbuf_headroom(seg) ==
+ seg_headroom);
+ assert(rte_pktmbuf_tailroom(seg) ==
+ (seg_tailroom - len));
+ break;
+ }
+ DATA_LEN(seg) = seg_tailroom;
+ PKT_LEN(seg) = seg_tailroom;
+ /* Sanity check. */
+ assert(rte_pktmbuf_headroom(seg) == seg_headroom);
+ assert(rte_pktmbuf_tailroom(seg) == 0);
+ /* Fix len and clear headroom for next segments. */
+ len -= seg_tailroom;
+ seg_headroom = 0;
+ }
+ /* Update head and tail segments. */
+ *pkt_buf_next = NULL;
+ assert(pkt_buf != NULL);
+ assert(j != 0);
+ NB_SEGS(pkt_buf) = j;
+ PORT(pkt_buf) = rxq->port_id;
+ PKT_LEN(pkt_buf) = pkt_buf_len;
+
+ /* Return packet. */
+ *(pkts++) = pkt_buf;
+ ++pkts_ret;
+repost:
+ if (++elts_head >= elts_n)
+ elts_head = 0;
+ continue;
+ }
+ if (unlikely(i == 0))
+ return 0;
+ *next = NULL;
+ /* Repost WRs. */
+#ifdef DEBUG_RECV
+ DEBUG("%p: reposting %d WRs", (void *)rxq, i);
+#endif
+ ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
+ if (unlikely(ret)) {
+ /* Inability to repost WRs is fatal. */
+ DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
+ (void *)rxq->priv,
+ (void *)bad_wr,
+ strerror(ret));
+ abort();
+ }
+ rxq->elts_head = elts_head;
+ return pkts_ret;
+}
+
+/**
* DPDK callback for RX.
*
+ * The following function is the same as mlx5_rx_burst_sp(), except it doesn't
+ * manage scattered packets. Improves performance when MRU is lower than the
+ * size of the first segment.
+ *
* @param dpdk_rxq
* Generic pointer to RX queue structure.
* @param[out] pkts
@@ -331,6 +705,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
unsigned int pkts_ret = 0;
int ret;
+ if (unlikely(rxq->sp))
+ return mlx5_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
for (i = 0; (i != pkts_n); ++i) {
struct rxq_elt *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 3733d3e..c7f634e 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -60,6 +60,13 @@
#include "mlx5.h"
#include "mlx5_defs.h"
+/* RX element (scattered packets). */
+struct rxq_elt_sp {
+ struct ibv_recv_wr wr; /* Work Request. */
+ struct ibv_sge sges[MLX5_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
+ struct rte_mbuf *bufs[MLX5_PMD_SGE_WR_N]; /* SGEs buffers. */
+};
+
/* RX element. */
struct rxq_elt {
struct ibv_recv_wr wr; /* Work Request. */
@@ -84,8 +91,10 @@ struct rxq {
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
union {
+ struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
struct rxq_elt (*no_sp)[]; /* RX elements. */
} elts;
+ unsigned int sp:1; /* Use scattered RX elements. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
unsigned int socket; /* CPU socket ID for allocations. */
struct ibv_exp_res_domain *rd; /* Resource Domain. */
@@ -151,6 +160,7 @@ void mlx5_tx_queue_release(void *);
/* mlx5_rxtx.c */
uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t);
+uint16_t mlx5_rx_burst_sp(void *, struct rte_mbuf **, uint16_t);
uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t);
uint16_t removed_tx_burst(void *, struct rte_mbuf **, uint16_t);
uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
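For readers following the scattered RX path in mlx5_rx_burst_sp() above, here is a minimal standalone sketch of how a completed packet length is spread over fixed-size segments. It is not the driver code: SEG_BUF_LEN, SEG_HEADROOM, MAX_SEGS and split_len() are hypothetical names standing in for the mbuf buffer length, RTE_PKTMBUF_HEADROOM and MLX5_PMD_SGE_WR_N, and the values are assumptions chosen for illustration. As in the patch, only the first segment reserves headroom.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_BUF_LEN 2048u /* Assumed mbuf buffer length. */
#define SEG_HEADROOM 128u /* Stands in for RTE_PKTMBUF_HEADROOM. */
#define MAX_SEGS 4u /* Stands in for MLX5_PMD_SGE_WR_N. */

/*
 * Split a packet of pkt_len bytes across fixed-size segments.
 * Only the first segment reserves headroom. Stores the data length of
 * each used segment in seg_len[] and returns the number of segments
 * used, or 0 if the packet does not fit in MAX_SEGS segments.
 */
static unsigned int
split_len(unsigned int pkt_len, unsigned int seg_len[MAX_SEGS])
{
	unsigned int headroom = SEG_HEADROOM;
	unsigned int j = 0;

	assert(seg_len != NULL);
	while (j != MAX_SEGS) {
		unsigned int tailroom = SEG_BUF_LEN - headroom;

		if (pkt_len <= tailroom) {
			/* Last segment holds the remaining bytes. */
			seg_len[j++] = pkt_len;
			return j;
		}
		/* Segment is full, carry the rest over. */
		seg_len[j++] = tailroom;
		pkt_len -= tailroom;
		/* No headroom after the first segment. */
		headroom = 0;
	}
	return 0;
}
```

With these assumed sizes, calling split_len(5000, seg_len) fills three segments. The first one is shorter than the others because of the headroom, which matches the seg_headroom handling in the loop above.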
* [PATCH v2 06/13] mlx5: add MTU configuration support
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (4 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 05/13] mlx5: add support for scattered RX and TX buffers Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 07/13] mlx5: add software counters and related callbacks Adrien Mazarguil
` (7 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
Depending on the MTU and whether jumbo frames are enabled, RX queues may
switch between SG and non-SG modes for better performance.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_ethdev.c | 102 +++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxq.c | 178 +++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 1 +
5 files changed, 283 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index aafa70b..ddd74d0 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -140,6 +140,7 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.tx_queue_release = mlx5_tx_queue_release,
.mac_addr_remove = mlx5_mac_addr_remove,
.mac_addr_add = mlx5_mac_addr_add,
+ .mtu_set = mlx5_dev_set_mtu,
};
static struct {
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3f47a15..0e2457a 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -162,6 +162,7 @@ int priv_get_mtu(struct priv *, uint16_t *);
int priv_set_flags(struct priv *, unsigned int, unsigned int);
int mlx5_dev_configure(struct rte_eth_dev *);
void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
+int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 6b13cec..0afc1bb 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -347,6 +347,23 @@ priv_get_mtu(struct priv *priv, uint16_t *mtu)
}
/**
+ * Set device MTU.
+ *
+ * @param priv
+ * Pointer to private structure.
+ * @param mtu
+ * MTU value to set.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
+ */
+static int
+priv_set_mtu(struct priv *priv, uint16_t mtu)
+{
+ return priv_set_sysfs_ulong(priv, "mtu", mtu);
+}
+
+/**
* Set device flags.
*
* @param priv
@@ -518,6 +535,91 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
}
/**
+ * DPDK callback to change the MTU.
+ *
+ * Setting the MTU affects hardware MRU (packets larger than the MTU cannot be
+ * received). Use this as a hint to enable/disable scattered packets support
+ * and improve performance when not needed.
+ * Since failure is not an option, reconfiguring queues on the fly is not
+ * recommended.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param mtu
+ * New MTU.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret = 0;
+ unsigned int i;
+ uint16_t (*rx_func)(void *, struct rte_mbuf **, uint16_t) =
+ mlx5_rx_burst;
+
+ priv_lock(priv);
+ /* Set kernel interface MTU first. */
+ if (priv_set_mtu(priv, mtu)) {
+ ret = errno;
+ WARN("cannot set port %u MTU to %u: %s", priv->port, mtu,
+ strerror(ret));
+ goto out;
+ } else
+ DEBUG("adapter port %u MTU set to %u", priv->port, mtu);
+ priv->mtu = mtu;
+ /* Temporarily replace RX handler with a fake one, assuming it has not
+ * been copied elsewhere. */
+ dev->rx_pkt_burst = removed_rx_burst;
+ /* Make sure everyone has left mlx5_rx_burst() and uses
+ * removed_rx_burst() instead. */
+ rte_wmb();
+ usleep(1000);
+ /* Reconfigure each RX queue. */
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ struct rxq *rxq = (*priv->rxqs)[i];
+ unsigned int max_frame_len;
+ int sp;
+
+ if (rxq == NULL)
+ continue;
+ /* Calculate new maximum frame length according to MTU and
+ * toggle scattered support (sp) if necessary. */
+ max_frame_len = (priv->mtu + ETHER_HDR_LEN +
+ (ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN));
+ sp = (max_frame_len > (rxq->mb_len - RTE_PKTMBUF_HEADROOM));
+ /* Provide new values to rxq_setup(). */
+ dev->data->dev_conf.rxmode.jumbo_frame = sp;
+ dev->data->dev_conf.rxmode.max_rx_pkt_len = max_frame_len;
+ ret = rxq_rehash(dev, rxq);
+ if (ret) {
+ /* Force SP RX if that queue requires it and abort. */
+ if (rxq->sp)
+ rx_func = mlx5_rx_burst_sp;
+ break;
+ }
+ /* Reenable non-RSS queue attributes. No need to check
+ * for errors at this stage. */
+ if (!priv->rss) {
+ if (priv->started)
+ rxq_mac_addrs_add(rxq);
+ }
+ /* Scattered burst function takes priority. */
+ if (rxq->sp)
+ rx_func = mlx5_rx_burst_sp;
+ }
+ /* Burst functions can now be called again. */
+ rte_wmb();
+ dev->rx_pkt_burst = rx_func;
+out:
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
+
+/**
* Get PCI information from struct ibv_device.
*
* @param device
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 1eddfc7..71d4470 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -526,6 +526,184 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
#endif /* RSS_SUPPORT */
/**
+ * Reconfigure a RX queue with new parameters.
+ *
+ * rxq_rehash() does not allocate mbufs, since doing so from the wrong
+ * thread (such as a control thread) may corrupt the pool.
+ * In case of failure, the queue is left untouched.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param rxq
+ * RX queue pointer.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
+{
+ struct priv *priv = rxq->priv;
+ struct rxq tmpl = *rxq;
+ unsigned int mbuf_n;
+ unsigned int desc_n;
+ struct rte_mbuf **pool;
+ unsigned int i, k;
+ struct ibv_exp_qp_attr mod;
+ struct ibv_recv_wr *bad_wr;
+ int err;
+ int parent = (rxq == &priv->rxq_parent);
+
+ if (parent) {
+ ERROR("%p: cannot rehash parent queue %p",
+ (void *)dev, (void *)rxq);
+ return EINVAL;
+ }
+ DEBUG("%p: rehashing queue %p", (void *)dev, (void *)rxq);
+ /* Number of descriptors and mbufs currently allocated. */
+ desc_n = (tmpl.elts_n * (tmpl.sp ? MLX5_PMD_SGE_WR_N : 1));
+ mbuf_n = desc_n;
+ /* Enable scattered packets support for this queue if necessary. */
+ if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
+ (dev->data->dev_conf.rxmode.max_rx_pkt_len >
+ (tmpl.mb_len - RTE_PKTMBUF_HEADROOM))) {
+ tmpl.sp = 1;
+ desc_n /= MLX5_PMD_SGE_WR_N;
+ } else
+ tmpl.sp = 0;
+ DEBUG("%p: %s scattered packets support (%u WRs)",
+ (void *)dev, (tmpl.sp ? "enabling" : "disabling"), desc_n);
+ /* If scatter mode is the same as before, nothing to do. */
+ if (tmpl.sp == rxq->sp) {
+ DEBUG("%p: nothing to do", (void *)dev);
+ return 0;
+ }
+ /* Remove attached flows if RSS is disabled (no parent queue). */
+ if (!priv->rss) {
+ rxq_mac_addrs_del(&tmpl);
+ /* Update original queue in case of failure. */
+ memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+ }
+ /* From now on, any failure will render the queue unusable.
+ * Reinitialize QP. */
+ mod = (struct ibv_exp_qp_attr){ .qp_state = IBV_QPS_RESET };
+ err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+ if (err) {
+ ERROR("%p: cannot reset QP: %s", (void *)dev, strerror(err));
+ assert(err > 0);
+ return err;
+ }
+ err = ibv_resize_cq(tmpl.cq, desc_n);
+ if (err) {
+ ERROR("%p: cannot resize CQ: %s", (void *)dev, strerror(err));
+ assert(err > 0);
+ return err;
+ }
+ mod = (struct ibv_exp_qp_attr){
+ /* Move the QP to this state. */
+ .qp_state = IBV_QPS_INIT,
+ /* Primary port number. */
+ .port_num = priv->port
+ };
+ err = ibv_exp_modify_qp(tmpl.qp, &mod,
+ (IBV_EXP_QP_STATE |
+#ifdef RSS_SUPPORT
+ (parent ? IBV_EXP_QP_GROUP_RSS : 0) |
+#endif /* RSS_SUPPORT */
+ IBV_EXP_QP_PORT));
+ if (err) {
+ ERROR("%p: QP state to IBV_QPS_INIT failed: %s",
+ (void *)dev, strerror(err));
+ assert(err > 0);
+ return err;
+ }
+ /* Reconfigure flows. Do not care for errors. */
+ if (!priv->rss) {
+ if (priv->started)
+ rxq_mac_addrs_add(&tmpl);
+ /* Update original queue in case of failure. */
+ memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
+ }
+ /* Allocate pool. */
+ pool = rte_malloc(__func__, (mbuf_n * sizeof(*pool)), 0);
+ if (pool == NULL) {
+ ERROR("%p: cannot allocate memory", (void *)dev);
+ return ENOBUFS;
+ }
+ /* Snatch mbufs from original queue. */
+ k = 0;
+ if (rxq->sp) {
+ struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
+
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt_sp *elt = &(*elts)[i];
+ unsigned int j;
+
+ for (j = 0; (j != RTE_DIM(elt->bufs)); ++j) {
+ assert(elt->bufs[j] != NULL);
+ pool[k++] = elt->bufs[j];
+ }
+ }
+ } else {
+ struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+
+ for (i = 0; (i != RTE_DIM(*elts)); ++i) {
+ struct rxq_elt *elt = &(*elts)[i];
+ struct rte_mbuf *buf = (void *)
+ ((uintptr_t)elt->sge.addr -
+ WR_ID(elt->wr.wr_id).offset);
+
+ assert(WR_ID(elt->wr.wr_id).id == i);
+ pool[k++] = buf;
+ }
+ }
+ assert(k == mbuf_n);
+ tmpl.elts_n = 0;
+ tmpl.elts.sp = NULL;
+ assert((void *)&tmpl.elts.sp == (void *)&tmpl.elts.no_sp);
+ err = ((tmpl.sp) ?
+ rxq_alloc_elts_sp(&tmpl, desc_n, pool) :
+ rxq_alloc_elts(&tmpl, desc_n, pool));
+ if (err) {
+ ERROR("%p: cannot reallocate WRs, aborting", (void *)dev);
+ rte_free(pool);
+ assert(err > 0);
+ return err;
+ }
+ assert(tmpl.elts_n == desc_n);
+ assert(tmpl.elts.sp != NULL);
+ rte_free(pool);
+ /* Clean up original data. */
+ rxq->elts_n = 0;
+ rte_free(rxq->elts.sp);
+ rxq->elts.sp = NULL;
+ /* Post WRs. */
+ err = ibv_post_recv(tmpl.qp,
+ (tmpl.sp ?
+ &(*tmpl.elts.sp)[0].wr :
+ &(*tmpl.elts.no_sp)[0].wr),
+ &bad_wr);
+ if (err) {
+ ERROR("%p: ibv_post_recv() failed for WR %p: %s",
+ (void *)dev,
+ (void *)bad_wr,
+ strerror(err));
+ goto skip_rtr;
+ }
+ mod = (struct ibv_exp_qp_attr){
+ .qp_state = IBV_QPS_RTR
+ };
+ err = ibv_exp_modify_qp(tmpl.qp, &mod, IBV_EXP_QP_STATE);
+ if (err)
+ ERROR("%p: QP state to IBV_QPS_RTR failed: %s",
+ (void *)dev, strerror(err));
+skip_rtr:
+ *rxq = tmpl;
+ assert(err >= 0);
+ return err;
+}
+
+/**
* Configure a RX queue.
*
* @param dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index c7f634e..b6f2128 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -144,6 +144,7 @@ struct txq {
/* mlx5_rxq.c */
void rxq_cleanup(struct rxq *);
+int rxq_rehash(struct rte_eth_dev *, struct rxq *);
int rxq_setup(struct rte_eth_dev *, struct rxq *, uint16_t, unsigned int,
const struct rte_eth_rxconf *, struct rte_mempool *);
int mlx5_rx_queue_setup(struct rte_eth_dev *, uint16_t, uint16_t, unsigned int,
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
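The mode switch described in this patch comes down to two small computations, sketched below outside the driver. This is an illustration under stated assumptions, not the PMD code: max_frame_len(), needs_scatter() and wr_count() are hypothetical helpers, HEADROOM and SGE_WR_N are assumed values for RTE_PKTMBUF_HEADROOM and MLX5_PMD_SGE_WR_N, and the 14-byte header and 4-byte VLAN allowance correspond to ETHER_HDR_LEN and (ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN).

```c
#include <assert.h>
#include <stdbool.h>

#define HDR_LEN 14u /* Ethernet header, as ETHER_HDR_LEN. */
#define VLAN_TAG_LEN 4u /* ETHER_MAX_VLAN_FRAME_LEN - ETHER_MAX_LEN. */
#define HEADROOM 128u /* Assumed RTE_PKTMBUF_HEADROOM. */
#define SGE_WR_N 4u /* Assumed MLX5_PMD_SGE_WR_N. */

/* Largest frame the port may receive for a given MTU. */
static unsigned int
max_frame_len(unsigned int mtu)
{
	return mtu + HDR_LEN + VLAN_TAG_LEN;
}

/* Scattered RX is needed when a frame cannot fit in a single mbuf. */
static bool
needs_scatter(unsigned int mtu, unsigned int mb_len)
{
	assert(mb_len > HEADROOM);
	return max_frame_len(mtu) > (mb_len - HEADROOM);
}

/*
 * Number of work requests after a rehash. The queue keeps the mbufs it
 * already owns, so in scattered mode they are grouped SGE_WR_N per WR.
 */
static unsigned int
wr_count(unsigned int mbuf_n, bool sp)
{
	return sp ? (mbuf_n / SGE_WR_N) : mbuf_n;
}
```

For example, with an assumed mb_len of 2176 bytes, an MTU of 1500 stays in non-scattered mode while an MTU of 9000 enables scattered mode, and a queue owning 512 mbufs ends up with 128 WRs. Because no mbufs are allocated during the rehash, only the grouping changes.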
* [PATCH v2 07/13] mlx5: add software counters and related callbacks
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (5 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 06/13] mlx5: add MTU configuration support Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 08/13] mlx5: add promiscuous and allmulticast RX modes Adrien Mazarguil
` (6 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
Hardware counters are not supported yet.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5.h | 5 ++
drivers/net/mlx5/mlx5_defs.h | 8 +++
drivers/net/mlx5/mlx5_rxq.c | 1 +
drivers/net/mlx5/mlx5_rxtx.c | 43 +++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 21 ++++++
drivers/net/mlx5/mlx5_stats.c | 144 ++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_txq.c | 1 +
9 files changed, 226 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_stats.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 028c22c..88b361c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -48,6 +48,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
# Dependencies.
DEPDIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += lib/librte_ether
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ddd74d0..38f5199 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -133,6 +133,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.dev_start = mlx5_dev_start,
.dev_stop = mlx5_dev_stop,
.dev_close = mlx5_dev_close,
+ .stats_get = mlx5_stats_get,
+ .stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
.rx_queue_setup = mlx5_rx_queue_setup,
.tx_queue_setup = mlx5_tx_queue_setup,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 0e2457a..79559bc 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -177,6 +177,11 @@ int priv_mac_addr_add(struct priv *, unsigned int,
void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
uint32_t);
+/* mlx5_stats.c */
+
+void mlx5_stats_get(struct rte_eth_dev *, struct rte_eth_stats *);
+void mlx5_stats_reset(struct rte_eth_dev *);
+
/* mlx5_trigger.c */
int mlx5_dev_start(struct rte_eth_dev *);
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index c85be9c..d3a0d0e 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -64,4 +64,12 @@
#define MLX5_PMD_TX_MP_CACHE 8
#endif
+/*
+ * If defined, only use software counters. The PMD will never ask the hardware
+ * for these, and many of them won't be available.
+ */
+#ifndef MLX5_PMD_SOFT_COUNTERS
+#define MLX5_PMD_SOFT_COUNTERS 1
+#endif
+
#endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 71d4470..620ec70 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -993,6 +993,7 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
if (ret)
rte_free(rxq);
else {
+ rxq->stats.idx = idx;
DEBUG("%p: adding RX queue %p to list",
(void *)dev, (void *)rxq);
(*priv->rxqs)[idx] = rxq;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index ed6faa1..eccbbb9 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -367,6 +367,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
struct txq_elt *elt = &(*txq->elts)[elts_head];
unsigned int segs = NB_SEGS(buf);
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ unsigned int sent_size = 0;
+#endif
uint32_t send_flags = 0;
/* Clean up old buffer. */
@@ -429,6 +432,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
send_flags);
if (unlikely(err))
goto stop;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ sent_size += length;
+#endif
} else {
#if MLX5_PMD_SGE_WR_N > 1
struct ibv_sge sges[MLX5_PMD_SGE_WR_N];
@@ -447,6 +453,9 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
send_flags);
if (unlikely(err))
goto stop;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ sent_size += ret.length;
+#endif
#else /* MLX5_PMD_SGE_WR_N > 1 */
DEBUG("%p: TX scattered buffers support not"
" compiled in", (void *)txq);
@@ -454,11 +463,19 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
#endif /* MLX5_PMD_SGE_WR_N > 1 */
}
elts_head = elts_head_next;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment sent bytes counter. */
+ txq->stats.obytes += sent_size;
+#endif
}
stop:
/* Take a shortcut if nothing must be sent. */
if (unlikely(i == 0))
return 0;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment sent packets counter. */
+ txq->stats.opackets += i;
+#endif
/* Ring QP doorbell. */
err = txq->if_qp->send_flush(txq->qp);
if (unlikely(err)) {
@@ -549,6 +566,10 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
" completion status (%d): %s",
(void *)rxq, wc.wr_id, wc.status,
ibv_wc_status_str(wc.status));
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment dropped packets counter. */
+ ++rxq->stats.idropped;
+#endif
/* Link completed WRs together for repost. */
*next = wr;
next = &wr->next;
@@ -592,6 +613,7 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
rte_pktmbuf_free(pkt_buf);
}
/* Increment out of memory counters. */
+ ++rxq->stats.rx_nombuf;
++rxq->priv->dev->data->rx_mbuf_alloc_failed;
goto repost;
}
@@ -651,6 +673,10 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Return packet. */
*(pkts++) = pkt_buf;
++pkts_ret;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment bytes counter. */
+ rxq->stats.ibytes += pkt_buf_len;
+#endif
repost:
if (++elts_head >= elts_n)
elts_head = 0;
@@ -673,6 +699,10 @@ repost:
abort();
}
rxq->elts_head = elts_head;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment packets counter. */
+ rxq->stats.ipackets += pkts_ret;
+#endif
return pkts_ret;
}
@@ -753,6 +783,10 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
" completion status (%d): %s",
(void *)rxq, wc.wr_id, wc.status,
ibv_wc_status_str(wc.status));
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment dropped packets counter. */
+ ++rxq->stats.idropped;
+#endif
/* Add SGE to array for repost. */
sges[i] = elt->sge;
goto repost;
@@ -772,6 +806,7 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
" can't allocate a new mbuf",
(void *)rxq, WR_ID(wr_id).id);
/* Increment out of memory counters. */
+ ++rxq->stats.rx_nombuf;
++rxq->priv->dev->data->rx_mbuf_alloc_failed;
goto repost;
}
@@ -798,6 +833,10 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Return packet. */
*(pkts++) = seg;
++pkts_ret;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment bytes counter. */
+ rxq->stats.ibytes += len;
+#endif
repost:
if (++elts_head >= elts_n)
elts_head = 0;
@@ -818,6 +857,10 @@ repost:
abort();
}
rxq->elts_head = elts_head;
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ /* Increment packets counter. */
+ rxq->stats.ipackets += pkts_ret;
+#endif
return pkts_ret;
}
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index b6f2128..4183820 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -60,6 +60,25 @@
#include "mlx5.h"
#include "mlx5_defs.h"
+struct mlx5_rxq_stats {
+ unsigned int idx; /**< Mapping index. */
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ uint64_t ipackets; /**< Total of successfully received packets. */
+ uint64_t ibytes; /**< Total of successfully received bytes. */
+#endif
+ uint64_t idropped; /**< Total of packets dropped when RX ring full. */
+ uint64_t rx_nombuf; /**< Total of RX mbuf allocation failures. */
+};
+
+struct mlx5_txq_stats {
+ unsigned int idx; /**< Mapping index. */
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ uint64_t opackets; /**< Total of successfully sent packets. */
+ uint64_t obytes; /**< Total of successfully sent bytes. */
+#endif
+ uint64_t odropped; /**< Total of packets not sent when TX ring full. */
+};
+
/* RX element (scattered packets). */
struct rxq_elt_sp {
struct ibv_recv_wr wr; /* Work Request. */
@@ -96,6 +115,7 @@ struct rxq {
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
+ struct mlx5_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
struct ibv_exp_res_domain *rd; /* Resource Domain. */
};
@@ -135,6 +155,7 @@ struct txq {
unsigned int elts_comp; /* Number of completion requests. */
unsigned int elts_comp_cd; /* Countdown for next completion request. */
unsigned int elts_comp_cd_init; /* Initial value for countdown. */
+ struct mlx5_txq_stats stats; /* TX queue counters. */
linear_t (*elts_linear)[]; /* Linearized buffers. */
struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
unsigned int socket; /* CPU socket ID for allocations. */
diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
new file mode 100644
index 0000000..a51e945
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -0,0 +1,144 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_defs.h"
+
+/**
+ * DPDK callback to get device statistics.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[out] stats
+ * Stats structure output buffer.
+ */
+void
+mlx5_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct rte_eth_stats tmp = {0};
+ unsigned int i;
+ unsigned int idx;
+
+ priv_lock(priv);
+ /* Add software counters. */
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ struct rxq *rxq = (*priv->rxqs)[i];
+
+ if (rxq == NULL)
+ continue;
+ idx = rxq->stats.idx;
+ if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.q_ipackets[idx] += rxq->stats.ipackets;
+ tmp.q_ibytes[idx] += rxq->stats.ibytes;
+#endif
+ tmp.q_errors[idx] += (rxq->stats.idropped +
+ rxq->stats.rx_nombuf);
+ }
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.ipackets += rxq->stats.ipackets;
+ tmp.ibytes += rxq->stats.ibytes;
+#endif
+ tmp.ierrors += rxq->stats.idropped;
+ tmp.rx_nombuf += rxq->stats.rx_nombuf;
+ }
+ for (i = 0; (i != priv->txqs_n); ++i) {
+ struct txq *txq = (*priv->txqs)[i];
+
+ if (txq == NULL)
+ continue;
+ idx = txq->stats.idx;
+ if (idx < RTE_ETHDEV_QUEUE_STAT_CNTRS) {
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.q_opackets[idx] += txq->stats.opackets;
+ tmp.q_obytes[idx] += txq->stats.obytes;
+#endif
+ tmp.q_errors[idx] += txq->stats.odropped;
+ }
+#ifdef MLX5_PMD_SOFT_COUNTERS
+ tmp.opackets += txq->stats.opackets;
+ tmp.obytes += txq->stats.obytes;
+#endif
+ tmp.oerrors += txq->stats.odropped;
+ }
+#ifndef MLX5_PMD_SOFT_COUNTERS
+ /* FIXME: retrieve and add hardware counters. */
+#endif
+ *stats = tmp;
+ priv_unlock(priv);
+}
+
+/**
+ * DPDK callback to clear device statistics.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_stats_reset(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ unsigned int idx;
+
+ priv_lock(priv);
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ idx = (*priv->rxqs)[i]->stats.idx;
+ (*priv->rxqs)[i]->stats =
+ (struct mlx5_rxq_stats){ .idx = idx };
+ }
+ for (i = 0; (i != priv->txqs_n); ++i) {
+ if ((*priv->txqs)[i] == NULL)
+ continue;
+ idx = (*priv->txqs)[i]->stats.idx;
+ (*priv->txqs)[i]->stats =
+ (struct mlx5_txq_stats){ .idx = idx };
+ }
+#ifndef MLX5_PMD_SOFT_COUNTERS
+ /* FIXME: reset hardware counters. */
+#endif
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 2bae61f..a53b128 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -472,6 +472,7 @@ mlx5_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
if (ret)
rte_free(txq);
else {
+ txq->stats.idx = idx;
DEBUG("%p: adding TX queue %p to list",
(void *)dev, (void *)txq);
(*priv->txqs)[idx] = txq;
--
2.1.0
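
The reset idiom used by mlx5_stats_reset() in the patch above can be
sketched in isolation: every counter is cleared with a compound literal
while the statistics index, read from the entry being reset, is kept.
This is only an illustration; struct queue_stats and queue_stats_reset()
are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-queue statistics. */
struct queue_stats {
	unsigned int idx; /* Mapping index, must survive a reset. */
	uint64_t packets;
	uint64_t bytes;
	uint64_t dropped;
};

/* Clear all counters of every non-NULL entry while keeping its index. */
void
queue_stats_reset(struct queue_stats *stats[], size_t n)
{
	size_t i;

	assert(stats != NULL || n == 0);
	for (i = 0; i != n; ++i) {
		unsigned int idx;

		/* Ignore nonexistent queues. */
		if (stats[i] == NULL)
			continue;
		/* Read the index from the same entry that is being reset. */
		idx = stats[i]->idx;
		/* Members not named in the initializer are set to zero. */
		*stats[i] = (struct queue_stats){ .idx = idx };
	}
}
```

The compound literal avoids a separate memset() followed by restoring
the index, at the cost of requiring C99.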
* [PATCH v2 08/13] mlx5: add promiscuous and allmulticast RX modes
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (6 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 07/13] mlx5: add software counters and related callbacks Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 09/13] mlx5: add link update device operation Adrien Mazarguil
` (5 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev; +Cc: Yaacov Hazan
These modes require special non-MAC flows.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Signed-off-by: Yaacov Hazan <yaacovh@mellanox.com>
---
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 4 +
drivers/net/mlx5/mlx5.h | 13 ++
drivers/net/mlx5/mlx5_ethdev.c | 4 +
drivers/net/mlx5/mlx5_rxmode.c | 311 ++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxq.c | 12 ++
drivers/net/mlx5/mlx5_rxtx.h | 2 +
drivers/net/mlx5/mlx5_trigger.c | 8 ++
8 files changed, 355 insertions(+)
create mode 100644 drivers/net/mlx5/mlx5_rxmode.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 88b361c..4d25d9c 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -48,6 +48,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxtx.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxmode.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
# Dependencies.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 38f5199..ee63bdf 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -133,6 +133,10 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.dev_start = mlx5_dev_start,
.dev_stop = mlx5_dev_stop,
.dev_close = mlx5_dev_close,
+ .promiscuous_enable = mlx5_promiscuous_enable,
+ .promiscuous_disable = mlx5_promiscuous_disable,
+ .allmulticast_enable = mlx5_allmulticast_enable,
+ .allmulticast_disable = mlx5_allmulticast_disable,
.stats_get = mlx5_stats_get,
.stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 79559bc..56da43c 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -94,6 +94,8 @@ struct priv {
uint16_t mtu; /* Configured MTU. */
uint8_t port; /* Physical port number. */
unsigned int started:1; /* Device started, flows enabled. */
+ unsigned int promisc_req:1; /* Promiscuous mode requested. */
+ unsigned int allmulti_req:1; /* All multicast mode requested. */
unsigned int hw_qpg:1; /* QP groups are supported. */
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
@@ -177,6 +179,17 @@ int priv_mac_addr_add(struct priv *, unsigned int,
void mlx5_mac_addr_add(struct rte_eth_dev *, struct ether_addr *, uint32_t,
uint32_t);
+/* mlx5_rxmode.c */
+
+int rxq_promiscuous_enable(struct rxq *);
+void mlx5_promiscuous_enable(struct rte_eth_dev *);
+void rxq_promiscuous_disable(struct rxq *);
+void mlx5_promiscuous_disable(struct rte_eth_dev *);
+int rxq_allmulticast_enable(struct rxq *);
+void mlx5_allmulticast_enable(struct rte_eth_dev *);
+void rxq_allmulticast_disable(struct rxq *);
+void mlx5_allmulticast_disable(struct rte_eth_dev *);
+
/* mlx5_stats.c */
void mlx5_stats_get(struct rte_eth_dev *, struct rte_eth_stats *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 0afc1bb..26b6d73 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -605,6 +605,10 @@ mlx5_dev_set_mtu(struct rte_eth_dev *dev, uint16_t mtu)
if (!priv->rss) {
if (priv->started)
rxq_mac_addrs_add(rxq);
+ if (priv->started && priv->promisc_req)
+ rxq_promiscuous_enable(rxq);
+ if (priv->started && priv->allmulti_req)
+ rxq_allmulticast_enable(rxq);
}
/* Scattered burst function takes priority. */
if (rxq->sp)
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
new file mode 100644
index 0000000..7efa21b
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -0,0 +1,311 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <errno.h>
+#include <string.h>
+
+/* Verbs header. */
+/* ISO C doesn't support unnamed structs/unions, disabling -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <infiniband/verbs.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ethdev.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5.h"
+#include "mlx5_rxtx.h"
+#include "mlx5_utils.h"
+
+/**
+ * Enable promiscuous mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_promiscuous_enable(struct rxq *rxq)
+{
+ struct ibv_flow *flow;
+ struct ibv_flow_attr attr = {
+ .type = IBV_FLOW_ATTR_ALL_DEFAULT,
+ .num_of_specs = 0,
+ .port = rxq->priv->port,
+ .flags = 0
+ };
+
+ if (rxq->priv->vf)
+ return 0;
+ if (rxq->promisc_flow != NULL)
+ return 0;
+ DEBUG("%p: enabling promiscuous mode", (void *)rxq);
+ errno = 0;
+ flow = ibv_create_flow(rxq->qp, &attr);
+ if (flow == NULL) {
+ /* It's not clear whether errno is always set in this case. */
+ ERROR("%p: flow configuration failed, errno=%d: %s",
+ (void *)rxq, errno,
+ (errno ? strerror(errno) : "Unknown error"));
+ if (errno)
+ return errno;
+ return EINVAL;
+ }
+ rxq->promisc_flow = flow;
+ DEBUG("%p: promiscuous mode enabled", (void *)rxq);
+ return 0;
+}
+
+/**
+ * DPDK callback to enable promiscuous mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_promiscuous_enable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ int ret;
+
+ priv_lock(priv);
+ priv->promisc_req = 1;
+ /* If device isn't started, this is all we need to do. */
+ if (!priv->started)
+ goto end;
+ if (priv->rss) {
+ ret = rxq_promiscuous_enable(&priv->rxq_parent);
+ if (ret) {
+ priv_unlock(priv);
+ return;
+ }
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ ret = rxq_promiscuous_enable((*priv->rxqs)[i]);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ if ((*priv->rxqs)[--i] != NULL)
+ rxq_promiscuous_disable((*priv->rxqs)[i]);
+ priv_unlock(priv);
+ return;
+ }
+end:
+ priv_unlock(priv);
+}
+
+/**
+ * Disable promiscuous mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_promiscuous_disable(struct rxq *rxq)
+{
+ if (rxq->priv->vf)
+ return;
+ if (rxq->promisc_flow == NULL)
+ return;
+ DEBUG("%p: disabling promiscuous mode", (void *)rxq);
+ claim_zero(ibv_destroy_flow(rxq->promisc_flow));
+ rxq->promisc_flow = NULL;
+ DEBUG("%p: promiscuous mode disabled", (void *)rxq);
+}
+
+/**
+ * DPDK callback to disable promiscuous mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_promiscuous_disable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+
+ priv_lock(priv);
+ priv->promisc_req = 0;
+ if (priv->rss) {
+ rxq_promiscuous_disable(&priv->rxq_parent);
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ rxq_promiscuous_disable((*priv->rxqs)[i]);
+end:
+ priv_unlock(priv);
+}
+
+/**
+ * Enable allmulti mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+int
+rxq_allmulticast_enable(struct rxq *rxq)
+{
+ struct ibv_flow *flow;
+ struct ibv_flow_attr attr = {
+ .type = IBV_FLOW_ATTR_MC_DEFAULT,
+ .num_of_specs = 0,
+ .port = rxq->priv->port,
+ .flags = 0
+ };
+
+ if (rxq->allmulti_flow != NULL)
+ return 0;
+ DEBUG("%p: enabling allmulticast mode", (void *)rxq);
+ errno = 0;
+ flow = ibv_create_flow(rxq->qp, &attr);
+ if (flow == NULL) {
+ /* It's not clear whether errno is always set in this case. */
+ ERROR("%p: flow configuration failed, errno=%d: %s",
+ (void *)rxq, errno,
+ (errno ? strerror(errno) : "Unknown error"));
+ if (errno)
+ return errno;
+ return EINVAL;
+ }
+ rxq->allmulti_flow = flow;
+ DEBUG("%p: allmulticast mode enabled", (void *)rxq);
+ return 0;
+}
+
+/**
+ * DPDK callback to enable allmulti mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_allmulticast_enable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ int ret;
+
+ priv_lock(priv);
+ priv->allmulti_req = 1;
+ /* If device isn't started, this is all we need to do. */
+ if (!priv->started)
+ goto end;
+ if (priv->rss) {
+ ret = rxq_allmulticast_enable(&priv->rxq_parent);
+ if (ret) {
+ priv_unlock(priv);
+ return;
+ }
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i) {
+ if ((*priv->rxqs)[i] == NULL)
+ continue;
+ ret = rxq_allmulticast_enable((*priv->rxqs)[i]);
+ if (!ret)
+ continue;
+ /* Failure, rollback. */
+ while (i != 0)
+ if ((*priv->rxqs)[--i] != NULL)
+ rxq_allmulticast_disable((*priv->rxqs)[i]);
+ priv_unlock(priv);
+ return;
+ }
+end:
+ priv_unlock(priv);
+}
+
+/**
+ * Disable allmulti mode in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ */
+void
+rxq_allmulticast_disable(struct rxq *rxq)
+{
+ if (rxq->allmulti_flow == NULL)
+ return;
+ DEBUG("%p: disabling allmulticast mode", (void *)rxq);
+ claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
+ rxq->allmulti_flow = NULL;
+ DEBUG("%p: allmulticast mode disabled", (void *)rxq);
+}
+
+/**
+ * DPDK callback to disable allmulti mode.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ */
+void
+mlx5_allmulticast_disable(struct rte_eth_dev *dev)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+
+ priv_lock(priv);
+ priv->allmulti_req = 0;
+ if (priv->rss) {
+ rxq_allmulticast_disable(&priv->rxq_parent);
+ goto end;
+ }
+ for (i = 0; (i != priv->rxqs_n); ++i)
+ if ((*priv->rxqs)[i] != NULL)
+ rxq_allmulticast_disable((*priv->rxqs)[i]);
+end:
+ priv_unlock(priv);
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 620ec70..1cd28c2 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -398,6 +398,8 @@ rxq_cleanup(struct rxq *rxq)
 &params));
}
if (rxq->qp != NULL) {
+ rxq_promiscuous_disable(rxq);
+ rxq_allmulticast_disable(rxq);
rxq_mac_addrs_del(rxq);
claim_zero(ibv_destroy_qp(rxq->qp));
}
@@ -580,8 +582,12 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
}
/* Remove attached flows if RSS is disabled (no parent queue). */
if (!priv->rss) {
+ rxq_allmulticast_disable(&tmpl);
+ rxq_promiscuous_disable(&tmpl);
rxq_mac_addrs_del(&tmpl);
/* Update original queue in case of failure. */
+ rxq->allmulti_flow = tmpl.allmulti_flow;
+ rxq->promisc_flow = tmpl.promisc_flow;
memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
}
/* From now on, any failure will render the queue unusable.
@@ -621,7 +627,13 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
if (!priv->rss) {
if (priv->started)
rxq_mac_addrs_add(&tmpl);
+ if (priv->started && priv->promisc_req)
+ rxq_promiscuous_enable(&tmpl);
+ if (priv->started && priv->allmulti_req)
+ rxq_allmulticast_enable(&tmpl);
/* Update original queue in case of failure. */
+ rxq->allmulti_flow = tmpl.allmulti_flow;
+ rxq->promisc_flow = tmpl.promisc_flow;
memcpy(rxq->mac_flow, tmpl.mac_flow, sizeof(rxq->mac_flow));
}
/* Allocate pool. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4183820..020acf0 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -106,6 +106,8 @@ struct rxq {
struct ibv_exp_cq_family *if_cq; /* CQ interface. */
/* MAC flow steering rules. */
struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES];
+ struct ibv_flow *promisc_flow; /* Promiscuous flow. */
+ struct ibv_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index f5d965f..dced025 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -86,6 +86,10 @@ mlx5_dev_start(struct rte_eth_dev *dev)
if (rxq == NULL)
continue;
ret = rxq_mac_addrs_add(rxq);
+ if (!ret && priv->promisc_req)
+ ret = rxq_promiscuous_enable(rxq);
+ if (!ret && priv->allmulti_req)
+ ret = rxq_allmulticast_enable(rxq);
if (!ret)
continue;
WARN("%p: QP flow attachment failed: %s",
@@ -94,6 +98,8 @@ mlx5_dev_start(struct rte_eth_dev *dev)
while (i != 0) {
rxq = (*priv->rxqs)[--i];
if (rxq != NULL) {
+ rxq_allmulticast_disable(rxq);
+ rxq_promiscuous_disable(rxq);
rxq_mac_addrs_del(rxq);
}
}
@@ -140,6 +146,8 @@ mlx5_dev_stop(struct rte_eth_dev *dev)
/* Ignore nonexistent RX queues. */
if (rxq == NULL)
continue;
+ rxq_allmulticast_disable(rxq);
+ rxq_promiscuous_disable(rxq);
rxq_mac_addrs_del(rxq);
} while ((--r) && ((rxq = (*priv->rxqs)[++i]), i));
priv_unlock(priv);
--
2.1.0
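
The enable-with-rollback loop used by mlx5_promiscuous_enable() and
mlx5_allmulticast_enable() in the patch above can be reduced to the
following sketch. It assumes a hypothetical struct fake_rxq whose "flow"
member stands in for an attached ibv_flow; none of these names exist in
the driver and no Verbs call is made.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical queue: "flow" stands in for an attached ibv_flow. */
struct fake_rxq {
	int flow; /* Non-zero once the special flow is attached. */
	int fail; /* Test knob: make enabling fail on this queue. */
};

/* Attach the flow; returns 0 on success, errno value on failure. */
int
fake_rxq_enable(struct fake_rxq *q)
{
	if (q->flow)
		return 0; /* Already enabled, nothing to do. */
	if (q->fail)
		return EINVAL;
	q->flow = 1;
	return 0;
}

/* Detach the flow, harmless if it was never attached. */
void
fake_rxq_disable(struct fake_rxq *q)
{
	q->flow = 0;
}

/* Enable the mode on all queues, undoing earlier ones on failure. */
int
enable_all(struct fake_rxq *rxqs[], unsigned int n)
{
	unsigned int i;

	assert(rxqs != NULL || n == 0);
	for (i = 0; i != n; ++i) {
		int ret;

		if (rxqs[i] == NULL)
			continue;
		ret = fake_rxq_enable(rxqs[i]);
		if (!ret)
			continue;
		/* Failure, rollback queues that precede index i. */
		while (i != 0)
			if (rxqs[--i] != NULL)
				fake_rxq_disable(rxqs[i]);
		return ret;
	}
	return 0;
}
```

Unlike the DPDK callbacks, which return void, this sketch returns the
error so that the rollback can be observed by the caller.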
* [PATCH v2 09/13] mlx5: add link update device operation
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (7 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 08/13] mlx5: add promiscuous and allmulticast RX modes Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-11-02 17:52 ` Stephen Hemminger
2015-10-30 18:52 ` [PATCH v2 10/13] mlx5: add flow control device operations Adrien Mazarguil
` (4 subsequent siblings)
13 siblings, 1 reply; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
Link information is retrieved using ethtool ioctls.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_ethdev.c | 71 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ee63bdf..5ed828d 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -137,6 +137,7 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.promiscuous_disable = mlx5_promiscuous_disable,
.allmulticast_enable = mlx5_allmulticast_enable,
.allmulticast_disable = mlx5_allmulticast_disable,
+ .link_update = mlx5_link_update,
.stats_get = mlx5_stats_get,
.stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 56da43c..1a18326 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -164,6 +164,7 @@ int priv_get_mtu(struct priv *, uint16_t *);
int priv_set_flags(struct priv *, unsigned int, unsigned int);
int mlx5_dev_configure(struct rte_eth_dev *);
void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
+int mlx5_link_update(struct rte_eth_dev *, int);
int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 26b6d73..d01dee5 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -45,6 +45,8 @@
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/if.h>
+#include <linux/ethtool.h>
+#include <linux/sockios.h>
/* DPDK headers don't like -pedantic. */
#ifdef PEDANTIC
@@ -535,6 +537,75 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
}
/**
+ * DPDK callback to retrieve physical link information (unlocked version).
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param wait_to_complete
+ * Wait for request completion (ignored).
+ */
+static int
+mlx5_link_update_unlocked(struct rte_eth_dev *dev, int wait_to_complete)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct ethtool_cmd edata = {
+ .cmd = ETHTOOL_GSET
+ };
+ struct ifreq ifr;
+ struct rte_eth_link dev_link;
+ int link_speed = 0;
+
+ (void)wait_to_complete;
+ if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
+ WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
+ return -1;
+ }
+ memset(&dev_link, 0, sizeof(dev_link));
+ dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
+ (ifr.ifr_flags & IFF_RUNNING));
+ ifr.ifr_data = &edata;
+ if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+ WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
+ strerror(errno));
+ return -1;
+ }
+ link_speed = ethtool_cmd_speed(&edata);
+ if (link_speed == -1)
+ dev_link.link_speed = 0;
+ else
+ dev_link.link_speed = link_speed;
+ dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
+ ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
+ if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) {
+ /* Link status changed. */
+ dev->data->dev_link = dev_link;
+ return 0;
+ }
+ /* Link status is still the same. */
+ return -1;
+}
+
+/**
+ * DPDK callback to retrieve physical link information.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param wait_to_complete
+ * Wait for request completion (ignored).
+ */
+int
+mlx5_link_update(struct rte_eth_dev *dev, int wait_to_complete)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret;
+
+ priv_lock(priv);
+ ret = mlx5_link_update_unlocked(dev, wait_to_complete);
+ priv_unlock(priv);
+ return ret;
+}
+
+/**
* DPDK callback to change the MTU.
*
* Setting the MTU affects hardware MRU (packets larger than the MTU cannot be
--
2.1.0
* [PATCH v2 10/13] mlx5: add flow control device operations
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (8 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 09/13] mlx5: add link update device operation Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 11/13] mlx5: add VLAN filtering Adrien Mazarguil
` (3 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
Like most other device control operations, these are handled by the related
kernel network device through syscalls.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5.c | 2 +
drivers/net/mlx5/mlx5.h | 2 +
drivers/net/mlx5/mlx5_ethdev.c | 99 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 103 insertions(+)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 5ed828d..c454e93 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -145,6 +145,8 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
.tx_queue_release = mlx5_tx_queue_release,
+ .flow_ctrl_get = mlx5_dev_get_flow_ctrl,
+ .flow_ctrl_set = mlx5_dev_set_flow_ctrl,
.mac_addr_remove = mlx5_mac_addr_remove,
.mac_addr_add = mlx5_mac_addr_add,
.mtu_set = mlx5_dev_set_mtu,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 1a18326..c6c3d3f 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -166,6 +166,8 @@ int mlx5_dev_configure(struct rte_eth_dev *);
void mlx5_dev_infos_get(struct rte_eth_dev *, struct rte_eth_dev_info *);
int mlx5_link_update(struct rte_eth_dev *, int);
int mlx5_dev_set_mtu(struct rte_eth_dev *, uint16_t);
+int mlx5_dev_get_flow_ctrl(struct rte_eth_dev *, struct rte_eth_fc_conf *);
+int mlx5_dev_set_flow_ctrl(struct rte_eth_dev *, struct rte_eth_fc_conf *);
int mlx5_ibv_device_to_pci_addr(const struct ibv_device *,
struct rte_pci_addr *);
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index d01dee5..5df5fa1 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -695,6 +695,105 @@ out:
}
/**
+ * DPDK callback to get flow control status.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[out] fc_conf
+ * Flow control output buffer.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_get_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct ifreq ifr;
+ struct ethtool_pauseparam ethpause = {
+ .cmd = ETHTOOL_GPAUSEPARAM
+ };
+ int ret;
+
+ ifr.ifr_data = &ethpause;
+ priv_lock(priv);
+ if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+ ret = errno;
+ WARN("ioctl(SIOCETHTOOL, ETHTOOL_GPAUSEPARAM)"
+ " failed: %s",
+ strerror(ret));
+ goto out;
+ }
+
+ fc_conf->autoneg = ethpause.autoneg;
+ if (ethpause.rx_pause && ethpause.tx_pause)
+ fc_conf->mode = RTE_FC_FULL;
+ else if (ethpause.rx_pause)
+ fc_conf->mode = RTE_FC_RX_PAUSE;
+ else if (ethpause.tx_pause)
+ fc_conf->mode = RTE_FC_TX_PAUSE;
+ else
+ fc_conf->mode = RTE_FC_NONE;
+ ret = 0;
+
+out:
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
+
+/**
+ * DPDK callback to modify flow control parameters.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param[in] fc_conf
+ * Flow control parameters.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_dev_set_flow_ctrl(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf)
+{
+ struct priv *priv = dev->data->dev_private;
+ struct ifreq ifr;
+ struct ethtool_pauseparam ethpause = {
+ .cmd = ETHTOOL_SPAUSEPARAM
+ };
+ int ret;
+
+ ifr.ifr_data = &ethpause;
+ ethpause.autoneg = fc_conf->autoneg;
+ if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+ (fc_conf->mode & RTE_FC_RX_PAUSE))
+ ethpause.rx_pause = 1;
+ else
+ ethpause.rx_pause = 0;
+
+ if (((fc_conf->mode & RTE_FC_FULL) == RTE_FC_FULL) ||
+ (fc_conf->mode & RTE_FC_TX_PAUSE))
+ ethpause.tx_pause = 1;
+ else
+ ethpause.tx_pause = 0;
+
+ priv_lock(priv);
+ if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+ ret = errno;
+ WARN("ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM)"
+ " failed: %s",
+ strerror(ret));
+ goto out;
+ }
+ ret = 0;
+
+out:
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
+
+/**
* Get PCI information from struct ibv_device.
*
* @param device
--
2.1.0
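
The translation between ethtool pause parameters and a flow control
mode, as performed by the two callbacks in the patch above, can be
sketched as a pair of pure functions. The enum below is a hypothetical
stand-in for the RTE_FC_* modes of rte_ethdev.h with values chosen for
illustration only; the function names are likewise not part of the
driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RTE_FC_* modes. */
enum fc_mode {
	FC_NONE = 0,
	FC_RX_PAUSE = 1,
	FC_TX_PAUSE = 2,
	FC_FULL = 3, /* RX and TX pause both enabled. */
};

/* "Get" direction: rx_pause/tx_pause as reported by ethtool. */
enum fc_mode
fc_mode_from_pause(int rx_pause, int tx_pause)
{
	if (rx_pause && tx_pause)
		return FC_FULL;
	if (rx_pause)
		return FC_RX_PAUSE;
	if (tx_pause)
		return FC_TX_PAUSE;
	return FC_NONE;
}

/* "Set" direction: values to store in the ethtool request. */
void
fc_mode_to_pause(enum fc_mode mode, int *rx_pause, int *tx_pause)
{
	assert(rx_pause != NULL && tx_pause != NULL);
	*rx_pause = (mode == FC_FULL || mode == FC_RX_PAUSE);
	*tx_pause = (mode == FC_FULL || mode == FC_TX_PAUSE);
}
```

The sketch compares modes for equality instead of testing bits so that
it does not depend on the numeric values of the enumeration.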
* [PATCH v2 11/13] mlx5: add VLAN filtering
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (9 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 10/13] mlx5: add flow control device operations Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 12/13] mlx5: add checksum offloading support Adrien Mazarguil
` (2 subsequent siblings)
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
All MAC RX flows must be updated with VLAN information when configuring a
VLAN filter.
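
As a rough illustration of the statement above, the sketch below
installs one flow per configured VLAN filter for a given MAC address, or
a single flow when no filter is configured, and undoes its work on
failure. struct mac_entry, add_flow(), del_flow() and mac_addr_add() are
hypothetical names with a deliberately small table; no Verbs flow is
created.

```c
#include <assert.h>
#include <errno.h>

#define MAX_VLAN_IDS 4 /* Small hypothetical table size. */

/* Hypothetical per-MAC flow table: one slot per VLAN filter. */
struct mac_entry {
	int flow[MAX_VLAN_IDS]; /* Non-zero when a flow is installed. */
	int fail_at;            /* Test knob: slot that fails, or -1. */
};

/* Install a flow in one slot; returns 0 or an errno value. */
int
add_flow(struct mac_entry *e, unsigned int vlan_index)
{
	assert(vlan_index < MAX_VLAN_IDS);
	if ((int)vlan_index == e->fail_at)
		return EINVAL;
	e->flow[vlan_index] = 1;
	return 0;
}

/* Remove the flow from one slot, harmless if the slot is empty. */
void
del_flow(struct mac_entry *e, unsigned int vlan_index)
{
	assert(vlan_index < MAX_VLAN_IDS);
	e->flow[vlan_index] = 0;
}

/* Install one flow per VLAN filter, or a single one when none is set. */
int
mac_addr_add(struct mac_entry *e, unsigned int vlan_filter_n)
{
	unsigned int i = 0;

	assert(vlan_filter_n <= MAX_VLAN_IDS);
	/* do/while guarantees slot 0 is used even without filters. */
	do {
		int ret = add_flow(e, i);

		if (ret) {
			/* Failure, rollback slots that precede index i. */
			while (i != 0)
				del_flow(e, --i);
			return ret;
		}
	} while (++i < vlan_filter_n);
	return 0;
}
```

The do/while form is what makes the "at least once" case work: with
zero filters the body still runs for slot 0, which corresponds to a MAC
flow that does not match on any VLAN ID.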
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/Makefile | 1 +
drivers/net/mlx5/mlx5.c | 1 +
drivers/net/mlx5/mlx5.h | 6 ++
drivers/net/mlx5/mlx5_defs.h | 3 +
drivers/net/mlx5/mlx5_mac.c | 62 ++++++++++++-----
drivers/net/mlx5/mlx5_rxtx.h | 4 +-
drivers/net/mlx5/mlx5_vlan.c | 156 +++++++++++++++++++++++++++++++++++++++++++
7 files changed, 216 insertions(+), 17 deletions(-)
create mode 100644 drivers/net/mlx5/mlx5_vlan.c
diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index 4d25d9c..8b1e32b 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_trigger.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_mac.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_rxmode.c
+SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_vlan.c
SRCS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5_stats.c
# Dependencies.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c454e93..8f75f76 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -141,6 +141,7 @@ static const struct eth_dev_ops mlx5_dev_ops = {
.stats_get = mlx5_stats_get,
.stats_reset = mlx5_stats_reset,
.dev_infos_get = mlx5_dev_infos_get,
+ .vlan_filter_set = mlx5_vlan_filter_set,
.rx_queue_setup = mlx5_rx_queue_setup,
.tx_queue_setup = mlx5_tx_queue_setup,
.rx_queue_release = mlx5_rx_queue_release,
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c6c3d3f..3a1e7a6 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -90,6 +90,8 @@ struct priv {
*/
struct ether_addr mac[MLX5_MAX_MAC_ADDRESSES];
BITFIELD_DECLARE(mac_configured, uint32_t, MLX5_MAX_MAC_ADDRESSES);
+ uint16_t vlan_filter[MLX5_MAX_VLAN_IDS]; /* VLAN filters table. */
+ unsigned int vlan_filter_n; /* Number of configured VLAN filters. */
/* Device properties. */
uint16_t mtu; /* Configured MTU. */
uint8_t port; /* Physical port number. */
@@ -198,6 +200,10 @@ void mlx5_allmulticast_disable(struct rte_eth_dev *);
void mlx5_stats_get(struct rte_eth_dev *, struct rte_eth_stats *);
void mlx5_stats_reset(struct rte_eth_dev *);
+/* mlx5_vlan.c */
+
+int mlx5_vlan_filter_set(struct rte_eth_dev *, uint16_t, int);
+
/* mlx5_trigger.c */
int mlx5_dev_start(struct rte_eth_dev *);
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index d3a0d0e..369f8b6 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -40,6 +40,9 @@
/* Maximum number of simultaneous MAC addresses. */
#define MLX5_MAX_MAC_ADDRESSES 128
+/* Maximum number of simultaneous VLAN filters. */
+#define MLX5_MAX_VLAN_IDS 128
+
/* Request send completion once in every 64 sends, might be less. */
#define MLX5_PMD_TX_PER_COMP_REQ 64
diff --git a/drivers/net/mlx5/mlx5_mac.c b/drivers/net/mlx5/mlx5_mac.c
index 262d7c6..95afccf 100644
--- a/drivers/net/mlx5/mlx5_mac.c
+++ b/drivers/net/mlx5/mlx5_mac.c
@@ -97,9 +97,12 @@ priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
* Pointer to RX queue structure.
* @param mac_index
* MAC address index.
+ * @param vlan_index
+ * VLAN index to use.
*/
static void
-rxq_del_mac_flow(struct rxq *rxq, unsigned int mac_index)
+rxq_del_mac_flow(struct rxq *rxq, unsigned int mac_index,
+ unsigned int vlan_index)
{
#ifndef NDEBUG
const uint8_t (*mac)[ETHER_ADDR_LEN] =
@@ -108,14 +111,17 @@ rxq_del_mac_flow(struct rxq *rxq, unsigned int mac_index)
#endif
assert(mac_index < RTE_DIM(rxq->mac_flow));
- if (rxq->mac_flow[mac_index] == NULL)
+ assert(vlan_index < RTE_DIM(rxq->mac_flow[mac_index]));
+ if (rxq->mac_flow[mac_index][vlan_index] == NULL)
return;
- DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
+ DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
+ " VLAN index %u",
(void *)rxq,
(*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
- mac_index);
- claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
- rxq->mac_flow[mac_index] = NULL;
+ mac_index,
+ vlan_index);
+ claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
+ rxq->mac_flow[mac_index][vlan_index] = NULL;
}
/**
@@ -129,8 +135,11 @@ rxq_del_mac_flow(struct rxq *rxq, unsigned int mac_index)
static void
rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
{
+ unsigned int i;
+
assert(mac_index < RTE_DIM(rxq->mac_flow));
- rxq_del_mac_flow(rxq, mac_index);
+ for (i = 0; (i != RTE_DIM(rxq->mac_flow[mac_index])); ++i)
+ rxq_del_mac_flow(rxq, mac_index, i);
}
/**
@@ -208,12 +217,15 @@ end:
* Pointer to RX queue structure.
* @param mac_index
* MAC address index to register.
+ * @param vlan_index
+ * VLAN index to use.
*
* @return
* 0 on success, errno value on failure.
*/
static int
-rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index)
+rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index,
+ unsigned int vlan_index)
{
struct ibv_flow *flow;
struct priv *priv = rxq->priv;
@@ -226,9 +238,12 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index)
} data;
struct ibv_flow_attr *attr = &data.attr;
struct ibv_flow_spec_eth *spec = &data.spec;
+ unsigned int vlan_enabled = !!priv->vlan_filter_n;
+ unsigned int vlan_id = priv->vlan_filter[vlan_index];
assert(mac_index < RTE_DIM(rxq->mac_flow));
- if (rxq->mac_flow[mac_index] != NULL)
+ assert(vlan_index < RTE_DIM(rxq->mac_flow[mac_index]));
+ if (rxq->mac_flow[mac_index][vlan_index] != NULL)
return 0;
/*
* No padding must be inserted by the compiler between attr and spec.
@@ -249,15 +264,21 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index)
(*mac)[0], (*mac)[1], (*mac)[2],
(*mac)[3], (*mac)[4], (*mac)[5]
},
+ .vlan_tag = (vlan_enabled ? htons(vlan_id) : 0),
},
.mask = {
.dst_mac = "\xff\xff\xff\xff\xff\xff",
+ .vlan_tag = (vlan_enabled ? htons(0xfff) : 0),
},
};
- DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u",
+ DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
+ " VLAN index %u filtering %s, ID %u",
(void *)rxq,
(*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
- mac_index);
+ mac_index,
+ vlan_index,
+ (vlan_enabled ? "enabled" : "disabled"),
+ vlan_id);
/* Create related flow. */
errno = 0;
flow = ibv_create_flow(rxq->qp, attr);
@@ -270,7 +291,7 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index)
return errno;
return EINVAL;
}
- rxq->mac_flow[mac_index] = flow;
+ rxq->mac_flow[mac_index][vlan_index] = flow;
return 0;
}
@@ -288,12 +309,23 @@ rxq_add_mac_flow(struct rxq *rxq, unsigned int mac_index)
static int
rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
{
+ struct priv *priv = rxq->priv;
+ unsigned int i = 0;
int ret;
assert(mac_index < RTE_DIM(rxq->mac_flow));
- ret = rxq_add_mac_flow(rxq, mac_index);
- if (ret)
- return ret;
+ assert(RTE_DIM(rxq->mac_flow[mac_index]) ==
+ RTE_DIM(priv->vlan_filter));
+ /* Add a MAC address for each VLAN filter, or at least once. */
+ do {
+ ret = rxq_add_mac_flow(rxq, mac_index, i);
+ if (ret) {
+ /* Failure, rollback. */
+ while (i != 0)
+ rxq_del_mac_flow(rxq, mac_index, --i);
+ return ret;
+ }
+ } while (++i < priv->vlan_filter_n);
return 0;
}
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 020acf0..521aee0 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -104,8 +104,8 @@ struct rxq {
struct ibv_qp *qp; /* Queue Pair. */
struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
struct ibv_exp_cq_family *if_cq; /* CQ interface. */
- /* MAC flow steering rules. */
- struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES];
+ /* MAC flow steering rules, one per VLAN ID. */
+ struct ibv_flow *mac_flow[MLX5_MAX_MAC_ADDRESSES][MLX5_MAX_VLAN_IDS];
struct ibv_flow *promisc_flow; /* Promiscuous flow. */
struct ibv_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
diff --git a/drivers/net/mlx5/mlx5_vlan.c b/drivers/net/mlx5/mlx5_vlan.c
new file mode 100644
index 0000000..ca80571
--- /dev/null
+++ b/drivers/net/mlx5/mlx5_vlan.c
@@ -0,0 +1,156 @@
+/*-
+ * BSD LICENSE
+ *
+ * Copyright 2015 6WIND S.A.
+ * Copyright 2015 Mellanox.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of 6WIND S.A. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <errno.h>
+#include <assert.h>
+#include <stdint.h>
+
+/* DPDK headers don't like -pedantic. */
+#ifdef PEDANTIC
+#pragma GCC diagnostic ignored "-pedantic"
+#endif
+#include <rte_ethdev.h>
+#include <rte_common.h>
+#ifdef PEDANTIC
+#pragma GCC diagnostic error "-pedantic"
+#endif
+
+#include "mlx5_utils.h"
+#include "mlx5.h"
+
+/**
+ * Configure a VLAN filter.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param vlan_id
+ * VLAN ID to filter.
+ * @param on
+ * Toggle filter.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+ struct priv *priv = dev->data->dev_private;
+ unsigned int i;
+ unsigned int r;
+ struct rxq *rxq;
+
+ DEBUG("%p: %s VLAN filter ID %" PRIu16,
+ (void *)dev, (on ? "enable" : "disable"), vlan_id);
+ assert(priv->vlan_filter_n <= RTE_DIM(priv->vlan_filter));
+ for (i = 0; (i != priv->vlan_filter_n); ++i)
+ if (priv->vlan_filter[i] == vlan_id)
+ break;
+ /* Check if there's room for another VLAN filter. */
+ if (i == RTE_DIM(priv->vlan_filter))
+ return ENOMEM;
+ if (i < priv->vlan_filter_n) {
+ assert(priv->vlan_filter_n != 0);
+ /* Enabling an existing VLAN filter has no effect. */
+ if (on)
+ return 0;
+ /* Remove VLAN filter from list. */
+ --priv->vlan_filter_n;
+ memmove(&priv->vlan_filter[i],
+ &priv->vlan_filter[i + 1],
+ sizeof(priv->vlan_filter[i]) * (priv->vlan_filter_n - i));
+ priv->vlan_filter[priv->vlan_filter_n] = 0;
+ } else {
+ assert(i == priv->vlan_filter_n);
+ /* Disabling an unknown VLAN filter has no effect. */
+ if (!on)
+ return 0;
+ /* Add new VLAN filter. */
+ priv->vlan_filter[priv->vlan_filter_n] = vlan_id;
+ ++priv->vlan_filter_n;
+ }
+ if (!priv->started)
+ return 0;
+ /* Rehash MAC flows in all RX queues. */
+ if (priv->rss) {
+ rxq = &priv->rxq_parent;
+ r = 1;
+ } else {
+ rxq = (*priv->rxqs)[0];
+ r = priv->rxqs_n;
+ }
+ for (i = 0; (i < r); rxq = (*priv->rxqs)[++i]) {
+ int ret;
+
+ if (rxq == NULL)
+ continue;
+ rxq_mac_addrs_del(rxq);
+ ret = rxq_mac_addrs_add(rxq);
+ if (!ret)
+ continue;
+ /* Rollback. */
+ while (i != 0) {
+ rxq = (*priv->rxqs)[--i];
+ if (rxq != NULL)
+ rxq_mac_addrs_del(rxq);
+ }
+ return ret;
+ }
+ return 0;
+}
+
+/**
+ * DPDK callback to configure a VLAN filter.
+ *
+ * @param dev
+ * Pointer to Ethernet device structure.
+ * @param vlan_id
+ * VLAN ID to filter.
+ * @param on
+ * Toggle filter.
+ *
+ * @return
+ * 0 on success, negative errno value on failure.
+ */
+int
+mlx5_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
+{
+ struct priv *priv = dev->data->dev_private;
+ int ret;
+
+ priv_lock(priv);
+ ret = vlan_filter_set(dev, vlan_id, on);
+ priv_unlock(priv);
+ assert(ret >= 0);
+ return -ret;
+}
--
2.1.0
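
Editorial sketch, not part of the patch: the "remove VLAN filter from list"
step in vlan_filter_set() keeps the filter list compact by shifting the tail
down one slot. The standalone example below illustrates that step under
stated assumptions: struct vlan_list and vlan_list_remove() are hypothetical
names, and the entries are assumed to be uint16_t values. Note that
memmove() takes a byte count, so the number of entries to move is multiplied
by the element size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the filter list kept in struct priv. */
struct vlan_list {
	uint16_t id[4]; /* Compact list of VLAN IDs. */
	unsigned int n; /* Number of valid entries. */
};

/* Remove entry i and keep the list compact. */
static void
vlan_list_remove(struct vlan_list *l, unsigned int i)
{
	assert(i < l->n);
	--l->n;
	/* memmove() counts bytes, not elements. */
	memmove(&l->id[i], &l->id[i + 1], sizeof(l->id[0]) * (l->n - i));
	/* Clear the slot that is no longer in use. */
	l->id[l->n] = 0;
}
```

Removing index 1 from the list {10, 20, 30, 40} leaves {10, 30, 40} with the
unused last slot reset to 0.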
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 12/13] mlx5: add checksum offloading support
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (10 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 11/13] mlx5: add VLAN filtering Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 18:52 ` [PATCH v2 13/13] doc: add mlx5 documentation and release notes for version 2.2 Adrien Mazarguil
2015-10-30 23:18 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Thomas Monjalon
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
This is the same implementation as mlx4.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
drivers/net/mlx5/mlx5_rxq.c | 14 +++++++
drivers/net/mlx5/mlx5_rxtx.c | 94 +++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 2 +
drivers/net/mlx5/mlx5_utils.h | 6 +++
4 files changed, 116 insertions(+)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 1cd28c2..5a55886 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -565,6 +565,15 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
/* Number of descriptors and mbufs currently allocated. */
desc_n = (tmpl.elts_n * (tmpl.sp ? MLX5_PMD_SGE_WR_N : 1));
mbuf_n = desc_n;
+ /* Toggle RX checksum offload if hardware supports it. */
+ if (priv->hw_csum) {
+ tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ rxq->csum = tmpl.csum;
+ }
+ if (priv->hw_csum_l2tun) {
+ tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ rxq->csum_l2tun = tmpl.csum_l2tun;
+ }
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -785,6 +794,11 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
rte_pktmbuf_free(buf);
+ /* Toggle RX checksum offload if hardware supports it. */
+ if (priv->hw_csum)
+ tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ if (priv->hw_csum_l2tun)
+ tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index eccbbb9..623219d 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -390,6 +390,17 @@ mlx5_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
++elts_comp;
send_flags |= IBV_EXP_QP_BURST_SIGNALED;
}
+ /* Should we enable HW CKSUM offload */
+ if (buf->ol_flags &
+ (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
+ send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
+ /* HW does not support checksum offloads at arbitrary
+ * offsets but automatically recognizes the packet
+ * type. For inner L3/L4 checksums, only VXLAN (UDP)
+ * tunnels are currently supported. */
+ if (RTE_ETH_IS_TUNNEL_PKT(buf->packet_type))
+ send_flags |= IBV_EXP_QP_BURST_TUNNEL;
+ }
if (likely(segs == 1)) {
uintptr_t addr;
uint32_t length;
@@ -491,6 +502,85 @@ stop:
}
/**
+ * Translate RX completion flags to packet type.
+ *
+ * @param flags
+ * RX completion flags returned by poll_length_flags().
+ *
+ * @return
+ * Packet type for struct rte_mbuf.
+ */
+static inline uint32_t
+rxq_cq_to_pkt_type(uint32_t flags)
+{
+ uint32_t pkt_type;
+
+ if (flags & IBV_EXP_CQ_RX_TUNNEL_PACKET)
+ pkt_type =
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV4_PACKET,
+ RTE_PTYPE_L3_IPV4) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV6_PACKET,
+ RTE_PTYPE_L3_IPV6) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV4_PACKET,
+ RTE_PTYPE_INNER_L3_IPV4) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV6_PACKET,
+ RTE_PTYPE_INNER_L3_IPV6);
+ else
+ pkt_type =
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV4_PACKET,
+ RTE_PTYPE_L3_IPV4) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_IPV6_PACKET,
+ RTE_PTYPE_L3_IPV6);
+ return pkt_type;
+}
+
+/**
+ * Translate RX completion flags to offload flags.
+ *
+ * @param[in] rxq
+ * Pointer to RX queue structure.
+ * @param flags
+ * RX completion flags returned by poll_length_flags().
+ *
+ * @return
+ * Offload flags (ol_flags) for struct rte_mbuf.
+ */
+static inline uint32_t
+rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
+{
+ uint32_t ol_flags = 0;
+
+ if (rxq->csum)
+ ol_flags |=
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
+ /*
+ * PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD are used in place
+ * of PKT_RX_EIP_CKSUM_BAD because the latter is not functional
+ * (its value is 0).
+ */
+ if ((flags & IBV_EXP_CQ_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
+ ol_flags |=
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
+ return ol_flags;
+}
+
+/**
* DPDK callback for RX with scattered packets support.
*
* @param dpdk_rxq
@@ -669,6 +759,8 @@ mlx5_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NB_SEGS(pkt_buf) = j;
PORT(pkt_buf) = rxq->port_id;
PKT_LEN(pkt_buf) = pkt_buf_len;
+ pkt_buf->packet_type = rxq_cq_to_pkt_type(flags);
+ pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
/* Return packet. */
*(pkts++) = pkt_buf;
@@ -829,6 +921,8 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NEXT(seg) = NULL;
PKT_LEN(seg) = len;
DATA_LEN(seg) = len;
+ seg->packet_type = rxq_cq_to_pkt_type(flags);
+ seg->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
/* Return packet. */
*(pkts++) = seg;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 521aee0..d86d623 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -116,6 +116,8 @@ struct rxq {
struct rxq_elt (*no_sp)[]; /* RX elements. */
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
+ unsigned int csum:1; /* Enable checksum offloading. */
+ unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx5_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index e48e6b6..8ff075b 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -149,6 +149,12 @@ pmd_drv_log_basename(const char *s)
#define NB_SEGS(m) ((m)->nb_segs)
#define PORT(m) ((m)->port)
+/* Transpose flags. Useful to convert IBV to DPDK flags. */
+#define TRANSPOSE(val, from, to) \
+ (((from) >= (to)) ? \
+ (((val) & (from)) / ((from) / (to))) : \
+ (((val) & (from)) * ((to) / (from))))
+
/* Allocate a buffer on the stack and fill it with a printf format string. */
#define MKSTR(name, ...) \
char name[snprintf(NULL, 0, __VA_ARGS__) + 1]; \
--
2.1.0
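
Editorial sketch, not part of the patch: the TRANSPOSE() macro added to
mlx5_utils.h moves a single flag bit from one position to another using only
a mask and a multiplication or division, so no branch is needed at run time
when the flag values are compile-time constants. The example below reuses
the macro as written in the patch; the SRC_* and DST_* flag values and the
src_to_dst() helper are hypothetical and chosen only for illustration. It
assumes each flag is a single bit, which is what makes the ratio between
"from" and "to" an exact power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Same macro as in the patch. */
#define TRANSPOSE(val, from, to) \
	(((from) >= (to)) ? \
	 (((val) & (from)) / ((from) / (to))) : \
	 (((val) & (from)) * ((to) / (from))))

/* Hypothetical single-bit flags, for illustration only. */
#define SRC_IPV4 (1u << 5)
#define SRC_CSUM_OK (1u << 1)
#define DST_IPV4 (1u << 2)
#define DST_CSUM_BAD (1u << 4)

/* Convert source flags to destination flags. */
static inline uint32_t
src_to_dst(uint32_t flags)
{
	/* Inverting the input turns a missing "OK" bit into a "BAD" bit. */
	return TRANSPOSE(flags, SRC_IPV4, DST_IPV4) |
	       TRANSPOSE(~flags, SRC_CSUM_OK, DST_CSUM_BAD);
}
```

The second TRANSPOSE() call mirrors how rxq_cq_to_ol_flags() passes ~flags
to derive the *_CKSUM_BAD bits from the *_CSUM_OK completion flags.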
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 13/13] doc: add mlx5 documentation and release notes for version 2.2
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (11 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 12/13] mlx5: add checksum offloading support Adrien Mazarguil
@ 2015-10-30 18:52 ` Adrien Mazarguil
2015-10-30 23:18 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Thomas Monjalon
13 siblings, 0 replies; 32+ messages in thread
From: Adrien Mazarguil @ 2015-10-30 18:52 UTC (permalink / raw)
To: dev
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
doc/guides/nics/mlx5.rst | 308 +++++++++++++++++++++++++++++++++++
doc/guides/rel_notes/release_2_2.rst | 9 +
2 files changed, 317 insertions(+)
create mode 100644 doc/guides/nics/mlx5.rst
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
new file mode 100644
index 0000000..fdb621c
--- /dev/null
+++ b/doc/guides/nics/mlx5.rst
@@ -0,0 +1,308 @@
+.. BSD LICENSE
+ Copyright 2015 6WIND S.A.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions
+ are met:
+
+ * Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in
+ the documentation and/or other materials provided with the
+ distribution.
+ * Neither the name of 6WIND S.A. nor the names of its
+ contributors may be used to endorse or promote products derived
+ from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+MLX5 poll mode driver
+=====================
+
+The MLX5 poll mode driver library (**librte_pmd_mlx5**) provides support for
+**Mellanox ConnectX-4 EN** and **Mellanox ConnectX-4 Lx EN** families of
+10/25/40/50/100 Gb/s adapters as well as their virtual functions (VF) in
+SR-IOV context.
+
+Information and documentation about these adapters can be found on the
+`Mellanox website <http://www.mellanox.com>`__. Help is also provided by the
+`Mellanox community <http://community.mellanox.com/welcome>`__.
+
+There is also a `section dedicated to this poll mode driver
+<http://www.mellanox.com/page/products_dyn?product_family=209&mtag=pmd_for_dpdk>`__.
+
+.. note::
+
+ Due to external dependencies, this driver is disabled by default. It must
+ be enabled manually by setting ``CONFIG_RTE_LIBRTE_MLX5_PMD=y`` and
+ recompiling DPDK.
+
+.. warning::
+
+ ``CONFIG_RTE_BUILD_COMBINE_LIBS`` with ``CONFIG_RTE_BUILD_SHARED_LIB``
+ is not supported and thus the compilation will fail with this configuration.
+
+Implementation details
+----------------------
+
+Besides its dependency on libibverbs (that implies libmlx5 and associated
+kernel support), librte_pmd_mlx5 relies heavily on system calls for control
+operations such as querying/updating the MTU and flow control parameters.
+
+For security reasons and robustness, this driver only deals with virtual
+memory addresses. The way resource allocations are handled by the kernel,
+combined with hardware specifications that allow it to handle virtual memory
+addresses directly, ensures that DPDK applications cannot access random
+physical memory (or memory that does not belong to the current process).
+
+This capability allows the PMD to coexist with kernel network interfaces
+which remain functional, although they stop receiving unicast packets as
+long as they share the same MAC address.
+
+Enabling librte_pmd_mlx5 causes DPDK applications to be linked against
+libibverbs.
+
+Configuration
+-------------
+
+Compilation options
+~~~~~~~~~~~~~~~~~~~
+
+These options can be modified in the ``.config`` file.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_PMD`` (default **n**)
+
+ Toggle compilation of librte_pmd_mlx5 itself.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_DEBUG`` (default **n**)
+
+ Toggle debugging code and stricter compilation flags. Enabling this option
+ adds additional run-time checks and debugging messages at the cost of
+ lower performance.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_SGE_WR_N`` (default **4**)
+
+ Number of scatter/gather elements (SGEs) per work request (WR). Lowering
+ this number improves performance but also limits the ability to receive
+ scattered packets (packets that do not fit a single mbuf). The default
+ value is a safe tradeoff.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_MAX_INLINE`` (default **0**)
+
+ Amount of data to be inlined during TX operations. Improves latency but
+ lowers throughput.
+
+- ``CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE`` (default **8**)
+
+ Maximum number of cached memory pools (MPs) per TX queue. Each MP from
+ which buffers are to be transmitted must be associated with memory regions
+ (MRs). This is a slow operation that must be cached.
+
+ This value is always 1 for RX queues since they use a single MP.
+
+Run-time configuration
+~~~~~~~~~~~~~~~~~~~~~~
+
+- librte_pmd_mlx5 brings kernel network interfaces up during initialization
+ because it is affected by their state. Forcing them down prevents packet
+ reception.
+
+- **ethtool** operations on related kernel interfaces also affect the PMD.
+
+Prerequisites
+-------------
+
+This driver relies on external libraries and kernel drivers for resource
+allocation and initialization. The following dependencies are not part of
+DPDK and must be installed separately:
+
+- **libibverbs**
+
+ User space Verbs framework used by librte_pmd_mlx5. This library provides
+ a generic interface between the kernel and low-level user space drivers
+ such as libmlx5.
+
+ It allows slow and privileged operations (context initialization, hardware
+ resource allocation) to be managed by the kernel and fast operations to
+ never leave user space.
+
+- **libmlx5**
+
+ Low-level user space driver library for Mellanox ConnectX-4 devices;
+ it is automatically loaded by libibverbs.
+
+ This library basically implements send/receive calls to the hardware
+ queues.
+
+- **Kernel modules** (mlnx-ofed-kernel)
+
+ They provide the kernel-side Verbs API and low-level device drivers that
+ manage actual hardware initialization and resource sharing with user
+ space processes.
+
+ Unlike most other PMDs, these modules must remain loaded and bound to
+ their devices:
+
+ - mlx5_core: hardware driver managing Mellanox ConnectX-4 devices and
+ related Ethernet kernel network devices.
+ - mlx5_ib: InfiniBand device driver.
+ - ib_uverbs: user space driver for Verbs (entry point for libibverbs).
+
+- **Firmware update**
+
+ Mellanox OFED releases include firmware updates for ConnectX-4 adapters.
+
+ Because each release provides new features, these updates must be applied to
+ match the kernel modules and libraries they come with.
+
+.. note::
+
+ Both libraries are BSD and GPL licensed. Linux kernel modules are GPL
+ licensed.
+
+Getting Mellanox OFED
+~~~~~~~~~~~~~~~~~~~~~
+
+While these libraries and kernel modules are available on OpenFabrics
+Alliance's `website <https://www.openfabrics.org/>`__ and provided by package
+managers on most distributions, this PMD requires Ethernet extensions that
+may not be supported at the moment (this is a work in progress).
+
+`Mellanox OFED
+<http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux>`__
+includes the necessary support and should be used in the meantime. For DPDK,
+only libibverbs, libmlx5, mlnx-ofed-kernel packages and firmware updates are
+required from that distribution.
+
+.. note::
+
+ Several versions of Mellanox OFED are available. Installing the version
+ this DPDK release was developed and tested against is strongly
+ recommended. Please check the `prerequisites`_.
+
+Usage example
+-------------
+
+This section demonstrates how to launch **testpmd** with Mellanox ConnectX-4
+devices managed by librte_pmd_mlx5.
+
+#. Load the kernel modules:
+
+ .. code-block:: console
+
+ modprobe -a ib_uverbs mlx5_core mlx5_ib
+
+ .. note::
+
+ User space I/O kernel modules (uio and igb_uio) are not used and do
+ not have to be loaded.
+
+#. Make sure Ethernet interfaces are in working order and linked to kernel
+ verbs. Related sysfs entries should be present:
+
+ .. code-block:: console
+
+ ls -d /sys/class/net/*/device/infiniband_verbs/uverbs* | cut -d / -f 5
+
+ Example output:
+
+ .. code-block:: console
+
+ eth30
+ eth31
+ eth32
+ eth33
+
+#. Optionally, retrieve their PCI bus addresses for whitelisting:
+
+ .. code-block:: console
+
+ {
+ for intf in eth30 eth31 eth32 eth33;
+ do
+ (cd "/sys/class/net/${intf}/device/" && pwd -P);
+ done;
+ } |
+ sed -n 's,.*/\(.*\),-w \1,p'
+
+ Example output:
+
+ .. code-block:: console
+
+ -w 0000:05:00.1
+ -w 0000:06:00.0
+ -w 0000:06:00.1
+ -w 0000:05:00.0
+
+#. Request huge pages:
+
+ .. code-block:: console
+
+ echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+#. Start testpmd with basic parameters:
+
+ .. code-block:: console
+
+ testpmd -c 0xff00 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i
+
+ Example output:
+
+ .. code-block:: console
+
+ [...]
+ EAL: PCI device 0000:05:00.0 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_0" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fe
+ EAL: PCI device 0000:05:00.1 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_1" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:ff
+ EAL: PCI device 0000:06:00.0 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_2" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fa
+ EAL: PCI device 0000:06:00.1 on NUMA socket 0
+ EAL: probe driver: 15b3:1013 librte_pmd_mlx5
+ PMD: librte_pmd_mlx5: PCI information matches, using device "mlx5_3" (VF: false)
+ PMD: librte_pmd_mlx5: 1 port(s) detected
+ PMD: librte_pmd_mlx5: port 1 MAC address is e4:1d:2d:e7:0c:fb
+ Interactive-mode selected
+ Configuring Port 0 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8cba80: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8cba80: RX queues number update: 0 -> 2
+ Port 0: E4:1D:2D:E7:0C:FE
+ Configuring Port 1 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8ccac8: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8ccac8: RX queues number update: 0 -> 2
+ Port 1: E4:1D:2D:E7:0C:FF
+ Configuring Port 2 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8cdb10: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8cdb10: RX queues number update: 0 -> 2
+ Port 2: E4:1D:2D:E7:0C:FA
+ Configuring Port 3 (socket 0)
+ PMD: librte_pmd_mlx5: 0x8ceb58: TX queues number update: 0 -> 2
+ PMD: librte_pmd_mlx5: 0x8ceb58: RX queues number update: 0 -> 2
+ Port 3: E4:1D:2D:E7:0C:FB
+ Checking link statuses...
+ Port 0 Link Up - speed 40000 Mbps - full-duplex
+ Port 1 Link Up - speed 40000 Mbps - full-duplex
+ Port 2 Link Up - speed 10000 Mbps - full-duplex
+ Port 3 Link Up - speed 10000 Mbps - full-duplex
+ Done
+ testpmd>
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 89e4d58..e0f8681 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -31,6 +31,15 @@ New Features
* **Added vhost-user multiple queue support.**
+* **Added support for Mellanox ConnectX-4 adapters (mlx5).**
+
+ The mlx5 poll-mode driver implements support for Mellanox ConnectX-4 EN
+ and Mellanox ConnectX-4 Lx EN families of 10/25/40/50/100 Gb/s adapters.
+
+ Like mlx4, this PMD is only available for Linux and is disabled by default
+ due to external dependencies (libibverbs and libmlx5).
+
+
Resolved Issues
---------------
--
2.1.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5)
2015-10-30 18:52 ` [PATCH v2 00/13] Mellanox ConnectX-4 PMD (mlx5) Adrien Mazarguil
` (12 preceding siblings ...)
2015-10-30 18:52 ` [PATCH v2 13/13] doc: add mlx5 documentation and release notes for version 2.2 Adrien Mazarguil
@ 2015-10-30 23:18 ` Thomas Monjalon
13 siblings, 0 replies; 32+ messages in thread
From: Thomas Monjalon @ 2015-10-30 23:18 UTC (permalink / raw)
To: Adrien Mazarguil; +Cc: dev
2015-10-30 19:52, Adrien Mazarguil:
> This PMD adds basic support for Mellanox ConnectX-4 (mlx5) families of
> 10/25/40/50/100 Gb/s adapters through the Verbs framework.
>
> Its design is very similar to that of mlx4 from which most of its code is
> borrowed without the mistake of putting it all in a single huge file.
>
> It is disabled by default due to its dependency on libibverbs.
>
> Changes in v2:
> - Removed useless port inactive warning.
> - Simplified code by replacing configured MAC addresses RX queue bit-field
> with flow pointer checks.
> - Replaced allmulti/promisc status bits with request bits to fix
> inconsistencies when restoring these modes.
> - Updated comments about maximum number of MAC addresses and VLAN filters.
> - Improved performance with better prefetching.
> - Fixed deadlock in case of error during port start.
> - Simplified VLAN filtering configuration storage using a basic list instead
> of a table (with holes).
Applied, thanks
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 09/13] mlx5: add link update device operation
2015-10-30 18:52 ` [PATCH v2 09/13] mlx5: add link update device operation Adrien Mazarguil
@ 2015-11-02 17:52 ` Stephen Hemminger
2015-11-02 18:27 ` Adrien Mazarguil
0 siblings, 1 reply; 32+ messages in thread
From: Stephen Hemminger @ 2015-11-02 17:52 UTC (permalink / raw)
To: Adrien Mazarguil; +Cc: dev
On Fri, 30 Oct 2015 19:52:38 +0100
Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
> +static int
> +mlx5_link_update_unlocked(struct rte_eth_dev *dev, int wait_to_complete)
> +{
> + struct priv *priv = dev->data->dev_private;
> + struct ethtool_cmd edata = {
> + .cmd = ETHTOOL_GSET
> + };
> + struct ifreq ifr;
> + struct rte_eth_link dev_link;
> + int link_speed = 0;
> +
> + (void)wait_to_complete;
DPDK style is to use the __rte_unused attribute rather than dummy statements
to avoid unused warnings.
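
Editorial sketch, not part of the original message: the two ways of
silencing an unused-parameter warning discussed here can be compared side by
side. EXAMPLE_UNUSED is a hypothetical stand-in defined locally so that the
snippet builds without DPDK headers; it assumes a GCC-compatible compiler
and expands to the same attribute that DPDK's __rte_unused uses. The
function names are likewise made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for __rte_unused (GCC/Clang attribute). */
#define EXAMPLE_UNUSED __attribute__((__unused__))

/* Dummy-statement style, as in the quoted patch. */
static int
link_update_cast(int port, int wait_to_complete)
{
	(void)wait_to_complete;
	return port;
}

/* Attribute style: the parameter is annotated where it is declared. */
static int
link_update_attr(int port, int wait_to_complete EXAMPLE_UNUSED)
{
	return port;
}
```

Both variants compile without an unused-parameter warning; the cast is
plain ISO C, while the attribute is a compiler extension that documents the
intent in the function signature.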
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 09/13] mlx5: add link update device operation
2015-11-02 17:52 ` Stephen Hemminger
@ 2015-11-02 18:27 ` Adrien Mazarguil
2015-11-02 18:43 ` Stephen Hemminger
0 siblings, 1 reply; 32+ messages in thread
From: Adrien Mazarguil @ 2015-11-02 18:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On Mon, Nov 02, 2015 at 09:52:17AM -0800, Stephen Hemminger wrote:
> On Fri, 30 Oct 2015 19:52:38 +0100
> Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
>
> > +static int
> > +mlx5_link_update_unlocked(struct rte_eth_dev *dev, int wait_to_complete)
> > +{
> > + struct priv *priv = dev->data->dev_private;
> > + struct ethtool_cmd edata = {
> > + .cmd = ETHTOOL_GSET
> > + };
> > + struct ifreq ifr;
> > + struct rte_eth_link dev_link;
> > + int link_speed = 0;
> > +
> > + (void)wait_to_complete;
>
> DPDK style is to use the __rte_unused attribute rather than dummy statements
> to avoid unused warnings.
Thanks for pointing this out, I'm used to avoiding C extensions whenever
possible but will stick to DPDK style next time.
Still, it would be nice if we could steer DPDK away from such extensions as
much as possible. As a library, we should allow user applications to compile
with flags we can't control (such as -pedantic -std=c99, and various
-Wsomething).
--
Adrien Mazarguil
6WIND
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 09/13] mlx5: add link update device operation
2015-11-02 18:27 ` Adrien Mazarguil
@ 2015-11-02 18:43 ` Stephen Hemminger
0 siblings, 0 replies; 32+ messages in thread
From: Stephen Hemminger @ 2015-11-02 18:43 UTC (permalink / raw)
To: Adrien Mazarguil; +Cc: dev
On Mon, 2 Nov 2015 19:27:40 +0100
Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
> Thanks for pointing this out, I'm used to avoiding C extensions whenever
> possible but will stick to DPDK style next time.
>
> Still, it would be nice if we could steer DPDK away from such extensions as
> much as possible. As a library, we should allow user applications to compile
> with flags we can't control (such as -pedantic -std=c99, and various
> -Wsomething)
No. The extensions are very useful, catch errors, and generate more readable code.
For example the extension to check printf formats.
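
Editorial sketch, not part of the original message: the printf format
checking extension mentioned here is the GCC/Clang "format" function
attribute. The helper below, example_log(), is a hypothetical function
written only to show the attribute; it assumes a GCC-compatible compiler.
With the attribute in place, the compiler checks each call's arguments
against the format string, as it does for printf() itself.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical logging helper. Argument 3 is the format string and the
 * variadic arguments start at position 4.
 */
static int example_log(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

static int
example_log(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return ret;
}
```

A call such as example_log(buf, sizeof(buf), "port %u", "eth0") would then
be reported at compile time when format warnings are enabled (for instance
with -Wall), since a string is passed where %u expects an unsigned int.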
^ permalink raw reply [flat|nested] 32+ messages in thread