From: Saeed Mahameed <saeed@kernel.org>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
netdev@vger.kernel.org, Tariq Toukan <tariqt@nvidia.com>,
Gal Pressman <gal@nvidia.com>,
Leon Romanovsky <leonro@nvidia.com>, Jiri Pirko <jiri@nvidia.com>
Subject: [PATCH net-next 14/14] net/mlx5: Implement eSwitch hairpin per prio buffers devlink params
Date: Thu, 27 Feb 2025 18:12:27 -0800
Message-ID: <20250228021227.871993-15-saeed@kernel.org>
In-Reply-To: <20250228021227.871993-1-saeed@kernel.org>
From: Saeed Mahameed <saeedm@nvidia.com>
E-Switch hairpin per-priority buffers are controlled and configured by
the device. Add two devlink params to control them.

esw_hairpin_per_prio_log_queue_size: p0,p1,...,p7
  Log(base 2) of the number of packet descriptors allocated
  internally for hairpin for IEEE802.1p priorities.
  0 means that no descriptors are allocated for this priority
  and traffic with this priority will be dropped.

esw_hairpin_per_prio_log_buf_size: p0,p1,...,p7
  Log(base 2) of the buffer size (in bytes) allocated internally
  for hairpin for IEEE802.1p priorities.
  0 means no buffer for this priority and traffic with this
  priority will be dropped.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
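The range checks done by the two validate callbacks in this patch follow
one rule: each of the eight log2 entries must not exceed a per-priority
cap, and the sum of 2^entry over all priorities must not exceed
2^(total cap). A minimal userspace sketch of that rule, under stated
assumptions: the helper name hairpin_log_values_check() and the NUM_PRIOS
macro are hypothetical and exist only for illustration, and the caps are
passed as arguments here, whereas the driver reads them from the device.
Like the patch, the sketch counts an entry of 0 as 2^0 = 1 toward the
total.

```c
#include <errno.h>
#include <stdint.h>

#define NUM_PRIOS 8 /* IEEE 802.1p priorities p0..p7 (illustrative name) */

/*
 * Hypothetical helper, not part of the patch: check eight per-priority
 * log2 values against a per-priority cap and a cap on the total.
 * Returns 0 if the values are acceptable, -ERANGE otherwise.
 */
static int hairpin_log_values_check(const uint32_t log_val[NUM_PRIOS],
				    uint8_t log_max_per_prio,
				    uint8_t log_max_total)
{
	uint64_t total = 0;
	int i;

	/* Assumption of this sketch: keep every shift well-defined. */
	if (log_max_per_prio > 31 || log_max_total > 62)
		return -ERANGE;

	for (i = 0; i < NUM_PRIOS; i++) {
		/* Per-priority bound. */
		if (log_val[i] > log_max_per_prio)
			return -ERANGE;
		/* Accumulate in 64 bits so the sum cannot wrap. */
		total += 1ULL << log_val[i];
	}

	/* Bound on the sum across all priorities. */
	return total > (1ULL << log_max_total) ? -ERANGE : 0;
}
```

For example, with all eight entries set to 3, a per-priority cap of 4 and
a total cap of 6, the helper returns 0, since 8 * 2^3 = 64 = 2^6. Lowering
the total cap to 5 makes the same input fail with -ERANGE.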
Documentation/networking/devlink/mlx5.rst | 15 +
.../net/ethernet/mellanox/mlx5/core/devlink.h | 4 +-
.../mellanox/mlx5/core/lib/nv_param.c | 272 ++++++++++++++++++
3 files changed, 290 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index c9c064de4699..053060de6126 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -161,6 +161,21 @@ parameters.
* ``balanced`` : Merges fewer CQEs, resulting in a moderate compression ratio but maintaining a balance between bandwidth savings and performance
* ``aggressive`` : Merges more CQEs into a single entry, achieving a higher compression rate and maximizing performance, particularly under high traffic loads
+ * - ``esw_hairpin_per_prio_log_queue_size``
+ - u32 array[8]
+ - permanent
+ - each item is log(base 2) of the number of packet descriptors allocated
+ internally for hairpin for IEEE802.1p priorities.
+ 0 means that no descriptors are allocated for this priority
+ and traffic with this priority will be dropped.
+
+ * - ``esw_hairpin_per_prio_log_buf_size``
+ - u32 array[8]
+ - permanent
+ - each item is log(base 2) of the buffer size (in bytes) allocated internally
+ for hairpin for IEEE802.1p priorities.
+ 0 means no buffer for this priority and traffic with this priority will be dropped.
+
The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD``
Info versions
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
index 74bcdfa70361..b2c10ce1eac5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.h
@@ -22,7 +22,9 @@ enum mlx5_devlink_param_id {
MLX5_DEVLINK_PARAM_ID_ESW_MULTIPORT,
MLX5_DEVLINK_PARAM_ID_HAIRPIN_NUM_QUEUES,
MLX5_DEVLINK_PARAM_ID_HAIRPIN_QUEUE_SIZE,
- MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE
+ MLX5_DEVLINK_PARAM_ID_CQE_COMPRESSION_TYPE,
+ MLX5_DEVLINK_PARAM_ID_ESW_HAIRPIN_DESCRIPTORS,
+ MLX5_DEVLINK_PARAM_ID_ESW_HAIRPIN_DATA_SIZE,
};
struct mlx5_trap_ctx {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
index 159d75967a48..d9815c66ea58 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/nv_param.c
@@ -1,11 +1,15 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+#include <net/dcbnl.h>
+
#include "nv_param.h"
#include "mlx5_core.h"
#include "en.h"
enum {
+ MLX5_CLASS_0_CTRL_ID_NV_INTERNAL_HAIRPIN_CONF = 0x13,
+ MLX5_CLASS_0_CTRL_ID_NV_INTERNAL_HAIRPIN_CAP = 0x14,
MLX5_CLASS_0_CTRL_ID_NV_GLOBAL_PCI_CONF = 0x80,
MLX5_CLASS_0_CTRL_ID_NV_GLOBAL_PCI_CAP = 0x81,
MLX5_CLASS_0_CTRL_ID_NV_SW_OFFLOAD_CONFIG = 0x10a,
@@ -145,6 +149,19 @@ struct mlx5_ifc_nv_keep_link_up_bits {
u8 keep_eth_link_up[0x1];
};
+struct mlx5_ifc_nv_internal_hairpin_cap_bits {
+ u8 log_max_hpin_total_num_descriptors[0x8];
+ u8 log_max_hpin_total_data_size[0x8];
+ u8 log_max_hpin_num_descriptor_per_prio[0x8];
+ u8 log_max_hpin_data_size_per_prio[0x8];
+};
+
+struct mlx5_ifc_nv_internal_hairpin_conf_bits {
+ u8 log_hpin_num_descriptor[8][0x8];
+
+ u8 log_hpin_data_size[8][0x8];
+};
+
#define MNVDA_HDR_SZ \
(MLX5_ST_SZ_BYTES(mnvda_reg) - MLX5_BYTE_OFF(mnvda_reg, configuration_item_data))
@@ -531,6 +548,247 @@ static int mlx5_devlink_total_vfs_validate(struct devlink *devlink, u32 id,
return 0;
}
+static int
+mlx5_nv_param_read_internal_hairpin_conf(struct mlx5_core_dev *dev,
+ void *mnvda, size_t len)
+{
+ MLX5_SET_CONFIG_ITEM_TYPE(global, mnvda, type_class, 0);
+ MLX5_SET_CONFIG_ITEM_TYPE(global, mnvda, parameter_index,
+ MLX5_CLASS_0_CTRL_ID_NV_INTERNAL_HAIRPIN_CONF);
+ MLX5_SET_CONFIG_HDR_LEN(mnvda, nv_internal_hairpin_conf);
+
+ return mlx5_nv_param_read(dev, mnvda, len);
+}
+
+static int
+mlx5_nv_param_read_internal_hairpin_cap(struct mlx5_core_dev *dev,
+ void *mnvda, size_t len)
+{
+ MLX5_SET_CONFIG_ITEM_TYPE(global, mnvda, type_class, 0);
+ MLX5_SET_CONFIG_ITEM_TYPE(global, mnvda, parameter_index,
+ MLX5_CLASS_0_CTRL_ID_NV_INTERNAL_HAIRPIN_CAP);
+
+ return mlx5_nv_param_read(dev, mnvda, len);
+}
+
+static int
+mlx5_nv_param_esw_hairpin_descriptors_get(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx)
+
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ void *data;
+ int err, i;
+
+ BUILD_BUG_ON(IEEE_8021QAZ_MAX_TCS > __DEVLINK_PARAM_MAX_ARRAY_SIZE);
+
+ err = mlx5_nv_param_read_internal_hairpin_conf(dev, mnvda, sizeof(mnvda));
+ if (err)
+ return err;
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+
+ ctx->val.arr.size = IEEE_8021QAZ_MAX_TCS;
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+ ctx->val.arr.vu32[i] = MLX5_GET(nv_internal_hairpin_conf, data,
+ log_hpin_num_descriptor[i]);
+ return 0;
+}
+
+static int
+mlx5_nv_param_esw_hairpin_descriptors_set(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ void *data;
+ int err, i;
+
+ err = mlx5_nv_param_read_internal_hairpin_conf(dev, mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Unable to query internal hairpin conf");
+ return err;
+ }
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+ MLX5_SET(nv_internal_hairpin_conf, data,
+ log_hpin_num_descriptor[i], ctx->val.arr.vu32[i]);
+
+ return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda));
+}
+
+static int
+mlx5_nv_param_esw_hairpin_descriptors_validate(struct devlink *devlink, u32 id,
+ union devlink_param_value val,
+ struct netlink_ext_ack *extack)
+{
+ u8 log_max_num_descriptors, log_max_total_descriptors;
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ u64 total = 0;
+ void *data;
+ int err, i;
+
+ if (val.arr.size != IEEE_8021QAZ_MAX_TCS) {
+ NL_SET_ERR_MSG_FMT_MOD(extack, "Array size must be %d",
+ IEEE_8021QAZ_MAX_TCS);
+ return -EINVAL;
+ }
+ err = mlx5_nv_param_read_internal_hairpin_cap(devlink_priv(devlink),
+ mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Unable to query internal hairpin cap");
+ return err;
+ }
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+ log_max_total_descriptors = MLX5_GET(nv_internal_hairpin_cap, data,
+ log_max_hpin_total_num_descriptors);
+ log_max_num_descriptors = MLX5_GET(nv_internal_hairpin_cap, data,
+ log_max_hpin_num_descriptor_per_prio);
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ if (val.arr.vu32[i] <= log_max_num_descriptors)
+ continue;
+
+ NL_SET_ERR_MSG_FMT_MOD(extack,
+ "Max allowed value per prio is %d",
+ log_max_num_descriptors);
+ return -ERANGE;
+ }
+
+ /* Validate total number of descriptors */
+ memset(mnvda, 0, sizeof(mnvda));
+ err = mlx5_nv_param_read_internal_hairpin_conf(devlink_priv(devlink),
+ mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Unable to query internal hairpin conf");
+ return err;
+ }
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+ total += BIT_ULL(val.arr.vu32[i]);
+
+ if (total > BIT_ULL(log_max_total_descriptors)) {
+ NL_SET_ERR_MSG_FMT_MOD(extack,
+ "Log max total value allowed is %d",
+ log_max_total_descriptors);
+ return -ERANGE;
+ }
+
+ return 0;
+}
+
+static int
+mlx5_nv_param_esw_hairpin_data_size_get(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ void *data;
+ int err, i;
+
+ err = mlx5_nv_param_read_internal_hairpin_conf(dev, mnvda, sizeof(mnvda));
+ if (err)
+ return err;
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+ ctx->val.arr.size = IEEE_8021QAZ_MAX_TCS;
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+ ctx->val.arr.vu32[i] = MLX5_GET(nv_internal_hairpin_conf, data,
+ log_hpin_data_size[i]);
+ return 0;
+}
+
+static int
+mlx5_nv_param_esw_hairpin_data_size_set(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ int err, i;
+ void *data;
+
+ err = mlx5_nv_param_read_internal_hairpin_conf(dev, mnvda, sizeof(mnvda));
+ if (err)
+ return err;
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+ MLX5_SET(nv_internal_hairpin_conf, data, log_hpin_data_size[i],
+ ctx->val.arr.vu32[i]);
+
+ return mlx5_nv_param_write(dev, mnvda, sizeof(mnvda));
+}
+
+static int
+mlx5_nv_param_esw_hairpin_data_size_validate(struct devlink *devlink, u32 id,
+ union devlink_param_value val,
+ struct netlink_ext_ack *extack)
+{
+ u8 log_max_data_size, log_max_total_data_size;
+ u32 mnvda[MLX5_ST_SZ_DW(mnvda_reg)] = {};
+ u64 total = 0;
+ void *data;
+ int err, i;
+
+ if (val.arr.size != IEEE_8021QAZ_MAX_TCS) {
+ NL_SET_ERR_MSG_FMT_MOD(extack, "Array size must be %d",
+ IEEE_8021QAZ_MAX_TCS);
+ return -EINVAL;
+ }
+
+ err = mlx5_nv_param_read_internal_hairpin_cap(devlink_priv(devlink),
+ mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Unable to query internal hairpin cap");
+ return err;
+ }
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+ log_max_data_size = MLX5_GET(nv_internal_hairpin_cap, data,
+ log_max_hpin_data_size_per_prio);
+ log_max_total_data_size = MLX5_GET(nv_internal_hairpin_cap, data,
+ log_max_hpin_total_data_size);
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ if (val.arr.vu32[i] <= log_max_data_size)
+ continue;
+
+ NL_SET_ERR_MSG_FMT_MOD(extack,
+ "Max allowed value per prio is %d",
+ log_max_data_size);
+ return -ERANGE;
+ }
+
+ /* Validate total data size */
+ memset(mnvda, 0, sizeof(mnvda));
+ err = mlx5_nv_param_read_internal_hairpin_conf(devlink_priv(devlink),
+ mnvda, sizeof(mnvda));
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Unable to query internal hairpin conf");
+ return err;
+ }
+
+ data = MLX5_ADDR_OF(mnvda_reg, mnvda, configuration_item_data);
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
+ total += BIT_ULL(val.arr.vu32[i]);
+
+ if (total > BIT_ULL(log_max_total_data_size)) {
+ NL_SET_ERR_MSG_FMT_MOD(extack,
+ "Log max total value allowed is %d",
+ log_max_total_data_size);
+ return -ERANGE;
+ }
+
+ return 0;
+}
+
static const struct devlink_param mlx5_nv_param_devlink_params[] = {
DEVLINK_PARAM_GENERIC(ENABLE_SRIOV, BIT(DEVLINK_PARAM_CMODE_PERMANENT),
mlx5_devlink_enable_sriov_get,
@@ -544,6 +802,20 @@ static const struct devlink_param mlx5_nv_param_devlink_params[] = {
mlx5_nv_param_devlink_cqe_compress_get,
mlx5_nv_param_devlink_cqe_compress_set,
mlx5_nv_param_devlink_cqe_compress_validate),
+ DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_ESW_HAIRPIN_DESCRIPTORS,
+ "esw_hairpin_per_prio_log_queue_size",
+ DEVLINK_PARAM_TYPE_ARR_U32,
+ BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+ mlx5_nv_param_esw_hairpin_descriptors_get,
+ mlx5_nv_param_esw_hairpin_descriptors_set,
+ mlx5_nv_param_esw_hairpin_descriptors_validate),
+ DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_ESW_HAIRPIN_DATA_SIZE,
+ "esw_hairpin_per_prio_log_buf_size",
+ DEVLINK_PARAM_TYPE_ARR_U32,
+ BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+ mlx5_nv_param_esw_hairpin_data_size_get,
+ mlx5_nv_param_esw_hairpin_data_size_set,
+ mlx5_nv_param_esw_hairpin_data_size_validate),
};
int mlx5_nv_param_register_dl_params(struct devlink *devlink)
--
2.48.1