* [net-next 0/2] devlink: Add port function attribute for IO EQs
@ 2024-04-01 9:05 Parav Pandit
2024-04-01 9:05 ` [net-next 1/2] devlink: Support setting max_io_eqs Parav Pandit
2024-04-01 9:05 ` [net-next 2/2] mlx5/core: Support max_io_eqs for a function Parav Pandit
0 siblings, 2 replies; 5+ messages in thread
From: Parav Pandit @ 2024-04-01 9:05 UTC (permalink / raw)
To: netdev, davem, edumazet, kuba, pabeni, corbet
Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc,
linux-rdma, Parav Pandit
Currently, PCI SFs and VFs use IO event queues to deliver per-channel
netdev events. The number of netdev channels is a function of the IO
event queues. Similarly, for an RDMA device, the number of completion
vectors is a function of the IO event queues. However, an administrator
on the hypervisor has no means to provision the number of IO event
queues for an SF or VF device; the device/firmware picks some arbitrary
value for them. As a result, the number of SF netdev channels is
unpredictable, and consequently so is the performance.
This short series introduces a new port function attribute: max_io_eqs.
The goal is to provide administrators at the hypervisor level with the
ability to provision the maximum number of IO event queues for a
function. This gives the administrator control to provision the right
number of IO event queues and achieve predictable performance.
Example of an administrator provisioning (setting) the maximum number of
IO event queues when using switchdev mode:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
$ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
function:
hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
This sets the maximum number of IO event queues of the function
before it is enumerated. Thus, when the VF/SF driver reads the
capability from the device, it sees the value provisioned by the
hypervisor. The driver can then configure the number of channels
for the net device, as well as the number of completion vectors
for the RDMA device. The device/firmware also honors the provisioned
value, so any attempt by the VF/SF driver to create IO EQs beyond
the provisioned value results in an error.
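For illustration, a hedged verification sketch from the VF side (the
VF's own netdev name is a placeholder, and the exact ethtool output
depends on the driver; the channel count simply should not exceed the
provisioned IO EQs):

$ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
$ ethtool -l <vf_netdev>    # run where the VF driver is bound; placeholder name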
With the above settings now in place, the administrator achieved 2x
performance with the SF device with 20 channels. In the second example,
when the SF was provisioned for a container with 2 CPUs, the administrator
provisioned only 2 IO event queues, thereby saving device resources.
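A minimal sketch of that container case, assuming a hypothetical SF
devlink port index of 32768:

$ devlink port function set pci/0000:06:00.0/32768 max_io_eqs 2

With this, the SF driver creates at most 2 IO EQs and, in turn, derives
at most 2 netdev channels for the container's 2 CPUs.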
Parav Pandit (2):
devlink: Support setting max_io_eqs
mlx5/core: Support max_io_eqs for a function
.../networking/devlink/devlink-port.rst | 25 +++++
.../mellanox/mlx5/core/esw/devlink_port.c | 2 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++
.../mellanox/mlx5/core/eswitch_offloads.c | 94 +++++++++++++++++++
include/net/devlink.h | 14 +++
include/uapi/linux/devlink.h | 1 +
net/devlink/port.c | 52 ++++++++++
7 files changed, 195 insertions(+)
--
2.26.2
* [net-next 1/2] devlink: Support setting max_io_eqs
2024-04-01 9:05 [net-next 0/2] devlink: Add port function attribute for IO EQs Parav Pandit
@ 2024-04-01 9:05 ` Parav Pandit
2024-04-01 9:05 ` [net-next 2/2] mlx5/core: Support max_io_eqs for a function Parav Pandit
1 sibling, 0 replies; 5+ messages in thread
From: Parav Pandit @ 2024-04-01 9:05 UTC (permalink / raw)
To: netdev, davem, edumazet, kuba, pabeni, corbet
Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc,
linux-rdma, Parav Pandit, Jiri Pirko
Many devices deliver event notifications for the IO queues,
such as TX and RX queues, through event queues.
Enable a privileged owner, such as a hypervisor PF, to set the maximum
number of IO event queues for a VF or SF during the provisioning stage.
Example:
Get maximum IO event queues of the VF device::
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
function:
hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
Set maximum IO event queues of the VF device::
$ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
function:
hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
.../networking/devlink/devlink-port.rst | 25 +++++++++
include/net/devlink.h | 14 +++++
include/uapi/linux/devlink.h | 1 +
net/devlink/port.c | 52 +++++++++++++++++++
4 files changed, 92 insertions(+)
diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 562f46b41274..451f57393f11 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -134,6 +134,9 @@ Users may also set the IPsec crypto capability of the function using
Users may also set the IPsec packet capability of the function using
`devlink port function set ipsec_packet` command.
+Users may also set the maximum IO event queues of the function
+using `devlink port function set max_io_eqs` command.
+
Function attributes
===================
@@ -295,6 +298,28 @@ policy is processed in software by the kernel.
function:
hw_addr 00:00:00:00:00:00 ipsec_packet enabled
+Maximum IO event queues setup
+------------------------------
+When the user sets the maximum number of IO event queues for an SF
+or a VF, the function driver is limited to consuming only the
+enforced number of IO event queues.
+
+- Get maximum IO event queues of the VF device::
+
+ $ devlink port show pci/0000:06:00.0/2
+ pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+ function:
+ hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
+
+- Set maximum IO event queues of the VF device::
+
+ $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
+
+ $ devlink port show pci/0000:06:00.0/2
+ pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+ function:
+ hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
+
Subfunction
============
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9ac394bdfbe4..a270e71dee0e 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1602,6 +1602,14 @@ void devlink_free(struct devlink *devlink);
* capability. Should be used by device drivers to
* enable/disable ipsec_packet capability of a
* function managed by the devlink port.
+ * @port_fn_max_io_eqs_get: Callback used to get port function's maximum number of
+ * event queues. Should be used by device drivers to
+ * report the maximum event queues of a function
+ * managed by the devlink port.
+ * @port_fn_max_io_eqs_set: Callback used to set port function's maximum number of
+ * event queues. Should be used by device drivers to
+ * configure maximum number of event queues
+ * of a function managed by the devlink port.
*
* Note: Driver should return -EOPNOTSUPP if it doesn't support
* port function (@port_fn_*) handling for a particular port.
@@ -1651,6 +1659,12 @@ struct devlink_port_ops {
int (*port_fn_ipsec_packet_set)(struct devlink_port *devlink_port,
bool enable,
struct netlink_ext_ack *extack);
+ int (*port_fn_max_io_eqs_get)(struct devlink_port *devlink_port,
+ u32 *max_eqs,
+ struct netlink_ext_ack *extack);
+ int (*port_fn_max_io_eqs_set)(struct devlink_port *devlink_port,
+ u32 max_eqs,
+ struct netlink_ext_ack *extack);
};
void devlink_port_init(struct devlink *devlink,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 2da0c7eb6710..9401aa343673 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -686,6 +686,7 @@ enum devlink_port_function_attr {
DEVLINK_PORT_FN_ATTR_OPSTATE, /* u8 */
DEVLINK_PORT_FN_ATTR_CAPS, /* bitfield32 */
DEVLINK_PORT_FN_ATTR_DEVLINK, /* nested */
+ DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, /* u32 */
__DEVLINK_PORT_FUNCTION_ATTR_MAX,
DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
diff --git a/net/devlink/port.c b/net/devlink/port.c
index 118d130d2afd..307bfeedda54 100644
--- a/net/devlink/port.c
+++ b/net/devlink/port.c
@@ -16,6 +16,7 @@ static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_
DEVLINK_PORT_FN_STATE_ACTIVE),
[DEVLINK_PORT_FN_ATTR_CAPS] =
NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK),
+ [DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] = { .type = NLA_U32 },
};
#define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \
@@ -182,6 +183,30 @@ static int devlink_port_fn_caps_fill(struct devlink_port *devlink_port,
return 0;
}
+static int devlink_port_fn_max_io_eqs_fill(struct devlink_port *port,
+ struct sk_buff *msg,
+ struct netlink_ext_ack *extack,
+ bool *msg_updated)
+{
+ u32 max_io_eqs;
+ int err;
+
+ if (!port->ops->port_fn_max_io_eqs_get)
+ return 0;
+
+ err = port->ops->port_fn_max_io_eqs_get(port, &max_io_eqs, extack);
+ if (err) {
+ if (err == -EOPNOTSUPP)
+ return 0;
+ return err;
+ }
+ err = nla_put_u32(msg, DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, max_io_eqs);
+ if (err)
+ return err;
+ *msg_updated = true;
+ return 0;
+}
+
int devlink_nl_port_handle_fill(struct sk_buff *msg, struct devlink_port *devlink_port)
{
if (devlink_nl_put_handle(msg, devlink_port->devlink))
@@ -409,6 +434,18 @@ static int devlink_port_fn_caps_set(struct devlink_port *devlink_port,
return 0;
}
+static int
+devlink_port_fn_max_io_eqs_set(struct devlink_port *devlink_port,
+ const struct nlattr *attr,
+ struct netlink_ext_ack *extack)
+{
+ u32 max_io_eqs;
+
+ max_io_eqs = nla_get_u32(attr);
+ return devlink_port->ops->port_fn_max_io_eqs_set(devlink_port,
+ max_io_eqs, extack);
+}
+
static int
devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
struct netlink_ext_ack *extack)
@@ -428,6 +465,9 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
if (err)
goto out;
err = devlink_port_fn_state_fill(port, msg, extack, &msg_updated);
+ if (err)
+ goto out;
+ err = devlink_port_fn_max_io_eqs_fill(port, msg, extack, &msg_updated);
if (err)
goto out;
err = devlink_rel_devlink_handle_put(msg, port->devlink,
@@ -726,6 +766,11 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
}
}
}
+ if (tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] && !ops->port_fn_max_io_eqs_set) {
+ NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS],
+ "Function does not support max_io_eqs setting");
+ return -EOPNOTSUPP;
+ }
return 0;
}
@@ -761,6 +806,13 @@ static int devlink_port_function_set(struct devlink_port *port,
return err;
}
+ attr = tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS];
+ if (attr) {
+ err = devlink_port_fn_max_io_eqs_set(port, attr, extack);
+ if (err)
+ return err;
+ }
+
/* Keep this as the last function attribute set, so that when
* multiple port function attributes are set along with state,
* Those can be applied first before activating the state.
--
2.26.2
* [net-next 2/2] mlx5/core: Support max_io_eqs for a function
2024-04-01 9:05 [net-next 0/2] devlink: Add port function attribute for IO EQs Parav Pandit
2024-04-01 9:05 ` [net-next 1/2] devlink: Support setting max_io_eqs Parav Pandit
@ 2024-04-01 9:05 ` Parav Pandit
2024-04-01 9:17 ` Kalesh Anakkur Purayil
1 sibling, 1 reply; 5+ messages in thread
From: Parav Pandit @ 2024-04-01 9:05 UTC (permalink / raw)
To: netdev, davem, edumazet, kuba, pabeni, corbet
Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc,
linux-rdma, Parav Pandit, Jiri Pirko
Implement get and set for the maximum IO event queues for SF and VF.
This enables the administrator on the hypervisor to control the maximum
number of IO event queues, which are typically used to derive the maximum
and default number of net device channels or RDMA device completion vectors.
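As a rough illustration of the accounting implemented below, where
MLX5_ESW_MAX_CTRL_EQS (4) event queues are reserved for control
purposes (the port index is taken from the earlier example):

$ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
  # the set path programs max_num_eqs = 32 + 4 = 36 in the function's HCA cap
  # the get path reports back max_num_eqs - 4 = 32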
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
.../mellanox/mlx5/core/esw/devlink_port.c | 2 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++
.../mellanox/mlx5/core/eswitch_offloads.c | 94 +++++++++++++++++++
3 files changed, 103 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index d8e739cbcbce..76d1ed93c773 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -98,6 +98,8 @@ static const struct devlink_port_ops mlx5_esw_pf_vf_dl_port_ops = {
.port_fn_ipsec_packet_get = mlx5_devlink_port_fn_ipsec_packet_get,
.port_fn_ipsec_packet_set = mlx5_devlink_port_fn_ipsec_packet_set,
#endif /* CONFIG_XFRM_OFFLOAD */
+ .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get,
+ .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set,
};
static void mlx5_esw_offloads_sf_devlink_port_attrs_set(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 349e28a6dd8d..50ce1ea20dd4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -573,6 +573,13 @@ int mlx5_devlink_port_fn_ipsec_packet_get(struct devlink_port *port, bool *is_en
int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, bool enable,
struct netlink_ext_ack *extack);
#endif /* CONFIG_XFRM_OFFLOAD */
+int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port,
+ u32 *max_io_eqs,
+ struct netlink_ext_ack *extack);
+int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port,
+ u32 max_io_eqs,
+ struct netlink_ext_ack *extack);
+
void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type);
int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index baaae628b0a0..9d9a06a25cac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -66,6 +66,8 @@
#define MLX5_ESW_FT_OFFLOADS_DROP_RULE (1)
+#define MLX5_ESW_MAX_CTRL_EQS 4
+
static struct esw_vport_tbl_namespace mlx5_esw_vport_tbl_mirror_ns = {
.max_fte = MLX5_ESW_VPORT_TBL_SIZE,
.max_num_groups = MLX5_ESW_VPORT_TBL_NUM_GROUPS,
@@ -4557,3 +4559,95 @@ int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port,
return err;
}
#endif /* CONFIG_XFRM_OFFLOAD */
+
+int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_eswitch *esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
+ struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
+ int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+ u16 vport_num = vport->vport;
+ void *query_ctx;
+ void *hca_caps;
+ u32 max_eqs;
+ int err;
+
+ if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+ NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
+ return -EOPNOTSUPP;
+ }
+
+ query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
+ if (!query_ctx)
+ return -ENOMEM;
+
+ mutex_lock(&esw->state_lock);
+ err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx,
+ MLX5_CAP_GENERAL);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
+ goto out;
+ }
+
+ hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+ max_eqs = MLX5_GET(cmd_hca_cap, hca_caps, max_num_eqs);
+ if (max_eqs < MLX5_ESW_MAX_CTRL_EQS)
+ *max_io_eqs = 0;
+ else
+ *max_io_eqs = max_eqs - MLX5_ESW_MAX_CTRL_EQS;
+out:
+ mutex_unlock(&esw->state_lock);
+ return 0;
+}
+
+int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, u32 max_io_eqs,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_eswitch *esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
+ struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
+ int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+ u16 vport_num = vport->vport;
+ u16 max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS;
+ void *query_ctx;
+ void *hca_caps;
+ int err;
+
+ if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+ NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
+ return -EOPNOTSUPP;
+ }
+
+ if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > USHRT_MAX) {
+ NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range");
+ return -EINVAL;
+ }
+
+ mutex_lock(&esw->state_lock);
+
+ query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
+ if (!query_ctx) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx,
+ MLX5_CAP_GENERAL);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
+ goto out_free;
+ }
+
+ hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+ MLX5_SET(cmd_hca_cap, hca_caps, max_num_eqs, max_eqs);
+
+ err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport_num,
+ MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
+ if (err)
+ NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA roce cap");
+
+out_free:
+ kfree(query_ctx);
+out:
+ mutex_unlock(&esw->state_lock);
+ return err;
+}
--
2.26.2
* Re: [net-next 2/2] mlx5/core: Support max_io_eqs for a function
2024-04-01 9:05 ` [net-next 2/2] mlx5/core: Support max_io_eqs for a function Parav Pandit
@ 2024-04-01 9:17 ` Kalesh Anakkur Purayil
2024-04-01 9:47 ` Parav Pandit
0 siblings, 1 reply; 5+ messages in thread
From: Kalesh Anakkur Purayil @ 2024-04-01 9:17 UTC (permalink / raw)
To: Parav Pandit
Cc: netdev, davem, edumazet, kuba, pabeni, corbet, saeedm, leon, jiri,
shayd, danielj, dchumak, linux-doc, linux-rdma, Jiri Pirko
On Mon, Apr 1, 2024 at 2:36 PM Parav Pandit <parav@nvidia.com> wrote:
>
> Implement get and set for the maximum IO event queues for SF and VF.
> This enables administrator on the hypervisor to control the maximum
> IO event queues which are typically used to derive the maximum and
> default number of net device channels or rdma device completion vectors.
>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> ---
> .../mellanox/mlx5/core/esw/devlink_port.c | 2 +
> .../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++
> .../mellanox/mlx5/core/eswitch_offloads.c | 94 +++++++++++++++++++
> 3 files changed, 103 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
> index d8e739cbcbce..76d1ed93c773 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
> @@ -98,6 +98,8 @@ static const struct devlink_port_ops mlx5_esw_pf_vf_dl_port_ops = {
> .port_fn_ipsec_packet_get = mlx5_devlink_port_fn_ipsec_packet_get,
> .port_fn_ipsec_packet_set = mlx5_devlink_port_fn_ipsec_packet_set,
> #endif /* CONFIG_XFRM_OFFLOAD */
> + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get,
> + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set,
> };
>
> static void mlx5_esw_offloads_sf_devlink_port_attrs_set(struct mlx5_eswitch *esw,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> index 349e28a6dd8d..50ce1ea20dd4 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> @@ -573,6 +573,13 @@ int mlx5_devlink_port_fn_ipsec_packet_get(struct devlink_port *port, bool *is_en
> int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, bool enable,
> struct netlink_ext_ack *extack);
> #endif /* CONFIG_XFRM_OFFLOAD */
> +int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port,
> + u32 *max_io_eqs,
> + struct netlink_ext_ack *extack);
> +int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port,
> + u32 max_io_eqs,
> + struct netlink_ext_ack *extack);
> +
> void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type);
>
> int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index baaae628b0a0..9d9a06a25cac 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -66,6 +66,8 @@
>
> #define MLX5_ESW_FT_OFFLOADS_DROP_RULE (1)
>
> +#define MLX5_ESW_MAX_CTRL_EQS 4
> +
> static struct esw_vport_tbl_namespace mlx5_esw_vport_tbl_mirror_ns = {
> .max_fte = MLX5_ESW_VPORT_TBL_SIZE,
> .max_num_groups = MLX5_ESW_VPORT_TBL_NUM_GROUPS,
> @@ -4557,3 +4559,95 @@ int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port,
> return err;
> }
> #endif /* CONFIG_XFRM_OFFLOAD */
> +
> +int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs,
> + struct netlink_ext_ack *extack)
> +{
> + struct mlx5_eswitch *esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
> + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
> + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
> + u16 vport_num = vport->vport;
> + void *query_ctx;
> + void *hca_caps;
> + u32 max_eqs;
> + int err;
> +
> + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
> + NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
> + return -EOPNOTSUPP;
> + }
> +
> + query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
> + if (!query_ctx)
> + return -ENOMEM;
> +
> + mutex_lock(&esw->state_lock);
> + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx,
> + MLX5_CAP_GENERAL);
> + if (err) {
> + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
> + goto out;
> + }
> +
> + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
> + max_eqs = MLX5_GET(cmd_hca_cap, hca_caps, max_num_eqs);
> + if (max_eqs < MLX5_ESW_MAX_CTRL_EQS)
> + *max_io_eqs = 0;
> + else
> + *max_io_eqs = max_eqs - MLX5_ESW_MAX_CTRL_EQS;
> +out:
[Kalesh]: Missing " kfree(query_ctx);" here?
> + mutex_unlock(&esw->state_lock);
> + return 0;
[Kalesh] "return err;" to propagate the error back to the caller?
> +}
> +
> +int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, u32 max_io_eqs,
> + struct netlink_ext_ack *extack)
> +{
> + struct mlx5_eswitch *esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
> + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
> + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
> + u16 vport_num = vport->vport;
> + u16 max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS;
> + void *query_ctx;
> + void *hca_caps;
> + int err;
> +
> + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
> + NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
> + return -EOPNOTSUPP;
> + }
> +
> + if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > USHRT_MAX) {
> + NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range");
> + return -EINVAL;
> + }
> +
> + mutex_lock(&esw->state_lock);
> +
> + query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
> + if (!query_ctx) {
> + err = -ENOMEM;
> + goto out;
> + }
> +
> + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx,
> + MLX5_CAP_GENERAL);
> + if (err) {
> + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
> + goto out_free;
> + }
> +
> + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
> + MLX5_SET(cmd_hca_cap, hca_caps, max_num_eqs, max_eqs);
> +
> + err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport_num,
> + MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
> + if (err)
> + NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA roce cap");
> +
> +out_free:
> + kfree(query_ctx);
> +out:
> + mutex_unlock(&esw->state_lock);
> + return err;
> +}
> --
> 2.26.2
>
>
--
Regards,
Kalesh A P
* RE: [net-next 2/2] mlx5/core: Support max_io_eqs for a function
2024-04-01 9:17 ` Kalesh Anakkur Purayil
@ 2024-04-01 9:47 ` Parav Pandit
0 siblings, 0 replies; 5+ messages in thread
From: Parav Pandit @ 2024-04-01 9:47 UTC (permalink / raw)
To: Kalesh Anakkur Purayil
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, corbet@lwn.net,
Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori,
Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org,
linux-rdma@vger.kernel.org, Jiri Pirko
> From: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
> Sent: Monday, April 1, 2024 2:47 PM
>
> On Mon, Apr 1, 2024 at 2:36 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > Implement get and set for the maximum IO event queues for SF and VF.
> > This enables administrator on the hypervisor to control the maximum
> > IO event queues which are typically used to derive the maximum and
> > default number of net device channels or rdma device completion vectors.
> >
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> > ---
> > .../mellanox/mlx5/core/esw/devlink_port.c | 2 +
> > .../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++
> > .../mellanox/mlx5/core/eswitch_offloads.c | 94 +++++++++++++++++++
> > 3 files changed, 103 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
> b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
> > index d8e739cbcbce..76d1ed93c773 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
> > @@ -98,6 +98,8 @@ static const struct devlink_port_ops
> mlx5_esw_pf_vf_dl_port_ops = {
> > .port_fn_ipsec_packet_get = mlx5_devlink_port_fn_ipsec_packet_get,
> > .port_fn_ipsec_packet_set = mlx5_devlink_port_fn_ipsec_packet_set,
> > #endif /* CONFIG_XFRM_OFFLOAD */
> > + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get,
> > + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set,
> > };
> >
> > static void mlx5_esw_offloads_sf_devlink_port_attrs_set(struct
> mlx5_eswitch *esw,
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> > index 349e28a6dd8d..50ce1ea20dd4 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> > @@ -573,6 +573,13 @@ int mlx5_devlink_port_fn_ipsec_packet_get(struct
> devlink_port *port, bool *is_en
> > int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, bool
> enable,
> > struct netlink_ext_ack *extack);
> > #endif /* CONFIG_XFRM_OFFLOAD */
> > +int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port,
> > + u32 *max_io_eqs,
> > + struct netlink_ext_ack *extack);
> > +int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port,
> > + u32 max_io_eqs,
> > + struct netlink_ext_ack *extack);
> > +
> > void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8
> rep_type);
> >
> > int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > index baaae628b0a0..9d9a06a25cac 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > @@ -66,6 +66,8 @@
> >
> > #define MLX5_ESW_FT_OFFLOADS_DROP_RULE (1)
> >
> > +#define MLX5_ESW_MAX_CTRL_EQS 4
> > +
> > static struct esw_vport_tbl_namespace mlx5_esw_vport_tbl_mirror_ns = {
> > .max_fte = MLX5_ESW_VPORT_TBL_SIZE,
> > .max_num_groups = MLX5_ESW_VPORT_TBL_NUM_GROUPS,
> > @@ -4557,3 +4559,95 @@ int
> mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port,
> > return err;
> > }
> > #endif /* CONFIG_XFRM_OFFLOAD */
> > +
> > +int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32
> *max_io_eqs,
> > + struct netlink_ext_ack *extack)
> > +{
> > + struct mlx5_eswitch *esw = mlx5_devlink_eswitch_nocheck_get(port-
> >devlink);
> > + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
> > + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
> > + u16 vport_num = vport->vport;
> > + void *query_ctx;
> > + void *hca_caps;
> > + u32 max_eqs;
> > + int err;
> > +
> > + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
> > + NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA
> management");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
> > + if (!query_ctx)
> > + return -ENOMEM;
> > +
> > + mutex_lock(&esw->state_lock);
> > + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num,
> query_ctx,
> > + MLX5_CAP_GENERAL);
> > + if (err) {
> > + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
> > + goto out;
> > + }
> > +
> > + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
> > + max_eqs = MLX5_GET(cmd_hca_cap, hca_caps, max_num_eqs);
> > + if (max_eqs < MLX5_ESW_MAX_CTRL_EQS)
> > + *max_io_eqs = 0;
> > + else
> > + *max_io_eqs = max_eqs - MLX5_ESW_MAX_CTRL_EQS;
> > +out:
> [Kalesh]: Missing " kfree(query_ctx);" here?
> > + mutex_unlock(&esw->state_lock);
> > + return 0;
> [Kalesh] "return err;" to propagate the error back to the caller?
> > +}
Ack. Thanks Kalesh. Fixing both the comments in v1.
Thread overview: 5+ messages
2024-04-01 9:05 [net-next 0/2] devlink: Add port function attribute for IO EQs Parav Pandit
2024-04-01 9:05 ` [net-next 1/2] devlink: Support setting max_io_eqs Parav Pandit
2024-04-01 9:05 ` [net-next 2/2] mlx5/core: Support max_io_eqs for a function Parav Pandit
2024-04-01 9:17 ` Kalesh Anakkur Purayil
2024-04-01 9:47 ` Parav Pandit