* [net-next v3 0/2] devlink: Add port function attribute for IO EQs
@ 2024-04-03 17:41 Parav Pandit
From: Parav Pandit @ 2024-04-03 17:41 UTC
To: netdev, davem, edumazet, kuba, pabeni, corbet,
kalesh-anakkur.purayil
Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc,
linux-rdma, Parav Pandit
Currently, PCI SFs and VFs use IO event queues to deliver per-channel
netdev events. The number of netdev channels is a function of the IO
event queues. Similarly, for an RDMA device, the number of completion
vectors is a function of the IO event queues. Today, an administrator
on the hypervisor has no means to provision the number of IO event
queues for an SF or VF device; the device/firmware picks some
arbitrary value for them. Because of this, the number of SF netdev
channels is unpredictable, and consequently so is the performance.
This short series introduces a new port function attribute: max_io_eqs.
The goal is to provide administrators at the hypervisor level with the
ability to provision the maximum number of IO event queues for a
function. This gives the administrator control to provision the
right number of IO event queues and achieve predictable performance.
Example of an administrator provisioning (setting) the maximum number
of IO event queues when using switchdev mode:
$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10

$ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20

$ devlink port show pci/0000:06:00.0/1
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
  function:
    hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
This sets the corresponding maximum IO event queues of the function
before it is enumerated. Thus, when the VF/SF driver reads the
capability from the device, it sees the value provisioned by the
hypervisor. The driver is then able to configure the number of channels
for the net device, as well as the number of completion vectors
for the RDMA device. The device/firmware also honors the provisioned
value, hence any VF/SF driver attempting to create IO EQs beyond the
provisioned value results in an error.
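
As a rough sketch of the driver-side effect (hypothetical names only,
not code from this series), a VF/SF driver could derive its channel
count from the provisioned capability roughly like this:

/* Illustrative sketch: foo_dev and foo_read_max_io_eqs_cap() are
 * hypothetical names; the exact capability read is device specific.
 */
static unsigned int foo_num_channels(struct foo_dev *fdev)
{
	u32 max_io_eqs = foo_read_max_io_eqs_cap(fdev); /* provisioned by hypervisor */

	/* Reserve one EQ for async/control events; the remainder bounds
	 * the netdev channels (and RDMA completion vectors).
	 */
	return min_t(u32, num_online_cpus(), max_io_eqs - 1);
}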
With the above settings in place, the administrator achieved 2x
performance with the SF device using 20 channels. In the second example,
where the SF was provisioned for a container with 2 CPUs, the administrator
provisioned only 2 IO event queues, thereby saving device resources.
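For that second case, the provisioning command takes the same form as
above (port index assumed here for illustration):

$ devlink port function set pci/0000:06:00.0/1 max_io_eqs 2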
changelog:
v2->v3:
- limited to 80 chars per line in devlink
- fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
on error path
v1->v2:
- limited comment to 80 chars per line in header file
- fixed set function variables for reverse christmas tree
- fixed comments from Kalesh
- fixed missing kfree in get call
- returning error code for get cmd failure
- fixed error msg copy paste error in set on cmd failure
Parav Pandit (2):
devlink: Support setting max_io_eqs
mlx5/core: Support max_io_eqs for a function
.../networking/devlink/devlink-port.rst | 25 +++++
.../mellanox/mlx5/core/esw/devlink_port.c | 4 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++
.../mellanox/mlx5/core/eswitch_offloads.c | 97 +++++++++++++++++++
include/net/devlink.h | 14 +++
include/uapi/linux/devlink.h | 1 +
net/devlink/port.c | 53 ++++++++++
7 files changed, 201 insertions(+)
--
2.26.2
* [net-next v3 1/2] devlink: Support setting max_io_eqs
@ 2024-04-03 17:41 Parav Pandit
From: Parav Pandit @ 2024-04-03 17:41 UTC
To: netdev, davem, edumazet, kuba, pabeni, corbet,
	kalesh-anakkur.purayil
Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc,
	linux-rdma, Parav Pandit, Jiri Pirko

Many devices send event notifications for the IO queues,
such as tx and rx queues, through event queues.

Enable a privileged owner, such as a hypervisor PF, to set the number
of IO event queues for the VF and SF during the provisioning stage.

example:
Get maximum IO event queues of the VF device::

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10

Set maximum IO event queues of the VF device::

$ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32

$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
  function:
    hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
changelog:
v2->v3:
- limited 80 chars per line
v1->v2:
- limited comment to 80 chars per line in header file
---
 .../networking/devlink/devlink-port.rst | 25 +++++++++
 include/net/devlink.h                   | 14 +++++
 include/uapi/linux/devlink.h            |  1 +
 net/devlink/port.c                      | 53 +++++++++++++++++++
 4 files changed, 93 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 562f46b41274..451f57393f11 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -134,6 +134,9 @@ Users may also set the IPsec crypto capability of the function using
 Users may also set the IPsec packet capability of the function using
 `devlink port function set ipsec_packet` command.
 
+Users may also set the maximum IO event queues of the function
+using `devlink port function set max_io_eqs` command.
+
 Function attributes
 ===================
 
@@ -295,6 +298,28 @@ policy is processed in software by the kernel.
     function:
       hw_addr 00:00:00:00:00:00 ipsec_packet enabled
 
+Maximum IO events queues setup
+------------------------------
+When user sets maximum number of IO event queues for a SF or
+a VF, such function driver is limited to consume only enforced
+number of IO event queues.
+
+- Get maximum IO event queues of the VF device::
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
+
+- Set maximum IO event queues of the VF device::
+
+    $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
+
 Subfunction
 ============
 
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9ac394bdfbe4..bb1af599d101 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1602,6 +1602,14 @@ void devlink_free(struct devlink *devlink);
  *                        capability. Should be used by device drivers to
  *                        enable/disable ipsec_packet capability of a
  *                        function managed by the devlink port.
+ * @port_fn_max_io_eqs_get: Callback used to get port function's maximum number
+ *                          of event queues. Should be used by device drivers to
+ *                          report the maximum event queues of a function
+ *                          managed by the devlink port.
+ * @port_fn_max_io_eqs_set: Callback used to set port function's maximum number
+ *                          of event queues. Should be used by device drivers to
+ *                          configure maximum number of event queues
+ *                          of a function managed by the devlink port.
  *
  * Note: Driver should return -EOPNOTSUPP if it doesn't support
  * port function (@port_fn_*) handling for a particular port.
@@ -1651,6 +1659,12 @@ struct devlink_port_ops {
 	int (*port_fn_ipsec_packet_set)(struct devlink_port *devlink_port,
 					bool enable,
 					struct netlink_ext_ack *extack);
+	int (*port_fn_max_io_eqs_get)(struct devlink_port *devlink_port,
+				      u32 *max_eqs,
+				      struct netlink_ext_ack *extack);
+	int (*port_fn_max_io_eqs_set)(struct devlink_port *devlink_port,
+				      u32 max_eqs,
+				      struct netlink_ext_ack *extack);
 };
 
 void devlink_port_init(struct devlink *devlink,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 2da0c7eb6710..9401aa343673 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -686,6 +686,7 @@ enum devlink_port_function_attr {
 	DEVLINK_PORT_FN_ATTR_OPSTATE,	/* u8 */
 	DEVLINK_PORT_FN_ATTR_CAPS,	/* bitfield32 */
 	DEVLINK_PORT_FN_ATTR_DEVLINK,	/* nested */
+	DEVLINK_PORT_FN_ATTR_MAX_IO_EQS,	/* u32 */
 
 	__DEVLINK_PORT_FUNCTION_ATTR_MAX,
 	DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1
diff --git a/net/devlink/port.c b/net/devlink/port.c
index 118d130d2afd..be9158b4453c 100644
--- a/net/devlink/port.c
+++ b/net/devlink/port.c
@@ -16,6 +16,7 @@ static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_
 						DEVLINK_PORT_FN_STATE_ACTIVE),
 	[DEVLINK_PORT_FN_ATTR_CAPS] =
 		NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK),
+	[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] = { .type = NLA_U32 },
 };
 
 #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port)			\
@@ -182,6 +183,30 @@ static int devlink_port_fn_caps_fill(struct devlink_port *devlink_port,
 	return 0;
 }
 
+static int devlink_port_fn_max_io_eqs_fill(struct devlink_port *port,
+					   struct sk_buff *msg,
+					   struct netlink_ext_ack *extack,
+					   bool *msg_updated)
+{
+	u32 max_io_eqs;
+	int err;
+
+	if (!port->ops->port_fn_max_io_eqs_get)
+		return 0;
+
+	err = port->ops->port_fn_max_io_eqs_get(port, &max_io_eqs, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+	err = nla_put_u32(msg, DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, max_io_eqs);
+	if (err)
+		return err;
+	*msg_updated = true;
+	return 0;
+}
+
 int devlink_nl_port_handle_fill(struct sk_buff *msg, struct devlink_port *devlink_port)
 {
 	if (devlink_nl_put_handle(msg, devlink_port->devlink))
@@ -409,6 +434,18 @@ static int devlink_port_fn_caps_set(struct devlink_port *devlink_port,
 	return 0;
 }
 
+static int
+devlink_port_fn_max_io_eqs_set(struct devlink_port *devlink_port,
+			       const struct nlattr *attr,
+			       struct netlink_ext_ack *extack)
+{
+	u32 max_io_eqs;
+
+	max_io_eqs = nla_get_u32(attr);
+	return devlink_port->ops->port_fn_max_io_eqs_set(devlink_port,
+							 max_io_eqs, extack);
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
 				   struct netlink_ext_ack *extack)
@@ -428,6 +465,9 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
 	if (err)
 		goto out;
 	err = devlink_port_fn_state_fill(port, msg, extack, &msg_updated);
+	if (err)
+		goto out;
+	err = devlink_port_fn_max_io_eqs_fill(port, msg, extack, &msg_updated);
 	if (err)
 		goto out;
 	err = devlink_rel_devlink_handle_put(msg, port->devlink,
@@ -726,6 +766,12 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
 			}
 		}
 	}
+	if (tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] &&
+	    !ops->port_fn_max_io_eqs_set) {
+		NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS],
+				    "Function does not support max_io_eqs setting");
+		return -EOPNOTSUPP;
+	}
 	return 0;
 }
 
@@ -761,6 +807,13 @@ static int devlink_port_function_set(struct devlink_port *port,
 		return err;
 	}
 
+	attr = tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS];
+	if (attr) {
+		err = devlink_port_fn_max_io_eqs_set(port, attr, extack);
+		if (err)
+			return err;
+	}
+
 	/* Keep this as the last function attribute set, so that when
 	 * multiple port function attributes are set along with state,
 	 * Those can be applied first before activating the state.
-- 
2.26.2
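
For a driver author, a minimal sketch of wiring these two callbacks
could look as follows. All names here (foo_port, foo_port_ops,
FOO_MAX_EQS) are hypothetical; the real implementation for mlx5 is in
patch 2/2 of this series.

/* Illustrative sketch only, under the assumption of a driver-private
 * port struct embedding the devlink_port and a device EQ limit.
 */
struct foo_port {
	struct devlink_port dl_port;
	u32 max_io_eqs;		/* currently provisioned value */
};

static int foo_port_fn_max_io_eqs_get(struct devlink_port *port,
				      u32 *max_eqs,
				      struct netlink_ext_ack *extack)
{
	struct foo_port *fport = container_of(port, struct foo_port, dl_port);

	*max_eqs = fport->max_io_eqs;
	return 0;
}

static int foo_port_fn_max_io_eqs_set(struct devlink_port *port,
				      u32 max_eqs,
				      struct netlink_ext_ack *extack)
{
	struct foo_port *fport = container_of(port, struct foo_port, dl_port);

	if (max_eqs > FOO_MAX_EQS) {	/* FOO_MAX_EQS: assumed device limit */
		NL_SET_ERR_MSG(extack, "max_io_eqs exceeds device limit");
		return -EINVAL;
	}
	fport->max_io_eqs = max_eqs;	/* applied before the function is enumerated */
	return 0;
}

static const struct devlink_port_ops foo_port_ops = {
	.port_fn_max_io_eqs_get = foo_port_fn_max_io_eqs_get,
	.port_fn_max_io_eqs_set = foo_port_fn_max_io_eqs_set,
};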
* Re: [net-next v3 1/2] devlink: Support setting max_io_eqs
@ 2024-04-04 16:59 David Wei
From: David Wei @ 2024-04-04 16:59 UTC
To: Parav Pandit, netdev, davem, edumazet, kuba, pabeni, corbet,
	kalesh-anakkur.purayil
Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc,
	linux-rdma, Jiri Pirko

On 2024-04-03 10:41, Parav Pandit wrote:
> Many devices send event notifications for the IO queues,
> such as tx and rx queues, through event queues.
>
> Enable a privileged owner, such as a hypervisor PF, to set the number
> of IO event queues for the VF and SF during the provisioning stage.
>
[...]
>
> +static int devlink_port_fn_max_io_eqs_fill(struct devlink_port *port,
> +					   struct sk_buff *msg,
> +					   struct netlink_ext_ack *extack,
> +					   bool *msg_updated)
> +{
> +	u32 max_io_eqs;
> +	int err;
> +
> +	if (!port->ops->port_fn_max_io_eqs_get)
> +		return 0;
> +
> +	err = port->ops->port_fn_max_io_eqs_get(port, &max_io_eqs, extack);
> +	if (err) {
> +		if (err == -EOPNOTSUPP)
> +			return 0;

The docs above say:
 * Note: Driver should return -EOPNOTSUPP if it doesn't support
 * port function (@port_fn_*) handling for a particular port.

But here you're returning 0 in both cases: no port_fn_max_io_eqs_get,
or port_fn_max_io_eqs_get() returning -EOPNOTSUPP.

[...]
* RE: [net-next v3 1/2] devlink: Support setting max_io_eqs
@ 2024-04-04 17:58 Parav Pandit
From: Parav Pandit @ 2024-04-04 17:58 UTC
To: David Wei, netdev@vger.kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com
Cc: Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori,
	Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org,
	linux-rdma@vger.kernel.org, Jiri Pirko

Hi David,

> From: David Wei <dw@davidwei.uk>
> Sent: Thursday, April 4, 2024 10:29 PM
>
[...]
> > +	err = port->ops->port_fn_max_io_eqs_get(port, &max_io_eqs, extack);
> > +	if (err) {
> > +		if (err == -EOPNOTSUPP)
> > +			return 0;
>
> The docs above say:
>  * Note: Driver should return -EOPNOTSUPP if it doesn't support
>  * port function (@port_fn_*) handling for a particular port.
>
> But here you're returning 0 in both cases: no port_fn_max_io_eqs_get,
> or port_fn_max_io_eqs_get() returning -EOPNOTSUPP.
>
When the port does not support this op, the function pointer is null
and 0 is returned as expected.

When the port for some reason has the ops function pointer set, but
the port does not support the op, it will return -EOPNOTSUPP.
This may be possible when the driver has chosen to use the same ops
callback structure for multiple port flavors.

This code pattern is likely leftover from the relatively recent work
that moved ops from the devlink instance to per-port ops.
I propose to keep the current check as done in this patch, and run a
full audit of all the drivers; if all drivers have moved to per-port
ops, then simplify the code to drop the check for -EOPNOTSUPP in a new
series that may touch more drivers.
Otherwise, we may end up failing the port show operation when it
returns -EOPNOTSUPP.

[...]
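
A minimal sketch of that shared-ops scenario (hypothetical driver and
flavour check, not from this series): the op pointer is set, but a
flavour without the capability reports -EOPNOTSUPP at runtime.

/* Illustrative only: one devlink_port_ops shared across flavours,
 * where only the PCI VF flavour supports max_io_eqs.
 */
static int foo_shared_max_io_eqs_get(struct devlink_port *port,
				     u32 *max_eqs,
				     struct netlink_ext_ack *extack)
{
	/* Op pointer exists, but this port flavour lacks the capability */
	if (port->attrs.flavour != DEVLINK_PORT_FLAVOUR_PCI_VF)
		return -EOPNOTSUPP;

	*max_eqs = 10;	/* assumed device default, for the sketch only */
	return 0;
}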
* Re: [net-next v3 1/2] devlink: Support setting max_io_eqs
@ 2024-04-04 18:40 David Wei
From: David Wei @ 2024-04-04 18:40 UTC
To: Parav Pandit, netdev@vger.kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com
Cc: Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori,
	Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org,
	linux-rdma@vger.kernel.org, Jiri Pirko

On 2024-04-04 10:58, Parav Pandit wrote:
[...]
> I propose to keep the current check as done in this patch, and run a
> full audit of all the drivers; if all drivers have moved to per-port
> ops, then simplify the code to drop the check for -EOPNOTSUPP in a
> new series that may touch more drivers.
> Otherwise, we may end up failing the port show operation when it
> returns -EOPNOTSUPP.

Thanks for the explanation. So ideally each port flavour has its own
unique set of struct ops, and if something is not supported then don't
set the func ptr in the struct ops.

Yes, I see that 0 has to be returned for devlink_port_fn_caps_fill()
to succeed.

Had a brief look and there's only a handful of drivers (mlx, nfp, ice)
that use devlink_port_ops.

[...]
* RE: [net-next v3 1/2] devlink: Support setting max_io_eqs
@ 2024-04-04 18:50 Parav Pandit
From: Parav Pandit @ 2024-04-04 18:50 UTC
To: David Wei, netdev@vger.kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com
Cc: Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori,
	Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org,
	linux-rdma@vger.kernel.org, Jiri Pirko

> From: David Wei <dw@davidwei.uk>
> Sent: Friday, April 5, 2024 12:10 AM
>
[...]
> Had a brief look and there's only a handful of drivers (mlx, nfp, ice)
> that use devlink_port_ops.
>
And netdevsim too. :)
Can we please do the cleanup in a separate series?
Post this, maybe later in the month, I may have cycles to audit and
simplify. Would it be ok with you?

[...]
* Re: [net-next v3 1/2] devlink: Support setting max_io_eqs
@ 2024-04-04 18:52 David Wei
From: David Wei @ 2024-04-04 18:52 UTC
To: Parav Pandit, netdev@vger.kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com
Cc: Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori,
	Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org,
	linux-rdma@vger.kernel.org, Jiri Pirko

On 2024-04-04 11:50, Parav Pandit wrote:
> And netdevsim too. :)
> Can we please do the cleanup in a separate series?
> Post this, maybe later in the month, I may have cycles to audit and
> simplify. Would it be ok with you?
>
[...]
So ideally each port flavour has its own unique >> set of struct ops, and if something is not supported then don't set the func >> ptr in the struct ops. >> >> Yes, I see that 0 has to be returned for devlink_port_fn_caps_fill() to succeed. >> >> Had a brief look and there's only a handful of drivers (mlx, nfp, ice) that use >> devlink_port_ops. >> > And netdevsim too. :) > Can we please the cleanup in a separate series? > Post this may be later in the month, I may have cycles to audit and simplify. > Would it be ok with you? Yeah of course, thanks. I'm making some changes in netdevsim and can do a drive-by cleanup there too. > >>> >>>>> + return err; >>>>> + } >>>>> + err = nla_put_u32(msg, DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, >>>> max_io_eqs); >>>>> + if (err) >>>>> + return err; >>>>> + *msg_updated = true; >>>>> + return 0; >>>>> +} >>>>> + >>>>> int devlink_nl_port_handle_fill(struct sk_buff *msg, struct >>>>> devlink_port *devlink_port) { >>>>> if (devlink_nl_put_handle(msg, devlink_port->devlink)) @@ -409,6 >>>>> +434,18 @@ static int devlink_port_fn_caps_set(struct devlink_port >>>> *devlink_port, >>>>> return 0; >>>>> } >>>>> >>>>> +static int >>>>> +devlink_port_fn_max_io_eqs_set(struct devlink_port *devlink_port, >>>>> + const struct nlattr *attr, >>>>> + struct netlink_ext_ack *extack) { >>>>> + u32 max_io_eqs; >>>>> + >>>>> + max_io_eqs = nla_get_u32(attr); >>>>> + return devlink_port->ops->port_fn_max_io_eqs_set(devlink_port, >>>>> + max_io_eqs, extack); >>>>> +} >>>>> + >>>>> static int >>>>> devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct >>>> devlink_port *port, >>>>> struct netlink_ext_ack *extack) @@ -428,6 >>>> +465,9 @@ >>>>> devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct >>>> devlink_port *por >>>>> if (err) >>>>> goto out; >>>>> err = devlink_port_fn_state_fill(port, msg, extack, &msg_updated); >>>>> + if (err) >>>>> + goto out; >>>>> + err = devlink_port_fn_max_io_eqs_fill(port, msg, extack, >>>>> +&msg_updated); >>>>> if (err) >>>>> goto out; >>>>> err = devlink_rel_devlink_handle_put(msg, port->devlink, @@ -726,6 >>>>> +766,12 @@ static int devlink_port_function_validate(struct >>>>> +devlink_port >>>> *devlink_port, >>>>> } >>>>> } >>>>> } >>>>> + if (tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] && >>>>> + !ops->port_fn_max_io_eqs_set) { >>>>> + NL_SET_ERR_MSG_ATTR(extack, >>>> tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS], >>>>> + "Function does not support max_io_eqs >>>> setting"); >>>>> + return -EOPNOTSUPP; >>>>> + } >>>>> return 0; >>>>> } >>>>> >>>>> @@ -761,6 +807,13 @@ static int devlink_port_function_set(struct >>>> devlink_port *port, >>>>> return err; >>>>> } >>>>> >>>>> + attr = tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS]; >>>>> + if (attr) { >>>>> + err = devlink_port_fn_max_io_eqs_set(port, attr, extack); >>>>> + if (err) >>>>> + return err; >>>>> + } >>>>> + >>>>> /* Keep this as the last function attribute set, so that when >>>>> * multiple port function attributes are set along with state, >>>>> * Those can be applied first before activating the state. ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [net-next v3 1/2] devlink: Support setting max_io_eqs 2024-04-04 18:52 ` David Wei @ 2024-04-04 18:55 ` Parav Pandit 2024-04-05 2:22 ` Jakub Kicinski 0 siblings, 1 reply; 18+ messages in thread From: Parav Pandit @ 2024-04-04 18:55 UTC (permalink / raw) To: David Wei, netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com Cc: Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori, Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org, linux-rdma@vger.kernel.org, Jiri Pirko > From: David Wei <dw@davidwei.uk> > Sent: Friday, April 5, 2024 12:23 AM > > On 2024-04-04 11:50, Parav Pandit wrote: > > > > > >> From: David Wei <dw@davidwei.uk> > >> Sent: Friday, April 5, 2024 12:10 AM > >> > >> On 2024-04-04 10:58, Parav Pandit wrote: > >>> Hi David, > >>> > >>>> From: David Wei <dw@davidwei.uk> > >>>> Sent: Thursday, April 4, 2024 10:29 PM > >>>> > >>>> On 2024-04-03 10:41, Parav Pandit wrote: > >>>>> Many devices send event notifications for the IO queues, such as > >>>>> tx and rx queues, through event queues. > >>>>> > >>>>> Enable a privileged owner, such as a hypervisor PF, to set the > >>>>> number of IO event queues for the VF and SF during the > >>>>> provisioning > >> stage. > >>>>> > >>>>> example: > >>>>> Get maximum IO event queues of the VF device:: > >>>>> > >>>>> $ devlink port show pci/0000:06:00.0/2 > >>>>> pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf > >>>>> pfnum 0 > >>>> vfnum 1 > >>>>> function: > >>>>> hw_addr 00:00:00:00:00:00 ipsec_packet disabled > >>>>> max_io_eqs > >>>>> 10 > >>>>> > >>>>> Set maximum IO event queues of the VF device:: > >>>>> > >>>>> $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 > >>>>> > >>>>> $ devlink port show pci/0000:06:00.0/2 > >>>>> pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf > >>>>> pfnum 0 > >>>> vfnum 1 > >>>>> function: > >>>>> hw_addr 00:00:00:00:00:00 ipsec_packet disabled > >>>>> max_io_eqs > >>>>> 32 > >>>>> > >>>>> Reviewed-by: Jiri Pirko <jiri@nvidia.com> > >>>>> Reviewed-by: Shay Drory <shayd@nvidia.com> > >>>>> Signed-off-by: Parav Pandit <parav@nvidia.com> > >>>>> --- > >>>>> changelog: > >>>>> v2->v3: > >>>>> - limited 80 chars per line > >>>>> v1->v2: > >>>>> - limited comment to 80 chars per line in header file > >>>>> --- > >>>>> .../networking/devlink/devlink-port.rst | 25 +++++++++ > >>>>> include/net/devlink.h | 14 +++++ > >>>>> include/uapi/linux/devlink.h | 1 + > >>>>> net/devlink/port.c | 53 +++++++++++++++++++ > >>>>> 4 files changed, 93 insertions(+) > >>>>> > >>>>> diff --git a/Documentation/networking/devlink/devlink-port.rst > >>>>> b/Documentation/networking/devlink/devlink-port.rst > >>>>> index 562f46b41274..451f57393f11 100644 > >>>>> --- a/Documentation/networking/devlink/devlink-port.rst > >>>>> +++ b/Documentation/networking/devlink/devlink-port.rst > >>>>> @@ -134,6 +134,9 @@ Users may also set the IPsec crypto capability > >>>>> of the function using Users may also set the IPsec packet > >>>>> capability of the function using `devlink port function set > >>>>> ipsec_packet` > >> command. > >>>>> > >>>>> +Users may also set the maximum IO event queues of the function > >>>>> +using `devlink port function set max_io_eqs` command. > >>>>> + > >>>>> Function attributes > >>>>> =================== > >>>>> > >>>>> @@ -295,6 +298,28 @@ policy is processed in software by the kernel. 
> >>>>> function: > >>>>> hw_addr 00:00:00:00:00:00 ipsec_packet enabled > >>>>> > >>>>> +Maximum IO events queues setup > >>>>> +------------------------------ > >>>>> +When user sets maximum number of IO event queues for a SF or a > >>>>> +VF, such function driver is limited to consume only enforced > >>>>> +number of IO event queues. > >>>>> + > >>>>> +- Get maximum IO event queues of the VF device:: > >>>>> + > >>>>> + $ devlink port show pci/0000:06:00.0/2 > >>>>> + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour > >>>>> + pcivf pfnum > >>>> 0 vfnum 1 > >>>>> + function: > >>>>> + hw_addr 00:00:00:00:00:00 ipsec_packet disabled > >>>>> + max_io_eqs 10 > >>>>> + > >>>>> +- Set maximum IO event queues of the VF device:: > >>>>> + > >>>>> + $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 > >>>>> + > >>>>> + $ devlink port show pci/0000:06:00.0/2 > >>>>> + pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour > >>>>> + pcivf pfnum > >>>> 0 vfnum 1 > >>>>> + function: > >>>>> + hw_addr 00:00:00:00:00:00 ipsec_packet disabled > >>>>> + max_io_eqs 32 > >>>>> + > >>>>> Subfunction > >>>>> ============ > >>>>> > >>>>> diff --git a/include/net/devlink.h b/include/net/devlink.h index > >>>>> 9ac394bdfbe4..bb1af599d101 100644 > >>>>> --- a/include/net/devlink.h > >>>>> +++ b/include/net/devlink.h > >>>>> @@ -1602,6 +1602,14 @@ void devlink_free(struct devlink *devlink); > >>>>> * capability. Should be used by device > drivers to > >>>>> * enable/disable ipsec_packet capability of > a > >>>>> * function managed by the devlink port. > >>>>> + * @port_fn_max_io_eqs_get: Callback used to get port function's > >>>> maximum number > >>>>> + * of event queues. Should be used by device > drivers > >>>> to > >>>>> + * report the maximum event queues of a > function > >>>>> + * managed by the devlink port. > >>>>> + * @port_fn_max_io_eqs_set: Callback used to set port function's > >>>> maximum number > >>>>> + * of event queues. Should be used by device > drivers > >>>> to > >>>>> + * configure maximum number of event > queues > >>>>> + * of a function managed by the devlink port. > >>>>> * > >>>>> * Note: Driver should return -EOPNOTSUPP if it doesn't support > >>>>> * port function (@port_fn_*) handling for a particular port. 
> >>>>> @@ -1651,6 +1659,12 @@ struct devlink_port_ops { > >>>>> int (*port_fn_ipsec_packet_set)(struct devlink_port *devlink_port, > >>>>> bool enable, > >>>>> struct netlink_ext_ack *extack); > >>>>> + int (*port_fn_max_io_eqs_get)(struct devlink_port > *devlink_port, > >>>>> + u32 *max_eqs, > >>>>> + struct netlink_ext_ack *extack); > >>>>> + int (*port_fn_max_io_eqs_set)(struct devlink_port > *devlink_port, > >>>>> + u32 max_eqs, > >>>>> + struct netlink_ext_ack *extack); > >>>>> }; > >>>>> > >>>>> void devlink_port_init(struct devlink *devlink, diff --git > >>>>> a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h > >>>>> index > >>>>> 2da0c7eb6710..9401aa343673 100644 > >>>>> --- a/include/uapi/linux/devlink.h > >>>>> +++ b/include/uapi/linux/devlink.h > >>>>> @@ -686,6 +686,7 @@ enum devlink_port_function_attr { > >>>>> DEVLINK_PORT_FN_ATTR_OPSTATE, /* u8 */ > >>>>> DEVLINK_PORT_FN_ATTR_CAPS, /* bitfield32 */ > >>>>> DEVLINK_PORT_FN_ATTR_DEVLINK, /* nested */ > >>>>> + DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, /* u32 */ > >>>>> > >>>>> __DEVLINK_PORT_FUNCTION_ATTR_MAX, > >>>>> DEVLINK_PORT_FUNCTION_ATTR_MAX = > >>>> __DEVLINK_PORT_FUNCTION_ATTR_MAX - > >>>>> 1 diff --git a/net/devlink/port.c b/net/devlink/port.c index > >>>>> 118d130d2afd..be9158b4453c 100644 > >>>>> --- a/net/devlink/port.c > >>>>> +++ b/net/devlink/port.c > >>>>> @@ -16,6 +16,7 @@ static const struct nla_policy > >>>> devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_ > >>>>> DEVLINK_PORT_FN_STATE_ACTIVE), > >>>>> [DEVLINK_PORT_FN_ATTR_CAPS] = > >>>>> > >>>> NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK), > >>>>> + [DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] = { .type = > NLA_U32 }, > >>>>> }; > >>>>> > >>>>> #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) > >>>> \ > >>>>> @@ -182,6 +183,30 @@ static int devlink_port_fn_caps_fill(struct > >>>> devlink_port *devlink_port, > >>>>> return 0; > >>>>> } > >>>>> > >>>>> +static int devlink_port_fn_max_io_eqs_fill(struct devlink_port *port, > >>>>> + struct sk_buff *msg, > >>>>> + struct netlink_ext_ack > *extack, > >>>>> + bool *msg_updated) > >>>>> +{ > >>>>> + u32 max_io_eqs; > >>>>> + int err; > >>>>> + > >>>>> + if (!port->ops->port_fn_max_io_eqs_get) > >>>>> + return 0; > >>>>> + > >>>>> + err = port->ops->port_fn_max_io_eqs_get(port, > &max_io_eqs, > >>>> extack); > >>>>> + if (err) { > >>>>> + if (err == -EOPNOTSUPP) > >>>>> + return 0; > >>>> > >>>> Docs above says: > >>>> * Note: Driver should return -EOPNOTSUPP if it doesn't support > >>>> * port function (@port_fn_*) handling for a particular port. > >>>> > >>>> But here you're returning 0 in both cases of no > >>>> port_fn_max_io_eqs_get or > >>>> port_fn_max_io_eqs_get() returns EOPNOTSUPP. > >>>> > >>> When the port does not support this op, the function pointer is null > >>> and, 0 > >> is returned as expected. > >>> > >>> When the port for some reason has the ops function pointer set for a > >>> port, > >> but if the port does not support the ops, it will return ENOPNOTSUPP. > >>> This may be possible when the driver has chosen to use same ops > >>> callback > >> structure for multiple port flavors. > >>> > >>> This code pattern is likely left over code of relatively recent work > >>> that > >> moved ops from devlink instance to per port ops. 
> >>> I propose to keep the current check as done in this patch, and run a > >>> full audit of all the drivers, if all drivers have moved to per port > >>> ops, then > >> simplify the code to drop the check for EOPNOTSUPP in a new series > >> that may touch more drivers. > >>> Otherwise, we may end up failing the port show operation when it > >>> returns > >> - ENOPNOTSUPP. > >> > >> Thanks for the explanation. So ideally each port flavour has its own > >> unique set of struct ops, and if something is not supported then > >> don't set the func ptr in the struct ops. > >> > >> Yes, I see that 0 has to be returned for devlink_port_fn_caps_fill() to > succeed. > >> > >> Had a brief look and there's only a handful of drivers (mlx, nfp, > >> ice) that use devlink_port_ops. > >> > > And netdevsim too. :) > > Can we please the cleanup in a separate series? > > Post this may be later in the month, I may have cycles to audit and simplify. > > Would it be ok with you? > > Yeah of course, thanks. I'm making some changes in netdevsim and can do a > drive-by cleanup there too. > Ok. yes, that will be helpful. Thanks. > > > >>> > >>>>> + return err; > >>>>> + } > >>>>> + err = nla_put_u32(msg, > DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, > >>>> max_io_eqs); > >>>>> + if (err) > >>>>> + return err; > >>>>> + *msg_updated = true; > >>>>> + return 0; > >>>>> +} > >>>>> + > >>>>> int devlink_nl_port_handle_fill(struct sk_buff *msg, struct > >>>>> devlink_port *devlink_port) { > >>>>> if (devlink_nl_put_handle(msg, devlink_port->devlink)) @@ -409,6 > >>>>> +434,18 @@ static int devlink_port_fn_caps_set(struct devlink_port > >>>> *devlink_port, > >>>>> return 0; > >>>>> } > >>>>> > >>>>> +static int > >>>>> +devlink_port_fn_max_io_eqs_set(struct devlink_port *devlink_port, > >>>>> + const struct nlattr *attr, > >>>>> + struct netlink_ext_ack *extack) { > >>>>> + u32 max_io_eqs; > >>>>> + > >>>>> + max_io_eqs = nla_get_u32(attr); > >>>>> + return devlink_port->ops- > >port_fn_max_io_eqs_set(devlink_port, > >>>>> + max_io_eqs, > extack); > >>>>> +} > >>>>> + > >>>>> static int > >>>>> devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct > >>>> devlink_port *port, > >>>>> struct netlink_ext_ack *extack) @@ -428,6 > >>>> +465,9 @@ > >>>>> devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct > >>>> devlink_port *por > >>>>> if (err) > >>>>> goto out; > >>>>> err = devlink_port_fn_state_fill(port, msg, extack, > >>>>> &msg_updated); > >>>>> + if (err) > >>>>> + goto out; > >>>>> + err = devlink_port_fn_max_io_eqs_fill(port, msg, extack, > >>>>> +&msg_updated); > >>>>> if (err) > >>>>> goto out; > >>>>> err = devlink_rel_devlink_handle_put(msg, port->devlink, @@ > >>>>> -726,6 > >>>>> +766,12 @@ static int devlink_port_function_validate(struct > >>>>> +devlink_port > >>>> *devlink_port, > >>>>> } > >>>>> } > >>>>> } > >>>>> + if (tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] && > >>>>> + !ops->port_fn_max_io_eqs_set) { > >>>>> + NL_SET_ERR_MSG_ATTR(extack, > >>>> tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS], > >>>>> + "Function does not support > max_io_eqs > >>>> setting"); > >>>>> + return -EOPNOTSUPP; > >>>>> + } > >>>>> return 0; > >>>>> } > >>>>> > >>>>> @@ -761,6 +807,13 @@ static int devlink_port_function_set(struct > >>>> devlink_port *port, > >>>>> return err; > >>>>> } > >>>>> > >>>>> + attr = tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS]; > >>>>> + if (attr) { > >>>>> + err = devlink_port_fn_max_io_eqs_set(port, attr, > extack); > >>>>> + if (err) > >>>>> + return err; > >>>>> + } > >>>>> + > >>>>> /* Keep this as the last 
function attribute set, so that when > >>>>> * multiple port function attributes are set along with state, > >>>>> * Those can be applied first before activating the state. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [net-next v3 1/2] devlink: Support setting max_io_eqs 2024-04-04 18:55 ` Parav Pandit @ 2024-04-05 2:22 ` Jakub Kicinski 0 siblings, 0 replies; 18+ messages in thread From: Jakub Kicinski @ 2024-04-05 2:22 UTC (permalink / raw) To: Parav Pandit Cc: David Wei, netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com, Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori, Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org, linux-rdma@vger.kernel.org, Jiri Pirko On Thu, 4 Apr 2024 18:55:09 +0000 Parav Pandit wrote: > > Yeah of course, thanks. I'm making some changes in netdevsim and can do a > > drive-by cleanup there too. > > > Ok. yes, that will be helpful. Thanks. I think that part is fine, TBH. Drivers may not support a particular control on particular HW/FW for any number of reasons; we should let them return EOPNOTSUPP. Not sure if there's much benefit in splitting the ops. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [net-next v3 1/2] devlink: Support setting max_io_eqs 2024-04-03 17:41 ` [net-next v3 1/2] devlink: Support setting max_io_eqs Parav Pandit 2024-04-04 16:59 ` David Wei @ 2024-04-05 2:21 ` Jakub Kicinski 2024-04-05 3:13 ` Parav Pandit 1 sibling, 1 reply; 18+ messages in thread From: Jakub Kicinski @ 2024-04-05 2:21 UTC (permalink / raw) To: Parav Pandit Cc: netdev, davem, edumazet, pabeni, corbet, kalesh-anakkur.purayil, saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc, linux-rdma, Jiri Pirko On Wed, 3 Apr 2024 20:41:32 +0300 Parav Pandit wrote: > Many devices send event notifications for the IO queues, > such as tx and rx queues, through event queues. > > Enable a privileged owner, such as a hypervisor PF, to set the number > of IO event queues for the VF and SF during the provisioning stage. What's the relationship between EQ and queue pairs and IRQs? The next patch says "maximum IO event queues which are typically used to derive the maximum and default number of net device channels" It may not be obvious to non-mlx5 experts, I think it needs to be better documented. ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [net-next v3 1/2] devlink: Support setting max_io_eqs 2024-04-05 2:21 ` Jakub Kicinski @ 2024-04-05 3:13 ` Parav Pandit 2024-04-05 14:13 ` Jakub Kicinski 0 siblings, 1 reply; 18+ messages in thread From: Parav Pandit @ 2024-04-05 3:13 UTC (permalink / raw) To: Jakub Kicinski Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com, Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori, Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org, linux-rdma@vger.kernel.org, Jiri Pirko Hi Jakub, > From: Jakub Kicinski <kuba@kernel.org> > Sent: Friday, April 5, 2024 7:51 AM > > On Wed, 3 Apr 2024 20:41:32 +0300 Parav Pandit wrote: > > Many devices send event notifications for the IO queues, such as tx > > and rx queues, through event queues. > > > > Enable a privileged owner, such as a hypervisor PF, to set the number > > of IO event queues for the VF and SF during the provisioning stage. > > What's the relationship between EQ and queue pairs and IRQs? The netdev QP (txq, rxq pair) channels created by the driver typically go up to the number of CPUs, provided the function has enough IO event queues, up to the CPU count. RDMA QPs are far more numerous than netdev QPs, as multiple processes use them and they are a per-user-space-process resource. Those applications use a number of QPs based on the number of CPUs and the number of event channels that deliver notifications to user space. The driver uses the IRQs dynamically, up to the PCI limit, based on the supported IO event queues. > The next patch says "maximum IO event queues which are typically used to > derive the maximum and default number of net device channels" > It may not be obvious to non-mlx5 experts, I think it needs to be better > documented. I will expand the documentation in .../networking/devlink/devlink-port.rst. I will add the below change to v4, which also has David's comments addressed. Is this ok for you? --- a/Documentation/networking/devlink/devlink-port.rst +++ b/Documentation/networking/devlink/devlink-port.rst @@ -304,6 +304,11 @@ When user sets maximum number of IO event queues for a SF or a VF, such function driver is limited to consume only enforced number of IO event queues. +IO event queues deliver events related to IO queues, including network +device transmit and receive queues (txq and rxq) and RDMA Queue Pairs (QPs). +For example, the number of netdevice channels and RDMA device completion +vectors are derived from the function's IO event queues. ^ permalink raw reply [flat|nested] 18+ messages in thread
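As a rough illustration of the sizing described above (a sketch only, with invented foo_* names; this is not mlx5 code): the driver would cap its default channel count at whichever is smaller, the online CPU count or the function's IO event queue budget.

#include <linux/cpumask.h>
#include <linux/minmax.h>

/* Hypothetical helper: derive the default netdev channel count from
 * the number of IO event queues the function was provisioned with.
 */
static unsigned int foo_default_num_channels(unsigned int max_io_eqs)
{
	return min(num_online_cpus(), max_io_eqs);
}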
* Re: [net-next v3 1/2] devlink: Support setting max_io_eqs 2024-04-05 3:13 ` Parav Pandit @ 2024-04-05 14:13 ` Jakub Kicinski 2024-04-05 16:34 ` Parav Pandit 0 siblings, 1 reply; 18+ messages in thread From: Jakub Kicinski @ 2024-04-05 14:13 UTC (permalink / raw) To: Parav Pandit Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com, Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori, Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org, linux-rdma@vger.kernel.org, Jiri Pirko On Fri, 5 Apr 2024 03:13:36 +0000 Parav Pandit wrote: > Netdev qps (txq, rxq pair) channels created are typically upto num cpus by driver, provided it has enough IO event queues upto cpu count. > > Rdma qps are far more than netdev qps as multiple process uses them and they are per user space process resource. > Those applications use number of QPs based on number of cpus and number of event channels to deliver notifications to user space. Some other drivers (e.g. intel) support multiple queues per core in netdev. For mlx5 I think AF_XDP may be a good example (or used to be) where there may be more than one queue? So I think the question still stands even for netdev. We should document whether number of EQs contains the number of Rx/Tx queues. > Driver uses the IRQs dynamically upto the PCI's limit based on supported IO event queues. Right but one IRQ <> one EQ? Typically / always? SFs "share" the IRQs with PF IIRC, do they share EQs? > > The next patch says "maximum IO event queues which are typically used to > > derive the maximum and default number of net device channels" > > It may not be obvious to non-mlx5 experts, I think it needs to be better > > documented. > I will expand the documentation in .../networking/devlink/devlink-port.rst. > > I will add below change to the v4 that has David's comments also addressed. > Is this ok for you? Looks like a good start but I think a few more sentences describing the relation to other resources would be good. > --- a/Documentation/networking/devlink/devlink-port.rst > +++ b/Documentation/networking/devlink/devlink-port.rst > @@ -304,6 +304,11 @@ When user sets maximum number of IO event queues for a SF or > a VF, such function driver is limited to consume only enforced > number of IO event queues. > > +IO event queues deliver events related to IO queues, including network > +device transmit and receive queues (txq and rxq) and RDMA Queue Pairs (QPs). > +For example, the number of netdevice channels and RDMA device completion > +vectors are derived from the function's IO event queues. ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [net-next v3 1/2] devlink: Support setting max_io_eqs 2024-04-05 14:13 ` Jakub Kicinski @ 2024-04-05 16:34 ` Parav Pandit 0 siblings, 0 replies; 18+ messages in thread From: Parav Pandit @ 2024-04-05 16:34 UTC (permalink / raw) To: Jakub Kicinski Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com, Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori, Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org, linux-rdma@vger.kernel.org, Jiri Pirko > From: Jakub Kicinski <kuba@kernel.org> > Sent: Friday, April 5, 2024 7:44 PM > > On Fri, 5 Apr 2024 03:13:36 +0000 Parav Pandit wrote: > > Netdev qps (txq, rxq pair) channels created are typically upto num cpus by > driver, provided it has enough IO event queues upto cpu count. > > > > Rdma qps are far more than netdev qps as multiple process uses them and > they are per user space process resource. > > Those applications use number of QPs based on number of cpus and > number of event channels to deliver notifications to user space. > > Some other drivers (e.g. intel) support multiple queues per core in netdev. > For mlx5 I think AF_XDP may be a good example (or used to be) where there > may be more than one queue? > Yes, there may be multiple netdev queues connected to an EQ. For example, as you described, mlx5 XDP; also, mlx5 creates multiple txqs per traffic class per channel, which are linked to the single EQ of the channel. But still, those txqs are per channel AFAIK. > So I think the question still stands even for netdev. > We should document whether number of EQs contains the number of Rx/Tx > queues. > I believe the number of txqs/rxqs can be more than the number of EQs, with several queues connecting to the same EQ. Netdev channels have a more accurate linkage to EQs, compared to raw txqs/rxqs. > > Driver uses the IRQs dynamically upto the PCI's limit based on supported IO > event queues. > > Right but one IRQ <> one EQ? Typically / always? Typically yes, one IRQ <> one EQ. > SFs "share" the IRQs with PF IIRC, do they share EQs? > SFs do not share EQs; they have their own dedicated EQs. You remember right that they share IRQs. > > > The next patch says "maximum IO event queues which are typically > > > used to derive the maximum and default number of net device channels" > > > It may not be obvious to non-mlx5 experts, I think it needs to be > > > better documented. > > I will expand the documentation in .../networking/devlink/devlink-port.rst. > > > > I will add below change to the v4 that has David's comments also > addressed. > > Is this ok for you? > > Looks like a good start but I think a few more sentences describing the > relation to other resources would be good. > I think EQs are a limited object that does not have a much wider relation in the stack. The relation to IRQs is probably a good addition. Along with the below changes, I will add the reference to IRQs too in v4. > > --- a/Documentation/networking/devlink/devlink-port.rst > > +++ b/Documentation/networking/devlink/devlink-port.rst > > @@ -304,6 +304,11 @@ When user sets maximum number of IO event > queues > > for a SF or a VF, such function driver is limited to consume only > > enforced number of IO event queues. > > > > +IO event queues deliver events related to IO queues, including > > +network device transmit and receive queues (txq and rxq) and RDMA > Queue Pairs (QPs).
> > +For example, the number of netdevice channels and RDMA device > > +completion vectors are derived from the function's IO event queues. ^ permalink raw reply [flat|nested] 18+ messages in thread
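To put "the driver uses IRQs dynamically up to the PCI limit" into code form, here is a simplified sketch with invented names (the real mlx5 IRQ pool, and the IRQ sharing between SFs and the parent PF, are considerably more involved):

#include <linux/minmax.h>
#include <linux/pci.h>

/* Hypothetical helper: request roughly one MSI-X vector per EQ,
 * bounded by what the PCI function exposes; error handling elided.
 */
static int foo_alloc_eq_vectors(struct pci_dev *pdev, int num_eqs)
{
	int nvec = min(num_eqs, pci_msix_vec_count(pdev));

	return pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSIX);
}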
* [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function 2024-04-03 17:41 [net-next v3 0/2] devlink: Add port function attribute for IO EQs Parav Pandit 2024-04-03 17:41 ` [net-next v3 1/2] devlink: Support setting max_io_eqs Parav Pandit @ 2024-04-03 17:41 ` Parav Pandit 2024-04-04 3:51 ` Kalesh Anakkur Purayil 2024-04-04 16:57 ` David Wei 1 sibling, 2 replies; 18+ messages in thread From: Parav Pandit @ 2024-04-03 17:41 UTC (permalink / raw) To: netdev, davem, edumazet, kuba, pabeni, corbet, kalesh-anakkur.purayil Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc, linux-rdma, Parav Pandit Implement get and set for the maximum IO event queues for SF and VF. This enables administrator on the hypervisor to control the maximum IO event queues which are typically used to derive the maximum and default number of net device channels or rdma device completion vectors. Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> --- changelog: v2->v3: - limited to 80 chars per line in devlink - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock on error path v1->v2: - fixed comments from Kalesh - fixed missing kfree in get call - returning error code for get cmd failure - fixed error msg copy paste error in set on cmd failure - limited code to 80 chars limit - fixed set function variables for reverse christmas tree --- .../mellanox/mlx5/core/esw/devlink_port.c | 4 + .../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++ .../mellanox/mlx5/core/eswitch_offloads.c | 97 +++++++++++++++++++ 3 files changed, 108 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c index d8e739cbcbce..f8869c9b6802 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c @@ -98,6 +98,8 @@ static const struct devlink_port_ops mlx5_esw_pf_vf_dl_port_ops = { .port_fn_ipsec_packet_get = mlx5_devlink_port_fn_ipsec_packet_get, .port_fn_ipsec_packet_set = mlx5_devlink_port_fn_ipsec_packet_set, #endif /* CONFIG_XFRM_OFFLOAD */ + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get, + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set, }; static void mlx5_esw_offloads_sf_devlink_port_attrs_set(struct mlx5_eswitch *esw, @@ -143,6 +145,8 @@ static const struct devlink_port_ops mlx5_esw_dl_sf_port_ops = { .port_fn_state_get = mlx5_devlink_sf_port_fn_state_get, .port_fn_state_set = mlx5_devlink_sf_port_fn_state_set, #endif + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get, + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set, }; int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, struct mlx5_vport *vport) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 349e28a6dd8d..50ce1ea20dd4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -573,6 +573,13 @@ int mlx5_devlink_port_fn_ipsec_packet_get(struct devlink_port *port, bool *is_en int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, bool enable, struct netlink_ext_ack *extack); #endif /* CONFIG_XFRM_OFFLOAD */ +int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, + u32 *max_io_eqs, + struct netlink_ext_ack *extack); +int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, + u32 max_io_eqs, + struct netlink_ext_ack 
*extack); + void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type); int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c index baaae628b0a0..2ad50634b401 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c @@ -66,6 +66,8 @@ #define MLX5_ESW_FT_OFFLOADS_DROP_RULE (1) +#define MLX5_ESW_MAX_CTRL_EQS 4 + static struct esw_vport_tbl_namespace mlx5_esw_vport_tbl_mirror_ns = { .max_fte = MLX5_ESW_VPORT_TBL_SIZE, .max_num_groups = MLX5_ESW_VPORT_TBL_NUM_GROUPS, @@ -4557,3 +4559,98 @@ int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, return err; } #endif /* CONFIG_XFRM_OFFLOAD */ + +int +mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs, + struct netlink_ext_ack *extack) +{ + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port); + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); + u16 vport_num = vport->vport; + struct mlx5_eswitch *esw; + void *query_ctx; + void *hca_caps; + u32 max_eqs; + int err; + + esw = mlx5_devlink_eswitch_nocheck_get(port->devlink); + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { + NL_SET_ERR_MSG_MOD(extack, + "Device doesn't support VHCA management"); + return -EOPNOTSUPP; + } + + query_ctx = kzalloc(query_out_sz, GFP_KERNEL); + if (!query_ctx) + return -ENOMEM; + + mutex_lock(&esw->state_lock); + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx, + MLX5_CAP_GENERAL); + if (err) { + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps"); + goto out; + } + + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); + max_eqs = MLX5_GET(cmd_hca_cap, hca_caps, max_num_eqs); + if (max_eqs < MLX5_ESW_MAX_CTRL_EQS) + *max_io_eqs = 0; + else + *max_io_eqs = max_eqs - MLX5_ESW_MAX_CTRL_EQS; +out: + mutex_unlock(&esw->state_lock); + kfree(query_ctx); + return err; +} + +int +mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, u32 max_io_eqs, + struct netlink_ext_ack *extack) +{ + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port); + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); + u16 max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS; + u16 vport_num = vport->vport; + struct mlx5_eswitch *esw; + void *query_ctx; + void *hca_caps; + int err; + + esw = mlx5_devlink_eswitch_nocheck_get(port->devlink); + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { + NL_SET_ERR_MSG_MOD(extack, + "Device doesn't support VHCA management"); + return -EOPNOTSUPP; + } + + if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > USHRT_MAX) { + NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range"); + return -EINVAL; + } + + query_ctx = kzalloc(query_out_sz, GFP_KERNEL); + if (!query_ctx) + return -ENOMEM; + + mutex_lock(&esw->state_lock); + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx, + MLX5_CAP_GENERAL); + if (err) { + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps"); + goto out; + } + + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); + MLX5_SET(cmd_hca_cap, hca_caps, max_num_eqs, max_eqs); + + err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport_num, + MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE); + if (err) + NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA caps"); + +out: + mutex_unlock(&esw->state_lock); + kfree(query_ctx); + return err; +} -- 2.26.2 ^ permalink raw reply related [flat|nested] 18+ 
messages in thread
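To make the MLX5_ESW_MAX_CTRL_EQS bookkeeping in this patch concrete, a worked example with illustrative numbers: with 4 EQs reserved for control, setting max_io_eqs 32 programs max_num_eqs = 32 + 4 = 36 into the function's HCA capability; a later get reads 36 back and reports 36 - 4 = 32. A function whose max_num_eqs is below 4 is reported as max_io_eqs 0 rather than underflowing.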
* Re: [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function 2024-04-03 17:41 ` [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function Parav Pandit @ 2024-04-04 3:51 ` Kalesh Anakkur Purayil 2024-04-04 16:57 ` David Wei 1 sibling, 0 replies; 18+ messages in thread From: Kalesh Anakkur Purayil @ 2024-04-04 3:51 UTC (permalink / raw) To: Parav Pandit Cc: netdev, davem, edumazet, kuba, pabeni, corbet, saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc, linux-rdma [-- Attachment #1: Type: text/plain, Size: 8302 bytes --] On Wed, Apr 3, 2024 at 11:12 PM Parav Pandit <parav@nvidia.com> wrote: > > Implement get and set for the maximum IO event queues for SF and VF. > This enables administrator on the hypervisor to control the maximum > IO event queues which are typically used to derive the maximum and > default number of net device channels or rdma device completion vectors. > > Reviewed-by: Shay Drory <shayd@nvidia.com> > Signed-off-by: Parav Pandit <parav@nvidia.com> Thanks Parav for addressing the comments. Changes look good to me. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> > --- > changelog: > v2->v3: > - limited to 80 chars per line in devlink > - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock > on error path > v1->v2: > - fixed comments from Kalesh > - fixed missing kfree in get call > - returning error code for get cmd failure > - fixed error msg copy paste error in set on cmd failure > - limited code to 80 chars limit > - fixed set function variables for reverse christmas tree > --- > .../mellanox/mlx5/core/esw/devlink_port.c | 4 + > .../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++ > .../mellanox/mlx5/core/eswitch_offloads.c | 97 +++++++++++++++++++ > 3 files changed, 108 insertions(+) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c > index d8e739cbcbce..f8869c9b6802 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c > @@ -98,6 +98,8 @@ static const struct devlink_port_ops mlx5_esw_pf_vf_dl_port_ops = { > .port_fn_ipsec_packet_get = mlx5_devlink_port_fn_ipsec_packet_get, > .port_fn_ipsec_packet_set = mlx5_devlink_port_fn_ipsec_packet_set, > #endif /* CONFIG_XFRM_OFFLOAD */ > + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get, > + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set, > }; > > static void mlx5_esw_offloads_sf_devlink_port_attrs_set(struct mlx5_eswitch *esw, > @@ -143,6 +145,8 @@ static const struct devlink_port_ops mlx5_esw_dl_sf_port_ops = { > .port_fn_state_get = mlx5_devlink_sf_port_fn_state_get, > .port_fn_state_set = mlx5_devlink_sf_port_fn_state_set, > #endif > + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get, > + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set, > }; > > int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, struct mlx5_vport *vport) > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h > index 349e28a6dd8d..50ce1ea20dd4 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h > @@ -573,6 +573,13 @@ int mlx5_devlink_port_fn_ipsec_packet_get(struct devlink_port *port, bool *is_en > int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, bool enable, > struct netlink_ext_ack *extack); > #endif /* 
CONFIG_XFRM_OFFLOAD */ > +int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, > + u32 *max_io_eqs, > + struct netlink_ext_ack *extack); > +int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, > + u32 max_io_eqs, > + struct netlink_ext_ack *extack); > + > void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type); > > int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw, > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c > index baaae628b0a0..2ad50634b401 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c > @@ -66,6 +66,8 @@ > > #define MLX5_ESW_FT_OFFLOADS_DROP_RULE (1) > > +#define MLX5_ESW_MAX_CTRL_EQS 4 > + > static struct esw_vport_tbl_namespace mlx5_esw_vport_tbl_mirror_ns = { > .max_fte = MLX5_ESW_VPORT_TBL_SIZE, > .max_num_groups = MLX5_ESW_VPORT_TBL_NUM_GROUPS, > @@ -4557,3 +4559,98 @@ int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, > return err; > } > #endif /* CONFIG_XFRM_OFFLOAD */ > + > +int > +mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs, > + struct netlink_ext_ack *extack) > +{ > + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port); > + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); > + u16 vport_num = vport->vport; > + struct mlx5_eswitch *esw; > + void *query_ctx; > + void *hca_caps; > + u32 max_eqs; > + int err; > + > + esw = mlx5_devlink_eswitch_nocheck_get(port->devlink); > + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { > + NL_SET_ERR_MSG_MOD(extack, > + "Device doesn't support VHCA management"); > + return -EOPNOTSUPP; > + } > + > + query_ctx = kzalloc(query_out_sz, GFP_KERNEL); > + if (!query_ctx) > + return -ENOMEM; > + > + mutex_lock(&esw->state_lock); > + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx, > + MLX5_CAP_GENERAL); > + if (err) { > + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps"); > + goto out; > + } > + > + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); > + max_eqs = MLX5_GET(cmd_hca_cap, hca_caps, max_num_eqs); > + if (max_eqs < MLX5_ESW_MAX_CTRL_EQS) > + *max_io_eqs = 0; > + else > + *max_io_eqs = max_eqs - MLX5_ESW_MAX_CTRL_EQS; > +out: > + mutex_unlock(&esw->state_lock); > + kfree(query_ctx); > + return err; > +} > + > +int > +mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, u32 max_io_eqs, > + struct netlink_ext_ack *extack) > +{ > + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port); > + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); > + u16 max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS; > + u16 vport_num = vport->vport; > + struct mlx5_eswitch *esw; > + void *query_ctx; > + void *hca_caps; > + int err; > + > + esw = mlx5_devlink_eswitch_nocheck_get(port->devlink); > + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { > + NL_SET_ERR_MSG_MOD(extack, > + "Device doesn't support VHCA management"); > + return -EOPNOTSUPP; > + } > + > + if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > USHRT_MAX) { > + NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range"); > + return -EINVAL; > + } > + > + query_ctx = kzalloc(query_out_sz, GFP_KERNEL); > + if (!query_ctx) > + return -ENOMEM; > + > + mutex_lock(&esw->state_lock); > + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx, > + MLX5_CAP_GENERAL); > + if (err) { > + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA 
caps"); > + goto out; > + } > + > + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); > + MLX5_SET(cmd_hca_cap, hca_caps, max_num_eqs, max_eqs); > + > + err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport_num, > + MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE); > + if (err) > + NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA caps"); > + > +out: > + mutex_unlock(&esw->state_lock); > + kfree(query_ctx); > + return err; > +} > -- > 2.26.2 > -- Regards, Kalesh A P [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 4239 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function 2024-04-03 17:41 ` [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function Parav Pandit 2024-04-04 3:51 ` Kalesh Anakkur Purayil @ 2024-04-04 16:57 ` David Wei 2024-04-04 18:05 ` Parav Pandit 1 sibling, 1 reply; 18+ messages in thread From: David Wei @ 2024-04-04 16:57 UTC (permalink / raw) To: Parav Pandit, netdev, davem, edumazet, kuba, pabeni, corbet, kalesh-anakkur.purayil Cc: saeedm, leon, jiri, shayd, danielj, dchumak, linux-doc, linux-rdma On 2024-04-03 10:41, Parav Pandit wrote: > Implement get and set for the maximum IO event queues for SF and VF. > This enables administrator on the hypervisor to control the maximum > IO event queues which are typically used to derive the maximum and > default number of net device channels or rdma device completion vectors. > > Reviewed-by: Shay Drory <shayd@nvidia.com> > Signed-off-by: Parav Pandit <parav@nvidia.com> > --- > changelog: > v2->v3: > - limited to 80 chars per line in devlink > - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock > on error path > v1->v2: > - fixed comments from Kalesh > - fixed missing kfree in get call > - returning error code for get cmd failure > - fixed error msg copy paste error in set on cmd failure > - limited code to 80 chars limit > - fixed set function variables for reverse christmas tree > --- > .../mellanox/mlx5/core/esw/devlink_port.c | 4 + > .../net/ethernet/mellanox/mlx5/core/eswitch.h | 7 ++ > .../mellanox/mlx5/core/eswitch_offloads.c | 97 +++++++++++++++++++ > 3 files changed, 108 insertions(+) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c > index d8e739cbcbce..f8869c9b6802 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c > @@ -98,6 +98,8 @@ static const struct devlink_port_ops mlx5_esw_pf_vf_dl_port_ops = { > .port_fn_ipsec_packet_get = mlx5_devlink_port_fn_ipsec_packet_get, > .port_fn_ipsec_packet_set = mlx5_devlink_port_fn_ipsec_packet_set, > #endif /* CONFIG_XFRM_OFFLOAD */ > + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get, > + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set, > }; > > static void mlx5_esw_offloads_sf_devlink_port_attrs_set(struct mlx5_eswitch *esw, > @@ -143,6 +145,8 @@ static const struct devlink_port_ops mlx5_esw_dl_sf_port_ops = { > .port_fn_state_get = mlx5_devlink_sf_port_fn_state_get, > .port_fn_state_set = mlx5_devlink_sf_port_fn_state_set, > #endif > + .port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get, > + .port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set, > }; > > int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, struct mlx5_vport *vport) > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h > index 349e28a6dd8d..50ce1ea20dd4 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h > @@ -573,6 +573,13 @@ int mlx5_devlink_port_fn_ipsec_packet_get(struct devlink_port *port, bool *is_en > int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, bool enable, > struct netlink_ext_ack *extack); > #endif /* CONFIG_XFRM_OFFLOAD */ > +int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, > + u32 *max_io_eqs, > + struct netlink_ext_ack *extack); > +int 
mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, > + u32 max_io_eqs, > + struct netlink_ext_ack *extack); > + > void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type); > > int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw, > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c > index baaae628b0a0..2ad50634b401 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c > @@ -66,6 +66,8 @@ > > #define MLX5_ESW_FT_OFFLOADS_DROP_RULE (1) > > +#define MLX5_ESW_MAX_CTRL_EQS 4 > + > static struct esw_vport_tbl_namespace mlx5_esw_vport_tbl_mirror_ns = { > .max_fte = MLX5_ESW_VPORT_TBL_SIZE, > .max_num_groups = MLX5_ESW_VPORT_TBL_NUM_GROUPS, > @@ -4557,3 +4559,98 @@ int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, > return err; > } > #endif /* CONFIG_XFRM_OFFLOAD */ > + > +int > +mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs, > + struct netlink_ext_ack *extack) > +{ > + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port); > + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); > + u16 vport_num = vport->vport; > + struct mlx5_eswitch *esw; > + void *query_ctx; > + void *hca_caps; > + u32 max_eqs; > + int err; > + > + esw = mlx5_devlink_eswitch_nocheck_get(port->devlink); > + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { > + NL_SET_ERR_MSG_MOD(extack, > + "Device doesn't support VHCA management"); > + return -EOPNOTSUPP; > + } > + > + query_ctx = kzalloc(query_out_sz, GFP_KERNEL); > + if (!query_ctx) > + return -ENOMEM; > + > + mutex_lock(&esw->state_lock); > + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx, > + MLX5_CAP_GENERAL); > + if (err) { > + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps"); > + goto out; > + } > + > + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); > + max_eqs = MLX5_GET(cmd_hca_cap, hca_caps, max_num_eqs); > + if (max_eqs < MLX5_ESW_MAX_CTRL_EQS) > + *max_io_eqs = 0; > + else > + *max_io_eqs = max_eqs - MLX5_ESW_MAX_CTRL_EQS; > +out: > + mutex_unlock(&esw->state_lock); > + kfree(query_ctx); > + return err; > +} > + > +int > +mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, u32 max_io_eqs, > + struct netlink_ext_ack *extack) > +{ > + struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port); > + int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out); > + u16 max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS; > + u16 vport_num = vport->vport; > + struct mlx5_eswitch *esw; > + void *query_ctx; > + void *hca_caps; > + int err; > + > + esw = mlx5_devlink_eswitch_nocheck_get(port->devlink); > + if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) { > + NL_SET_ERR_MSG_MOD(extack, > + "Device doesn't support VHCA management"); > + return -EOPNOTSUPP; > + } > + > + if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > USHRT_MAX) { nit: this is the same as max_eqs > + NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range"); > + return -EINVAL; > + } > + > + query_ctx = kzalloc(query_out_sz, GFP_KERNEL); > + if (!query_ctx) > + return -ENOMEM; > + > + mutex_lock(&esw->state_lock); > + err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx, > + MLX5_CAP_GENERAL); > + if (err) { > + NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps"); > + goto out; > + } > + > + hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability); > + MLX5_SET(cmd_hca_cap, 
hca_caps, max_num_eqs, max_eqs); > + > + err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport_num, > + MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE); > + if (err) > + NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA caps"); > + > +out: > + mutex_unlock(&esw->state_lock); > + kfree(query_ctx); > + return err; > +} ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function
  2024-04-04 16:57   ` David Wei
@ 2024-04-04 18:05     ` Parav Pandit
  2024-04-04 18:44       ` David Wei
  0 siblings, 1 reply; 18+ messages in thread
From: Parav Pandit @ 2024-04-04 18:05 UTC (permalink / raw)
To: David Wei, netdev@vger.kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com
Cc: Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori,
	Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org,
	linux-rdma@vger.kernel.org

> From: David Wei <dw@davidwei.uk>
> Sent: Thursday, April 4, 2024 10:27 PM
>
> On 2024-04-03 10:41, Parav Pandit wrote:
> > Implement get and set for the maximum IO event queues for SF and VF.
> > This enables the administrator on the hypervisor to control the
> > maximum IO event queues, which are typically used to derive the
> > maximum and default number of net device channels or RDMA device
> > completion vectors.
> >
> > Reviewed-by: Shay Drory <shayd@nvidia.com>
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > ---
> > changelog:
> > v2->v3:
> > - limited to 80 chars per line in devlink
> > - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
> >   on error path
> > v1->v2:
> > - fixed comments from Kalesh
> > - fixed missing kfree in get call
> > - returning error code for get cmd failure
> > - fixed error msg copy paste error in set on cmd failure
> > - limited code to 80 chars limit
> > - fixed set function variables for reverse christmas tree
> > ---
> >  .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +
> >  .../net/ethernet/mellanox/mlx5/core/eswitch.h |  7 ++
> >  .../mellanox/mlx5/core/eswitch_offloads.c     | 97 +++++++++++++++++++
> >  3 files changed, 108 insertions(+)
> >
[.. quoted patch trimmed to the hunk under discussion; the full diff appears
unchanged in the original posting above ..]
> > +int
> > +mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, u32 max_io_eqs,
> > +				    struct netlink_ext_ack *extack)
> > +{
> > +	struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
> > +	int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
> > +	u16 max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS;
> > +	u16 vport_num = vport->vport;
> > +	struct mlx5_eswitch *esw;
> > +	void *query_ctx;
> > +	void *hca_caps;
> > +	int err;
> > +
> > +	esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
> > +	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
> > +		NL_SET_ERR_MSG_MOD(extack,
> > +				   "Device doesn't support VHCA management");
> > +		return -EOPNOTSUPP;
> > +	}
> > +
> > +	if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > USHRT_MAX) {
>
> nit: this is the same as max_eqs
>
Yes.
max_eqs may overflow, so reusing it in the check would miss the overflow.
The above if condition avoids that bug.

That reminds me that I can replace the above max_eqs line and this check
with check_add_overflow() and store the result in max_io_eqs. And save
one line. :)

Shall I resend with this change?

> > +		NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range");
> > +		return -EINVAL;
> > +	}
[.. remainder of the quoted patch trimmed ..]
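For readers following the exchange: the hazard Parav describes is that
max_eqs is declared u16, so the sum is truncated at the moment of
assignment, and a range check against the stored u16 could never fire. The
overflow test must therefore run in the wider u32 arithmetic, as the quoted
if condition does. A minimal standalone sketch with an illustrative
out-of-range value (U16_MAX_VAL stands in for the kernel's USHRT_MAX):

	#include <stdint.h>
	#include <stdio.h>

	#define MLX5_ESW_MAX_CTRL_EQS 4
	#define U16_MAX_VAL 65535u	/* stand-in for USHRT_MAX */

	int main(void)
	{
		uint32_t max_io_eqs = 65534;	/* hypothetical user input */
		uint16_t max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS;

		/* 65538 does not fit in u16; max_eqs silently wraps to 2. */
		printf("max_eqs = %u\n", (unsigned int)max_eqs);

		/* Checking the truncated u16 can never detect overflow... */
		if (max_eqs > U16_MAX_VAL)
			printf("never reached\n");

		/* ...so the patch checks the sum in u32 arithmetic. */
		if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > U16_MAX_VAL)
			printf("out-of-range value rejected\n");
		return 0;
	}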
* Re: [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function
  2024-04-04 18:05     ` Parav Pandit
@ 2024-04-04 18:44       ` David Wei
  0 siblings, 0 replies; 18+ messages in thread
From: David Wei @ 2024-04-04 18:44 UTC (permalink / raw)
To: Parav Pandit, netdev@vger.kernel.org, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	corbet@lwn.net, kalesh-anakkur.purayil@broadcom.com
Cc: Saeed Mahameed, leon@kernel.org, jiri@resnulli.us, Shay Drori,
	Dan Jurgens, Dima Chumak, linux-doc@vger.kernel.org,
	linux-rdma@vger.kernel.org

On 2024-04-04 11:05, Parav Pandit wrote:
>> From: David Wei <dw@davidwei.uk>
>> Sent: Thursday, April 4, 2024 10:27 PM
>>
>> On 2024-04-03 10:41, Parav Pandit wrote:
>>> Implement get and set for the maximum IO event queues for SF and VF.
[.. doubly quoted patch trimmed; only the lines under discussion are kept ..]
>>> +	u16 max_eqs = max_io_eqs + MLX5_ESW_MAX_CTRL_EQS;
[..]
>>> +	if (max_io_eqs + MLX5_ESW_MAX_CTRL_EQS > USHRT_MAX) {
>>
>> nit: this is the same as max_eqs
>>
> Yes.
> max_eqs may overflow, so reusing it in the check would miss the overflow.
> The above if condition avoids that bug.

Thanks, I missed that max_io_eqs is u32.

> That reminds me that I can replace the above max_eqs line and this check
> with check_add_overflow() and store the result in max_io_eqs. And save
> one line. :)
>
> Shall I resend with this change?

Yes that sounds good, thank you.
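A sketch of the consolidation agreed on above, as one reading of Parav's
check_add_overflow() suggestion; this is a fragment in the style of the
quoted set function, not the code of any posted revision (it stores into the
u16 max_eqs rather than max_io_eqs, since the u16 destination is what makes
the overflow check meaningful):

	u16 max_eqs;
	...
	/* check_add_overflow() returns true when the mathematically
	 * correct sum does not fit the destination type (u16 here),
	 * folding the addition and the USHRT_MAX range test into one.
	 */
	if (check_add_overflow(max_io_eqs, MLX5_ESW_MAX_CTRL_EQS, &max_eqs)) {
		NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range");
		return -EINVAL;
	}

As a side effect, this also covers values of max_io_eqs within 4 of U32_MAX,
which the open-coded u32 addition would wrap past undetected.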
end of thread, other threads:[~2024-04-05 16:35 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-03 17:41 [net-next v3 0/2] devlink: Add port function attribute for IO EQs Parav Pandit
2024-04-03 17:41 ` [net-next v3 1/2] devlink: Support setting max_io_eqs Parav Pandit
2024-04-04 16:59   ` David Wei
2024-04-04 17:58     ` Parav Pandit
2024-04-04 18:40       ` David Wei
2024-04-04 18:50         ` Parav Pandit
2024-04-04 18:52           ` David Wei
2024-04-04 18:55             ` Parav Pandit
2024-04-05  2:22               ` Jakub Kicinski
2024-04-05  2:21   ` Jakub Kicinski
2024-04-05  3:13     ` Parav Pandit
2024-04-05 14:13       ` Jakub Kicinski
2024-04-05 16:34         ` Parav Pandit
2024-04-03 17:41 ` [net-next v3 2/2] mlx5/core: Support max_io_eqs for a function Parav Pandit
2024-04-04  3:51   ` Kalesh Anakkur Purayil
2024-04-04 16:57   ` David Wei
2024-04-04 18:05     ` Parav Pandit
2024-04-04 18:44       ` David Wei