linux-doc.vger.kernel.org archive mirror
* [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains
@ 2025-02-13 18:01 Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 01/10] devlink: Remove unused param of devlink_rate_nodes_check Tariq Toukan
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

Hi,

This series introduces rate domains in the devlink core and the mlx5
driver. A detailed description by Cosmin follows below.

Regards,
Tariq


Devlink objects support rate management for tx scheduling, which
involves maintaining a tree of rate nodes that corresponds to the tx
schedulers in hardware. 'man devlink-rate' has the full details.

The tree of rate nodes is maintained per devlink object, protected by
the devlink lock.

There exists hardware capable of instantiating a tx scheduling tree
that spans multiple functions of the same physical device (and thus
multiple devlink objects), so the current API and locking scheme are
insufficient.

This patch series changes the devlink rate implementation and API to
allow supporting such hardware and managing tx scheduling trees across
multiple functions of a physical device.

Modeling this requires having devlink rate nodes with parents in other
devlink objects. A naive approach that relies on the current
one-lock-per-devlink model is impossible, as it would require in some
cases acquiring multiple devlink locks in the correct order.

The proposed solution is to move rates into a separate object named a
'rate domain'. Devlink objects create a private rate domain on init,
and hardware that supports cross-function tx scheduling can switch to
using a shared rate domain for a set of devlink objects. Shared rate
domains have an additional lock serializing access to rate nodes.
A new pair of devlink attributes is introduced for specifying a foreign
parent device, and the rate management devlink calls are changed to
allow setting a rate node's parent to the requested foreign parent
device. Finally, this API is used from mlx5 for NICs with the correct
capability bit to allow cross-function tx scheduling.

A note about net-shapers:
The net-shapers framework is completely orthogonal to this patch series.
net-shapers does shaping for tx queues, groups of queues and up to the
netdevice level. This patch series is for shaping across functions, so
it is strictly above the netdevice level in the shaping hierarchy.

This patch series was previously sent as an RFC ([1]).

Patches:

Small cleanup:
devlink: Remove unused param of devlink_rate_nodes_check

Introduce private rate domains:
devlink: Store devlink rates in a rate domain

Introduce rate domain locking (a no-op for now, as rate domains are private):
devlink: Serialize access to rate domains

Introduce shared rate domains and a global registry for them:
devlink: Introduce shared rate domains

Extend the devlink rate API with foreign parent devices:
devlink: Allow specifying parent device for rate commands
devlink: Allow rate node parents from other devlinks

Extend the mlx5 implementation with the ability to share qos domains:
net/mlx5: qos: Introduce shared esw qos domains

Use the newly introduced infrastructure to support cross-function tx scheduling:
net/mlx5: qos: Support cross-esw tx scheduling
net/mlx5: qos: Init shared devlink rate domain

Finally, update documentation:
net/mlx5: Document devlink rates and cross-esw scheduling

[1] https://lore.kernel.org/netdev/20241113203317.2507537-1-cratiu@nvidia.com/


Cosmin Ratiu (10):
  devlink: Remove unused param of devlink_rate_nodes_check
  devlink: Store devlink rates in a rate domain
  devlink: Serialize access to rate domains
  devlink: Introduce shared rate domains
  devlink: Allow specifying parent device for rate commands
  devlink: Allow rate node parents from other devlinks
  net/mlx5: qos: Introduce shared esw qos domains
  net/mlx5: qos: Support cross-esw tx scheduling
  net/mlx5: qos: Init shared devlink rate domain
  net/mlx5: Document devlink rates and cross-esw scheduling

 Documentation/netlink/specs/devlink.yaml      |  18 +-
 .../networking/devlink/devlink-port.rst       |   2 +
 Documentation/networking/devlink/mlx5.rst     |  33 +++
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 144 ++++++++++--
 include/net/devlink.h                         |   8 +
 include/uapi/linux/devlink.h                  |   3 +
 net/devlink/core.c                            |  86 ++++++-
 net/devlink/dev.c                             |   6 +-
 net/devlink/devl_internal.h                   |  34 ++-
 net/devlink/netlink.c                         |  74 ++++--
 net/devlink/netlink_gen.c                     |  20 +-
 net/devlink/netlink_gen.h                     |   7 +
 net/devlink/rate.c                            | 217 +++++++++++++-----
 13 files changed, 548 insertions(+), 104 deletions(-)


base-commit: 8dbf0c7556454b52af91bae305ca71500c31495c
-- 
2.45.0



* [PATCH net-next 01/10] devlink: Remove unused param of devlink_rate_nodes_check
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-18  2:54   ` Kalesh Anakkur Purayil
  2025-02-13 18:01 ` [PATCH net-next 02/10] devlink: Store devlink rates in a rate domain Tariq Toukan
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

The 'mode' parameter is unused, so remove it.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/devlink/dev.c           | 4 ++--
 net/devlink/devl_internal.h | 3 +--
 net/devlink/rate.c          | 3 +--
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/devlink/dev.c b/net/devlink/dev.c
index d6e3db300acb..5ff5d9055a8d 100644
--- a/net/devlink/dev.c
+++ b/net/devlink/dev.c
@@ -713,10 +713,10 @@ int devlink_nl_eswitch_set_doit(struct sk_buff *skb, struct genl_info *info)
 	if (info->attrs[DEVLINK_ATTR_ESWITCH_MODE]) {
 		if (!ops->eswitch_mode_set)
 			return -EOPNOTSUPP;
-		mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
-		err = devlink_rate_nodes_check(devlink, mode, info->extack);
+		err = devlink_rate_nodes_check(devlink, info->extack);
 		if (err)
 			return err;
+		mode = nla_get_u16(info->attrs[DEVLINK_ATTR_ESWITCH_MODE]);
 		err = ops->eswitch_mode_set(devlink, mode, info->extack);
 		if (err)
 			return err;
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index 14eaad9cfe35..9cc7a5f4a19f 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -297,8 +297,7 @@ int devlink_resources_validate(struct devlink *devlink,
 			       struct genl_info *info);
 
 /* Rates */
-int devlink_rate_nodes_check(struct devlink *devlink, u16 mode,
-			     struct netlink_ext_ack *extack);
+int devlink_rate_nodes_check(struct devlink *devlink, struct netlink_ext_ack *extack);
 
 /* Linecards */
 unsigned int devlink_linecard_index(struct devlink_linecard *linecard);
diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 8828ffaf6cbc..d163c61cdd98 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -561,8 +561,7 @@ int devlink_nl_rate_del_doit(struct sk_buff *skb, struct genl_info *info)
 	return err;
 }
 
-int devlink_rate_nodes_check(struct devlink *devlink, u16 mode,
-			     struct netlink_ext_ack *extack)
+int devlink_rate_nodes_check(struct devlink *devlink, struct netlink_ext_ack *extack)
 {
 	struct devlink_rate *devlink_rate;
 
-- 
2.45.0



* [PATCH net-next 02/10] devlink: Store devlink rates in a rate domain
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 01/10] devlink: Remove unused param of devlink_rate_nodes_check Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 03/10] devlink: Serialize access to rate domains Tariq Toukan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

Introduce the concept of a rate domain: a data structure that stores the
devlink rates of one or more devlink objects, so that rate nodes within
a rate domain can be parents of other rate nodes from the same domain.
This commit takes the first steps in that direction by:
- introducing a new 'devlink_rate_domain' data structure, allocated on
  devlink init.
- storing devlink rates from the devlink object into the rate domain.
- converting all accesses to the new data structure.
- adding checks for everything that walks rates in a rate domain so that
  the correct devlink object is looked at. These checks are currently
  redundant, since all rate domains are private to their devlink object,
  but that is about to change and this is a good moment to add them.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/devlink/core.c          | 11 ++++++++--
 net/devlink/dev.c           |  2 +-
 net/devlink/devl_internal.h |  7 ++++++-
 net/devlink/rate.c          | 41 ++++++++++++++++++++++---------------
 4 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/net/devlink/core.c b/net/devlink/core.c
index f49cd83f1955..06a2e2dce558 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -314,6 +314,7 @@ static void devlink_release(struct work_struct *work)
 	mutex_destroy(&devlink->lock);
 	lockdep_unregister_key(&devlink->lock_key);
 	put_device(devlink->dev);
+	kvfree(devlink->rate_domain);
 	kvfree(devlink);
 }
 
@@ -424,6 +425,11 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	if (!devlink)
 		return NULL;
 
+	devlink->rate_domain = kvzalloc(sizeof(*devlink->rate_domain), GFP_KERNEL);
+	if (!devlink->rate_domain)
+		goto err_rate_domain;
+	INIT_LIST_HEAD(&devlink->rate_domain->rate_list);
+
 	ret = xa_alloc_cyclic(&devlinks, &devlink->index, devlink, xa_limit_31b,
 			      &last_id, GFP_KERNEL);
 	if (ret < 0)
@@ -436,7 +442,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC);
 	xa_init_flags(&devlink->nested_rels, XA_FLAGS_ALLOC);
 	write_pnet(&devlink->_net, net);
-	INIT_LIST_HEAD(&devlink->rate_list);
 	INIT_LIST_HEAD(&devlink->linecard_list);
 	INIT_LIST_HEAD(&devlink->sb_list);
 	INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
@@ -455,6 +460,8 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	return devlink;
 
 err_xa_alloc:
+	kvfree(devlink->rate_domain);
+err_rate_domain:
 	kvfree(devlink);
 	return NULL;
 }
@@ -477,7 +484,7 @@ void devlink_free(struct devlink *devlink)
 	WARN_ON(!list_empty(&devlink->resource_list));
 	WARN_ON(!list_empty(&devlink->dpipe_table_list));
 	WARN_ON(!list_empty(&devlink->sb_list));
-	WARN_ON(!list_empty(&devlink->rate_list));
+	WARN_ON(!list_empty(&devlink->rate_domain->rate_list));
 	WARN_ON(!list_empty(&devlink->linecard_list));
 	WARN_ON(!xa_empty(&devlink->ports));
 
diff --git a/net/devlink/dev.c b/net/devlink/dev.c
index 5ff5d9055a8d..c926c75cc10d 100644
--- a/net/devlink/dev.c
+++ b/net/devlink/dev.c
@@ -434,7 +434,7 @@ static void devlink_reload_reinit_sanity_check(struct devlink *devlink)
 	WARN_ON(!list_empty(&devlink->trap_list));
 	WARN_ON(!list_empty(&devlink->dpipe_table_list));
 	WARN_ON(!list_empty(&devlink->sb_list));
-	WARN_ON(!list_empty(&devlink->rate_list));
+	WARN_ON(!list_empty(&devlink->rate_domain->rate_list));
 	WARN_ON(!list_empty(&devlink->linecard_list));
 	WARN_ON(!xa_empty(&devlink->ports));
 }
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index 9cc7a5f4a19f..209b4a4c7070 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -30,10 +30,14 @@ struct devlink_dev_stats {
 	u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
 };
 
+/* Stores devlink rates associated with a rate domain. */
+struct devlink_rate_domain {
+	struct list_head rate_list;
+};
+
 struct devlink {
 	u32 index;
 	struct xarray ports;
-	struct list_head rate_list;
 	struct list_head sb_list;
 	struct list_head dpipe_table_list;
 	struct list_head resource_list;
@@ -55,6 +59,7 @@ struct devlink {
 	 */
 	struct mutex lock;
 	struct lock_class_key lock_key;
+	struct devlink_rate_domain *rate_domain;
 	u8 reload_failed:1;
 	refcount_t refcount;
 	struct rcu_work rwork;
diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index d163c61cdd98..535863bb0c17 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -36,8 +36,9 @@ devlink_rate_node_get_by_name(struct devlink *devlink, const char *node_name)
 {
 	static struct devlink_rate *devlink_rate;
 
-	list_for_each_entry(devlink_rate, &devlink->rate_list, list) {
-		if (devlink_rate_is_node(devlink_rate) &&
+	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list) {
+		if (devlink_rate->devlink == devlink &&
+		    devlink_rate_is_node(devlink_rate) &&
 		    !strcmp(node_name, devlink_rate->name))
 			return devlink_rate;
 	}
@@ -166,16 +167,18 @@ void devlink_rates_notify_register(struct devlink *devlink)
 {
 	struct devlink_rate *rate_node;
 
-	list_for_each_entry(rate_node, &devlink->rate_list, list)
-		devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
+	list_for_each_entry(rate_node, &devlink->rate_domain->rate_list, list)
+		if (rate_node->devlink == devlink)
+			devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
 }
 
 void devlink_rates_notify_unregister(struct devlink *devlink)
 {
 	struct devlink_rate *rate_node;
 
-	list_for_each_entry_reverse(rate_node, &devlink->rate_list, list)
-		devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL);
+	list_for_each_entry_reverse(rate_node, &devlink->rate_domain->rate_list, list)
+		if (rate_node->devlink == devlink)
+			devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL);
 }
 
 static int
@@ -187,7 +190,7 @@ devlink_nl_rate_get_dump_one(struct sk_buff *msg, struct devlink *devlink,
 	int idx = 0;
 	int err = 0;
 
-	list_for_each_entry(devlink_rate, &devlink->rate_list, list) {
+	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list) {
 		enum devlink_command cmd = DEVLINK_CMD_RATE_NEW;
 		u32 id = NETLINK_CB(cb->skb).portid;
 
@@ -195,6 +198,9 @@ devlink_nl_rate_get_dump_one(struct sk_buff *msg, struct devlink *devlink,
 			idx++;
 			continue;
 		}
+		if (devlink_rate->devlink != devlink)
+			continue;
+
 		err = devlink_nl_rate_fill(msg, devlink_rate, cmd, id,
 					   cb->nlh->nlmsg_seq, flags, NULL);
 		if (err) {
@@ -522,7 +528,7 @@ int devlink_nl_rate_new_doit(struct sk_buff *skb, struct genl_info *info)
 		goto err_rate_set;
 
 	refcount_set(&rate_node->refcnt, 1);
-	list_add(&rate_node->list, &devlink->rate_list);
+	list_add(&rate_node->list, &devlink->rate_domain->rate_list);
 	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
 	return 0;
 
@@ -565,8 +571,9 @@ int devlink_rate_nodes_check(struct devlink *devlink, struct netlink_ext_ack *ex
 {
 	struct devlink_rate *devlink_rate;
 
-	list_for_each_entry(devlink_rate, &devlink->rate_list, list)
-		if (devlink_rate_is_node(devlink_rate)) {
+	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list)
+		if (devlink_rate->devlink == devlink &&
+		    devlink_rate_is_node(devlink_rate)) {
 			NL_SET_ERR_MSG(extack, "Rate node(s) exists.");
 			return -EBUSY;
 		}
@@ -612,7 +619,7 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
 	}
 
 	refcount_set(&rate_node->refcnt, 1);
-	list_add(&rate_node->list, &devlink->rate_list);
+	list_add(&rate_node->list, &devlink->rate_domain->rate_list);
 	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
 	return rate_node;
 }
@@ -650,7 +657,7 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv,
 	devlink_rate->devlink = devlink;
 	devlink_rate->devlink_port = devlink_port;
 	devlink_rate->priv = priv;
-	list_add_tail(&devlink_rate->list, &devlink->rate_list);
+	list_add_tail(&devlink_rate->list, &devlink->rate_domain->rate_list);
 	devlink_port->devlink_rate = devlink_rate;
 	devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW);
 
@@ -691,13 +698,13 @@ EXPORT_SYMBOL_GPL(devl_rate_leaf_destroy);
  */
 void devl_rate_nodes_destroy(struct devlink *devlink)
 {
-	static struct devlink_rate *devlink_rate, *tmp;
+	struct devlink_rate *devlink_rate, *tmp;
 	const struct devlink_ops *ops = devlink->ops;
 
 	devl_assert_locked(devlink);
 
-	list_for_each_entry(devlink_rate, &devlink->rate_list, list) {
-		if (!devlink_rate->parent)
+	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list) {
+		if (!devlink_rate->parent || devlink_rate->devlink != devlink)
 			continue;
 
 		refcount_dec(&devlink_rate->parent->refcnt);
@@ -708,8 +715,8 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
 			ops->rate_node_parent_set(devlink_rate, NULL, devlink_rate->priv,
 						  NULL, NULL);
 	}
-	list_for_each_entry_safe(devlink_rate, tmp, &devlink->rate_list, list) {
-		if (devlink_rate_is_node(devlink_rate)) {
+	list_for_each_entry_safe(devlink_rate, tmp, &devlink->rate_domain->rate_list, list) {
+		if (devlink_rate->devlink == devlink && devlink_rate_is_node(devlink_rate)) {
 			ops->rate_node_del(devlink_rate, devlink_rate->priv, NULL);
 			list_del(&devlink_rate->list);
 			kfree(devlink_rate->name);
-- 
2.45.0



* [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 01/10] devlink: Remove unused param of devlink_rate_nodes_check Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 02/10] devlink: Store devlink rates in a rate domain Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-14 12:54   ` Jiri Pirko
  2025-02-13 18:01 ` [PATCH net-next 04/10] devlink: Introduce shared " Tariq Toukan
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

Access to rates in a rate domain should be serialized.

This commit introduces two new functions, devl_rate_domain_lock and
devl_rate_domain_unlock, and uses them whenever serial access to a rate
domain is needed. For now, they are no-ops since access to a rate domain
is protected by the devlink lock.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/devlink/devl_internal.h |   4 ++
 net/devlink/rate.c          | 114 +++++++++++++++++++++++++++---------
 2 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index 209b4a4c7070..fae81dd6953f 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -121,6 +121,10 @@ static inline void devl_dev_unlock(struct devlink *devlink, bool dev_lock)
 		device_unlock(devlink->dev);
 }
 
+static inline void devl_rate_domain_lock(struct devlink *devlink) { }
+
+static inline void devl_rate_domain_unlock(struct devlink *devlink) { }
+
 typedef void devlink_rel_notify_cb_t(struct devlink *devlink, u32 obj_index);
 typedef void devlink_rel_cleanup_cb_t(struct devlink *devlink, u32 obj_index,
 				      u32 rel_index);
diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 535863bb0c17..54e6a9893e3d 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -167,18 +167,22 @@ void devlink_rates_notify_register(struct devlink *devlink)
 {
 	struct devlink_rate *rate_node;
 
+	devl_rate_domain_lock(devlink);
 	list_for_each_entry(rate_node, &devlink->rate_domain->rate_list, list)
 		if (rate_node->devlink == devlink)
 			devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
+	devl_rate_domain_unlock(devlink);
 }
 
 void devlink_rates_notify_unregister(struct devlink *devlink)
 {
 	struct devlink_rate *rate_node;
 
+	devl_rate_domain_lock(devlink);
 	list_for_each_entry_reverse(rate_node, &devlink->rate_domain->rate_list, list)
 		if (rate_node->devlink == devlink)
 			devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL);
+	devl_rate_domain_unlock(devlink);
 }
 
 static int
@@ -190,6 +194,7 @@ devlink_nl_rate_get_dump_one(struct sk_buff *msg, struct devlink *devlink,
 	int idx = 0;
 	int err = 0;
 
+	devl_rate_domain_lock(devlink);
 	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list) {
 		enum devlink_command cmd = DEVLINK_CMD_RATE_NEW;
 		u32 id = NETLINK_CB(cb->skb).portid;
@@ -209,6 +214,7 @@ devlink_nl_rate_get_dump_one(struct sk_buff *msg, struct devlink *devlink,
 		}
 		idx++;
 	}
+	devl_rate_domain_unlock(devlink);
 
 	return err;
 }
@@ -225,23 +231,33 @@ int devlink_nl_rate_get_doit(struct sk_buff *skb, struct genl_info *info)
 	struct sk_buff *msg;
 	int err;
 
+	devl_rate_domain_lock(devlink);
 	devlink_rate = devlink_rate_get_from_info(devlink, info);
-	if (IS_ERR(devlink_rate))
-		return PTR_ERR(devlink_rate);
+	if (IS_ERR(devlink_rate)) {
+		err = PTR_ERR(devlink_rate);
+		goto unlock;
+	}
 
 	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
-	if (!msg)
-		return -ENOMEM;
+	if (!msg) {
+		err = -ENOMEM;
+		goto unlock;
+	}
 
 	err = devlink_nl_rate_fill(msg, devlink_rate, DEVLINK_CMD_RATE_NEW,
 				   info->snd_portid, info->snd_seq, 0,
 				   info->extack);
-	if (err) {
-		nlmsg_free(msg);
-		return err;
-	}
+	if (err)
+		goto err_fill;
 
+	devl_rate_domain_unlock(devlink);
 	return genlmsg_reply(msg, info);
+
+err_fill:
+	nlmsg_free(msg);
+unlock:
+	devl_rate_domain_unlock(devlink);
+	return err;
 }
 
 static bool
@@ -470,18 +486,24 @@ int devlink_nl_rate_set_doit(struct sk_buff *skb, struct genl_info *info)
 	const struct devlink_ops *ops;
 	int err;
 
+	devl_rate_domain_lock(devlink);
 	devlink_rate = devlink_rate_get_from_info(devlink, info);
-	if (IS_ERR(devlink_rate))
-		return PTR_ERR(devlink_rate);
+	if (IS_ERR(devlink_rate)) {
+		err = PTR_ERR(devlink_rate);
+		goto unlock;
+	}
 
 	ops = devlink->ops;
-	if (!ops || !devlink_rate_set_ops_supported(ops, info, devlink_rate->type))
-		return -EOPNOTSUPP;
+	if (!ops || !devlink_rate_set_ops_supported(ops, info, devlink_rate->type)) {
+		err = -EOPNOTSUPP;
+		goto unlock;
+	}
 
 	err = devlink_nl_rate_set(devlink_rate, ops, info);
-
 	if (!err)
 		devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW);
+unlock:
+	devl_rate_domain_unlock(devlink);
 	return err;
 }
 
@@ -501,15 +523,21 @@ int devlink_nl_rate_new_doit(struct sk_buff *skb, struct genl_info *info)
 	if (!devlink_rate_set_ops_supported(ops, info, DEVLINK_RATE_TYPE_NODE))
 		return -EOPNOTSUPP;
 
+	devl_rate_domain_lock(devlink);
 	rate_node = devlink_rate_node_get_from_attrs(devlink, info->attrs);
-	if (!IS_ERR(rate_node))
-		return -EEXIST;
-	else if (rate_node == ERR_PTR(-EINVAL))
-		return -EINVAL;
+	if (!IS_ERR(rate_node)) {
+		err = -EEXIST;
+		goto unlock;
+	} else if (rate_node == ERR_PTR(-EINVAL)) {
+		err = -EINVAL;
+		goto unlock;
+	}
 
 	rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL);
-	if (!rate_node)
-		return -ENOMEM;
+	if (!rate_node) {
+		err = -ENOMEM;
+		goto unlock;
+	}
 
 	rate_node->devlink = devlink;
 	rate_node->type = DEVLINK_RATE_TYPE_NODE;
@@ -530,6 +558,7 @@ int devlink_nl_rate_new_doit(struct sk_buff *skb, struct genl_info *info)
 	refcount_set(&rate_node->refcnt, 1);
 	list_add(&rate_node->list, &devlink->rate_domain->rate_list);
 	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
+	devl_rate_domain_unlock(devlink);
 	return 0;
 
 err_rate_set:
@@ -538,6 +567,8 @@ int devlink_nl_rate_new_doit(struct sk_buff *skb, struct genl_info *info)
 	kfree(rate_node->name);
 err_strdup:
 	kfree(rate_node);
+unlock:
+	devl_rate_domain_unlock(devlink);
 	return err;
 }
 
@@ -547,13 +578,17 @@ int devlink_nl_rate_del_doit(struct sk_buff *skb, struct genl_info *info)
 	struct devlink_rate *rate_node;
 	int err;
 
+	devl_rate_domain_lock(devlink);
 	rate_node = devlink_rate_node_get_from_info(devlink, info);
-	if (IS_ERR(rate_node))
-		return PTR_ERR(rate_node);
+	if (IS_ERR(rate_node)) {
+		err = PTR_ERR(rate_node);
+		goto unlock;
+	}
 
 	if (refcount_read(&rate_node->refcnt) > 1) {
 		NL_SET_ERR_MSG(info->extack, "Node has children. Cannot delete node.");
-		return -EBUSY;
+		err = -EBUSY;
+		goto unlock;
 	}
 
 	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_DEL);
@@ -564,20 +599,26 @@ int devlink_nl_rate_del_doit(struct sk_buff *skb, struct genl_info *info)
 	list_del(&rate_node->list);
 	kfree(rate_node->name);
 	kfree(rate_node);
+unlock:
+	devl_rate_domain_unlock(devlink);
 	return err;
 }
 
 int devlink_rate_nodes_check(struct devlink *devlink, struct netlink_ext_ack *extack)
 {
 	struct devlink_rate *devlink_rate;
+	int err = 0;
 
+	devl_rate_domain_lock(devlink);
 	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list)
 		if (devlink_rate->devlink == devlink &&
 		    devlink_rate_is_node(devlink_rate)) {
 			NL_SET_ERR_MSG(extack, "Rate node(s) exists.");
-			return -EBUSY;
+			err = -EBUSY;
+			break;
 		}
-	return 0;
+	devl_rate_domain_unlock(devlink);
+	return err;
 }
 
 /**
@@ -595,13 +636,19 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
 {
 	struct devlink_rate *rate_node;
 
+	devl_rate_domain_lock(devlink);
+
 	rate_node = devlink_rate_node_get_by_name(devlink, node_name);
-	if (!IS_ERR(rate_node))
-		return ERR_PTR(-EEXIST);
+	if (!IS_ERR(rate_node)) {
+		rate_node = ERR_PTR(-EEXIST);
+		goto unlock;
+	}
 
 	rate_node = kzalloc(sizeof(*rate_node), GFP_KERNEL);
-	if (!rate_node)
-		return ERR_PTR(-ENOMEM);
+	if (!rate_node) {
+		rate_node = ERR_PTR(-ENOMEM);
+		goto unlock;
+	}
 
 	if (parent) {
 		rate_node->parent = parent;
@@ -615,12 +662,15 @@ devl_rate_node_create(struct devlink *devlink, void *priv, char *node_name,
 	rate_node->name = kstrdup(node_name, GFP_KERNEL);
 	if (!rate_node->name) {
 		kfree(rate_node);
-		return ERR_PTR(-ENOMEM);
+		rate_node = ERR_PTR(-ENOMEM);
+		goto unlock;
 	}
 
 	refcount_set(&rate_node->refcnt, 1);
 	list_add(&rate_node->list, &devlink->rate_domain->rate_list);
 	devlink_rate_notify(rate_node, DEVLINK_CMD_RATE_NEW);
+unlock:
+	devl_rate_domain_unlock(devlink);
 	return rate_node;
 }
 EXPORT_SYMBOL_GPL(devl_rate_node_create);
@@ -648,6 +698,7 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv,
 	if (!devlink_rate)
 		return -ENOMEM;
 
+	devl_rate_domain_lock(devlink);
 	if (parent) {
 		devlink_rate->parent = parent;
 		refcount_inc(&devlink_rate->parent->refcnt);
@@ -660,6 +711,7 @@ int devl_rate_leaf_create(struct devlink_port *devlink_port, void *priv,
 	list_add_tail(&devlink_rate->list, &devlink->rate_domain->rate_list);
 	devlink_port->devlink_rate = devlink_rate;
 	devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_NEW);
+	devl_rate_domain_unlock(devlink);
 
 	return 0;
 }
@@ -680,11 +732,13 @@ void devl_rate_leaf_destroy(struct devlink_port *devlink_port)
 	if (!devlink_rate)
 		return;
 
+	devl_rate_domain_lock(devlink_port->devlink);
 	devlink_rate_notify(devlink_rate, DEVLINK_CMD_RATE_DEL);
 	if (devlink_rate->parent)
 		refcount_dec(&devlink_rate->parent->refcnt);
 	list_del(&devlink_rate->list);
 	devlink_port->devlink_rate = NULL;
+	devl_rate_domain_unlock(devlink_port->devlink);
 	kfree(devlink_rate);
 }
 EXPORT_SYMBOL_GPL(devl_rate_leaf_destroy);
@@ -702,6 +756,7 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
 	const struct devlink_ops *ops = devlink->ops;
 
 	devl_assert_locked(devlink);
+	devl_rate_domain_lock(devlink);
 
 	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list) {
 		if (!devlink_rate->parent || devlink_rate->devlink != devlink)
@@ -723,5 +778,6 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
 			kfree(devlink_rate);
 		}
 	}
+	devl_rate_domain_unlock(devlink);
 }
 EXPORT_SYMBOL_GPL(devl_rate_nodes_destroy);
-- 
2.45.0



* [PATCH net-next 04/10] devlink: Introduce shared rate domains
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
                   ` (2 preceding siblings ...)
  2025-02-13 18:01 ` [PATCH net-next 03/10] devlink: Serialize access to rate domains Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 05/10] devlink: Allow specifying parent device for rate commands Tariq Toukan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

The underlying idea is modeling a piece of hardware which:
1. Exposes multiple functions as separate devlink objects.
2. Is capable of instantiating a transmit scheduling tree spanning
   multiple functions.

Modeling this requires devlink rate nodes with parents across other
devlink objects. A naive approach that relies on the current
one-lock-per-devlink model is impossible, as it would require in some
cases acquiring multiple devlink locks in the correct order.

Based on the preliminary patches in this series, this commit introduces
the concept of a shared rate domain.

1. A shared rate domain stores rate nodes for a piece of hardware that
   has the properties described at the beginning.
2. A shared rate domain is identified by the devlink operations pointer
   (a proxy for the device type) and a unique u64 hardware identifier
   provided by the driver.
3. There is a global registry of reference counted shared rate domains.
4. A devlink object starts out with a private rate domain, and can be
   switched once to use a shared rate domain with
   devlink_shared_rate_domain_init. Further calls do nothing.
5. Shared rate domains have an additional mutex serializing access to
   rate nodes, acquired by the previously introduced functions
   devl_rate_domain_lock and devl_rate_domain_unlock.

These new code paths are unused for now. A caller to
devlink_shared_rate_domain_init will be added in a subsequent patch.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 include/net/devlink.h       |  8 ++++
 net/devlink/core.c          | 79 ++++++++++++++++++++++++++++++++++++-
 net/devlink/dev.c           |  2 +-
 net/devlink/devl_internal.h | 26 ++++++++++--
 net/devlink/rate.c          | 15 +++++++
 5 files changed, 124 insertions(+), 6 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b8783126c1ed..a9675c1810e6 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1561,6 +1561,14 @@ void devlink_register(struct devlink *devlink);
 void devlink_unregister(struct devlink *devlink);
 void devlink_free(struct devlink *devlink);
 
+/* Can be used to tell devlink that shared rate domains are supported.
+ * The same id needs to be provided for devlink objects that can share
+ * rate nodes in hw (e.g. contain nodes with parents in other devlink objects).
+ * This requires holding the devlink lock and can only be called once per object.
+ * Rate node relationships across different rate domains are not supported.
+ */
+int devlink_shared_rate_domain_init(struct devlink *devlink, u64 id);
+
 /**
  * struct devlink_port_ops - Port operations
  * @port_split: Callback used to split the port into multiple ones.
diff --git a/net/devlink/core.c b/net/devlink/core.c
index 06a2e2dce558..9d374d84225a 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -289,6 +289,80 @@ void devl_unlock(struct devlink *devlink)
 }
 EXPORT_SYMBOL_GPL(devl_unlock);
 
+/* A global data struct with all shared rate domains. */
+static struct {
+	struct mutex lock;    /* Acquired after the devlink lock. */
+	struct list_head rate_domains;
+} devlink_rate_domains = {
+	.lock = __MUTEX_INITIALIZER(devlink_rate_domains.lock),
+	.rate_domains = LIST_HEAD_INIT(devlink_rate_domains.rate_domains),
+};
+
+static bool devlink_rate_domain_eq(struct devlink_rate_domain *rate_domain,
+				   const struct devlink_ops *ops, u64 id)
+{
+	return rate_domain->ops == ops && rate_domain->id == id;
+}
+
+int devlink_shared_rate_domain_init(struct devlink *devlink, u64 id)
+{
+	struct devlink_rate_domain *rate_domain;
+	int err = 0;
+
+	devl_assert_locked(devlink);
+
+	if (devlink->rate_domain->shared) {
+		if (devlink_rate_domain_eq(devlink->rate_domain, devlink->ops, id))
+			return 0;
+		return -EEXIST;
+	}
+	if (!list_empty(&devlink->rate_domain->rate_list))
+		return -EINVAL;
+
+	mutex_lock(&devlink_rate_domains.lock);
+	list_for_each_entry(rate_domain, &devlink_rate_domains.rate_domains, list) {
+		if (devlink_rate_domain_eq(rate_domain, devlink->ops, id)) {
+			refcount_inc(&rate_domain->refcount);
+			goto replace_domain;
+		}
+	}
+
+	/* Shared domain not found, create one. */
+	rate_domain = kvzalloc(sizeof(*rate_domain), GFP_KERNEL);
+	if (!rate_domain) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+	rate_domain->shared = true;
+	rate_domain->ops = devlink->ops;
+	rate_domain->id = id;
+	mutex_init(&rate_domain->lock);
+	INIT_LIST_HEAD(&rate_domain->rate_list);
+	refcount_set(&rate_domain->refcount, 1);
+	list_add_tail(&rate_domain->list, &devlink_rate_domains.rate_domains);
+replace_domain:
+	kvfree(devlink->rate_domain);
+	devlink->rate_domain = rate_domain;
+unlock:
+	mutex_unlock(&devlink_rate_domains.lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_shared_rate_domain_init);
+
+static void devlink_rate_domain_put(struct devlink_rate_domain *rate_domain)
+{
+	if (rate_domain->shared) {
+		if (!refcount_dec_and_test(&rate_domain->refcount))
+			return;
+
+		WARN_ON(!list_empty(&rate_domain->rate_list));
+		mutex_lock(&devlink_rate_domains.lock);
+		list_del(&rate_domain->list);
+		mutex_unlock(&devlink_rate_domains.lock);
+	}
+	kvfree(rate_domain);
+}
+
 /**
  * devlink_try_get() - try to obtain a reference on a devlink instance
  * @devlink: instance to reference
@@ -314,7 +388,7 @@ static void devlink_release(struct work_struct *work)
 	mutex_destroy(&devlink->lock);
 	lockdep_unregister_key(&devlink->lock_key);
 	put_device(devlink->dev);
-	kvfree(devlink->rate_domain);
+	devlink_rate_domain_put(devlink->rate_domain);
 	kvfree(devlink);
 }
 
@@ -428,6 +502,7 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	devlink->rate_domain = kvzalloc(sizeof(*devlink->rate_domain), GFP_KERNEL);
 	if (!devlink->rate_domain)
 		goto err_rate_domain;
+	devlink->rate_domain->shared = false;
 	INIT_LIST_HEAD(&devlink->rate_domain->rate_list);
 
 	ret = xa_alloc_cyclic(&devlinks, &devlink->index, devlink, xa_limit_31b,
@@ -484,7 +559,7 @@ void devlink_free(struct devlink *devlink)
 	WARN_ON(!list_empty(&devlink->resource_list));
 	WARN_ON(!list_empty(&devlink->dpipe_table_list));
 	WARN_ON(!list_empty(&devlink->sb_list));
-	WARN_ON(!list_empty(&devlink->rate_domain->rate_list));
+	WARN_ON(devlink_rates_check(devlink));
 	WARN_ON(!list_empty(&devlink->linecard_list));
 	WARN_ON(!xa_empty(&devlink->ports));
 
diff --git a/net/devlink/dev.c b/net/devlink/dev.c
index c926c75cc10d..84353a85e8fe 100644
--- a/net/devlink/dev.c
+++ b/net/devlink/dev.c
@@ -434,7 +434,7 @@ static void devlink_reload_reinit_sanity_check(struct devlink *devlink)
 	WARN_ON(!list_empty(&devlink->trap_list));
 	WARN_ON(!list_empty(&devlink->dpipe_table_list));
 	WARN_ON(!list_empty(&devlink->sb_list));
-	WARN_ON(!list_empty(&devlink->rate_domain->rate_list));
+	WARN_ON(devlink_rates_check(devlink));
 	WARN_ON(!list_empty(&devlink->linecard_list));
 	WARN_ON(!xa_empty(&devlink->ports));
 }
diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index fae81dd6953f..7401aab274e5 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -30,9 +30,20 @@ struct devlink_dev_stats {
 	u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
 };
 
-/* Stores devlink rates associated with a rate domain. */
+/* Stores devlink rates associated with a rate domain.
+ * Multiple devlink objects may share the same domain (when 'shared' is true)
+ * and rate nodes can have members from multiple devices.
+ */
 struct devlink_rate_domain {
+	bool shared;
+	struct list_head list;
 	struct list_head rate_list;
+	/* Fields below are only used for shared rate domains. */
+	const struct devlink_ops *ops;
+	u64 id;
+	refcount_t refcount;
+	/* Serializes access to rates. */
+	struct mutex lock;
 };
 
 struct devlink {
@@ -121,9 +132,17 @@ static inline void devl_dev_unlock(struct devlink *devlink, bool dev_lock)
 		device_unlock(devlink->dev);
 }
 
-static inline void devl_rate_domain_lock(struct devlink *devlink) { }
+static inline void devl_rate_domain_lock(struct devlink *devlink)
+{
+	if (devlink->rate_domain->shared)
+		mutex_lock(&devlink->rate_domain->lock);
+}
 
-static inline void devl_rate_domain_unlock(struct devlink *devlink) { }
+static inline void devl_rate_domain_unlock(struct devlink *devlink)
+{
+	if (devlink->rate_domain->shared)
+		mutex_unlock(&devlink->rate_domain->lock);
+}
 
 typedef void devlink_rel_notify_cb_t(struct devlink *devlink, u32 obj_index);
 typedef void devlink_rel_cleanup_cb_t(struct devlink *devlink, u32 obj_index,
@@ -307,6 +326,7 @@ int devlink_resources_validate(struct devlink *devlink,
 
 /* Rates */
 int devlink_rate_nodes_check(struct devlink *devlink, struct netlink_ext_ack *extack);
+int devlink_rates_check(struct devlink *devlink);
 
 /* Linecards */
 unsigned int devlink_linecard_index(struct devlink_linecard *linecard);
diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 54e6a9893e3d..38f18216eb80 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -621,6 +621,21 @@ int devlink_rate_nodes_check(struct devlink *devlink, struct netlink_ext_ack *ex
 	return err;
 }
 
+int devlink_rates_check(struct devlink *devlink)
+{
+	struct devlink_rate *devlink_rate;
+	int err = 0;
+
+	devl_rate_domain_lock(devlink);
+	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list)
+		if (devlink_rate->devlink == devlink) {
+			err = -EBUSY;
+			break;
+		}
+	devl_rate_domain_unlock(devlink);
+	return err;
+}
+
 /**
  * devl_rate_node_create - create devlink rate node
  * @devlink: devlink instance
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 05/10] devlink: Allow specifying parent device for rate commands
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
                   ` (3 preceding siblings ...)
  2025-02-13 18:01 ` [PATCH net-next 04/10] devlink: Introduce shared " Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 06/10] devlink: Allow rate node parents from other devlinks Tariq Toukan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

Previously, the parent device was assumed to be the same as the one
owning the rate node being manipulated.

This patch expands that by allowing rate commands to accept two
additional arguments, the parent bus name and the parent dev name.
Besides introducing these two attributes in the uapi, this patch:
- adds plumbing in the 'rate-new' and 'rate-set' commands to access
  these attributes.
- extends devlink-nl-pre-doit to look up the parent dev and save a
  reference to it in info->user_ptr[1].
  The parent dev must exist and be registered; it is then unlocked but
  not put, which keeps it alive until the end of the request.
- extends devlink-nl-post-doit to put the parent reference, if one was
  obtained.

Example usage with ynl:
Setting the parent to a rate node on another device:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
  --do rate-set --json '{ \
    "bus-name": "pci", \
    "dev-name": "0000:08:00.1", \
    "port-index": 65537, \
    "parent-dev-bus-name": "pci", \
    "parent-dev-name": "0000:08:00.0", \
    "rate-parent-node-name": "group1" \
  }'

Getting the rate details:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
  --do rate-get --json '{ \
    "bus-name": "pci", \
    "dev-name": "0000:08:00.1", \
    "port-index": 65537 \
  }'

Output:
{'bus-name': 'pci',
 'dev-name': '0000:08:00.1',
 'parent-dev-bus-name': 'pci',
 'parent-dev-name': '0000:08:00.0',
 'port-index': 65537,
 'rate-parent-node-name': 'group1',
 'rate-tc-bws': <snipped for brevity>
 'rate-tx-max': 0,
 'rate-tx-priority': 0,
 'rate-tx-share': 0,
 'rate-tx-weight': 0,
 'rate-type': 'leaf'}

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 Documentation/netlink/specs/devlink.yaml | 18 ++++--
 include/uapi/linux/devlink.h             |  3 +
 net/devlink/netlink.c                    | 74 +++++++++++++++++++-----
 net/devlink/netlink_gen.c                | 20 ++++---
 net/devlink/netlink_gen.h                |  7 +++
 5 files changed, 95 insertions(+), 27 deletions(-)

diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index 09fbb4c03fc8..252db0f080ba 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -820,6 +820,12 @@ attribute-sets:
       -
         name: region-direct
         type: flag
+      -
+        name: parent-dev-bus-name
+        type: string
+      -
+        name: parent-dev-name
+        type: string
 
   -
     name: dl-dev-stats
@@ -2137,8 +2143,8 @@ operations:
       dont-validate: [ strict ]
       flags: [ admin-perm ]
       do:
-        pre: devlink-nl-pre-doit
-        post: devlink-nl-post-doit
+        pre: devlink-nl-pre-doit-parent-dev-optional
+        post: devlink-nl-post-doit-parent-dev-optional
         request:
           attributes:
             - bus-name
@@ -2149,6 +2155,8 @@ operations:
             - rate-tx-priority
             - rate-tx-weight
             - rate-parent-node-name
+            - parent-dev-bus-name
+            - parent-dev-name
 
     -
       name: rate-new
@@ -2157,8 +2165,8 @@ operations:
       dont-validate: [ strict ]
       flags: [ admin-perm ]
       do:
-        pre: devlink-nl-pre-doit
-        post: devlink-nl-post-doit
+        pre: devlink-nl-pre-doit-parent-dev-optional
+        post: devlink-nl-post-doit-parent-dev-optional
         request:
           attributes:
             - bus-name
@@ -2169,6 +2177,8 @@ operations:
             - rate-tx-priority
             - rate-tx-weight
             - rate-parent-node-name
+            - parent-dev-bus-name
+            - parent-dev-name
 
     -
       name: rate-del
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9401aa343673..a7b62e53b7eb 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -614,6 +614,9 @@ enum devlink_attr {
 
 	DEVLINK_ATTR_REGION_DIRECT,		/* flag */
 
+	DEVLINK_ATTR_PARENT_DEV_BUS_NAME,	/* string */
+	DEVLINK_ATTR_PARENT_DEV_NAME,		/* string */
+
 	/* Add new attributes above here, update the spec in
 	 * Documentation/netlink/specs/devlink.yaml and re-generate
 	 * net/devlink/netlink_gen.c.
diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c
index 593605c1b1ef..d8e7e23afce2 100644
--- a/net/devlink/netlink.c
+++ b/net/devlink/netlink.c
@@ -12,6 +12,7 @@
 #define DEVLINK_NL_FLAG_NEED_PORT		BIT(0)
 #define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT	BIT(1)
 #define DEVLINK_NL_FLAG_NEED_DEV_LOCK		BIT(2)
+#define DEVLINK_NL_FLAG_NEED_PARENT_DEV		BIT(3)
 
 static const struct genl_multicast_group devlink_nl_mcgrps[] = {
 	[DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME },
@@ -177,20 +178,15 @@ int devlink_nl_msg_reply_and_new(struct sk_buff **msg, struct genl_info *info)
 	return 0;
 }
 
-struct devlink *
-devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs,
-			    bool dev_lock)
+static struct devlink *
+devlink_get_lock(struct net *net, struct nlattr *bus_attr, struct nlattr *dev_attr, bool dev_lock)
 {
+	char *busname, *devname;
 	struct devlink *devlink;
 	unsigned long index;
-	char *busname;
-	char *devname;
-
-	if (!attrs[DEVLINK_ATTR_BUS_NAME] || !attrs[DEVLINK_ATTR_DEV_NAME])
-		return ERR_PTR(-EINVAL);
 
-	busname = nla_data(attrs[DEVLINK_ATTR_BUS_NAME]);
-	devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]);
+	busname = nla_data(bus_attr);
+	devname = nla_data(dev_attr);
 
 	devlinks_xa_for_each_registered_get(net, index, devlink) {
 		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
@@ -206,19 +202,45 @@ devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs,
 	return ERR_PTR(-ENODEV);
 }
 
+struct devlink *
+devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs, bool dev_lock)
+{
+	if (!attrs[DEVLINK_ATTR_BUS_NAME] || !attrs[DEVLINK_ATTR_DEV_NAME])
+		return ERR_PTR(-EINVAL);
+
+	return devlink_get_lock(net, attrs[DEVLINK_ATTR_BUS_NAME],
+				attrs[DEVLINK_ATTR_DEV_NAME], dev_lock);
+}
+
 static int __devlink_nl_pre_doit(struct sk_buff *skb, struct genl_info *info,
 				 u8 flags)
 {
+	bool parent_dev = flags & DEVLINK_NL_FLAG_NEED_PARENT_DEV;
 	bool dev_lock = flags & DEVLINK_NL_FLAG_NEED_DEV_LOCK;
+	struct devlink *devlink, *parent_devlink = NULL;
+	struct nlattr **attrs = info->attrs;
 	struct devlink_port *devlink_port;
-	struct devlink *devlink;
 	int err;
 
-	devlink = devlink_get_from_attrs_lock(genl_info_net(info), info->attrs,
-					      dev_lock);
-	if (IS_ERR(devlink))
-		return PTR_ERR(devlink);
+	if (parent_dev && attrs[DEVLINK_ATTR_PARENT_DEV_BUS_NAME] &&
+	    attrs[DEVLINK_ATTR_PARENT_DEV_NAME]) {
+		parent_devlink = devlink_get_lock(genl_info_net(info),
+						  attrs[DEVLINK_ATTR_PARENT_DEV_BUS_NAME],
+						  attrs[DEVLINK_ATTR_PARENT_DEV_NAME], dev_lock);
+		if (IS_ERR(parent_devlink))
+			return PTR_ERR(parent_devlink);
+		info->user_ptr[1] = parent_devlink;
+		/* Drop the parent devlink lock, but don't put it just yet.
+		 * This will keep it alive until the end of the request.
+		 */
+		devl_dev_unlock(parent_devlink, dev_lock);
+	}
 
+	devlink = devlink_get_from_attrs_lock(genl_info_net(info), attrs, dev_lock);
+	if (IS_ERR(devlink)) {
+		err = PTR_ERR(devlink);
+		goto parent_put;
+	}
 	info->user_ptr[0] = devlink;
 	if (flags & DEVLINK_NL_FLAG_NEED_PORT) {
 		devlink_port = devlink_port_get_from_info(devlink, info);
@@ -237,6 +259,9 @@ static int __devlink_nl_pre_doit(struct sk_buff *skb, struct genl_info *info,
 unlock:
 	devl_dev_unlock(devlink, dev_lock);
 	devlink_put(devlink);
+parent_put:
+	if (parent_dev && parent_devlink)
+		devlink_put(parent_devlink);
 	return err;
 }
 
@@ -265,6 +290,13 @@ int devlink_nl_pre_doit_port_optional(const struct genl_split_ops *ops,
 	return __devlink_nl_pre_doit(skb, info, DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT);
 }
 
+int devlink_nl_pre_doit_parent_dev_optional(const struct genl_split_ops *ops,
+					    struct sk_buff *skb,
+					    struct genl_info *info)
+{
+	return __devlink_nl_pre_doit(skb, info, DEVLINK_NL_FLAG_NEED_PARENT_DEV);
+}
+
 static void __devlink_nl_post_doit(struct sk_buff *skb, struct genl_info *info,
 				   u8 flags)
 {
@@ -274,6 +306,10 @@ static void __devlink_nl_post_doit(struct sk_buff *skb, struct genl_info *info,
 	devlink = info->user_ptr[0];
 	devl_dev_unlock(devlink, dev_lock);
 	devlink_put(devlink);
+	if ((flags & DEVLINK_NL_FLAG_NEED_PARENT_DEV) && info->user_ptr[1]) {
+		devlink = info->user_ptr[1];
+		devlink_put(devlink);
+	}
 }
 
 void devlink_nl_post_doit(const struct genl_split_ops *ops,
@@ -289,6 +325,14 @@ devlink_nl_post_doit_dev_lock(const struct genl_split_ops *ops,
 	__devlink_nl_post_doit(skb, info, DEVLINK_NL_FLAG_NEED_DEV_LOCK);
 }
 
+void
+devlink_nl_post_doit_parent_dev_optional(const struct genl_split_ops *ops,
+					 struct sk_buff *skb,
+					 struct genl_info *info)
+{
+	__devlink_nl_post_doit(skb, info, DEVLINK_NL_FLAG_NEED_PARENT_DEV);
+}
+
 static int devlink_nl_inst_single_dumpit(struct sk_buff *msg,
 					 struct netlink_callback *cb, int flags,
 					 devlink_nl_dump_one_func_t *dump_one,
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index f9786d51f68f..2dc82cc9a0b1 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -496,7 +496,7 @@ static const struct nla_policy devlink_rate_get_dump_nl_policy[DEVLINK_ATTR_DEV_
 };
 
 /* DEVLINK_CMD_RATE_SET - do */
-static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
+static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_PARENT_DEV_NAME + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
@@ -505,10 +505,12 @@ static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_W
 	[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32, },
 	[DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32, },
 	[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING, },
+	[DEVLINK_ATTR_PARENT_DEV_BUS_NAME] = { .type = NLA_NUL_STRING, },
+	[DEVLINK_ATTR_PARENT_DEV_NAME] = { .type = NLA_NUL_STRING, },
 };
 
 /* DEVLINK_CMD_RATE_NEW - do */
-static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
+static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_PARENT_DEV_NAME + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
 	[DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
@@ -517,6 +519,8 @@ static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TX_W
 	[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32, },
 	[DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32, },
 	[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING, },
+	[DEVLINK_ATTR_PARENT_DEV_BUS_NAME] = { .type = NLA_NUL_STRING, },
+	[DEVLINK_ATTR_PARENT_DEV_NAME] = { .type = NLA_NUL_STRING, },
 };
 
 /* DEVLINK_CMD_RATE_DEL - do */
@@ -1160,21 +1164,21 @@ const struct genl_split_ops devlink_nl_ops[74] = {
 	{
 		.cmd		= DEVLINK_CMD_RATE_SET,
 		.validate	= GENL_DONT_VALIDATE_STRICT,
-		.pre_doit	= devlink_nl_pre_doit,
+		.pre_doit	= devlink_nl_pre_doit_parent_dev_optional,
 		.doit		= devlink_nl_rate_set_doit,
-		.post_doit	= devlink_nl_post_doit,
+		.post_doit	= devlink_nl_post_doit_parent_dev_optional,
 		.policy		= devlink_rate_set_nl_policy,
-		.maxattr	= DEVLINK_ATTR_RATE_TX_WEIGHT,
+		.maxattr	= DEVLINK_ATTR_PARENT_DEV_NAME,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{
 		.cmd		= DEVLINK_CMD_RATE_NEW,
 		.validate	= GENL_DONT_VALIDATE_STRICT,
-		.pre_doit	= devlink_nl_pre_doit,
+		.pre_doit	= devlink_nl_pre_doit_parent_dev_optional,
 		.doit		= devlink_nl_rate_new_doit,
-		.post_doit	= devlink_nl_post_doit,
+		.post_doit	= devlink_nl_post_doit_parent_dev_optional,
 		.policy		= devlink_rate_new_nl_policy,
-		.maxattr	= DEVLINK_ATTR_RATE_TX_WEIGHT,
+		.maxattr	= DEVLINK_ATTR_PARENT_DEV_NAME,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 	{
diff --git a/net/devlink/netlink_gen.h b/net/devlink/netlink_gen.h
index 8f2bd50ddf5e..dd402264a47f 100644
--- a/net/devlink/netlink_gen.h
+++ b/net/devlink/netlink_gen.h
@@ -27,12 +27,19 @@ int devlink_nl_pre_doit_dev_lock(const struct genl_split_ops *ops,
 int devlink_nl_pre_doit_port_optional(const struct genl_split_ops *ops,
 				      struct sk_buff *skb,
 				      struct genl_info *info);
+int devlink_nl_pre_doit_parent_dev_optional(const struct genl_split_ops *ops,
+					    struct sk_buff *skb,
+					    struct genl_info *info);
 void
 devlink_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
 		     struct genl_info *info);
 void
 devlink_nl_post_doit_dev_lock(const struct genl_split_ops *ops,
 			      struct sk_buff *skb, struct genl_info *info);
+void
+devlink_nl_post_doit_parent_dev_optional(const struct genl_split_ops *ops,
+					 struct sk_buff *skb,
+					 struct genl_info *info);
 
 int devlink_nl_get_doit(struct sk_buff *skb, struct genl_info *info);
 int devlink_nl_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 06/10] devlink: Allow rate node parents from other devlinks
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
                   ` (4 preceding siblings ...)
  2025-02-13 18:01 ` [PATCH net-next 05/10] devlink: Allow specifying parent device for rate commands Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 07/10] net/mlx5: qos: Introduce shared esw qos domains Tariq Toukan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

This commit makes use of the unlocked parent devlink from
info->user_ptr[1] to assign a devlink rate node to the requested parent
node. Because the parent devlink is not locked, none of its mutable
fields can be used. But setting a parent only requires:
1. Verifying that the same rate domain is used. The rate domain is
   immutable once set, so this is safe.
2. Comparing devlink_rate->devlink with the requested parent devlink.
   As the shared devlink rate domain is locked, other entities cannot
   concurrently make changes to any of its rates.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 net/devlink/rate.c | 46 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 38f18216eb80..e6b4f4cb8a01 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -125,10 +125,26 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
 			devlink_rate->tx_weight))
 		goto nla_put_failure;
 
-	if (devlink_rate->parent)
-		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME,
-				   devlink_rate->parent->name))
+	if (devlink_rate->parent) {
+		struct devlink_rate *parent = devlink_rate->parent;
+
+		if (nla_put_string(msg, DEVLINK_ATTR_RATE_PARENT_NODE_NAME, parent->name))
 			goto nla_put_failure;
+		if (parent->devlink != devlink) {
+			/* The parent devlink isn't locked, but a reference to
+			 * it is held so it cannot suddenly disappear.
+			 * And since there are rate nodes pointing to it,
+			 * the parent devlink is fully initialized and
+			 * the fields accessed here are valid and immutable.
+			 */
+			if (nla_put_string(msg, DEVLINK_ATTR_PARENT_DEV_BUS_NAME,
+					   parent->devlink->dev->bus->name))
+				goto nla_put_failure;
+			if (nla_put_string(msg, DEVLINK_ATTR_PARENT_DEV_NAME,
+					   dev_name(parent->devlink->dev)))
+				goto nla_put_failure;
+		}
+	}
 
 	genlmsg_end(msg, hdr);
 	return 0;
@@ -281,9 +297,18 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 	const char *parent_name = nla_data(nla_parent);
 	const struct devlink_ops *ops = devlink->ops;
 	size_t len = strlen(parent_name);
+	struct devlink *parent_devlink;
 	struct devlink_rate *parent;
 	int err = -EOPNOTSUPP;
 
+	parent_devlink = info->user_ptr[1] ? : devlink;
+	if (parent_devlink != devlink) {
+		if (parent_devlink->rate_domain != devlink->rate_domain) {
+			NL_SET_ERR_MSG(info->extack,
+				       "Cannot set parent to a rate from a different rate domain");
+			return -EINVAL;
+		}
+	}
 	parent = devlink_rate->parent;
 
 	if (parent && !len) {
@@ -301,7 +326,11 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
 		refcount_dec(&parent->refcnt);
 		devlink_rate->parent = NULL;
 	} else if (len) {
-		parent = devlink_rate_node_get_by_name(devlink, parent_name);
+		/* parent_devlink (if != devlink) isn't locked, but the rate
+		 * domain, immutable once set, is already locked and the parent
+		 * is only used to determine node owner via pointer comparison.
+		 */
+		parent = devlink_rate_node_get_by_name(parent_devlink, parent_name);
 		if (IS_ERR(parent))
 			return -ENODEV;
 
@@ -762,8 +791,8 @@ EXPORT_SYMBOL_GPL(devl_rate_leaf_destroy);
  * devl_rate_nodes_destroy - destroy all devlink rate nodes on device
  * @devlink: devlink instance
  *
- * Unset parent for all rate objects and destroy all rate nodes
- * on specified device.
+ * Unset parent for all rate objects that involve this device and destroy all
+ * rate nodes on it.
  */
 void devl_rate_nodes_destroy(struct devlink *devlink)
 {
@@ -774,7 +803,9 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
 	devl_rate_domain_lock(devlink);
 
 	list_for_each_entry(devlink_rate, &devlink->rate_domain->rate_list, list) {
-		if (!devlink_rate->parent || devlink_rate->devlink != devlink)
+		if (!devlink_rate->parent ||
+		    (devlink_rate->devlink != devlink &&
+		     devlink_rate->parent->devlink != devlink))
 			continue;
 
 		refcount_dec(&devlink_rate->parent->refcnt);
@@ -784,6 +815,7 @@ void devl_rate_nodes_destroy(struct devlink *devlink)
 		else if (devlink_rate_is_node(devlink_rate))
 			ops->rate_node_parent_set(devlink_rate, NULL, devlink_rate->priv,
 						  NULL, NULL);
+		devlink_rate->parent = NULL;
 	}
 	list_for_each_entry_safe(devlink_rate, tmp, &devlink->rate_domain->rate_list, list) {
 		if (devlink_rate->devlink == devlink && devlink_rate_is_node(devlink_rate)) {
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 07/10] net/mlx5: qos: Introduce shared esw qos domains
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
                   ` (5 preceding siblings ...)
  2025-02-13 18:01 ` [PATCH net-next 06/10] devlink: Allow rate node parents from other devlinks Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 08/10] net/mlx5: qos: Support cross-esw tx scheduling Tariq Toukan
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

Introduce shared esw qos domains, capable of holding rate groups for
multiple E-Switches of the same NIC. Shared qos domains are reference
counted and can be discovered via devcom. The devcom comp lock is used
in write-mode to prevent init/cleanup races.

When initializing a shared qos domain fails due to devcom errors,
the code falls back to using a private qos domain while logging a
message that cross-esw scheduling cannot be supported.

Shared esw qos domains will be used in a future patch to support
cross-eswitch scheduling.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 73 +++++++++++++++++--
 1 file changed, 67 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 8b7c843446e1..6a469f214187 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -11,10 +11,17 @@
 /* Minimum supported BW share value by the HW is 1 Mbit/sec */
 #define MLX5_MIN_BW_SHARE 1
 
-/* Holds rate nodes associated with an E-Switch. */
+/* Holds rate nodes associated with one or more E-Switches.
+ * If cross-esw scheduling is supported, this is shared between all
+ * E-Switches of a NIC.
+ */
 struct mlx5_qos_domain {
 	/* Serializes access to all qos changes in the qos domain. */
 	struct mutex lock;
+	/* Whether this domain is shared with other E-Switches. */
+	bool shared;
+	/* The reference count is only used for shared qos domains. */
+	refcount_t refcnt;
 	/* List of all mlx5_esw_sched_nodes. */
 	struct list_head nodes;
 };
@@ -34,7 +41,7 @@ static void esw_assert_qos_lock_held(struct mlx5_eswitch *esw)
 	lockdep_assert_held(&esw->qos.domain->lock);
 }
 
-static struct mlx5_qos_domain *esw_qos_domain_alloc(void)
+static struct mlx5_qos_domain *esw_qos_domain_alloc(bool shared)
 {
 	struct mlx5_qos_domain *qos_domain;
 
@@ -44,21 +51,75 @@ static struct mlx5_qos_domain *esw_qos_domain_alloc(void)
 
 	mutex_init(&qos_domain->lock);
 	INIT_LIST_HEAD(&qos_domain->nodes);
+	qos_domain->shared = shared;
+	if (shared)
+		refcount_set(&qos_domain->refcnt, 1);
 
 	return qos_domain;
 }
 
-static int esw_qos_domain_init(struct mlx5_eswitch *esw)
+static void esw_qos_devcom_lock(struct mlx5_devcom_comp_dev *devcom, bool shared)
 {
-	esw->qos.domain = esw_qos_domain_alloc();
+	if (shared)
+		mlx5_devcom_comp_lock(devcom);
+}
+
+static void esw_qos_devcom_unlock(struct mlx5_devcom_comp_dev *devcom, bool shared)
+{
+	if (shared)
+		mlx5_devcom_comp_unlock(devcom);
+}
+
+static int esw_qos_domain_init(struct mlx5_eswitch *esw, bool shared)
+{
+	struct mlx5_devcom_comp_dev *devcom = esw->dev->priv.hca_devcom_comp;
+
+	if (shared && IS_ERR_OR_NULL(devcom)) {
+		esw_info(esw->dev, "Cross-esw QoS cannot be initialized because devcom is unavailable.");
+		shared = false;
+	}
+
+	esw_qos_devcom_lock(devcom, shared);
+	if (shared) {
+		struct mlx5_devcom_comp_dev *pos;
+		struct mlx5_core_dev *peer_dev;
+
+		mlx5_devcom_for_each_peer_entry(devcom, peer_dev, pos) {
+			struct mlx5_eswitch *peer_esw = peer_dev->priv.eswitch;
+
+			if (peer_esw->qos.domain && peer_esw->qos.domain->shared) {
+				esw->qos.domain = peer_esw->qos.domain;
+				refcount_inc(&esw->qos.domain->refcnt);
+				goto unlock;
+			}
+		}
+	}
+
+	/* If no shared domain found, this esw will create one.
+	 * Doing it with the devcom comp lock held prevents races with other
+	 * eswitches doing concurrent init.
+	 */
+	esw->qos.domain = esw_qos_domain_alloc(shared);
+unlock:
+	esw_qos_devcom_unlock(devcom, shared);
 
 	return esw->qos.domain ? 0 : -ENOMEM;
 }
 
 static void esw_qos_domain_release(struct mlx5_eswitch *esw)
 {
-	kfree(esw->qos.domain);
+	struct mlx5_devcom_comp_dev *devcom = esw->dev->priv.hca_devcom_comp;
+	struct mlx5_qos_domain *domain = esw->qos.domain;
+	bool shared = domain->shared;
+
+	/* Shared domains are released with the devcom comp lock held to
+	 * prevent races with other eswitches doing concurrent init.
+	 */
+	esw_qos_devcom_lock(devcom, shared);
+	if (!shared || refcount_dec_and_test(&domain->refcnt))
+		kfree(domain);
 	esw->qos.domain = NULL;
+	esw_qos_devcom_unlock(devcom, shared);
 }
 
 enum sched_node_type {
@@ -829,7 +890,7 @@ int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
 	if (esw->qos.domain)
 		return 0;  /* Nothing to change. */
 
-	return esw_qos_domain_init(esw);
+	return esw_qos_domain_init(esw, false);
 }
 
 void mlx5_esw_qos_cleanup(struct mlx5_eswitch *esw)
-- 
2.45.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next 08/10] net/mlx5: qos: Support cross-esw tx scheduling
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
                   ` (6 preceding siblings ...)
  2025-02-13 18:01 ` [PATCH net-next 07/10] net/mlx5: qos: Introduce shared esw qos domains Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 09/10] net/mlx5: qos: Init shared devlink rate domain Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 10/10] net/mlx5: Document devlink rates and cross-esw scheduling Tariq Toukan
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

Up to now, rate groups could only contain vports from the same E-Switch.
This patch relaxes that restriction if the device supports it
(HCA_CAP.esw_cross_esw_sched == true) and the right conditions are met:
- Link Aggregation (LAG) is enabled.
- The E-Switches use the same qos domain.

This also enables the use of the previously added shared esw qos
domains.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 59 +++++++++++++++----
 1 file changed, 49 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 6a469f214187..e6dcfe348a7e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -147,7 +147,9 @@ struct mlx5_esw_sched_node {
 	enum sched_node_type type;
 	/* The eswitch this node belongs to. */
 	struct mlx5_eswitch *esw;
-	/* The children nodes of this node, empty list for leaf nodes. */
+	/* The children nodes of this node, empty list for leaf nodes.
+	 * Can be from multiple E-Switches.
+	 */
 	struct list_head children;
 	/* Valid only if this node is associated with a vport. */
 	struct mlx5_vport *vport;
@@ -398,6 +400,7 @@ static int esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_
 {
 	u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
 	struct mlx5_core_dev *dev = vport_node->esw->dev;
+	struct mlx5_vport *vport = vport_node->vport;
 	void *attr;
 
 	if (!mlx5_qos_element_type_supported(dev,
@@ -408,7 +411,13 @@ static int esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_
 	MLX5_SET(scheduling_context, sched_ctx, element_type,
 		 SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT);
 	attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes);
-	MLX5_SET(vport_element, attr, vport_number, vport_node->vport->vport);
+	MLX5_SET(vport_element, attr, vport_number, vport->vport);
+	if (vport->dev != dev) {
+		/* The port is assigned to a node on another eswitch. */
+		MLX5_SET(vport_element, attr, eswitch_owner_vhca_id_valid, true);
+		MLX5_SET(vport_element, attr, eswitch_owner_vhca_id,
+			 MLX5_CAP_GEN(vport->dev, vhca_id));
+	}
 	MLX5_SET(scheduling_context, sched_ctx, parent_element_id, vport_node->parent->ix);
 	MLX5_SET(scheduling_context, sched_ctx, max_average_bw, vport_node->max_rate);
 
@@ -887,10 +896,16 @@ static int esw_qos_devlink_rate_to_mbps(struct mlx5_core_dev *mdev, const char *
 
 int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
 {
-	if (esw->qos.domain)
-		return 0;  /* Nothing to change. */
+	bool use_shared_domain = esw->mode == MLX5_ESWITCH_OFFLOADS &&
+		MLX5_CAP_QOS(esw->dev, esw_cross_esw_sched);
+
+	if (esw->qos.domain) {
+		if (esw->qos.domain->shared == use_shared_domain)
+			return 0;  /* Nothing to change. */
+		esw_qos_domain_release(esw);
+	}
 
-	return esw_qos_domain_init(esw, false);
+	return esw_qos_domain_init(esw, use_shared_domain);
 }
 
 void mlx5_esw_qos_cleanup(struct mlx5_eswitch *esw)
@@ -1021,16 +1036,40 @@ int mlx5_esw_devlink_rate_node_del(struct devlink_rate *rate_node, void *priv,
 	return 0;
 }
 
-int mlx5_esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
-				     struct netlink_ext_ack *extack)
+static int mlx5_esw_validate_cross_esw_scheduling(struct mlx5_eswitch *esw,
+						  struct mlx5_esw_sched_node *parent,
+						  struct netlink_ext_ack *extack)
 {
-	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
-	int err = 0;
+	if (!parent || esw == parent->esw)
+		return 0;
 
-	if (parent && parent->esw != esw) {
+	if (!MLX5_CAP_QOS(esw->dev, esw_cross_esw_sched)) {
 		NL_SET_ERR_MSG_MOD(extack, "Cross E-Switch scheduling is not supported");
 		return -EOPNOTSUPP;
 	}
+	if (esw->qos.domain != parent->esw->qos.domain) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Cannot add vport to a parent belonging to a different qos domain");
+		return -EOPNOTSUPP;
+	}
+	if (!mlx5_lag_is_active(esw->dev)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Cross E-Switch scheduling requires LAG to be activated");
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+int mlx5_esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
+				     struct netlink_ext_ack *extack)
+{
+	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
+	int err;
+
+	err = mlx5_esw_validate_cross_esw_scheduling(esw, parent, extack);
+	if (err)
+		return err;
 
 	esw_qos_lock(esw);
 	if (!vport->qos.sched_node && parent)
-- 
2.45.0



* [PATCH net-next 09/10] net/mlx5: qos: Init shared devlink rate domain
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
                   ` (7 preceding siblings ...)
  2025-02-13 18:01 ` [PATCH net-next 08/10] net/mlx5: qos: Support cross-esw tx scheduling Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  2025-02-13 18:01 ` [PATCH net-next 10/10] net/mlx5: Document devlink rates and cross-esw scheduling Tariq Toukan
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

If the device can do cross-esw scheduling, switch to using a shared rate
domain so that devlink rate nodes can have parents from other functions
of the same device.

As a fallback, if switching to a shared rate domain fails, a warning is
logged and the code falls back to a private qos domain, since cross-esw
scheduling cannot be supported in that case.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index e6dcfe348a7e..9af12ae98691 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -899,6 +899,20 @@ int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
 	bool use_shared_domain = esw->mode == MLX5_ESWITCH_OFFLOADS &&
 		MLX5_CAP_QOS(esw->dev, esw_cross_esw_sched);
 
+	if (use_shared_domain) {
+		u64 guid = mlx5_query_nic_system_image_guid(esw->dev);
+		int err;
+
+		err = devlink_shared_rate_domain_init(priv_to_devlink(esw->dev), guid);
+		if (err) {
+			/* On failure, issue a warning and switch to using a private domain. */
+			esw_warn(esw->dev,
+				 "Shared devlink rate domain init failed (err %d), cross-esw QoS not available",
+				 err);
+			use_shared_domain = false;
+		}
+	}
+
 	if (esw->qos.domain) {
 		if (esw->qos.domain->shared == use_shared_domain)
 			return 0;  /* Nothing to change. */
-- 
2.45.0



* [PATCH net-next 10/10] net/mlx5: Document devlink rates and cross-esw scheduling
  2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
                   ` (8 preceding siblings ...)
  2025-02-13 18:01 ` [PATCH net-next 09/10] net/mlx5: qos: Init shared devlink rate domain Tariq Toukan
@ 2025-02-13 18:01 ` Tariq Toukan
  9 siblings, 0 replies; 23+ messages in thread
From: Tariq Toukan @ 2025-02-13 18:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko
  Cc: Cosmin Ratiu, Carolina Jubran, Gal Pressman, Mark Bloch,
	Donald Hunter, Jiri Pirko, Jonathan Corbet, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, netdev, linux-kernel, linux-doc,
	linux-rdma

From: Cosmin Ratiu <cratiu@nvidia.com>

Extend the devlink-port documentation with a mention that parents can be
from different devices.

It seems rates were not documented in the mlx5-specific file, so add
examples on how to limit VFs and groups and also provide an example of
the intended way to achieve cross-esw scheduling.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../networking/devlink/devlink-port.rst       |  2 ++
 Documentation/networking/devlink/mlx5.rst     | 33 +++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/Documentation/networking/devlink/devlink-port.rst b/Documentation/networking/devlink/devlink-port.rst
index 9d22d41a7cd1..1d9e5839eef4 100644
--- a/Documentation/networking/devlink/devlink-port.rst
+++ b/Documentation/networking/devlink/devlink-port.rst
@@ -417,6 +417,8 @@ API allows to configure following rate object's parameters:
   Parent node name. Parent node rate limits are considered as additional limits
   to all node children limits. ``tx_max`` is an upper limit for children.
   ``tx_share`` is a total bandwidth distributed among children.
+  If the device supports cross-function scheduling, the parent can be from a
+  different function of the same underlying device.
 
 ``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
 nodes with the same priority form a WFQ subgroup in the sibling group
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 7febe0aecd53..61e76da36faf 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -298,3 +298,36 @@ User commands examples:
 
 .. note::
    This command can run over all interfaces such as PF/VF and representor ports.
+
+Rates
+=====
+
+mlx5 devices can limit transmission of individual VFs or a group of them via
+the devlink-rate API in switchdev mode.
+
+User commands examples:
+
+- Print the existing rates::
+
+    $ devlink port function rate show
+
+- Set a max tx limit on traffic from VF0::
+
+    $ devlink port function rate set pci/0000:82:00.0/1 tx_max 10Gbit
+
+- Create a rate group with a max tx limit and add two VFs to it::
+
+    $ devlink port function rate add pci/0000:82:00.0/group1 tx_max 10Gbit
+    $ devlink port function rate set pci/0000:82:00.0/1 parent group1
+    $ devlink port function rate set pci/0000:82:00.0/2 parent group1
+
+- Same scenario, with a min guarantee of 20% of the bandwidth for the first VF::
+
+    $ devlink port function rate add pci/0000:82:00.0/group1 tx_max 10Gbit
+    $ devlink port function rate set pci/0000:82:00.0/1 parent group1 tx_share 2Gbit
+    $ devlink port function rate set pci/0000:82:00.0/2 parent group1
+
+- Cross-device scheduling::
+
+    $ devlink port function rate add pci/0000:82:00.0/group1 tx_max 10Gbit
+    $ devlink port function rate set pci/0000:82:00.1/32769 parent group1
-- 
2.45.0



* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-13 18:01 ` [PATCH net-next 03/10] devlink: Serialize access to rate domains Tariq Toukan
@ 2025-02-14 12:54   ` Jiri Pirko
  2025-02-19  2:21     ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Pirko @ 2025-02-14 12:54 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

Thu, Feb 13, 2025 at 07:01:27PM +0100, tariqt@nvidia.com wrote:
>From: Cosmin Ratiu <cratiu@nvidia.com>
>
>Access to rates in a rate domain should be serialized.
>
>This commit introduces two new functions, devl_rate_domain_lock and
>devl_rate_domain_unlock, and uses them whenever serial access to a rate
>domain is needed. For now, they are no-ops since access to a rate domain
>is protected by the devlink lock.
>
>Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
>Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
>Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>---
> net/devlink/devl_internal.h |   4 ++
> net/devlink/rate.c          | 114 +++++++++++++++++++++++++++---------
> 2 files changed, 89 insertions(+), 29 deletions(-)
>
>diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
>index 209b4a4c7070..fae81dd6953f 100644
>--- a/net/devlink/devl_internal.h
>+++ b/net/devlink/devl_internal.h
>@@ -121,6 +121,10 @@ static inline void devl_dev_unlock(struct devlink *devlink, bool dev_lock)
> 		device_unlock(devlink->dev);
> }
> 
>+static inline void devl_rate_domain_lock(struct devlink *devlink) { }
>+
>+static inline void devl_rate_domain_unlock(struct devlink *devlink) { }


For the record, I'm still not convinced that introducing this kind of
shared inter-devlink lock is good idea. We spent quite a bit of painful
times getting rid of global devlink_mutex and making devlink locking
scheme nice and simple as it currently is.

But at the same time I admit I can't think of any other nicer solution
to the problem this patchset is trying to solve.

Jakub, any thoughts?


* Re: [PATCH net-next 01/10] devlink: Remove unused param of devlink_rate_nodes_check
  2025-02-13 18:01 ` [PATCH net-next 01/10] devlink: Remove unused param of devlink_rate_nodes_check Tariq Toukan
@ 2025-02-18  2:54   ` Kalesh Anakkur Purayil
  0 siblings, 0 replies; 23+ messages in thread
From: Kalesh Anakkur Purayil @ 2025-02-18  2:54 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jiri Pirko,
	Jonathan Corbet, Saeed Mahameed, Leon Romanovsky, netdev,
	linux-kernel, linux-doc, linux-rdma


On Thu, Feb 13, 2025 at 11:34 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Cosmin Ratiu <cratiu@nvidia.com>
>
> The 'mode' param is unused so remove it.
>
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

LGTM,
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>


-- 
Regards,
Kalesh AP



* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-14 12:54   ` Jiri Pirko
@ 2025-02-19  2:21     ` Jakub Kicinski
  2025-02-25 13:36       ` Jiri Pirko
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-19  2:21 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

On Fri, 14 Feb 2025 13:54:43 +0100 Jiri Pirko wrote:
> For the record, I'm still not convinced that introducing this kind of
> shared inter-devlink lock is good idea. We spent quite a bit of painful
> times getting rid of global devlink_mutex and making devlink locking
> scheme nice and simple as it currently is.
> 
> But at the same time I admit I can't think of any other nicer solution
> to the problem this patchset is trying to solve.
> 
> Jakub, any thoughts?

The problem comes from having a devlink instance per function /
port rather than for the ASIC. Spawn a single instance and the
problem will go away 🤷️

I think we talked about this multiple times, I think at least
once with Jake, too. Not that I remember all the details now..
-- 
pw-bot: cr


* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-19  2:21     ` Jakub Kicinski
@ 2025-02-25 13:36       ` Jiri Pirko
  2025-02-26  1:40         ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Pirko @ 2025-02-25 13:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

Wed, Feb 19, 2025 at 03:21:30AM +0100, kuba@kernel.org wrote:
>On Fri, 14 Feb 2025 13:54:43 +0100 Jiri Pirko wrote:
>> For the record, I'm still not convinced that introducing this kind of
>> shared inter-devlink lock is good idea. We spent quite a bit of painful
>> times getting rid of global devlink_mutex and making devlink locking
>> scheme nice and simple as it currently is.
>> 
>> But at the same time I admit I can't think of any other nicer solution
>> to the problem this patchset is trying to solve.
>> 
>> Jakub, any thoughts?
>
>The problem comes from having a devlink instance per function /
>port rather than for the ASIC. Spawn a single instance and the
>problem will go away 🤷️

Yeah, we currently have VF devlink ports created under PF devlink instance.
That is aligned with PCI geometry. If we have a single per-ASIC parent
devlink, this does not change and we still need to configure cross
PF devlink instances.

The only benefit I see is that we don't need rate domain, but
we can use parent devlink instance lock instead. The locking ordering
might be a bit tricky to fix though.


>
>I think we talked about this multiple times, I think at least
>once with Jake, too. Not that I remember all the details now..
>-- 
>pw-bot: cr


* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-25 13:36       ` Jiri Pirko
@ 2025-02-26  1:40         ` Jakub Kicinski
  2025-02-26 14:44           ` Jiri Pirko
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-26  1:40 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

On Tue, 25 Feb 2025 14:36:07 +0100 Jiri Pirko wrote:
> >The problem comes from having a devlink instance per function /
> >port rather than for the ASIC. Spawn a single instance and the
> >problem will go away 🤷️  
> 
> Yeah, we currently have VF devlink ports created under PF devlink instance.
> That is aligned with PCI geometry. If we have a single per-ASIC parent
> devlink, this does not change and we still need to configure cross
> PF devlink instances.

Why would there still be PF instances? I'm not suggesting that you
create a hierarchy of instances.

> The only benefit I see is that we don't need rate domain, but
> we can use parent devlink instance lock instead. The locking ordering
> might be a bit tricky to fix though.


* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-26  1:40         ` Jakub Kicinski
@ 2025-02-26 14:44           ` Jiri Pirko
  2025-02-27  2:53             ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Pirko @ 2025-02-26 14:44 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

Wed, Feb 26, 2025 at 02:40:05AM +0100, kuba@kernel.org wrote:
>On Tue, 25 Feb 2025 14:36:07 +0100 Jiri Pirko wrote:
>> >The problem comes from having a devlink instance per function /
>> >port rather than for the ASIC. Spawn a single instance and the
>> >problem will go away 🤷️  
>> 
>> Yeah, we currently have VF devlink ports created under PF devlink instance.
>> That is aligned with PCI geometry. If we have a single per-ASIC parent
>> devlink, this does not change and we still need to configure cross
>> PF devlink instances.
>
>Why would there still be PF instances? I'm not suggesting that you
>create a hierarchy of instances.

I'm not sure how you imagine getting rid of them. One PCI PF
instantiates one devlink now. There is a lot of configuration (e.g. params)
that is per-PF. You need this instance for that; how else would you do
per-PF things on a shared ASIC instance?
Creating SFs is a per-PF operation, for example. I didn't do a thorough
analysis, but I'm sure there are a couple of per-PF things like these.
Also, not breaking the existing users may be an argument to keep per-PF
instances.

>
>> The only benefit I see is that we don't need rate domain, but
>> we can use parent devlink instance lock instead. The locking ordering
>> might be a bit tricky to fix though.


* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-26 14:44           ` Jiri Pirko
@ 2025-02-27  2:53             ` Jakub Kicinski
  2025-02-27 12:22               ` Jiri Pirko
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-02-27  2:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

On Wed, 26 Feb 2025 15:44:35 +0100 Jiri Pirko wrote:
> > Why would there still be PF instances? I'm not suggesting that you
> > create a hierarchy of instances.  
> 
> I'm not sure how you imagine getting rid of them. One PCI PF
> instantiates one devlink now. There are lots of configuration (e.g. params)
> that is per-PF. You need this instance for that, how else would you do
> per-PF things on shared ASIC instance?

There are per-PF ports, right?

> Creating SFs is per-PF operation for example. I didn't to thorough
> analysis, but I'm sure there are couple of per-PF things like these.

Seems like adding a port attribute to SF creation would be a much
smaller extension than adding a layer of objects.

> Also not breaking the existing users may be an argument to keep per-PF
> instances.

We're talking about multi-PF devices only. Besides pretty sure we 
moved multiple params and health reporters to be per port, so IDK 
what changed now.


* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-27  2:53             ` Jakub Kicinski
@ 2025-02-27 12:22               ` Jiri Pirko
  2025-03-03 22:06                 ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Pirko @ 2025-02-27 12:22 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

Thu, Feb 27, 2025 at 03:53:10AM +0100, kuba@kernel.org wrote:
>On Wed, 26 Feb 2025 15:44:35 +0100 Jiri Pirko wrote:
>> > Why would there still be PF instances? I'm not suggesting that you
>> > create a hierarchy of instances.  
>> 
>> I'm not sure how you imagine getting rid of them. One PCI PF
>> instantiates one devlink now. There are lots of configuration (e.g. params)
>> that is per-PF. You need this instance for that, how else would you do
>> per-PF things on shared ASIC instance?
>
>There are per-PF ports, right?

Depends. On normal host sr-iov, no. On smartnic where you have PF in
host, yes.


>
>> Creating SFs is per-PF operation for example. I didn't to thorough
>> analysis, but I'm sure there are couple of per-PF things like these.
>
>Seems like adding a port attribute to SF creation would be a much
>smaller extension than adding a layer of objects.
>
>> Also not breaking the existing users may be an argument to keep per-PF
>> instances.
>
>We're talking about multi-PF devices only. Besides pretty sure we 
>moved multiple params and health reporters to be per port, so IDK 
>what changed now.

Looks like pretty much all current NICs are multi-PFs, aren't they?



* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-02-27 12:22               ` Jiri Pirko
@ 2025-03-03 22:06                 ` Jakub Kicinski
  2025-03-04 13:11                   ` Jiri Pirko
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-03-03 22:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

On Thu, 27 Feb 2025 13:22:25 +0100 Jiri Pirko wrote:
> >> I'm not sure how you imagine getting rid of them. One PCI PF
> >> instantiates one devlink now. There are lots of configuration (e.g. params)
> >> that is per-PF. You need this instance for that, how else would you do
> >> per-PF things on shared ASIC instance?  
> >
> >There are per-PF ports, right?  
> 
> Depends. On normal host sr-iov, no. On smartnic where you have PF in
> host, yes.

Yet another "great choice" in mlx5 that other drivers have foreseen
problems with and avoided.

> >> Creating SFs is per-PF operation for example. I didn't to thorough
> >> analysis, but I'm sure there are couple of per-PF things like these.  
> >
> >Seems like adding a port attribute to SF creation would be a much
> >smaller extension than adding a layer of objects.
> >  
> >> Also not breaking the existing users may be an argument to keep per-PF
> >> instances.  
> >
> >We're talking about multi-PF devices only. Besides pretty sure we 
> >moved multiple params and health reporters to be per port, so IDK 
> >what changed now.  
> 
> Looks like pretty much all current NICs are multi-PFs, aren't they?

Not in a way which requires cross-port state sharing, no.
You should know this.


* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-03-03 22:06                 ` Jakub Kicinski
@ 2025-03-04 13:11                   ` Jiri Pirko
  2025-03-05  0:04                     ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Pirko @ 2025-03-04 13:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

Mon, Mar 03, 2025 at 11:06:23PM +0100, kuba@kernel.org wrote:
>On Thu, 27 Feb 2025 13:22:25 +0100 Jiri Pirko wrote:
>> >> I'm not sure how you imagine getting rid of them. One PCI PF
>> >> instantiates one devlink now. There are lots of configuration (e.g. params)
>> >> that is per-PF. You need this instance for that, how else would you do
>> >> per-PF things on shared ASIC instance?  
>> >
>> >There are per-PF ports, right?  
>> 
>> Depends. On normal host sr-iov, no. On smartnic where you have PF in
>> host, yes.
>
>Yet another "great choice" in mlx5 other drivers have foreseen
>problems with and avoided.

What do you mean? How else to model it? Do you suggest having PF devlink
port for the PF that instantiates? That would sound like Uroboros to me.


>
>> >> Creating SFs is per-PF operation for example. I didn't to thorough
>> >> analysis, but I'm sure there are couple of per-PF things like these.  
>> >
>> >Seems like adding a port attribute to SF creation would be a much
>> >smaller extension than adding a layer of objects.
>> >  
>> >> Also not breaking the existing users may be an argument to keep per-PF
>> >> instances.  
>> >
>> >We're talking about multi-PF devices only. Besides pretty sure we 
>> >moved multiple params and health reporters to be per port, so IDK 
>> >what changed now.  
>> 
>> Looks like pretty much all current NICs are multi-PFs, aren't they?
>
>Not in a way which requires cross-port state sharing, no.
>You should know this.

This is not about cross-port state sharing. This is about per-PF
configuration. What am I missing?



* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-03-04 13:11                   ` Jiri Pirko
@ 2025-03-05  0:04                     ` Jakub Kicinski
  2025-03-05 11:48                       ` Jiri Pirko
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-03-05  0:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

On Tue, 4 Mar 2025 14:11:40 +0100 Jiri Pirko wrote:
> Mon, Mar 03, 2025 at 11:06:23PM +0100, kuba@kernel.org wrote:
> >On Thu, 27 Feb 2025 13:22:25 +0100 Jiri Pirko wrote:  
> >> Depends. On normal host sr-iov, no. On smartnic where you have PF in
> >> host, yes.  
> >
> >Yet another "great choice" in mlx5 other drivers have foreseen
> >problems with and avoided.  
> 
> What do you mean? How else to model it? Do you suggest having a PF devlink
> port for the PF that instantiates it? That would sound like Uroboros to me.

I reckon it was always more obvious to those of us working on
NPU-derived devices, to which a PCIe port is just a PCIe port,
with no PCIe<>MAC "pipeline" to speak of.

The reason why having the "PF port" is a good idea is exactly
why we're having this conversation. If you don't you'll assign
to the global scope attributes which are really just port attributes.

> >> Looks like pretty much all current NICs are multi-PFs, aren't they?  
> >
> >Not in a way which requires cross-port state sharing, no.
> >You should know this.  
> 
> This is not about cross-port state sharing. This is about per-PF
> configuration. What am I missing?

Maybe we lost the thread of the conversation.. :)
I'm looking at the next patch in this series and it says:

  devlink: Introduce shared rate domains

  The underlying idea is modeling a piece of hardware which:
  1. Exposes multiple functions as separate devlink objects.
  2. Is capable of instantiating a transmit scheduling tree spanning
     multiple functions.

  Modeling this requires devlink rate nodes with parents across other
  devlink objects.

Are these domains not cross port?

^ permalink raw reply	[flat|nested] 23+ messages in thread
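
The cross-devlink parenting described in the quoted patch can be sketched as
a small model. This is an illustrative reconstruction of the proposal, not
the actual kernel code; all names (node names, PCI addresses, methods) are
hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class RateNode:
    name: str
    devlink: str                      # owning devlink instance
    parent: "RateNode | None" = None

@dataclass
class RateDomain:
    """Rates live in the domain, not in any single devlink object."""
    nodes: dict = field(default_factory=dict)

    def add_node(self, name, devlink, parent=None):
        # Parent lookup is by name within the shared domain, so the
        # parent may be owned by a different devlink instance.
        node = RateNode(name, devlink, self.nodes.get(parent))
        self.nodes[name] = node
        return node

# Two PFs exposed as separate devlink objects share one domain, so a
# leaf owned by the second PF can be parented under a node owned by
# the first: the domain is cross-port (cross-PF) by construction.
domain = RateDomain()
root = domain.add_node("group1", devlink="pci/0000:03:00.0")
leaf = domain.add_node("pf1_leaf", devlink="pci/0000:03:00.1",
                       parent="group1")
assert leaf.parent is root and leaf.devlink != root.devlink
```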

* Re: [PATCH net-next 03/10] devlink: Serialize access to rate domains
  2025-03-05  0:04                     ` Jakub Kicinski
@ 2025-03-05 11:48                       ` Jiri Pirko
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Pirko @ 2025-03-05 11:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Jiri Pirko, Cosmin Ratiu, Carolina Jubran,
	Gal Pressman, Mark Bloch, Donald Hunter, Jonathan Corbet,
	Saeed Mahameed, Leon Romanovsky, netdev, linux-kernel, linux-doc,
	linux-rdma

Wed, Mar 05, 2025 at 01:04:12AM +0100, kuba@kernel.org wrote:
>On Tue, 4 Mar 2025 14:11:40 +0100 Jiri Pirko wrote:
>> Mon, Mar 03, 2025 at 11:06:23PM +0100, kuba@kernel.org wrote:
>> >On Thu, 27 Feb 2025 13:22:25 +0100 Jiri Pirko wrote:  
>> >> Depends. On normal host sr-iov, no. On smartnic where you have PF in
>> >> host, yes.  
>> >
>> >Yet another "great choice" in mlx5 other drivers have foreseen
>> >problems with and avoided.  
>> 
>> What do you mean? How else to model it? Do you suggest having a PF devlink
>> port for the PF that instantiates it? That would sound like Uroboros to me.
>
>I reckon it was always more obvious to those of us working on
>NPU-derived devices, to which a PCIe port is just a PCIe port,
>with no PCIe<>MAC "pipeline" to speak of.
>
>The reason why having the "PF port" is a good idea is exactly
>why we're having this conversation. If you don't you'll assign
>to the global scope attributes which are really just port attributes.

Well, we have devlink port for uplink for this purpose. Why isn't that
enough?


>
>> >> Looks like pretty much all current NICs are multi-PFs, aren't they?  
>> >
>> >Not in a way which requires cross-port state sharing, no.
>> >You should know this.  
>> 
>> This is not about cross-port state sharing. This is about per-PF
>> configuration. What am I missing?
>
>Maybe we lost the thread of the conversation.. :)
>I'm looking at the next patch in this series and it says:
>
>  devlink: Introduce shared rate domains
>
>  The underlying idea is modeling a piece of hardware which:
>  1. Exposes multiple functions as separate devlink objects.
>  2. Is capable of instantiating a transmit scheduling tree spanning
>     multiple functions.
>
>  Modeling this requires devlink rate nodes with parents across other
>  devlink objects.
>
>Are these domains not cross port?

Sure. Cross-PF even. What I suggest is: if we have a devlink instance
under which these 2 PFs are nested, we have this "domain" explicitly
defined and we don't need any other construct.

^ permalink raw reply	[flat|nested] 23+ messages in thread
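
The serialization question in this subthread's subject (patch 03) can
likewise be sketched: one lock per rate domain, taken by every operation on
the shared scheduling tree, so no operation ever needs to acquire two
devlink locks in a defined order. A hedged Python model follows; it is
illustrative only, not the kernel implementation:

```python
import threading

class RateDomain:
    """Serialize all rate operations with a single per-domain lock,
    regardless of which devlink instance a node belongs to."""
    def __init__(self):
        self._lock = threading.Lock()
        self.parents = {}             # node name -> parent node name

    def set_parent(self, node, parent):
        with self._lock:              # one lock, no lock-ordering problem
            self.parents[node] = parent

# Both devlink instances hand rate operations to the shared domain;
# concurrent updates from either side are serialized by the same lock.
domain = RateDomain()
threads = [threading.Thread(target=domain.set_parent,
                            args=(f"vf{i}", "group1")) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert all(p == "group1" for p in domain.parents.values())
```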

end of thread, other threads:[~2025-03-05 11:48 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-13 18:01 [PATCH net-next 00/10] devlink and mlx5: Introduce rate domains Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 01/10] devlink: Remove unused param of devlink_rate_nodes_check Tariq Toukan
2025-02-18  2:54   ` Kalesh Anakkur Purayil
2025-02-13 18:01 ` [PATCH net-next 02/10] devlink: Store devlink rates in a rate domain Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 03/10] devlink: Serialize access to rate domains Tariq Toukan
2025-02-14 12:54   ` Jiri Pirko
2025-02-19  2:21     ` Jakub Kicinski
2025-02-25 13:36       ` Jiri Pirko
2025-02-26  1:40         ` Jakub Kicinski
2025-02-26 14:44           ` Jiri Pirko
2025-02-27  2:53             ` Jakub Kicinski
2025-02-27 12:22               ` Jiri Pirko
2025-03-03 22:06                 ` Jakub Kicinski
2025-03-04 13:11                   ` Jiri Pirko
2025-03-05  0:04                     ` Jakub Kicinski
2025-03-05 11:48                       ` Jiri Pirko
2025-02-13 18:01 ` [PATCH net-next 04/10] devlink: Introduce shared " Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 05/10] devlink: Allow specifying parent device for rate commands Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 06/10] devlink: Allow rate node parents from other devlinks Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 07/10] net/mlx5: qos: Introduce shared esw qos domains Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 08/10] net/mlx5: qos: Support cross-esw tx scheduling Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 09/10] net/mlx5: qos: Init shared devlink rate domain Tariq Toukan
2025-02-13 18:01 ` [PATCH net-next 10/10] net/mlx5: Document devlink rates and cross-esw scheduling Tariq Toukan
