netdev.vger.kernel.org archive mirror
* [PATCH net 0/6] mlx5 misc fixes 2025-08-13
@ 2025-08-13 14:31 Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 1/6] net/mlx5: Base ECVF devlink port attrs from 0 Tariq Toukan
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-08-13 14:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea

Hi,

This patchset provides miscellaneous bug fixes from the team for the
mlx5 core and Ethernet drivers.

Thanks,
Tariq.


Alexei Lazar (1):
  net/mlx5e: Query FW for buffer ownership

Carolina Jubran (4):
  net/mlx5: Fix QoS reference leak in vport enable error path
  net/mlx5: Restore missing scheduling node cleanup on vport enable
    failure
  net/mlx5: Destroy vport QoS element when no configuration remains
  net/mlx5e: Preserve tc-bw during parent changes

Daniel Jurgens (1):
  net/mlx5: Base ECVF devlink port attrs from 0

 .../ethernet/mellanox/mlx5/core/en/dcbnl.h    |  1 -
 .../ethernet/mellanox/mlx5/core/en_dcbnl.c    | 12 ++-
 .../mellanox/mlx5/core/esw/devlink_port.c     |  4 +-
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 78 +++++++++++++++----
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  2 +
 .../net/ethernet/mellanox/mlx5/core/port.c    | 20 +++++
 6 files changed, 98 insertions(+), 19 deletions(-)


base-commit: d7e82594a45c5cb270940ac469846e8026c7db0f
-- 
2.31.1



* [PATCH net 1/6] net/mlx5: Base ECVF devlink port attrs from 0
  2025-08-13 14:31 [PATCH net 0/6] mlx5 misc fixes 2025-08-13 Tariq Toukan
@ 2025-08-13 14:31 ` Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 2/6] net/mlx5: Fix QoS reference leak in vport enable error path Tariq Toukan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-08-13 14:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Daniel Jurgens

From: Daniel Jurgens <danielj@nvidia.com>

Subtract the base ECVF vport number from the vport number so that the
port attributes start at 0. Previously, the port attributes started one
past the maximum number of host VFs.

Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports")
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index b7102e14d23d..c33accadae0f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -47,10 +47,12 @@ static void mlx5_esw_offloads_pf_vf_devlink_port_attrs_set(struct mlx5_eswitch *
 		devlink_port_attrs_pci_vf_set(dl_port, controller_num, pfnum,
 					      vport_num - 1, external);
 	}  else if (mlx5_core_is_ec_vf_vport(esw->dev, vport_num)) {
+		u16 base_vport = mlx5_core_ec_vf_vport_base(dev);
+
 		memcpy(dl_port->attrs.switch_id.id, ppid.id, ppid.id_len);
 		dl_port->attrs.switch_id.id_len = ppid.id_len;
 		devlink_port_attrs_pci_vf_set(dl_port, 0, pfnum,
-					      vport_num - 1, false);
+					      vport_num - base_vport, false);
 	}
 }
 
-- 
2.31.1
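
As an illustration of the index arithmetic described in the commit
message, the standalone sketch below contrasts the old and new
calculations. It is not driver code: EXAMPLE_EC_VF_VPORT_BASE and both
helper names are hypothetical, and the value 130 is an arbitrary
assumption; in the driver the base comes from
mlx5_core_ec_vf_vport_base().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value, for illustration only. The driver obtains the
 * real base from mlx5_core_ec_vf_vport_base(). */
#define EXAMPLE_EC_VF_VPORT_BASE 130u

/* Old calculation: treats the vport number as if ECVF vports started at 1. */
static uint16_t ecvf_index_old(uint16_t vport_num)
{
	return (uint16_t)(vport_num - 1);
}

/* New calculation: index relative to the first ECVF vport. */
static uint16_t ecvf_index_new(uint16_t vport_num, uint16_t base_vport)
{
	assert(vport_num >= base_vport);
	return (uint16_t)(vport_num - base_vport);
}
```

With the assumed base of 130, the first ECVF vport maps to index 129
under the old calculation and to index 0 under the new one.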



* [PATCH net 2/6] net/mlx5: Fix QoS reference leak in vport enable error path
  2025-08-13 14:31 [PATCH net 0/6] mlx5 misc fixes 2025-08-13 Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 1/6] net/mlx5: Base ECVF devlink port attrs from 0 Tariq Toukan
@ 2025-08-13 14:31 ` Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 3/6] net/mlx5: Restore missing scheduling node cleanup on vport enable failure Tariq Toukan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-08-13 14:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Carolina Jubran

From: Carolina Jubran <cjubran@nvidia.com>

Add the missing esw_qos_put() call for the case where
__esw_qos_alloc_node() fails in mlx5_esw_qos_vport_enable(). Without it,
the QoS reference is leaked on this error path.

Fixes: be034baba83e ("net/mlx5: Make vport QoS enablement more flexible for future extensions")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 91d863c8c152..79d6add402d7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -1141,8 +1141,10 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t
 
 	parent = parent ?: esw->qos.node0;
 	sched_node = __esw_qos_alloc_node(parent->esw, 0, type, parent);
-	if (!sched_node)
+	if (!sched_node) {
+		esw_qos_put(esw);
 		return -ENOMEM;
+	}
 
 	sched_node->max_rate = max_rate;
 	sched_node->min_rate = min_rate;
-- 
2.31.1



* [PATCH net 3/6] net/mlx5: Restore missing scheduling node cleanup on vport enable failure
  2025-08-13 14:31 [PATCH net 0/6] mlx5 misc fixes 2025-08-13 Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 1/6] net/mlx5: Base ECVF devlink port attrs from 0 Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 2/6] net/mlx5: Fix QoS reference leak in vport enable error path Tariq Toukan
@ 2025-08-13 14:31 ` Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no configuration remains Tariq Toukan
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-08-13 14:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Carolina Jubran

From: Carolina Jubran <cjubran@nvidia.com>

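Restore the __esw_qos_free_node() call that the offending commit removed
from the error path of mlx5_esw_qos_vport_enable(), so that the
scheduling node is freed when esw_qos_vport_enable() fails.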

Fixes: 97733d1e00a0 ("net/mlx5: Add traffic class scheduling support for vport QoS")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 79d6add402d7..1dc98e4065af 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -1152,6 +1152,7 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t
 	vport->qos.sched_node = sched_node;
 	err = esw_qos_vport_enable(vport, type, parent, extack);
 	if (err) {
+		__esw_qos_free_node(sched_node);
 		esw_qos_put(esw);
 		vport->qos.sched_node = NULL;
 	}
-- 
2.31.1
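
The error handling that this patch and patch 2/6 restore can be pictured
with a small standalone model. This is only a sketch under assumptions:
all toy_* names are hypothetical, the fail_alloc/fail_enable flags exist
purely to inject failures, and the reference is assumed to be taken at
the start of the function, as the esw_qos_put() calls in the diffs
imply.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are driver types. */
struct toy_qos { int refcnt; };
struct toy_node { int max_rate; };

static void toy_qos_get(struct toy_qos *q) { q->refcnt++; }

static void toy_qos_put(struct toy_qos *q)
{
	assert(q->refcnt > 0);
	q->refcnt--;
}

/* Every failure after toy_qos_get() must undo what was acquired so far. */
static int toy_vport_enable(struct toy_qos *q, struct toy_node **out,
			    int fail_alloc, int fail_enable)
{
	struct toy_node *node;

	*out = NULL;
	toy_qos_get(q);

	node = fail_alloc ? NULL : calloc(1, sizeof(*node));
	if (!node) {
		toy_qos_put(q);		/* the put added by patch 2/6 */
		return -ENOMEM;
	}

	if (fail_enable) {
		free(node);		/* the free restored by patch 3/6 */
		toy_qos_put(q);
		return -EIO;
	}

	*out = node;
	return 0;
}
```

In this model, the reference count is back at zero after either failing
call, and no node is left allocated.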



* [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no configuration remains
  2025-08-13 14:31 [PATCH net 0/6] mlx5 misc fixes 2025-08-13 Tariq Toukan
                   ` (2 preceding siblings ...)
  2025-08-13 14:31 ` [PATCH net 3/6] net/mlx5: Restore missing scheduling node cleanup on vport enable failure Tariq Toukan
@ 2025-08-13 14:31 ` Tariq Toukan
  2025-08-14  9:55   ` Przemek Kitszel
  2025-08-13 14:31 ` [PATCH net 5/6] net/mlx5e: Preserve tc-bw during parent changes Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 6/6] net/mlx5e: Query FW for buffer ownership Tariq Toukan
  5 siblings, 1 reply; 9+ messages in thread
From: Tariq Toukan @ 2025-08-13 14:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Carolina Jubran

From: Carolina Jubran <cjubran@nvidia.com>

If a VF has been configured and the user later clears all QoS settings,
the vport element remains in the firmware QoS tree. This leads to
inconsistent behavior compared to VFs that were never configured, since
the FW assumes that unconfigured VFs are outside the QoS hierarchy.
As a result, the bandwidth share across VFs may differ, even though
none of them appear to have any configuration.

Align the driver behavior with the FW expectation by destroying the
vport QoS element when all configurations are removed.

Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 54 +++++++++++++++++--
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 1dc98e4065af..811c1a121c03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -1160,6 +1160,19 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t
 	return err;
 }
 
+static void mlx5_esw_qos_vport_disable_locked(struct mlx5_vport *vport)
+{
+	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
+
+	esw_assert_qos_lock_held(esw);
+	if (!vport->qos.sched_node)
+		return;
+
+	esw_qos_vport_disable(vport, NULL);
+	mlx5_esw_qos_vport_qos_free(vport);
+	esw_qos_put(esw);
+}
+
 void mlx5_esw_qos_vport_disable(struct mlx5_vport *vport)
 {
 	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
@@ -1173,9 +1186,7 @@ void mlx5_esw_qos_vport_disable(struct mlx5_vport *vport)
 	parent = vport->qos.sched_node->parent;
 	WARN(parent != esw->qos.node0, "Disabling QoS on port before detaching it from node");
 
-	esw_qos_vport_disable(vport, NULL);
-	mlx5_esw_qos_vport_qos_free(vport);
-	esw_qos_put(esw);
+	mlx5_esw_qos_vport_disable_locked(vport);
 unlock:
 	esw_qos_unlock(esw);
 }
@@ -1676,6 +1687,23 @@ static bool esw_qos_tc_bw_disabled(u32 *tc_bw)
 	return true;
 }
 
+static bool esw_vport_qos_check_and_disable(struct mlx5_vport *vport,
+					    struct devlink_rate *parent,
+					    u64 tx_max, u64 tx_share,
+					    u32 *tc_bw)
+{
+	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
+
+	if (parent || tx_max || tx_share || !esw_qos_tc_bw_disabled(tc_bw))
+		return false;
+
+	esw_qos_lock(esw);
+	mlx5_esw_qos_vport_disable_locked(vport);
+	esw_qos_unlock(esw);
+
+	return true;
+}
+
 int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
 {
 	if (esw->qos.domain)
@@ -1703,6 +1731,11 @@ int mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void
 	if (!mlx5_esw_allowed(esw))
 		return -EPERM;
 
+	if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
+					    rate_leaf->tx_max, tx_share,
+					    rate_leaf->tc_bw))
+		return 0;
+
 	err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_share", &tx_share, extack);
 	if (err)
 		return err;
@@ -1724,6 +1757,11 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *
 	if (!mlx5_esw_allowed(esw))
 		return -EPERM;
 
+	if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent, tx_max,
+					    rate_leaf->tx_share,
+					    rate_leaf->tc_bw))
+		return 0;
+
 	err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_max", &tx_max, extack);
 	if (err)
 		return err;
@@ -1749,6 +1787,11 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf,
 	if (!mlx5_esw_allowed(esw))
 		return -EPERM;
 
+	if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
+					    rate_leaf->tx_max,
+					    rate_leaf->tx_share, tc_bw))
+		return 0;
+
 	disable = esw_qos_tc_bw_disabled(tc_bw);
 	esw_qos_lock(esw);
 
@@ -1930,6 +1973,11 @@ int mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate,
 	struct mlx5_esw_sched_node *node;
 	struct mlx5_vport *vport = priv;
 
+	if (esw_vport_qos_check_and_disable(vport, parent, devlink_rate->tx_max,
+					    devlink_rate->tx_share,
+					    devlink_rate->tc_bw))
+		return 0;
+
 	if (!parent)
 		return mlx5_esw_qos_vport_update_parent(vport, NULL, extack);
 
-- 
2.31.1
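
The "no configuration remains" condition tested by
esw_vport_qos_check_and_disable() can be sketched as a standalone
predicate. The names below are hypothetical, EXAMPLE_TCS_MAX is a
stand-in for DEVLINK_RATE_TCS_MAX, and the sketch assumes that tc-bw
counts as disabled when every entry is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_TCS_MAX 8	/* hypothetical stand-in for DEVLINK_RATE_TCS_MAX */

/* Assumption: tc-bw is considered disabled when all entries are zero. */
static bool example_tc_bw_disabled(const uint32_t *tc_bw)
{
	assert(tc_bw != NULL);
	for (size_t i = 0; i < EXAMPLE_TCS_MAX; i++)
		if (tc_bw[i])
			return false;
	return true;
}

/* True only when no parent, no tx_max, no tx_share and no tc-bw is set. */
static bool example_qos_config_empty(bool has_parent, uint64_t tx_max,
				     uint64_t tx_share, const uint32_t *tc_bw)
{
	return !has_parent && !tx_max && !tx_share &&
	       example_tc_bw_disabled(tc_bw);
}
```

Any single remaining setting is enough to keep the element, which is why
each devlink callback passes its new value together with the other
values currently stored in the rate object.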



* [PATCH net 5/6] net/mlx5e: Preserve tc-bw during parent changes
  2025-08-13 14:31 [PATCH net 0/6] mlx5 misc fixes 2025-08-13 Tariq Toukan
                   ` (3 preceding siblings ...)
  2025-08-13 14:31 ` [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no configuration remains Tariq Toukan
@ 2025-08-13 14:31 ` Tariq Toukan
  2025-08-13 14:31 ` [PATCH net 6/6] net/mlx5e: Query FW for buffer ownership Tariq Toukan
  5 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-08-13 14:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Carolina Jubran

From: Carolina Jubran <cjubran@nvidia.com>

When changing the parent of a node/leaf with tc-bw configured, the code
saves and restores the tc-bw values. However, it was reading the converted
hardware bw_share values (where 0 becomes 1) instead of the original
user values, causing incorrect tc-bw calculations after the parent change.

Store original tc-bw values in the node structure and use them directly
for save/restore operations.

Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 811c1a121c03..e774f6fa3377 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -102,6 +102,8 @@ struct mlx5_esw_sched_node {
 	u8 level;
 	/* Valid only when this node represents a traffic class. */
 	u8 tc;
+	/* Valid only for a TC arbiter node or vport TC arbiter. */
+	u32 tc_bw[DEVLINK_RATE_TCS_MAX];
 };
 
 static void esw_qos_node_attach_to_parent(struct mlx5_esw_sched_node *node)
@@ -608,10 +610,7 @@ static void
 esw_qos_tc_arbiter_get_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node,
 				 u32 *tc_bw)
 {
-	struct mlx5_esw_sched_node *vports_tc_node;
-
-	list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry)
-		tc_bw[vports_tc_node->tc] = vports_tc_node->bw_share;
+	memcpy(tc_bw, tc_arbiter_node->tc_bw, sizeof(tc_arbiter_node->tc_bw));
 }
 
 static void
@@ -628,6 +627,7 @@ esw_qos_set_tc_arbiter_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node,
 		u8 tc = vports_tc_node->tc;
 		u32 bw_share;
 
+		tc_arbiter_node->tc_bw[tc] = tc_bw[tc];
 		bw_share = tc_bw[tc] * fw_max_bw_share;
 		bw_share = esw_qos_calc_bw_share(bw_share, divider,
 						 fw_max_bw_share);
@@ -1276,8 +1276,9 @@ static int esw_qos_vport_update(struct mlx5_vport *vport,
 				struct mlx5_esw_sched_node *parent,
 				struct netlink_ext_ack *extack)
 {
-	struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent;
-	enum sched_node_type curr_type = vport->qos.sched_node->type;
+	struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+	struct mlx5_esw_sched_node *curr_parent = vport_node->parent;
+	enum sched_node_type curr_type = vport_node->type;
 	u32 curr_tc_bw[DEVLINK_RATE_TCS_MAX] = {0};
 	int err;
 
@@ -1290,10 +1291,8 @@ static int esw_qos_vport_update(struct mlx5_vport *vport,
 	if (err)
 		return err;
 
-	if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type) {
-		esw_qos_tc_arbiter_get_bw_shares(vport->qos.sched_node,
-						 curr_tc_bw);
-	}
+	if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type)
+		esw_qos_tc_arbiter_get_bw_shares(vport_node, curr_tc_bw);
 
 	esw_qos_vport_disable(vport, extack);
 
-- 
2.31.1
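
The reason the converted values cannot be used for save/restore is that
the conversion is lossy. The standalone sketch below models only the
"0 becomes 1" property named in the commit message; toy_to_hw_share() is
a hypothetical simplification and not the driver's
esw_qos_calc_bw_share(), and all other toy_* names are hypothetical as
well.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_TCS 3	/* hypothetical, small for readability */

struct toy_arbiter {
	uint32_t hw_share[EXAMPLE_TCS];	/* converted values sent to hardware */
	uint32_t tc_bw[EXAMPLE_TCS];	/* original user values, kept as-is */
};

/* Simplified conversion: a user value of 0 is raised to the minimum of 1. */
static uint32_t toy_to_hw_share(uint32_t user_bw)
{
	return user_bw ? user_bw : 1;
}

static void toy_set_tc_bw(struct toy_arbiter *a, const uint32_t *tc_bw)
{
	assert(a != NULL && tc_bw != NULL);
	for (int tc = 0; tc < EXAMPLE_TCS; tc++) {
		a->tc_bw[tc] = tc_bw[tc];	/* remember what the user asked for */
		a->hw_share[tc] = toy_to_hw_share(tc_bw[tc]);
	}
}

/* Save path: copy the stored user values, never the converted ones. */
static void toy_get_tc_bw(const struct toy_arbiter *a, uint32_t *tc_bw)
{
	memcpy(tc_bw, a->tc_bw, sizeof(a->tc_bw));
}
```

For a user setting of {0, 20, 80}, the converted array holds {1, 20, 80},
so reading it back would turn the 0 into 1; the stored copy returns the
original values unchanged.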



* [PATCH net 6/6] net/mlx5e: Query FW for buffer ownership
  2025-08-13 14:31 [PATCH net 0/6] mlx5 misc fixes 2025-08-13 Tariq Toukan
                   ` (4 preceding siblings ...)
  2025-08-13 14:31 ` [PATCH net 5/6] net/mlx5e: Preserve tc-bw during parent changes Tariq Toukan
@ 2025-08-13 14:31 ` Tariq Toukan
  5 siblings, 0 replies; 9+ messages in thread
From: Tariq Toukan @ 2025-08-13 14:31 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea,
	Alexei Lazar

From: Alexei Lazar <alazar@nvidia.com>

The SW currently records buffer ownership locally when setting the
buffer. This means the SW assumes that it owns the buffer once the set
command has been issued.

If setting the buffer fails and ownership remains with the FW, the
local state still incorrectly indicates that the buffer is SW-owned.
Subsequent PFC commands then act on the wrong ownership state and fail.

Instead of tracking buffer ownership in SW, query the FW for buffer
ownership when setting the buffer. This ensures that the ownership
state used by the driver always reflects the actual state in FW.

Fixes: ecdf2dadee8e ("net/mlx5e: Receive buffer support for DCBX")
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en/dcbnl.h    |  1 -
 .../ethernet/mellanox/mlx5/core/en_dcbnl.c    | 12 ++++++++---
 .../ethernet/mellanox/mlx5/core/mlx5_core.h   |  2 ++
 .../net/ethernet/mellanox/mlx5/core/port.c    | 20 +++++++++++++++++++
 4 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h b/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h
index b59aee75de94..2c98a5299df3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/dcbnl.h
@@ -26,7 +26,6 @@ struct mlx5e_dcbx {
 	u8                         cap;
 
 	/* Buffer configuration */
-	bool                       manual_buffer;
 	u32                        cable_len;
 	u32                        xoff;
 	u16                        port_buff_cell_sz;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 5fe016e477b3..d166c0d5189e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -362,6 +362,7 @@ static int mlx5e_dcbnl_ieee_getpfc(struct net_device *dev,
 static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 				   struct ieee_pfc *pfc)
 {
+	u8 buffer_ownership = MLX5_BUF_OWNERSHIP_UNKNOWN;
 	struct mlx5e_priv *priv = netdev_priv(dev);
 	struct mlx5_core_dev *mdev = priv->mdev;
 	u32 old_cable_len = priv->dcbx.cable_len;
@@ -389,7 +390,14 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
 
 	if (MLX5_BUFFER_SUPPORTED(mdev)) {
 		pfc_new.pfc_en = (changed & MLX5E_PORT_BUFFER_PFC) ? pfc->pfc_en : curr_pfc_en;
-		if (priv->dcbx.manual_buffer)
+		ret = mlx5_query_port_buffer_ownership(mdev,
+						       &buffer_ownership);
+		if (ret)
+			netdev_err(dev,
+				   "%s, Failed to get buffer ownership: %d\n",
+				   __func__, ret);
+
+		if (buffer_ownership == MLX5_BUF_OWNERSHIP_SW_OWNED)
 			ret = mlx5e_port_manual_buffer_config(priv, changed,
 							      dev->mtu, &pfc_new,
 							      NULL, NULL);
@@ -982,7 +990,6 @@ static int mlx5e_dcbnl_setbuffer(struct net_device *dev,
 	if (!changed)
 		return 0;
 
-	priv->dcbx.manual_buffer = true;
 	err = mlx5e_port_manual_buffer_config(priv, changed, dev->mtu, NULL,
 					      buffer_size, prio2buffer);
 	return err;
@@ -1252,7 +1259,6 @@ void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv)
 		priv->dcbx.cap |= DCB_CAP_DCBX_HOST;
 
 	priv->dcbx.port_buff_cell_sz = mlx5e_query_port_buffers_cell_size(priv);
-	priv->dcbx.manual_buffer = false;
 	priv->dcbx.cable_len = MLX5E_DEFAULT_CABLE_LEN;
 
 	mlx5e_ets_init(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index b6d53db27cd5..9d3504f5abfa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -367,6 +367,8 @@ int mlx5_query_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *out);
 int mlx5_set_port_dcbx_param(struct mlx5_core_dev *mdev, u32 *in);
 int mlx5_set_trust_state(struct mlx5_core_dev *mdev, u8 trust_state);
 int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state);
+int mlx5_query_port_buffer_ownership(struct mlx5_core_dev *mdev,
+				     u8 *buffer_ownership);
 int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio);
 int mlx5_query_dscp2prio(struct mlx5_core_dev *mdev, u8 *dscp2prio);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index 549f1066d2a5..2d7adf7444ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -968,6 +968,26 @@ int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state)
 	return err;
 }
 
+int mlx5_query_port_buffer_ownership(struct mlx5_core_dev *mdev,
+				     u8 *buffer_ownership)
+{
+	u32 out[MLX5_ST_SZ_DW(pfcc_reg)] = {};
+	int err;
+
+	if (!MLX5_CAP_PCAM_FEATURE(mdev, buffer_ownership)) {
+		*buffer_ownership = MLX5_BUF_OWNERSHIP_UNKNOWN;
+		return 0;
+	}
+
+	err = mlx5_query_pfcc_reg(mdev, out, sizeof(out));
+	if (err)
+		return err;
+
+	*buffer_ownership = MLX5_GET(pfcc_reg, out, buf_ownership);
+
+	return 0;
+}
+
 int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio)
 {
 	int sz = MLX5_ST_SZ_BYTES(qpdpm_reg);
-- 
2.31.1



* Re: [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no configuration remains
  2025-08-13 14:31 ` [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no configuration remains Tariq Toukan
@ 2025-08-14  9:55   ` Przemek Kitszel
  2025-08-17  8:36     ` Carolina Jubran
  0 siblings, 1 reply; 9+ messages in thread
From: Przemek Kitszel @ 2025-08-14  9:55 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
	linux-kernel, Gal Pressman, Dragos Tatulea, Carolina Jubran,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller

On 8/13/25 16:31, Tariq Toukan wrote:
> From: Carolina Jubran <cjubran@nvidia.com>
> 
> If a VF has been configured and the user later clears all QoS settings,
> the vport element remains in the firmware QoS tree. This leads to
> inconsistent behavior compared to VFs that were never configured, since
> the FW assumes that unconfigured VFs are outside the QoS hierarchy.
> As a result, the bandwidth share across VFs may differ, even though
> none of them appear to have any configuration.
> 
> Align the driver behavior with the FW expectation by destroying the
> vport QoS element when all configurations are removed.
> 
> Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
> Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw")
> Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>   .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 54 +++++++++++++++++--
>   1 file changed, 51 insertions(+), 3 deletions(-)


> +static bool esw_vport_qos_check_and_disable(struct mlx5_vport *vport,
> +					    struct devlink_rate *parent,
> +					    u64 tx_max, u64 tx_share,
> +					    u32 *tc_bw)
> +{
> +	struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
> +
> +	if (parent || tx_max || tx_share || !esw_qos_tc_bw_disabled(tc_bw))
> +		return false;
> +
> +	esw_qos_lock(esw);
> +	mlx5_esw_qos_vport_disable_locked(vport);
> +	esw_qos_unlock(esw);
> +
> +	return true;
> +}
> +
>   int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
>   {
>   	if (esw->qos.domain)
> @@ -1703,6 +1731,11 @@ int mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void
>   	if (!mlx5_esw_allowed(esw))
>   		return -EPERM;
>   
> +	if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
> +					    rate_leaf->tx_max, tx_share,
> +					    rate_leaf->tc_bw))
> +		return 0;
> +

I would rather keep executing the code that "sets tx_share to 0 and
propagates the info", and only then prune all-0 nodes.
The same goes for the other params (tx_max, ...).

That would be less risky and more future-proof, and it would also let
your check&disable function take fewer params.

Finally, the name is poor; what about:
	esw_vport_qos_prune_empty(vport, rate_leaf);
(after applying my previous suggestion, the above line will be at the
bottom of the function).

Also note that if you apply the above, it would be good to
keep the "esw_qos_lock() just once" (as it is now).
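
A rough standalone sketch of the ordering discussed here, offered only
as an illustration: apply the new value through the regular path first,
then prune an all-zero element, all under a single lock acquisition.
Every toy_* name is hypothetical, the lock is modelled by a flag, tc-bw
is left out for brevity, and this does not show what a v2 of the patch
would look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a rate leaf; every name here is hypothetical. */
struct toy_leaf {
	bool in_tree;			/* element present in the QoS tree */
	bool has_parent;
	uint64_t tx_max;
	uint64_t tx_share;
	bool locked;			/* stand-in for the QoS lock */
	unsigned int lock_acquisitions;
};

/* Remove the element if nothing is configured. Caller holds the lock. */
static void toy_prune_empty(struct toy_leaf *leaf)
{
	assert(leaf->locked);
	if (!leaf->has_parent && !leaf->tx_max && !leaf->tx_share)
		leaf->in_tree = false;
}

static void toy_tx_share_set(struct toy_leaf *leaf, uint64_t tx_share)
{
	leaf->locked = true;		/* take the lock once */
	leaf->lock_acquisitions++;

	leaf->in_tree = true;		/* regular set path: element exists ... */
	leaf->tx_share = tx_share;	/* ... and the value is applied, even 0 */

	toy_prune_empty(leaf);		/* only then drop an all-zero element */
	leaf->locked = false;
}
```

In this model, setting tx_share to 100 and then back to 0 leaves the
element out of the tree, with one lock acquisition per call.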

>   	err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_share", &tx_share, extack);
>   	if (err)
>   		return err;
> @@ -1724,6 +1757,11 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *
>   	if (!mlx5_esw_allowed(esw))
>   		return -EPERM;
>   
> +	if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent, tx_max,
> +					    rate_leaf->tx_share,
> +					    rate_leaf->tc_bw))
> +		return 0;
> +
>   	err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_max", &tx_max, extack);
>   	if (err)
>   		return err;
> @@ -1749,6 +1787,11 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf,
>   	if (!mlx5_esw_allowed(esw))
>   		return -EPERM;
>   
> +	if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
> +					    rate_leaf->tx_max,
> +					    rate_leaf->tx_share, tc_bw))
> +		return 0;
> +
>   	disable = esw_qos_tc_bw_disabled(tc_bw);
>   	esw_qos_lock(esw);
>   
> @@ -1930,6 +1973,11 @@ int mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate,
>   	struct mlx5_esw_sched_node *node;
>   	struct mlx5_vport *vport = priv;
>   
> +	if (esw_vport_qos_check_and_disable(vport, parent, devlink_rate->tx_max,
> +					    devlink_rate->tx_share,
> +					    devlink_rate->tc_bw))
> +		return 0;
> +
>   	if (!parent)
>   		return mlx5_esw_qos_vport_update_parent(vport, NULL, extack);
>   



* Re: [PATCH net 4/6] net/mlx5: Destroy vport QoS element when no configuration remains
  2025-08-14  9:55   ` Przemek Kitszel
@ 2025-08-17  8:36     ` Carolina Jubran
  0 siblings, 0 replies; 9+ messages in thread
From: Carolina Jubran @ 2025-08-17  8:36 UTC (permalink / raw)
  To: Przemek Kitszel, Tariq Toukan
  Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
	linux-kernel, Gal Pressman, Dragos Tatulea, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Andrew Lunn, David S. Miller



On 14/08/2025 12:55, Przemek Kitszel wrote:
> On 8/13/25 16:31, Tariq Toukan wrote:
>> From: Carolina Jubran <cjubran@nvidia.com>
>>
>> If a VF has been configured and the user later clears all QoS settings,
>> the vport element remains in the firmware QoS tree. This leads to
>> inconsistent behavior compared to VFs that were never configured, since
>> the FW assumes that unconfigured VFs are outside the QoS hierarchy.
>> As a result, the bandwidth share across VFs may differ, even though
>> none of them appear to have any configuration.
>>
>> Align the driver behavior with the FW expectation by destroying the
>> vport QoS element when all configurations are removed.
>>
>> Fixes: c9497c98901c ("net/mlx5: Add support for setting VF min rate")
>> Fixes: cf7e73770d1b ("net/mlx5: Manage TC arbiter nodes and implement 
>> full support for tc-bw")
>> Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
>> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>> ---
>>   .../net/ethernet/mellanox/mlx5/core/esw/qos.c | 54 +++++++++++++++++--
>>   1 file changed, 51 insertions(+), 3 deletions(-)
> 
> 
>> +static bool esw_vport_qos_check_and_disable(struct mlx5_vport *vport,
>> +                        struct devlink_rate *parent,
>> +                        u64 tx_max, u64 tx_share,
>> +                        u32 *tc_bw)
>> +{
>> +    struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
>> +
>> +    if (parent || tx_max || tx_share || !esw_qos_tc_bw_disabled(tc_bw))
>> +        return false;
>> +
>> +    esw_qos_lock(esw);
>> +    mlx5_esw_qos_vport_disable_locked(vport);
>> +    esw_qos_unlock(esw);
>> +
>> +    return true;
>> +}
>> +
>>   int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
>>   {
>>       if (esw->qos.domain)
>> @@ -1703,6 +1731,11 @@ int 
>> mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate 
>> *rate_leaf, void
>>       if (!mlx5_esw_allowed(esw))
>>           return -EPERM;
>> +    if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
>> +                        rate_leaf->tx_max, tx_share,
>> +                        rate_leaf->tc_bw))
>> +        return 0;
>> +
> 
> I would rather keep executing the code that "sets tx_share to 0 and
> propagates the info", and only then prune all-0 nodes.
> Same for other params (tx_max, ...)
> 
> That would be less risky and more future-proof, and also would let your
> check&disable function to take less params.
> 
> Finally, the name is poor, what about:?
>      esw_vport_qos_prune_empty(vport, rate_leaf);
> (after applying my prev suggestion the above line will be at the
> bottom of function).
> 
> Also a note, that if you apply the above, it would be also good to
> keep the "esw_qos_lock() just once" (as it is now)
> 

Thanks for the review!
My initial thought was to reduce the number of FW commands, but I agree
with your points.
Also, thanks for the name suggestion.
I’ll fix this and send v2.

Carolina

>>       err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_share", 
>> &tx_share, extack);
>>       if (err)
>>           return err;
>> @@ -1724,6 +1757,11 @@ int 
>> mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, 
>> void *
>>       if (!mlx5_esw_allowed(esw))
>>           return -EPERM;
>> +    if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent, 
>> tx_max,
>> +                        rate_leaf->tx_share,
>> +                        rate_leaf->tc_bw))
>> +        return 0;
>> +
>>       err = esw_qos_devlink_rate_to_mbps(vport->dev, "tx_max", 
>> &tx_max, extack);
>>       if (err)
>>           return err;
>> @@ -1749,6 +1787,11 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct 
>> devlink_rate *rate_leaf,
>>       if (!mlx5_esw_allowed(esw))
>>           return -EPERM;
>> +    if (esw_vport_qos_check_and_disable(vport, rate_leaf->parent,
>> +                        rate_leaf->tx_max,
>> +                        rate_leaf->tx_share, tc_bw))
>> +        return 0;
>> +
>>       disable = esw_qos_tc_bw_disabled(tc_bw);
>>       esw_qos_lock(esw);
>> @@ -1930,6 +1973,11 @@ int 
>> mlx5_esw_devlink_rate_leaf_parent_set(struct devlink_rate *devlink_rate,
>>       struct mlx5_esw_sched_node *node;
>>       struct mlx5_vport *vport = priv;
>> +    if (esw_vport_qos_check_and_disable(vport, parent, devlink_rate- 
>> >tx_max,
>> +                        devlink_rate->tx_share,
>> +                        devlink_rate->tc_bw))
>> +        return 0;
>> +
>>       if (!parent)
>>           return mlx5_esw_qos_vport_update_parent(vport, NULL, extack);
> 


