linux-doc.vger.kernel.org archive mirror
* [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events
@ 2025-06-19 11:37 Mark Bloch
  2025-06-19 11:37 ` [PATCH mlx5-next 1/5] net/mlx5: Small refactor for general object capabilities Mark Bloch
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Mark Bloch @ 2025-06-19 11:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, Jonathan Corbet,
	netdev, linux-rdma, linux-doc, linux-kernel, Mark Bloch

PCIe congestion events are generated by the firmware when the
device side has sustained PCIe inbound or outbound traffic above
certain thresholds. The high and low thresholds form a hysteresis
pair to prevent flapping: once the high threshold has been crossed,
a low threshold event is triggered only after the bandwidth usage
drops below the low threshold.
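The hysteresis behaviour described above can be sketched as a small userspace C model (illustrative only, not driver code; the helper name cong_update() is hypothetical, and the 9000/7500 values mirror the series' default thresholds in units of 0.01%):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative model of the firmware's hysteresis: an event fires on
 * crossing the high threshold, and again only after usage drops below
 * the low threshold. Samples in between generate no events.
 */
struct cong_state {
	bool congested;
	unsigned int high_thresh;	/* e.g. 9000 == 90.00% */
	unsigned int low_thresh;	/* e.g. 7500 == 75.00% */
};

/* Returns true if an event would be generated for this bandwidth sample. */
static bool cong_update(struct cong_state *s, unsigned int bw_usage)
{
	if (!s->congested && bw_usage >= s->high_thresh) {
		s->congested = true;	/* high-threshold event */
		return true;
	}
	if (s->congested && bw_usage < s->low_thresh) {
		s->congested = false;	/* low-threshold event */
		return true;
	}
	return false;			/* no flapping in between */
}
```

A usage in the 75-90% band after a high event produces no further events, which is exactly the flapping the hysteresis prevents.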

This series adds support for receiving and exposing such events as
ethtool counters.

Two new pairs of counters are exposed: pci_bw_in/outbound_high/low. These
should help the user understand whether the device's PCIe link is under
pressure. The thresholds are configurable via sysfs.

Dragos Tatulea (5):
  net/mlx5: Small refactor for general object capabilities
  net/mlx5: Add IFC bits for PCIe Congestion Event object
  net/mlx5e: Create/destroy PCIe Congestion Event object
  net/mlx5e: Add device PCIe congestion ethtool stats
  net/mlx5e: Make PCIe congestion event thresholds configurable

 .../ethernet/mellanox/mlx5/counters.rst       |  32 ++
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +
 .../mellanox/mlx5/core/en/pcie_cong_event.c   | 464 ++++++++++++++++++
 .../mellanox/mlx5/core/en/pcie_cong_event.h   |  11 +
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   3 +
 .../ethernet/mellanox/mlx5/core/en_stats.c    |   1 +
 .../ethernet/mellanox/mlx5/core/en_stats.h    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   4 +
 include/linux/mlx5/mlx5_ifc.h                 |  67 ++-
 10 files changed, 575 insertions(+), 12 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h


base-commit: d3623dd5bd4e1fc9acfc08dd0064658bbbf1e8de
-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH mlx5-next 1/5] net/mlx5: Small refactor for general object capabilities
  2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
@ 2025-06-19 11:37 ` Mark Bloch
  2025-06-19 11:37 ` [PATCH mlx5-next 2/5] net/mlx5: Add IFC bits for PCIe Congestion Event object Mark Bloch
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Mark Bloch @ 2025-06-19 11:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, Jonathan Corbet,
	netdev, linux-rdma, linux-doc, linux-kernel, Dragos Tatulea,
	Mark Bloch

From: Dragos Tatulea <dtatulea@nvidia.com>

Make the enums for the capability bits of general object types depend
on the type definitions themselves.

Make sure that capabilities in the [64,127] bit range are
properly calculated (type id - 64).
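The bit calculation above can be illustrated with a userspace C sketch (BIT_ULL and the type values are taken from the series; the cap_bit() helper is hypothetical and only demonstrates the mapping):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT_ULL() macro. */
#define BIT_ULL(n) (1ULL << (n))

/* Object type IDs as defined in mlx5_ifc.h in this series. */
enum {
	MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = 0xc,
	MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL = 0x53,
};

/*
 * Types below 64 map directly into the general_obj_types capability
 * field; types in [64, 127] map into general_obj_types_127_64, so
 * their capability bit is (type id - 64).
 */
static uint64_t cap_bit(unsigned int type)
{
	return type < 64 ? BIT_ULL(type) : BIT_ULL(type - 64);
}
```

For RDMA_CTRL (0x53), this yields BIT_ULL(0x13), matching the hard-coded value the refactor replaces.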

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 2c09df4ee574..5c8f75605eac 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -12501,17 +12501,6 @@ struct mlx5_ifc_affiliated_event_header_bits {
 	u8         obj_id[0x20];
 };
 
-enum {
-	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = BIT_ULL(0xc),
-	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC = BIT_ULL(0x13),
-	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT_ULL(0x20),
-	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = BIT_ULL(0x24),
-};
-
-enum {
-	MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL = BIT_ULL(0x13),
-};
-
 enum {
 	MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = 0xc,
 	MLX5_GENERAL_OBJECT_TYPES_IPSEC = 0x13,
@@ -12523,6 +12512,22 @@ enum {
 	MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS = 0xff15,
 };
 
+enum {
+	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY =
+		BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY),
+	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC =
+		BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_IPSEC),
+	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER =
+		BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_SAMPLER),
+	MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_FLOW_METER_ASO =
+		BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_FLOW_METER_ASO),
+};
+
+enum {
+	MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL =
+		BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL - 0x40),
+};
+
 enum {
 	MLX5_IPSEC_OBJECT_ICV_LEN_16B,
 };
-- 
2.34.1



* [PATCH mlx5-next 2/5] net/mlx5: Add IFC bits for PCIe Congestion Event object
  2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
  2025-06-19 11:37 ` [PATCH mlx5-next 1/5] net/mlx5: Small refactor for general object capabilities Mark Bloch
@ 2025-06-19 11:37 ` Mark Bloch
  2025-06-19 11:37 ` [PATCH net-next 3/5] net/mlx5e: Create/destroy " Mark Bloch
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Mark Bloch @ 2025-06-19 11:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, Jonathan Corbet,
	netdev, linux-rdma, linux-doc, linux-kernel, Dragos Tatulea,
	Mark Bloch

From: Dragos Tatulea <dtatulea@nvidia.com>

Add definitions for the PCIe Congestion Event object
and the relevant FW command structures.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 40 +++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 5c8f75605eac..0e93f342be09 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -12509,6 +12509,7 @@ enum {
 	MLX5_GENERAL_OBJECT_TYPES_MACSEC = 0x27,
 	MLX5_GENERAL_OBJECT_TYPES_INT_KEK = 0x47,
 	MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL = 0x53,
+	MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT = 0x58,
 	MLX5_GENERAL_OBJECT_TYPES_FLOW_TABLE_ALIAS = 0xff15,
 };
 
@@ -12526,6 +12527,8 @@ enum {
 enum {
 	MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_RDMA_CTRL =
 		BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_RDMA_CTRL - 0x40),
+	MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT =
+		BIT_ULL(MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT - 0x40),
 };
 
 enum {
@@ -13284,4 +13287,41 @@ struct mlx5_ifc_mrtcq_reg_bits {
 	u8         reserved_at_80[0x180];
 };
 
+struct mlx5_ifc_pcie_cong_event_obj_bits {
+	u8         modify_select_field[0x40];
+
+	u8         inbound_event_en[0x1];
+	u8         outbound_event_en[0x1];
+	u8         reserved_at_42[0x1e];
+
+	u8         reserved_at_60[0x1];
+	u8         inbound_cong_state[0x3];
+	u8         reserved_at_64[0x1];
+	u8         outbound_cong_state[0x3];
+	u8         reserved_at_68[0x18];
+
+	u8         inbound_cong_low_threshold[0x10];
+	u8         inbound_cong_high_threshold[0x10];
+
+	u8         outbound_cong_low_threshold[0x10];
+	u8         outbound_cong_high_threshold[0x10];
+
+	u8         reserved_at_e0[0x340];
+};
+
+struct mlx5_ifc_pcie_cong_event_cmd_in_bits {
+	struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr;
+	struct mlx5_ifc_pcie_cong_event_obj_bits cong_obj;
+};
+
+struct mlx5_ifc_pcie_cong_event_cmd_out_bits {
+	struct mlx5_ifc_general_obj_out_cmd_hdr_bits hdr;
+	struct mlx5_ifc_pcie_cong_event_obj_bits cong_obj;
+};
+
+enum mlx5e_pcie_cong_event_mod_field {
+	MLX5_PCIE_CONG_EVENT_MOD_EVENT_EN = BIT(0),
+	MLX5_PCIE_CONG_EVENT_MOD_THRESH   = BIT(2),
+};
+
 #endif /* MLX5_IFC_H */
-- 
2.34.1



* [PATCH net-next 3/5] net/mlx5e: Create/destroy PCIe Congestion Event object
  2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
  2025-06-19 11:37 ` [PATCH mlx5-next 1/5] net/mlx5: Small refactor for general object capabilities Mark Bloch
  2025-06-19 11:37 ` [PATCH mlx5-next 2/5] net/mlx5: Add IFC bits for PCIe Congestion Event object Mark Bloch
@ 2025-06-19 11:37 ` Mark Bloch
  2025-06-19 11:37 ` [PATCH net-next 4/5] net/mlx5e: Add device PCIe congestion ethtool stats Mark Bloch
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Mark Bloch @ 2025-06-19 11:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, Jonathan Corbet,
	netdev, linux-rdma, linux-doc, linux-kernel, Dragos Tatulea,
	Mark Bloch

From: Dragos Tatulea <dtatulea@nvidia.com>

Add initial infrastructure to create and destroy the PCIe Congestion
Event object if the object is supported.

The verb for the object creation function is "set" instead of
"create" because the function will accommodate the modify operation
as well in a subsequent patch.

The next patches will hook it up to the event handler and will add
actual functionality.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   2 +
 .../mellanox/mlx5/core/en/pcie_cong_event.c   | 153 ++++++++++++++++++
 .../mellanox/mlx5/core/en/pcie_cong_event.h   |  11 ++
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   3 +
 5 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d292e6a9e22c..650df18a9216 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -29,7 +29,7 @@ mlx5_core-$(CONFIG_MLX5_CORE_EN) += en/rqt.o en/tir.o en/rss.o en/rx_res.o \
 		en/reporter_tx.o en/reporter_rx.o en/params.o en/xsk/pool.o \
 		en/xsk/setup.o en/xsk/rx.o en/xsk/tx.o en/devlink.o en/ptp.o \
 		en/qos.o en/htb.o en/trap.o en/fs_tt_redirect.o en/selq.o \
-		lib/crypto.o lib/sd.o
+		lib/crypto.o lib/sd.o en/pcie_cong_event.o
 
 #
 # Netdev extra
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 65a73913b9a2..784050bbf7f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -921,6 +921,8 @@ struct mlx5e_priv {
 	struct notifier_block      events_nb;
 	struct notifier_block      blocking_events_nb;
 
+	struct mlx5e_pcie_cong_event *cong_event;
+
 	struct udp_tunnel_nic_info nic_info;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
 	struct mlx5e_dcbx          dcbx;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
new file mode 100644
index 000000000000..95a6db9d30b3
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+// Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.
+
+#include "en.h"
+#include "pcie_cong_event.h"
+
+struct mlx5e_pcie_cong_thresh {
+	u16 inbound_high;
+	u16 inbound_low;
+	u16 outbound_high;
+	u16 outbound_low;
+};
+
+struct mlx5e_pcie_cong_event {
+	u64 obj_id;
+
+	struct mlx5e_priv *priv;
+};
+
+/* In units of 0.01 % */
+static const struct mlx5e_pcie_cong_thresh default_thresh_config = {
+	.inbound_high = 9000,
+	.inbound_low = 7500,
+	.outbound_high = 9000,
+	.outbound_low = 7500,
+};
+
+static int
+mlx5_cmd_pcie_cong_event_set(struct mlx5_core_dev *dev,
+			     const struct mlx5e_pcie_cong_thresh *config,
+			     u64 *obj_id)
+{
+	u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)];
+	void *cong_obj;
+	void *hdr;
+	int err;
+
+	hdr = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr);
+	cong_obj = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, cong_obj);
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT);
+
+	MLX5_SET(pcie_cong_event_obj, cong_obj, inbound_event_en, 1);
+	MLX5_SET(pcie_cong_event_obj, cong_obj, outbound_event_en, 1);
+
+	MLX5_SET(pcie_cong_event_obj, cong_obj,
+		 inbound_cong_high_threshold, config->inbound_high);
+	MLX5_SET(pcie_cong_event_obj, cong_obj,
+		 inbound_cong_low_threshold, config->inbound_low);
+
+	MLX5_SET(pcie_cong_event_obj, cong_obj,
+		 outbound_cong_high_threshold, config->outbound_high);
+	MLX5_SET(pcie_cong_event_obj, cong_obj,
+		 outbound_cong_low_threshold, config->outbound_low);
+
+	err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+	if (err)
+		return err;
+
+	*obj_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+
+	mlx5_core_dbg(dev, "PCIe congestion event (obj_id=%llu) created. Config: in: [%u, %u], out: [%u, %u]\n",
+		      *obj_id,
+		      config->inbound_high, config->inbound_low,
+		      config->outbound_high, config->outbound_low);
+
+	return 0;
+}
+
+static int mlx5_cmd_pcie_cong_event_destroy(struct mlx5_core_dev *dev,
+					    u64 obj_id)
+{
+	u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)];
+	void *hdr;
+
+	hdr = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_DESTROY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, obj_id);
+
+	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+}
+
+bool mlx5e_pcie_cong_event_supported(struct mlx5_core_dev *dev)
+{
+	u64 features = MLX5_CAP_GEN_2_64(dev, general_obj_types_127_64);
+
+	if (!(features & MLX5_HCA_CAP_2_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT))
+		return false;
+
+	if (dev->sd)
+		return false;
+
+	return true;
+}
+
+int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv)
+{
+	struct mlx5e_pcie_cong_event *cong_event;
+	struct mlx5_core_dev *mdev = priv->mdev;
+	int err;
+
+	if (!mlx5e_pcie_cong_event_supported(mdev))
+		return 0;
+
+	cong_event = kvzalloc_node(sizeof(*cong_event), GFP_KERNEL,
+				   mdev->priv.numa_node);
+	if (!cong_event)
+		return -ENOMEM;
+
+	cong_event->priv = priv;
+
+	err = mlx5_cmd_pcie_cong_event_set(mdev, &default_thresh_config,
+					   &cong_event->obj_id);
+	if (err) {
+		mlx5_core_warn(mdev, "Error creating a PCIe congestion event object\n");
+		goto err_free;
+	}
+
+	priv->cong_event = cong_event;
+
+	return 0;
+
+err_free:
+	kvfree(cong_event);
+
+	return err;
+}
+
+void mlx5e_pcie_cong_event_cleanup(struct mlx5e_priv *priv)
+{
+	struct mlx5e_pcie_cong_event *cong_event = priv->cong_event;
+	struct mlx5_core_dev *mdev = priv->mdev;
+
+	if (!cong_event)
+		return;
+
+	priv->cong_event = NULL;
+
+	if (mlx5_cmd_pcie_cong_event_destroy(mdev, cong_event->obj_id))
+		mlx5_core_warn(mdev, "Error destroying PCIe congestion event (obj_id=%llu)\n",
+			       cong_event->obj_id);
+
+	kvfree(cong_event);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h
new file mode 100644
index 000000000000..bf1e3632d596
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. */
+
+#ifndef __MLX5_PCIE_CONG_EVENT_H__
+#define __MLX5_PCIE_CONG_EVENT_H__
+
+bool mlx5e_pcie_cong_event_supported(struct mlx5_core_dev *dev);
+int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv);
+void mlx5e_pcie_cong_event_cleanup(struct mlx5e_priv *priv);
+
+#endif /* __MLX5_PCIE_CONG_EVENT_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index dca5ca51a470..c6c2139483e0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -76,6 +76,7 @@
 #include "en/trap.h"
 #include "lib/devcom.h"
 #include "lib/sd.h"
+#include "en/pcie_cong_event.h"
 
 static bool mlx5e_hw_gro_supported(struct mlx5_core_dev *mdev)
 {
@@ -5988,6 +5989,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 	if (mlx5e_monitor_counter_supported(priv))
 		mlx5e_monitor_counter_init(priv);
 
+	mlx5e_pcie_cong_event_init(priv);
 	mlx5e_hv_vhca_stats_create(priv);
 	if (netdev->reg_state != NETREG_REGISTERED)
 		return;
@@ -6027,6 +6029,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 
 	mlx5e_nic_set_rx_mode(priv);
 
+	mlx5e_pcie_cong_event_cleanup(priv);
 	mlx5e_hv_vhca_stats_destroy(priv);
 	if (mlx5e_monitor_counter_supported(priv))
 		mlx5e_monitor_counter_cleanup(priv);
-- 
2.34.1



* [PATCH net-next 4/5] net/mlx5e: Add device PCIe congestion ethtool stats
  2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
                   ` (2 preceding siblings ...)
  2025-06-19 11:37 ` [PATCH net-next 3/5] net/mlx5e: Create/destroy " Mark Bloch
@ 2025-06-19 11:37 ` Mark Bloch
  2025-06-19 11:37 ` [PATCH net-next 5/5] net/mlx5e: Make PCIe congestion event thresholds configurable Mark Bloch
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Mark Bloch @ 2025-06-19 11:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, Jonathan Corbet,
	netdev, linux-rdma, linux-doc, linux-kernel, Dragos Tatulea,
	Mark Bloch

From: Dragos Tatulea <dtatulea@nvidia.com>

Implement the PCIe Congestion Event notifier which triggers a work item
to query the PCIe Congestion Event object. The result of the congestion
state is reflected in the new ethtool stats:

* pci_bw_inbound_high: the device has crossed the high threshold for
  inbound PCIe traffic.
* pci_bw_inbound_low: the device has crossed the low threshold for
  inbound PCIe traffic.
* pci_bw_outbound_high: the device has crossed the high threshold for
  outbound PCIe traffic.
* pci_bw_outbound_low: the device has crossed the low threshold for
  outbound PCIe traffic.

The high and low thresholds are currently configured at 90% and 75%.
These are hysteresis thresholds that help determine whether the PCIe
bus on the device side is in a congested state.

If high == low + 1 then the device is in a congested state. If high == low
then the device is not in a congested state.
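The counter comparison rule can be sketched as a one-line helper (illustrative userspace code; pcie_dir_congested() is a hypothetical name, not part of the driver):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compare a high/low counter pair for one direction (inbound or
 * outbound). high == low means every high-threshold crossing was
 * followed by a low-threshold crossing, so the device is not
 * congested; high == low + 1 means the last event was a
 * high-threshold crossing, so it currently is.
 */
static bool pcie_dir_congested(uint32_t high_cnt, uint32_t low_cnt)
{
	return high_cnt > low_cnt;
}
```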

The counters are also documented.

A follow-up patch will make the thresholds configurable.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../ethernet/mellanox/mlx5/counters.rst       |  32 ++++
 .../mellanox/mlx5/core/en/pcie_cong_event.c   | 175 ++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en_stats.c    |   1 +
 .../ethernet/mellanox/mlx5/core/en_stats.h    |   1 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c  |   4 +
 5 files changed, 213 insertions(+)

diff --git a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
index 43d72c8b713b..754c81436408 100644
--- a/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
+++ b/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
@@ -1341,3 +1341,35 @@ Device Counters
      - The number of times the device owned queue had not enough buffers
        allocated.
      - Error
+
+   * - `pci_bw_inbound_high`
+     - The number of times the device crossed the high inbound PCIe bandwidth
+       threshold. To be compared to pci_bw_inbound_low to check if the device
+       is in a congested state.
+       If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested.
+       If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested.
+     - Informative
+
+   * - `pci_bw_inbound_low`
+     - The number of times the device crossed the low inbound PCIe bandwidth
+       threshold. To be compared to pci_bw_inbound_high to check if the device
+       is in a congested state.
+       If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested.
+       If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested.
+     - Informative
+
+   * - `pci_bw_outbound_high`
+     - The number of times the device crossed the high outbound PCIe bandwidth
+       threshold. To be compared to pci_bw_outbound_low to check if the device
+       is in a congested state.
+       If pci_bw_outbound_high == pci_bw_outbound_low then the device is not congested.
+       If pci_bw_outbound_high > pci_bw_outbound_low then the device is congested.
+     - Informative
+
+   * - `pci_bw_outbound_low`
+     - The number of times the device crossed the low outbound PCIe bandwidth
+       threshold. To be compared to pci_bw_outbound_high to check if the device
+       is in a congested state.
+       If pci_bw_outbound_high == pci_bw_outbound_low then the device is not congested.
+       If pci_bw_outbound_high > pci_bw_outbound_low then the device is congested.
+     - Informative
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
index 95a6db9d30b3..a24e5465ceeb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
@@ -4,6 +4,13 @@
 #include "en.h"
 #include "pcie_cong_event.h"
 
+#define MLX5E_CONG_HIGH_STATE 0x7
+
+enum {
+	MLX5E_INBOUND_CONG  = BIT(0),
+	MLX5E_OUTBOUND_CONG = BIT(1),
+};
+
 struct mlx5e_pcie_cong_thresh {
 	u16 inbound_high;
 	u16 inbound_low;
@@ -11,10 +18,27 @@ struct mlx5e_pcie_cong_thresh {
 	u16 outbound_low;
 };
 
+struct mlx5e_pcie_cong_stats {
+	u32 pci_bw_inbound_high;
+	u32 pci_bw_inbound_low;
+	u32 pci_bw_outbound_high;
+	u32 pci_bw_outbound_low;
+};
+
 struct mlx5e_pcie_cong_event {
 	u64 obj_id;
 
 	struct mlx5e_priv *priv;
+
+	/* For event notifier and workqueue. */
+	struct work_struct work;
+	struct mlx5_nb nb;
+
+	/* Stores last read state. */
+	u8 state;
+
+	/* For ethtool stats group. */
+	struct mlx5e_pcie_cong_stats stats;
 };
 
 /* In units of 0.01 % */
@@ -25,6 +49,51 @@ static const struct mlx5e_pcie_cong_thresh default_thresh_config = {
 	.outbound_low = 7500,
 };
 
+static const struct counter_desc mlx5e_pcie_cong_stats_desc[] = {
+	{ MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats,
+			     pci_bw_inbound_high) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats,
+			     pci_bw_inbound_low) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats,
+			     pci_bw_outbound_high) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_pcie_cong_stats,
+			     pci_bw_outbound_low) },
+};
+
+#define NUM_PCIE_CONG_COUNTERS ARRAY_SIZE(mlx5e_pcie_cong_stats_desc)
+
+static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(pcie_cong)
+{
+	return priv->cong_event ? NUM_PCIE_CONG_COUNTERS : 0;
+}
+
+static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(pcie_cong) {}
+
+static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(pcie_cong)
+{
+	if (!priv->cong_event)
+		return;
+
+	for (int i = 0; i < NUM_PCIE_CONG_COUNTERS; i++)
+		ethtool_puts(data, mlx5e_pcie_cong_stats_desc[i].format);
+}
+
+static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(pcie_cong)
+{
+	if (!priv->cong_event)
+		return;
+
+	for (int i = 0; i < NUM_PCIE_CONG_COUNTERS; i++) {
+		u32 ctr = MLX5E_READ_CTR32_CPU(&priv->cong_event->stats,
+					       mlx5e_pcie_cong_stats_desc,
+					       i);
+
+		mlx5e_ethtool_put_stat(data, ctr);
+	}
+}
+
+MLX5E_DEFINE_STATS_GRP(pcie_cong, 0);
+
 static int
 mlx5_cmd_pcie_cong_event_set(struct mlx5_core_dev *dev,
 			     const struct mlx5e_pcie_cong_thresh *config,
@@ -89,6 +158,97 @@ static int mlx5_cmd_pcie_cong_event_destroy(struct mlx5_core_dev *dev,
 	return mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
 }
 
+static int mlx5_cmd_pcie_cong_event_query(struct mlx5_core_dev *dev,
+					  u64 obj_id,
+					  u32 *state)
+{
+	u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(pcie_cong_event_cmd_out)];
+	void *obj;
+	void *hdr;
+	u8 cong;
+	int err;
+
+	hdr = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr);
+
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+		 MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
+		 MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT);
+	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_id, obj_id);
+
+	err = mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out));
+	if (err)
+		return err;
+
+	obj = MLX5_ADDR_OF(pcie_cong_event_cmd_out, out, cong_obj);
+
+	if (state) {
+		cong = MLX5_GET(pcie_cong_event_obj, obj, inbound_cong_state);
+		if (cong == MLX5E_CONG_HIGH_STATE)
+			*state |= MLX5E_INBOUND_CONG;
+
+		cong = MLX5_GET(pcie_cong_event_obj, obj, outbound_cong_state);
+		if (cong == MLX5E_CONG_HIGH_STATE)
+			*state |= MLX5E_OUTBOUND_CONG;
+	}
+
+	return 0;
+}
+
+static void mlx5e_pcie_cong_event_work(struct work_struct *work)
+{
+	struct mlx5e_pcie_cong_event *cong_event;
+	struct mlx5_core_dev *dev;
+	struct mlx5e_priv *priv;
+	u32 new_cong_state = 0;
+	u32 changes;
+	int err;
+
+	cong_event = container_of(work, struct mlx5e_pcie_cong_event, work);
+	priv = cong_event->priv;
+	dev = priv->mdev;
+
+	err = mlx5_cmd_pcie_cong_event_query(dev, cong_event->obj_id,
+					     &new_cong_state);
+	if (err) {
+		mlx5_core_warn(dev, "Error %d when querying PCIe cong event object (obj_id=%llu).\n",
+			       err, cong_event->obj_id);
+		return;
+	}
+
+	changes = cong_event->state ^ new_cong_state;
+	if (!changes)
+		return;
+
+	cong_event->state = new_cong_state;
+
+	if (changes & MLX5E_INBOUND_CONG) {
+		if (new_cong_state & MLX5E_INBOUND_CONG)
+			cong_event->stats.pci_bw_inbound_high++;
+		else
+			cong_event->stats.pci_bw_inbound_low++;
+	}
+
+	if (changes & MLX5E_OUTBOUND_CONG) {
+		if (new_cong_state & MLX5E_OUTBOUND_CONG)
+			cong_event->stats.pci_bw_outbound_high++;
+		else
+			cong_event->stats.pci_bw_outbound_low++;
+	}
+}
+
+static int mlx5e_pcie_cong_event_handler(struct notifier_block *nb,
+					 unsigned long event, void *eqe)
+{
+	struct mlx5e_pcie_cong_event *cong_event;
+
+	cong_event = mlx5_nb_cof(nb, struct mlx5e_pcie_cong_event, nb);
+	queue_work(cong_event->priv->wq, &cong_event->work);
+
+	return NOTIFY_OK;
+}
+
 bool mlx5e_pcie_cong_event_supported(struct mlx5_core_dev *dev)
 {
 	u64 features = MLX5_CAP_GEN_2_64(dev, general_obj_types_127_64);
@@ -116,6 +276,10 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv)
 	if (!cong_event)
 		return -ENOMEM;
 
+	INIT_WORK(&cong_event->work, mlx5e_pcie_cong_event_work);
+	MLX5_NB_INIT(&cong_event->nb, mlx5e_pcie_cong_event_handler,
+		     OBJECT_CHANGE);
+
 	cong_event->priv = priv;
 
 	err = mlx5_cmd_pcie_cong_event_set(mdev, &default_thresh_config,
@@ -125,10 +289,18 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv)
 		goto err_free;
 	}
 
+	err = mlx5_eq_notifier_register(mdev, &cong_event->nb);
+	if (err) {
+		mlx5_core_warn(mdev, "Error registering notifier for the PCIe congestion event\n");
+		goto err_obj_destroy;
+	}
+
 	priv->cong_event = cong_event;
 
 	return 0;
 
+err_obj_destroy:
+	mlx5_cmd_pcie_cong_event_destroy(mdev, cong_event->obj_id);
 err_free:
 	kvfree(cong_event);
 
@@ -145,6 +317,9 @@ void mlx5e_pcie_cong_event_cleanup(struct mlx5e_priv *priv)
 
 	priv->cong_event = NULL;
 
+	mlx5_eq_notifier_unregister(mdev, &cong_event->nb);
+	cancel_work_sync(&cong_event->work);
+
 	if (mlx5_cmd_pcie_cong_event_destroy(mdev, cong_event->obj_id))
 		mlx5_core_warn(mdev, "Error destroying PCIe congestion event (obj_id=%llu)\n",
 			       cong_event->obj_id);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 19664fa7f217..87536f158d07 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -2612,6 +2612,7 @@ mlx5e_stats_grp_t mlx5e_nic_stats_grps[] = {
 #ifdef CONFIG_MLX5_MACSEC
 	&MLX5E_STATS_GRP(macsec_hw),
 #endif
+	&MLX5E_STATS_GRP(pcie_cong),
 };
 
 unsigned int mlx5e_nic_stats_grps_num(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index def5dea1463d..72dbcc1928ef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -535,5 +535,6 @@ extern MLX5E_DECLARE_STATS_GRP(ipsec_hw);
 extern MLX5E_DECLARE_STATS_GRP(ipsec_sw);
 extern MLX5E_DECLARE_STATS_GRP(ptp);
 extern MLX5E_DECLARE_STATS_GRP(macsec_hw);
+extern MLX5E_DECLARE_STATS_GRP(pcie_cong);
 
 #endif /* __MLX5_EN_STATS_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index dfb079e59d85..db54f6d26591 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -21,6 +21,7 @@
 #include "pci_irq.h"
 #include "devlink.h"
 #include "en_accel/ipsec.h"
+#include "en/pcie_cong_event.h"
 
 enum {
 	MLX5_EQE_OWNER_INIT_VAL	= 0x1,
@@ -585,6 +586,9 @@ static void gather_async_events_mask(struct mlx5_core_dev *dev, u64 mask[4])
 		async_event_mask |=
 			(1ull << MLX5_EVENT_TYPE_OBJECT_CHANGE);
 
+	if (mlx5e_pcie_cong_event_supported(dev))
+		async_event_mask |= (1ull << MLX5_EVENT_TYPE_OBJECT_CHANGE);
+
 	mask[0] = async_event_mask;
 
 	if (MLX5_CAP_GEN(dev, event_cap))
-- 
2.34.1



* [PATCH net-next 5/5] net/mlx5e: Make PCIe congestion event thresholds configurable
  2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
                   ` (3 preceding siblings ...)
  2025-06-19 11:37 ` [PATCH net-next 4/5] net/mlx5e: Add device PCIe congestion ethtool stats Mark Bloch
@ 2025-06-19 11:37 ` Mark Bloch
  2025-06-19 14:55 ` [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Jakub Kicinski
  2025-06-25 11:37 ` Leon Romanovsky
  6 siblings, 0 replies; 11+ messages in thread
From: Mark Bloch @ 2025-06-19 11:37 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, Jonathan Corbet,
	netdev, linux-rdma, linux-doc, linux-kernel, Dragos Tatulea,
	Mark Bloch

From: Dragos Tatulea <dtatulea@nvidia.com>

Add a new sysfs entry for reading and configuring the PCIe congestion
event thresholds. The format is the following:
<inbound_low> <inbound_high> <outbound_low> <outbound_high>

Units are 0.01 %. Accepted values are in range (0, 10000].
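The range check this implies can be sketched in userspace C (the thresh_valid() helper is hypothetical; the low < high requirement is my assumption, since the cover text only states the (0, 10000] range):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate four threshold values in units of 0.01%: each must lie in
 * (0, 10000], and each low threshold must sit below its matching high
 * threshold for the hysteresis to make sense (assumed, not stated).
 * Values arrive as u64 so oversized input fails cleanly here rather
 * than wrapping.
 */
static bool thresh_valid(uint64_t in_lo, uint64_t in_hi,
			 uint64_t out_lo, uint64_t out_hi)
{
	const uint64_t max = 10000;	/* 100.00% */

	if (!in_lo || !in_hi || !out_lo || !out_hi)
		return false;
	if (in_hi > max || out_hi > max)
		return false;
	return in_lo < in_hi && out_lo < out_hi;
}
```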

When new thresholds are configured, an object modify operation is
performed. The set function is updated accordingly to act as a modify
as well.

The threshold configuration is stored and queried directly
in the firmware.

To catch fat-fingered values, the numbers are initially read as u64.
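As a usage sketch: the `thresh_config` file name matches the attribute
created by this patch; the exact sysfs path and PCI address below are
illustrative assumptions, not taken from the patch.

```shell
# Assumed location: the attribute sits in the PCI device's sysfs
# directory. Substitute your device's address for the placeholder.
DEV=/sys/bus/pci/devices/0000:08:00.0

# Value order: inbound_low inbound_high outbound_low outbound_high,
# in units of 0.01 %, each in (0, 10000], with low strictly below high.
# echo "7500 9000 7500 9000" > "$DEV"/thresh_config   # requires the hardware
# cat "$DEV"/thresh_config                            # reads back in the same order

# The same four-value line, formatted locally for illustration:
printf '%u %u %u %u\n' 7500 9000 7500 9000
```

Reads use the same ordering as writes, so a line read back can be
edited and written again verbatim.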

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../mellanox/mlx5/core/en/pcie_cong_event.c   | 152 +++++++++++++++++-
 1 file changed, 144 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
index a24e5465ceeb..a74d1e15c92e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c
@@ -39,9 +39,13 @@ struct mlx5e_pcie_cong_event {
 
 	/* For ethtool stats group. */
 	struct mlx5e_pcie_cong_stats stats;
+
+	struct device_attribute attr;
 };
 
 /* In units of 0.01 % */
+#define MLX5E_PCIE_CONG_THRESH_MAX 10000
+
 static const struct mlx5e_pcie_cong_thresh default_thresh_config = {
 	.inbound_high = 9000,
 	.inbound_low = 7500,
@@ -97,6 +101,7 @@ MLX5E_DEFINE_STATS_GRP(pcie_cong, 0);
 static int
 mlx5_cmd_pcie_cong_event_set(struct mlx5_core_dev *dev,
 			     const struct mlx5e_pcie_cong_thresh *config,
+			     bool modify,
 			     u64 *obj_id)
 {
 	u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {};
@@ -108,8 +113,16 @@ mlx5_cmd_pcie_cong_event_set(struct mlx5_core_dev *dev,
 	hdr = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr);
 	cong_obj = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, cong_obj);
 
-	MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
-		 MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	if (!modify) {
+		MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+			 MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+	} else {
+		MLX5_SET(general_obj_in_cmd_hdr, hdr, opcode,
+			 MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+		MLX5_SET(general_obj_in_cmd_hdr, in, obj_id, *obj_id);
+		MLX5_SET64(pcie_cong_event_obj, cong_obj, modify_select_field,
+			   MLX5_PCIE_CONG_EVENT_MOD_THRESH);
+	}
 
 	MLX5_SET(general_obj_in_cmd_hdr, hdr, obj_type,
 		 MLX5_GENERAL_OBJECT_TYPES_PCIE_CONG_EVENT);
@@ -131,10 +144,12 @@ mlx5_cmd_pcie_cong_event_set(struct mlx5_core_dev *dev,
 	if (err)
 		return err;
 
-	*obj_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+	if (!modify)
+		*obj_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
 
-	mlx5_core_dbg(dev, "PCIe congestion event (obj_id=%llu) created. Config: in: [%u, %u], out: [%u, %u]\n",
+	mlx5_core_dbg(dev, "PCIe congestion event (obj_id=%llu) %s. Config: in: [%u, %u], out: [%u, %u]\n",
 		      *obj_id,
+		      modify ? "modified" : "created",
 		      config->inbound_high, config->inbound_low,
 		      config->outbound_high, config->outbound_low);
 
@@ -160,13 +175,13 @@ static int mlx5_cmd_pcie_cong_event_destroy(struct mlx5_core_dev *dev,
 
 static int mlx5_cmd_pcie_cong_event_query(struct mlx5_core_dev *dev,
 					  u64 obj_id,
-					  u32 *state)
+					  u32 *state,
+					  struct mlx5e_pcie_cong_thresh *config)
 {
 	u32 in[MLX5_ST_SZ_DW(pcie_cong_event_cmd_in)] = {};
 	u32 out[MLX5_ST_SZ_DW(pcie_cong_event_cmd_out)];
 	void *obj;
 	void *hdr;
-	u8 cong;
 	int err;
 
 	hdr = MLX5_ADDR_OF(pcie_cong_event_cmd_in, in, hdr);
@@ -184,6 +199,8 @@ static int mlx5_cmd_pcie_cong_event_query(struct mlx5_core_dev *dev,
 	obj = MLX5_ADDR_OF(pcie_cong_event_cmd_out, out, cong_obj);
 
 	if (state) {
+		u8 cong;
+
 		cong = MLX5_GET(pcie_cong_event_obj, obj, inbound_cong_state);
 		if (cong == MLX5E_CONG_HIGH_STATE)
 			*state |= MLX5E_INBOUND_CONG;
@@ -193,6 +210,19 @@ static int mlx5_cmd_pcie_cong_event_query(struct mlx5_core_dev *dev,
 			*state |= MLX5E_OUTBOUND_CONG;
 	}
 
+	if (config) {
+		*config = (struct mlx5e_pcie_cong_thresh) {
+			.inbound_low = MLX5_GET(pcie_cong_event_obj, obj,
+						inbound_cong_low_threshold),
+			.inbound_high = MLX5_GET(pcie_cong_event_obj, obj,
+						inbound_cong_high_threshold),
+			.outbound_low = MLX5_GET(pcie_cong_event_obj, obj,
+						 outbound_cong_low_threshold),
+			.outbound_high = MLX5_GET(pcie_cong_event_obj, obj,
+						  outbound_cong_high_threshold),
+		};
+	}
+
 	return 0;
 }
 
@@ -210,7 +240,7 @@ static void mlx5e_pcie_cong_event_work(struct work_struct *work)
 	dev = priv->mdev;
 
 	err = mlx5_cmd_pcie_cong_event_query(dev, cong_event->obj_id,
-					     &new_cong_state);
+					     &new_cong_state, NULL);
 	if (err) {
 		mlx5_core_warn(dev, "Error %d when querying PCIe cong event object (obj_id=%llu).\n",
 			       err, cong_event->obj_id);
@@ -249,6 +279,101 @@ static int mlx5e_pcie_cong_event_handler(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
+static bool mlx5e_thresh_check_val(u64 val)
+{
+	return val > 0 && val <= MLX5E_PCIE_CONG_THRESH_MAX;
+}
+
+static bool
+mlx5e_thresh_config_check_order(const struct mlx5e_pcie_cong_thresh *config)
+{
+	if (config->inbound_high <= config->inbound_low)
+		return false;
+
+	if (config->outbound_high <= config->outbound_low)
+		return false;
+
+	return true;
+}
+
+#define MLX5E_PCIE_CONG_THRESH_SYSFS_VALUES 4
+
+static ssize_t thresh_config_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf,
+				   size_t count)
+{
+	struct mlx5e_pcie_cong_thresh config = {};
+	struct mlx5e_pcie_cong_event *cong_event;
+	u64 outbound_high, outbound_low;
+	u64 inbound_high, inbound_low;
+	struct mlx5e_priv *priv;
+	int ret;
+	int err;
+
+	cong_event = container_of(attr, struct mlx5e_pcie_cong_event, attr);
+	priv = cong_event->priv;
+
+	ret = sscanf(buf, "%llu %llu %llu %llu",
+		     &inbound_low, &inbound_high,
+		     &outbound_low, &outbound_high);
+	if (ret != MLX5E_PCIE_CONG_THRESH_SYSFS_VALUES) {
+		mlx5_core_err(priv->mdev, "Invalid format for PCIe congestion threshold configuration. Expected %d, got %d.\n",
+			      MLX5E_PCIE_CONG_THRESH_SYSFS_VALUES, ret);
+		return -EINVAL;
+	}
+
+	if (!mlx5e_thresh_check_val(inbound_high) ||
+	    !mlx5e_thresh_check_val(inbound_low) ||
+	    !mlx5e_thresh_check_val(outbound_high) ||
+	    !mlx5e_thresh_check_val(outbound_low)) {
+		mlx5_core_err(priv->mdev, "Invalid values for PCIe congestion threshold configuration. Valid range [1, %d]\n",
+			      MLX5E_PCIE_CONG_THRESH_MAX);
+		return -EINVAL;
+	}
+
+	config = (struct mlx5e_pcie_cong_thresh) {
+		.inbound_low = inbound_low,
+		.inbound_high = inbound_high,
+		.outbound_low = outbound_low,
+		.outbound_high = outbound_high,
+
+	};
+
+	if (!mlx5e_thresh_config_check_order(&config)) {
+		mlx5_core_err(priv->mdev, "Invalid order of values for PCIe congestion threshold configuration.\n");
+		return -EINVAL;
+	}
+
+	err = mlx5_cmd_pcie_cong_event_set(priv->mdev, &config,
+					   true, &cong_event->obj_id);
+
+	return err ? err : count;
+}
+
+static ssize_t thresh_config_show(struct device *dev,
+				  struct device_attribute *attr,
+				  char *buf)
+{
+	struct mlx5e_pcie_cong_event *cong_event;
+	struct mlx5e_pcie_cong_thresh config;
+	struct mlx5e_priv *priv;
+	int err;
+
+	cong_event = container_of(attr, struct mlx5e_pcie_cong_event, attr);
+	priv = cong_event->priv;
+
+	err = mlx5_cmd_pcie_cong_event_query(priv->mdev, cong_event->obj_id,
+					     NULL, &config);
+
+	if (err)
+		return err;
+
+	return sysfs_emit(buf, "%u %u %u %u\n",
+			  config.inbound_low, config.inbound_high,
+			  config.outbound_low, config.outbound_high);
+}
+
 bool mlx5e_pcie_cong_event_supported(struct mlx5_core_dev *dev)
 {
 	u64 features = MLX5_CAP_GEN_2_64(dev, general_obj_types_127_64);
@@ -283,7 +408,7 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv)
 	cong_event->priv = priv;
 
 	err = mlx5_cmd_pcie_cong_event_set(mdev, &default_thresh_config,
-					   &cong_event->obj_id);
+					   false, &cong_event->obj_id);
 	if (err) {
 		mlx5_core_warn(mdev, "Error creating a PCIe congestion event object\n");
 		goto err_free;
@@ -295,10 +420,20 @@ int mlx5e_pcie_cong_event_init(struct mlx5e_priv *priv)
 		goto err_obj_destroy;
 	}
 
+	cong_event->attr = (struct device_attribute)__ATTR_RW(thresh_config);
+	err = sysfs_create_file(&mdev->device->kobj,
+				&cong_event->attr.attr);
+	if (err) {
+		mlx5_core_warn(mdev, "Error creating a sysfs entry for pcie_cong limits.\n");
+		goto err_unregister_nb;
+	}
+
 	priv->cong_event = cong_event;
 
 	return 0;
 
+err_unregister_nb:
+	mlx5_eq_notifier_unregister(mdev, &cong_event->nb);
 err_obj_destroy:
 	mlx5_cmd_pcie_cong_event_destroy(mdev, cong_event->obj_id);
 err_free:
@@ -316,6 +451,7 @@ void mlx5e_pcie_cong_event_cleanup(struct mlx5e_priv *priv)
 		return;
 
 	priv->cong_event = NULL;
+	sysfs_remove_file(&mdev->device->kobj, &cong_event->attr.attr);
 
 	mlx5_eq_notifier_unregister(mdev, &cong_event->nb);
 	cancel_work_sync(&cong_event->work);
-- 
2.34.1



* Re: [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events
  2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
                   ` (4 preceding siblings ...)
  2025-06-19 11:37 ` [PATCH net-next 5/5] net/mlx5e: Make PCIe congestion event thresholds configurable Mark Bloch
@ 2025-06-19 14:55 ` Jakub Kicinski
  2025-06-19 16:00   ` Mark Bloch
  2025-06-25 11:37 ` Leon Romanovsky
  6 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2025-06-19 14:55 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
	Jonathan Corbet, netdev, linux-rdma, linux-doc, linux-kernel

On Thu, 19 Jun 2025 14:37:16 +0300 Mark Bloch wrote:
> PCIe congestion events are events generated by the firmware when the
> device side has sustained PCIe inbound or outbound traffic above
> certain thresholds. The high and low thresholds are hysteresis bounds
> to prevent flapping: once the high threshold has been reached, a low
> threshold event will be triggered only after the bandwidth usage has
> gone below the low threshold.

What are we supposed to do with a series half of which is tagged for
one tree and half for another? If you want some of the patches to
go via the shared tree, you have to post them separately.
Ideally you'd post them to the list in a combined "pull request +
patches" format (see for example how Marc posts CAN patches, or Pablo
posts netfilter). Once we pull that you can send the net-next stuff
separately as patches.

I feel like I just had the same exact conversation with Tariq recently.
Really not great when the same process explainer has to be given to
multiple people from the same company :( I'd like to remind y'all that
reading the mailing list is not optional:

  Mailing list participation
  --------------------------
  
  Linux kernel uses mailing lists as the primary form of communication.
  Maintainers must be subscribed and follow the appropriate subsystem-wide
  mailing list. Either by subscribing to the whole list or using more
  modern, selective setup like
  `lei <https://people.kernel.org/monsieuricon/lore-lei-part-1-getting-started>`_.
  
See: https://www.kernel.org/doc/html/next/maintainer/feature-and-driver-maintainers.html#mailing-list-participation

Then again, I guess you're not a maintainer. There are 2 maintainers
for the driver listed and yet we get patches from a 3rd unlisted person.

SMH


* Re: [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events
  2025-06-19 14:55 ` [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Jakub Kicinski
@ 2025-06-19 16:00   ` Mark Bloch
  2025-06-19 19:19     ` Saeed Mahameed
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Bloch @ 2025-06-19 16:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
	Jonathan Corbet, netdev, linux-rdma, linux-doc, linux-kernel



On 19/06/2025 17:55, Jakub Kicinski wrote:
> On Thu, 19 Jun 2025 14:37:16 +0300 Mark Bloch wrote:
>> PCIe congestion events are events generated by the firmware when the
>> device side has sustained PCIe inbound or outbound traffic above
>> certain thresholds. The high and low thresholds are hysteresis bounds
>> to prevent flapping: once the high threshold has been reached, a low
>> threshold event will be triggered only after the bandwidth usage has
>> gone below the low threshold.
> 
> What are we supposed to do with a series half of which is tagged for
> one tree and half for another? If you want some of the patches to
> go via the shared tree, you have to post them separately.
> Ideally you'd post them to the list in a combined "pull request +
> patches" format (see for example how Marc posts CAN patches, or Pablo
> posts netfilter). Once we pull that you can send the net-next stuff
> separately as patches.

Miscommunication about the proper process, thanks for the explanation.
PR + patches seems cleaner and provides more context,
so I’ll go with that.

> 
> I feel like I just had the same exact conversation with Tariq recently.
> Really not great when the same process explainer has to be given to
> multiple people from the same company :( I'd like to remind y'all that
> reading the mailing list is not optional:

I do follow the mailing list and double checked what should be done in
this scenario. In the end it's my responsibility so it's my fault.

> 
>   Mailing list participation
>   --------------------------
>   
>   Linux kernel uses mailing lists as the primary form of communication.
>   Maintainers must be subscribed and follow the appropriate subsystem-wide
>   mailing list. Either by subscribing to the whole list or using more
>   modern, selective setup like
>   `lei <https://people.kernel.org/monsieuricon/lore-lei-part-1-getting-started>`_.
>   
> See: https://www.kernel.org/doc/html/next/maintainer/feature-and-driver-maintainers.html#mailing-list-participation
> 
> Then again, I guess you're not a maintainer. There are 2 maintainers
> for the driver listed and yet we get patches from a 3rd unlisted person.

Tariq is on vacation, which got extended because of flight issues.
I mentioned in v3 of the TCP zero-copy series that I'll be handling
the mlx5 submissions until his return.

> 
> SMH



* Re: [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events
  2025-06-19 16:00   ` Mark Bloch
@ 2025-06-19 19:19     ` Saeed Mahameed
  2025-06-19 22:22       ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Saeed Mahameed @ 2025-06-19 19:19 UTC (permalink / raw)
  To: Mark Bloch
  Cc: Jakub Kicinski, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman, saeedm, gal, leonro, tariqt,
	Leon Romanovsky, Jonathan Corbet, netdev, linux-rdma, linux-doc,
	linux-kernel

On 19 Jun 19:00, Mark Bloch wrote:
>
>
>On 19/06/2025 17:55, Jakub Kicinski wrote:
>> On Thu, 19 Jun 2025 14:37:16 +0300 Mark Bloch wrote:
>>> PCIe congestion events are events generated by the firmware when the
>>> device side has sustained PCIe inbound or outbound traffic above
>>> certain thresholds. The high and low thresholds are hysteresis bounds
>>> to prevent flapping: once the high threshold has been reached, a low
>>> threshold event will be triggered only after the bandwidth usage has
>>> gone below the low threshold.
>>
>> What are we supposed to do with a series half of which is tagged for
>> one tree and half for another? If you want some of the patches to
>> go via the shared tree, you have to post them separately.
>> Ideally you'd post them to the list in a combined "pull request +
>> patches" format (see for example how Marc posts CAN patches, or Pablo
>> posts netfilter). Once we pull that you can send the net-next stuff
>> separately as patches.
>
>Miscommunication about the proper process, thanks for the explanation.
>PR + patches seems cleaner and provides more context,
>so I’ll go with that.
>
>>
>> I feel like I just had the same exact conversation with Tariq recently.
>> Really not great when the same process explainer has to be given to
>> multiple people from the same company :( I'd like to remind y'all that
>> reading the mailing list is not optional:
>
>I do follow the mailing list and double checked what should be done in
>this scenario. In the end it's my responsibility so it's my fault.
>

I think what Mark did here is fine. Yes, I understand this is not
applicable to net-next yet, but the point is review, and when review
is done we can do the following:

I can apply the mlx5-next portion to mlx5-next, and on v2 Mark can send
the net-next patches plus a PR for the mlx5-next branch. This is how we
used to do it all the time, except this time review happens all at once
for both trees.

Jakub, is this acceptable?



* Re: [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events
  2025-06-19 19:19     ` Saeed Mahameed
@ 2025-06-19 22:22       ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2025-06-19 22:22 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Mark Bloch, David S. Miller, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman, saeedm, gal, leonro, tariqt,
	Leon Romanovsky, Jonathan Corbet, netdev, linux-rdma, linux-doc,
	linux-kernel

On Thu, 19 Jun 2025 12:19:20 -0700 Saeed Mahameed wrote:
> I think what Mark did here is fine. Yes, I understand this is not
> applicable to net-next yet,

Yes, once again netdev is the problem.

> but the point is review, and when review is done we can do the following:
> 
> I can apply the mlx5-next portion to mlx5-next, and on v2 Mark can send
> the net-next patches plus a PR for the mlx5-next branch. This is how we
> used to do it all the time, except this time review happens all at once
> for both trees.
> 
> Jakub, is this acceptable?

Don't complicate it, please. Send a PR with the interface patches and
we can review the rest.


* Re: [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events
  2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
                   ` (5 preceding siblings ...)
  2025-06-19 14:55 ` [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Jakub Kicinski
@ 2025-06-25 11:37 ` Leon Romanovsky
  6 siblings, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2025-06-25 11:37 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman, saeedm, gal, tariqt, Jonathan Corbet,
	netdev, linux-rdma, linux-doc, linux-kernel

On Thu, Jun 19, 2025 at 02:37:16PM +0300, Mark Bloch wrote:
> PCIe congestion events are events generated by the firmware when the
> device side has sustained PCIe inbound or outbound traffic above
> certain thresholds. The high and low threshold are hysteresis thresholds
> to prevent flapping: once the high threshold has been reached, a low
> threshold event will be triggered only after the bandwidth usage went
> below the low threshold.

<...>

> Dragos Tatulea (5):
>   net/mlx5: Small refactor for general object capabilities
>   net/mlx5: Add IFC bits for PCIe Congestion Event object

Applied these patches to mlx5-next.

Thanks


end of thread

Thread overview: 11+ messages:
2025-06-19 11:37 [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Mark Bloch
2025-06-19 11:37 ` [PATCH mlx5-next 1/5] net/mlx5: Small refactor for general object capabilities Mark Bloch
2025-06-19 11:37 ` [PATCH mlx5-next 2/5] net/mlx5: Add IFC bits for PCIe Congestion Event object Mark Bloch
2025-06-19 11:37 ` [PATCH net-next 3/5] net/mlx5e: Create/destroy " Mark Bloch
2025-06-19 11:37 ` [PATCH net-next 4/5] net/mlx5e: Add device PCIe congestion ethtool stats Mark Bloch
2025-06-19 11:37 ` [PATCH net-next 5/5] net/mlx5e: Make PCIe congestion event thresholds configurable Mark Bloch
2025-06-19 14:55 ` [PATCH net-next 0/5] net/mlx5e: Add support for PCIe congestion events Jakub Kicinski
2025-06-19 16:00   ` Mark Bloch
2025-06-19 19:19     ` Saeed Mahameed
2025-06-19 22:22       ` Jakub Kicinski
2025-06-25 11:37 ` Leon Romanovsky
