public inbox for netdev@vger.kernel.org
* [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm
@ 2026-02-07  1:05 Mohsin Bashir
  2026-02-07  1:05 ` [PATCH net-next V2 1/5] net: ethtool: Track pause storm events Mohsin Bashir
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:05 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

With TX pause enabled, if a device cannot deliver received frames to
the stack (e.g., during a system hang), it may generate excessive pause
frames, causing a pause storm. This series updates the uAPI to track TX
pause storm events as part of the pause stats (p1), proposes using the
existing knob (pfc-prevention-tout) to configure the storm watchdog
(p2), adds pause storm protection support for fbnic (p3), and leverages
p1 to provide observability into these events for the fbnic (p4) and
mlx5 (p5) drivers.

---
Changelog:
V2:
 - Clarify pfc-prevention-tout applies to general pause, not just PFC
   (P2)
 - Add pause storm watchdog timeout configuration via pfc-prevention-tout
   (P3)
 - mlx5: Report device stall prevention events (errors) in pause stats
   (P5)

V1: https://lore.kernel.org/20260122192158.428882-1-mohsin.bashr@gmail.com/

Mohsin Bashir (5):
  net: ethtool: Track pause storm events
  net: ethtool: Update doc for tunable
  eth: fbnic: Add protection against pause storm
  eth: fbnic: Fetch TX pause storm stats
  eth: mlx5: Move pause storm errors to pause stats

 Documentation/netlink/specs/ethtool.yaml      |  13 +++
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  25 ++++
 drivers/net/ethernet/meta/fbnic/fbnic.h       |   3 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h   |  11 ++
 .../net/ethernet/meta/fbnic/fbnic_ethtool.c   |  46 ++++++++
 .../net/ethernet/meta/fbnic/fbnic_hw_stats.h  |   1 +
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |   2 +
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 110 ++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.h   |  27 +++++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |   5 +
 include/linux/ethtool.h                       |   2 +
 include/uapi/linux/ethtool.h                  |   2 +-
 .../uapi/linux/ethtool_netlink_generated.h    |   1 +
 net/ethtool/pause.c                           |   4 +-
 14 files changed, 250 insertions(+), 2 deletions(-)

-- 
2.47.3
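
The mechanism the series describes can be modeled in a few lines. The
sketch below is a hypothetical software model, not driver code:
struct ps_watchdog, ps_watchdog_tick(), ps_watchdog_recover() and the
1 ms tick are illustrative assumptions. It only shows the idea that
continuous pause assertion beyond the timeout turns pause TX off and
bumps an event counter once per episode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a TX pause storm watchdog (names are not from the series). */
struct ps_watchdog {
	uint32_t timeout_ms;   /* 0 means protection is disabled */
	uint32_t asserted_ms;  /* time pause has been continuously asserted */
	bool     pause_tx_off; /* true once a storm has been detected */
	uint64_t storm_events; /* models the tx-pause-storm-events counter */
};

/* Advance the model by 1 ms; pause_wanted is true while RX is congested. */
static void ps_watchdog_tick(struct ps_watchdog *wd, bool pause_wanted)
{
	assert(wd);

	/* Nothing to track when disabled or already in the storm state. */
	if (!wd->timeout_ms || wd->pause_tx_off)
		return;

	/* Any gap in pause assertion restarts the measurement. */
	if (!pause_wanted) {
		wd->asserted_ms = 0;
		return;
	}

	/* Pause held for the whole timeout: stop pausing, count one event. */
	if (++wd->asserted_ms >= wd->timeout_ms) {
		wd->pause_tx_off = true;
		wd->storm_events++;
	}
}

/* Models a periodic recovery step that re-arms the watchdog. */
static void ps_watchdog_recover(struct ps_watchdog *wd)
{
	assert(wd);
	wd->pause_tx_off = false;
	wd->asserted_ms = 0;
}
```

In this model the counter increments once per detected episode rather
than once per tick, which matches the "event count" wording of p1. How
often real hardware increments is implementation specific.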



* [PATCH net-next V2 1/5] net: ethtool: Track pause storm events
  2026-02-07  1:05 [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
@ 2026-02-07  1:05 ` Mohsin Bashir
  2026-02-11  9:28   ` Paolo Abeni
  2026-02-07  1:05 ` [PATCH net-next V2 2/5] net: ethtool: Update doc for tunable Mohsin Bashir
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:05 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

With TX pause enabled, if a device is unable to pass packets up to the
stack (e.g., the CPU is hung), the device can cause a pause storm. Since
devices can have native support for protecting their neighbors from
such flooding, these events need to be tracked. Add support for
tracking TX pause storm events for better observability.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
---
 Documentation/netlink/specs/ethtool.yaml       | 13 +++++++++++++
 include/linux/ethtool.h                        |  2 ++
 include/uapi/linux/ethtool_netlink_generated.h |  1 +
 net/ethtool/pause.c                            |  4 +++-
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
index 0a2d2343f79a..4707063af3b4 100644
--- a/Documentation/netlink/specs/ethtool.yaml
+++ b/Documentation/netlink/specs/ethtool.yaml
@@ -879,6 +879,19 @@ attribute-sets:
       -
         name: rx-frames
         type: u64
+      -
+        name: tx-pause-storm-events
+        type: u64
+        doc: >-
+            TX pause storm event count. Increments each time device
+            detects that its pause assertion condition has been true
+            for too long for normal operation. As a result, the device
+            has temporarily disabled its own Pause TX function to
+            protect the network from itself.
+            This counter should never increment under normal overload
+            conditions; it indicates catastrophic failure like an OS
+            crash. The rate of incrementing is implementation specific.
+
   -
     name: pause
     attr-cnt-name: __ethtool-a-pause-cnt
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 798abec67a1b..83c375840835 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -512,12 +512,14 @@ struct ethtool_eth_ctrl_stats {
  *
  *	Equivalent to `30.3.4.3 aPAUSEMACCtrlFramesReceived`
  *	from the standard.
+ * @tx_pause_storm_events: TX pause storm event count (see ethtool.yaml).
  */
 struct ethtool_pause_stats {
 	enum ethtool_mac_stats_src src;
 	struct_group(stats,
 		u64 tx_pause_frames;
 		u64 rx_pause_frames;
+		u64 tx_pause_storm_events;
 	);
 };
 
diff --git a/include/uapi/linux/ethtool_netlink_generated.h b/include/uapi/linux/ethtool_netlink_generated.h
index 556a0c834df5..114b83017297 100644
--- a/include/uapi/linux/ethtool_netlink_generated.h
+++ b/include/uapi/linux/ethtool_netlink_generated.h
@@ -381,6 +381,7 @@ enum {
 	ETHTOOL_A_PAUSE_STAT_PAD,
 	ETHTOOL_A_PAUSE_STAT_TX_FRAMES,
 	ETHTOOL_A_PAUSE_STAT_RX_FRAMES,
+	ETHTOOL_A_PAUSE_STAT_TX_PAUSE_STORM_EVENTS,
 
 	__ETHTOOL_A_PAUSE_STAT_CNT,
 	ETHTOOL_A_PAUSE_STAT_MAX = (__ETHTOOL_A_PAUSE_STAT_CNT - 1)
diff --git a/net/ethtool/pause.c b/net/ethtool/pause.c
index 0f9af1e66548..5d28f642764c 100644
--- a/net/ethtool/pause.c
+++ b/net/ethtool/pause.c
@@ -130,7 +130,9 @@ static int pause_put_stats(struct sk_buff *skb,
 	if (ethtool_put_stat(skb, pause_stats->tx_pause_frames,
 			     ETHTOOL_A_PAUSE_STAT_TX_FRAMES, pad) ||
 	    ethtool_put_stat(skb, pause_stats->rx_pause_frames,
-			     ETHTOOL_A_PAUSE_STAT_RX_FRAMES, pad))
+			     ETHTOOL_A_PAUSE_STAT_RX_FRAMES, pad) ||
+	    ethtool_put_stat(skb, pause_stats->tx_pause_storm_events,
+			     ETHTOOL_A_PAUSE_STAT_TX_PAUSE_STORM_EVENTS, pad))
 		goto err_cancel;
 
 	nla_nest_end(skb, nest);
-- 
2.47.3



* [PATCH net-next V2 2/5] net: ethtool: Update doc for tunable
  2026-02-07  1:05 [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
  2026-02-07  1:05 ` [PATCH net-next V2 1/5] net: ethtool: Track pause storm events Mohsin Bashir
@ 2026-02-07  1:05 ` Mohsin Bashir
  2026-02-07  1:05 ` [PATCH net-next V2 3/5] eth: fbnic: Add protection against pause storm Mohsin Bashir
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:05 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

ETHTOOL_PFC_PREVENTION_TOUT configures the timeout value for PFC storm
prevention. It can also be used to configure the storm detection
timeout for global pause settings. In fact, some existing drivers
already use it for that purpose.

Document that the knob can formally be used to configure the timeout
value of the pause storm prevention mechanism. The update to the
ethtool man page will follow.

Link: https://lore.kernel.org/aa5f189a-ac62-4633-97b5-ebf939e9c535@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
---
 include/uapi/linux/ethtool.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index b74b80508553..1cdfb8341df2 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -225,7 +225,7 @@ enum tunable_id {
 	ETHTOOL_ID_UNSPEC,
 	ETHTOOL_RX_COPYBREAK,
 	ETHTOOL_TX_COPYBREAK,
-	ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
+	ETHTOOL_PFC_PREVENTION_TOUT, /* both pause and pfc, see man ethtool */
 	ETHTOOL_TX_COPYBREAK_BUF_SIZE,
 	/*
 	 * Add your fresh new tunable attribute above and remember to update
-- 
2.47.3



* [PATCH net-next V2 3/5] eth: fbnic: Add protection against pause storm
  2026-02-07  1:05 [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
  2026-02-07  1:05 ` [PATCH net-next V2 1/5] net: ethtool: Track pause storm events Mohsin Bashir
  2026-02-07  1:05 ` [PATCH net-next V2 2/5] net: ethtool: Update doc for tunable Mohsin Bashir
@ 2026-02-07  1:05 ` Mohsin Bashir
  2026-02-07  1:05 ` [PATCH net-next V2 4/5] eth: fbnic: Fetch TX pause storm stats Mohsin Bashir
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:05 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

Add protection against TX pause storms. A pause storm occurs when a
device cannot pass received packets up to the stack and keeps asserting
pause as a result. When a pause storm is detected (the pause state
persists beyond the configured timeout), the device stops sending pause
frames and begins dropping packets instead of applying back-pressure.

The timeout is configurable via the ethtool tunable
(pfc-prevention-tout), with a maximum value of 10485ms and a default
value of 500ms.

Once the device transitions to the storm-detected state, the service
task periodically attempts recovery, returning the device to normal
operation so that it can handle any subsequent pause storm episodes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
---
Changelog:
V2: Add pause storm watchdog timeout configuration via pfc-prevention-tout
   (P3)
---
 drivers/net/ethernet/meta/fbnic/fbnic.h       |  3 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h   | 10 ++
 .../net/ethernet/meta/fbnic/fbnic_ethtool.c   | 43 +++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |  2 +
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 95 +++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.h   | 27 ++++++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |  5 +
 7 files changed, 185 insertions(+)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic.h b/drivers/net/ethernet/meta/fbnic/fbnic.h
index 779a083b9215..a760a27b1516 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic.h
@@ -98,6 +98,9 @@ struct fbnic_dev {
 
 	/* MDIO bus for PHYs */
 	struct mii_bus *mdio_bus;
+
+	/* In units of ms since API supports values in ms */
+	u16 ps_timeout;
 };
 
 /* Reserve entry 0 in the MSI-X "others" array until we have filled all
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index b717db879cd3..e68c56237b61 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -230,6 +230,7 @@ enum {
 #define FBNIC_INTR_MSIX_CTRL_VECTOR_MASK	CSR_GENMASK(7, 0)
 #define FBNIC_INTR_MSIX_CTRL_ENABLE		CSR_BIT(31)
 enum {
+	FBNIC_INTR_MSIX_CTRL_RXB_IDX	= 7,
 	FBNIC_INTR_MSIX_CTRL_PCS_IDX	= 34,
 };
 
@@ -560,6 +561,11 @@ enum {
 #define FBNIC_RXB_DROP_THLD_CNT			8
 #define FBNIC_RXB_DROP_THLD_ON			CSR_GENMASK(12, 0)
 #define FBNIC_RXB_DROP_THLD_OFF			CSR_GENMASK(25, 13)
+#define FBNIC_RXB_PAUSE_STORM(n)	(0x08019 + (n)) /* 0x20064 + 4*n */
+#define FBNIC_RXB_PAUSE_STORM_CNT		4
+#define FBNIC_RXB_PAUSE_STORM_FORCE_NORMAL	CSR_BIT(20)
+#define FBNIC_RXB_PAUSE_STORM_THLD_TIME		CSR_GENMASK(19, 0)
+#define FBNIC_RXB_PAUSE_STORM_UNIT_WR	0x0801d		/* 0x20074 */
 #define FBNIC_RXB_ECN_THLD(n)		(0x0801e + (n)) /* 0x20078 + 4*n */
 #define FBNIC_RXB_ECN_THLD_CNT			8
 #define FBNIC_RXB_ECN_THLD_ON			CSR_GENMASK(12, 0)
@@ -596,6 +602,9 @@ enum {
 #define FBNIC_RXB_INTF_CREDIT_MASK2		CSR_GENMASK(11, 8)
 #define FBNIC_RXB_INTF_CREDIT_MASK3		CSR_GENMASK(15, 12)
 
+#define FBNIC_RXB_ERR_INTR_STS		0x08050		/* 0x20140 */
+#define FBNIC_RXB_ERR_INTR_STS_PS		CSR_GENMASK(15, 12)
+#define FBNIC_RXB_ERR_INTR_MASK		0x08052		/* 0x20148 */
 #define FBNIC_RXB_PAUSE_EVENT_CNT(n)	(0x08053 + (n))	/* 0x2014c + 4*n */
 #define FBNIC_RXB_DROP_FRMS_STS(n)	(0x08057 + (n))	/* 0x2015c + 4*n */
 #define FBNIC_RXB_DROP_BYTES_STS_L(n) \
@@ -636,6 +645,7 @@ enum {
 
 #define FBNIC_RXB_PBUF_FIFO_LEVEL(n)	(0x0811d + (n)) /* 0x20474 + 4*n */
 
+#define FBNIC_RXB_PAUSE_STORM_UNIT_RD	0x08125		/* 0x20494 */
 #define FBNIC_RXB_INTEGRITY_ERR(n)	(0x0812f + (n))	/* 0x204bc + 4*n */
 #define FBNIC_RXB_MAC_ERR(n)		(0x08133 + (n))	/* 0x204cc + 4*n */
 #define FBNIC_RXB_PARSER_ERR(n)		(0x08137 + (n))	/* 0x204dc + 4*n */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
index 11745a2d8a44..dc57519ebbe5 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
@@ -1638,6 +1638,47 @@ static void fbnic_get_ts_stats(struct net_device *netdev,
 	}
 }
 
+static int fbnic_get_tunable(struct net_device *netdev,
+			     const struct ethtool_tunable *tun,
+			     void *data)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	int err = 0;
+
+	switch (tun->id) {
+	case ETHTOOL_PFC_PREVENTION_TOUT:
+		*(u16 *)data = fbn->fbd->ps_timeout;
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static int fbnic_set_tunable(struct net_device *netdev,
+			     const struct ethtool_tunable *tun,
+			     const void *data)
+{
+	struct fbnic_net *fbn = netdev_priv(netdev);
+	int err;
+
+	switch (tun->id) {
+	case ETHTOOL_PFC_PREVENTION_TOUT: {
+		u16 ps_timeout = *(u16 *)data;
+
+		err = fbnic_mac_ps_protect_to_config(fbn->fbd, ps_timeout);
+		break;
+	}
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
 static int
 fbnic_get_module_eeprom_by_page(struct net_device *netdev,
 				const struct ethtool_module_eeprom *page_data,
@@ -1912,6 +1953,8 @@ static const struct ethtool_ops fbnic_ethtool_ops = {
 	.set_channels			= fbnic_set_channels,
 	.get_ts_info			= fbnic_get_ts_info,
 	.get_ts_stats			= fbnic_get_ts_stats,
+	.get_tunable			= fbnic_get_tunable,
+	.set_tunable			= fbnic_set_tunable,
 	.get_link_ksettings		= fbnic_phylink_ethtool_ksettings_get,
 	.get_fec_stats			= fbnic_get_fec_stats,
 	.get_fecparam			= fbnic_phylink_get_fecparam,
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
index 02e8b0b257fe..1e6a8fd6f702 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_irq.c
@@ -170,6 +170,8 @@ int fbnic_mac_request_irq(struct fbnic_dev *fbd)
 	fbnic_wr32(fbd, FBNIC_INTR_MSIX_CTRL(FBNIC_INTR_MSIX_CTRL_PCS_IDX),
 		   FBNIC_PCS_MSIX_ENTRY | FBNIC_INTR_MSIX_CTRL_ENABLE);
 
+	fbnic_wr32(fbd, FBNIC_INTR_MSIX_CTRL(FBNIC_INTR_MSIX_CTRL_RXB_IDX), 0);
+
 	fbd->mac_msix_vector = vector;
 
 	return 0;
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_mac.c b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
index 9d0e4b2cc9ac..be834983e981 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
@@ -143,6 +143,7 @@ static void fbnic_mac_init_qm(struct fbnic_dev *fbd)
 #define FBNIC_DROP_EN_MASK	0x7d
 #define FBNIC_PAUSE_EN_MASK	0x14
 #define FBNIC_ECN_EN_MASK	0x10
+#define FBNIC_PS_EN_MASK	0x01
 
 struct fbnic_fifo_config {
 	unsigned int addr;
@@ -420,6 +421,14 @@ static void __fbnic_mac_stat_rd64(struct fbnic_dev *fbd, bool reset, u32 reg,
 #define fbnic_mac_stat_rd64(fbd, reset, __stat, __CSR) \
 	__fbnic_mac_stat_rd64(fbd, reset, FBNIC_##__CSR##_L, &(__stat))
 
+bool fbnic_mac_check_tx_pause(struct fbnic_dev *fbd)
+{
+	u32 command_config;
+
+	command_config = rd32(fbd, FBNIC_MAC_COMMAND_CONFIG);
+	return !(command_config & FBNIC_MAC_COMMAND_CONFIG_TX_PAUSE_DIS);
+}
+
 static void fbnic_mac_tx_pause_config(struct fbnic_dev *fbd, bool tx_pause)
 {
 	u32 rxb_pause_ctrl;
@@ -434,6 +443,49 @@ static void fbnic_mac_tx_pause_config(struct fbnic_dev *fbd, bool tx_pause)
 	wr32(fbd, FBNIC_RXB_PAUSE_DROP_CTRL, rxb_pause_ctrl);
 }
 
+static void
+fbnic_mac_ps_protect_to_reset(struct fbnic_dev *fbd, u16 timeout_ms)
+{
+	wr32(fbd, FBNIC_RXB_PAUSE_STORM_UNIT_WR, FBNIC_RXB_PS_CLK_DIV);
+
+	wr32(fbd, FBNIC_RXB_PAUSE_STORM(FBNIC_RXB_INTF_NET),
+	     FIELD_PREP(FBNIC_RXB_PAUSE_STORM_THLD_TIME,
+			FBNIC_MAC_RXB_PS_TO(timeout_ms)) |
+			FBNIC_RXB_PAUSE_STORM_FORCE_NORMAL);
+	wrfl(fbd);
+	wr32(fbd, FBNIC_RXB_PAUSE_STORM(FBNIC_RXB_INTF_NET),
+	     FIELD_PREP(FBNIC_RXB_PAUSE_STORM_THLD_TIME,
+			FBNIC_MAC_RXB_PS_TO(timeout_ms)));
+}
+
+static void
+fbnic_mac_ps_protect_config(struct fbnic_dev *fbd, bool ps_protect)
+{
+	u16 timeout;
+	u32 reg;
+
+	ps_protect = ps_protect && fbd->ps_timeout;
+	timeout = ps_protect ? fbd->ps_timeout : FBNIC_MAC_PS_TO_DEFAULT_MS;
+
+	fbnic_mac_ps_protect_to_reset(fbd, timeout);
+
+	reg = rd32(fbd, FBNIC_RXB_PAUSE_DROP_CTRL);
+	reg &= ~FBNIC_RXB_PAUSE_DROP_CTRL_PS_ENABLE;
+	reg |= FIELD_PREP(FBNIC_RXB_PAUSE_DROP_CTRL_PS_ENABLE, ps_protect);
+	wr32(fbd, FBNIC_RXB_PAUSE_DROP_CTRL, reg);
+
+	/* Clear any pending interrupt status first */
+	wr32(fbd, FBNIC_RXB_ERR_INTR_STS,
+	     FIELD_PREP(FBNIC_RXB_ERR_INTR_STS_PS, FBNIC_PS_EN_MASK));
+
+	/* Unmask the Network to Host PS interrupt if tx_pause is on */
+	reg = rd32(fbd, FBNIC_RXB_ERR_INTR_MASK);
+	reg |= FBNIC_RXB_ERR_INTR_STS_PS;
+	if (ps_protect)
+		reg &= ~FBNIC_RXB_ERR_INTR_STS_PS;
+	wr32(fbd, FBNIC_RXB_ERR_INTR_MASK, reg);
+}
+
 static int fbnic_mac_get_link_event(struct fbnic_dev *fbd)
 {
 	u32 intr_mask = rd32(fbd, FBNIC_SIG_PCS_INTR_STS);
@@ -658,6 +710,7 @@ static void fbnic_mac_link_up_asic(struct fbnic_dev *fbd,
 	u32 cmd_cfg, mac_ctrl;
 
 	fbnic_mac_tx_pause_config(fbd, tx_pause);
+	fbnic_mac_ps_protect_config(fbd, tx_pause);
 
 	cmd_cfg = __fbnic_mac_cmd_config_asic(fbd, tx_pause, rx_pause);
 	mac_ctrl = rd32(fbd, FBNIC_SIG_MAC_IN0);
@@ -918,3 +971,45 @@ int fbnic_mac_init(struct fbnic_dev *fbd)
 
 	return 0;
 }
+
+int fbnic_mac_ps_protect_to_config(struct fbnic_dev *fbd, u16 timeout_ms)
+{
+	u16 old_timeout_ms = fbd->ps_timeout;
+
+	if (timeout_ms == old_timeout_ms)
+		return 0;
+
+	if (timeout_ms == PFC_STORM_PREVENTION_AUTO)
+		timeout_ms = FBNIC_MAC_PS_TO_DEFAULT_MS;
+
+	if (timeout_ms > FBNIC_MAC_PS_TO_MAX_MS)
+		return -EINVAL;
+
+	fbd->ps_timeout = timeout_ms;
+
+	if (!fbnic_mac_check_tx_pause(fbd))
+		return 0;
+
+	if (timeout_ms == 0)
+		fbnic_mac_ps_protect_config(fbd, false);
+	else if (old_timeout_ms == 0)
+		fbnic_mac_ps_protect_config(fbd, true);
+	else
+		fbnic_mac_ps_protect_to_reset(fbd, fbd->ps_timeout);
+
+	return 0;
+}
+
+void fbnic_mac_ps_protect_handler(struct fbnic_dev *fbd)
+{
+	u32 rxb_err_sts = rd32(fbd, FBNIC_RXB_ERR_INTR_STS);
+
+	/* Check if pause storm interrupt for network was triggered */
+	if (rxb_err_sts & FIELD_PREP(FBNIC_RXB_ERR_INTR_STS_PS, FBNIC_PS_EN_MASK)) {
+		/* Write 1 to clear the interrupt status first */
+		wr32(fbd, FBNIC_RXB_ERR_INTR_STS,
+		     FIELD_PREP(FBNIC_RXB_ERR_INTR_STS_PS, FBNIC_PS_EN_MASK));
+
+		fbnic_mac_ps_protect_to_reset(fbd, fbd->ps_timeout);
+	}
+}
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_mac.h b/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
index f08fe8b7c497..10f30e0e8f69 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_mac.h
@@ -8,6 +8,30 @@
 
 struct fbnic_dev;
 
+/* The RXB clock runs at 600 MHZ in the ASIC and the PAUSE_STORM_UNIT_WR
+ * is 10us granularity, so set the clock to 6000 (0x1770)
+ */
+#define FBNIC_RXB_PS_CLK_DIV		0x1770
+
+/* Convert milliseconds to pause storm timeout units (10us granularity) */
+#define FBNIC_MAC_RXB_PS_TO(ms)		((ms) * 100)
+
+/* Convert pause storm timeout units (10us granularity) to milliseconds */
+#define FBNIC_MAC_RXB_PS_TO_MS(ps)	((ps) / 100)
+
+/* Set the default timer to 500ms, which should be longer than any
+ * reasonable period of continuous pausing. The service task, which runs
+ * once per second, periodically resets the pause storm trigger.
+ *
+ * As a result, on a functioning system, if pause continues, we enforce
+ * a duty cycle determined by the configured pause storm timeout (50%
+ * default). A crashed system will not have the service task and therefore
+ * pause will remain disabled until reboot recovery.
+ */
+#define FBNIC_MAC_PS_TO_DEFAULT_MS	500
+#define FBNIC_MAC_PS_TO_MAX_MS	\
+	FBNIC_MAC_RXB_PS_TO_MS(FIELD_MAX(FBNIC_RXB_PAUSE_STORM_THLD_TIME))
+
 #define FBNIC_MAX_JUMBO_FRAME_SIZE	9742
 
 /* States loosely based on section 136.8.11.7.5 of IEEE 802.3-2022 Ethernet
@@ -119,4 +143,7 @@ struct fbnic_mac {
 
 int fbnic_mac_init(struct fbnic_dev *fbd);
 void fbnic_mac_get_fw_settings(struct fbnic_dev *fbd, u8 *aui, u8 *fec);
+int fbnic_mac_ps_protect_to_config(struct fbnic_dev *fbd, u16 timeout);
+void fbnic_mac_ps_protect_handler(struct fbnic_dev *fbd);
+bool fbnic_mac_check_tx_pause(struct fbnic_dev *fbd);
 #endif /* _FBNIC_MAC_H_ */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
index 6f9389748a7d..196820f38d58 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c
@@ -220,6 +220,9 @@ static void fbnic_service_task(struct work_struct *work)
 
 	fbnic_get_hw_stats32(fbd);
 
+	if (fbd->ps_timeout && fbnic_mac_check_tx_pause(fbd))
+		fbnic_mac_ps_protect_handler(fbd);
+
 	fbnic_fw_check_heartbeat(fbd);
 
 	fbnic_health_check(fbd);
@@ -296,6 +299,8 @@ static int fbnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Populate driver with hardware-specific info and handlers */
 	fbd->max_num_queues = info->max_num_queues;
 
+	fbd->ps_timeout = FBNIC_MAC_PS_TO_DEFAULT_MS;
+
 	pci_set_master(pdev);
 	pci_save_state(pdev);
 
-- 
2.47.3
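
The limits quoted in the commit message follow from the register layout
in the patch: a 20-bit threshold field counted in 10us units, driven by
a 600 MHz clock. The sketch below redoes that arithmetic in standalone
C. The PS_* names and ps_timeout_to_units() are hypothetical stand-ins
for the driver macros, and 0xffff is assumed to be the value of
PFC_STORM_PREVENTION_AUTO in the uAPI header.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver macros; values taken from the patch. */
#define PS_THLD_TIME_MAX  ((1u << 20) - 1u)  /* 20-bit field, CSR_GENMASK(19, 0) */
#define PS_UNITS_PER_MS   100u               /* one unit is 10us */
#define PS_CLK_DIV        (600u * 10u)       /* 600 cycles per us * 10us = 6000 */
#define PS_TO_DEFAULT_MS  500u
#define PS_TO_MAX_MS      (PS_THLD_TIME_MAX / PS_UNITS_PER_MS)  /* 10485 */
#define PS_TO_AUTO        0xffffu            /* assumed PFC_STORM_PREVENTION_AUTO */

/*
 * Convert a requested timeout in ms to threshold units.
 * Returns 0 and stores the result in *units, or -EINVAL when the
 * value does not fit the 20-bit field. A request of 0 (disable)
 * converts to 0 units; disabling itself is handled elsewhere.
 */
static int ps_timeout_to_units(uint16_t timeout_ms, uint32_t *units)
{
	assert(units);

	/* "auto" selects the default timeout. */
	if (timeout_ms == PS_TO_AUTO)
		timeout_ms = PS_TO_DEFAULT_MS;

	if (timeout_ms > PS_TO_MAX_MS)
		return -EINVAL;

	*units = (uint32_t)timeout_ms * PS_UNITS_PER_MS;
	return 0;
}
```

With these numbers the default of 500ms becomes 50000 units, and the
largest accepted value, 10485ms, becomes 1048500 units, just under the
field maximum of 1048575.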



* [PATCH net-next V2 4/5] eth: fbnic: Fetch TX pause storm stats
  2026-02-07  1:05 [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
                   ` (2 preceding siblings ...)
  2026-02-07  1:05 ` [PATCH net-next V2 3/5] eth: fbnic: Add protection against pause storm Mohsin Bashir
@ 2026-02-07  1:05 ` Mohsin Bashir
  2026-02-07  1:05 ` [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats Mohsin Bashir
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:05 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

With pause storm protection in place, track the occurrence of pause
storm events. Since there is a one-to-one mapping between pause storm
interrupts and events, use the interrupt count to track this metric.

./ethtool -I -a eth0
Pause parameters for eth0:
Autonegotiate:	off
RX:		off
TX:		on
Statistics:
  tx_pause_frames: 759657
  rx_pause_frames: 0
  tx_pause_storm_events: 219

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
---
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h      |  1 +
 drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c  |  3 +++
 drivers/net/ethernet/meta/fbnic/fbnic_hw_stats.h |  1 +
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c      | 15 +++++++++++++++
 4 files changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
index e68c56237b61..72eb22a52572 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_csr.h
@@ -627,6 +627,7 @@ enum {
 	FBNIC_RXB_ENQUEUE_INDICES	= 4
 };
 
+#define FBNIC_RXB_INTR_PS_COUNT(n)	(0x080e9 + (n))	/* 0x203a4 + 4*n */
 #define FBNIC_RXB_DRBO_FRM_CNT_SRC(n)	(0x080f9 + (n))	/* 0x203e4 + 4*n */
 #define FBNIC_RXB_DRBO_BYTE_CNT_SRC_L(n) \
 					(0x080fd + (n))	/* 0x203f4 + 4*n */
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
index dc57519ebbe5..15d3162ab186 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_ethtool.c
@@ -1751,6 +1751,7 @@ fbnic_get_pause_stats(struct net_device *netdev,
 	struct fbnic_net *fbn = netdev_priv(netdev);
 	struct fbnic_mac_stats *mac_stats;
 	struct fbnic_dev *fbd = fbn->fbd;
+	u64 tx_ps_events;
 
 	mac_stats = &fbd->hw_stats.mac;
 
@@ -1758,6 +1759,8 @@ fbnic_get_pause_stats(struct net_device *netdev,
 
 	pause_stats->tx_pause_frames = mac_stats->pause.tx_pause_frames.value;
 	pause_stats->rx_pause_frames = mac_stats->pause.rx_pause_frames.value;
+	tx_ps_events = mac_stats->pause.tx_pause_storm_events.value;
+	pause_stats->tx_pause_storm_events = tx_ps_events;
 }
 
 static void
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_hw_stats.h b/drivers/net/ethernet/meta/fbnic/fbnic_hw_stats.h
index aa3f429a9aed..caea4be46762 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_hw_stats.h
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_hw_stats.h
@@ -54,6 +54,7 @@ struct fbnic_rmon_stats {
 struct fbnic_pause_stats {
 	struct fbnic_stat_counter tx_pause_frames;
 	struct fbnic_stat_counter rx_pause_frames;
+	struct fbnic_stat_counter tx_pause_storm_events;
 };
 
 struct fbnic_eth_mac_stats {
diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_mac.c b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
index be834983e981..8e6b589c25b8 100644
--- a/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
+++ b/drivers/net/ethernet/meta/fbnic/fbnic_mac.c
@@ -418,6 +418,18 @@ static void __fbnic_mac_stat_rd64(struct fbnic_dev *fbd, bool reset, u32 reg,
 	stat->reported = true;
 }
 
+static void fbnic_mac_stat_rd32(struct fbnic_dev *fbd, bool reset, u32 reg,
+				struct fbnic_stat_counter *stat)
+{
+	u32 new_reg_value;
+
+	new_reg_value = rd32(fbd, reg);
+	if (!reset)
+		stat->value += new_reg_value - stat->u.old_reg_value_32;
+	stat->u.old_reg_value_32 = new_reg_value;
+	stat->reported = true;
+}
+
 #define fbnic_mac_stat_rd64(fbd, reset, __stat, __CSR) \
 	__fbnic_mac_stat_rd64(fbd, reset, FBNIC_##__CSR##_L, &(__stat))
 
@@ -812,6 +824,9 @@ fbnic_mac_get_pause_stats(struct fbnic_dev *fbd, bool reset,
 			    MAC_STAT_TX_XOFF_STB);
 	fbnic_mac_stat_rd64(fbd, reset, pause_stats->rx_pause_frames,
 			    MAC_STAT_RX_XOFF_STB);
+	fbnic_mac_stat_rd32(fbd, reset,
+			    FBNIC_RXB_INTR_PS_COUNT(FBNIC_RXB_INTF_NET),
+			    &pause_stats->tx_pause_storm_events);
 }
 
 static void
-- 
2.47.3
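
The 32-bit read helper added in this patch accumulates a free-running
hardware counter into a 64-bit software value. The standalone sketch
below shows why the unsigned subtraction stays correct when the
register wraps. struct stat_counter and stat_update32() are simplified,
hypothetical versions of the driver types, and the register value is
passed in rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical version of the driver's stat counter. */
struct stat_counter {
	uint64_t value;          /* accumulated 64-bit total */
	uint32_t old_reg_value;  /* last raw register reading */
};

/*
 * Fold a new 32-bit register reading into the 64-bit total.
 * On reset only the baseline is recorded. Otherwise the delta is
 * computed modulo 2^32, so a wrapped register still yields the
 * number of events since the previous reading.
 */
static void stat_update32(struct stat_counter *stat, uint32_t new_reg_value,
			  bool reset)
{
	assert(stat);

	if (!reset)
		stat->value += (uint32_t)(new_reg_value - stat->old_reg_value);
	stat->old_reg_value = new_reg_value;
}
```

For example, with a baseline of 0xfffffff0 and a next reading of 0x10,
the modular difference is 0x20, so the total advances by 32 events even
though the raw value went down. This holds as long as the register
wraps at most once between readings.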



* [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats
  2026-02-07  1:05 [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
                   ` (3 preceding siblings ...)
  2026-02-07  1:05 ` [PATCH net-next V2 4/5] eth: fbnic: Fetch TX pause storm stats Mohsin Bashir
@ 2026-02-07  1:05 ` Mohsin Bashir
  2026-02-11  9:26   ` Paolo Abeni
  2026-02-11  9:49   ` Tariq Toukan
  2026-02-07  1:38 ` [PATCH net-next V2 2/5] net: ethtool: Update doc for tunable Mohsin Bashir
  2026-02-07  1:42 ` [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
  6 siblings, 2 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:05 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

Report device_stall_critical_watermark_cnt as tx_pause_storm_events in
the ethtool_pause_stats struct. This counter tracks pause storm error
events, which indicate that the NIC has been sending pause frames for
an extended period due to a stall.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
---
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index a8af84fc9763..2fe779c164e4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -916,6 +916,23 @@ static int mlx5e_stats_get_ieee(struct mlx5_core_dev *mdev,
 				    sz, MLX5_REG_PPCNT, 0, 0);
 }
 
+static int mlx5e_stats_get_per_prio(struct mlx5_core_dev *mdev,
+				    u32 *ppcnt_per_prio)
+{
+	u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {};
+	int sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
+
+	if (!(MLX5_CAP_PCAM_FEATURE(mdev, pfcc_mask) &&
+	      MLX5_CAP_DEBUG(mdev, stall_detect)))
+		return -EOPNOTSUPP;
+
+	MLX5_SET(ppcnt_reg, in, local_port, 1);
+	MLX5_SET(ppcnt_reg, in, grp, MLX5_PER_PRIORITY_COUNTERS_GROUP);
+	MLX5_SET(ppcnt_reg, in, prio_tc, 0);
+	return mlx5_core_access_reg(mdev, in, sz, ppcnt_per_prio, sz,
+				    MLX5_REG_PPCNT, 0, 0);
+}
+
 void mlx5e_stats_pause_get(struct mlx5e_priv *priv,
 			   struct ethtool_pause_stats *pause_stats)
 {
@@ -933,6 +950,14 @@ void mlx5e_stats_pause_get(struct mlx5e_priv *priv,
 		MLX5E_READ_CTR64_BE_F(ppcnt_ieee_802_3,
 				      eth_802_3_cntrs_grp_data_layout,
 				      a_pause_mac_ctrl_frames_received);
+
+	if (mlx5e_stats_get_per_prio(mdev, ppcnt_ieee_802_3))
+		return;
+
+	pause_stats->tx_pause_storm_events =
+		MLX5E_READ_CTR64_BE_F(ppcnt_ieee_802_3,
+				      eth_per_prio_grp_data_layout,
+				      device_stall_critical_watermark_cnt);
 }
 
 void mlx5e_stats_eth_phy_get(struct mlx5e_priv *priv,
-- 
2.47.3



* [PATCH net-next V2 2/5] net: ethtool: Update doc for tunable
  2026-02-07  1:05 [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
                   ` (4 preceding siblings ...)
  2026-02-07  1:05 ` [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats Mohsin Bashir
@ 2026-02-07  1:38 ` Mohsin Bashir
  2026-02-07  1:42 ` [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
  6 siblings, 0 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:38 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

ETHTOOL_PFC_PREVENTION_TOUT configures the timeout value for PFC storm
prevention. It can also be used to configure the storm detection
timeout for global pause settings. In fact, some existing drivers
already use it for that purpose.

Document that the knob can formally be used to configure the timeout
value for the pause storm prevention mechanism. An update to the
ethtool man page will follow.

Link: https://lore.kernel.org/aa5f189a-ac62-4633-97b5-ebf939e9c535@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
---
 include/uapi/linux/ethtool.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index b74b80508553..1cdfb8341df2 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -225,7 +225,7 @@ enum tunable_id {
 	ETHTOOL_ID_UNSPEC,
 	ETHTOOL_RX_COPYBREAK,
 	ETHTOOL_TX_COPYBREAK,
-	ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
+	ETHTOOL_PFC_PREVENTION_TOUT, /* both pause and pfc, see man ethtool */
 	ETHTOOL_TX_COPYBREAK_BUF_SIZE,
 	/*
 	 * Add your fresh new tunable attribute above and remember to update
-- 
2.47.3
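
For readers unfamiliar with the tunable interface, here is a minimal
userspace sketch of how a request for this knob might be built. It
assumes the legacy ioctl path, a u16 value and the millisecond unit
named in the old comment; build_tout_request() is a hypothetical helper
that is not part of this series, and actually sending the request is
left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/ethtool.h>

/*
 * Hypothetical helper (not from this series): allocate and fill an
 * ETHTOOL_STUNABLE request that sets the storm prevention timeout.
 * The value is assumed to be a u16 in milliseconds. The caller owns
 * the returned buffer and must free() it.
 */
static struct ethtool_tunable *build_tout_request(uint16_t timeout_ms)
{
	struct ethtool_tunable *tuna;

	/* The payload lives in the flexible array after the header. */
	tuna = calloc(1, sizeof(*tuna) + sizeof(timeout_ms));
	if (!tuna)
		return NULL;

	tuna->cmd = ETHTOOL_STUNABLE;
	tuna->id = ETHTOOL_PFC_PREVENTION_TOUT;
	tuna->type_id = ETHTOOL_TUNABLE_U16;
	tuna->len = sizeof(timeout_ms);
	memcpy(tuna->data, &timeout_ms, sizeof(timeout_ms));

	return tuna;
}
```

A caller would typically point ifr_data of a struct ifreq at the
returned buffer and issue a SIOCETHTOOL ioctl. Whether a given driver
applies the value to global pause, PFC, or both is driver specific.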



* [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm
  2026-02-07  1:05 [PATCH net-next V2 0/5] net: ethtool: Track TX pause storm Mohsin Bashir
                   ` (5 preceding siblings ...)
  2026-02-07  1:38 ` [PATCH net-next V2 2/5] net: ethtool: Update doc for tunable Mohsin Bashir
@ 2026-02-07  1:42 ` Mohsin Bashir
  6 siblings, 0 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-07  1:42 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	mohsin.bashr, o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko

With TX pause enabled, if a device cannot deliver received frames to
the stack (e.g., during a system hang), it may generate excessive pause
frames, causing a pause storm. This series updates the uAPI to track TX
pause storm events as part of the pause stats (p1), proposes using the
existing knob (pfc-prevention-tout) to configure the storm watchdog (p2),
adds pause storm protection support for fbnic (p3), and leverages p1
to provide observability into these events for the fbnic (p4) and mlx5
(p5) drivers.

---
Changelog:
V2:
 - Clarify pfc-prevention-tout applies to general pause, not just PFC
   (P2)
 - Add pause storm watchdog timeout configuration via pfc-prevention-tout
   (P3)
 - mlx5: Report device stall prevention events (errors) in pause stats
   (P5)

V1: https://lore.kernel.org/20260122192158.428882-1-mohsin.bashr@gmail.com/

Mohsin Bashir (5):
  net: ethtool: Track pause storm events
  net: ethtool: Update doc for tunable
  eth: fbnic: Add protection against pause storm
  eth: fbnic: Fetch TX pause storm stats
  eth: mlx5: Move pause storm errors to pause stats

 Documentation/netlink/specs/ethtool.yaml      |  13 +++
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  25 ++++
 drivers/net/ethernet/meta/fbnic/fbnic.h       |   3 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h   |  11 ++
 .../net/ethernet/meta/fbnic/fbnic_ethtool.c   |  46 ++++++++
 .../net/ethernet/meta/fbnic/fbnic_hw_stats.h  |   1 +
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |   2 +
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 110 ++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.h   |  27 +++++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |   5 +
 include/linux/ethtool.h                       |   2 +
 include/uapi/linux/ethtool.h                  |   2 +-
 .../uapi/linux/ethtool_netlink_generated.h    |   1 +
 net/ethtool/pause.c                           |   4 +-
 14 files changed, 250 insertions(+), 2 deletions(-)

-- 
2.47.3



* Re: [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats
  2026-02-07  1:05 ` [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats Mohsin Bashir
@ 2026-02-11  9:26   ` Paolo Abeni
  2026-02-11  9:49   ` Tariq Toukan
  1 sibling, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2026-02-11  9:26 UTC (permalink / raw)
  To: Mohsin Bashir, netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	o.rempel, saeedm, tariqt, vadim.fedorenko

On 2/7/26 2:05 AM, Mohsin Bashir wrote:
> Report device_stall_critical_watermark_cnt as tx_pause_storm_events in
> the ethtool_pause_stats struct. This counter tracks pause storm error
> events which indicate the NIC has been sending pause frames for an
> extended period due to a stall.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>

I think this deserves an explicit ack from someone at Mellanox, and I'm
wrapping up the net-next PR right now. Unless the ack is very fast, I
fear I'll have to defer this one.

/P



* Re: [PATCH net-next V2 1/5] net: ethtool: Track pause storm events
  2026-02-07  1:05 ` [PATCH net-next V2 1/5] net: ethtool: Track pause storm events Mohsin Bashir
@ 2026-02-11  9:28   ` Paolo Abeni
  0 siblings, 0 replies; 12+ messages in thread
From: Paolo Abeni @ 2026-02-11  9:28 UTC (permalink / raw)
  To: Mohsin Bashir, netdev, o.rempel
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch, saeedm,
	tariqt, vadim.fedorenko

On 2/7/26 2:05 AM, Mohsin Bashir wrote:
> With TX pause enabled, if a device is unable to pass packets up to the
> stack (e.g., the CPU is hung), the device can cause a pause storm. Given
> that devices can have native support to protect the neighbor from such
> flooding, such events need some tracking. This support is to track TX
> pause storm events for better observability.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>

AFAICS you forgot to retain Oleksij's Reviewed-by tag from v1. Perhaps
he will chime in again...

/P



* Re: [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats
  2026-02-07  1:05 ` [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats Mohsin Bashir
  2026-02-11  9:26   ` Paolo Abeni
@ 2026-02-11  9:49   ` Tariq Toukan
  2026-02-11 21:18     ` Mohsin Bashir
  1 sibling, 1 reply; 12+ messages in thread
From: Tariq Toukan @ 2026-02-11  9:49 UTC (permalink / raw)
  To: Mohsin Bashir, netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko



On 07/02/2026 3:05, Mohsin Bashir wrote:
> Report device_stall_critical_watermark_cnt as tx_pause_storm_events in
> the ethtool_pause_stats struct. This counter tracks pause storm error
> events which indicate the NIC has been sending pause frames for an
> extended period due to a stall.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com>
> ---
>   .../ethernet/mellanox/mlx5/core/en_stats.c    | 25 +++++++++++++++++++
>   1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> index a8af84fc9763..2fe779c164e4 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> @@ -916,6 +916,23 @@ static int mlx5e_stats_get_ieee(struct mlx5_core_dev *mdev,
>   				    sz, MLX5_REG_PPCNT, 0, 0);
>   }
>   
> +static int mlx5e_stats_get_per_prio(struct mlx5_core_dev *mdev,
> +				    u32 *ppcnt_per_prio)
> +{
> +	u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {};
> +	int sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
> +
> +	if (!(MLX5_CAP_PCAM_FEATURE(mdev, pfcc_mask) &&
> +	      MLX5_CAP_DEBUG(mdev, stall_detect)))
> +		return -EOPNOTSUPP;
> +
> +	MLX5_SET(ppcnt_reg, in, local_port, 1);
> +	MLX5_SET(ppcnt_reg, in, grp, MLX5_PER_PRIORITY_COUNTERS_GROUP);
> +	MLX5_SET(ppcnt_reg, in, prio_tc, 0);

No interest in all other non-0 prios?

> +	return mlx5_core_access_reg(mdev, in, sz, ppcnt_per_prio, sz,
> +				    MLX5_REG_PPCNT, 0, 0);
> +}
> +
>   void mlx5e_stats_pause_get(struct mlx5e_priv *priv,
>   			   struct ethtool_pause_stats *pause_stats)
>   {
> @@ -933,6 +950,14 @@ void mlx5e_stats_pause_get(struct mlx5e_priv *priv,
>   		MLX5E_READ_CTR64_BE_F(ppcnt_ieee_802_3,
>   				      eth_802_3_cntrs_grp_data_layout,
>   				      a_pause_mac_ctrl_frames_received);
> +
> +	if (mlx5e_stats_get_per_prio(mdev, ppcnt_ieee_802_3))
> +		return;
> +
> +	pause_stats->tx_pause_storm_events =
> +		MLX5E_READ_CTR64_BE_F(ppcnt_ieee_802_3,
> +				      eth_per_prio_grp_data_layout,
> +				      device_stall_critical_watermark_cnt);
>   }
>   
>   void mlx5e_stats_eth_phy_get(struct mlx5e_priv *priv,



* Re: [PATCH net-next V2 5/5] eth: mlx5: Move pause storm errors to pause stats
  2026-02-11  9:49   ` Tariq Toukan
@ 2026-02-11 21:18     ` Mohsin Bashir
  0 siblings, 0 replies; 12+ messages in thread
From: Mohsin Bashir @ 2026-02-11 21:18 UTC (permalink / raw)
  To: Tariq Toukan, netdev
  Cc: alexanderduyck, andrew+netdev, andrew, davem, donald.hunter,
	edumazet, gal, horms, idosch, jacob.e.keller, kernel-team,
	kory.maincent, kuba, lee, leon, linux-rdma, linux, mbloch,
	o.rempel, pabeni, saeedm, tariqt, vadim.fedorenko



On 2/11/26 1:49 AM, Tariq Toukan wrote:

>> +static int mlx5e_stats_get_per_prio(struct mlx5_core_dev *mdev,
>> +                    u32 *ppcnt_per_prio)
>> +{
>> +    u32 in[MLX5_ST_SZ_DW(ppcnt_reg)] = {};
>> +    int sz = MLX5_ST_SZ_BYTES(ppcnt_reg);
>> +
>> +    if (!(MLX5_CAP_PCAM_FEATURE(mdev, pfcc_mask) &&
>> +          MLX5_CAP_DEBUG(mdev, stall_detect)))
>> +        return -EOPNOTSUPP;
>> +
>> +    MLX5_SET(ppcnt_reg, in, local_port, 1);
>> +    MLX5_SET(ppcnt_reg, in, grp, MLX5_PER_PRIORITY_COUNTERS_GROUP);
>> +    MLX5_SET(ppcnt_reg, in, prio_tc, 0);
> 
> No interest in all other non-0 prios?
> 
I opted for prio 0 for simplicity. I can iterate over all prios and 
aggregate if that's needed.
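
A rough sketch of what that aggregation could look like, written as
standalone C rather than driver code. The count of eight priorities,
the callback type and both function names are assumptions made for
illustration; a real version would issue one PPCNT query per prio_tc
value in place of the array lookup.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PRIOS 8 /* assumption: priorities 0..7 */

/*
 * Hypothetical callback: read the stall counter for one priority.
 * Returns 0 on success and stores the value in *cnt.
 */
typedef int (*read_prio_cnt_fn)(void *ctx, unsigned int prio, uint64_t *cnt);

/*
 * Sum the per-priority counters into *total. Stops at the first read
 * error and leaves *total untouched in that case.
 */
static int sum_stall_events(read_prio_cnt_fn read_cnt, void *ctx,
			    uint64_t *total)
{
	uint64_t sum = 0;
	unsigned int prio;

	for (prio = 0; prio < NUM_PRIOS; prio++) {
		uint64_t cnt;
		int err = read_cnt(ctx, prio, &cnt);

		if (err)
			return err;
		sum += cnt;
	}

	*total = sum;
	return 0;
}

/* Stand-in for the hardware query: counters come from a plain array. */
static int read_from_array(void *ctx, unsigned int prio, uint64_t *cnt)
{
	const uint64_t *counters = ctx;

	*cnt = counters[prio];
	return 0;
}
```

Bailing out on the first failed read keeps a partial sum from being
reported as the total, which mirrors how the patch skips the assignment
when the register access fails.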

