public inbox for netdev@vger.kernel.org
* [RFC net-next V3 0/5] net: ethtool: Track TX pause storm
@ 2026-02-23 17:49 Mohsin Bashir
From: Mohsin Bashir @ 2026-02-23 17:49 UTC (permalink / raw)
  To: netdev
  Cc: alexanderduyck, alok.a.tiwari, andrew+netdev, andrew, davem,
	dg573847474, donald.hunter, edumazet, gal, horms, idosch,
	jacob.e.keller, kernel-team, kory.maincent, kuba, lee, leon,
	mbloch, mike.marciniszyn, mohsin.bashr, o.rempel, pabeni, saeedm,
	tariqt, vadim.fedorenko

With TX pause enabled, if a device cannot deliver received frames to
the stack (e.g., during a system hang), it may generate excessive pause
frames causing a pause storm. This series updates the uAPI to track TX
pause storm events as part of the pause stats (p1), proposes using the
existing pfc-prevention-tout knob to configure the storm watchdog (p2),
adds pause storm protection support for fbnic (p3), and leverages p1
to provide observability into these events for the fbnic (p4) and mlx5
(p5) drivers.

Patches 1-4 are ready for review. The series is marked RFC due to
patch 5, which has a few open design questions:

mlx5 tracks device_stall_critical_watermark_cnt per priority class
(8 priorities) in the PPCNT register's per-priority counter group. The
ethtool_pause_stats struct exposes tx_pause_storm_events as a single
value. Patch 5 currently aggregates across all 8 priorities. This raises
the following questions:

  1) Should the driver report the aggregate across all priorities, or
     only prio 0? If my reading is correct, the current reporting via
     ethtool -S (tx_pause_storm_warning_events /
     tx_pause_storm_error_events) reads from per_prio_counters[0] only,
     so reporting the aggregate in the new pause stats path would create
     an inconsistency between the two interfaces for the same underlying
     counter.

  2) If aggregating is the preferred approach, should the existing
     ethtool -S reporting also be fixed to aggregate across all
     priorities, so that both interfaces stay in sync?

  3) The per-priority data is already cached in per_prio_counters[]
     by the periodic stats update (DECLARE_STATS_GRP_OP_UPDATE_STATS).
     Should the pause stats path read from this cache (zero additional
     firmware trips at the cost of slightly stale data), or issue fresh
     reads via mlx5_core_access_reg() as done in p5?
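
To make questions 1 and 2 concrete, here is a minimal standalone sketch of
the two reporting options. It is not the code from patch 5: the struct,
field, macro, and function names below are hypothetical stand-ins, and the
only detail taken from the text above is that there are 8 priorities with
one stall counter each.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The cover letter mentions 8 priorities; the macro name is hypothetical. */
#define SKETCH_NUM_PRIO 8

/* Hypothetical stand-in for one entry of the per-priority counter cache. */
struct sketch_prio_counters {
	uint64_t stall_critical_watermark_cnt;
};

/* Option A: report priority 0 only. */
static uint64_t sketch_storm_events_prio0(const struct sketch_prio_counters *c)
{
	assert(c != NULL);
	return c[0].stall_critical_watermark_cnt;
}

/* Option B: report the sum across all priorities. */
static uint64_t
sketch_storm_events_aggregate(const struct sketch_prio_counters *c)
{
	uint64_t sum = 0;
	size_t i;

	assert(c != NULL);
	for (i = 0; i < SKETCH_NUM_PRIO; i++)
		sum += c[i].stall_critical_watermark_cnt;

	return sum;
}
```

With per-priority counts of 2, 0, 5, 0, 0, 1, 0, 0, option A reports 2 while
option B reports 8, which is the kind of mismatch between the two interfaces
that question 1 is concerned about.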

Feedback on the above would be appreciated.

Changelog:
V3:
 - (P3): Fix checkpatch complaint about line length in
   fbnic_mac_ps_protect_handler()
 - (P5):
    * Aggregate pause stall stats across all priority classes
    * Update commit message
V2: https://lore.kernel.org/20260207010525.3808842-1-mohsin.bashr@gmail.com/
V1: https://lore.kernel.org/20260122192158.428882-1-mohsin.bashr@gmail.com/

Mohsin Bashir (5):
  net: ethtool: Track pause storm events
  net: ethtool: Update doc for tunable
  eth: fbnic: Add protection against pause storm
  eth: fbnic: Fetch TX pause storm stats
  eth: mlx5: Move pause storm errors to pause stats

 Documentation/netlink/specs/ethtool.yaml      |  13 ++
 .../ethernet/mellanox/mlx5/core/en_stats.c    |  30 +++++
 drivers/net/ethernet/meta/fbnic/fbnic.h       |   3 +
 drivers/net/ethernet/meta/fbnic/fbnic_csr.h   |  11 ++
 .../net/ethernet/meta/fbnic/fbnic_ethtool.c   |  46 ++++++++
 .../net/ethernet/meta/fbnic/fbnic_hw_stats.h  |   1 +
 drivers/net/ethernet/meta/fbnic/fbnic_irq.c   |   2 +
 drivers/net/ethernet/meta/fbnic/fbnic_mac.c   | 111 ++++++++++++++++++
 drivers/net/ethernet/meta/fbnic/fbnic_mac.h   |  27 +++++
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c   |   5 +
 include/linux/ethtool.h                       |   2 +
 include/uapi/linux/ethtool.h                  |   2 +-
 .../uapi/linux/ethtool_netlink_generated.h    |   1 +
 net/ethtool/pause.c                           |   4 +-
 14 files changed, 256 insertions(+), 2 deletions(-)

-- 
2.47.3

