From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Jiri Pirko <jiri@resnulli.us>, Jiri Pirko <jiri@nvidia.com>
Cc: Saeed Mahameed <saeed@kernel.org>, Gal Pressman <gal@nvidia.com>,
	"Leon Romanovsky" <leon@kernel.org>,
	Shahar Shitrit <shshitrit@nvidia.com>,
	"Donald Hunter" <donald.hunter@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"Brett Creeley" <brett.creeley@amd.com>,
	Michael Chan <michael.chan@broadcom.com>,
	Pavan Chebbi <pavan.chebbi@broadcom.com>,
	Cai Huoqing <cai.huoqing@linux.dev>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	"Przemek Kitszel" <przemyslaw.kitszel@intel.com>,
	Sunil Goutham <sgoutham@marvell.com>,
	Linu Cherian <lcherian@marvell.com>,
	Geetha sowjanya <gakula@marvell.com>,
	Jerin Jacob <jerinj@marvell.com>, hariprasad <hkelam@marvell.com>,
	"Subbaraya Sundeep" <sbhatta@marvell.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	"Tariq Toukan" <tariqt@nvidia.com>,
	Mark Bloch <mbloch@nvidia.com>, Ido Schimmel <idosch@nvidia.com>,
	Petr Machata <petrm@nvidia.com>,
	Manish Chopra <manishc@marvell.com>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<intel-wired-lan@lists.osuosl.org>, <linux-rdma@vger.kernel.org>
Subject: [PATCH net-next 0/5] Expose grace period delay for devlink health reporter
Date: Thu, 17 Jul 2025 19:07:17 +0300
Message-ID: <1752768442-264413-1-git-send-email-tariqt@nvidia.com>

Hi,

This series by Shahar implements a grace period delay in the devlink
health reporter, and uses it in the mlx5e driver.

See detailed feature description by Shahar below [1].

Regards,
Tariq

[1]
Currently, the devlink health reporter initiates the grace period
immediately after recovering an error, which blocks further recovery
attempts until the grace period concludes. Since additional errors
are not generally expected during this short interval, any new error
reported during the grace period is not only rejected but also causes
the reporter to enter an error state that requires manual intervention.

This approach poses a problem in scenarios where a single root cause
triggers multiple related errors in quick succession - for example,
a PCI issue affecting multiple hardware queues. Because these errors
are closely related and occur rapidly, it is more effective to handle
them together rather than handling only the first one reported and
blocking any subsequent recovery attempts. Furthermore, setting the
reporter to an error state in this context can be misleading, as these
multiple errors are manifestations of a single underlying issue, making
it unlike the general case where additional errors are not expected
during the grace period.

To resolve this, add a configurable grace period delay attribute to
the devlink health reporter. This delay starts when the first error
is recovered and lasts for a user-defined duration. Once the grace
period delay expires, the actual grace period begins. After the grace
period ends, a newly reported error starts the same flow again.

Timeline summary:

----|--------|------------------------------/----------------------/--
error is  error is    grace period delay          grace period
reported  recovered  (recoveries allowed)     (recoveries blocked)

The grace period delay creates a time window during which recovery
attempts are permitted, allowing all reported errors to be handled
sequentially before the grace period starts. Once the grace period
begins, it prevents any further error recoveries until it ends.

When grace period delay is set to 0, current behavior is preserved.
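
The flow above can be modelled in a few lines of user-space C. This is
only an illustrative sketch of the timeline, not the code in
net/devlink/health.c: struct reporter_model, enum phase,
reporter_phase() and reporter_try_recover() are hypothetical names,
times are plain millisecond counters, and the caller is assumed to
pass a monotonically increasing now_ms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one reporter; not the kernel's structure. */
struct reporter_model {
	uint64_t grace_period_ms;  /* recoveries are blocked for this long */
	uint64_t delay_ms;         /* recoveries are still allowed for this long */
	uint64_t first_recover_ms; /* time of the recovery that opened the window */
	bool window_open;          /* false until a first recovery has happened */
};

enum phase { PHASE_IDLE, PHASE_DELAY, PHASE_GRACE };

/* Classify now_ms relative to the recovery that opened the window. */
static enum phase reporter_phase(const struct reporter_model *r,
				 uint64_t now_ms)
{
	uint64_t elapsed;

	assert(r != NULL);
	if (!r->window_open)
		return PHASE_IDLE;

	elapsed = now_ms - r->first_recover_ms;
	if (elapsed < r->delay_ms)
		return PHASE_DELAY;
	if (elapsed < r->delay_ms + r->grace_period_ms)
		return PHASE_GRACE;
	return PHASE_IDLE; /* both windows expired, the flow may restart */
}

/* Returns true if a recovery is allowed at now_ms, false if blocked. */
static bool reporter_try_recover(struct reporter_model *r, uint64_t now_ms)
{
	switch (reporter_phase(r, now_ms)) {
	case PHASE_GRACE:
		return false; /* grace period: recoveries blocked */
	case PHASE_DELAY:
		return true;  /* delay window: recover, keep the window start */
	case PHASE_IDLE:
	default:
		/* First recovery of a new flow: start the delay window. */
		r->first_recover_ms = now_ms;
		r->window_open = true;
		return true;
	}
}
```

In this model, delay_ms = 0 makes the PHASE_DELAY branch unreachable,
so the grace period starts right at the first recovery, which matches
the "current behavior is preserved" case described above.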


Shahar Shitrit (5):
  devlink: Move graceful period parameter to reporter ops
  devlink: Move health reporter recovery abort logic to a separate
    function
  devlink: Introduce grace period delay for health reporter
  devlink: Make health reporter grace period delay configurable
  net/mlx5e: Set default grace period delay for TX and RX reporters

 Documentation/netlink/specs/devlink.yaml      |   7 ++
 .../networking/devlink/devlink-health.rst     |   2 +-
 drivers/net/ethernet/amd/pds_core/main.c      |   2 +-
 .../net/ethernet/broadcom/bnxt/bnxt_devlink.c |   2 +-
 .../net/ethernet/huawei/hinic/hinic_devlink.c |  10 +-
 .../net/ethernet/intel/ice/devlink/health.c   |   3 +-
 .../marvell/octeontx2/af/rvu_devlink.c        |  32 +++--
 .../mellanox/mlx5/core/diag/reporter_vnic.c   |   2 +-
 .../mellanox/mlx5/core/en/reporter_rx.c       |  13 ++-
 .../mellanox/mlx5/core/en/reporter_tx.c       |  13 ++-
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   2 +-
 .../net/ethernet/mellanox/mlx5/core/health.c  |  41 ++++---
 drivers/net/ethernet/mellanox/mlxsw/core.c    |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_devlink.c |  10 +-
 drivers/net/netdevsim/health.c                |   4 +-
 include/net/devlink.h                         |  15 ++-
 include/uapi/linux/devlink.h                  |   2 +
 net/devlink/health.c                          | 109 +++++++++++++-----
 net/devlink/netlink_gen.c                     |   5 +-
 19 files changed, 191 insertions(+), 85 deletions(-)


base-commit: a96cee9b369ee47b5309311d0d71cb6663b123fc
-- 
2.31.1

