* [PATCH net-next V4 0/5] Expose burst period for devlink health reporter
@ 2025-08-24 8:43 Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 1/5] devlink: Move graceful period parameter to reporter ops Mark Bloch
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Mark Bloch @ 2025-08-24 8:43 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Tariq Toukan, Leon Romanovsky, Saeed Mahameed, netdev,
linux-kernel, Gal Pressman, Mark Bloch
Hi,
This series by Shahar implements burst period in devlink health
reporter, and use it in mlx5e driver.
This is V4. Find previous versions here:
v3: https://lore.kernel.org/all/1755111349-416632-1-git-send-email-tariqt@nvidia.com/
v2: https://lore.kernel.org/all/1753390134-345154-1-git-send-email-tariqt@nvidia.com/
v1: https://lore.kernel.org/all/1752768442-264413-1-git-send-email-tariqt@nvidia.com/
Changelog:
v3->v4:
- Renamed error burst period to burst period throughout code and commit
messages.
- Renamed devlink_health_reporter_burst_period_active() to
devlink_health_reporter_in_burst().
- Added doc for health-reporter-burst-period attr in devlink.yaml.
– In __devlink_health_reporter_create(), split compound condition into
two separate WARN_ON() checks for clarity and early return.
- Fixed indentations in devlink_health_reporter_ops doc.
v2->v3:
- Rebase.
- Rename feature: graceful period delay -> error burst period.
v1->v2:
- Rebase.
- Fix long attr name.
- Extend the cover letter.
Shahar writes:
--------------------------------------------------------------------------
Currently, the devlink health reporter initiates the grace period
immediately after recovering an error, which blocks further recovery
attempts until the grace period concludes. Since additional errors
are not generally expected during this short interval, any new error
reported during the grace period is not only rejected but also causes
the reporter to enter an error state that requires manual intervention.
This approach poses a problem in scenarios where a single root cause
triggers multiple related errors in quick succession - for example,
a PCI issue affecting multiple hardware queues. Because these errors
are closely related and occur rapidly, it is more effective to handle
them together rather than handling only the first one reported and
blocking any subsequent recovery attempts. Furthermore, setting the
reporter to an error state in this context can be misleading, as these
multiple errors are manifestations of a single underlying issue, making
it unlike the general case where additional errors are not expected
during the grace period.
To resolve this, introduce a configurable burst period attribute to the
devlink health reporter. This period starts when the first error
is recovered and lasts for a user-defined duration. Once this error
burst period expires, the grace period begins. After the grace period
ends, a new reported error will start the same flow again.
Timeline summary:
----|--------|------------------------------/----------------------/--
error is error is burst period grace period
reported recovered (recoveries allowed) (recoveries blocked)
With burst period, create a time window during which recovery attempts
are permitted, allowing all reported errors to be handled sequentially
before the grace period starts. Once the grace period begins, it
prevents any further error recoveries until it ends.
When burst period is set to 0, current behavior is preserved.
Design alternatives considered:
1. Recover all queues upon any error:
A brute-force approach that recovers all queues on any error.
While simple, it is overly aggressive and disrupts unaffected queues
unnecessarily. Also, because this is handled entirely within the
driver, it leads to a driver-specific implementation rather than a
generic one.
2. Per-queue reporter:
This design would isolate recovery handling per SQ or RQ, effectively
removing interdependencies between queues. While conceptually clean,
it introduces significant scalability challenges as the number of
queues grows, as well as synchronization challenges across multiple
reporters.
3. Error aggregation with delayed handling:
Errors arriving during the grace period are saved and processed after
it ends. While addressing the issue of related errors whose recovery
is aborted as grace period started, this adds complexity due to
synchronization needs and contradicts the assumption that no errors
should occur during a healthy system’s grace period. Also, this
breaks the important role of grace period in preventing an infinite
loop of immediate error detection following recovery. In such cases
we want to stop.
4. Allowing a fixed burst of errors before starting grace period:
Allows a set number of recoveries before the grace period begins.
However, it also requires limiting the error reporting window.
To keep the design simple, the burst threshold becomes redundant.
The burst period design was chosen for its simplicity and precision in
addressing the problem at hand. It effectively captures the temporal
correlation of related errors and aligns with the original intent of
the grace period as a stabilization window where further errors are
unexpected, and if they do occur, they indicate an abnormal system
state.
Shahar Shitrit (5):
devlink: Move graceful period parameter to reporter ops
devlink: Move health reporter recovery abort logic to a separate function
devlink: Introduce burst period for health reporter
devlink: Make health reporter burst period configurable
net/mlx5e: Set default burst period for TX and RX reporters
Documentation/netlink/specs/devlink.yaml | 7 ++
.../networking/devlink/devlink-health.rst | 2 +-
drivers/net/ethernet/amd/pds_core/main.c | 2 +-
.../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 2 +-
.../net/ethernet/huawei/hinic/hinic_devlink.c | 10 +-
.../net/ethernet/intel/ice/devlink/health.c | 3 +-
.../marvell/octeontx2/af/rvu_devlink.c | 32 +++--
.../mellanox/mlx5/core/diag/reporter_vnic.c | 2 +-
.../mellanox/mlx5/core/en/reporter_rx.c | 12 +-
.../mellanox/mlx5/core/en/reporter_tx.c | 12 +-
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/health.c | 41 ++++---
drivers/net/ethernet/mellanox/mlxsw/core.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_devlink.c | 9 +-
drivers/net/netdevsim/health.c | 4 +-
include/net/devlink.h | 14 ++-
include/uapi/linux/devlink.h | 2 +
net/devlink/health.c | 109 +++++++++++++-----
net/devlink/netlink_gen.c | 5 +-
19 files changed, 187 insertions(+), 85 deletions(-)
base-commit: b1c92cdf5af3198e8fbc1345a80e2a1dff386c02
--
2.34.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net-next V4 1/5] devlink: Move graceful period parameter to reporter ops
2025-08-24 8:43 [PATCH net-next V4 0/5] Expose burst period for devlink health reporter Mark Bloch
@ 2025-08-24 8:43 ` Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 2/5] devlink: Move health reporter recovery abort logic to a separate function Mark Bloch
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mark Bloch @ 2025-08-24 8:43 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Tariq Toukan, Leon Romanovsky, Saeed Mahameed, netdev,
linux-kernel, Gal Pressman, Shahar Shitrit, Jiri Pirko,
Mark Bloch, Brett Creeley, Michael Chan, Pavan Chebbi,
Cai Huoqing, Tony Nguyen, Przemek Kitszel, Sunil Goutham,
Linu Cherian, Geetha sowjanya, Jerin Jacob, hariprasad,
Subbaraya Sundeep, Ido Schimmel, Petr Machata, Manish Chopra,
Jiri Pirko
From: Shahar Shitrit <shshitrit@nvidia.com>
Move the default graceful period from a parameter to
devlink_health_reporter_create() to a field in the
devlink_health_reporter_ops structure.
This change improves consistency, as the graceful period is inherently
tied to the reporter's behavior and recovery policy. It simplifies the
signature of devlink_health_reporter_create() and its internal helper
functions. It also centralizes the reporter configuration at the ops
structure, preparing the groundwork for a downstream patch that will
introduce a devlink health reporter burst period attribute whose
default value will similarly be provided by the driver via the ops
structure.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/amd/pds_core/main.c | 2 +-
.../net/ethernet/broadcom/bnxt/bnxt_devlink.c | 2 +-
.../net/ethernet/huawei/hinic/hinic_devlink.c | 10 +++--
.../net/ethernet/intel/ice/devlink/health.c | 3 +-
.../marvell/octeontx2/af/rvu_devlink.c | 32 +++++++++++----
.../mellanox/mlx5/core/diag/reporter_vnic.c | 2 +-
.../mellanox/mlx5/core/en/reporter_rx.c | 10 +++--
.../mellanox/mlx5/core/en/reporter_tx.c | 10 +++--
.../net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/health.c | 41 +++++++++++--------
drivers/net/ethernet/mellanox/mlxsw/core.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_devlink.c | 9 ++--
drivers/net/netdevsim/health.c | 4 +-
include/net/devlink.h | 11 +++--
net/devlink/health.c | 28 +++++--------
15 files changed, 97 insertions(+), 71 deletions(-)
diff --git a/drivers/net/ethernet/amd/pds_core/main.c b/drivers/net/ethernet/amd/pds_core/main.c
index 9b81e1c260c2..c7a2eff57632 100644
--- a/drivers/net/ethernet/amd/pds_core/main.c
+++ b/drivers/net/ethernet/amd/pds_core/main.c
@@ -280,7 +280,7 @@ static int pdsc_init_pf(struct pdsc *pdsc)
goto err_out_del_dev;
}
- hr = devl_health_reporter_create(dl, &pdsc_fw_reporter_ops, 0, pdsc);
+ hr = devl_health_reporter_create(dl, &pdsc_fw_reporter_ops, pdsc);
if (IS_ERR(hr)) {
devl_unlock(dl);
dev_warn(pdsc->dev, "Failed to create fw reporter: %pe\n", hr);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 4c4581b0342e..43fb75806cd6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -220,7 +220,7 @@ __bnxt_dl_reporter_create(struct bnxt *bp,
{
struct devlink_health_reporter *reporter;
- reporter = devlink_health_reporter_create(bp->dl, ops, 0, bp);
+ reporter = devlink_health_reporter_create(bp->dl, ops, bp);
if (IS_ERR(reporter)) {
netdev_warn(bp->dev, "Failed to create %s health reporter, rc = %ld\n",
ops->name, PTR_ERR(reporter));
diff --git a/drivers/net/ethernet/huawei/hinic/hinic_devlink.c b/drivers/net/ethernet/huawei/hinic/hinic_devlink.c
index 03e42512a2d5..300bc267a259 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_devlink.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_devlink.c
@@ -443,8 +443,9 @@ int hinic_health_reporters_create(struct hinic_devlink_priv *priv)
struct devlink *devlink = priv_to_devlink(priv);
priv->hw_fault_reporter =
- devlink_health_reporter_create(devlink, &hinic_hw_fault_reporter_ops,
- 0, priv);
+ devlink_health_reporter_create(devlink,
+ &hinic_hw_fault_reporter_ops,
+ priv);
if (IS_ERR(priv->hw_fault_reporter)) {
dev_warn(&priv->hwdev->hwif->pdev->dev, "Failed to create hw fault reporter, err: %ld\n",
PTR_ERR(priv->hw_fault_reporter));
@@ -452,8 +453,9 @@ int hinic_health_reporters_create(struct hinic_devlink_priv *priv)
}
priv->fw_fault_reporter =
- devlink_health_reporter_create(devlink, &hinic_fw_fault_reporter_ops,
- 0, priv);
+ devlink_health_reporter_create(devlink,
+ &hinic_fw_fault_reporter_ops,
+ priv);
if (IS_ERR(priv->fw_fault_reporter)) {
dev_warn(&priv->hwdev->hwif->pdev->dev, "Failed to create fw fault reporter, err: %ld\n",
PTR_ERR(priv->fw_fault_reporter));
diff --git a/drivers/net/ethernet/intel/ice/devlink/health.c b/drivers/net/ethernet/intel/ice/devlink/health.c
index ab519c0f28bf..8e9a8a8178d4 100644
--- a/drivers/net/ethernet/intel/ice/devlink/health.c
+++ b/drivers/net/ethernet/intel/ice/devlink/health.c
@@ -450,9 +450,8 @@ ice_init_devlink_rep(struct ice_pf *pf,
{
struct devlink *devlink = priv_to_devlink(pf);
struct devlink_health_reporter *rep;
- const u64 graceful_period = 0;
- rep = devl_health_reporter_create(devlink, ops, graceful_period, pf);
+ rep = devl_health_reporter_create(devlink, ops, pf);
if (IS_ERR(rep)) {
struct device *dev = ice_pf_to_dev(pf);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index 27c3a2daaaa9..3735372539bd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -505,7 +505,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
rvu_reporters->nix_event_ctx = nix_event_context;
rvu_reporters->rvu_hw_nix_intr_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_intr_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_nix_intr_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_nix_intr_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_nix_intr reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_nix_intr_reporter));
@@ -513,7 +515,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
}
rvu_reporters->rvu_hw_nix_gen_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_gen_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_nix_gen_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_nix_gen_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_nix_gen reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_nix_gen_reporter));
@@ -521,7 +525,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
}
rvu_reporters->rvu_hw_nix_err_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_err_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_nix_err_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_nix_err_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_nix_err reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_nix_err_reporter));
@@ -529,7 +535,9 @@ static int rvu_nix_register_reporters(struct rvu_devlink *rvu_dl)
}
rvu_reporters->rvu_hw_nix_ras_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_nix_ras_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_nix_ras_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_nix_ras_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_nix_ras reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_nix_ras_reporter));
@@ -1051,7 +1059,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
rvu_reporters->npa_event_ctx = npa_event_context;
rvu_reporters->rvu_hw_npa_intr_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_intr_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_npa_intr_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_npa_intr_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_npa_intr reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_npa_intr_reporter));
@@ -1059,7 +1069,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
}
rvu_reporters->rvu_hw_npa_gen_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_gen_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_npa_gen_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_npa_gen_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_npa_gen reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_npa_gen_reporter));
@@ -1067,7 +1079,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
}
rvu_reporters->rvu_hw_npa_err_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_err_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_npa_err_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_npa_err_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_npa_err reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_npa_err_reporter));
@@ -1075,7 +1089,9 @@ static int rvu_npa_register_reporters(struct rvu_devlink *rvu_dl)
}
rvu_reporters->rvu_hw_npa_ras_reporter =
- devlink_health_reporter_create(rvu_dl->dl, &rvu_hw_npa_ras_reporter_ops, 0, rvu);
+ devlink_health_reporter_create(rvu_dl->dl,
+ &rvu_hw_npa_ras_reporter_ops,
+ rvu);
if (IS_ERR(rvu_reporters->rvu_hw_npa_ras_reporter)) {
dev_warn(rvu->dev, "Failed to create hw_npa_ras reporter, err=%ld\n",
PTR_ERR(rvu_reporters->rvu_hw_npa_ras_reporter));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
index 32bb769f1829..73f5b62b8c7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/reporter_vnic.c
@@ -135,7 +135,7 @@ void mlx5_reporter_vnic_create(struct mlx5_core_dev *dev)
health->vnic_reporter =
devlink_health_reporter_create(devlink,
&mlx5_reporter_vnic_ops,
- 0, dev);
+ dev);
if (IS_ERR(health->vnic_reporter))
mlx5_core_warn(dev,
"Failed to create vnic reporter, err = %ld\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 16c44d628eda..1b9ea72abc5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -651,22 +651,24 @@ void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c)
mutex_unlock(&c->icosq_recovery_lock);
}
+#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
+
static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
.name = "rx",
.recover = mlx5e_rx_reporter_recover,
.diagnose = mlx5e_rx_reporter_diagnose,
.dump = mlx5e_rx_reporter_dump,
+ .default_graceful_period = MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
};
-#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
-
void mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
{
+ struct devlink_port *port = priv->netdev->devlink_port;
struct devlink_health_reporter *reporter;
- reporter = devlink_port_health_reporter_create(priv->netdev->devlink_port,
+ reporter = devlink_port_health_reporter_create(port,
&mlx5_rx_reporter_ops,
- MLX5E_REPORTER_RX_GRACEFUL_PERIOD, priv);
+ priv);
if (IS_ERR(reporter)) {
netdev_warn(priv->netdev, "Failed to create rx reporter, err = %ld\n",
PTR_ERR(reporter));
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 85d5cb39b107..7a4a77f6fe6a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -539,22 +539,24 @@ void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq)
mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
}
+#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+
static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
.name = "tx",
.recover = mlx5e_tx_reporter_recover,
.diagnose = mlx5e_tx_reporter_diagnose,
.dump = mlx5e_tx_reporter_dump,
+ .default_graceful_period = MLX5_REPORTER_TX_GRACEFUL_PERIOD,
};
-#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
-
void mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
{
+ struct devlink_port *port = priv->netdev->devlink_port;
struct devlink_health_reporter *reporter;
- reporter = devlink_port_health_reporter_create(priv->netdev->devlink_port,
+ reporter = devlink_port_health_reporter_create(port,
&mlx5_tx_reporter_ops,
- MLX5_REPORTER_TX_GRACEFUL_PERIOD, priv);
+ priv);
if (IS_ERR(reporter)) {
netdev_warn(priv->netdev,
"Failed to create tx reporter, err = %ld\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 63a7a788fb0d..b231e7855bca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -1447,7 +1447,7 @@ static void mlx5e_rep_vnic_reporter_create(struct mlx5e_priv *priv,
reporter = devl_port_health_reporter_create(dl_port,
&mlx5_rep_vnic_reporter_ops,
- 0, rpriv);
+ rpriv);
if (IS_ERR(reporter)) {
mlx5_core_err(priv->mdev,
"Failed to create representor vnic reporter, err = %ld\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index cf7a1edd0530..b63c5a221eb9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -669,54 +669,61 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work)
}
}
+#define MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD 180000
+#define MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD 60000
+#define MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD 30000
+#define MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD \
+ MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD
+
+static
+const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ecpf_ops = {
+ .name = "fw_fatal",
+ .recover = mlx5_fw_fatal_reporter_recover,
+ .dump = mlx5_fw_fatal_reporter_dump,
+ .default_graceful_period =
+ MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD,
+};
+
static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_pf_ops = {
.name = "fw_fatal",
.recover = mlx5_fw_fatal_reporter_recover,
.dump = mlx5_fw_fatal_reporter_dump,
+ .default_graceful_period = MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD,
};
static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
.name = "fw_fatal",
.recover = mlx5_fw_fatal_reporter_recover,
+ .default_graceful_period =
+ MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD,
};
-#define MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD 180000
-#define MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD 60000
-#define MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD 30000
-#define MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD MLX5_FW_REPORTER_VF_GRACEFUL_PERIOD
-
void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
{
const struct devlink_health_reporter_ops *fw_fatal_ops;
struct mlx5_core_health *health = &dev->priv.health;
const struct devlink_health_reporter_ops *fw_ops;
struct devlink *devlink = priv_to_devlink(dev);
- u64 grace_period;
- fw_fatal_ops = &mlx5_fw_fatal_reporter_pf_ops;
fw_ops = &mlx5_fw_reporter_pf_ops;
if (mlx5_core_is_ecpf(dev)) {
- grace_period = MLX5_FW_REPORTER_ECPF_GRACEFUL_PERIOD;
+ fw_fatal_ops = &mlx5_fw_fatal_reporter_ecpf_ops;
} else if (mlx5_core_is_pf(dev)) {
- grace_period = MLX5_FW_REPORTER_PF_GRACEFUL_PERIOD;
+ fw_fatal_ops = &mlx5_fw_fatal_reporter_pf_ops;
} else {
/* VF or SF */
- grace_period = MLX5_FW_REPORTER_DEFAULT_GRACEFUL_PERIOD;
fw_fatal_ops = &mlx5_fw_fatal_reporter_ops;
fw_ops = &mlx5_fw_reporter_ops;
}
- health->fw_reporter =
- devl_health_reporter_create(devlink, fw_ops, 0, dev);
+ health->fw_reporter = devl_health_reporter_create(devlink, fw_ops, dev);
if (IS_ERR(health->fw_reporter))
mlx5_core_warn(dev, "Failed to create fw reporter, err = %ld\n",
PTR_ERR(health->fw_reporter));
- health->fw_fatal_reporter =
- devl_health_reporter_create(devlink,
- fw_fatal_ops,
- grace_period,
- dev);
+ health->fw_fatal_reporter = devl_health_reporter_create(devlink,
+ fw_fatal_ops,
+ dev);
if (IS_ERR(health->fw_fatal_reporter))
mlx5_core_warn(dev, "Failed to create fw fatal reporter, err = %ld\n",
PTR_ERR(health->fw_fatal_reporter));
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 2bb2b77351bd..980f3223f124 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2043,7 +2043,7 @@ static int mlxsw_core_health_init(struct mlxsw_core *mlxsw_core)
return 0;
fw_fatal = devl_health_reporter_create(devlink, &mlxsw_core_health_fw_fatal_ops,
- 0, mlxsw_core);
+ mlxsw_core);
if (IS_ERR(fw_fatal)) {
dev_err(mlxsw_core->bus_info->dev, "Failed to create fw fatal reporter");
return PTR_ERR(fw_fatal);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_devlink.c b/drivers/net/ethernet/qlogic/qed/qed_devlink.c
index 1adc7fbb3f2f..94c5689b5abd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_devlink.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_devlink.c
@@ -87,20 +87,21 @@ qed_fw_fatal_reporter_recover(struct devlink_health_reporter *reporter,
return 0;
}
+#define QED_REPORTER_FW_GRACEFUL_PERIOD 0
+
static const struct devlink_health_reporter_ops qed_fw_fatal_reporter_ops = {
.name = "fw_fatal",
.recover = qed_fw_fatal_reporter_recover,
.dump = qed_fw_fatal_reporter_dump,
+ .default_graceful_period = QED_REPORTER_FW_GRACEFUL_PERIOD,
};
-#define QED_REPORTER_FW_GRACEFUL_PERIOD 0
-
void qed_fw_reporters_create(struct devlink *devlink)
{
struct qed_devlink *dl = devlink_priv(devlink);
- dl->fw_reporter = devlink_health_reporter_create(devlink, &qed_fw_fatal_reporter_ops,
- QED_REPORTER_FW_GRACEFUL_PERIOD, dl);
+ dl->fw_reporter = devlink_health_reporter_create(devlink,
+ &qed_fw_fatal_reporter_ops, dl);
if (IS_ERR(dl->fw_reporter)) {
DP_NOTICE(dl->cdev, "Failed to create fw reporter, err = %ld\n",
PTR_ERR(dl->fw_reporter));
diff --git a/drivers/net/netdevsim/health.c b/drivers/net/netdevsim/health.c
index 688f05316b5e..3bd0e7a489c3 100644
--- a/drivers/net/netdevsim/health.c
+++ b/drivers/net/netdevsim/health.c
@@ -183,14 +183,14 @@ int nsim_dev_health_init(struct nsim_dev *nsim_dev, struct devlink *devlink)
health->empty_reporter =
devl_health_reporter_create(devlink,
&nsim_dev_empty_reporter_ops,
- 0, health);
+ health);
if (IS_ERR(health->empty_reporter))
return PTR_ERR(health->empty_reporter);
health->dummy_reporter =
devl_health_reporter_create(devlink,
&nsim_dev_dummy_reporter_ops,
- 0, health);
+ health);
if (IS_ERR(health->dummy_reporter)) {
err = PTR_ERR(health->dummy_reporter);
goto err_empty_reporter_destroy;
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 3119d053bc4d..c7ad7a981b39 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -746,6 +746,8 @@ enum devlink_health_reporter_state {
* if priv_ctx is NULL, run a full dump
* @diagnose: callback to diagnose the current status
* @test: callback to trigger a test event
+ * @default_graceful_period: default min time (in msec)
+ * between recovery attempts
*/
struct devlink_health_reporter_ops {
@@ -760,6 +762,7 @@ struct devlink_health_reporter_ops {
struct netlink_ext_ack *extack);
int (*test)(struct devlink_health_reporter *reporter,
struct netlink_ext_ack *extack);
+ u64 default_graceful_period;
};
/**
@@ -1928,22 +1931,22 @@ void devlink_fmsg_binary_pair_put(struct devlink_fmsg *fmsg, const char *name,
struct devlink_health_reporter *
devl_port_health_reporter_create(struct devlink_port *port,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv);
+ void *priv);
struct devlink_health_reporter *
devlink_port_health_reporter_create(struct devlink_port *port,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv);
+ void *priv);
struct devlink_health_reporter *
devl_health_reporter_create(struct devlink *devlink,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv);
+ void *priv);
struct devlink_health_reporter *
devlink_health_reporter_create(struct devlink *devlink,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv);
+ void *priv);
void
devl_health_reporter_destroy(struct devlink_health_reporter *reporter);
diff --git a/net/devlink/health.c b/net/devlink/health.c
index b3ce8ecbb7fb..ba144b7426fa 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -108,11 +108,11 @@ devlink_port_health_reporter_find_by_name(struct devlink_port *devlink_port,
static struct devlink_health_reporter *
__devlink_health_reporter_create(struct devlink *devlink,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv)
+ void *priv)
{
struct devlink_health_reporter *reporter;
- if (WARN_ON(graceful_period && !ops->recover))
+ if (WARN_ON(ops->default_graceful_period && !ops->recover))
return ERR_PTR(-EINVAL);
reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
@@ -122,7 +122,7 @@ __devlink_health_reporter_create(struct devlink *devlink,
reporter->priv = priv;
reporter->ops = ops;
reporter->devlink = devlink;
- reporter->graceful_period = graceful_period;
+ reporter->graceful_period = ops->default_graceful_period;
reporter->auto_recover = !!ops->recover;
reporter->auto_dump = !!ops->dump;
return reporter;
@@ -134,13 +134,12 @@ __devlink_health_reporter_create(struct devlink *devlink,
*
* @port: devlink_port to which health reports will relate
* @ops: devlink health reporter ops
- * @graceful_period: min time (in msec) between recovery attempts
* @priv: driver priv pointer
*/
struct devlink_health_reporter *
devl_port_health_reporter_create(struct devlink_port *port,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv)
+ void *priv)
{
struct devlink_health_reporter *reporter;
@@ -150,8 +149,7 @@ devl_port_health_reporter_create(struct devlink_port *port,
ops->name))
return ERR_PTR(-EEXIST);
- reporter = __devlink_health_reporter_create(port->devlink, ops,
- graceful_period, priv);
+ reporter = __devlink_health_reporter_create(port->devlink, ops, priv);
if (IS_ERR(reporter))
return reporter;
@@ -164,14 +162,13 @@ EXPORT_SYMBOL_GPL(devl_port_health_reporter_create);
struct devlink_health_reporter *
devlink_port_health_reporter_create(struct devlink_port *port,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv)
+ void *priv)
{
struct devlink_health_reporter *reporter;
struct devlink *devlink = port->devlink;
devl_lock(devlink);
- reporter = devl_port_health_reporter_create(port, ops,
- graceful_period, priv);
+ reporter = devl_port_health_reporter_create(port, ops, priv);
devl_unlock(devlink);
return reporter;
}
@@ -182,13 +179,12 @@ EXPORT_SYMBOL_GPL(devlink_port_health_reporter_create);
*
* @devlink: devlink instance which the health reports will relate
* @ops: devlink health reporter ops
- * @graceful_period: min time (in msec) between recovery attempts
* @priv: driver priv pointer
*/
struct devlink_health_reporter *
devl_health_reporter_create(struct devlink *devlink,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv)
+ void *priv)
{
struct devlink_health_reporter *reporter;
@@ -197,8 +193,7 @@ devl_health_reporter_create(struct devlink *devlink,
if (devlink_health_reporter_find_by_name(devlink, ops->name))
return ERR_PTR(-EEXIST);
- reporter = __devlink_health_reporter_create(devlink, ops,
- graceful_period, priv);
+ reporter = __devlink_health_reporter_create(devlink, ops, priv);
if (IS_ERR(reporter))
return reporter;
@@ -210,13 +205,12 @@ EXPORT_SYMBOL_GPL(devl_health_reporter_create);
struct devlink_health_reporter *
devlink_health_reporter_create(struct devlink *devlink,
const struct devlink_health_reporter_ops *ops,
- u64 graceful_period, void *priv)
+ void *priv)
{
struct devlink_health_reporter *reporter;
devl_lock(devlink);
- reporter = devl_health_reporter_create(devlink, ops,
- graceful_period, priv);
+ reporter = devl_health_reporter_create(devlink, ops, priv);
devl_unlock(devlink);
return reporter;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH net-next V4 2/5] devlink: Move health reporter recovery abort logic to a separate function
2025-08-24 8:43 [PATCH net-next V4 0/5] Expose burst period for devlink health reporter Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 1/5] devlink: Move graceful period parameter to reporter ops Mark Bloch
@ 2025-08-24 8:43 ` Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 3/5] devlink: Introduce burst period for health reporter Mark Bloch
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mark Bloch @ 2025-08-24 8:43 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Tariq Toukan, Leon Romanovsky, Saeed Mahameed, netdev,
linux-kernel, Gal Pressman, Shahar Shitrit, Jiri Pirko,
Dragos Tatulea, Carolina Jubran, Mark Bloch, Jiri Pirko
From: Shahar Shitrit <shshitrit@nvidia.com>
Extract the health reporter recovery abort logic into a separate
function devlink_health_recover_abort().
The function encapsulates the conditions for aborting recovery:
- When auto-recovery is disabled
- When previous error wasn't recovered
- When within the grace period after last recovery
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
net/devlink/health.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/net/devlink/health.c b/net/devlink/health.c
index ba144b7426fa..9d0d4a9face7 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -586,12 +586,33 @@ static int devlink_health_do_dump(struct devlink_health_reporter *reporter,
return err;
}
+static bool
+devlink_health_recover_abort(struct devlink_health_reporter *reporter,
+ enum devlink_health_reporter_state prev_state)
+{
+ unsigned long recover_ts_threshold;
+
+ if (!reporter->auto_recover)
+ return false;
+
+ /* abort if the previous error wasn't recovered */
+ if (prev_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY)
+ return true;
+
+ recover_ts_threshold = reporter->last_recovery_ts +
+ msecs_to_jiffies(reporter->graceful_period);
+ if (reporter->last_recovery_ts && reporter->recovery_count &&
+ time_is_after_jiffies(recover_ts_threshold))
+ return true;
+
+ return false;
+}
+
int devlink_health_report(struct devlink_health_reporter *reporter,
const char *msg, void *priv_ctx)
{
enum devlink_health_reporter_state prev_health_state;
struct devlink *devlink = reporter->devlink;
- unsigned long recover_ts_threshold;
int ret;
/* write a log message of the current error */
@@ -602,13 +623,7 @@ int devlink_health_report(struct devlink_health_reporter *reporter,
reporter->health_state = DEVLINK_HEALTH_REPORTER_STATE_ERROR;
devlink_recover_notify(reporter, DEVLINK_CMD_HEALTH_REPORTER_RECOVER);
- /* abort if the previous error wasn't recovered */
- recover_ts_threshold = reporter->last_recovery_ts +
- msecs_to_jiffies(reporter->graceful_period);
- if (reporter->auto_recover &&
- (prev_health_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY ||
- (reporter->last_recovery_ts && reporter->recovery_count &&
- time_is_after_jiffies(recover_ts_threshold)))) {
+ if (devlink_health_recover_abort(reporter, prev_health_state)) {
trace_devlink_health_recover_aborted(devlink,
reporter->ops->name,
reporter->health_state,
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH net-next V4 3/5] devlink: Introduce burst period for health reporter
2025-08-24 8:43 [PATCH net-next V4 0/5] Expose burst period for devlink health reporter Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 1/5] devlink: Move graceful period parameter to reporter ops Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 2/5] devlink: Move health reporter recovery abort logic to a separate function Mark Bloch
@ 2025-08-24 8:43 ` Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 4/5] devlink: Make health reporter burst period configurable Mark Bloch
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Mark Bloch @ 2025-08-24 8:43 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Tariq Toukan, Leon Romanovsky, Saeed Mahameed, netdev,
linux-kernel, Gal Pressman, Shahar Shitrit, Jiri Pirko,
Mark Bloch, Jiri Pirko
From: Shahar Shitrit <shshitrit@nvidia.com>
Currently, the devlink health reporter starts the grace period
immediately after handling an error, blocking any further recoveries
until it finished.
However, when a single root cause triggers multiple errors in a short
time frame, it is desirable to treat them as a bulk of errors and to
allow their recoveries, avoiding premature blocking of subsequent
related errors, and reducing the risk of inconsistent or incomplete
error handling.
To address this, introduce a configurable burst period for devlink
health reporter. Start this period when the first error is handled,
and allow recovery attempts for reported errors during this window.
Once burst period expires, begin the grace period to block further
recoveries until it concludes.
Timeline summary:
----|--------|------------------------------/----------------------/--
error is error is burst period grace period
reported recovered (recoveries allowed) (recoveries blocked)
For calculating the burst period duration, use the same
last_recovery_ts as the grace period. Update it on recovery only
when the burst period is inactive (either disabled or at the
first error).
This patch implements the framework for the burst period and
effectively sets its value to 0 at reporter creation, so the current
behavior remains unchanged, which ensures backward compatibility.
A downstream patch will make the burst period configurable.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
include/net/devlink.h | 3 +++
net/devlink/health.c | 22 +++++++++++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/include/net/devlink.h b/include/net/devlink.h
index c7ad7a981b39..5f44e702c25c 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -748,6 +748,8 @@ enum devlink_health_reporter_state {
* @test: callback to trigger a test event
* @default_graceful_period: default min time (in msec)
* between recovery attempts
+ * @default_burst_period: default time (in msec) for
+ * error recoveries before starting the grace period
*/
struct devlink_health_reporter_ops {
@@ -763,6 +765,7 @@ struct devlink_health_reporter_ops {
int (*test)(struct devlink_health_reporter *reporter,
struct netlink_ext_ack *extack);
u64 default_graceful_period;
+ u64 default_burst_period;
};
/**
diff --git a/net/devlink/health.c b/net/devlink/health.c
index 9d0d4a9face7..94ab77f77add 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -60,6 +60,7 @@ struct devlink_health_reporter {
struct devlink_port *devlink_port;
struct devlink_fmsg *dump_fmsg;
u64 graceful_period;
+ u64 burst_period;
bool auto_recover;
bool auto_dump;
u8 health_state;
@@ -123,6 +124,7 @@ __devlink_health_reporter_create(struct devlink *devlink,
reporter->ops = ops;
reporter->devlink = devlink;
reporter->graceful_period = ops->default_graceful_period;
+ reporter->burst_period = ops->default_burst_period;
reporter->auto_recover = !!ops->recover;
reporter->auto_dump = !!ops->dump;
return reporter;
@@ -508,11 +510,25 @@ static void devlink_recover_notify(struct devlink_health_reporter *reporter,
devlink_nl_notify_send_desc(devlink, msg, &desc);
}
+static bool
+devlink_health_reporter_in_burst(struct devlink_health_reporter *reporter)
+{
+ unsigned long burst_threshold = reporter->last_recovery_ts +
+ msecs_to_jiffies(reporter->burst_period);
+
+ return time_is_after_jiffies(burst_threshold);
+}
+
void
devlink_health_reporter_recovery_done(struct devlink_health_reporter *reporter)
{
reporter->recovery_count++;
- reporter->last_recovery_ts = jiffies;
+ if (!devlink_health_reporter_in_burst(reporter))
+ /* When burst period is set, last_recovery_ts marks the first
+ * recovery within the burst period, not necessarily the last
+ * one.
+ */
+ reporter->last_recovery_ts = jiffies;
}
EXPORT_SYMBOL_GPL(devlink_health_reporter_recovery_done);
@@ -599,7 +615,11 @@ devlink_health_recover_abort(struct devlink_health_reporter *reporter,
if (prev_state != DEVLINK_HEALTH_REPORTER_STATE_HEALTHY)
return true;
+ if (devlink_health_reporter_in_burst(reporter))
+ return false;
+
recover_ts_threshold = reporter->last_recovery_ts +
+ msecs_to_jiffies(reporter->burst_period) +
msecs_to_jiffies(reporter->graceful_period);
if (reporter->last_recovery_ts && reporter->recovery_count &&
time_is_after_jiffies(recover_ts_threshold))
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH net-next V4 4/5] devlink: Make health reporter burst period configurable
2025-08-24 8:43 [PATCH net-next V4 0/5] Expose burst period for devlink health reporter Mark Bloch
` (2 preceding siblings ...)
2025-08-24 8:43 ` [PATCH net-next V4 3/5] devlink: Introduce burst period for health reporter Mark Bloch
@ 2025-08-24 8:43 ` Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 5/5] net/mlx5e: Set default burst period for TX and RX reporters Mark Bloch
2025-08-27 0:40 ` [PATCH net-next V4 0/5] Expose burst period for devlink health reporter patchwork-bot+netdevbpf
5 siblings, 0 replies; 7+ messages in thread
From: Mark Bloch @ 2025-08-24 8:43 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Tariq Toukan, Leon Romanovsky, Saeed Mahameed, netdev,
linux-kernel, Gal Pressman, Shahar Shitrit, Jiri Pirko,
Dragos Tatulea, Carolina Jubran, Mark Bloch, Donald Hunter,
Jiri Pirko, Jonathan Corbet
From: Shahar Shitrit <shshitrit@nvidia.com>
Enable configuration of the burst period — a time window starting
from the first error recovery, during which the reporter allows
recovery attempts for each reported error.
This feature is helpful when a single underlying issue causes multiple
errors, as it delays the start of the grace period to allow sufficient
time for recovering all related errors. For example, if multiple TX
queues time out simultaneously, a sufficient burst period could allow
all affected TX queues to be recovered within that window. Without this
period, only the first TX queue that reports a timeout will undergo
recovery, while the remaining TX queues will be blocked once the grace
period begins.
Configuration example:
$ devlink health set pci/0000:00:09.0 reporter tx burst_period 500
Configuration example with ynl:
./tools/net/ynl/pyynl/cli.py \
--spec Documentation/netlink/specs/devlink.yaml \
--do health-reporter-set --json '{
"bus-name": "auxiliary",
"dev-name": "mlx5_core.eth.0",
"port-index": 65535,
"health-reporter-name": "tx",
"health-reporter-burst-period": 500
}'
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
Documentation/netlink/specs/devlink.yaml | 7 +++++
.../networking/devlink/devlink-health.rst | 2 +-
include/uapi/linux/devlink.h | 2 ++
net/devlink/health.c | 28 +++++++++++++++++--
net/devlink/netlink_gen.c | 5 ++--
5 files changed, 39 insertions(+), 5 deletions(-)
diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index bb87111d5e16..3db59c965869 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -853,6 +853,10 @@ attribute-sets:
type: nest
multi-attr: true
nested-attributes: dl-rate-tc-bws
+ -
+ name: health-reporter-burst-period
+ type: u64
+ doc: Time (in msec) for recoveries before starting the grace period.
-
name: dl-dev-stats
subset-of: devlink
@@ -1216,6 +1220,8 @@ attribute-sets:
name: health-reporter-dump-ts-ns
-
name: health-reporter-auto-dump
+ -
+ name: health-reporter-burst-period
-
name: dl-attr-stats
@@ -1961,6 +1967,7 @@ operations:
- health-reporter-graceful-period
- health-reporter-auto-recover
- health-reporter-auto-dump
+ - health-reporter-burst-period
-
name: health-reporter-recover
diff --git a/Documentation/networking/devlink/devlink-health.rst b/Documentation/networking/devlink/devlink-health.rst
index e0b8cfed610a..4d10536377ab 100644
--- a/Documentation/networking/devlink/devlink-health.rst
+++ b/Documentation/networking/devlink/devlink-health.rst
@@ -50,7 +50,7 @@ Once an error is reported, devlink health will perform the following actions:
* Auto recovery attempt is being done. Depends on:
- Auto-recovery configuration
- - Grace period vs. time passed since last recover
+ - Grace period (and burst period) vs. time passed since last recover
Devlink formatted message
=========================
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9fcb25a0f447..bcad11a787a5 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -636,6 +636,8 @@ enum devlink_attr {
DEVLINK_ATTR_RATE_TC_BWS, /* nested */
+ DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD, /* u64 */
+
/* Add new attributes above here, update the spec in
* Documentation/netlink/specs/devlink.yaml and re-generate
* net/devlink/netlink_gen.c.
diff --git a/net/devlink/health.c b/net/devlink/health.c
index 94ab77f77add..136a67c36a20 100644
--- a/net/devlink/health.c
+++ b/net/devlink/health.c
@@ -116,6 +116,9 @@ __devlink_health_reporter_create(struct devlink *devlink,
if (WARN_ON(ops->default_graceful_period && !ops->recover))
return ERR_PTR(-EINVAL);
+ if (WARN_ON(ops->default_burst_period && !ops->default_graceful_period))
+ return ERR_PTR(-EINVAL);
+
reporter = kzalloc(sizeof(*reporter), GFP_KERNEL);
if (!reporter)
return ERR_PTR(-ENOMEM);
@@ -293,6 +296,10 @@ devlink_nl_health_reporter_fill(struct sk_buff *msg,
devlink_nl_put_u64(msg, DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD,
reporter->graceful_period))
goto reporter_nest_cancel;
+ if (reporter->ops->recover &&
+ devlink_nl_put_u64(msg, DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD,
+ reporter->burst_period))
+ goto reporter_nest_cancel;
if (reporter->ops->recover &&
nla_put_u8(msg, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
reporter->auto_recover))
@@ -458,16 +465,33 @@ int devlink_nl_health_reporter_set_doit(struct sk_buff *skb,
if (!reporter->ops->recover &&
(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] ||
- info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER]))
+ info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] ||
+ info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD]))
return -EOPNOTSUPP;
if (!reporter->ops->dump &&
info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP])
return -EOPNOTSUPP;
- if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD])
+ if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]) {
reporter->graceful_period =
nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD]);
+ if (!reporter->graceful_period)
+ reporter->burst_period = 0;
+ }
+
+ if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD]) {
+ u64 burst_period =
+ nla_get_u64(info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD]);
+
+ if (!reporter->graceful_period && burst_period) {
+ NL_SET_ERR_MSG_MOD(info->extack,
+ "Cannot set burst period without a grace period.");
+ return -EINVAL;
+ }
+
+ reporter->burst_period = burst_period;
+ }
if (info->attrs[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER])
reporter->auto_recover =
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index d97c326a9045..9fd00977d59e 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -389,7 +389,7 @@ static const struct nla_policy devlink_health_reporter_get_dump_nl_policy[DEVLIN
};
/* DEVLINK_CMD_HEALTH_REPORTER_SET - do */
-static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP + 1] = {
+static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
@@ -397,6 +397,7 @@ static const struct nla_policy devlink_health_reporter_set_nl_policy[DEVLINK_ATT
[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = { .type = NLA_U64, },
[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8, },
[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP] = { .type = NLA_U8, },
+ [DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD] = { .type = NLA_U64, },
};
/* DEVLINK_CMD_HEALTH_REPORTER_RECOVER - do */
@@ -1032,7 +1033,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
.doit = devlink_nl_health_reporter_set_doit,
.post_doit = devlink_nl_post_doit,
.policy = devlink_health_reporter_set_nl_policy,
- .maxattr = DEVLINK_ATTR_HEALTH_REPORTER_AUTO_DUMP,
+ .maxattr = DEVLINK_ATTR_HEALTH_REPORTER_BURST_PERIOD,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
},
{
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH net-next V4 5/5] net/mlx5e: Set default burst period for TX and RX reporters
2025-08-24 8:43 [PATCH net-next V4 0/5] Expose burst period for devlink health reporter Mark Bloch
` (3 preceding siblings ...)
2025-08-24 8:43 ` [PATCH net-next V4 4/5] devlink: Make health reporter burst period configurable Mark Bloch
@ 2025-08-24 8:43 ` Mark Bloch
2025-08-27 0:40 ` [PATCH net-next V4 0/5] Expose burst period for devlink health reporter patchwork-bot+netdevbpf
5 siblings, 0 replies; 7+ messages in thread
From: Mark Bloch @ 2025-08-24 8:43 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Tariq Toukan, Leon Romanovsky, Saeed Mahameed, netdev,
linux-kernel, Gal Pressman, Shahar Shitrit, Dragos Tatulea,
Carolina Jubran, Mark Bloch
From: Shahar Shitrit <shshitrit@nvidia.com>
System errors can sometimes cause multiple errors to be reported
to the TX reporter at the same time. For instance, lost interrupts
may cause several SQs to time out simultaneously. When dev_watchdog
notifies the driver for that, it iterates over all SQs to trigger
recovery for the timed-out ones, via TX health reporter.
However, grace period allows only one recovery at a time, so only
the first SQ recovers while others remain blocked. Since no further
recoveries are allowed during the grace period, subsequent errors
cause the reporter to enter an ERROR state, requiring manual
intervention.
To address this, set the TX reporter's default burst period
to 0.5 second. This allows the reporter to detect and handle all
timed-out SQs within this window before initiating the grace period.
To account for the possibility of a similar issue in the RX reporter,
its default burst period is also configured.
Additionally, while here, align the TX definition prefix with the RX,
as these are used only in EN driver.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c | 2 ++
drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 6 ++++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 1b9ea72abc5a..eb1cace5910c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -652,6 +652,7 @@ void mlx5e_reporter_icosq_resume_recovery(struct mlx5e_channel *c)
}
#define MLX5E_REPORTER_RX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_RX_BURST_PERIOD 500
static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
.name = "rx",
@@ -659,6 +660,7 @@ static const struct devlink_health_reporter_ops mlx5_rx_reporter_ops = {
.diagnose = mlx5e_rx_reporter_diagnose,
.dump = mlx5e_rx_reporter_dump,
.default_graceful_period = MLX5E_REPORTER_RX_GRACEFUL_PERIOD,
+ .default_burst_period = MLX5E_REPORTER_RX_BURST_PERIOD,
};
void mlx5e_reporter_rx_create(struct mlx5e_priv *priv)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 7a4a77f6fe6a..5a4fe8403a21 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -539,14 +539,16 @@ void mlx5e_reporter_tx_ptpsq_unhealthy(struct mlx5e_ptpsq *ptpsq)
mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
}
-#define MLX5_REPORTER_TX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_TX_GRACEFUL_PERIOD 500
+#define MLX5E_REPORTER_TX_BURST_PERIOD 500
static const struct devlink_health_reporter_ops mlx5_tx_reporter_ops = {
.name = "tx",
.recover = mlx5e_tx_reporter_recover,
.diagnose = mlx5e_tx_reporter_diagnose,
.dump = mlx5e_tx_reporter_dump,
- .default_graceful_period = MLX5_REPORTER_TX_GRACEFUL_PERIOD,
+ .default_graceful_period = MLX5E_REPORTER_TX_GRACEFUL_PERIOD,
+ .default_burst_period = MLX5E_REPORTER_TX_BURST_PERIOD,
};
void mlx5e_reporter_tx_create(struct mlx5e_priv *priv)
--
2.34.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net-next V4 0/5] Expose burst period for devlink health reporter
2025-08-24 8:43 [PATCH net-next V4 0/5] Expose burst period for devlink health reporter Mark Bloch
` (4 preceding siblings ...)
2025-08-24 8:43 ` [PATCH net-next V4 5/5] net/mlx5e: Set default burst period for TX and RX reporters Mark Bloch
@ 2025-08-27 0:40 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-08-27 0:40 UTC (permalink / raw)
To: Mark Bloch
Cc: edumazet, kuba, pabeni, andrew+netdev, davem, tariqt, leon,
saeedm, netdev, linux-kernel, gal
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sun, 24 Aug 2025 11:43:49 +0300 you wrote:
> Hi,
>
> This series by Shahar implements burst period in devlink health
> reporter, and use it in mlx5e driver.
>
> This is V4. Find previous versions here:
> v3: https://lore.kernel.org/all/1755111349-416632-1-git-send-email-tariqt@nvidia.com/
> v2: https://lore.kernel.org/all/1753390134-345154-1-git-send-email-tariqt@nvidia.com/
> v1: https://lore.kernel.org/all/1752768442-264413-1-git-send-email-tariqt@nvidia.com/
>
> [...]
Here is the summary with links:
- [net-next,V4,1/5] devlink: Move graceful period parameter to reporter ops
https://git.kernel.org/netdev/net-next/c/d2b007374551
- [net-next,V4,2/5] devlink: Move health reporter recovery abort logic to a separate function
https://git.kernel.org/netdev/net-next/c/20597fb9436e
- [net-next,V4,3/5] devlink: Introduce burst period for health reporter
https://git.kernel.org/netdev/net-next/c/6a06d8c40510
- [net-next,V4,4/5] devlink: Make health reporter burst period configurable
https://git.kernel.org/netdev/net-next/c/da0e2197645c
- [net-next,V4,5/5] net/mlx5e: Set default burst period for TX and RX reporters
https://git.kernel.org/netdev/net-next/c/2d5ccb93bbb4
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-08-27 0:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-24 8:43 [PATCH net-next V4 0/5] Expose burst period for devlink health reporter Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 1/5] devlink: Move graceful period parameter to reporter ops Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 2/5] devlink: Move health reporter recovery abort logic to a separate function Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 3/5] devlink: Introduce burst period for health reporter Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 4/5] devlink: Make health reporter burst period configurable Mark Bloch
2025-08-24 8:43 ` [PATCH net-next V4 5/5] net/mlx5e: Set default burst period for TX and RX reporters Mark Bloch
2025-08-27 0:40 ` [PATCH net-next V4 0/5] Expose burst period for devlink health reporter patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).