From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, Gal Pressman <gal@nvidia.com>,
Moshe Shemesh <moshe@nvidia.com>, Shay Drori <shayd@nvidia.com>
Subject: [PATCH net 3/3] net/mlx5: fw reset, add reset timeout work
Date: Mon, 29 Sep 2025 00:02:09 +0300 [thread overview]
Message-ID: <1759093329-840612-4-git-send-email-tariqt@nvidia.com> (raw)
In-Reply-To: <1759093329-840612-1-git-send-email-tariqt@nvidia.com>
From: Moshe Shemesh <moshe@nvidia.com>
Add sync reset timeout to stop poll_sync_reset in case there was no
reset done or abort event within timeout. Otherwise poll sync reset will
just continue and in case of fw fatal error no health reporting will be
done.
Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../ethernet/mellanox/mlx5/core/fw_reset.c | 24 +++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 22995131824a..89e399606877 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -27,6 +27,7 @@ struct mlx5_fw_reset {
struct work_struct reset_reload_work;
struct work_struct reset_now_work;
struct work_struct reset_abort_work;
+ struct delayed_work reset_timeout_work;
unsigned long reset_flags;
u8 reset_method;
struct timer_list timer;
@@ -259,6 +260,8 @@ static int mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, bool
return -EALREADY;
}
+ if (current_work() != &fw_reset->reset_timeout_work.work)
+ cancel_delayed_work(&fw_reset->reset_timeout_work);
mlx5_stop_sync_reset_poll(dev);
if (poll_health)
mlx5_start_health_poll(dev);
@@ -330,6 +333,11 @@ static int mlx5_sync_reset_set_reset_requested(struct mlx5_core_dev *dev)
}
mlx5_stop_health_poll(dev, true);
mlx5_start_sync_reset_poll(dev);
+
+ if (!test_bit(MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS,
+ &fw_reset->reset_flags))
+ schedule_delayed_work(&fw_reset->reset_timeout_work,
+ msecs_to_jiffies(mlx5_tout_ms(dev, PCI_SYNC_UPDATE)));
return 0;
}
@@ -739,6 +747,19 @@ static void mlx5_sync_reset_events_handle(struct mlx5_fw_reset *fw_reset, struct
}
}
+static void mlx5_sync_reset_timeout_work(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct mlx5_fw_reset *fw_reset =
+ container_of(dwork, struct mlx5_fw_reset, reset_timeout_work);
+ struct mlx5_core_dev *dev = fw_reset->dev;
+
+ if (mlx5_sync_reset_clear_reset_requested(dev, true))
+ return;
+ mlx5_core_warn(dev, "PCI Sync FW Update Reset Timeout.\n");
+}
+
static int fw_reset_event_notifier(struct notifier_block *nb, unsigned long action, void *data)
{
struct mlx5_fw_reset *fw_reset = mlx5_nb_cof(nb, struct mlx5_fw_reset, nb);
@@ -822,6 +843,7 @@ void mlx5_drain_fw_reset(struct mlx5_core_dev *dev)
cancel_work_sync(&fw_reset->reset_reload_work);
cancel_work_sync(&fw_reset->reset_now_work);
cancel_work_sync(&fw_reset->reset_abort_work);
+ cancel_delayed_work(&fw_reset->reset_timeout_work);
}
static const struct devlink_param mlx5_fw_reset_devlink_params[] = {
@@ -865,6 +887,8 @@ int mlx5_fw_reset_init(struct mlx5_core_dev *dev)
INIT_WORK(&fw_reset->reset_reload_work, mlx5_sync_reset_reload_work);
INIT_WORK(&fw_reset->reset_now_work, mlx5_sync_reset_now_event);
INIT_WORK(&fw_reset->reset_abort_work, mlx5_sync_reset_abort_event);
+ INIT_DELAYED_WORK(&fw_reset->reset_timeout_work,
+ mlx5_sync_reset_timeout_work);
init_completion(&fw_reset->done);
return 0;
--
2.31.1
next prev parent reply other threads:[~2025-09-28 21:02 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-28 21:02 [PATCH net 0/3] mlx5 misc fixes 2025-09-28 Tariq Toukan
2025-09-28 21:02 ` [PATCH net 1/3] net/mlx5: Stop polling for command response if interface goes down Tariq Toukan
2025-09-28 21:02 ` [PATCH net 2/3] net/mlx5: pagealloc: Fix reclaim race during command interface teardown Tariq Toukan
2025-09-28 21:02 ` Tariq Toukan [this message]
2025-09-30 2:00 ` [PATCH net 0/3] mlx5 misc fixes 2025-09-28 patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1759093329-840612-4-git-send-email-tariqt@nvidia.com \
--to=tariqt@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gal@nvidia.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mbloch@nvidia.com \
--cc=moshe@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=shayd@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).