From: Saeed Mahameed <saeedm@nvidia.com>
To: "David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, Moshe Shemesh <moshe@nvidia.com>,
Maher Sanalla <msanalla@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>
Subject: [net 13/15] net/mlx5: Fix deadlock in sync reset flow
Date: Wed, 4 May 2022 00:02:54 -0700 [thread overview]
Message-ID: <20220504070256.694458-14-saeedm@nvidia.com> (raw)
In-Reply-To: <20220504070256.694458-1-saeedm@nvidia.com>
From: Moshe Shemesh <moshe@nvidia.com>
The sync reset flow can lead to the following deadlock when
poll_sync_reset() is called by timer softirq and waiting on
del_timer_sync() for the same timer. Fix that by moving the part of the
flow that waits for the timer to reset_reload_work.
It fixes the following kernel Trace:
RIP: 0010:del_timer_sync+0x32/0x40
...
Call Trace:
<IRQ>
mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
poll_sync_reset.cold+0x36/0x52 [mlx5_core]
call_timer_fn+0x32/0x130
__run_timers.part.0+0x180/0x280
? tick_sched_handle+0x33/0x60
? tick_sched_timer+0x3d/0x80
? ktime_get+0x3e/0xa0
run_timer_softirq+0x2a/0x50
__do_softirq+0xe1/0x2d6
? hrtimer_interrupt+0x136/0x220
irq_exit+0xae/0xb0
smp_apic_timer_interrupt+0x7b/0x140
apic_timer_interrupt+0xf/0x20
</IRQ>
Fixes: 3c5193a87b0f ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
.../ethernet/mellanox/mlx5/core/fw_reset.c | 34 +++++++++----------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 4aa22dce9b77..ec18d4ccbc11 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -155,22 +155,6 @@ static void mlx5_fw_reset_complete_reload(struct mlx5_core_dev *dev)
}
}
-static void mlx5_sync_reset_reload_work(struct work_struct *work)
-{
- struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
- reset_reload_work);
- struct mlx5_core_dev *dev = fw_reset->dev;
- int err;
-
- mlx5_enter_error_state(dev, true);
- mlx5_unload_one(dev);
- err = mlx5_health_wait_pci_up(dev);
- if (err)
- mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n");
- fw_reset->ret = err;
- mlx5_fw_reset_complete_reload(dev);
-}
-
static void mlx5_stop_sync_reset_poll(struct mlx5_core_dev *dev)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
@@ -188,6 +172,23 @@ static void mlx5_sync_reset_clear_reset_requested(struct mlx5_core_dev *dev, boo
mlx5_start_health_poll(dev);
}
+static void mlx5_sync_reset_reload_work(struct work_struct *work)
+{
+ struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
+ reset_reload_work);
+ struct mlx5_core_dev *dev = fw_reset->dev;
+ int err;
+
+ mlx5_sync_reset_clear_reset_requested(dev, false);
+ mlx5_enter_error_state(dev, true);
+ mlx5_unload_one(dev);
+ err = mlx5_health_wait_pci_up(dev);
+ if (err)
+ mlx5_core_err(dev, "reset reload flow aborted, PCI reads still not working\n");
+ fw_reset->ret = err;
+ mlx5_fw_reset_complete_reload(dev);
+}
+
#define MLX5_RESET_POLL_INTERVAL (HZ / 10)
static void poll_sync_reset(struct timer_list *t)
{
@@ -202,7 +203,6 @@ static void poll_sync_reset(struct timer_list *t)
if (fatal_error) {
mlx5_core_warn(dev, "Got Device Reset\n");
- mlx5_sync_reset_clear_reset_requested(dev, false);
queue_work(fw_reset->wq, &fw_reset->reset_reload_work);
return;
}
--
2.35.1
next prev parent reply other threads:[~2022-05-04 7:04 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-04 7:02 [pull request][net 00/15] mlx5 fixes 2022-05-03 Saeed Mahameed
2022-05-04 7:02 ` [net 01/15] net/mlx5e: Fix wrong source vport matching on tunnel rule Saeed Mahameed
2022-05-04 10:10 ` patchwork-bot+netdevbpf
2022-05-04 7:02 ` [net 02/15] net/mlx5: Fix slab-out-of-bounds while reading resource dump menu Saeed Mahameed
2022-05-04 7:02 ` [net 03/15] net/mlx5e: Don't match double-vlan packets if cvlan is not set Saeed Mahameed
2022-05-04 7:02 ` [net 04/15] net/mlx5e: Fix the calling of update_buffer_lossy() API Saeed Mahameed
2022-05-04 7:02 ` [net 05/15] net/mlx5e: Lag, Fix use-after-free in fib event handler Saeed Mahameed
2022-05-04 7:02 ` [net 06/15] net/mlx5e: Lag, Fix fib_info pointer assignment Saeed Mahameed
2022-05-04 7:02 ` [net 07/15] net/mlx5e: Lag, Don't skip fib events on current dst Saeed Mahameed
2022-05-04 7:02 ` [net 08/15] net/mlx5e: TC, Fix ct_clear overwriting ct action metadata Saeed Mahameed
2022-05-04 7:02 ` [net 09/15] net/mlx5e: TC, fix decap fallback to uplink when int port not supported Saeed Mahameed
2022-05-04 7:02 ` [net 10/15] net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release Saeed Mahameed
2022-05-04 7:02 ` [net 11/15] net/mlx5e: Avoid checking offload capability in post_parse action Saeed Mahameed
2022-05-04 7:02 ` [net 12/15] net/mlx5e: Fix trust state reset in reload Saeed Mahameed
2022-05-04 7:02 ` Saeed Mahameed [this message]
2022-05-04 7:02 ` [net 14/15] net/mlx5: Avoid double clear or set of sync reset requested Saeed Mahameed
2022-05-04 7:02 ` [net 15/15] net/mlx5: Fix matching on inner TTC Saeed Mahameed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220504070256.694458-14-saeedm@nvidia.com \
--to=saeedm@nvidia.com \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=moshe@nvidia.com \
--cc=msanalla@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.