From: Saeed Mahameed <saeedm@mellanox.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Huy Nguyen <huyn@mellanox.com>,
Daniel Jurgens <danielj@mellanox.com>,
Saeed Mahameed <saeedm@mellanox.com>
Subject: [net 2/4] net/mlx5: No command allowed when command interface is not ready
Date: Wed, 13 Feb 2019 15:44:49 -0800 [thread overview]
Message-ID: <20190213234451.27029-3-saeedm@mellanox.com> (raw)
In-Reply-To: <20190213234451.27029-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.
Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.
Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 18 ++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/health.c | 2 +-
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 1 +
3 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 3e0fa8a8077b..e267ff93e8a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1583,6 +1583,24 @@ void mlx5_cmd_trigger_completions(struct mlx5_core_dev *dev)
spin_unlock_irqrestore(&dev->cmd.alloc_lock, flags);
}
+void mlx5_cmd_flush(struct mlx5_core_dev *dev)
+{
+ struct mlx5_cmd *cmd = &dev->cmd;
+ int i;
+
+ for (i = 0; i < cmd->max_reg_cmds; i++)
+ while (down_trylock(&cmd->sem))
+ mlx5_cmd_trigger_completions(dev);
+
+ while (down_trylock(&cmd->pages_sem))
+ mlx5_cmd_trigger_completions(dev);
+
+ /* Unlock cmdif */
+ up(&cmd->pages_sem);
+ for (i = 0; i < cmd->max_reg_cmds; i++)
+ up(&cmd->sem);
+}
+
static int status_to_err(u8 status)
{
return status ? -1 : 0; /* TBD more meaningful codes */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 196c07383082..cb9fa3430c53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -103,7 +103,7 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, bool force)
mlx5_core_err(dev, "start\n");
if (pci_channel_offline(dev->pdev) || in_fatal(dev) || force) {
dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
- mlx5_cmd_trigger_completions(dev);
+ mlx5_cmd_flush(dev);
}
mlx5_notifier_call_chain(dev->priv.events, MLX5_DEV_EVENT_SYS_ERROR, (void *)1);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 5300b0b6d836..4fdac020b795 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -126,6 +126,7 @@ u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev,
struct ptp_system_timestamp *sts);
void mlx5_cmd_trigger_completions(struct mlx5_core_dev *dev);
+void mlx5_cmd_flush(struct mlx5_core_dev *dev);
int mlx5_cq_debugfs_init(struct mlx5_core_dev *dev);
void mlx5_cq_debugfs_cleanup(struct mlx5_core_dev *dev);
--
2.20.1
next prev parent reply other threads:[~2019-02-13 23:45 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-13 23:44 [pull request][net 0/4] Mellanox, mlx5 fixes 2019-02-13 Saeed Mahameed
2019-02-13 23:44 ` [net 1/4] net/mlx5e: Fix NULL pointer derefernce in set channels error flow Saeed Mahameed
2019-02-13 23:44 ` Saeed Mahameed [this message]
2019-02-13 23:44 ` [net 3/4] net/mlx5: Fix a compilation warning in events.c Saeed Mahameed
2019-02-13 23:44 ` [net 4/4] net/mlx5e: XDP, fix redirect resources availability check Saeed Mahameed
2019-02-14 0:10 ` [pull request][net 0/4] Mellanox, mlx5 fixes 2019-02-13 David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190213234451.27029-3-saeedm@mellanox.com \
--to=saeedm@mellanox.com \
--cc=danielj@mellanox.com \
--cc=davem@davemloft.net \
--cc=huyn@mellanox.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox