From: clsoto@linux.vnet.ibm.com
To: eli@mellanox.com, roland@kernel.org, sean.hefty@intel.com,
hal.rosenstock@gmail.com, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org
Cc: brking@linux.vnet.ibm.com, Carol Soto <clsoto@linux.vnet.ibm.com>
Subject: [Patch 1/2] IB/mlx5: Implementation of PCI error handler
Date: Tue, 11 Mar 2014 22:42:20 -0500
Message-ID: <20140312034512.065218504@linux.vnet.ibm.com>
In-Reply-To: 20140312034219.637916521@linux.vnet.ibm.com
[-- Attachment #1: ib_mlx5_add_pci_error.patch --]
[-- Type: text/plain, Size: 3396 bytes --]
This patch adds PCI error handler support to mlx5. It implements the
error_detected and slot_reset callbacks and sends a port down event
to users when the driver's error_detected function is invoked. The
event prevents a hang seen in mcast_remove_one when
ib_unregister_device is called for the ib_sa module. Hardware
commands now fail with -EIO while the driver is handling a PCI error.
The hardware command timeout is also reduced to 10 seconds
(10 * 1000 msecs) so that a command does not hang waiting for its
completion interrupt.
Signed-off-by: Carol Soto <clsoto@linux.vnet.ibm.com>
---
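For reviewers less familiar with the PCI error recovery callbacks, the
return-value logic of the two handlers added below can be modeled in
userspace roughly as follows. This is only a sketch: the enums are local
stand-ins for pci_channel_state_t and pci_ers_result_t (the numeric values
are not the kernel's), and model_err_detected(), model_slot_reset() and
model_self_check() are hypothetical names, not driver functions.

```c
#include <assert.h>

/* Local stand-ins for the kernel's pci_channel_state_t values. */
enum chan_state {
	CHAN_IO_NORMAL,
	CHAN_IO_FROZEN,
	CHAN_IO_PERM_FAILURE,
};

/* Local stand-ins for the pci_ers_result_t values used by the patch. */
enum ers_result {
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* Mirrors the return expression of mlx5_pci_err_detected(). */
enum ers_result model_err_detected(enum chan_state state)
{
	return state == CHAN_IO_PERM_FAILURE ? ERS_DISCONNECT : ERS_NEED_RESET;
}

/* Mirrors mlx5_pci_slot_reset(): init_ret is what init_one() returned. */
enum ers_result model_slot_reset(int init_ret)
{
	return init_ret ? ERS_DISCONNECT : ERS_RECOVERED;
}

/* Walks the recoverable path: frozen channel, then a successful re-init. */
void model_self_check(void)
{
	assert(model_err_detected(CHAN_IO_FROZEN) == ERS_NEED_RESET);
	assert(model_slot_reset(0) == ERS_RECOVERED);
}
```

In this model a permanent failure is the only state that asks for a
disconnect up front; any other state requests a slot reset, and recovery
then succeeds only if the re-initialization returns 0.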
drivers/infiniband/hw/mlx5/main.c | 32 +++++++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 7 +++++
include/linux/mlx5/driver.h | 4 +--
3 files changed, 40 insertions(+), 3 deletions(-)
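The cmd.c and driver.h changes can be illustrated with a small userspace
model, given here as a sketch under stated assumptions rather than driver
code. struct model_dev, model_cmd_invoke() and model_timeout_seconds() are
hypothetical names; the only things taken from the patch are the -EIO
return while the channel is offline and the expression 10 * 1000 for the
timeout in milliseconds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same expression as the new MLX5_CMD_TIMEOUT_MSEC value in the patch. */
enum { MODEL_CMD_TIMEOUT_MSEC = 10 * 1000 };

/* Hypothetical device model: only tracks whether the channel is offline. */
struct model_dev {
	bool channel_offline;
};

/* Fails fast with -EIO while error recovery is in progress. */
int model_cmd_invoke(const struct model_dev *dev)
{
	if (dev->channel_offline)
		return -EIO;

	/* The real driver would allocate and queue the command here. */
	return 0;
}

/* Converts the timeout to whole seconds to make the unit explicit. */
int model_timeout_seconds(void)
{
	return MODEL_CMD_TIMEOUT_MSEC / 1000;
}

/* A command issued during recovery must be rejected, not queued. */
void model_cmd_self_check(void)
{
	struct model_dev dev = { .channel_offline = true };

	assert(model_cmd_invoke(&dev) == -EIO);
}
```

Because the constant is expressed in milliseconds, 10 * 1000 works out to
10 seconds, and the early return means no command waits on that timeout
while the channel is offline.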
Index: b/drivers/infiniband/hw/mlx5/main.c
===================================================================
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1508,11 +1508,41 @@ static DEFINE_PCI_DEVICE_TABLE(mlx5_ib_p
MODULE_DEVICE_TABLE(pci, mlx5_ib_pci_table);
+static pci_ers_result_t mlx5_pci_err_detected(struct pci_dev *pdev,
+					      pci_channel_state_t state)
+{
+	struct mlx5_ib_dev *dev = mlx5_pci2ibdev(pdev);
+	struct mlx5_core_dev *mdev = &dev->mdev;
+	u8 port;
+
+	/* To avoid the mcast hang with ipoib up */
+	for (port = 1; port <= dev->mdev.caps.num_ports; port++)
+		mlx5_ib_event(mdev, MLX5_DEV_EVENT_PORT_DOWN, &port);
+
+	remove_one(pdev);
+
+	return state == pci_channel_io_perm_failure ?
+		PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t mlx5_pci_slot_reset(struct pci_dev *pdev)
+{
+	int ret = init_one(pdev, 0);
+
+	return ret ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED;
+}
+
+static const struct pci_error_handlers mlx5_err_handler = {
+	.error_detected = mlx5_pci_err_detected,
+	.slot_reset = mlx5_pci_slot_reset,
+};
+
 static struct pci_driver mlx5_ib_driver = {
 	.name = DRIVER_NAME,
 	.id_table = mlx5_ib_pci_table,
 	.probe = init_one,
-	.remove = remove_one
+	.remove = remove_one,
+	.err_handler = &mlx5_err_handler,
 };
static int __init mlx5_ib_init(void)
Index: b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
===================================================================
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -646,6 +646,13 @@ static int mlx5_cmd_invoke(struct mlx5_c
 	if (callback && page_queue)
 		return -EINVAL;
+	if (pci_channel_offline(dev->pdev)) {
+		/* Device is going through error recovery
+		 * and cannot accept commands.
+		 */
+		return -EIO;
+	}
+
 	ent = alloc_cmd(cmd, in, out, uout, uout_size, callback, context,
 			page_queue);
 	if (IS_ERR(ent))
Index: b/include/linux/mlx5/driver.h
===================================================================
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -51,10 +51,10 @@ enum {
};
 enum {
-	/* one minute for the sake of bringup. Generally, commands must always
+	/* 10 seconds for the sake of bringup. Generally, commands must always
 	 * complete and we may need to increase this timeout value
 	 */
-	MLX5_CMD_TIMEOUT_MSEC = 7200 * 1000,
+	MLX5_CMD_TIMEOUT_MSEC = 10 * 1000,
 	MLX5_CMD_WQ_MAX_NAME = 32,
 };
--