stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Huy Nguyen <huyn@mellanox.com>,
	Moshe Shemesh <moshe@mellanox.com>,
	Saeed Mahameed <saeedm@mellanox.com>
Subject: [PATCH 4.13 15/35] net/mlx5: Cancel health poll before sending panic teardown command
Date: Wed, 22 Nov 2017 11:12:09 +0100	[thread overview]
Message-ID: <20171122101138.912550892@linuxfoundation.org> (raw)
In-Reply-To: <20171122101137.661212603@linuxfoundation.org>

4.13-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Huy Nguyen <huyn@mellanox.com>


[ Upstream commit d2aa060d40fa060e963f9a356d43481e43ba3dac ]

After the panic teardown firmware command, health_care detects the error
in PCI bus and calls the mlx5_pci_err_detected. This health_care flow is
no longer needed because the panic teardown firmware command will bring
down the PCI bus communication with the HCA.

The solution is to cancel the health care timer and its pending
workqueue request before sending panic teardown firmware command.

Kernel trace:
mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80

Fixes: 8812c24d28f4 ('net/mlx5: Add fast unload support in shutdown flow')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1545,9 +1545,16 @@ static int mlx5_try_fast_unload(struct m
 		return -EAGAIN;
 	}
 
+	/* Panic tear down fw command will stop the PCI bus communication
+	 * with the HCA, so the health polll is no longer needed.
+	 */
+	mlx5_drain_health_wq(dev);
+	mlx5_stop_health_poll(dev);
+
 	ret = mlx5_cmd_force_teardown_hca(dev);
 	if (ret) {
 		mlx5_core_dbg(dev, "Firmware couldn't do fast unload error: %d\n", ret);
+		mlx5_start_health_poll(dev);
 		return ret;
 	}
 

  parent reply	other threads:[~2017-11-22 10:16 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-22 10:11 [PATCH 4.13 00/35] 4.13.16-stable review Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.13 01/35] tcp_nv: fix division by zero in tcpnv_acked() Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.13 02/35] net: vrf: correct FRA_L3MDEV encode type Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.13 03/35] tcp: do not mangle skb->cb[] in tcp_make_synack() Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.13 04/35] net: systemport: Correct IPG length settings Greg Kroah-Hartman
2017-11-22 10:11 ` [PATCH 4.13 05/35] netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 06/35] l2tp: dont use l2tp_tunnel_find() in l2tp_ip and l2tp_ip6 Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 07/35] bonding: discard lowest hash bit for 802.3ad layer3+4 Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 11/35] net: usb: asix: fill null-ptr-deref in asix_suspend Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 12/35] tcp: gso: avoid refcount_t warning from tcp_gso_segment() Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 13/35] tcp: fix tcp_fastretrans_alert warning Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 14/35] vlan: fix a use-after-free in vlan_device_event() Greg Kroah-Hartman
2017-11-22 10:12 ` Greg Kroah-Hartman [this message]
2017-11-22 10:12 ` [PATCH 4.13 16/35] net/mlx5e: Set page to null in case dma mapping fails Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 17/35] af_netlink: ensure that NLMSG_DONE never fails in dumps Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 18/35] vxlan: fix the issue that neigh proxy blocks all icmpv6 packets Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 20/35] sctp: do not peel off an assoc from one netns to another one Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 21/35] fealnx: Fix building error on MIPS Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 22/35] net/sctp: Always set scope_id in sctp_inet6_skb_msgname Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 23/35] ima: do not update security.ima if appraisal status is not INTEGRITY_PASS Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 24/35] serial: omap: Fix EFR write on RTS deassertion Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 25/35] serial: 8250_fintek: Fix finding base_port with activated SuperIO Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 26/35] tpm-dev-common: Reject too short writes Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 27/35] rcu: Fix up pending cbs check in rcu_prepare_for_idle Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 28/35] mm/pagewalk.c: report holes in hugetlb ranges Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 29/35] ocfs2: fix cluster hang after a node dies Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 30/35] ocfs2: should wait dio before inode lock in ocfs2_setattr() Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 31/35] ipmi: fix unsigned long underflow Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 32/35] mm/page_alloc.c: broken deferred calculation Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 33/35] mm/page_ext.c: check if page_ext is not prepared Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 34/35] x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask Greg Kroah-Hartman
2017-11-22 10:12 ` [PATCH 4.13 35/35] coda: fix kernel memory exposure attempt in fsync Greg Kroah-Hartman
2017-11-22 16:49 ` [PATCH 4.13 00/35] 4.13.16-stable review Greg Kroah-Hartman
2017-11-22 21:33 ` Guenter Roeck
2017-11-23 14:48 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171122101138.912550892@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=huyn@mellanox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=moshe@mellanox.com \
    --cc=saeedm@mellanox.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).