From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Jake Lawrence <lawja@fb.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Saeed Mahameed <saeedm@mellanox.com>,
	"David S. Miller" <davem@davemloft.net>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.14 32/51] mlx4: disable device on shutdown
Date: Mon,  3 Aug 2020 14:20:17 +0200
Message-ID: <20200803121851.092119670@linuxfoundation.org>
In-Reply-To: <20200803121849.488233135@linuxfoundation.org>

From: Jakub Kicinski <kuba@kernel.org>

[ Upstream commit 3cab8c65525920f00d8f4997b3e9bb73aecb3a8e ]

It appears that not disabling a PCI device on .shutdown may lead to
a Hardware Error with particular (perhaps buggy) BIOS versions:

    mlx4_en: eth0: Close port called
    mlx4_en 0000:04:00.0: removed PHC
    reboot: Restarting system
    {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
    {1}[Hardware Error]: event severity: fatal
    {1}[Hardware Error]:  Error 0, type: fatal
    {1}[Hardware Error]:   section_type: PCIe error
    {1}[Hardware Error]:   port_type: 4, root port
    {1}[Hardware Error]:   version: 1.16
    {1}[Hardware Error]:   command: 0x4010, status: 0x0143
    {1}[Hardware Error]:   device_id: 0000:00:02.2
    {1}[Hardware Error]:   slot: 0
    {1}[Hardware Error]:   secondary_bus: 0x04
    {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2f06
    {1}[Hardware Error]:   class_code: 000604
    {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
    {1}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000
    {1}[Hardware Error]:   aer_uncor_severity: 0x00062030
    {1}[Hardware Error]:   TLP Header: 40000018 040000ff 791f4080 00000000
[hw error repeats]
    Kernel panic - not syncing: Fatal hardware error!
    CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017

Fix the mlx4 driver.

This is a very similar problem to what had been fixed in:
commit 0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown")
to address https://bugzilla.kernel.org/show_bug.cgi?id=199779.

Fixes: 2ba5fbd62b25 ("net/mlx4_core: Handle AER flow properly")
Reported-by: Jake Lawrence <lawja@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index cf9011bb6e0f1..c6660b61e8361 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -4190,12 +4190,14 @@ end:
 static void mlx4_shutdown(struct pci_dev *pdev)
 {
 	struct mlx4_dev_persistent *persist = pci_get_drvdata(pdev);
+	struct mlx4_dev *dev = persist->dev;
 
 	mlx4_info(persist->dev, "mlx4_shutdown was called\n");
 	mutex_lock(&persist->interface_state_mutex);
 	if (persist->interface_state & MLX4_INTERFACE_STATE_UP)
 		mlx4_unload_one(pdev);
 	mutex_unlock(&persist->interface_state_mutex);
+	mlx4_pci_disable_device(dev);
 }
 
 static const struct pci_error_handlers mlx4_err_handler = {
-- 
2.25.1



