From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Michael Chan <michael.chan@broadcom.com>,
	Jakub Kicinski <kuba@kernel.org>, Sasha Levin <sashal@kernel.org>,
	siva.kallam@broadcom.com, prashant@broadcom.com,
	mchan@broadcom.com, davem@davemloft.net, edumazet@google.com,
	pabeni@redhat.com, netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 5.19 19/38] tg3: Disable tg3 device on system reboot to avoid triggering AER
Date: Sat, 10 Sep 2022 17:16:04 -0400
Message-ID: <20220910211623.69825-19-sashal@kernel.org>
In-Reply-To: <20220910211623.69825-1-sashal@kernel.org>

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

[ Upstream commit 2ca1c94ce0b65a2ce7512b718f3d8a0fe6224bca ]

Commit d60cd06331a3 ("PM: ACPI: reboot: Use S5 for reboot") caused a
reboot hang on one Dell server, so the commit was reverted.

An AER log was later collected, and it shows that the error is caused by
an MSI:
[ 148.762067] ACPI: Preparing to enter system sleep state S5
[ 148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[ 148.803731] {1}[Hardware Error]: event severity: recoverable
[ 148.810191] {1}[Hardware Error]: Error 0, type: fatal
[ 148.816088] {1}[Hardware Error]: section_type: PCIe error
[ 148.822391] {1}[Hardware Error]: port_type: 0, PCIe end point
[ 148.829026] {1}[Hardware Error]: version: 3.0
[ 148.834266] {1}[Hardware Error]: command: 0x0006, status: 0x0010
[ 148.841140] {1}[Hardware Error]: device_id: 0000:04:00.0
[ 148.847309] {1}[Hardware Error]: slot: 0
[ 148.852077] {1}[Hardware Error]: secondary_bus: 0x00
[ 148.857876] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
[ 148.865145] {1}[Hardware Error]: class_code: 020000
[ 148.870845] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00010000
[ 148.879842] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
[ 148.886575] {1}[Hardware Error]: TLP Header: 40000001 0000030f 90028090 00000000
[ 148.894823] tg3 0000:04:00.0: AER: aer_status: 0x00100000, aer_mask: 0x00010000
[ 148.902795] tg3 0000:04:00.0: AER: [20] UnsupReq (First)
[ 148.910234] tg3 0000:04:00.0: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
[ 148.918806] tg3 0000:04:00.0: AER: aer_uncor_severity: 0x000ef030
[ 148.925558] tg3 0000:04:00.0: AER: TLP Header: 40000001 0000030f 90028090 00000000
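
The values in the log above can be cross-checked by hand. The following
is a minimal sketch, not part of the patch, that decodes three of them:
aer_uncor_status against aer_uncor_mask, the first dword of the TLP
header, and the PCI command register. The macro, struct and function
names are local to this sketch (hypothetical); the bit positions are
taken from the PCI Express Base Specification as assumed here (bit 20 of
the uncorrectable status is Unsupported Request, Fmt is bits 31:29, Type
is bits 28:24 and Length is bits 9:0 of the first header dword, and bit 2
of the command register is Bus Master Enable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names below are local to this sketch, not kernel identifiers. */
#define SK_AER_UNC_UNSUP_REQ  (1u << 20)  /* Unsupported Request */
#define SK_PCI_CMD_MEMORY     0x0002u     /* Memory Space Enable */
#define SK_PCI_CMD_MASTER     0x0004u     /* Bus Master Enable */

struct sk_tlp_dw0 {
	unsigned int fmt;       /* header format, bits 31:29 */
	unsigned int type;      /* TLP type, bits 28:24 */
	unsigned int length_dw; /* payload length in dwords, bits 9:0 */
};

/* Split the first dword of a logged TLP header into its fields. */
static struct sk_tlp_dw0 sk_decode_tlp_dw0(uint32_t dw0)
{
	struct sk_tlp_dw0 h = {
		.fmt = (dw0 >> 29) & 0x7,
		.type = (dw0 >> 24) & 0x1f,
		.length_dw = dw0 & 0x3ff,
	};
	return h;
}

/* Fmt 010b (3DW header, with data) and Type 00000b: 32-bit memory write. */
static bool sk_tlp_is_mem_write32(struct sk_tlp_dw0 h)
{
	return h.fmt == 0x2 && h.type == 0x00;
}

/* True if Unsupported Request is set in status and not masked. */
static bool sk_aer_unmasked_unsup_req(uint32_t status, uint32_t mask)
{
	return (status & ~mask & SK_AER_UNC_UNSUP_REQ) != 0;
}

/* True if the device is still allowed to initiate bus transactions. */
static bool sk_cmd_bus_master_enabled(uint16_t command)
{
	return (command & SK_PCI_CMD_MASTER) != 0;
}

/* Check the values quoted in the log above. */
static void sk_check_log_values(void)
{
	struct sk_tlp_dw0 h = sk_decode_tlp_dw0(0x40000001u);

	assert(sk_aer_unmasked_unsup_req(0x00100000u, 0x00010000u));
	assert(sk_tlp_is_mem_write32(h));
	assert(h.length_dw == 1);
	assert(sk_cmd_bus_master_enabled(0x0006));
}
```

Calling sk_check_log_values() from a small main() passes all four
checks: the logged TLP is a one-dword memory write, which is the shape of
an MSI, and it was sent while bus mastering was still enabled
(command: 0x0006).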

The MSI is probably raised by incoming packets, so power down the device
and disable bus mastering to stop the traffic. A user confirmed that this
approach works.

In addition, to be extra safe, cancel the reset task if it is running.

Cc: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/all/b8db79e6857c41dab4ef08bdf826ea7c47e3bafc.1615947283.git.josef@toxicpanda.com/
BugLink: https://bugs.launchpad.net/bugs/1917471
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220826002530.1153296-1-kai.heng.feng@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/broadcom/tg3.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index c28f8cc00d1cf..a9cc85882b315 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -18076,16 +18076,20 @@ static void tg3_shutdown(struct pci_dev *pdev)
 	struct net_device *dev = pci_get_drvdata(pdev);
 	struct tg3 *tp = netdev_priv(dev);
 
+	tg3_reset_task_cancel(tp);
+
 	rtnl_lock();
+
 	netif_device_detach(dev);
 
 	if (netif_running(dev))
 		dev_close(dev);
 
-	if (system_state == SYSTEM_POWER_OFF)
-		tg3_power_down(tp);
+	tg3_power_down(tp);
 
 	rtnl_unlock();
+
+	pci_disable_device(pdev);
 }
 
 /**
-- 
2.35.1

