From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Thinh Tran <thinhtr@linux.vnet.ibm.com>,
Venkata Sai Duggi <venkata.sai.duggi@ibm.com>,
David Christensen <drc@linux.vnet.ibm.com>,
Michael Chan <michael.chan@broadcom.com>,
Jakub Kicinski <kuba@kernel.org>, Sasha Levin <sashal@kernel.org>,
pavan.chebbi@broadcom.com, mchan@broadcom.com,
davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 6.6 27/47] net/tg3: fix race condition in tg3_reset_task()
Date: Mon, 11 Dec 2023 08:50:28 -0500 [thread overview]
Message-ID: <20231211135147.380223-27-sashal@kernel.org> (raw)
In-Reply-To: <20231211135147.380223-1-sashal@kernel.org>
From: Thinh Tran <thinhtr@linux.vnet.ibm.com>
[ Upstream commit 16b55b1f2269962fb6b5154b8bf43f37c9a96637 ]
When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:
enum {
/* I/O channel is in normal state */
pci_channel_io_normal = (__force pci_channel_state_t) 1,
/* I/O to channel is blocked */
pci_channel_io_frozen = (__force pci_channel_state_t) 2,
/* PCI card is dead */
pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
};
If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.
EEH driver gets error event
--> eeh_set_channel_state()
and set device to one of
error state above scheduler: tg3_reset_task() get
returned error from tg3_init_hw()
--> dev_close() shuts down the interface
tg3_io_slot_reset() and
tg3_io_resume() fail to
reset/resume the device
To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_reset_task() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.
Adding the same checking in tg3_dump_state() to avoid dumping all
device registers when the PCI channel is not in the normal state.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231201001911.656-1-thinhtr@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/broadcom/tg3.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 22b00912f7ac8..95d476ec14f5e 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6439,6 +6439,14 @@ static void tg3_dump_state(struct tg3 *tp)
int i;
u32 *regs;
+ /* If it is a PCI error, all registers will be 0xffff,
+ * we don't dump them out, just report the error and return
+ */
+ if (tp->pdev->error_state != pci_channel_io_normal) {
+ netdev_err(tp->dev, "PCI channel ERROR!\n");
+ return;
+ }
+
regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC);
if (!regs)
return;
@@ -11170,7 +11178,8 @@ static void tg3_reset_task(struct work_struct *work)
rtnl_lock();
tg3_full_lock(tp, 0);
- if (tp->pcierr_recovery || !netif_running(tp->dev)) {
+ if (tp->pcierr_recovery || !netif_running(tp->dev) ||
+ tp->pdev->error_state != pci_channel_io_normal) {
tg3_flag_clear(tp, RESET_TASK_PENDING);
tg3_full_unlock(tp);
rtnl_unlock();
--
2.42.0
next prev parent reply other threads:[~2023-12-11 13:54 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-11 13:50 [PATCH AUTOSEL 6.6 01/47] hwtracing: hisi_ptt: Handle the interrupt in hardirq context Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 02/47] hwtracing: hisi_ptt: Don't try to attach a task Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 03/47] ASoC: amd: yc: Add HP 255 G10 into quirk table Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 04/47] ASoC: wm8974: Correct boost mixer inputs Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 05/47] arm64: dts: rockchip: fix rk356x pcie msg interrupt name Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 06/47] ASoC: Intel: Skylake: Fix mem leak in few functions Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 07/47] ASoC: nau8822: Fix incorrect type in assignment and cast to restricted __be16 Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 08/47] ASoC: SOF: topology: Fix mem leak in sof_dai_load() Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 09/47] ASoC: Intel: Skylake: mem leak in skl register function Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 10/47] ASoC: cs43130: Fix the position of const qualifier Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 11/47] ASoC: cs43130: Fix incorrect frame delay configuration Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 12/47] ASoC: fsl_xcvr: Enable 2 * TX bit clock for spdif only case Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 13/47] ASoC: rt5650: add mutex to avoid the jack detection failure Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 14/47] ASoC: SOF: mediatek: mt8186: Add Google Steelix topology compatible Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 15/47] ASoC: fsl_xcvr: refine the requested phy clock frequency Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 16/47] ASoC: Intel: skl_hda_dsp_generic: Drop HDMI routes when HDMI is not available Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 17/47] ASoC: SOF: ipc4-topology: Add core_mask in struct snd_sof_pipeline Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 18/47] ASoC: SOF: sof-audio: Modify logic for enabling/disabling topology cores Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 19/47] nouveau/tu102: flush all pdbs on vmm flush Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 20/47] ASoC: amd: yc: Add DMI entry to support System76 Pangolin 13 Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 21/47] ASoC: hdac_hda: Conditionally register dais for HDMI and Analog Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 22/47] ASoC: SOF: ipc4-topology: Correct data structures for the SRC module Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 23/47] ASoC: SOF: ipc4-topology: Correct data structures for the GAIN module Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 24/47] pds_vdpa: fix up format-truncation complaint Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 25/47] pds_vdpa: clear config callback when status goes to 0 Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 26/47] pds_vdpa: set features order Sasha Levin
2023-12-11 13:50 ` Sasha Levin [this message]
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 28/47] ASoC: da7219: Support low DC impedance headset Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 29/47] ASoC: ops: add correct range check for limiting volume Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 30/47] nvme: introduce helper function to get ctrl state Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 31/47] nvme: ensure reset state check ordering Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 32/47] nvme-ioctl: move capable() admin check to the end Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 33/47] nvme: prevent potential spectre v1 gadget Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 34/47] nvme: fix deadlock between reset and scan Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 35/47] arm64: dts: rockchip: Fix PCI node addresses on rk3399-gru Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 36/47] mips/smp: Call rcutree_report_cpu_starting() earlier Sasha Levin
2023-12-12 1:45 ` Huacai Chen
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 37/47] r8152: add vendor/device ID pair for ASUS USB-C2500 Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 38/47] drm/amd/display: Use channel_width = 2 for vram table 3.0 Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 39/47] drm/amd/display: Add monitor patch for specific eDP Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 40/47] drm/amdgpu: Add NULL checks for function pointers Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 41/47] drm/exynos: fix a potential error pointer dereference Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 42/47] drm/exynos: fix a wrong error checking Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 43/47] ALSA: pcmtest: stop timer before buffer is released Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 44/47] hwmon: (corsair-psu) Fix probe when built-in Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 45/47] LoongArch: Apply dynamic relocations for LLD Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 46/47] LoongArch: Set unwind stack type to unknown rather than set error flag Sasha Levin
2023-12-11 13:50 ` [PATCH AUTOSEL 6.6 47/47] LoongArch: Preserve syscall nr across execve() Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231211135147.380223-27-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=davem@davemloft.net \
--cc=drc@linux.vnet.ibm.com \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchan@broadcom.com \
--cc=michael.chan@broadcom.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=pavan.chebbi@broadcom.com \
--cc=stable@vger.kernel.org \
--cc=thinhtr@linux.vnet.ibm.com \
--cc=venkata.sai.duggi@ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox