From: gregkh@linuxfoundation.org
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Maximilian Luz <luzmaximilian@gmail.com>,
Tsuchiya Yuto <kitakar@gmail.com>,
Kalle Valo <kvalo@codeaurora.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 11/49] mwifiex: pcie: skip cancel_work_sync() on reset failure path
Date: Wed, 10 Mar 2021 14:23:22 +0100 [thread overview]
Message-ID: <20210310132322.315497219@linuxfoundation.org> (raw)
In-Reply-To: <20210310132321.948258062@linuxfoundation.org>
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
From: Tsuchiya Yuto <kitakar@gmail.com>
[ Upstream commit 4add4d988f95f47493500a7a19c623827061589b ]
If a reset is performed, but even the reset fails for some reasons (e.g.,
on Surface devices, the fw reset requires another quirks),
cancel_work_sync() hangs in mwifiex_cleanup_pcie().
# firmware went into a bad state
[...]
[ 1608.281690] mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
[ 1608.282724] mwifiex_pcie 0000:03:00.0: rx_pending=0, tx_pending=1, cmd_pending=0
[ 1608.292400] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
[ 1608.292405] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
# reset performed after firmware went into a bad state
[ 1609.394320] mwifiex_pcie 0000:03:00.0: WLAN FW already running! Skip FW dnld
[ 1609.394335] mwifiex_pcie 0000:03:00.0: WLAN FW is active
# but even the reset failed
[ 1619.499049] mwifiex_pcie 0000:03:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0xfa, act = 0xe000
[ 1619.499094] mwifiex_pcie 0000:03:00.0: num_data_h2c_failure = 0
[ 1619.499103] mwifiex_pcie 0000:03:00.0: num_cmd_h2c_failure = 0
[ 1619.499110] mwifiex_pcie 0000:03:00.0: is_cmd_timedout = 1
[ 1619.499117] mwifiex_pcie 0000:03:00.0: num_tx_timeout = 0
[ 1619.499124] mwifiex_pcie 0000:03:00.0: last_cmd_index = 0
[ 1619.499133] mwifiex_pcie 0000:03:00.0: last_cmd_id: fa 00 07 01 07 01 07 01 07 01
[ 1619.499140] mwifiex_pcie 0000:03:00.0: last_cmd_act: 00 e0 00 00 00 00 00 00 00 00
[ 1619.499147] mwifiex_pcie 0000:03:00.0: last_cmd_resp_index = 3
[ 1619.499155] mwifiex_pcie 0000:03:00.0: last_cmd_resp_id: 07 81 07 81 07 81 07 81 07 81
[ 1619.499162] mwifiex_pcie 0000:03:00.0: last_event_index = 2
[ 1619.499169] mwifiex_pcie 0000:03:00.0: last_event: 58 00 58 00 58 00 58 00 58 00
[ 1619.499177] mwifiex_pcie 0000:03:00.0: data_sent=0 cmd_sent=1
[ 1619.499185] mwifiex_pcie 0000:03:00.0: ps_mode=0 ps_state=0
[ 1619.499215] mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
# mwifiex_pcie_work hang happening
[ 1823.233923] INFO: task kworker/3:1:44 blocked for more than 122 seconds.
[ 1823.233932] Tainted: G WC OE 5.10.0-rc1-1-mainline #1
[ 1823.233935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1823.233940] task:kworker/3:1 state:D stack: 0 pid: 44 ppid: 2 flags:0x00004000
[ 1823.233960] Workqueue: events mwifiex_pcie_work [mwifiex_pcie]
[ 1823.233965] Call Trace:
[ 1823.233981] __schedule+0x292/0x820
[ 1823.233990] schedule+0x45/0xe0
[ 1823.233995] schedule_timeout+0x11c/0x160
[ 1823.234003] wait_for_completion+0x9e/0x100
[ 1823.234012] __flush_work.isra.0+0x156/0x210
[ 1823.234018] ? flush_workqueue_prep_pwqs+0x130/0x130
[ 1823.234026] __cancel_work_timer+0x11e/0x1a0
[ 1823.234035] mwifiex_cleanup_pcie+0x28/0xd0 [mwifiex_pcie]
[ 1823.234049] mwifiex_free_adapter+0x24/0xe0 [mwifiex]
[ 1823.234060] _mwifiex_fw_dpc+0x294/0x560 [mwifiex]
[ 1823.234074] mwifiex_reinit_sw+0x15d/0x300 [mwifiex]
[ 1823.234080] mwifiex_pcie_reset_done+0x50/0x80 [mwifiex_pcie]
[ 1823.234087] pci_try_reset_function+0x5c/0x90
[ 1823.234094] process_one_work+0x1d6/0x3a0
[ 1823.234100] worker_thread+0x4d/0x3d0
[ 1823.234107] ? rescuer_thread+0x410/0x410
[ 1823.234112] kthread+0x142/0x160
[ 1823.234117] ? __kthread_bind_mask+0x60/0x60
[ 1823.234124] ret_from_fork+0x22/0x30
[...]
This is a deadlock caused by calling cancel_work_sync() in
mwifiex_cleanup_pcie():
- Device resets are done via mwifiex_pcie_card_reset()
- which schedules card->work to call mwifiex_pcie_card_reset_work()
- which calls pci_try_reset_function().
- This leads to mwifiex_pcie_reset_done() be called on the same workqueue,
which in turn calls
- mwifiex_reinit_sw() and that calls
- _mwifiex_fw_dpc().
The problem is now that _mwifiex_fw_dpc() calls mwifiex_free_adapter()
in case firmware initialization fails. That ends up calling
mwifiex_cleanup_pcie().
Note that all those calls are still running on the workqueue. So when
mwifiex_cleanup_pcie() now calls cancel_work_sync(), it's really waiting
on itself to complete, causing a deadlock.
This commit fixes the deadlock by skipping cancel_work_sync() on a reset
failure path.
After this commit, when reset fails, the following output is
expected to be shown:
kernel: mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
kernel: mwifiex: Failed to bring up adapter: -5
kernel: mwifiex_pcie 0000:03:00.0: reinit failed: -5
To reproduce this issue, for example, try putting the root port of wifi
into D3 (replace "00:1d.3" with your setup).
# put into D3 (root port)
sudo setpci -v -s 00:1d.3 CAP_PM+4.b=0b
Cc: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201028142346.18355-1-kitakar@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 18 +++++++++++++++++-
drivers/net/wireless/marvell/mwifiex/pcie.h | 2 ++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 6a10ff0377a2..33cf952cc01d 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -526,6 +526,8 @@ static void mwifiex_pcie_reset_prepare(struct pci_dev *pdev)
clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags);
clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags);
mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
+
+ card->pci_reset_ongoing = true;
}
/*
@@ -554,6 +556,8 @@ static void mwifiex_pcie_reset_done(struct pci_dev *pdev)
dev_err(&pdev->dev, "reinit failed: %d\n", ret);
else
mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
+
+ card->pci_reset_ongoing = false;
}
static const struct pci_error_handlers mwifiex_pcie_err_handler = {
@@ -3142,7 +3146,19 @@ static void mwifiex_cleanup_pcie(struct mwifiex_adapter *adapter)
int ret;
u32 fw_status;
- cancel_work_sync(&card->work);
+ /* Perform the cancel_work_sync() only when we're not resetting
+ * the card. It's because that function never returns if we're
+ * in reset path. If we're here when resetting the card, it means
+ * that we failed to reset the card (reset failure path).
+ */
+ if (!card->pci_reset_ongoing) {
+ mwifiex_dbg(adapter, MSG, "performing cancel_work_sync()...\n");
+ cancel_work_sync(&card->work);
+ mwifiex_dbg(adapter, MSG, "cancel_work_sync() done\n");
+ } else {
+ mwifiex_dbg(adapter, MSG,
+ "skipped cancel_work_sync() because we're in card reset failure path\n");
+ }
ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status);
if (fw_status == FIRMWARE_READY_PCIE) {
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.h b/drivers/net/wireless/marvell/mwifiex/pcie.h
index 843d57eda820..5ed613d65709 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.h
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.h
@@ -242,6 +242,8 @@ struct pcie_service_card {
struct mwifiex_msix_context share_irq_ctx;
struct work_struct work;
unsigned long work_flags;
+
+ bool pci_reset_ongoing;
};
static inline int
--
2.30.1
next prev parent reply other threads:[~2021-03-10 13:25 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-10 13:23 [PATCH 5.10 00/49] 5.10.23-rc1 review gregkh
2021-03-10 13:23 ` [PATCH 5.10 01/49] ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling gregkh
2021-03-10 13:23 ` [PATCH 5.10 02/49] [PATCH v2] ASoC: SOF: Intel: broadwell: fix mutual exclusion with catpt driver gregkh
2021-03-10 13:23 ` gregkh
2021-03-10 13:23 ` [PATCH 5.10 03/49] nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state gregkh
2021-03-10 13:23 ` [PATCH 5.10 04/49] parisc: Enable -mlong-calls gcc option with CONFIG_COMPILE_TEST gregkh
2021-03-10 13:23 ` [PATCH 5.10 05/49] arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+ gregkh
2021-03-10 13:23 ` [PATCH 5.10 06/49] btrfs: export and rename qgroup_reserve_meta gregkh
2021-03-10 13:23 ` [PATCH 5.10 07/49] btrfs: dont flush from btrfs_delayed_inode_reserve_metadata gregkh
2021-03-10 13:23 ` [PATCH 5.10 08/49] iommu/amd: Fix sleeping in atomic in increase_address_space() gregkh
2021-03-10 13:23 ` [PATCH 5.10 09/49] ASoC: intel: sof_rt5682: Add quirk for Dooly gregkh
2021-03-10 13:23 ` [PATCH 5.10 10/49] Bluetooth: btqca: Add valid le states quirk gregkh
2021-03-10 13:23 ` gregkh [this message]
2021-03-10 13:23 ` [PATCH 5.10 12/49] ASoC: Intel: sof_sdw: add quirk for new TigerLake-SDCA device gregkh
2021-03-10 13:23 ` [PATCH 5.10 13/49] bus: ti-sysc: Implement GPMC debug quirk to drop platform data gregkh
2021-03-10 13:23 ` [PATCH 5.10 14/49] net: ipa: ignore CHANNEL_NOT_RUNNING errors gregkh
2021-03-10 17:36 ` Naresh Kamboju
2021-03-10 17:48 ` Alex Elder
2021-03-10 18:00 ` Greg Kroah-Hartman
2021-03-10 18:27 ` Greg Kroah-Hartman
2021-03-10 13:23 ` [PATCH 5.10 15/49] platform/x86: acer-wmi: Cleanup ACER_CAP_FOO defines gregkh
2021-03-10 13:23 ` [PATCH 5.10 16/49] platform/x86: acer-wmi: Cleanup accelerometer device handling gregkh
2021-03-10 13:23 ` [PATCH 5.10 17/49] platform/x86: acer-wmi: Add new force_caps module parameter gregkh
2021-03-10 13:23 ` [PATCH 5.10 18/49] platform/x86: acer-wmi: Add ACER_CAP_SET_FUNCTION_MODE capability flag gregkh
2021-03-10 13:23 ` [PATCH 5.10 19/49] platform/x86: acer-wmi: Add support for SW_TABLET_MODE on Switch devices gregkh
2021-03-10 13:23 ` [PATCH 5.10 20/49] platform/x86: acer-wmi: Add ACER_CAP_KBD_DOCK quirk for the Aspire Switch 10E SW3-016 gregkh
2021-03-10 13:23 ` [PATCH 5.10 21/49] HID: mf: add support for 0079:1846 Mayflash/Dragonrise USB Gamecube Adapter gregkh
2021-03-10 13:23 ` [PATCH 5.10 22/49] media: cx23885: add more quirks for reset DMA on some AMD IOMMU gregkh
2021-03-10 13:23 ` [PATCH 5.10 23/49] ACPI: video: Add DMI quirk for GIGABYTE GB-BXBT-2807 gregkh
2021-03-10 20:04 ` Pavel Machek
2021-03-10 20:04 ` Pavel Machek
2021-03-11 12:37 ` Greg KH
2021-03-10 13:23 ` [PATCH 5.10 24/49] ASoC: Intel: bytcr_rt5640: Add quirk for ARCHOS Cesium 140 gregkh
2021-03-10 13:23 ` [PATCH 5.10 25/49] usb: cdns3: host: add .suspend_quirk for xhci-plat.c gregkh
2021-03-10 13:23 ` [PATCH 5.10 26/49] usb: cdns3: host: add xhci_plat_priv quirk XHCI_SKIP_PHY_INIT gregkh
2021-03-10 13:23 ` [PATCH 5.10 27/49] usb: cdns3: add quirk for enable runtime pm by default gregkh
2021-03-10 13:23 ` [PATCH 5.10 28/49] usb: cdns3: fix NULL pointer dereference on no platform data gregkh
2021-03-10 13:23 ` [PATCH 5.10 29/49] PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller gregkh
2021-03-10 13:23 ` [PATCH 5.10 30/49] KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check gregkh
2021-03-10 13:23 ` [PATCH 5.10 31/49] ASoC: Intel: sof_sdw: add missing TGL_HDMI quirk for Dell SKU 0A32 gregkh
2021-03-10 13:23 ` [PATCH 5.10 32/49] scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL gregkh
2021-03-10 13:23 ` [PATCH 5.10 33/49] scsi: ufs: Add a quirk to permit overriding UniPro defaults gregkh
2021-03-10 13:23 ` [PATCH 5.10 34/49] misc: eeprom_93xx46: Add quirk to support Microchip 93LC46B eeprom gregkh
2021-03-10 13:23 ` [PATCH 5.10 35/49] scsi: ufs: Introduce a quirk to allow only page-aligned sg entries gregkh
2021-03-10 13:23 ` [PATCH 5.10 36/49] scsi: ufs: ufs-exynos: Apply vendor-specific values for three timeouts gregkh
2021-03-10 13:23 ` [PATCH 5.10 37/49] scsi: ufs: ufs-exynos: Use UFSHCD_QUIRK_ALIGN_SG_WITH_PAGE_SIZE gregkh
2021-03-10 13:23 ` [PATCH 5.10 38/49] drm/msm/a5xx: Remove overwriting A5XX_PC_DBG_ECO_CNTL register gregkh
2021-03-10 13:23 ` [PATCH 5.10 39/49] mmc: sdhci-of-dwcmshc: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN gregkh
2021-03-10 13:23 ` [PATCH 5.10 40/49] HID: i2c-hid: Add I2C_HID_QUIRK_NO_IRQ_AFTER_RESET for ITE8568 EC on Voyo Winpad A15 gregkh
2021-03-10 13:23 ` [PATCH 5.10 41/49] ALSA: usb-audio: Add DJM750 to Pioneer mixer quirk gregkh
2021-03-10 13:23 ` [PATCH 5.10 42/49] ALSA: usb-audio: add mixer quirks for Pioneer DJM-900NXS2 gregkh
2021-03-10 13:23 ` [PATCH 5.10 43/49] PCI: cadence: Retrain Link to work around Gen2 training defect gregkh
2021-03-10 13:23 ` [PATCH 5.10 44/49] ASoC: Intel: sof_sdw: reorganize quirks by generation gregkh
2021-03-10 13:23 ` [PATCH 5.10 45/49] ASoC: Intel: sof_sdw: add quirk for HP Spectre x360 convertible gregkh
2021-03-10 13:23 ` [PATCH 5.10 46/49] scsi: ufs: Fix a duplicate dev quirk number gregkh
2021-03-10 13:23 ` [PATCH 5.10 47/49] KVM: SVM: Clear the CR4 register on reset gregkh
2021-03-10 13:23 ` [PATCH 5.10 48/49] nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST gregkh
2021-03-10 13:24 ` [PATCH 5.10 49/49] nvme-pci: add quirks for Lexar 256GB SSD gregkh
2021-03-10 17:38 ` [PATCH 5.10 00/49] 5.10.23-rc1 review Naresh Kamboju
2021-03-10 17:58 ` Guenter Roeck
2021-03-10 18:26 ` Greg Kroah-Hartman
2021-03-10 18:27 ` Greg Kroah-Hartman
2021-03-10 20:00 ` Shuah Khan
2021-03-11 2:38 ` Samuel Zou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210310132322.315497219@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=kitakar@gmail.com \
--cc=kvalo@codeaurora.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luzmaximilian@gmail.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.