From: gregkh@linuxfoundation.org
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Maximilian Luz <luzmaximilian@gmail.com>,
Tsuchiya Yuto <kitakar@gmail.com>,
Kalle Valo <kvalo@codeaurora.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 07/24] mwifiex: pcie: skip cancel_work_sync() on reset failure path
Date: Wed, 10 Mar 2021 14:24:19 +0100 [thread overview]
Message-ID: <20210310132320.772310804@linuxfoundation.org> (raw)
In-Reply-To: <20210310132320.550932445@linuxfoundation.org>
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
From: Tsuchiya Yuto <kitakar@gmail.com>
[ Upstream commit 4add4d988f95f47493500a7a19c623827061589b ]
If a reset is performed, but even the reset fails for some reasons (e.g.,
on Surface devices, the fw reset requires another quirks),
cancel_work_sync() hangs in mwifiex_cleanup_pcie().
# firmware went into a bad state
[...]
[ 1608.281690] mwifiex_pcie 0000:03:00.0: info: shutdown mwifiex...
[ 1608.282724] mwifiex_pcie 0000:03:00.0: rx_pending=0, tx_pending=1, cmd_pending=0
[ 1608.292400] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
[ 1608.292405] mwifiex_pcie 0000:03:00.0: PREP_CMD: card is removed
# reset performed after firmware went into a bad state
[ 1609.394320] mwifiex_pcie 0000:03:00.0: WLAN FW already running! Skip FW dnld
[ 1609.394335] mwifiex_pcie 0000:03:00.0: WLAN FW is active
# but even the reset failed
[ 1619.499049] mwifiex_pcie 0000:03:00.0: mwifiex_cmd_timeout_func: Timeout cmd id = 0xfa, act = 0xe000
[ 1619.499094] mwifiex_pcie 0000:03:00.0: num_data_h2c_failure = 0
[ 1619.499103] mwifiex_pcie 0000:03:00.0: num_cmd_h2c_failure = 0
[ 1619.499110] mwifiex_pcie 0000:03:00.0: is_cmd_timedout = 1
[ 1619.499117] mwifiex_pcie 0000:03:00.0: num_tx_timeout = 0
[ 1619.499124] mwifiex_pcie 0000:03:00.0: last_cmd_index = 0
[ 1619.499133] mwifiex_pcie 0000:03:00.0: last_cmd_id: fa 00 07 01 07 01 07 01 07 01
[ 1619.499140] mwifiex_pcie 0000:03:00.0: last_cmd_act: 00 e0 00 00 00 00 00 00 00 00
[ 1619.499147] mwifiex_pcie 0000:03:00.0: last_cmd_resp_index = 3
[ 1619.499155] mwifiex_pcie 0000:03:00.0: last_cmd_resp_id: 07 81 07 81 07 81 07 81 07 81
[ 1619.499162] mwifiex_pcie 0000:03:00.0: last_event_index = 2
[ 1619.499169] mwifiex_pcie 0000:03:00.0: last_event: 58 00 58 00 58 00 58 00 58 00
[ 1619.499177] mwifiex_pcie 0000:03:00.0: data_sent=0 cmd_sent=1
[ 1619.499185] mwifiex_pcie 0000:03:00.0: ps_mode=0 ps_state=0
[ 1619.499215] mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
# mwifiex_pcie_work hang happening
[ 1823.233923] INFO: task kworker/3:1:44 blocked for more than 122 seconds.
[ 1823.233932] Tainted: G WC OE 5.10.0-rc1-1-mainline #1
[ 1823.233935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1823.233940] task:kworker/3:1 state:D stack: 0 pid: 44 ppid: 2 flags:0x00004000
[ 1823.233960] Workqueue: events mwifiex_pcie_work [mwifiex_pcie]
[ 1823.233965] Call Trace:
[ 1823.233981] __schedule+0x292/0x820
[ 1823.233990] schedule+0x45/0xe0
[ 1823.233995] schedule_timeout+0x11c/0x160
[ 1823.234003] wait_for_completion+0x9e/0x100
[ 1823.234012] __flush_work.isra.0+0x156/0x210
[ 1823.234018] ? flush_workqueue_prep_pwqs+0x130/0x130
[ 1823.234026] __cancel_work_timer+0x11e/0x1a0
[ 1823.234035] mwifiex_cleanup_pcie+0x28/0xd0 [mwifiex_pcie]
[ 1823.234049] mwifiex_free_adapter+0x24/0xe0 [mwifiex]
[ 1823.234060] _mwifiex_fw_dpc+0x294/0x560 [mwifiex]
[ 1823.234074] mwifiex_reinit_sw+0x15d/0x300 [mwifiex]
[ 1823.234080] mwifiex_pcie_reset_done+0x50/0x80 [mwifiex_pcie]
[ 1823.234087] pci_try_reset_function+0x5c/0x90
[ 1823.234094] process_one_work+0x1d6/0x3a0
[ 1823.234100] worker_thread+0x4d/0x3d0
[ 1823.234107] ? rescuer_thread+0x410/0x410
[ 1823.234112] kthread+0x142/0x160
[ 1823.234117] ? __kthread_bind_mask+0x60/0x60
[ 1823.234124] ret_from_fork+0x22/0x30
[...]
This is a deadlock caused by calling cancel_work_sync() in
mwifiex_cleanup_pcie():
- Device resets are done via mwifiex_pcie_card_reset()
- which schedules card->work to call mwifiex_pcie_card_reset_work()
- which calls pci_try_reset_function().
- This leads to mwifiex_pcie_reset_done() be called on the same workqueue,
which in turn calls
- mwifiex_reinit_sw() and that calls
- _mwifiex_fw_dpc().
The problem is now that _mwifiex_fw_dpc() calls mwifiex_free_adapter()
in case firmware initialization fails. That ends up calling
mwifiex_cleanup_pcie().
Note that all those calls are still running on the workqueue. So when
mwifiex_cleanup_pcie() now calls cancel_work_sync(), it's really waiting
on itself to complete, causing a deadlock.
This commit fixes the deadlock by skipping cancel_work_sync() on a reset
failure path.
After this commit, when reset fails, the following output is
expected to be shown:
kernel: mwifiex_pcie 0000:03:00.0: info: _mwifiex_fw_dpc: unregister device
kernel: mwifiex: Failed to bring up adapter: -5
kernel: mwifiex_pcie 0000:03:00.0: reinit failed: -5
To reproduce this issue, for example, try putting the root port of wifi
into D3 (replace "00:1d.3" with your setup).
# put into D3 (root port)
sudo setpci -v -s 00:1d.3 CAP_PM+4.b=0b
Cc: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201028142346.18355-1-kitakar@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 18 +++++++++++++++++-
drivers/net/wireless/marvell/mwifiex/pcie.h | 2 ++
2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index fc1706d0647d..58c9623c3a91 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -377,6 +377,8 @@ static void mwifiex_pcie_reset_prepare(struct pci_dev *pdev)
clear_bit(MWIFIEX_IFACE_WORK_DEVICE_DUMP, &card->work_flags);
clear_bit(MWIFIEX_IFACE_WORK_CARD_RESET, &card->work_flags);
mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
+
+ card->pci_reset_ongoing = true;
}
/*
@@ -405,6 +407,8 @@ static void mwifiex_pcie_reset_done(struct pci_dev *pdev)
dev_err(&pdev->dev, "reinit failed: %d\n", ret);
else
mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
+
+ card->pci_reset_ongoing = false;
}
static const struct pci_error_handlers mwifiex_pcie_err_handler = {
@@ -2995,7 +2999,19 @@ static void mwifiex_cleanup_pcie(struct mwifiex_adapter *adapter)
int ret;
u32 fw_status;
- cancel_work_sync(&card->work);
+ /* Perform the cancel_work_sync() only when we're not resetting
+ * the card. It's because that function never returns if we're
+ * in reset path. If we're here when resetting the card, it means
+ * that we failed to reset the card (reset failure path).
+ */
+ if (!card->pci_reset_ongoing) {
+ mwifiex_dbg(adapter, MSG, "performing cancel_work_sync()...\n");
+ cancel_work_sync(&card->work);
+ mwifiex_dbg(adapter, MSG, "cancel_work_sync() done\n");
+ } else {
+ mwifiex_dbg(adapter, MSG,
+ "skipped cancel_work_sync() because we're in card reset failure path\n");
+ }
ret = mwifiex_read_reg(adapter, reg->fw_status, &fw_status);
if (fw_status == FIRMWARE_READY_PCIE) {
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.h b/drivers/net/wireless/marvell/mwifiex/pcie.h
index f7ce9b6db6b4..72d0c01ff359 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.h
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.h
@@ -391,6 +391,8 @@ struct pcie_service_card {
struct mwifiex_msix_context share_irq_ctx;
struct work_struct work;
unsigned long work_flags;
+
+ bool pci_reset_ongoing;
};
static inline int
--
2.30.1
next prev parent reply other threads:[~2021-03-10 13:27 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-10 13:24 [PATCH 5.4 00/24] 5.4.105-rc1 review gregkh
2021-03-10 13:24 ` [PATCH 5.4 01/24] net: dsa: add GRO support via gro_cells gregkh
2021-03-10 13:24 ` [PATCH 5.4 02/24] dm table: fix iterate_devices based device capability checks gregkh
2021-03-10 13:24 ` [PATCH 5.4 03/24] dm table: fix DAX " gregkh
2021-03-10 13:24 ` [PATCH 5.4 04/24] dm table: fix zoned " gregkh
2021-03-10 13:24 ` [PATCH 5.4 05/24] ACPICA: Fix race in generic_serial_bus (I2C) and GPIO op_region parameter handling gregkh
2021-03-10 13:24 ` [PATCH 5.4 06/24] iommu/amd: Fix sleeping in atomic in increase_address_space() gregkh
2021-03-10 13:24 ` gregkh [this message]
2021-03-10 13:24 ` [PATCH 5.4 08/24] platform/x86: acer-wmi: Cleanup ACER_CAP_FOO defines gregkh
2021-03-10 13:24 ` [PATCH 5.4 09/24] platform/x86: acer-wmi: Cleanup accelerometer device handling gregkh
2021-03-10 13:24 ` [PATCH 5.4 10/24] platform/x86: acer-wmi: Add new force_caps module parameter gregkh
2021-03-10 13:24 ` [PATCH 5.4 11/24] platform/x86: acer-wmi: Add ACER_CAP_SET_FUNCTION_MODE capability flag gregkh
2021-03-10 13:24 ` [PATCH 5.4 12/24] platform/x86: acer-wmi: Add support for SW_TABLET_MODE on Switch devices gregkh
2021-03-10 13:24 ` [PATCH 5.4 13/24] platform/x86: acer-wmi: Add ACER_CAP_KBD_DOCK quirk for the Aspire Switch 10E SW3-016 gregkh
2021-03-10 13:24 ` [PATCH 5.4 14/24] HID: mf: add support for 0079:1846 Mayflash/Dragonrise USB Gamecube Adapter gregkh
2021-03-10 13:24 ` [PATCH 5.4 15/24] media: cx23885: add more quirks for reset DMA on some AMD IOMMU gregkh
2021-03-10 13:24 ` [PATCH 5.4 16/24] ACPI: video: Add DMI quirk for GIGABYTE GB-BXBT-2807 gregkh
2021-03-10 13:24 ` [PATCH 5.4 17/24] ASoC: Intel: bytcr_rt5640: Add quirk for ARCHOS Cesium 140 gregkh
2021-03-10 13:24 ` [PATCH 5.4 18/24] PCI: Add function 1 DMA alias quirk for Marvell 9215 SATA controller gregkh
2021-03-10 13:24 ` [PATCH 5.4 19/24] misc: eeprom_93xx46: Add quirk to support Microchip 93LC46B eeprom gregkh
2021-03-10 13:24 ` [PATCH 5.4 20/24] drm/msm/a5xx: Remove overwriting A5XX_PC_DBG_ECO_CNTL register gregkh
2021-03-10 13:24 ` [PATCH 5.4 21/24] mmc: sdhci-of-dwcmshc: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN gregkh
2021-03-10 13:24 ` [PATCH 5.4 22/24] HID: i2c-hid: Add I2C_HID_QUIRK_NO_IRQ_AFTER_RESET for ITE8568 EC on Voyo Winpad A15 gregkh
2021-03-10 13:24 ` [PATCH 5.4 23/24] nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST gregkh
2021-03-10 13:24 ` [PATCH 5.4 24/24] nvme-pci: add quirks for Lexar 256GB SSD gregkh
2021-03-10 22:00 ` [PATCH 5.4 00/24] 5.4.105-rc1 review Shuah Khan
2021-03-10 23:52 ` Guenter Roeck
2021-03-11 4:05 ` Ross Schmidt
2021-03-11 4:19 ` Florian Fainelli
2021-03-11 13:08 ` Greg KH
2021-03-11 17:23 ` Florian Fainelli
2021-03-11 17:40 ` Greg KH
2021-03-11 17:41 ` Florian Fainelli
2021-03-12 12:54 ` Alexander Lobakin
2021-03-15 9:50 ` Pali Rohár
2021-03-23 17:20 ` Florian Fainelli
2021-03-23 23:32 ` Pali Rohár
2021-03-24 10:26 ` Greg KH
2021-03-12 22:55 ` Florian Fainelli
2021-03-13 13:42 ` Greg KH
2021-03-13 22:30 ` Pavel Machek
2021-03-11 7:34 ` Naresh Kamboju
2021-03-11 7:59 ` Jon Hunter
2021-03-11 9:42 ` Samuel Zou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210310132320.772310804@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=kitakar@gmail.com \
--cc=kvalo@codeaurora.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luzmaximilian@gmail.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.