* [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
From: Aaradhana Sahu @ 2024-05-29 3:44 UTC (permalink / raw)
To: ath12k; +Cc: linux-wireless, Aaradhana Sahu
Whenever the firmware crashes in split-phy mode, the WARN_ON below is triggered:
? __warn+0x7b/0x1a0
? drv_stop+0x1eb/0x210 [mac80211]
? report_bug+0x10b/0x200
? handle_bug+0x3f/0x70
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? drv_stop+0x1eb/0x210 [mac80211]
ieee80211_do_stop+0x5ba/0x850 [mac80211]
ieee80211_stop+0x51/0x180 [mac80211]
__dev_close_many+0xb3/0x130
dev_close_many+0xa3/0x180
? lock_release+0xde/0x420
dev_close.part.147+0x5f/0xa0
cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
ieee80211_restart_work+0xf9/0x130 [mac80211]
process_scheduled_works+0x377/0x6f0
The sequence leading to the WARN_ON is:
Thread 1:
- A firmware crash invokes ath12k_core_reset().
- ath12k_core_post_reconfigure_recovery() calls ieee80211_restart_hw(),
which schedules a restart worker for each of the two hardware.
- The thread then waits for completion of ab->recovery_start.
Thread 2 (worker thread):
- The worker for one hardware acquires rtnl_lock() during the restart
scheduled by ieee80211_restart_hw() and calls
ath12k_mac_wait_reconfigure() from ath12k_mac_op_start().
- That worker waits for ab->reconfigure_complete, but at this time
recovery_start_count is only 1 because the other worker
(local->restart_work) is still waiting for rtnl_lock().
Since recovery_start_count is not equal to the number of radios
(2 in split-phy), ab->recovery_start is never completed. As a
result, thread 1 is still waiting and cannot perform the hif power
down/up and the firmware reload.
- The wait for ab->reconfigure_complete times out and control returns
to the caller (ath12k_mac_op_start()), which sends a WMI command to
the crashed firmware and gets an error.
- The error is returned to drv_start() and local->started is set to false.
- After receiving the error, ieee80211_restart_work() calls
cfg80211_shutdown_all_interfaces(), which ends up in drv_stop(). There
the WARN_ON triggers because local->started is false.
To fix this issue, call ieee80211_restart_hw() only after the firmware
has been reloaded, so that each hardware can send WMI commands to the
firmware successfully. With this fix there is no need to wait for the
ab->recovery_start completion, so remove
ath12k_mac_wait_reconfigure().
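The stall described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not driver code: struct recovery_model and model_op_start() are hypothetical names, and the model assumes that a restart worker holds rtnl_lock for the whole call, so a second worker cannot bump the counter before the first one times out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the old handshake; not ath12k code. */
struct recovery_model {
	int num_radios;           /* 2 in split-phy */
	int recovery_start_count; /* bumped once per op_start() call */
	bool recovery_start_done; /* models complete(&ab->recovery_start) */
	bool firmware_reloaded;   /* thread 1 reloads only after the above */
};

/*
 * Models one restart worker reaching the op_start() step while holding
 * rtnl_lock. Returns true if its WMI command would reach live firmware.
 */
static bool model_op_start(struct recovery_model *m)
{
	assert(m->num_radios > 0);

	if (++m->recovery_start_count == m->num_radios)
		m->recovery_start_done = true;

	/* Thread 1 reloads firmware only once recovery_start completes. */
	if (m->recovery_start_done)
		m->firmware_reloaded = true;

	/*
	 * The worker waits here (and eventually times out) without
	 * releasing the lock, so no other worker runs in the meantime.
	 */
	return m->firmware_reloaded;
}
```

With num_radios set to 2, the first call leaves the counter at 1 and returns false, which corresponds to the WMI command being sent to crashed firmware. With num_radios set to 1 the counter matches immediately, which is presumably why the problem only shows up in split-phy.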
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00209-QCAHKSWPL_SILICONZ-1
Tested-on: WCN7850 HW2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Aaradhana Sahu <quic_aarasahu@quicinc.com>
---
drivers/net/wireless/ath/ath12k/core.c | 21 ++++++++-------------
drivers/net/wireless/ath/ath12k/core.h | 3 ---
drivers/net/wireless/ath/ath12k/mac.c | 23 -----------------------
3 files changed, 8 insertions(+), 39 deletions(-)
diff --git a/drivers/net/wireless/ath/ath12k/core.c b/drivers/net/wireless/ath/ath12k/core.c
index e7f628e935e4..9fac8b1b064e 100644
--- a/drivers/net/wireless/ath/ath12k/core.c
+++ b/drivers/net/wireless/ath/ath12k/core.c
@@ -1059,8 +1059,6 @@ static void ath12k_core_post_reconfigure_recovery(struct ath12k_base *ab)
mutex_unlock(&ar->conf_mutex);
}
- /* Restart after all the link/radio halt */
- ieee80211_restart_hw(ah->hw);
break;
case ATH12K_HW_STATE_OFF:
ath12k_warn(ab,
@@ -1087,7 +1085,8 @@ static void ath12k_core_post_reconfigure_recovery(struct ath12k_base *ab)
static void ath12k_core_restart(struct work_struct *work)
{
struct ath12k_base *ab = container_of(work, struct ath12k_base, restart_work);
- int ret;
+ struct ath12k_hw *ah;
+ int ret, i;
ret = ath12k_core_reconfigure_on_crash(ab);
if (ret) {
@@ -1095,8 +1094,12 @@ static void ath12k_core_restart(struct work_struct *work)
return;
}
- if (ab->is_reset)
- complete_all(&ab->reconfigure_complete);
+ if (ab->is_reset) {
+ for (i = 0; i < ab->num_hw; i++) {
+ ah = ab->ah[i];
+ ieee80211_restart_hw(ah->hw);
+ }
+ }
complete(&ab->restart_completed);
}
@@ -1150,20 +1153,14 @@ static void ath12k_core_reset(struct work_struct *work)
ath12k_dbg(ab, ATH12K_DBG_BOOT, "reset starting\n");
ab->is_reset = true;
- atomic_set(&ab->recovery_start_count, 0);
- reinit_completion(&ab->recovery_start);
atomic_set(&ab->recovery_count, 0);
ath12k_core_pre_reconfigure_recovery(ab);
- reinit_completion(&ab->reconfigure_complete);
ath12k_core_post_reconfigure_recovery(ab);
ath12k_dbg(ab, ATH12K_DBG_BOOT, "waiting recovery start...\n");
- time_left = wait_for_completion_timeout(&ab->recovery_start,
- ATH12K_RECOVER_START_TIMEOUT_HZ);
-
ath12k_hif_irq_disable(ab);
ath12k_hif_ce_irq_disable(ab);
@@ -1246,8 +1243,6 @@ struct ath12k_base *ath12k_core_alloc(struct device *dev, size_t priv_size,
mutex_init(&ab->core_lock);
spin_lock_init(&ab->base_lock);
init_completion(&ab->reset_complete);
- init_completion(&ab->reconfigure_complete);
- init_completion(&ab->recovery_start);
INIT_LIST_HEAD(&ab->peers);
init_waitqueue_head(&ab->peer_mapping_wq);
diff --git a/drivers/net/wireless/ath/ath12k/core.h b/drivers/net/wireless/ath/ath12k/core.h
index 7d20b09c52e6..96fafa0e05dc 100644
--- a/drivers/net/wireless/ath/ath12k/core.h
+++ b/drivers/net/wireless/ath/ath12k/core.h
@@ -846,11 +846,8 @@ struct ath12k_base {
struct work_struct reset_work;
atomic_t reset_count;
atomic_t recovery_count;
- atomic_t recovery_start_count;
bool is_reset;
struct completion reset_complete;
- struct completion reconfigure_complete;
- struct completion recovery_start;
/* continuous recovery fail count */
atomic_t fail_cont_count;
unsigned long reset_fail_timeout;
diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index 784964ae03ec..33616ab795af 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -5834,28 +5834,6 @@ static int ath12k_mac_config_mon_status_default(struct ath12k *ar, bool enable)
/* TODO: Need to support new monitor mode */
}
-static void ath12k_mac_wait_reconfigure(struct ath12k_base *ab)
-{
- int recovery_start_count;
-
- if (!ab->is_reset)
- return;
-
- recovery_start_count = atomic_inc_return(&ab->recovery_start_count);
-
- ath12k_dbg(ab, ATH12K_DBG_MAC, "recovery start count %d\n", recovery_start_count);
-
- if (recovery_start_count == ab->num_radios) {
- complete(&ab->recovery_start);
- ath12k_dbg(ab, ATH12K_DBG_MAC, "recovery started success\n");
- }
-
- ath12k_dbg(ab, ATH12K_DBG_MAC, "waiting reconfigure...\n");
-
- wait_for_completion_timeout(&ab->reconfigure_complete,
- ATH12K_RECONFIGURE_TIMEOUT_HZ);
-}
-
static int ath12k_mac_start(struct ath12k *ar)
{
struct ath12k_hw *ah = ar->ah;
@@ -5987,7 +5965,6 @@ static int ath12k_mac_op_start(struct ieee80211_hw *hw)
break;
case ATH12K_HW_STATE_RESTARTING:
ah->state = ATH12K_HW_STATE_RESTARTED;
- ath12k_mac_wait_reconfigure(ah->ab);
break;
case ATH12K_HW_STATE_RESTARTED:
case ATH12K_HW_STATE_WEDGED:
base-commit: 0442aec67bbd9cbf93bf7f6ee59c9bf5348b9249
--
2.34.1
* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
From: Jeff Johnson @ 2024-05-29 16:32 UTC (permalink / raw)
To: Aaradhana Sahu, ath12k; +Cc: linux-wireless
On 5/28/2024 8:44 PM, Aaradhana Sahu wrote:
> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>
> ? __warn+0x7b/0x1a0
> ? drv_stop+0x1eb/0x210 [mac80211]
> ? report_bug+0x10b/0x200
> ? handle_bug+0x3f/0x70
> ? exc_invalid_op+0x13/0x60
> ? asm_exc_invalid_op+0x16/0x20
> ? drv_stop+0x1eb/0x210 [mac80211]
> ieee80211_do_stop+0x5ba/0x850 [mac80211]
> ieee80211_stop+0x51/0x180 [mac80211]
> __dev_close_many+0xb3/0x130
> dev_close_many+0xa3/0x180
> ? lock_release+0xde/0x420
> dev_close.part.147+0x5f/0xa0
> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
> ieee80211_restart_work+0xf9/0x130 [mac80211]
> process_scheduled_works+0x377/0x6f0
>
> The sequence of WARN_ON is:
> Thread 1:
> -Firmware crash calls ath12k_core_reset().
> -Call ieee80211_restart_hw() inside
> ath12k_core_post_reconfigure_recovery() which schedules worker
> for both hardware.
> -Wait for completion of ab->recovery_start.
>
> Thread 2 (worker thread):
> -One hardware acquires rtnl_lock() inside ieee80211_restart_hw() and
> calls ath12k_mac_wait_reconfigure() into ath12k_mac_op_start().
> -Hardware is waiting for ab->reconfigure_complete but at this time
> recovery_start_count value is 1 because another worker thread
> (local->restart_work) is still waiting for rtnl_lock().
> recovery_start_count is not equal to number of radios
> (2 in split-phy). So ab->recovery_start complete does not set
> due to this, thread 1 is still waiting and not able to perform
> hif power down up and firmware reload.
> -Wait timeout happens for ab->reconfigure_complete and comeback
> to caller (ath12k_mac_op_start()) and sends WMI command to
> crashed firmware and gets error.
> -This returns error to drv_start() and local->started is set to false.
> -Hardware calls cfg80211_shutdown_all_interfaces() after receiving error
> inside ieee80211_restart_work() and goes to drv_stop(), here we trigger
> WARN_ON as local->started is false.
>
> To fix this issue call ieee80211_restart_hw() after firmware has been
> reloaded. Now, each hardware can send WMI command to firmware
> successfully. With this fix we don't need to wait for
> ab->recovery_start completion so remove
> ath12k_mac_wait_reconfigure().
>
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00209-QCAHKSWPL_SILICONZ-1
> Tested-on: WCN7850 HW2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
>
> Signed-off-by: Aaradhana Sahu <quic_aarasahu@quicinc.com>
Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
From: Kalle Valo @ 2024-06-10 17:10 UTC (permalink / raw)
To: Aaradhana Sahu; +Cc: ath12k, linux-wireless
Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>
> ? __warn+0x7b/0x1a0
> ? drv_stop+0x1eb/0x210 [mac80211]
> ? report_bug+0x10b/0x200
> ? handle_bug+0x3f/0x70
> ? exc_invalid_op+0x13/0x60
> ? asm_exc_invalid_op+0x16/0x20
> ? drv_stop+0x1eb/0x210 [mac80211]
> ieee80211_do_stop+0x5ba/0x850 [mac80211]
> ieee80211_stop+0x51/0x180 [mac80211]
> __dev_close_many+0xb3/0x130
> dev_close_many+0xa3/0x180
> ? lock_release+0xde/0x420
> dev_close.part.147+0x5f/0xa0
> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
> ieee80211_restart_work+0xf9/0x130 [mac80211]
> process_scheduled_works+0x377/0x6f0
This is just the stack trace, not the full warning. If you send me the
full warning I can add it to the commit message. Also it would be always
good to identify what warning it is exactly as line numbers can change
etc.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
From: Aaradhana Sahu @ 2024-06-13 5:01 UTC (permalink / raw)
To: Kalle Valo; +Cc: ath12k, linux-wireless
On 6/10/2024 10:40 PM, Kalle Valo wrote:
> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
>
>> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>>
>> ? __warn+0x7b/0x1a0
>> ? drv_stop+0x1eb/0x210 [mac80211]
>> ? report_bug+0x10b/0x200
>> ? handle_bug+0x3f/0x70
>> ? exc_invalid_op+0x13/0x60
>> ? asm_exc_invalid_op+0x16/0x20
>> ? drv_stop+0x1eb/0x210 [mac80211]
>> ieee80211_do_stop+0x5ba/0x850 [mac80211]
>> ieee80211_stop+0x51/0x180 [mac80211]
>> __dev_close_many+0xb3/0x130
>> dev_close_many+0xa3/0x180
>> ? lock_release+0xde/0x420
>> dev_close.part.147+0x5f/0xa0
>> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
>> ieee80211_restart_work+0xf9/0x130 [mac80211]
>> process_scheduled_works+0x377/0x6f0
>
> This is just the stack trace, not the full warning. If you send me the
> full warning I can add it to the commit message. Also it would be always
> good to identify what warning it is exactly as line numbers can change
> etc.
>
Sure, the full warning is given below:
[ 364.713223] WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41 drv_stop+0xac/0xbc
[ 364.716875] Modules linked in: ath12k qmi_helpers
[ 364.724598] CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G D W 6.9.0-next-20240520-00113-gd981a3784e15 #39
[ 364.729378] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
[ 364.739965] Workqueue: events_freezable ieee80211_restart_work
[ 364.747082] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 364.752897] pc : drv_stop+0xac/0xbc
[ 364.759752] lr : ieee80211_stop_device+0x54/0x64
[ 364.763226] sp : ffff8000848dbb20
[ 364.768085] x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
[ 364.771301] x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
[ 364.778419] x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
[ 364.785537] x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
[ 364.792655] x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
[ 364.799774] x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
[ 364.806892] x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
[ 364.814009] x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
[ 364.821128] x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
[ 364.828246] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
[ 364.835364] Call trace:
[ 364.842478] drv_stop+0xac/0xbc
[ 364.844734] ieee80211_stop_device+0x54/0x64
[ 364.847860] ieee80211_do_stop+0x5a0/0x790
[ 364.852375] ieee80211_stop+0x4c/0x178
[ 364.856280] __dev_close_many+0xb0/0x150
[ 364.860014] dev_close_many+0x88/0x130
[ 364.864092] dev_close.part.171+0x44/0x74
[ 364.867653] dev_close+0x1c/0x28
[ 364.871732] cfg80211_shutdown_all_interfaces+0x44/0xfc
[ 364.875031] ieee80211_restart_work+0xfc/0x14c
[ 364.879979] process_scheduled_works+0x18c/0x2dc
[ 364.884494] worker_thread+0x13c/0x314
[ 364.889266] kthread+0x118/0x124
[ 364.892825] ret_from_fork+0x10/0x20
[ 364.896211] ---[ end trace 0000000000000000 ]---
* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
From: Kalle Valo @ 2024-06-13 6:07 UTC (permalink / raw)
To: Aaradhana Sahu; +Cc: ath12k, linux-wireless
Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
> On 6/10/2024 10:40 PM, Kalle Valo wrote:
>> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
>>
>>> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>>>
>>> ? __warn+0x7b/0x1a0
>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>> ? report_bug+0x10b/0x200
>>> ? handle_bug+0x3f/0x70
>>> ? exc_invalid_op+0x13/0x60
>>> ? asm_exc_invalid_op+0x16/0x20
>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>> ieee80211_do_stop+0x5ba/0x850 [mac80211]
>>> ieee80211_stop+0x51/0x180 [mac80211]
>>> __dev_close_many+0xb3/0x130
>>> dev_close_many+0xa3/0x180
>>> ? lock_release+0xde/0x420
>>> dev_close.part.147+0x5f/0xa0
>>> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
>>> ieee80211_restart_work+0xf9/0x130 [mac80211]
>>> process_scheduled_works+0x377/0x6f0
>>
>> This is just the stack trace, not the full warning. If you send me the
>> full warning I can add it to the commit message. Also it would be always
>> good to identify what warning it is exactly as line numbers can change
>> etc.
>>
>
> Sure, the full warning is given below:
>
> [ 364.713223] WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41
> drv_stop+0xac/0xbc
> [ 364.716875] Modules linked in: ath12k qmi_helpers
> [ 364.724598] CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G D W
> 6.9.0-next-20240520-00113-gd981a3784e15 #39
> [ 364.729378] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
> [ 364.739965] Workqueue: events_freezable ieee80211_restart_work
> [ 364.747082] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 364.752897] pc : drv_stop+0xac/0xbc
> [ 364.759752] lr : ieee80211_stop_device+0x54/0x64
> [ 364.763226] sp : ffff8000848dbb20
> [ 364.768085] x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
> [ 364.771301] x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
> [ 364.778419] x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
> [ 364.785537] x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
> [ 364.792655] x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
> [ 364.799774] x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
> [ 364.806892] x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
> [ 364.814009] x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
> [ 364.821128] x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
> [ 364.828246] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
> [ 364.835364] Call trace:
> [ 364.842478] drv_stop+0xac/0xbc
> [ 364.844734] ieee80211_stop_device+0x54/0x64
> [ 364.847860] ieee80211_do_stop+0x5a0/0x790
> [ 364.852375] ieee80211_stop+0x4c/0x178
> [ 364.856280] __dev_close_many+0xb0/0x150
> [ 364.860014] dev_close_many+0x88/0x130
> [ 364.864092] dev_close.part.171+0x44/0x74
> [ 364.867653] dev_close+0x1c/0x28
> [ 364.871732] cfg80211_shutdown_all_interfaces+0x44/0xfc
> [ 364.875031] ieee80211_restart_work+0xfc/0x14c
> [ 364.879979] process_scheduled_works+0x18c/0x2dc
> [ 364.884494] worker_thread+0x13c/0x314
> [ 364.889266] kthread+0x118/0x124
> [ 364.892825] ret_from_fork+0x10/0x20
> [ 364.896211] ---[ end trace 0000000000000000 ]---
Thanks, so I assume it's this check from drv_stop():
if (WARN_ON(!local->started))
return;
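For readers following along, the interaction between a failed start and the later stop can be sketched as below. This is a simplified userspace model based only on the description in this thread; struct model_local, model_drv_start() and model_drv_stop() are hypothetical stand-ins, and warn_count merely represents the WARN_ON() firing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant bit of mac80211 state. */
struct model_local {
	bool started;
	int warn_count; /* incremented where WARN_ON() would fire */
};

/* A failed driver start leaves the device marked as not started. */
static int model_drv_start(struct model_local *local, int driver_ret)
{
	assert(local != NULL);
	local->started = (driver_ret == 0);
	return driver_ret;
}

/* Stopping a device that was never started trips the warning. */
static void model_drv_stop(struct model_local *local)
{
	assert(local != NULL);
	if (!local->started) {
		local->warn_count++;
		return;
	}
	local->started = false;
}
```

In this model a start that returns an error followed by a stop increments warn_count once, while a successful start followed by a stop does not.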
* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
From: Aaradhana Sahu @ 2024-06-13 6:11 UTC (permalink / raw)
To: Kalle Valo; +Cc: ath12k, linux-wireless
On 6/13/2024 11:37 AM, Kalle Valo wrote:
> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
>
>> On 6/10/2024 10:40 PM, Kalle Valo wrote:
>>> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
>>>
>>>> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>>>>
>>>> ? __warn+0x7b/0x1a0
>>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>>> ? report_bug+0x10b/0x200
>>>> ? handle_bug+0x3f/0x70
>>>> ? exc_invalid_op+0x13/0x60
>>>> ? asm_exc_invalid_op+0x16/0x20
>>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>>> ieee80211_do_stop+0x5ba/0x850 [mac80211]
>>>> ieee80211_stop+0x51/0x180 [mac80211]
>>>> __dev_close_many+0xb3/0x130
>>>> dev_close_many+0xa3/0x180
>>>> ? lock_release+0xde/0x420
>>>> dev_close.part.147+0x5f/0xa0
>>>> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
>>>> ieee80211_restart_work+0xf9/0x130 [mac80211]
>>>> process_scheduled_works+0x377/0x6f0
>>>
>>> This is just the stack trace, not the full warning. If you send me the
>>> full warning I can add it to the commit message. Also it would be always
>>> good to identify what warning it is exactly as line numbers can change
>>> etc.
>>>
>>
>> Sure, the full warning is given below:
>>
>> [ 364.713223] WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41
>> drv_stop+0xac/0xbc
>> [ 364.716875] Modules linked in: ath12k qmi_helpers
>> [ 364.724598] CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G D W
>> 6.9.0-next-20240520-00113-gd981a3784e15 #39
>> [ 364.729378] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
>> [ 364.739965] Workqueue: events_freezable ieee80211_restart_work
>> [ 364.747082] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [ 364.752897] pc : drv_stop+0xac/0xbc
>> [ 364.759752] lr : ieee80211_stop_device+0x54/0x64
>> [ 364.763226] sp : ffff8000848dbb20
>> [ 364.768085] x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
>> [ 364.771301] x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
>> [ 364.778419] x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
>> [ 364.785537] x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
>> [ 364.792655] x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
>> [ 364.799774] x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
>> [ 364.806892] x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
>> [ 364.814009] x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
>> [ 364.821128] x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
>> [ 364.828246] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
>> [ 364.835364] Call trace:
>> [ 364.842478] drv_stop+0xac/0xbc
>> [ 364.844734] ieee80211_stop_device+0x54/0x64
>> [ 364.847860] ieee80211_do_stop+0x5a0/0x790
>> [ 364.852375] ieee80211_stop+0x4c/0x178
>> [ 364.856280] __dev_close_many+0xb0/0x150
>> [ 364.860014] dev_close_many+0x88/0x130
>> [ 364.864092] dev_close.part.171+0x44/0x74
>> [ 364.867653] dev_close+0x1c/0x28
>> [ 364.871732] cfg80211_shutdown_all_interfaces+0x44/0xfc
>> [ 364.875031] ieee80211_restart_work+0xfc/0x14c
>> [ 364.879979] process_scheduled_works+0x18c/0x2dc
>> [ 364.884494] worker_thread+0x13c/0x314
>> [ 364.889266] kthread+0x118/0x124
>> [ 364.892825] ret_from_fork+0x10/0x20
>> [ 364.896211] ---[ end trace 0000000000000000 ]---
>
> Thanks, so I assume it's this check from drv_stop():
>
> if (WARN_ON(!local->started))
> return;
>
Yes.
* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
From: Kalle Valo @ 2024-06-17 14:45 UTC (permalink / raw)
To: Aaradhana Sahu; +Cc: ath12k, linux-wireless, Aaradhana Sahu
Aaradhana Sahu <quic_aarasahu@quicinc.com> wrote:
> Whenever firmware is crashed in split-phy below WARN_ON() triggered:
>
> WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41 drv_stop+0xac/0xbc
> Modules linked in: ath12k qmi_helpers
> CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G D W 6.9.0-next-20240520-00113-gd981a3784e15 #39
> Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
> Workqueue: events_freezable ieee80211_restart_work
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : drv_stop+0xac/0xbc
> lr : ieee80211_stop_device+0x54/0x64
> sp : ffff8000848dbb20
> x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
> x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
> x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
> x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
> x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
> x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
> x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
> x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
> x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
> Call trace:
> drv_stop+0xac/0xbc
> ieee80211_stop_device+0x54/0x64
> ieee80211_do_stop+0x5a0/0x790
> ieee80211_stop+0x4c/0x178
> __dev_close_many+0xb0/0x150
> dev_close_many+0x88/0x130
> dev_close.part.171+0x44/0x74
> dev_close+0x1c/0x28
> cfg80211_shutdown_all_interfaces+0x44/0xfc
> ieee80211_restart_work+0xfc/0x14c
> process_scheduled_works+0x18c/0x2dc
> worker_thread+0x13c/0x314
> kthread+0x118/0x124
> ret_from_fork+0x10/0x20
> ---[ end trace 0000000000000000 ]---
>
> The warning in question is from drv_stop():
>
> if (WARN_ON(!local->started))
> return;
>
> The sequence of WARN_ON() is:
> Thread 1:
> -Firmware crash calls ath12k_core_reset().
> -Call ieee80211_restart_hw() inside
> ath12k_core_post_reconfigure_recovery() which schedules worker
> for both hardware.
> -Wait for completion of ab->recovery_start.
>
> Thread 2 (worker thread):
> -One hardware acquires rtnl_lock() inside ieee80211_restart_hw() and
> calls ath12k_mac_wait_reconfigure() into ath12k_mac_op_start().
> -Hardware is waiting for ab->reconfigure_complete but at this time
> recovery_start_count value is 1 because another worker thread
> (local->restart_work) is still waiting for rtnl_lock().
> recovery_start_count is not equal to number of radios
> (2 in split-phy). So ab->recovery_start complete does not set
> due to this, thread 1 is still waiting and not able to perform
> hif power down up and firmware reload.
> -Wait timeout happens for ab->reconfigure_complete and comeback
> to caller (ath12k_mac_op_start()) and sends WMI command to
> crashed firmware and gets error.
> -This returns error to drv_start() and local->started is set to false.
> -Hardware calls cfg80211_shutdown_all_interfaces() after receiving error
> inside ieee80211_restart_work() and goes to drv_stop(), here we trigger
> WARN_ON as local->started is false.
>
> To fix this issue call ieee80211_restart_hw() after firmware has been
> reloaded. Now, each hardware can send WMI command to firmware
> successfully. With this fix we don't need to wait for
> ab->recovery_start completion so remove
> ath12k_mac_wait_reconfigure().
>
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00209-QCAHKSWPL_SILICONZ-1
> Tested-on: WCN7850 HW2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
>
> Signed-off-by: Aaradhana Sahu <quic_aarasahu@quicinc.com>
> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Patch applied to ath-next branch of ath.git, thanks.
670d4949bc8e wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
--
https://patchwork.kernel.org/project/linux-wireless/patch/20240529034405.2863150-1-quic_aarasahu@quicinc.com/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches