public inbox for ath12k@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
@ 2024-05-29  3:44 Aaradhana Sahu
  2024-05-29 16:32 ` Jeff Johnson
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Aaradhana Sahu @ 2024-05-29  3:44 UTC (permalink / raw)
  To: ath12k; +Cc: linux-wireless, Aaradhana Sahu

Whenever firmware is crashed in split-phy below WARN_ON triggered:

? __warn+0x7b/0x1a0
? drv_stop+0x1eb/0x210 [mac80211]
? report_bug+0x10b/0x200
? handle_bug+0x3f/0x70
? exc_invalid_op+0x13/0x60
? asm_exc_invalid_op+0x16/0x20
? drv_stop+0x1eb/0x210 [mac80211]
ieee80211_do_stop+0x5ba/0x850 [mac80211]
ieee80211_stop+0x51/0x180 [mac80211]
__dev_close_many+0xb3/0x130
dev_close_many+0xa3/0x180
? lock_release+0xde/0x420
dev_close.part.147+0x5f/0xa0
cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
ieee80211_restart_work+0xf9/0x130 [mac80211]
process_scheduled_works+0x377/0x6f0

The sequence of WARN_ON is:
Thread 1:
-Firmware crash calls ath12k_core_reset().
-Call ieee80211_restart_hw() inside
 ath12k_core_post_reconfigure_recovery() which schedules worker
 for both hardware.
-Wait for completion of ab->recovery_start.

Thread 2 (worker thread):
-One hardware acquires rtnl_lock() inside ieee80211_restart_hw() and
 calls ath12k_mac_wait_reconfigure() into ath12k_mac_op_start().
-Hardware is waiting for ab->reconfigure_complete but at this time
 recovery_start_count value is 1 because another worker thread
 (local->restart_work) is still waiting for rtnl_lock().
 recovery_start_count is not equal to number of radios
 (2 in split-phy). So ab->recovery_start complete does not set
 due to this, thread 1 is still waiting and not able to perform
 hif power down up and firmware reload.
-Wait timeout happens for ab->reconfigure_complete and comeback
 to caller (ath12k_mac_op_start()) and sends WMI command to
 crashed firmware and gets error.
-This returns error to drv_start() and local->started is set to false.
-Hardware calls cfg80211_shutdown_all_interfaces() after receiving error
 inside ieee80211_restart_work() and goes to drv_stop(), here we trigger
 WARN_ON as local->started is false.

To fix this issue call ieee80211_restart_hw() after firmware has been
reloaded. Now, each hardware can send WMI command to firmware
successfully. With this fix we don't need to wait for
ab->recovery_start completion so remove
ath12k_mac_wait_reconfigure().

Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00209-QCAHKSWPL_SILICONZ-1
Tested-on: WCN7850 HW2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3

Signed-off-by: Aaradhana Sahu <quic_aarasahu@quicinc.com>
---
 drivers/net/wireless/ath/ath12k/core.c | 21 ++++++++-------------
 drivers/net/wireless/ath/ath12k/core.h |  3 ---
 drivers/net/wireless/ath/ath12k/mac.c  | 23 -----------------------
 3 files changed, 8 insertions(+), 39 deletions(-)

diff --git a/drivers/net/wireless/ath/ath12k/core.c b/drivers/net/wireless/ath/ath12k/core.c
index e7f628e935e4..9fac8b1b064e 100644
--- a/drivers/net/wireless/ath/ath12k/core.c
+++ b/drivers/net/wireless/ath/ath12k/core.c
@@ -1059,8 +1059,6 @@ static void ath12k_core_post_reconfigure_recovery(struct ath12k_base *ab)
 				mutex_unlock(&ar->conf_mutex);
 			}
 
-			/* Restart after all the link/radio halt */
-			ieee80211_restart_hw(ah->hw);
 			break;
 		case ATH12K_HW_STATE_OFF:
 			ath12k_warn(ab,
@@ -1087,7 +1085,8 @@ static void ath12k_core_post_reconfigure_recovery(struct ath12k_base *ab)
 static void ath12k_core_restart(struct work_struct *work)
 {
 	struct ath12k_base *ab = container_of(work, struct ath12k_base, restart_work);
-	int ret;
+	struct ath12k_hw *ah;
+	int ret, i;
 
 	ret = ath12k_core_reconfigure_on_crash(ab);
 	if (ret) {
@@ -1095,8 +1094,12 @@ static void ath12k_core_restart(struct work_struct *work)
 		return;
 	}
 
-	if (ab->is_reset)
-		complete_all(&ab->reconfigure_complete);
+	if (ab->is_reset) {
+		for (i = 0; i < ab->num_hw; i++) {
+			ah = ab->ah[i];
+			ieee80211_restart_hw(ah->hw);
+		}
+	}
 
 	complete(&ab->restart_completed);
 }
@@ -1150,20 +1153,14 @@ static void ath12k_core_reset(struct work_struct *work)
 	ath12k_dbg(ab, ATH12K_DBG_BOOT, "reset starting\n");
 
 	ab->is_reset = true;
-	atomic_set(&ab->recovery_start_count, 0);
-	reinit_completion(&ab->recovery_start);
 	atomic_set(&ab->recovery_count, 0);
 
 	ath12k_core_pre_reconfigure_recovery(ab);
 
-	reinit_completion(&ab->reconfigure_complete);
 	ath12k_core_post_reconfigure_recovery(ab);
 
 	ath12k_dbg(ab, ATH12K_DBG_BOOT, "waiting recovery start...\n");
 
-	time_left = wait_for_completion_timeout(&ab->recovery_start,
-						ATH12K_RECOVER_START_TIMEOUT_HZ);
-
 	ath12k_hif_irq_disable(ab);
 	ath12k_hif_ce_irq_disable(ab);
 
@@ -1246,8 +1243,6 @@ struct ath12k_base *ath12k_core_alloc(struct device *dev, size_t priv_size,
 	mutex_init(&ab->core_lock);
 	spin_lock_init(&ab->base_lock);
 	init_completion(&ab->reset_complete);
-	init_completion(&ab->reconfigure_complete);
-	init_completion(&ab->recovery_start);
 
 	INIT_LIST_HEAD(&ab->peers);
 	init_waitqueue_head(&ab->peer_mapping_wq);
diff --git a/drivers/net/wireless/ath/ath12k/core.h b/drivers/net/wireless/ath/ath12k/core.h
index 7d20b09c52e6..96fafa0e05dc 100644
--- a/drivers/net/wireless/ath/ath12k/core.h
+++ b/drivers/net/wireless/ath/ath12k/core.h
@@ -846,11 +846,8 @@ struct ath12k_base {
 	struct work_struct reset_work;
 	atomic_t reset_count;
 	atomic_t recovery_count;
-	atomic_t recovery_start_count;
 	bool is_reset;
 	struct completion reset_complete;
-	struct completion reconfigure_complete;
-	struct completion recovery_start;
 	/* continuous recovery fail count */
 	atomic_t fail_cont_count;
 	unsigned long reset_fail_timeout;
diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c
index 784964ae03ec..33616ab795af 100644
--- a/drivers/net/wireless/ath/ath12k/mac.c
+++ b/drivers/net/wireless/ath/ath12k/mac.c
@@ -5834,28 +5834,6 @@ static int ath12k_mac_config_mon_status_default(struct ath12k *ar, bool enable)
 	/* TODO: Need to support new monitor mode */
 }
 
-static void ath12k_mac_wait_reconfigure(struct ath12k_base *ab)
-{
-	int recovery_start_count;
-
-	if (!ab->is_reset)
-		return;
-
-	recovery_start_count = atomic_inc_return(&ab->recovery_start_count);
-
-	ath12k_dbg(ab, ATH12K_DBG_MAC, "recovery start count %d\n", recovery_start_count);
-
-	if (recovery_start_count == ab->num_radios) {
-		complete(&ab->recovery_start);
-		ath12k_dbg(ab, ATH12K_DBG_MAC, "recovery started success\n");
-	}
-
-	ath12k_dbg(ab, ATH12K_DBG_MAC, "waiting reconfigure...\n");
-
-	wait_for_completion_timeout(&ab->reconfigure_complete,
-				    ATH12K_RECONFIGURE_TIMEOUT_HZ);
-}
-
 static int ath12k_mac_start(struct ath12k *ar)
 {
 	struct ath12k_hw *ah = ar->ah;
@@ -5987,7 +5965,6 @@ static int ath12k_mac_op_start(struct ieee80211_hw *hw)
 		break;
 	case ATH12K_HW_STATE_RESTARTING:
 		ah->state = ATH12K_HW_STATE_RESTARTED;
-		ath12k_mac_wait_reconfigure(ah->ab);
 		break;
 	case ATH12K_HW_STATE_RESTARTED:
 	case ATH12K_HW_STATE_WEDGED:

base-commit: 0442aec67bbd9cbf93bf7f6ee59c9bf5348b9249
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
  2024-05-29  3:44 [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy Aaradhana Sahu
@ 2024-05-29 16:32 ` Jeff Johnson
  2024-06-10 17:10 ` Kalle Valo
  2024-06-17 14:45 ` Kalle Valo
  2 siblings, 0 replies; 7+ messages in thread
From: Jeff Johnson @ 2024-05-29 16:32 UTC (permalink / raw)
  To: Aaradhana Sahu, ath12k; +Cc: linux-wireless

On 5/28/2024 8:44 PM, Aaradhana Sahu wrote:
> Whenever firmware is crashed in split-phy below WARN_ON triggered:
> 
> ? __warn+0x7b/0x1a0
> ? drv_stop+0x1eb/0x210 [mac80211]
> ? report_bug+0x10b/0x200
> ? handle_bug+0x3f/0x70
> ? exc_invalid_op+0x13/0x60
> ? asm_exc_invalid_op+0x16/0x20
> ? drv_stop+0x1eb/0x210 [mac80211]
> ieee80211_do_stop+0x5ba/0x850 [mac80211]
> ieee80211_stop+0x51/0x180 [mac80211]
> __dev_close_many+0xb3/0x130
> dev_close_many+0xa3/0x180
> ? lock_release+0xde/0x420
> dev_close.part.147+0x5f/0xa0
> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
> ieee80211_restart_work+0xf9/0x130 [mac80211]
> process_scheduled_works+0x377/0x6f0
> 
> The sequence of WARN_ON is:
> Thread 1:
> -Firmware crash calls ath12k_core_reset().
> -Call ieee80211_restart_hw() inside
>  ath12k_core_post_reconfigure_recovery() which schedules worker
>  for both hardware.
> -Wait for completion of ab->recovery_start.
> 
> Thread 2 (worker thread):
> -One hardware acquires rtnl_lock() inside ieee80211_restart_hw() and
>  calls ath12k_mac_wait_reconfigure() into ath12k_mac_op_start().
> -Hardware is waiting for ab->reconfigure_complete but at this time
>  recovery_start_count value is 1 because another worker thread
>  (local->restart_work) is still waiting for rtnl_lock().
>  recovery_start_count is not equal to number of radios
>  (2 in split-phy). So ab->recovery_start complete does not set
>  due to this, thread 1 is still waiting and not able to perform
>  hif power down up and firmware reload.
> -Wait timeout happens for ab->reconfigure_complete and comeback
>  to caller (ath12k_mac_op_start()) and sends WMI command to
>  crashed firmware and gets error.
> -This returns error to drv_start() and local->started is set to false.
> -Hardware calls cfg80211_shutdown_all_interfaces() after receiving error
>  inside ieee80211_restart_work() and goes to drv_stop(), here we trigger
>  WARN_ON as local->started is false.
> 
> To fix this issue call ieee80211_restart_hw() after firmware has been
> reloaded. Now, each hardware can send WMI command to firmware
> successfully. With this fix we don't need to wait for
> ab->recovery_start completion so remove
> ath12k_mac_wait_reconfigure().
> 
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00209-QCAHKSWPL_SILICONZ-1
> Tested-on: WCN7850 HW2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
> 
> Signed-off-by: Aaradhana Sahu <quic_aarasahu@quicinc.com>

Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
  2024-05-29  3:44 [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy Aaradhana Sahu
  2024-05-29 16:32 ` Jeff Johnson
@ 2024-06-10 17:10 ` Kalle Valo
  2024-06-13  5:01   ` Aaradhana Sahu
  2024-06-17 14:45 ` Kalle Valo
  2 siblings, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2024-06-10 17:10 UTC (permalink / raw)
  To: Aaradhana Sahu; +Cc: ath12k, linux-wireless

Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:

> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>
> ? __warn+0x7b/0x1a0
> ? drv_stop+0x1eb/0x210 [mac80211]
> ? report_bug+0x10b/0x200
> ? handle_bug+0x3f/0x70
> ? exc_invalid_op+0x13/0x60
> ? asm_exc_invalid_op+0x16/0x20
> ? drv_stop+0x1eb/0x210 [mac80211]
> ieee80211_do_stop+0x5ba/0x850 [mac80211]
> ieee80211_stop+0x51/0x180 [mac80211]
> __dev_close_many+0xb3/0x130
> dev_close_many+0xa3/0x180
> ? lock_release+0xde/0x420
> dev_close.part.147+0x5f/0xa0
> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
> ieee80211_restart_work+0xf9/0x130 [mac80211]
> process_scheduled_works+0x377/0x6f0

This is just the stack trace, not the full warning. If you send me the
full warning I can add it to the commit message. Also it would be always
good to identify what warning it is exactly as line numbers can change
etc.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
  2024-06-10 17:10 ` Kalle Valo
@ 2024-06-13  5:01   ` Aaradhana Sahu
  2024-06-13  6:07     ` Kalle Valo
  0 siblings, 1 reply; 7+ messages in thread
From: Aaradhana Sahu @ 2024-06-13  5:01 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath12k, linux-wireless



On 6/10/2024 10:40 PM, Kalle Valo wrote:
> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
> 
>> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>>
>> ? __warn+0x7b/0x1a0
>> ? drv_stop+0x1eb/0x210 [mac80211]
>> ? report_bug+0x10b/0x200
>> ? handle_bug+0x3f/0x70
>> ? exc_invalid_op+0x13/0x60
>> ? asm_exc_invalid_op+0x16/0x20
>> ? drv_stop+0x1eb/0x210 [mac80211]
>> ieee80211_do_stop+0x5ba/0x850 [mac80211]
>> ieee80211_stop+0x51/0x180 [mac80211]
>> __dev_close_many+0xb3/0x130
>> dev_close_many+0xa3/0x180
>> ? lock_release+0xde/0x420
>> dev_close.part.147+0x5f/0xa0
>> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
>> ieee80211_restart_work+0xf9/0x130 [mac80211]
>> process_scheduled_works+0x377/0x6f0
> 
> This is just the stack trace, not the full warning. If you send me the
> full warning I can add it to the commit message. Also it would be always
> good to identify what warning it is exactly as line numbers can change
> etc.
> 

Sure, the full warning is given below:

[  364.713223] WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41 drv_stop+0xac/0xbc
[  364.716875] Modules linked in: ath12k qmi_helpers
[  364.724598] CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G      D W          6.9.0-next-20240520-00113-gd981a3784e15 #39
[  364.729378] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
[  364.739965] Workqueue: events_freezable ieee80211_restart_work
[  364.747082] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  364.752897] pc : drv_stop+0xac/0xbc
[  364.759752] lr : ieee80211_stop_device+0x54/0x64
[  364.763226] sp : ffff8000848dbb20
[  364.768085] x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
[  364.771301] x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
[  364.778419] x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
[  364.785537] x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
[  364.792655] x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
[  364.799774] x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
[  364.806892] x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
[  364.814009] x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
[  364.821128] x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
[  364.828246] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
[  364.835364] Call trace:
[  364.842478]  drv_stop+0xac/0xbc
[  364.844734]  ieee80211_stop_device+0x54/0x64
[  364.847860]  ieee80211_do_stop+0x5a0/0x790
[  364.852375]  ieee80211_stop+0x4c/0x178
[  364.856280]  __dev_close_many+0xb0/0x150
[  364.860014]  dev_close_many+0x88/0x130
[  364.864092]  dev_close.part.171+0x44/0x74
[  364.867653]  dev_close+0x1c/0x28
[  364.871732]  cfg80211_shutdown_all_interfaces+0x44/0xfc
[  364.875031]  ieee80211_restart_work+0xfc/0x14c
[  364.879979]  process_scheduled_works+0x18c/0x2dc
[  364.884494]  worker_thread+0x13c/0x314
[  364.889266]  kthread+0x118/0x124
[  364.892825]  ret_from_fork+0x10/0x20
[  364.896211] ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
  2024-06-13  5:01   ` Aaradhana Sahu
@ 2024-06-13  6:07     ` Kalle Valo
  2024-06-13  6:11       ` Aaradhana Sahu
  0 siblings, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2024-06-13  6:07 UTC (permalink / raw)
  To: Aaradhana Sahu; +Cc: ath12k, linux-wireless

Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:

> On 6/10/2024 10:40 PM, Kalle Valo wrote:
>> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
>> 
>>> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>>>
>>> ? __warn+0x7b/0x1a0
>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>> ? report_bug+0x10b/0x200
>>> ? handle_bug+0x3f/0x70
>>> ? exc_invalid_op+0x13/0x60
>>> ? asm_exc_invalid_op+0x16/0x20
>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>> ieee80211_do_stop+0x5ba/0x850 [mac80211]
>>> ieee80211_stop+0x51/0x180 [mac80211]
>>> __dev_close_many+0xb3/0x130
>>> dev_close_many+0xa3/0x180
>>> ? lock_release+0xde/0x420
>>> dev_close.part.147+0x5f/0xa0
>>> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
>>> ieee80211_restart_work+0xf9/0x130 [mac80211]
>>> process_scheduled_works+0x377/0x6f0
>> 
>> This is just the stack trace, not the full warning. If you send me the
>> full warning I can add it to the commit message. Also it would be always
>> good to identify what warning it is exactly as line numbers can change
>> etc.
>> 
>
> Sure, the full warning is given below:
>
> [ 364.713223] WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41
> drv_stop+0xac/0xbc
> [  364.716875] Modules linked in: ath12k qmi_helpers
> [ 364.724598] CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G D W
> 6.9.0-next-20240520-00113-gd981a3784e15 #39
> [  364.729378] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
> [  364.739965] Workqueue: events_freezable ieee80211_restart_work
> [  364.747082] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  364.752897] pc : drv_stop+0xac/0xbc
> [  364.759752] lr : ieee80211_stop_device+0x54/0x64
> [  364.763226] sp : ffff8000848dbb20
> [  364.768085] x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
> [  364.771301] x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
> [  364.778419] x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
> [  364.785537] x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
> [  364.792655] x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
> [  364.799774] x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
> [  364.806892] x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
> [  364.814009] x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
> [  364.821128] x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
> [  364.828246] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
> [  364.835364] Call trace:
> [  364.842478]  drv_stop+0xac/0xbc
> [  364.844734]  ieee80211_stop_device+0x54/0x64
> [  364.847860]  ieee80211_do_stop+0x5a0/0x790
> [  364.852375]  ieee80211_stop+0x4c/0x178
> [  364.856280]  __dev_close_many+0xb0/0x150
> [  364.860014]  dev_close_many+0x88/0x130
> [  364.864092]  dev_close.part.171+0x44/0x74
> [  364.867653]  dev_close+0x1c/0x28
> [  364.871732]  cfg80211_shutdown_all_interfaces+0x44/0xfc
> [  364.875031]  ieee80211_restart_work+0xfc/0x14c
> [  364.879979]  process_scheduled_works+0x18c/0x2dc
> [  364.884494]  worker_thread+0x13c/0x314
> [  364.889266]  kthread+0x118/0x124
> [  364.892825]  ret_from_fork+0x10/0x20
> [  364.896211] ---[ end trace 0000000000000000 ]---

Thanks, so I assume it's this check from drv_stop():

	if (WARN_ON(!local->started))
		return;

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
  2024-06-13  6:07     ` Kalle Valo
@ 2024-06-13  6:11       ` Aaradhana Sahu
  0 siblings, 0 replies; 7+ messages in thread
From: Aaradhana Sahu @ 2024-06-13  6:11 UTC (permalink / raw)
  To: Kalle Valo; +Cc: ath12k, linux-wireless



On 6/13/2024 11:37 AM, Kalle Valo wrote:
> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
> 
>> On 6/10/2024 10:40 PM, Kalle Valo wrote:
>>> Aaradhana Sahu <quic_aarasahu@quicinc.com> writes:
>>>
>>>> Whenever firmware is crashed in split-phy below WARN_ON triggered:
>>>>
>>>> ? __warn+0x7b/0x1a0
>>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>>> ? report_bug+0x10b/0x200
>>>> ? handle_bug+0x3f/0x70
>>>> ? exc_invalid_op+0x13/0x60
>>>> ? asm_exc_invalid_op+0x16/0x20
>>>> ? drv_stop+0x1eb/0x210 [mac80211]
>>>> ieee80211_do_stop+0x5ba/0x850 [mac80211]
>>>> ieee80211_stop+0x51/0x180 [mac80211]
>>>> __dev_close_many+0xb3/0x130
>>>> dev_close_many+0xa3/0x180
>>>> ? lock_release+0xde/0x420
>>>> dev_close.part.147+0x5f/0xa0
>>>> cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
>>>> ieee80211_restart_work+0xf9/0x130 [mac80211]
>>>> process_scheduled_works+0x377/0x6f0
>>>
>>> This is just the stack trace, not the full warning. If you send me the
>>> full warning I can add it to the commit message. Also it would be always
>>> good to identify what warning it is exactly as line numbers can change
>>> etc.
>>>
>>
>> Sure, the full warning is given below:
>>
>> [ 364.713223] WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41
>> drv_stop+0xac/0xbc
>> [  364.716875] Modules linked in: ath12k qmi_helpers
>> [ 364.724598] CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G D W
>> 6.9.0-next-20240520-00113-gd981a3784e15 #39
>> [  364.729378] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
>> [  364.739965] Workqueue: events_freezable ieee80211_restart_work
>> [  364.747082] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [  364.752897] pc : drv_stop+0xac/0xbc
>> [  364.759752] lr : ieee80211_stop_device+0x54/0x64
>> [  364.763226] sp : ffff8000848dbb20
>> [  364.768085] x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
>> [  364.771301] x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
>> [  364.778419] x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
>> [  364.785537] x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
>> [  364.792655] x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
>> [  364.799774] x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
>> [  364.806892] x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
>> [  364.814009] x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
>> [  364.821128] x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
>> [  364.828246] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
>> [  364.835364] Call trace:
>> [  364.842478]  drv_stop+0xac/0xbc
>> [  364.844734]  ieee80211_stop_device+0x54/0x64
>> [  364.847860]  ieee80211_do_stop+0x5a0/0x790
>> [  364.852375]  ieee80211_stop+0x4c/0x178
>> [  364.856280]  __dev_close_many+0xb0/0x150
>> [  364.860014]  dev_close_many+0x88/0x130
>> [  364.864092]  dev_close.part.171+0x44/0x74
>> [  364.867653]  dev_close+0x1c/0x28
>> [  364.871732]  cfg80211_shutdown_all_interfaces+0x44/0xfc
>> [  364.875031]  ieee80211_restart_work+0xfc/0x14c
>> [  364.879979]  process_scheduled_works+0x18c/0x2dc
>> [  364.884494]  worker_thread+0x13c/0x314
>> [  364.889266]  kthread+0x118/0x124
>> [  364.892825]  ret_from_fork+0x10/0x20
>> [  364.896211] ---[ end trace 0000000000000000 ]---
> 
> Thanks, so I assume it's this check from drv_stop():
> 
> 	if (WARN_ON(!local->started))
> 		return;
> 

Yes.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy
  2024-05-29  3:44 [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy Aaradhana Sahu
  2024-05-29 16:32 ` Jeff Johnson
  2024-06-10 17:10 ` Kalle Valo
@ 2024-06-17 14:45 ` Kalle Valo
  2 siblings, 0 replies; 7+ messages in thread
From: Kalle Valo @ 2024-06-17 14:45 UTC (permalink / raw)
  To: Aaradhana Sahu; +Cc: ath12k, linux-wireless, Aaradhana Sahu

Aaradhana Sahu <quic_aarasahu@quicinc.com> wrote:

> Whenever firmware is crashed in split-phy below WARN_ON() triggered:
> 
> WARNING: CPU: 3 PID: 82 at net/mac80211/driver-ops.c:41 drv_stop+0xac/0xbc
> Modules linked in: ath12k qmi_helpers
> CPU: 3 PID: 82 Comm: kworker/3:2 Tainted: G      D W          6.9.0-next-20240520-00113-gd981a3784e15 #39
> Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C9 (DT)
> Workqueue: events_freezable ieee80211_restart_work
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : drv_stop+0xac/0xbc
> lr : ieee80211_stop_device+0x54/0x64
> sp : ffff8000848dbb20
> x29: ffff8000848dbb20 x28: 0000000000000790 x27: ffff000014d78900
> x26: ffff000014d791f8 x25: ffff000007f0d9b0 x24: 0000000000000018
> x23: 0000000000000001 x22: 0000000000000000 x21: ffff000014d78e10
> x20: ffff800081dc0000 x19: ffff000014d78900 x18: ffffffffffffffff
> x17: ffff7fffbca84000 x16: ffff800083fe0000 x15: ffff800081dc0b48
> x14: 0000000000000076 x13: 0000000000000076 x12: 0000000000000001
> x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff8000848db980
> x8 : ffff000000dddfc0 x7 : 0000000000000400 x6 : ffff800083b012d8
> x5 : ffff800083b012d8 x4 : 0000000000000000 x3 : ffff000014d78398
> x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000014d78900
> Call trace:
>  drv_stop+0xac/0xbc
>  ieee80211_stop_device+0x54/0x64
>  ieee80211_do_stop+0x5a0/0x790
>  ieee80211_stop+0x4c/0x178
>  __dev_close_many+0xb0/0x150
>  dev_close_many+0x88/0x130
>  dev_close.part.171+0x44/0x74
>  dev_close+0x1c/0x28
>  cfg80211_shutdown_all_interfaces+0x44/0xfc
>  ieee80211_restart_work+0xfc/0x14c
>  process_scheduled_works+0x18c/0x2dc
>  worker_thread+0x13c/0x314
>  kthread+0x118/0x124
>  ret_from_fork+0x10/0x20
> ---[ end trace 0000000000000000 ]---
> 
> The warning in question is from drv_stop():
> 
>         if (WARN_ON(!local->started))
>                 return;
> 
> The sequence of WARN_ON() is:
> Thread 1:
> -Firmware crash calls ath12k_core_reset().
> -Call ieee80211_restart_hw() inside
>  ath12k_core_post_reconfigure_recovery() which schedules worker
>  for both hardware.
> -Wait for completion of ab->recovery_start.
> 
> Thread 2 (worker thread):
> -One hardware acquires rtnl_lock() inside ieee80211_restart_hw() and
>  calls ath12k_mac_wait_reconfigure() into ath12k_mac_op_start().
> -Hardware is waiting for ab->reconfigure_complete but at this time
>  recovery_start_count value is 1 because another worker thread
>  (local->restart_work) is still waiting for rtnl_lock().
>  recovery_start_count is not equal to number of radios
>  (2 in split-phy). So ab->recovery_start complete does not set
>  due to this, thread 1 is still waiting and not able to perform
>  hif power down up and firmware reload.
> -Wait timeout happens for ab->reconfigure_complete and comeback
>  to caller (ath12k_mac_op_start()) and sends WMI command to
>  crashed firmware and gets error.
> -This returns error to drv_start() and local->started is set to false.
> -Hardware calls cfg80211_shutdown_all_interfaces() after receiving error
>  inside ieee80211_restart_work() and goes to drv_stop(), here we trigger
>  WARN_ON as local->started is false.
> 
> To fix this issue call ieee80211_restart_hw() after firmware has been
> reloaded. Now, each hardware can send WMI command to firmware
> successfully. With this fix we don't need to wait for
> ab->recovery_start completion so remove
> ath12k_mac_wait_reconfigure().
> 
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00209-QCAHKSWPL_SILICONZ-1
> Tested-on: WCN7850 HW2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
> 
> Signed-off-by: Aaradhana Sahu <quic_aarasahu@quicinc.com>
> Acked-by: Jeff Johnson <quic_jjohnson@quicinc.com>
> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>

Patch applied to ath-next branch of ath.git, thanks.

670d4949bc8e wifi: ath12k: Fix WARN_ON during firmware crash in split-phy

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20240529034405.2863150-1-quic_aarasahu@quicinc.com/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-06-17 14:45 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-29  3:44 [PATCH] wifi: ath12k: Fix WARN_ON during firmware crash in split-phy Aaradhana Sahu
2024-05-29 16:32 ` Jeff Johnson
2024-06-10 17:10 ` Kalle Valo
2024-06-13  5:01   ` Aaradhana Sahu
2024-06-13  6:07     ` Kalle Valo
2024-06-13  6:11       ` Aaradhana Sahu
2024-06-17 14:45 ` Kalle Valo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox