The Linux Kernel Mailing List
From: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
To: Baochen Qiang <baochen.qiang@oss.qualcomm.com>,
	Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Cc: ath11k@lists.infradead.org, jjohnson@kernel.org,
	linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH] wifi: ath11k: fix warning when unbinding
Date: Fri, 15 May 2026 07:57:34 +0530
Message-ID: <fdff6264-9c35-4c77-bab2-6db9125d77af@oss.qualcomm.com>
In-Reply-To: <c2523379-ab12-47e1-a0d0-ef6073deaf11@oss.qualcomm.com>

On 5/14/2026 1:45 PM, Baochen Qiang wrote:
> 
> 
> On 5/14/2026 2:55 PM, Rameshkumar Sundaram wrote:
>> On 5/14/2026 11:48 AM, Jose Ignacio Tornos Martinez wrote:
>>> Hello Rameshkumar,
>>>
>>>> I agree that setting tx_status to NULL makes ath11k_dp_free() more
>>>> defensive, and it matches the ath12k fix.
>>> Ok, I agree too.
>>>
>>>> However, I am still wondering how the second ath11k_dp_free() is reached
>>>> if ATH11K_FLAG_QMI_FAIL is set.
>>>>
>>>> In ath11k_pci_remove(), when ATH11K_FLAG_QMI_FAIL is set, we take the
>>>> qmi_fail path and skip ath11k_core_deinit(). So the normal remove path:
>>>>
>>>>       ath11k_pci_remove()
>>>>         ath11k_core_deinit()
>>>>           ath11k_core_soc_destroy()
>>>>             ath11k_dp_free()
>>>>
>>>> should not run.
>>>>
>>>> So if the double free is still reproducible with QMI_FAIL set (with the
>>>> change I proposed), either the flag is not actually set in this failure
>>>> case, or there is another path calling ath11k_dp_free()?
>>> Let me try to clarify the issue more.
>>> There are two error actions:
>>> - First, the earlier error. I reproduce it as I described before: running
>>> the default upstream kernel in a VM (with this card attached via PCI
>>> passthrough), where it always fails. Here are the logs for this situation:
>>> [   15.906564] ath11k_pci 0000:07:00.0: BAR 0 [mem 0xfdc00000-0xfddfffff 64bit]: assigned
>>> [   15.926520] ath11k_pci 0000:07:00.0: MSI vectors: 32
>>> [   15.928572] ath11k_pci 0000:07:00.0: wcn6855 hw2.0
>>> [   16.984192] ath11k_pci 0000:07:00.0: chip_id 0x2 chip_family 0xb board_id 0xff soc_id 0x400c0200
>>> [   16.984351] ath11k_pci 0000:07:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
>>> [   18.186971] ath11k_pci 0000:07:00.0: failed to receive control response completion, polling..
>>> [   19.211036] ath11k_pci 0000:07:00.0: Service connect timeout
>>> [   19.211815] ath11k_pci 0000:07:00.0: failed to connect to HTT: -110
>>> [   19.214181] ath11k_pci 0000:07:00.0: failed to start core: -110
>>> [   19.531989] ath11k_pci 0000:07:00.0: firmware crashed: MHI_CB_EE_RDDM
>>> [   19.532930] ath11k_pci 0000:07:00.0: ignore reset dev flags 0xc000
>>> [   29.259157] ath11k_pci 0000:07:00.0: failed to wait wlan mode request (mode 4): -110
>>> [   29.259229] ath11k_pci 0000:07:00.0: qmi failed to send wlan mode off: -110
>>> - Second, after this, I triggered the unbind (ath11k_pci) and got the
>>> warning. Let me extend the stack trace here:
>>> [   24.238198]  ? free_large_kmalloc+0x57/0x90
>>> [   24.238199]  ? report_bug+0x16b/0x180
>>> [   24.238210]  ? handle_bug+0x3c/0x70
>>> [   24.238218]  ? exc_invalid_op+0x14/0x70
>>> [   24.238218]  ? asm_exc_invalid_op+0x16/0x20
>>> [   24.238224]  ? free_large_kmalloc+0x57/0x90
>>> [   24.238227]  ath11k_dp_free+0x99/0xb0 [ath11k]
>>> [   24.238275]  ath11k_core_deinit+0x12b/0x1a0 [ath11k]
>>> [   24.238287]  ath11k_pci_remove+0x7b/0x120 [ath11k_pci]
>>> [   24.238294]  pci_device_remove+0x3e/0xb0
>>> [   24.238304]  device_release_driver_internal+0x193/0x200
>>> [   24.238315]  unbind_store+0x9d/0xb0
>>> [   24.238320]  kernfs_fop_write_iter+0x13a/0x1d0
>>> [   24.238330]  vfs_write+0x32e/0x470
>>> [   24.238335]  ksys_write+0x5f/0xe0
>>> [   24.238336]  do_syscall_64+0x5f/0xe0
>>> Very easy to reproduce.
>>>
>>
>>
>> Thanks much for the logs, that makes sense. The timestamps explain why my earlier
>> reasoning did not match the trace: unbind reaches ath11k_pci_remove() before
>> ATH11K_FLAG_QMI_FAIL is set by the QMI event worker, because the worker is still
>> held up waiting on the wlan mode off QMI request.
> 
> how could QMI worker set this flag? the first failure happens in
> ath12k_core_qmi_firmware_ready() and upon this failure the QMI worker just break out
> without setting any flag, no?
> 


You mean ath11k_core_qmi_firmware_ready() (ath11k, not ath12k)? Yes, in
ToT it breaks out without setting any flag. That is why, earlier in this
mail thread, I proposed setting ATH11K_FLAG_QMI_FAIL on failure in the
ATH11K_QMI_EVENT_FW_READY case (similar to the
ATH11K_QMI_EVENT_FW_INIT_DONE case).
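
To illustrate the two ideas discussed in this thread, here is a minimal
userspace sketch, not the actual driver code. It models (a) marking the
device as QMI-failed when the FW_READY handling fails, so that remove
skips the core deinit path, and (b) clearing tx_status after freeing it,
so that a second free is harmless even if remove runs before the flag is
set. All model_* names are hypothetical stand-ins, and the sketch assumes
the failure path has already released the DP buffers once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver state; not the real ath11k struct. */
struct model_dev {
	bool qmi_fail;   /* models ATH11K_FLAG_QMI_FAIL */
	void *tx_status; /* models the buffer released by ath11k_dp_free() */
	int free_calls;  /* counts real frees, to detect a double free */
};

enum model_event { MODEL_EVENT_FW_INIT_DONE, MODEL_EVENT_FW_READY };

/* Models probe: allocate the buffer that dp_free later releases. */
static struct model_dev model_probe(void)
{
	struct model_dev dev = {
		.qmi_fail = false,
		.tx_status = malloc(64),
		.free_calls = 0,
	};

	assert(dev.tx_status);
	return dev;
}

/* Models ath11k_dp_free(): free once, then clear the pointer so that a
 * repeated call becomes a no-op instead of a double free. */
static void model_dp_free(struct model_dev *dev)
{
	if (!dev->tx_status)
		return;
	free(dev->tx_status);
	dev->tx_status = NULL;
	dev->free_calls++;
}

/* Models the QMI event worker. Assumption: on failure the DP state is
 * released, and (as proposed) the fail flag is set for both events. */
static void model_qmi_event(struct model_dev *dev, enum model_event ev, int ret)
{
	if (ret == 0)
		return;

	switch (ev) {
	case MODEL_EVENT_FW_INIT_DONE:
	case MODEL_EVENT_FW_READY:
		model_dp_free(dev);
		dev->qmi_fail = true;
		break;
	}
}

/* Models ath11k_pci_remove(): returns true if the core deinit path ran. */
static bool model_pci_remove(struct model_dev *dev)
{
	if (dev->qmi_fail)
		return false; /* qmi_fail path: skip core deinit */

	model_dp_free(dev); /* core deinit -> soc destroy -> dp free */
	return true;
}
```

In this model, calling model_qmi_event() with MODEL_EVENT_FW_READY and a
non-zero ret, followed by model_pci_remove(), frees the buffer only once.
If remove wins the race and runs before the flag is set, the cleared
pointer still turns the second model_dp_free() into a no-op.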


--
Ramesh