From: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
To: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>,
	Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Cc: ath11k@lists.infradead.org, jjohnson@kernel.org,
	linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH] wifi: ath11k: fix warning when unbinding
Date: Fri, 15 May 2026 14:39:28 +0800
Message-ID: <5bb180ea-d970-4cf0-8d01-620cbdb7be9e@oss.qualcomm.com>
In-Reply-To: <fdff6264-9c35-4c77-bab2-6db9125d77af@oss.qualcomm.com>

On 5/15/2026 10:27 AM, Rameshkumar Sundaram wrote:
> On 5/14/2026 1:45 PM, Baochen Qiang wrote:
>>
>>
>> On 5/14/2026 2:55 PM, Rameshkumar Sundaram wrote:
>>> On 5/14/2026 11:48 AM, Jose Ignacio Tornos Martinez wrote:
>>>> Hello Rameshkumar,
>>>>
>>>>> I agree that setting tx_status to NULL makes ath11k_dp_free() more
>>>>> defensive, and it matches the ath12k fix.
>>>> Ok, I agree too.
>>>>
>>>>> However, i am still wondering how the second ath11k_dp_free() is reached
>>>>> if ATH11K_FLAG_QMI_FAIL is set.
>>>>>
>>>>> In ath11k_pci_remove(), when ATH11K_FLAG_QMI_FAIL is set, we take the
>>>>> qmi_fail path and skip ath11k_core_deinit(). So the normal remove path:
>>>>>
>>>>> ath11k_pci_remove()
>>>>>   ath11k_core_deinit()
>>>>>     ath11k_core_soc_destroy()
>>>>>       ath11k_dp_free()
>>>>>
>>>>> should not run.
>>>>>
>>>>> So if the double free is still reproducible with QMI_FAIL set (with the
>>>>> change i proposed), either the flag is not actually set in this failure
>>>>> case, or there is another path calling ath11k_dp_free() ?
>>>> Let me try to clarify the issue more.
>>>> There are two error actions:
>>>> - First the previous error. I reproduce the situation as I commented: running
>>>> in a VM the default upstream kernel (with this card using PCI passthrough),
>>>> since this is always failing. Let me show the logs in this situation:
>>>> [ 15.906564] ath11k_pci 0000:07:00.0: BAR 0 [mem 0xfdc00000-0xfddfffff 64bit]: assigned
>>>> [ 15.926520] ath11k_pci 0000:07:00.0: MSI vectors: 32
>>>> [ 15.928572] ath11k_pci 0000:07:00.0: wcn6855 hw2.0
>>>> [ 16.984192] ath11k_pci 0000:07:00.0: chip_id 0x2 chip_family 0xb board_id 0xff soc_id
>>>> 0x400c0200
>>>> [ 16.984351] ath11k_pci 0000:07:00.0: fw_version 0x11088c35 fw_build_timestamp
>>>> 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
>>>> [ 18.186971] ath11k_pci 0000:07:00.0: failed to receive control response completion,
>>>> polling..
>>>> [ 19.211036] ath11k_pci 0000:07:00.0: Service connect timeout
>>>> [ 19.211815] ath11k_pci 0000:07:00.0: failed to connect to HTT: -110
>>>> [ 19.214181] ath11k_pci 0000:07:00.0: failed to start core: -110
>>>> [ 19.531989] ath11k_pci 0000:07:00.0: firmware crashed: MHI_CB_EE_RDDM
>>>> [ 19.532930] ath11k_pci 0000:07:00.0: ignore reset dev flags 0xc000
>>>> [ 29.259157] ath11k_pci 0000:07:00.0: failed to wait wlan mode request (mode 4): -110
>>>> [ 29.259229] ath11k_pci 0000:07:00.0: qmi failed to send wlan mode off: -110
>>>> - Second after this, I commanded the unbinded (ath11_pci) and I get the
>>>> warning. Let extend here the stack trace:
>>>> [ 24.238198] ? free_large_kmalloc+0x57/0x90
>>>> [ 24.238199] ? report_bug+0x16b/0x180
>>>> [ 24.238210] ? handle_bug+0x3c/0x70
>>>> [ 24.238218] ? exc_invalid_op+0x14/0x70
>>>> [ 24.238218] ? asm_exc_invalid_op+0x16/0x20
>>>> [ 24.238224] ? free_large_kmalloc+0x57/0x90
>>>> [ 24.238227] ath11k_dp_free+0x99/0xb0 [ath11k]
>>>> [ 24.238275] ath11k_core_deinit+0x12b/0x1a0 [ath11k]
>>>> [ 24.238287] ath11k_pci_remove+0x7b/0x120 [ath11k_pci]
>>>> [ 24.238294] pci_device_remove+0x3e/0xb0
>>>> [ 24.238304] device_release_driver_internal+0x193/0x200
>>>> [ 24.238315] unbind_store+0x9d/0xb0
>>>> [ 24.238320] kernfs_fop_write_iter+0x13a/0x1d0
>>>> [ 24.238330] vfs_write+0x32e/0x470
>>>> [ 24.238335] ksys_write+0x5f/0xe0
>>>> [ 24.238336] do_syscall_64+0x5f/0xe0
>>>> Very easy to reproduce.
>>>>
>>>
>>>
>>> Thanks much for the logs, that makes sense. The timestamps explain why my earlier
>>> reasoning did not match the trace: unbind reaches ath11k_pci_remove() before
>>> ATH11K_FLAG_QMI_FAIL is set by the QMI event worker as it is held up on wlan mode off qmi
>>
>> how could QMI worker set this flag? the first failure happens in
>> ath12k_core_qmi_firmware_ready() and upon this failure the QMI worker just break out
>> without setting any flag, no?
>>
>
>
> you mean ath1*1*k_core_qmi_firmware_ready() ?. Yes in ToT it breaks out without setting
> any flags, so I proposed to set that on failure case ATH11K_QMI_EVENT_FW_READY: (similar
> to case ATH11K_QMI_EVENT_FW_INIT_DONE:) in this mail thread.

Hmm, I mixed it with ath12k. You are right, for ATH11K_QMI_EVENT_FW_INIT_DONE, the
ATH11K_FLAG_QMI_FAIL is set upon failure.

>
>
> --
> Ramesh
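
For reference, the "set tx_status to NULL after freeing" idea quoted at the top of the thread can be modeled in userspace roughly as below. This is only a sketch under assumptions, not the driver code: struct dp, struct tx_ring, NUM_TX_RINGS, dp_alloc() and dp_free() are hypothetical stand-ins, and malloc()/free() take the place of the kernel allocator. It relies on free(NULL) being a no-op, which also holds for kfree(NULL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_TX_RINGS 3 /* hypothetical ring count for this sketch */

/* Hypothetical stand-ins; these are not the ath11k structures. */
struct tx_ring {
	void *tx_status;
};

struct dp {
	struct tx_ring tx_ring[NUM_TX_RINGS];
};

/* Allocate one status buffer per ring; returns 0 on success, -1 on failure. */
static int dp_alloc(struct dp *dp, size_t size)
{
	assert(dp != NULL);

	/* Start from a known state so dp_free() is safe after a partial failure. */
	for (int i = 0; i < NUM_TX_RINGS; i++)
		dp->tx_ring[i].tx_status = NULL;

	for (int i = 0; i < NUM_TX_RINGS; i++) {
		dp->tx_ring[i].tx_status = malloc(size);
		if (!dp->tx_ring[i].tx_status)
			return -1;
	}
	return 0;
}

/* Free every status buffer and clear the pointer, so a repeated call is harmless. */
static void dp_free(struct dp *dp)
{
	assert(dp != NULL);

	for (int i = 0; i < NUM_TX_RINGS; i++) {
		free(dp->tx_ring[i].tx_status);  /* free(NULL) does nothing */
		dp->tx_ring[i].tx_status = NULL; /* second dp_free() sees NULL */
	}
}
```

With the pointer cleared, calling dp_free() once from an error path and again from a remove path does not free the same buffer twice. It does not by itself explain which path reaches the second call; that is the separate ATH11K_FLAG_QMI_FAIL question discussed above.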