* ath11k resume fails due to kernel blocks probing MHI virtual devices
From: Baochen Qiang @ 2024-01-29 10:10 UTC
To: rafael, pavel
Cc: Manivannan Sadhasivam, Kalle Valo (QUIC), Jeff Johnson, linux-pm, kernel@quicinc.com, linux-wireless, ath11k

Hi Rafael and Pavel,

Currently I am facing an ath11k (a kernel WLAN driver) resume issue
related to the kernel PM framework and the MHI module.

Before introducing the issue details, I'd like to summarize how ath11k
interacts with the MHI stack to download WLAN firmware to the hardware
target:
1. When booting/restarting, ath11k powers on the MHI module and waits
for MHI channels to be ready.
2. On power-on, the MHI stack creates some virtual MHI devices, which
represent MHI hardware channels, and adds them to the MHI bus. This
triggers the MHI client driver, named QRTR, to get matched and probe
those MHI devices. In probe, QRTR initializes the MHI channels and
finally moves them to the ready state.
3. Once the MHI channels are ready, ath11k downloads WLAN firmware to
the hardware target, and then WLAN is working.

Such a flow works well in general, but introduces issues in the
hibernation cycle: when preparing for hibernation, ath11k powers down
MHI, which results in the MHI devices being destroyed and thus QRTR
resetting the MHI channels. When resuming from hibernation, ath11k
powers on MHI and waits for MHI channels to be ready in its resume
callback. As said above, MHI creates and adds MHI devices to the MHI
bus, but they can't be probed at that time because device probing has
been blocked by device_block_probing(), and this finally results in an
ath11k resume timeout.

Now there is a potential fix for this issue which would need changes in
the MHI stack, i.e., don't destroy MHI devices while hibernating. And we
have had plenty of discussion with the MHI community regarding this
change, see [1] and [2].

However, Mani (the MHI maintainer) doesn't think it's right to fix it in
the MHI stack. Instead, he thought we might need to add a new PM
callback which will be called after device probe is unblocked. By
registering such a callback, ath11k can wait for the dependency driver,
i.e., QRTR, to probe and initialize those MHI devices.

Your thoughts?

[1] https://lists.infradead.org/pipermail/ath11k/2023-December/005098.html
[2] https://lists.infradead.org/pipermail/ath11k/2024-January/005205.html

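The failure mode in the message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel or MHI code: every `model_*` name is hypothetical, the "bus" is a plain array, and the resume wait is a bounded polling loop instead of a real timeout. It shows that a device added while probing is blocked stays unprobed, so a waiter that polls before probing is unblocked gives up, while the same wait succeeds once probing is unblocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 4

/* Hypothetical stand-in for one virtual MHI channel device. */
struct model_dev {
    bool probed; /* true once the client driver (QRTR in the thread) has bound */
};

/* Hypothetical stand-in for the bus plus the PM core's probe gate. */
struct model_bus {
    bool probing_blocked;
    size_t ndevs;
    struct model_dev devs[MODEL_MAX_DEVS];
};

/* Models the effect of device_block_probing(): new probes are deferred. */
static void model_block_probing(struct model_bus *bus)
{
    assert(bus != NULL);
    bus->probing_blocked = true;
}

/* Models the effect of unblocking: deferred probes are retried and succeed. */
static void model_unblock_probing(struct model_bus *bus)
{
    assert(bus != NULL);
    bus->probing_blocked = false;
    for (size_t i = 0; i < bus->ndevs; i++)
        bus->devs[i].probed = true;
}

/* Adds a device; returns its index, or -1 if the table is full. */
static int model_add_device(struct model_bus *bus)
{
    assert(bus != NULL);
    if (bus->ndevs == MODEL_MAX_DEVS)
        return -1;
    /* The probe runs immediately only when probing is not blocked. */
    bus->devs[bus->ndevs].probed = !bus->probing_blocked;
    return (int)bus->ndevs++;
}

/* Channels count as ready when at least one device exists and all are probed. */
static bool model_channels_ready(const struct model_bus *bus)
{
    assert(bus != NULL);
    if (bus->ndevs == 0)
        return false;
    for (size_t i = 0; i < bus->ndevs; i++)
        if (!bus->devs[i].probed)
            return false;
    return true;
}

/* Bounded wait standing in for the driver's wait in its resume callback. */
static bool model_wait_ready(const struct model_bus *bus, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
        if (model_channels_ready(bus))
            return true;
    return false; /* corresponds to the resume timeout */
}
```

In this model, the callback proposed in the message would correspond to calling `model_wait_ready()` only after `model_unblock_probing()` has run.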
* Re: ath11k resume fails due to kernel blocks probing MHI virtual devices
From: Rafael J. Wysocki @ 2024-01-29 12:22 UTC
To: Baochen Qiang
Cc: rafael, pavel, Manivannan Sadhasivam, Kalle Valo (QUIC), Jeff Johnson, linux-pm, kernel@quicinc.com, linux-wireless, ath11k

On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>
> Hi Rafael and Pavel,
>
> Currently I am facing an ath11k (a kernel WLAN driver) resume issue
> related to the kernel PM framework and the MHI module.
>
> Before introducing the issue details, I'd like to summarize how ath11k
> interacts with the MHI stack to download WLAN firmware to the hardware
> target:
> 1. When booting/restarting, ath11k powers on the MHI module and waits
> for MHI channels to be ready.
> 2. On power-on, the MHI stack creates some virtual MHI devices, which
> represent MHI hardware channels, and adds them to the MHI bus. This
> triggers the MHI client driver, named QRTR, to get matched and probe
> those MHI devices. In probe, QRTR initializes the MHI channels and
> finally moves them to the ready state.
> 3. Once the MHI channels are ready, ath11k downloads WLAN firmware to
> the hardware target, and then WLAN is working.
>
> Such a flow works well in general, but introduces issues in the
> hibernation cycle: when preparing for hibernation, ath11k powers down
> MHI, which results in the MHI devices being destroyed and thus QRTR
> resetting the MHI channels. When resuming from hibernation, ath11k
> powers on MHI and waits for MHI channels to be ready in its resume
> callback. As said above, MHI creates and adds MHI devices to the MHI
> bus, but they can't be probed at that time because device probing has
> been blocked by device_block_probing(), and this finally results in an
> ath11k resume timeout.
>
> Now there is a potential fix for this issue which would need changes in
> the MHI stack, i.e., don't destroy MHI devices while hibernating.

Exactly.

> And we have had plenty of discussion with the MHI community regarding
> this change, see [1] and [2].
>
> However, Mani (the MHI maintainer) doesn't think it's right to fix it in
> the MHI stack. Instead, he thought we might need to add a new PM
> callback which will be called after device probe is unblocked. By
> registering such a callback, ath11k can wait for the dependency driver,
> i.e., QRTR, to probe and initialize those MHI devices.
>
> Your thoughts?

I'm not quite sure why one would do the pointless device destruction and
re-creation in the hibernation flow and add a new callback to the PM
core to work around this. It doesn't sound like a straightforward
approach to me.

> [1] https://lists.infradead.org/pipermail/ath11k/2023-December/005098.html
> [2] https://lists.infradead.org/pipermail/ath11k/2024-January/005205.html

* Re: ath11k resume fails due to kernel blocks probing MHI virtual devices
From: Manivannan Sadhasivam @ 2024-01-29 12:31 UTC
To: Rafael J. Wysocki
Cc: Baochen Qiang, pavel, Manivannan Sadhasivam, Kalle Valo (QUIC), Jeff Johnson, linux-pm, kernel@quicinc.com, linux-wireless, ath11k

On Mon, Jan 29, 2024 at 01:22:27PM +0100, Rafael J. Wysocki wrote:
> On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
> >
> > Hi Rafael and Pavel,
> >
> > Currently I am facing an ath11k (a kernel WLAN driver) resume issue
> > related to the kernel PM framework and the MHI module.
> >
> > Before introducing the issue details, I'd like to summarize how ath11k
> > interacts with the MHI stack to download WLAN firmware to the hardware
> > target:
> > 1. When booting/restarting, ath11k powers on the MHI module and waits
> > for MHI channels to be ready.
> > 2. On power-on, the MHI stack creates some virtual MHI devices, which
> > represent MHI hardware channels, and adds them to the MHI bus. This
> > triggers the MHI client driver, named QRTR, to get matched and probe
> > those MHI devices. In probe, QRTR initializes the MHI channels and
> > finally moves them to the ready state.
> > 3. Once the MHI channels are ready, ath11k downloads WLAN firmware to
> > the hardware target, and then WLAN is working.
> >
> > Such a flow works well in general, but introduces issues in the
> > hibernation cycle: when preparing for hibernation, ath11k powers down
> > MHI, which results in the MHI devices being destroyed and thus QRTR
> > resetting the MHI channels. When resuming from hibernation, ath11k
> > powers on MHI and waits for MHI channels to be ready in its resume
> > callback. As said above, MHI creates and adds MHI devices to the MHI
> > bus, but they can't be probed at that time because device probing has
> > been blocked by device_block_probing(), and this finally results in an
> > ath11k resume timeout.
> >
> > Now there is a potential fix for this issue which would need changes in
> > the MHI stack, i.e., don't destroy MHI devices while hibernating.
>
> Exactly.
>

During hibernation, the power to ath11k could be lost and in that case,
there will be no channels available from the device. So keeping the
"struct dev" when there is no real device attached to the system goes
against the driver model IMO, since we would be messing with the
refcount.

For instance, in the case of USB, if the device gets unplugged, would it
make sense to keep the "struct dev" for the device in the kernel in the
hope that it would come back again?

The driver model as I understood it is: once the actual physical device
gets removed, the refcount for "struct dev" should be decremented and it
should be destroyed.

- Mani

--
மணிவண்ணன் சதாசிவம்

* Re: ath11k resume fails due to kernel blocks probing MHI virtual devices
From: Rafael J. Wysocki @ 2024-01-29 12:37 UTC
To: Manivannan Sadhasivam
Cc: Rafael J. Wysocki, Baochen Qiang, pavel, Kalle Valo (QUIC), Jeff Johnson, linux-pm, kernel@quicinc.com, linux-wireless, ath11k, Greg Kroah-Hartman

On Mon, Jan 29, 2024 at 1:31 PM Manivannan Sadhasivam <mani@kernel.org> wrote:
>
> On Mon, Jan 29, 2024 at 01:22:27PM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
> > >
> > > Hi Rafael and Pavel,
> > >
> > > Currently I am facing an ath11k (a kernel WLAN driver) resume issue
> > > related to the kernel PM framework and the MHI module.
> > >
> > > Before introducing the issue details, I'd like to summarize how ath11k
> > > interacts with the MHI stack to download WLAN firmware to the hardware
> > > target:
> > > 1. When booting/restarting, ath11k powers on the MHI module and waits
> > > for MHI channels to be ready.
> > > 2. On power-on, the MHI stack creates some virtual MHI devices, which
> > > represent MHI hardware channels, and adds them to the MHI bus. This
> > > triggers the MHI client driver, named QRTR, to get matched and probe
> > > those MHI devices. In probe, QRTR initializes the MHI channels and
> > > finally moves them to the ready state.
> > > 3. Once the MHI channels are ready, ath11k downloads WLAN firmware to
> > > the hardware target, and then WLAN is working.
> > >
> > > Such a flow works well in general, but introduces issues in the
> > > hibernation cycle: when preparing for hibernation, ath11k powers down
> > > MHI, which results in the MHI devices being destroyed and thus QRTR
> > > resetting the MHI channels. When resuming from hibernation, ath11k
> > > powers on MHI and waits for MHI channels to be ready in its resume
> > > callback. As said above, MHI creates and adds MHI devices to the MHI
> > > bus, but they can't be probed at that time because device probing has
> > > been blocked by device_block_probing(), and this finally results in an
> > > ath11k resume timeout.
> > >
> > > Now there is a potential fix for this issue which would need changes in
> > > the MHI stack, i.e., don't destroy MHI devices while hibernating.
> >
> > Exactly.
> >
>
> During hibernation, the power to ath11k could be lost and in that case,
> there will be no channels available from the device. So keeping the
> "struct dev" when there is no real device attached to the system goes
> against the driver model IMO, since we would be messing with the
> refcount.

But this is system hibernation or suspend and the reason for the power
loss is quite different from device removal at run time.

The device is going to be back during resume (or at least it is not
expected to go away in the meantime), so it is pointless to destroy
its representation in memory.

> For instance, in the case of USB, if the device gets unplugged, would it
> make sense to keep the "struct dev" for the device in the kernel in the
> hope that it would come back again?

At run time - no, during system suspend - yes.

It is not even recommended to free IRQs during system suspend.

> The driver model as I understood it is: once the actual physical device
> gets removed, the refcount for "struct dev" should be decremented and it
> should be destroyed.

Not really.

Thanks!

* Re: ath11k resume fails due to kernel blocks probing MHI virtual devices
From: Manivannan Sadhasivam @ 2024-01-29 12:47 UTC
To: Rafael J. Wysocki
Cc: Manivannan Sadhasivam, Baochen Qiang, pavel, Kalle Valo (QUIC), Jeff Johnson, linux-pm, kernel@quicinc.com, linux-wireless, ath11k, Greg Kroah-Hartman

On Mon, Jan 29, 2024 at 01:37:41PM +0100, Rafael J. Wysocki wrote:
> On Mon, Jan 29, 2024 at 1:31 PM Manivannan Sadhasivam <mani@kernel.org> wrote:
> >
> > On Mon, Jan 29, 2024 at 01:22:27PM +0100, Rafael J. Wysocki wrote:
> > > On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
> > > >
> > > > Hi Rafael and Pavel,
> > > >
> > > > Currently I am facing an ath11k (a kernel WLAN driver) resume issue
> > > > related to the kernel PM framework and the MHI module.
> > > >
> > > > Before introducing the issue details, I'd like to summarize how ath11k
> > > > interacts with the MHI stack to download WLAN firmware to the hardware
> > > > target:
> > > > 1. When booting/restarting, ath11k powers on the MHI module and waits
> > > > for MHI channels to be ready.
> > > > 2. On power-on, the MHI stack creates some virtual MHI devices, which
> > > > represent MHI hardware channels, and adds them to the MHI bus. This
> > > > triggers the MHI client driver, named QRTR, to get matched and probe
> > > > those MHI devices. In probe, QRTR initializes the MHI channels and
> > > > finally moves them to the ready state.
> > > > 3. Once the MHI channels are ready, ath11k downloads WLAN firmware to
> > > > the hardware target, and then WLAN is working.
> > > >
> > > > Such a flow works well in general, but introduces issues in the
> > > > hibernation cycle: when preparing for hibernation, ath11k powers down
> > > > MHI, which results in the MHI devices being destroyed and thus QRTR
> > > > resetting the MHI channels. When resuming from hibernation, ath11k
> > > > powers on MHI and waits for MHI channels to be ready in its resume
> > > > callback. As said above, MHI creates and adds MHI devices to the MHI
> > > > bus, but they can't be probed at that time because device probing has
> > > > been blocked by device_block_probing(), and this finally results in an
> > > > ath11k resume timeout.
> > > >
> > > > Now there is a potential fix for this issue which would need changes in
> > > > the MHI stack, i.e., don't destroy MHI devices while hibernating.
> > >
> > > Exactly.
> > >
> >
> > During hibernation, the power to ath11k could be lost and in that case,
> > there will be no channels available from the device. So keeping the
> > "struct dev" when there is no real device attached to the system goes
> > against the driver model IMO, since we would be messing with the
> > refcount.
>
> But this is system hibernation or suspend and the reason for the power
> loss is quite different from device removal at run time.
>
> The device is going to be back during resume (or at least it is not
> expected to go away in the meantime), so it is pointless to destroy
> its representation in memory.
>
> > For instance, in the case of USB, if the device gets unplugged, would it
> > make sense to keep the "struct dev" for the device in the kernel in the
> > hope that it would come back again?
>
> At run time - no, during system suspend - yes.
>
> It is not even recommended to free IRQs during system suspend.
>

Hmm, okay. Thanks for clearing it up.

> > The driver model as I understood it is: once the actual physical device
> > gets removed, the refcount for "struct dev" should be decremented and it
> > should be destroyed.
>
> Not really.
>

Okay. My understanding seems to be wrong then. I will move forward with
the proposal to keep the devices.

- Mani

--
மணிவண்ணன் சதாசிவம்

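The direction the thread settles on, keeping the devices across hibernation, can be sketched with another small user-space model. This is an illustration only, under loud assumptions: the `sim_*` names and the `keep_dev` flag are hypothetical and are not the MHI API, and the model ignores everything the real stack would still have to redo after power loss (for example re-preparing channels). It only contrasts the two power-down behaviours while probing is blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an MHI controller with one channel device. */
struct sim_ctrl {
    bool powered;
    bool dev_present;     /* the channel device object exists */
    bool dev_bound;       /* the client driver is bound to it */
    bool probing_blocked; /* models the PM core blocking probes */
};

/* Power-up creates the device only if it does not already exist. */
static void sim_power_up(struct sim_ctrl *c)
{
    assert(c != NULL);
    c->powered = true;
    if (!c->dev_present) {
        c->dev_present = true;
        /* A new device can only be bound if probing is allowed. */
        c->dev_bound = !c->probing_blocked;
    }
}

/* Power-down either destroys the device or keeps it (hypothetical flag). */
static void sim_power_down(struct sim_ctrl *c, bool keep_dev)
{
    assert(c != NULL);
    c->powered = false;
    if (!keep_dev) {
        c->dev_present = false;
        c->dev_bound = false;
    }
}

static bool sim_ready(const struct sim_ctrl *c)
{
    assert(c != NULL);
    return c->powered && c->dev_present && c->dev_bound;
}

/*
 * One hibernation cycle as described in the thread: probing is blocked,
 * the driver powers down, then powers up again in its resume callback.
 * Returns whether the channels are ready at that point in resume.
 */
static bool sim_hibernate_cycle(struct sim_ctrl *c, bool keep_dev)
{
    bool ready;

    c->probing_blocked = true;
    sim_power_down(c, keep_dev);
    sim_power_up(c);
    ready = sim_ready(c);
    c->probing_blocked = false; /* unblocked only after resume callbacks */
    return ready;
}
```

With `keep_dev` false the re-created device cannot be bound during resume, which is the timeout reported at the start of the thread; with `keep_dev` true the existing binding survives and no probe is needed.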