* rtw88: USB devices randomly stop receiving anything
@ 2024-09-25 11:46 Bitterblue Smith
2024-09-26 13:04 ` petter
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Bitterblue Smith @ 2024-09-25 11:46 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org; +Cc: Ping-Ke Shih
Hi,
I have this problem with RTL8811CU, RTL8723DU, RTL8811AU, RTL8812AU.
I assume all USB devices are affected. If I have qBittorrent running,
the wifi stops working after a few hours:
Sep 24 00:48:21 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:21 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:23 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:23 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:25 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:25 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:27 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:27 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:29 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:29 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:31 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:31 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:33 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:33 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:35 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:35 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:37 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:37 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:39 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:39 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:41 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 24 00:48:41 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:42 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-DISCONNECTED bssid=... reason=4 locally_generated=1
Sep 24 00:48:42 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: Added BSSID ... into ignore list, ignoring for 10 seconds
Sep 24 00:48:42 ideapad2 NetworkManager[433]: <info> [1727128122.0377] device (wlp3s0f3u2i2): supplicant interface state: completed -> disconnected
Sep 24 00:48:45 ideapad2 NetworkManager[433]: <info> [1727128125.6030] device (wlp3s0f3u2i2): supplicant interface state: disconnected -> scanning
Sep 24 00:48:47 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: Removed BSSID ... from ignore list (clear)
Sep 24 00:48:47 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: SME: Trying to authenticate with ... (SSID='...' freq=2472 MHz)
Sep 24 00:48:50 ideapad2 kernel: wlp3s0f3u2i2: authenticate with ... (local address=,,,)
Sep 24 00:48:51 ideapad2 NetworkManager[433]: <info> [1727128131.2488] device (wlp3s0f3u2i2): supplicant interface state: scanning -> authenticating
Sep 24 00:48:51 ideapad2 kernel: wlp3s0f3u2i2: send auth to ... (try 1/3)
Sep 24 00:48:51 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:52 ideapad2 kernel: wlp3s0f3u2i2: send auth to ... (try 2/3)
Sep 24 00:48:52 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:53 ideapad2 kernel: wlp3s0f3u2i2: send auth to ... (try 3/3)
Sep 24 00:48:53 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 24 00:48:54 ideapad2 kernel: wlp3s0f3u2i2: authentication with ... timed out
After this all scans return nothing. The chip is still alive,
though. The LED blinks during the scans (it's hardware-controlled)
and another device in monitor mode can see the probe requests.
I confirmed that even C2H stop coming. I used aireplay-ng to send
some authentication or association frames (can't remember) which
require TX ACK report. I saw "failed to get tx report from firmware"
and no C2H.
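The log excerpt above can be summarized mechanically. What follows is only a minimal sketch, assuming journal lines in the same syslog-style layout as the excerpt and an English locale for the month names; `summarize_journal` and its `year` parameter are hypothetical names introduced here, not part of any tool mentioned in this thread.

```python
import re
from datetime import datetime

# "Mon DD HH:MM:SS host process[pid]: message", as in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<proc>[^:\[\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def summarize_journal(lines, year=2024):
    """Count failure events and time the gap from first beacon loss to disconnect."""
    summary = {"beacon_loss": 0, "tx_report_fail": 0, "seconds_to_disconnect": None}
    first_loss = None
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines in other formats
        # The log carries no year, so one is supplied to parse the timestamp.
        stamp = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
        msg = match["msg"]
        if "CTRL-EVENT-BEACON-LOSS" in msg:
            summary["beacon_loss"] += 1
            if first_loss is None:
                first_loss = stamp
        elif "failed to get tx report from firmware" in msg:
            summary["tx_report_fail"] += 1
        elif "CTRL-EVENT-DISCONNECTED" in msg and first_loss is not None:
            if summary["seconds_to_disconnect"] is None:
                summary["seconds_to_disconnect"] = (stamp - first_loss).total_seconds()
    return summary
```

In the excerpt, the first beacon loss is logged at 00:48:21 and the disconnect at 00:48:42, so the reported gap would be 21 seconds.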
While qBittorrent is needed to trigger this bug, simply downloading
a random Linux iso did not do the job. "Other" torrents did. It's
unclear why. Maybe it's uploading that triggers the bug.
I left iperf3 running all day and nothing happened. Only qBittorrent
can break it.
RTL8822CE doesn't have this problem. I can use qBittorrent with it
just fine.
I mounted debugfs and dumped the MAC registers during a scan using
this command:
for i in {00..20}; do sleep 0.5; cat /sys/kernel/debug/ieee80211/phy2/rtw88/mac_{0..7} > dead-$i.txt; done
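One way to look at the resulting dead-NN.txt files is to check which lines differ between snapshots, since anything that still changes while the device is stuck may hint at what the hardware is doing. The sketch below is format-agnostic on purpose: it makes no assumption about the layout of the mac_N files and simply compares the dumps line by line. The function names are hypothetical.

```python
from pathlib import Path


def load_snapshots(directory=".", pattern="dead-*.txt"):
    """Read every dump written by the loop above, in file-name order."""
    return [p.read_text().splitlines() for p in sorted(Path(directory).glob(pattern))]


def changed_lines(snapshots):
    """Map line index -> per-snapshot contents, for lines that are not identical everywhere."""
    longest = max((len(s) for s in snapshots), default=0)
    changed = {}
    for i in range(longest):
        # A snapshot shorter than the others contributes None for the missing line.
        values = [s[i] if i < len(s) else None for s in snapshots]
        if len(set(values)) > 1:
            changed[i] = values
    return changed
```

Calling `changed_lines(load_snapshots())` in the directory holding the dumps would list only the lines worth inspecting by hand.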
I thought maybe some RX URBs failed silently and rtw88 stopped
sending them to the device (== stopped requesting data from it),
but that's not the case. [1]
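The hypothesis in the paragraph above can be illustrated with a toy model. This is not rtw88 code and says nothing about what the driver actually does; it only shows why a receive path that fails to requeue a URB after a failed completion would eventually go silent, while one that always requeues merely loses frames. All names and parameters are hypothetical.

```python
def simulate_rx(num_urbs, completions, failing, resubmit_on_failure):
    """Return (urbs_still_in_flight, frames_delivered) after the given completions.

    num_urbs: URBs initially queued to the device.
    completions: how many completion events to simulate.
    failing: set of completion indices whose handling fails.
    resubmit_on_failure: whether a failed completion still requeues its URB.
    """
    in_flight = num_urbs
    delivered = 0
    for n in range(completions):
        if in_flight == 0:
            break  # nothing is queued any more, so nothing can complete
        if n in failing:
            if not resubmit_on_failure:
                in_flight -= 1  # this URB silently leaves circulation
            continue  # the frame is lost either way
        delivered += 1  # normal path: hand the data up and requeue the URB
    return in_flight, delivered
```

With four URBs and four failed completions, the non-requeuing variant ends with nothing in flight, which from the host's point of view would look like a device that never sends anything again.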
I have the device in this state right now. Is there anything else
I should look at?
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/realtek/rtw88/usb.c?h=v6.10.11&id=25eaef533bf3ccc6fee5067aac16f41f280e343e#n641
* Re: rtw88: USB devices randomly stop receiving anything
2024-09-25 11:46 rtw88: USB devices randomly stop receiving anything Bitterblue Smith
@ 2024-09-26 13:04 ` petter
2024-09-26 16:51 ` Bitterblue Smith
2024-09-29 11:43 ` Bitterblue Smith
2024-09-30 20:56 ` rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck Bitterblue Smith
2 siblings, 1 reply; 12+ messages in thread
From: petter @ 2024-09-26 13:04 UTC (permalink / raw)
To: Bitterblue Smith; +Cc: linux-wireless, Ping-Ke Shih
On 2024-09-25 13:46, Bitterblue Smith wrote:
[...]
> I have the device in this state right now. Is there anything else
> I should look at?
What hardware are you running on? This looks very similar to an issue
that some colleagues and I have seen from time to time when using the
LM842 (8822cu) [1][2][3] on our i.MX6SX ARM board. It has, though,
become harder and harder to trigger that issue on our board. But the
outcome when it happens is identical to yours. In our case we get it
when running a number of Mender streamed installations. We can also
trigger something similar when doing hw-offload scanning, so we have
disabled that in our setup. For us, however, it seems related to slower
platforms; we haven't seen it on systems with better performance. It
also became a lot better once USB RX aggregation was added for the chip
and we ran with the patch in [3]. We also got it on the LM808 (8812AU);
then, after a suggestion, we tried the morrownr driver [4] with USB
aggregation enabled and couldn't trigger it anymore. But it feels like
all these things are just ways to reduce the risk of getting into this
state. So I think you have just found yet another way to reproduce the
behavior. Hopefully that is the first step towards finding the root
cause. I will gladly help to test things in this area if you guys find
something interesting.
[1] https://lore.kernel.org/all/20230526055551.1823094-1-petter@technux.se/t/
[2] https://lore.kernel.org/linux-wireless/20230616122612.GL18491@pengutronix.de/T/#t
[3] https://lore.kernel.org/linux-wireless/20230612134048.321500-1-petter@technux.se/
[4] https://github.com/morrownr/8812au-20210820
>
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/realtek/rtw88/usb.c?h=v6.10.11&id=25eaef533bf3ccc6fee5067aac16f41f280e343e#n641
* Re: rtw88: USB devices randomly stop receiving anything
2024-09-26 13:04 ` petter
@ 2024-09-26 16:51 ` Bitterblue Smith
2024-09-27 6:44 ` petter
0 siblings, 1 reply; 12+ messages in thread
From: Bitterblue Smith @ 2024-09-26 16:51 UTC (permalink / raw)
To: petter; +Cc: linux-wireless, Ping-Ke Shih
On 26/09/2024 16:04, petter@technux.se wrote:
> On 2024-09-25 13:46, Bitterblue Smith wrote:
>> [...]
>> I have the device in this state right now. Is there anything else
>> I should look at?
>
> What hardware are you running on? This looks very similar to an issue
> that some colleagues and I have seen from time to time when using the
> LM842 (8822cu) [1][2][3] on our i.MX6SX ARM board. It has, though,
> become harder and harder to trigger that issue on our board. But the
> outcome when it happens is identical to yours. In our case we get it
> when running a number of Mender streamed installations. We can also
> trigger something similar when doing hw-offload scanning, so we have
> disabled that in our setup. For us, however, it seems related to slower
> platforms; we haven't seen it on systems with better performance. It
> also became a lot better once USB RX aggregation was added for the chip
> and we ran with the patch in [3]. We also got it on the LM808 (8812AU);
> then, after a suggestion, we tried the morrownr driver [4] with USB
> aggregation enabled and couldn't trigger it anymore. But it feels like
> all these things are just ways to reduce the risk of getting into this
> state. So I think you have just found yet another way to reproduce the
> behavior. Hopefully that is the first step towards finding the root
> cause. I will gladly help to test things in this area if you guys find
> something interesting.
>
> [1] https://lore.kernel.org/all/20230526055551.1823094-1-petter@technux.se/t/
> [2] https://lore.kernel.org/linux-wireless/20230616122612.GL18491@pengutronix.de/T/#t
> [3] https://lore.kernel.org/linux-wireless/20230612134048.321500-1-petter@technux.se/
> [4] https://github.com/morrownr/8812au-20210820
>
>>
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/realtek/rtw88/usb.c?h=v6.10.11&id=25eaef533bf3ccc6fee5067aac16f41f280e343e#n641
The hardware is a Lenovo Ideapad 3 15ADA6 with AMD Athlon Gold 3150U.
How does Mender handle the data transfers? Does it have something
in common with torrents?
* Re: rtw88: USB devices randomly stop receiving anything
2024-09-26 16:51 ` Bitterblue Smith
@ 2024-09-27 6:44 ` petter
0 siblings, 0 replies; 12+ messages in thread
From: petter @ 2024-09-27 6:44 UTC (permalink / raw)
To: Bitterblue Smith; +Cc: linux-wireless, Ping-Ke Shih
On 2024-09-26 18:51, Bitterblue Smith wrote:
> On 26/09/2024 16:04, petter@technux.se wrote:
>> On 2024-09-25 13:46, Bitterblue Smith wrote:
>>> [...]
>>> I have the device in this state right now. Is there anything else
>>> I should look at?
>>
>> What hardware are you running on? This looks very similar to an issue
>> that some colleagues and I have seen from time to time when using the
>> LM842 (8822cu) [1][2][3] on our i.MX6SX ARM board. It has, though,
>> become harder and harder to trigger that issue on our board. But the
>> outcome when it happens is identical to yours. In our case we get it
>> when running a number of Mender streamed installations. We can also
>> trigger something similar when doing hw-offload scanning, so we have
>> disabled that in our setup. For us, however, it seems related to slower
>> platforms; we haven't seen it on systems with better performance. It
>> also became a lot better once USB RX aggregation was added for the chip
>> and we ran with the patch in [3]. We also got it on the LM808 (8812AU);
>> then, after a suggestion, we tried the morrownr driver [4] with USB
>> aggregation enabled and couldn't trigger it anymore. But it feels like
>> all these things are just ways to reduce the risk of getting into this
>> state. So I think you have just found yet another way to reproduce the
>> behavior. Hopefully that is the first step towards finding the root
>> cause. I will gladly help to test things in this area if you guys find
>> something interesting.
>>
>> [1] https://lore.kernel.org/all/20230526055551.1823094-1-petter@technux.se/t/
>> [2] https://lore.kernel.org/linux-wireless/20230616122612.GL18491@pengutronix.de/T/#t
>> [3] https://lore.kernel.org/linux-wireless/20230612134048.321500-1-petter@technux.se/
>> [4] https://github.com/morrownr/8812au-20210820
>>
>>>
>>>
>>> [1]
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/realtek/rtw88/usb.c?h=v6.10.11&id=25eaef533bf3ccc6fee5067aac16f41f280e343e#n641
>
> The hardware is a Lenovo Ideapad 3 15ADA6 with AMD Athlon Gold 3150U.
>
> How does Mender handle the data transfers? Does it have something
> in common with torrents?
In my case I guess they behave somewhat the same: Mender will sort of
stream data, downloading the OS image in chunks and writing it to disk.
So both the torrent and Mender will most likely stress the network and
system in a similar way, by performing a lot of RX + disk I/O, which
seems to make the driver misbehave after some time.
To me it feels easier to trigger the issue with Mender updates when
combining them with some TX traffic, which I guess is what happens when
you use qBittorrent.
I will see if I can also reproduce the issue using BitTorrent in some
good way. Also, after the USB aggregation changes I do not see the
"failed to get tx report" issue that frequently. Instead it more often
gets stuck with "firmware failed to leave lps state", but the rest of
the side effects you describe are the same.
INFO[0000] Native sector size of block device /dev/mmcblkX is 512 bytes. Mender will write in chunks of 1048576 bytes
.................[54407.626931] rtw_8822cu 1-1:1.2: firmware failed to leave lps state
[54408.136328] Bluetooth: hci0: urb fb582009 failed to resubmit (2)
[54408.622588] wlxxxxxxx: deauthenticating from e5:65:d5:35:95:d5 by local choice (Reason: 3=DEAUTH_LEAVING)
[54408.919367] rtw_8822cu 1-1:1.2: firmware failed to leave lps state
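For anyone who wants to approximate this load pattern without Mender, the following is a rough sketch of "read a stream in chunks and write each chunk to disk". It is not Mender's implementation; the only value taken from the log above is the 1048576-byte chunk size, and `stream_to_disk` and `demo` are hypothetical names. The per-chunk `fsync` is an assumption added here to force real disk I/O.

```python
import io
import os
import tempfile

CHUNK_SIZE = 1048576  # bytes; the chunk size reported in the Mender log above


def stream_to_disk(src, dst_path, chunk_size=CHUNK_SIZE):
    """Copy a binary stream to dst_path chunk by chunk, syncing each chunk to disk."""
    total = 0
    with open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of stream
            dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())  # assumption: push every chunk out to the device
            total += len(chunk)
    return total


def demo():
    """Copy 2500 in-memory bytes in 1024-byte chunks; return (bytes copied, file size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.bin")
        copied = stream_to_disk(io.BytesIO(b"\0" * 2500), path, chunk_size=1024)
        return copied, os.path.getsize(path)
```

To exercise the wifi adapter, `src` would have to be a network stream (any object with a binary `read(n)` method would do) rather than the in-memory buffer used in `demo`.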
* Re: rtw88: USB devices randomly stop receiving anything
2024-09-25 11:46 rtw88: USB devices randomly stop receiving anything Bitterblue Smith
2024-09-26 13:04 ` petter
@ 2024-09-29 11:43 ` Bitterblue Smith
2024-09-30 20:56 ` rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck Bitterblue Smith
2 siblings, 0 replies; 12+ messages in thread
From: Bitterblue Smith @ 2024-09-29 11:43 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org; +Cc: Ping-Ke Shih
On 25/09/2024 14:46, Bitterblue Smith wrote:
[...]
> I have the device in this state right now. Is there anything else
> I should look at?
>
The host controller died when I unplugged another device,
so I can't get any more information from this one.
* rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck
2024-09-25 11:46 rtw88: USB devices randomly stop receiving anything Bitterblue Smith
2024-09-26 13:04 ` petter
2024-09-29 11:43 ` Bitterblue Smith
@ 2024-09-30 20:56 ` Bitterblue Smith
2024-10-01 1:25 ` Ping-Ke Shih
` (2 more replies)
2 siblings, 3 replies; 12+ messages in thread
From: Bitterblue Smith @ 2024-09-30 20:56 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org; +Cc: Ping-Ke Shih, Sascha Hauer
On 25/09/2024 14:46, Bitterblue Smith wrote:
> Hi,
>
> I have this problem with RTL8811CU, RTL8723DU, RTL8811AU, RTL8812AU.
> I assume all USB devices are affected. If I have qBittorrent running,
> the wifi stops working after a few hours:
>
> Sep 24 00:48:21 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:21 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:23 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:23 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:25 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:25 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:27 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:27 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:29 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:29 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:31 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:31 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:33 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:33 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:35 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:35 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:37 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:37 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:39 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:39 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:41 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
> Sep 24 00:48:41 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:42 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: CTRL-EVENT-DISCONNECTED bssid=... reason=4 locally_generated=1
> Sep 24 00:48:42 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: Added BSSID ... into ignore list, ignoring for 10 seconds
> Sep 24 00:48:42 ideapad2 NetworkManager[433]: <info> [1727128122.0377] device (wlp3s0f3u2i2): supplicant interface state: completed -> disconnected
> Sep 24 00:48:45 ideapad2 NetworkManager[433]: <info> [1727128125.6030] device (wlp3s0f3u2i2): supplicant interface state: disconnected -> scanning
> Sep 24 00:48:47 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: Removed BSSID ... from ignore list (clear)
> Sep 24 00:48:47 ideapad2 wpa_supplicant[1290]: wlp3s0f3u2i2: SME: Trying to authenticate with ... (SSID='...' freq=2472 MHz)
> Sep 24 00:48:50 ideapad2 kernel: wlp3s0f3u2i2: authenticate with ... (local address=,,,)
> Sep 24 00:48:51 ideapad2 NetworkManager[433]: <info> [1727128131.2488] device (wlp3s0f3u2i2): supplicant interface state: scanning -> authenticating
> Sep 24 00:48:51 ideapad2 kernel: wlp3s0f3u2i2: send auth to ... (try 1/3)
> Sep 24 00:48:51 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:52 ideapad2 kernel: wlp3s0f3u2i2: send auth to ... (try 2/3)
> Sep 24 00:48:52 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:53 ideapad2 kernel: wlp3s0f3u2i2: send auth to ... (try 3/3)
> Sep 24 00:48:53 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
> Sep 24 00:48:54 ideapad2 kernel: wlp3s0f3u2i2: authentication with ... timed out
>
> After this all scans return nothing. The chip is still alive,
> though. The LED blinks during the scans (it's hardware-controlled)
> and another device in monitor mode can see the probe requests.
>
> I confirmed that even C2H stop coming. I used aireplay-ng to send
> some authentication or association frames (can't remember) which
> require TX ACK report. I saw "failed to get tx report from firmware"
> and no C2H.
>
> While qBittorrent is needed to trigger this bug, simply downloading
> a random Linux iso did not do the job. "Other" torrents did. It's
> unclear why. Maybe it's uploading that triggers the bug.
>
> I left iperf3 running all day and nothing happened. Only qBittorrent
> can break it.
>
> RTL8822CE doesn't have this problem. I can use qBittorrent with it
> just fine.
>
> I mounted debugfs and dumped the MAC registers during a scan using
> this command:
>
> for i in {00..20}; do sleep 0.5; cat /sys/kernel/debug/ieee80211/phy2/rtw88/mac_{0..7} > dead-$i.txt; done
>
> I thought maybe some RX URBs failed silently and rtw88 stopped
> sending them to the device (== stopped requesting data from it),
> but that's not the case. [1]
>
> I have the device in this state right now. Is there anything else
> I should look at?
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/realtek/rtw88/usb.c?h=v6.10.11&id=25eaef533bf3ccc6fee5067aac16f41f280e343e#n641

I still don't know why qBittorrent is required to trigger this bug,
but I found the problem.

alloc_skb() fails silently, so the RX URB is never submitted again.
There are only 4 RX URBs.

static void rtw_usb_rx_resubmit(struct rtw_usb *rtwusb, struct rx_usb_ctrl_block *rxcb)
{
	struct rtw_dev *rtwdev = rtwusb->rtwdev;
	int error;

	rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
	if (!rxcb->rx_skb)
		return;

	usb_fill_bulk_urb(rxcb->rx_urb, rtwusb->udev,
			  usb_rcvbulkpipe(rtwusb->udev, rtwusb->pipe_in),
			  rxcb->rx_skb->data, RTW_USB_MAX_RECVBUF_SZ,
			  rtw_usb_read_port_complete, rxcb);

	error = usb_submit_urb(rxcb->rx_urb, GFP_ATOMIC);

I added an error message there:

	rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
	if (!rxcb->rx_skb) {
		rtw_err(rtwdev, "failed to allocate rx_skb\n");
		return;
	}

Now I get these before it stops receiving:

Sep 30 22:34:38 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to allocate rx_skb
Sep 30 22:35:03 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to allocate rx_skb
Sep 30 22:35:03 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to allocate rx_skb
Sep 30 22:35:03 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to allocate rx_skb
Sep 30 22:35:05 ideapad2 wpa_supplicant[1287]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 30 22:35:05 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 30 22:35:07 ideapad2 wpa_supplicant[1287]: wlp3s0f3u2i2: CTRL-EVENT-BEACON-LOSS
Sep 30 22:35:07 ideapad2 kernel: rtw_8723du 1-2:1.2: failed to get tx report from firmware
Sep 30 22:35:08 ideapad2 wpa_supplicant[1287]: wlp3s0f3u2i2: CTRL-EVENT-DISCONNECTED bssid=... reason=4 locally_generated=1

What to do about it?
* RE: rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck
From: Ping-Ke Shih @ 2024-10-01 1:25 UTC (permalink / raw)
To: Bitterblue Smith, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>
> alloc_skb fails (silently) therefore the RX URB is not submitted
> ever again. There are only 4 RX URBs.

Although there are only 4 RX URBs, there may be more than 4 RX skbs.
rtw_usb_read_port_complete() queues each RX skb onto rtwusb->rx_queue and
kicks off rx_work. That means some RX skbs are in flight, but I'm not sure
how many.

>
> static void rtw_usb_rx_resubmit(struct rtw_usb *rtwusb, struct rx_usb_ctrl_block *rxcb)
> {
> struct rtw_dev *rtwdev = rtwusb->rtwdev;
> int error;
>
> rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
> if (!rxcb->rx_skb)
> return;
>
> usb_fill_bulk_urb(rxcb->rx_urb, rtwusb->udev,
> usb_rcvbulkpipe(rtwusb->udev, rtwusb->pipe_in),
> rxcb->rx_skb->data, RTW_USB_MAX_RECVBUF_SZ,
> rtw_usb_read_port_complete, rxcb);
>
> error = usb_submit_urb(rxcb->rx_urb, GFP_ATOMIC);
>
> I added an error message there:
>
> rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
> if (!rxcb->rx_skb) {
> rtw_err(rtwdev, "failed to allocate rx_skb\n");
> return;
> }

My first thought was to change GFP_ATOMIC to GFP_KERNEL, but the kernel
documentation notes:
"NEVER SLEEP IN A COMPLETION HANDLER. These are often called in atomic context."
However, I think it is possible to do rtw_usb_rx_resubmit() in a work item.
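
The idea of moving the resubmit into a work item can be modelled in plain userspace C. This is only a sketch under stated assumptions, not rtw88 code: struct rx_slot, struct rx_model, model_alloc() and the work_pending flag are hypothetical stand-ins for rxcb, the skb allocator and a queued work item. What it tries to show is that a failed non-sleeping allocation marks the slot for a later retry instead of losing the URB for good.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NUM_RX_SLOTS 4      /* "There are only 4 RX URBs" */
#define RX_BUF_SIZE  32768  /* size from the subject line */

/* Hypothetical stand-in for one rx_usb_ctrl_block. */
struct rx_slot {
	void *buf;       /* plays the role of rxcb->rx_skb */
	bool submitted;  /* true while the simulated URB is in flight */
};

struct rx_model {
	struct rx_slot slots[NUM_RX_SLOTS];
	bool atomic_alloc_fails; /* knob: simulate GFP_ATOMIC failure */
	bool work_pending;       /* plays the role of a queued work item */
};

/* Atomic requests fail when the knob is set; blocking ones use malloc(). */
static void *model_alloc(const struct rx_model *m, bool atomic)
{
	if (atomic && m->atomic_alloc_fails)
		return NULL;
	return malloc(RX_BUF_SIZE);
}

/* Non-sleeping path, as it would be called from a completion handler. */
static void rx_resubmit_atomic(struct rx_model *m, int i)
{
	struct rx_slot *s;

	assert(i >= 0 && i < NUM_RX_SLOTS);
	s = &m->slots[i];
	s->buf = model_alloc(m, true);
	s->submitted = s->buf != NULL;
	if (!s->submitted)
		m->work_pending = true; /* defer instead of giving up */
}

/* Simulated URB completion: the buffer is consumed, then refilled. */
static void rx_complete(struct rx_model *m, int i)
{
	assert(i >= 0 && i < NUM_RX_SLOTS);
	free(m->slots[i].buf); /* stands in for handing the skb up the stack */
	m->slots[i].buf = NULL;
	rx_resubmit_atomic(m, i);
}

/* Deferred path: runs later, where a blocking allocation is allowed. */
static void rx_refill_work(struct rx_model *m)
{
	m->work_pending = false;
	for (int i = 0; i < NUM_RX_SLOTS; i++) {
		struct rx_slot *s = &m->slots[i];

		if (s->submitted)
			continue;
		s->buf = model_alloc(m, false);
		s->submitted = s->buf != NULL;
		if (!s->submitted)
			m->work_pending = true; /* try again on the next run */
	}
}

/* Number of slots that currently have a simulated URB in flight. */
static int rx_submitted_count(const struct rx_model *m)
{
	int n = 0;

	for (int i = 0; i < NUM_RX_SLOTS; i++)
		n += m->slots[i].submitted ? 1 : 0;
	return n;
}
```

Starting from a zeroed struct rx_model, completing every slot while atomic_alloc_fails is set leaves no slot submitted, and one rx_refill_work() run brings all four back. In a real driver the retry would also have to cope with teardown and with usb_submit_urb() itself failing, which this model ignores.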

Another thought is to allocate a new skb of size urb->actual_length, copy
the received data into it, and queue it to rtwusb->rx_queue, then reuse the
original rx_skb. This assumes that urb->actual_length is usually smaller
than RTW_USB_MAX_RECVBUF_SZ, but I'm not sure whether that is the case.
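
The copy-and-reuse idea can be sketched the same way in userspace C. Again this is an illustration under assumptions, not driver code: struct rx_frame and rx_copy_out() are made-up names, and malloc()/memcpy() stand in for allocating a right-sized skb and copying into it. The property it demonstrates is that the large receive buffer never changes hands, so a failed small allocation costs one transfer rather than the URB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE 32768 /* size from the subject line */

/* Hypothetical right-sized copy of one received transfer. */
struct rx_frame {
	unsigned char *data;
	size_t len;
};

/*
 * Copy actual_length bytes out of the long-lived receive buffer.
 * Returns 0 on success and -1 on failure. Either way rx_buf stays
 * owned by the caller and can be resubmitted without a new 32 KiB
 * allocation; on failure only this one transfer is dropped.
 */
static int rx_copy_out(const unsigned char *rx_buf, size_t actual_length,
		       struct rx_frame *out)
{
	out->data = NULL;
	out->len = 0;
	if (actual_length == 0 || actual_length > RX_BUF_SIZE)
		return -1;
	out->data = malloc(actual_length);
	if (!out->data)
		return -1;
	memcpy(out->data, rx_buf, actual_length);
	out->len = actual_length;
	return 0;
}
```

The trade-off is an extra copy per transfer, which only pays off when the received length is well below the buffer size.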
* Re: rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck
From: Kalle Valo @ 2024-10-02 10:58 UTC (permalink / raw)
To: Bitterblue Smith
Cc: linux-wireless@vger.kernel.org, Ping-Ke Shih, Sascha Hauer
Bitterblue Smith <rtl8821cerfe2@gmail.com> writes:
> I still don't know why qBittorrent is required to trigger this bug,
> but I found the problem.
>
> alloc_skb fails (silently) therefore the RX URB is not submitted
> ever again. There are only 4 RX URBs.

Why is alloc_skb() failing silently? Or is that by design? I was under
the impression that we should not have error messages for allocation
failures, but is that only for kmalloc() & co?

I had a quick look, and in wireless drivers some of the alloc_skb()
callers print an error and some fail silently. I think we should start
printing errors for all alloc_skb() calls. Thoughts?

--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
* Re: rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck
From: Ping-Ke Shih @ 2024-10-03 4:26 UTC (permalink / raw)
To: Kalle Valo, Bitterblue Smith; +Cc: linux-wireless@vger.kernel.org, Sascha Hauer
> I did a quick look and in wireless drivers some of the alloc_skb()
> callers print an error and some fail silently. I think we should start
> printing errors for all alloc_skb() calls. Thoughts?

I think we don't always print an error because TCP can guarantee
reliability, and lost packets are normal. Also, on memory-limited platforms
allocation failures might be routine, so always printing an error would
flood the log.

Maybe printing with a rate limit is an option? But this kind of log seems
unnecessary and annoying to users.
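
For reference, the rate-limit option amounts to allowing a small burst of messages per time window and counting the rest. The kernel has its own helpers for this (dev_err_ratelimited(), for example); the userspace sketch below only models the idea, and struct log_ratelimit and log_ratelimit_ok() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter state: at most `burst` messages per `interval`. */
struct log_ratelimit {
	long interval;  /* window length, in arbitrary time units */
	int burst;      /* messages allowed per window */
	long begin;     /* start of the current window */
	int printed;    /* messages allowed so far in this window */
	int missed;     /* messages suppressed in total */
};

/* Returns true if a message may be printed at time `now`. */
static bool log_ratelimit_ok(struct log_ratelimit *rl, long now)
{
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;   /* open a new window */
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->missed++;              /* suppressed, but still counted */
	return false;
}
```

With an interval of 5 and a burst of 2, the first two failures in a window are reported and later ones are only counted until the next window opens.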
* Re: rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck
From: Michał Pecio @ 2024-10-06 15:43 UTC (permalink / raw)
To: rtl8821cerfe2; +Cc: linux-wireless, pkshih, sha, Kalle Valo
Hi,

> I have this problem with RTL8811CU, RTL8723DU, RTL8811AU, RTL8812AU.

Does it mean there are working patches for 8811AU somewhere?
I have this damn thing, lying unused because unsupported.

> The host controller died when I unplugged another device

xhci not responding, assume dead?
Yuck. Maybe linux-usb should hear about it if it's reproducible.

> alloc_skb fails (silently) therefore the RX URB is not submitted
> ever again. There are only 4 RX URBs.

drivers/net/usb/usbnet.c::rx_submit() deals with it by queuing a work item
to resubmit the URB later using blocking allocations.

Failure of usb_submit_urb() is handled the same way. This too can return
-ENOMEM, on xhci for example.

Regards,
Michal
* Re: rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck
From: Bitterblue Smith @ 2024-10-07 22:03 UTC (permalink / raw)
To: Ping-Ke Shih, linux-wireless@vger.kernel.org; +Cc: Sascha Hauer
On 01/10/2024 04:25, Ping-Ke Shih wrote:
> Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>>
>> alloc_skb fails (silently) therefore the RX URB is not submitted
>> ever again. There are only 4 RX URBs.
>
> Though only 4 RX URB, it might be possible more than 4 RX skb.
> In rtw_usb_read_port_complete(), queue RX skb into rtwusb->rx_queue, and kick
> off rx_work. It means some RX skb are inflight, but not sure how many.
>
>>
>> static void rtw_usb_rx_resubmit(struct rtw_usb *rtwusb, struct rx_usb_ctrl_block *rxcb)
>> {
>> struct rtw_dev *rtwdev = rtwusb->rtwdev;
>> int error;
>>
>> rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
>> if (!rxcb->rx_skb)
>> return;
>>
>> usb_fill_bulk_urb(rxcb->rx_urb, rtwusb->udev,
>> usb_rcvbulkpipe(rtwusb->udev, rtwusb->pipe_in),
>> rxcb->rx_skb->data, RTW_USB_MAX_RECVBUF_SZ,
>> rtw_usb_read_port_complete, rxcb);
>>
>> error = usb_submit_urb(rxcb->rx_urb, GFP_ATOMIC);
>>
>> I added an error message there:
>>
>> rxcb->rx_skb = alloc_skb(RTW_USB_MAX_RECVBUF_SZ, GFP_ATOMIC);
>> if (!rxcb->rx_skb) {
>> rtw_err(rtwdev, "failed to allocate rx_skb\n");
>> return;
>> }
>
> My first thought is to change GFP_ATOMIC to GFP_KERNEL, but kernel documentation
> notes that
> "NEVER SLEEP IN A COMPLETION HANDLER. These are often called in atomic context."
> However, I feel it is possible to do rtw_usb_rx_resubmit() in a work.
>

Yes, maybe even in the existing rx_work.

> Another thought is to allocate a new skb with size urb->actual_length, and
> copy received data to the new skb, and queue to rtwusb->rx_queue. Then reuse
> the original rx_skb. This thought is based on what urb->actual_length would
> be smaller than RTW_USB_MAX_RECVBUF_SZ, but not very sure if this is fact.
>
>
I think actual_length is often close to RTW_USB_MAX_RECVBUF_SZ (32768).
Only with RTL8723DU it's small, probably around 1600 bytes, because it
doesn't use aggregation.
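
To put the two sizes in perspective, here is a rough arithmetic sketch of how many contiguous pages each buffer would need. Everything in it is an assumption rather than something established in this thread: 4 KiB pages, about 320 bytes of per-skb overhead added to the requested size (the real figure depends on kernel version and architecture), and large buffers being rounded up to a power-of-two number of pages. model_page_order() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE    4096u /* assumption: 4 KiB pages */
#define MODEL_SKB_OVERHEAD 320u  /* assumption: approximate per-skb overhead */

/*
 * Smallest order such that (1 << order) pages cover `bytes`,
 * i.e. the size of the contiguous block the request would need.
 */
static unsigned int model_page_order(size_t bytes)
{
	size_t pages = (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under these assumptions, 32768 bytes plus overhead no longer fits in 8 pages and needs an order-4 block (16 contiguous pages), while a buffer of around 1600 bytes plus overhead fits in a single page. Large contiguous blocks are harder to find without sleeping, which may be part of why the GFP_ATOMIC request fails.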
* Re: rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck
From: Bitterblue Smith @ 2024-10-07 22:04 UTC (permalink / raw)
To: Michał Pecio; +Cc: linux-wireless, pkshih, sha, Kalle Valo
On 06/10/2024 18:43, Michał Pecio wrote:
> Hi,
>
>> I have this problem with RTL8811CU, RTL8723DU, RTL8811AU, RTL8812AU.
>
> Does it mean there are working patches for 8811AU somewhere?
> I have this damn thing, lying unused because unsupported.
>

Indeed, v1 is here:

https://lore.kernel.org/linux-wireless/ade57ca1-444f-49e2-b49e-f4b9da65b2cc@gmail.com/

v2 is almost ready.

>> The host controller died when I unplugged another device
>
> xhci not responding, assume dead?
> Yuck. Maybe linux-usb should hear about it if it's reproducible.
>

Yes, that's what happened:

Sep 29 14:32:10 ideapad2 kernel: xhci_hcd 0000:03:00.3: xHCI host not responding to stop endpoint command
Sep 29 14:32:10 ideapad2 kernel: xhci_hcd 0000:03:00.3: xHCI host controller not responding, assume dead
Sep 29 14:32:10 ideapad2 kernel: xhci_hcd 0000:03:00.3: HC died; cleaning up

I'm blaming the device, or a loose port. The device is one of those
cheap wifi adapters from Aliexpress. It has a bad habit of
disappearing when stressed. Unless this is not supposed to happen
even with bad devices/ports?

>> alloc_skb fails (silently) therefore the RX URB is not submitted
>> ever again. There are only 4 RX URBs.
>
> drivers/net/usb/usbnet.c::rx_submit() deals with it by queuing a work
> to resubmit the URB later using blocking allocations.
>

That looks like what the out-of-tree Realtek wifi drivers do as well.

> Failure of usb_submit_urb() is handled same way. This too can return
> -ENOMEM, on xhci for example.
>
> Regards,
> Michal
Thread overview: 12+ messages
2024-09-25 11:46 rtw88: USB devices randomly stop receiving anything Bitterblue Smith
2024-09-26 13:04 ` petter
2024-09-26 16:51 ` Bitterblue Smith
2024-09-27 6:44 ` petter
2024-09-29 11:43 ` Bitterblue Smith
2024-09-30 20:56 ` rtw88: alloc_skb(32768, GFP_ATOMIC) fails, driver gets stuck Bitterblue Smith
2024-10-01 1:25 ` Ping-Ke Shih
2024-10-07 22:03 ` Bitterblue Smith
2024-10-02 10:58 ` Kalle Valo
2024-10-03 4:26 ` Ping-Ke Shih
2024-10-06 15:43 ` Michał Pecio
2024-10-07 22:04 ` Bitterblue Smith