* [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
@ 2026-05-01 15:04 luka.gejak
2026-05-01 19:26 ` Bitterblue Smith
0 siblings, 1 reply; 5+ messages in thread
From: luka.gejak @ 2026-05-01 15:04 UTC (permalink / raw)
To: Ping-Ke Shih, Kalle Valo
Cc: Yan-Hsuan Chuang, Brian Norris, Stanislaw Gruszka, linux-wireless,
linux-kernel, Luka Gejak, stable
From: Luka Gejak <luka.gejak@linux.dev>

The driver expects the firmware to report TX status within 500ms.
However, a race condition exists when the hardware is under heavy TX
load and is simultaneously interrupted by background scans or
power-saving state transitions. During these events, the firmware may
go off-channel for longer than 500ms, delaying the TX reports.

When this happens, the purge timer fires prematurely, dropping the
tracking skbs from the queue and spamming the kernel log with:
"failed to get tx report from firmware". Dropping these tracking skbs
prevents the driver from reporting TX status back to mac80211, which
breaks rate control accounting and degrades performance.

Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
to comfortably accommodate the duration of full WiFi background scans
and sleep transitions without incorrectly tripping the purge timer,
while still eventually catching true firmware lockups.

Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver")
Cc: stable@vger.kernel.org
Tested-by: Luka Gejak <luka.gejak@linux.dev>
Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
---
 drivers/net/wireless/realtek/rtw88/tx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtw88/tx.h b/drivers/net/wireless/realtek/rtw88/tx.h
index d34cdeca16f1..95d15e4f5d34 100644
--- a/drivers/net/wireless/realtek/rtw88/tx.h
+++ b/drivers/net/wireless/realtek/rtw88/tx.h
@@ -7,7 +7,7 @@
 
 #define RTK_TX_MAX_AGG_NUM_MASK 0x1f
 
-#define RTW_TX_PROBE_TIMEOUT msecs_to_jiffies(500)
+#define RTW_TX_PROBE_TIMEOUT msecs_to_jiffies(2500)
 
 struct rtw_tx_desc {
 	__le32 w0;
--
2.54.0
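
Editorial illustration of the timing argument in the commit message. This is a standalone userspace model, not driver code: a TX report that arrives later than the probe timeout is treated as lost. The names `model_msecs_to_jiffies` and `tx_report_purged`, and the tick rate of 250, are assumptions made for the sketch; only the 500 ms and 2500 ms values come from the patch.

```c
#include <stdbool.h>

/* Assumed tick rate for illustration; a real kernel uses CONFIG_HZ. */
#define MODEL_HZ 250u

/*
 * Simplified stand-in for the kernel's msecs_to_jiffies(): rounds up so
 * that a timeout never expires early. Only valid when 1000 is a multiple
 * of MODEL_HZ.
 */
static unsigned long model_msecs_to_jiffies(unsigned int ms)
{
	unsigned int ms_per_tick = 1000u / MODEL_HZ;

	return (ms + ms_per_tick - 1) / ms_per_tick;
}

/*
 * Hypothetical helper: returns true if a report delayed by delay_ms would
 * arrive after a purge timer armed with timeout_ms has already fired.
 */
static bool tx_report_purged(unsigned int delay_ms, unsigned int timeout_ms)
{
	return model_msecs_to_jiffies(delay_ms) >
	       model_msecs_to_jiffies(timeout_ms);
}
```

Under this model, a report delayed by an assumed 1200 ms off-channel dwell would be purged with the 500 ms timeout but not with the 2500 ms one. The dwell figure is made up for the example; the thread does not give a measured value.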
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
  2026-05-01 15:04 [PATCH] wifi: rtw88: increase TX report timeout to fix race condition luka.gejak
@ 2026-05-01 19:26 ` Bitterblue Smith
  2026-05-01 20:46   ` Luka Gejak
  0 siblings, 1 reply; 5+ messages in thread
From: Bitterblue Smith @ 2026-05-01 19:26 UTC (permalink / raw)
  To: luka.gejak, Ping-Ke Shih, Kalle Valo
  Cc: Yan-Hsuan Chuang, Brian Norris, Stanislaw Gruszka, linux-wireless,
	linux-kernel, stable

On 01/05/2026 18:04, luka.gejak@linux.dev wrote:
> From: Luka Gejak <luka.gejak@linux.dev>
>
> The driver expects the firmware to report TX status within 500ms.
> However, a race condition exists when the hardware is under heavy TX
> load and is simultaneously interrupted by background scans or
> power-saving state transitions. During these events, the firmware may
> go off-channel for longer than 500ms, delaying the TX reports.
>

But power saving state transitions should not happen during heavy TX load.

> When this happens, the purge timer fires prematurely, dropping the
> tracking skbs from the queue and spamming the kernel log with:
> "failed to get tx report from firmware". Dropping these tracking skbs
> prevents the driver from reporting TX status back to mac80211, which
> breaks rate control accounting and degrades performance.
>

But mac80211 doesn't handle rate control for these chips. How much does
performance degrade?

> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
> to comfortably accommodate the duration of full WiFi background scans
> and sleep transitions without incorrectly tripping the purge timer,
> while still eventually catching true firmware lockups.
>

rtw88 supports many chips. Which one are you using?

Perhaps provide a full description of the problem you encountered.

> Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver")
> Cc: stable@vger.kernel.org
> Tested-by: Luka Gejak <luka.gejak@linux.dev>
> Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
> ---
>  drivers/net/wireless/realtek/rtw88/tx.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/tx.h b/drivers/net/wireless/realtek/rtw88/tx.h
> index d34cdeca16f1..95d15e4f5d34 100644
> --- a/drivers/net/wireless/realtek/rtw88/tx.h
> +++ b/drivers/net/wireless/realtek/rtw88/tx.h
> @@ -7,7 +7,7 @@
>
>  #define RTK_TX_MAX_AGG_NUM_MASK 0x1f
>
> -#define RTW_TX_PROBE_TIMEOUT msecs_to_jiffies(500)
> +#define RTW_TX_PROBE_TIMEOUT msecs_to_jiffies(2500)
>
>  struct rtw_tx_desc {
>  	__le32 w0;

^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
  2026-05-01 19:26 ` Bitterblue Smith
@ 2026-05-01 20:46   ` Luka Gejak
  2026-05-01 21:28     ` Bitterblue Smith
  0 siblings, 1 reply; 5+ messages in thread
From: Luka Gejak @ 2026-05-01 20:46 UTC (permalink / raw)
  To: Bitterblue Smith, Ping-Ke Shih, Kalle Valo
  Cc: Yan-Hsuan Chuang, Brian Norris, Stanislaw Gruszka, linux-wireless,
	linux-kernel, stable, luka.gejak

On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>On 01/05/2026 18:04, luka.gejak@linux.dev wrote:
>> From: Luka Gejak <luka.gejak@linux.dev>
>>
>> The driver expects the firmware to report TX status within 500ms.
>> However, a race condition exists when the hardware is under heavy TX
>> load and is simultaneously interrupted by background scans or
>> power-saving state transitions. During these events, the firmware may
>> go off-channel for longer than 500ms, delaying the TX reports.
>>
Hi Bitterblue,
thanks for the review.
>
>But power saving state transitions should not happen during heavy TX load.
>
You are absolutely right that power save transitions don't happen
during heavy TX. The issue is strictly tied to off-channel dwell time.
I reliably trigger this on my rtl8723du (USB) by forcing background
scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware
goes off-channel to scan, which delays the TX report well beyond the
current 500ms threshold.

>> When this happens, the purge timer fires prematurely, dropping the
>> tracking skbs from the queue and spamming the kernel log with:
>> "failed to get tx report from firmware". Dropping these tracking skbs
>> prevents the driver from reporting TX status back to mac80211, which
>> breaks rate control accounting and degrades performance.
>>
>
>But mac80211 doesn't handle rate control for these chips. How much does
>performance degrade?
>

I understand the firmware handles that internally. The performance
degradation I am actually seeing is TCP window collapse, as the host
stack interprets the dropped tracking skbs as packet loss. In my
testing with iperf3, throughput drops from a steady 80-90 Mbps to
near-zero for nearly 2 seconds following the scan before recovery
begins.

>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>> to comfortably accommodate the duration of full WiFi background scans
>> and sleep transitions without incorrectly tripping the purge timer,
>> while still eventually catching true firmware lockups.
>>
>
>rtw88 supports many chips. Which one are you using?
>
>Perhaps provide a full description of the problem you encountered.
>

...

I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to
2500ms is too heavy-handed. Since this impacts all rtw88 chips,
including PCIe variants where 500ms might be exactly what is needed to
catch a real firmware lockup, the blast radius is too large. How would
you prefer I handle this for the v2 patch? I can either implement a
more conservative global bump, or make the timeout dynamic based on
the HCI interface so USB devices get a longer timeout to accommodate
the bus latency during scans.

Best regards,
Luka Gejak

^ permalink raw reply [flat|nested] 5+ messages in thread
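
Editorial illustration of the second v2 option raised in the message above, a timeout chosen per host interface. This is a standalone userspace sketch under stated assumptions, not a proposed patch: the enum `model_hci_type` and the function `model_tx_probe_timeout_ms` are hypothetical names, and the 2500 ms USB value simply reuses the v1 number as a placeholder. The thread does not discuss SDIO, so the sketch leaves it at the current 500 ms.

```c
/* Hypothetical stand-in for the driver's host-interface type. */
enum model_hci_type {
	MODEL_HCI_PCIE,
	MODEL_HCI_USB,
	MODEL_HCI_SDIO,
};

/* Current driver value, per the patch context. */
#define MODEL_TX_PROBE_TIMEOUT_DEFAULT_MS 500u
/* Placeholder taken from v1; the right USB value is an open question. */
#define MODEL_TX_PROBE_TIMEOUT_USB_MS 2500u

/* Pick the TX report timeout (in ms) for a given host interface. */
static unsigned int model_tx_probe_timeout_ms(enum model_hci_type type)
{
	switch (type) {
	case MODEL_HCI_USB:
		return MODEL_TX_PROBE_TIMEOUT_USB_MS;
	case MODEL_HCI_PCIE:
	case MODEL_HCI_SDIO:
	default:
		/* Keep the short timeout so real lockups are caught quickly. */
		return MODEL_TX_PROBE_TIMEOUT_DEFAULT_MS;
	}
}
```

Keeping the default branch at 500 ms means any interface not explicitly listed retains today's behaviour, which limits the change to the one bus where the problem was reported.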
* Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
  2026-05-01 20:46   ` Luka Gejak
@ 2026-05-01 21:28     ` Bitterblue Smith
  2026-05-01 21:33       ` Luka Gejak
  0 siblings, 1 reply; 5+ messages in thread
From: Bitterblue Smith @ 2026-05-01 21:28 UTC (permalink / raw)
  To: Luka Gejak, Ping-Ke Shih, Kalle Valo
  Cc: Yan-Hsuan Chuang, Brian Norris, Stanislaw Gruszka, linux-wireless,
	linux-kernel, stable

On 01/05/2026 23:46, Luka Gejak wrote:
> On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>> On 01/05/2026 18:04, luka.gejak@linux.dev wrote:
>>> From: Luka Gejak <luka.gejak@linux.dev>
>>>
>>> The driver expects the firmware to report TX status within 500ms.
>>> However, a race condition exists when the hardware is under heavy TX
>>> load and is simultaneously interrupted by background scans or
>>> power-saving state transitions. During these events, the firmware may
>>> go off-channel for longer than 500ms, delaying the TX reports.
>>>
> Hi Bitterblue,
> thanks for the review.
>>
>> But power saving state transitions should not happen during heavy TX load.
>>
> You are absolutely right that power save transitions don't happen
> during heavy TX. The issue is strictly tied to off-channel dwell time.
> I reliably trigger this on my rtl8723du (USB) by forcing background
> scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware
> goes off-channel to scan, which delays the TX report well beyond the
> current 500ms threshold.
>
>>> When this happens, the purge timer fires prematurely, dropping the
>>> tracking skbs from the queue and spamming the kernel log with:
>>> "failed to get tx report from firmware". Dropping these tracking skbs
>>> prevents the driver from reporting TX status back to mac80211, which
>>> breaks rate control accounting and degrades performance.
>>>
>>
>> But mac80211 doesn't handle rate control for these chips. How much does
>> performance degrade?
>>
>
> I understand the firmware handles that internally. The performance
> degradation I am actually seeing is TCP window collapse, as the host
> stack interprets the dropped tracking skbs as packet loss. In my
> testing with iperf3, throughput drops from a steady 80-90 Mbps to
> near-zero for nearly 2 seconds following the scan before recovery
> begins.
>
>>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>>> to comfortably accommodate the duration of full WiFi background scans
>>> and sleep transitions without incorrectly tripping the purge timer,
>>> while still eventually catching true firmware lockups.
>>>
>>
>> rtw88 supports many chips. Which one are you using?
>>
>> Perhaps provide a full description of the problem you encountered.
>>
>
> ...
>
> I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to
> 2500ms is too heavy-handed. Since this impacts all rtw88 chips,
> including PCIe variants where 500ms might be exactly what is needed to
> catch a real firmware lockup, the blast radius is too large. How would
> you prefer I handle this for the v2 patch? I can either implement a
> more conservative global bump, or make the timeout dynamic based on
> the HCI interface so USB devices get a longer timeout to accommodate
> the bus latency during scans.
>
> Best regards,
> Luka Gejak

No idea, I'm just asking some questions...

Actually, I have one more: what version of the driver did you test?

My quick test with RTL8723DU doesn't show any "failed to get tx report
from firmware" when scanning while running iperf3. Does it take a long
time to trigger?

^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
  2026-05-01 21:28     ` Bitterblue Smith
@ 2026-05-01 21:33       ` Luka Gejak
  0 siblings, 0 replies; 5+ messages in thread
From: Luka Gejak @ 2026-05-01 21:33 UTC (permalink / raw)
  To: Bitterblue Smith, Ping-Ke Shih, Kalle Valo
  Cc: Yan-Hsuan Chuang, Brian Norris, Stanislaw Gruszka, linux-wireless,
	linux-kernel, stable, luka.gejak

On May 1, 2026 11:28:38 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>On 01/05/2026 23:46, Luka Gejak wrote:
>> On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>>> On 01/05/2026 18:04, luka.gejak@linux.dev wrote:
>>>> From: Luka Gejak <luka.gejak@linux.dev>
>>>>
>>>> The driver expects the firmware to report TX status within 500ms.
>>>> However, a race condition exists when the hardware is under heavy TX
>>>> load and is simultaneously interrupted by background scans or
>>>> power-saving state transitions. During these events, the firmware may
>>>> go off-channel for longer than 500ms, delaying the TX reports.
>>>>
>> Hi Bitterblue,
>> thanks for the review.
>>>
>>> But power saving state transitions should not happen during heavy TX load.
>>>
>> You are absolutely right that power save transitions don't happen
>> during heavy TX. The issue is strictly tied to off-channel dwell time.
>> I reliably trigger this on my rtl8723du (USB) by forcing background
>> scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware
>> goes off-channel to scan, which delays the TX report well beyond the
>> current 500ms threshold.
>>
>>>> When this happens, the purge timer fires prematurely, dropping the
>>>> tracking skbs from the queue and spamming the kernel log with:
>>>> "failed to get tx report from firmware". Dropping these tracking skbs
>>>> prevents the driver from reporting TX status back to mac80211, which
>>>> breaks rate control accounting and degrades performance.
>>>>
>>>
>>> But mac80211 doesn't handle rate control for these chips. How much does
>>> performance degrade?
>>>
>>
>> I understand the firmware handles that internally. The performance
>> degradation I am actually seeing is TCP window collapse, as the host
>> stack interprets the dropped tracking skbs as packet loss. In my
>> testing with iperf3, throughput drops from a steady 80-90 Mbps to
>> near-zero for nearly 2 seconds following the scan before recovery
>> begins.
>>
>>>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>>>> to comfortably accommodate the duration of full WiFi background scans
>>>> and sleep transitions without incorrectly tripping the purge timer,
>>>> while still eventually catching true firmware lockups.
>>>>
>>>
>>> rtw88 supports many chips. Which one are you using?
>>>
>>> Perhaps provide a full description of the problem you encountered.
>>>
>>
>> ...
>>
>> I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to
>> 2500ms is too heavy-handed. Since this impacts all rtw88 chips,
>> including PCIe variants where 500ms might be exactly what is needed to
>> catch a real firmware lockup, the blast radius is too large. How would
>> you prefer I handle this for the v2 patch? I can either implement a
>> more conservative global bump, or make the timeout dynamic based on
>> the HCI interface so USB devices get a longer timeout to accommodate
>> the bus latency during scans.
>>
>> Best regards,
>> Luka Gejak
>
>No idea, I'm just asking some questions...
>
>Actually, I have one more: what version of the driver did you test?
>
>My quick test with RTL8723DU doesn't show any "failed to get tx report
>from firmware" when scanning while running iperf3. Does it take a long
>time to trigger?

I am testing against the latest wireless-next tree. You are correct
that it is an intermittent race condition, which explains why it
doesn't appear in every test run. To reproduce this, I use a script to
sustain heavy TX load while forcing background scans in a loop. Under
this stress, it typically manifests after a few minutes of operation.

Best regards,
Luka Gejak

^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-05-01 21:33 UTC | newest]

Thread overview: 5+ messages
2026-05-01 15:04 [PATCH] wifi: rtw88: increase TX report timeout to fix race condition luka.gejak
2026-05-01 19:26 ` Bitterblue Smith
2026-05-01 20:46   ` Luka Gejak
2026-05-01 21:28     ` Bitterblue Smith
2026-05-01 21:33       ` Luka Gejak