public inbox for stable@vger.kernel.org
From: Luka Gejak <luka.gejak@linux.dev>
To: Bitterblue Smith <rtl8821cerfe2@gmail.com>,
	Ping-Ke Shih <pkshih@realtek.com>, Kalle Valo <kvalo@kernel.org>
Cc: Yan-Hsuan Chuang <yhchuang@realtek.com>,
	Brian Norris <briannorris@chromium.org>,
	Stanislaw Gruszka <sgruszka@redhat.com>,
	linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, luka.gejak@linux.dev
Subject: Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
Date: Fri, 01 May 2026 23:33:21 +0200
Message-ID: <39E7B292-F03C-4307-B0BE-62DEC191FED8@linux.dev>
In-Reply-To: <bc0a9969-b386-42d1-ada2-99ac39e394f3@gmail.com>

On May 1, 2026 11:28:38 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>On 01/05/2026 23:46, Luka Gejak wrote:
>> On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith <rtl8821cerfe2@gmail.com> wrote:
>>> On 01/05/2026 18:04, luka.gejak@linux.dev wrote:
>>>> From: Luka Gejak <luka.gejak@linux.dev>
>>>>
>>>> The driver expects the firmware to report TX status within 500ms.
>>>> However, a race condition exists when the hardware is under heavy TX
>>>> load and is simultaneously interrupted by background scans or
>>>> power-saving state transitions. During these events, the firmware may
>>>> go off-channel for longer than 500ms, delaying the TX reports.
>>>>
>> Hi Bitterblue,
>> thanks for the review.
>>>
>>> But power saving state transitions should not happen during heavy TX load.
>>>
>> You are absolutely right that power save transitions don't happen 
>> during heavy TX. The issue is strictly tied to off-channel dwell time.
>> I reliably trigger this on my rtl8723du (USB) by forcing background 
>> scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware 
>> goes off-channel to scan, which delays the TX report well beyond the 
>> current 500ms threshold.
>> 
>>>> When this happens, the purge timer fires prematurely, dropping the
>>>> tracking skbs from the queue and spamming the kernel log with:
>>>> "failed to get tx report from firmware". Dropping these tracking skbs
>>>> prevents the driver from reporting TX status back to mac80211, which
>>>> breaks rate control accounting and degrades performance.
>>>>
>>>
>>> But mac80211 doesn't handle rate control for these chips. How much does
>>> performance degrade?
>>>
>> 
>> I understand the firmware handles that internally. The performance 
>> degradation I am actually seeing is TCP window collapse, as the host 
>> stack interprets the dropped tracking skbs as packet loss. In my 
>> testing with iperf3, throughput drops from a steady 80-90 Mbps to 
>> near-zero for nearly 2 seconds following the scan before recovery 
>> begins.
>> 
>>>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>>>> to comfortably accommodate the duration of full WiFi background scans
>>>> and sleep transitions without incorrectly tripping the purge timer,
>>>> while still eventually catching true firmware lockups.
>>>>
>>>
>>> rtw88 supports many chips. Which one are you using?
>>>
>>> Perhaps provide a full description of the problem you encountered.
>>>
>> 
>> ...
>> 
>> I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to 
>> 2500ms is too heavy-handed. Since this impacts all rtw88 chips, 
>> including PCIe variants where 500ms might be exactly what is needed to
>> catch a real firmware lockup, the blast radius is too large. How would
>> you prefer I handle this for the v2 patch? I can either implement a 
>> more conservative global bump, or make the timeout dynamic based on 
>> the HCI interface so USB devices get a longer timeout to accommodate 
>> the bus latency during scans.
>> 
>> Best regards,
>> Luka Gejak
>
>No idea, I'm just asking some questions...
>
>Actually, I have one more: what version of the driver did you test?
>
>My quick test with RTL8723DU doesn't show any "failed to get tx report
>from firmware" when scanning while running iperf3. Does it take a long
>time to trigger?

I am testing against the latest wireless-next tree.

Yes, it takes a while to trigger. The race is intermittent, so it does
not show up in every test run. To reproduce it, I use a script that
sustains heavy TX load while forcing background scans in a loop. Under
this stress, the warning typically appears after a few minutes of
operation.
Best regards,
Luka Gejak
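
For readers who want to try the reproduction described above, here is a
minimal sketch of such a stress loop. It is not the script used for the
testing reported in this message. It assumes iperf3 and iw are installed,
that an iperf3 server is reachable, and that the caller has the privileges
needed to trigger scans and read the kernel log. The function names, the
default duration, and the scan gap are all hypothetical choices.

```python
import subprocess
import time

# Warning text quoted in the patch description.
LOG_MARKER = "failed to get tx report from firmware"


def iperf3_cmd(server, seconds):
    """Command line for an iperf3 client run that transmits for `seconds`."""
    return ["iperf3", "-c", server, "-t", str(seconds)]


def scan_cmd(iface):
    """Command line that triggers a scan on `iface` (usually needs root)."""
    return ["iw", "dev", iface, "scan"]


def count_marker(log_text, marker=LOG_MARKER):
    """Count log lines that contain the driver warning."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def stress(iface, server, seconds=600, scan_gap=5.0):
    """Keep TX load running and force scans until the iperf3 client exits.

    Returns the number of warning lines found in dmesg afterwards. This
    counts every matching line in the log buffer, including any that were
    logged before the run started.
    """
    load = subprocess.Popen(iperf3_cmd(server, seconds),
                            stdout=subprocess.DEVNULL)
    try:
        while load.poll() is None:
            # A scan request can fail if another scan is still in progress,
            # so the exit status is ignored and the loop simply retries.
            subprocess.run(scan_cmd(iface), stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, check=False)
            time.sleep(scan_gap)
    finally:
        if load.poll() is None:
            load.terminate()
        load.wait()
    log = subprocess.run(["dmesg"], capture_output=True, text=True,
                         check=False).stdout
    return count_marker(log)
```

Calling stress("wlan0", "192.0.2.1") would run for about ten minutes with
the defaults; both arguments are placeholders. Clearing the kernel log
before the run makes the returned count easier to interpret.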

Thread overview: 5+ messages
2026-05-01 15:04 [PATCH] wifi: rtw88: increase TX report timeout to fix race condition luka.gejak
2026-05-01 19:26 ` Bitterblue Smith
2026-05-01 20:46   ` Luka Gejak
2026-05-01 21:28     ` Bitterblue Smith
2026-05-01 21:33       ` Luka Gejak [this message]
