From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 01 May 2026 23:33:21 +0200
From: Luka Gejak
To: Bitterblue Smith, Ping-Ke Shih, Kalle Valo
CC: Yan-Hsuan Chuang, Brian Norris, Stanislaw Gruszka,
 linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org, luka.gejak@linux.dev
Subject: Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
References: <20260501150402.227788-1-luka.gejak@linux.dev>
 <72f6fffd-bd77-437f-a9d9-6a542a8b365b@gmail.com>
 <6CD170FE-CAED-4B91-AEED-A1AFB98FFE8A@linux.dev>
Message-ID: <39E7B292-F03C-4307-B0BE-62DEC191FED8@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

On May 1, 2026 11:28:38 PM GMT+02:00, Bitterblue Smith wrote:
>On 01/05/2026 23:46, Luka Gejak wrote:
>> On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith wrote:
>>> On 01/05/2026 18:04, luka.gejak@linux.dev wrote:
>>>> From: Luka Gejak
>>>>
>>>> The driver expects the firmware to report TX status within 500ms.
>>>> However, a race condition exists when the hardware is under heavy TX
>>>> load and is simultaneously interrupted by background scans or
>>>> power-saving state transitions. During these events, the firmware may
>>>> go off-channel for longer than 500ms, delaying the TX reports.
>>>>
>> Hi Bitterblue,
>> thanks for the review.
>>>
>>> But power saving state transitions should not happen during heavy TX load.
>>>
>> You are absolutely right that power save transitions don't happen
>> during heavy TX. The issue is strictly tied to off-channel dwell time.
>> I reliably trigger this on my rtl8723du (USB) by forcing background
>> scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware
>> goes off-channel to scan, which delays the TX report well beyond the
>> current 500ms threshold.
>>
>>>> When this happens, the purge timer fires prematurely, dropping the
>>>> tracking skbs from the queue and spamming the kernel log with:
>>>> "failed to get tx report from firmware". Dropping these tracking skbs
>>>> prevents the driver from reporting TX status back to mac80211, which
>>>> breaks rate control accounting and degrades performance.
>>>>
>>>
>>> But mac80211 doesn't handle rate control for these chips. How much does
>>> performance degrade?
>>>
>>
>> I understand the firmware handles that internally. The performance
>> degradation I am actually seeing is TCP window collapse, as the host
>> stack interprets the dropped tracking skbs as packet loss. In my
>> testing with iperf3, throughput drops from a steady 80-90 Mbps to
>> near-zero for nearly 2 seconds following the scan before recovery
>> begins.
>>
>>>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>>>> to comfortably accommodate the duration of full WiFi background scans
>>>> and sleep transitions without incorrectly tripping the purge timer,
>>>> while still eventually catching true firmware lockups.
>>>>
>>>
>>> rtw88 supports many chips. Which one are you using?
>>>
>>> Perhaps provide a full description of the problem you encountered.
>>>
>>
>> ...
>>
>> I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to
>> 2500ms is too heavy-handed. Since this impacts all rtw88 chips,
>> including PCIe variants where 500ms might be exactly what is needed to
>> catch a real firmware lockup, the blast radius is too large. How would
>> you prefer I handle this for the v2 patch? I can either implement a
>> more conservative global bump, or make the timeout dynamic based on
>> the HCI interface so USB devices get a longer timeout to accommodate
>> the bus latency during scans.
>>
>> Best regards,
>> Luka Gejak
>
>No idea, I'm just asking some questions...
>
>Actually, I have one more: what version of the driver did you test?
>
>My quick test with RTL8723DU doesn't show any "failed to get tx report
>from firmware" when scanning while running iperf3. Does it take a long
>time to trigger?

I am testing against the latest wireless-next tree.

You are correct that it is an intermittent race condition, which
explains why it doesn't appear in every test run. To reproduce this, I
use a script to sustain heavy TX load while forcing background scans
in a loop. Under this stress, it typically manifests after a few
minutes of operation.

Best regards,
Luka Gejak
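
For reference, the second v2 option raised in the quoted text (a timeout
chosen per HCI interface rather than one global value) could look
roughly like the standalone C sketch below. This is only an
illustration, not driver code: the enum, the macros and the function
name are hypothetical stand-ins and not rtw88 identifiers. 500 ms is the
current timeout quoted in the thread, and 2500 ms is the value from the
original patch, reused here for the USB case purely as an assumption.

```c
/* Hypothetical bus types; stand-ins for whatever the driver really uses. */
enum hci_type {
	HCI_PCIE,
	HCI_USB,
};

/* Current timeout quoted in the thread. */
#define TX_PROBE_TIMEOUT_DEFAULT_MS	500
/* Value from the original patch, assumed here for USB only. */
#define TX_PROBE_TIMEOUT_USB_MS		2500

/*
 * Pick the TX report timeout from the bus type: USB gets the longer
 * value to ride out off-channel scans, everything else keeps the
 * shorter one so real firmware lockups are still caught quickly.
 */
static unsigned int tx_probe_timeout_ms(enum hci_type hci)
{
	switch (hci) {
	case HCI_USB:
		return TX_PROBE_TIMEOUT_USB_MS;
	default:
		return TX_PROBE_TIMEOUT_DEFAULT_MS;
	}
}
```

With this shape, PCIe devices keep the existing 500 ms behaviour and
only USB devices see the longer timeout. In a real driver the result
would still need converting to jiffies before arming the purge timer.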