Date: Fri, 01 May 2026 22:46:51 +0200
From: Luka Gejak
To: Bitterblue Smith, Ping-Ke Shih, Kalle Valo
CC: Yan-Hsuan Chuang, Brian Norris, Stanislaw Gruszka,
 linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org, luka.gejak@linux.dev
Subject: Re: [PATCH] wifi: rtw88: increase TX report timeout to fix race condition
In-Reply-To: <72f6fffd-bd77-437f-a9d9-6a542a8b365b@gmail.com>
References: <20260501150402.227788-1-luka.gejak@linux.dev>
 <72f6fffd-bd77-437f-a9d9-6a542a8b365b@gmail.com>
Message-ID: <6CD170FE-CAED-4B91-AEED-A1AFB98FFE8A@linux.dev>
X-Mailing-List: stable@vger.kernel.org

On May 1, 2026 9:26:30 PM GMT+02:00, Bitterblue Smith wrote:
>On 01/05/2026 18:04, luka.gejak@linux.dev wrote:
>> From: Luka Gejak
>>
>> The driver expects the firmware to report TX status within 500ms.
>> However, a race condition exists when the hardware is under heavy TX
>> load and is simultaneously interrupted by background scans or
>> power-saving state transitions. During these events, the firmware may
>> go off-channel for longer than 500ms, delaying the TX reports.
>>

Hi Bitterblue,
thanks for the review.

>
>But power saving state transitions should not happen during heavy TX load.
>

You are absolutely right that power save transitions don't happen
during heavy TX. The issue is strictly tied to off-channel dwell time. I
reliably trigger this on my rtl8723du (USB) by forcing background
scans (iw dev wlanX scan) while under heavy iperf3 load. The firmware
goes off-channel to scan, which delays the TX report well beyond the
current 500ms threshold.

>> When this happens, the purge timer fires prematurely, dropping the
>> tracking skbs from the queue and spamming the kernel log with:
>> "failed to get tx report from firmware". Dropping these tracking skbs
>> prevents the driver from reporting TX status back to mac80211, which
>> breaks rate control accounting and degrades performance.
>>
>
>But mac80211 doesn't handle rate control for these chips. How much does
>performance degrade?
>

I understand the firmware handles that internally. The performance
degradation I am actually seeing is TCP window collapse, as the host
stack interprets the dropped tracking skbs as packet loss. In my
testing with iperf3, throughput drops from a steady 80-90 Mbps to
near-zero for nearly 2 seconds following the scan before recovery
begins.

>> Increase RTW_TX_PROBE_TIMEOUT to 2500ms. This timeout is large enough
>> to comfortably accommodate the duration of full WiFi background scans
>> and sleep transitions without incorrectly tripping the purge timer,
>> while still eventually catching true firmware lockups.
>>
>
>rtw88 supports many chips. Which one are you using?
>
>Perhaps provide a full description of the problem you encountered.
>

...

I also realize now that globally changing RTW_TX_PROBE_TIMEOUT to
2500ms is too heavy-handed. Since this impacts all rtw88 chips,
including PCIe variants where 500ms might be exactly what is needed to
catch a real firmware lockup, the blast radius is too large.

How would you prefer I handle this for the v2 patch?
I can either implement a more conservative global bump, or make the
timeout dynamic based on the HCI interface so USB devices get a longer
timeout to accommodate the bus latency during scans.

Best regards,
Luka Gejak
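
To make the second option concrete, here is a rough sketch of a timeout
chosen per HCI type. It is plain userspace C for illustration only, not
driver code: the enum, the macro names and both helper functions are
hypothetical, and the 500ms and 2500ms values are simply the current
and the v1-proposed numbers from this thread, not tuned values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the rtw88 driver's API. */
enum hci_type {
	HCI_PCIE,
	HCI_USB,
};

/* Values taken from this thread: the current timeout and the v1 proposal. */
#define TX_REPORT_TIMEOUT_DEFAULT_MS 500u
#define TX_REPORT_TIMEOUT_USB_MS     2500u

/* The idea only makes sense if the USB timeout is the longer one. */
static_assert(TX_REPORT_TIMEOUT_USB_MS > TX_REPORT_TIMEOUT_DEFAULT_MS,
	      "USB timeout must exceed the default timeout");

/* Pick the TX report timeout from the bus type instead of one global value. */
static unsigned int tx_report_timeout_ms(enum hci_type hci)
{
	switch (hci) {
	case HCI_USB:
		/* Longer window to ride out off-channel time during scans. */
		return TX_REPORT_TIMEOUT_USB_MS;
	case HCI_PCIE:
	default:
		/* Keep the short timeout so real lockups are caught quickly. */
		return TX_REPORT_TIMEOUT_DEFAULT_MS;
	}
}

/* True when a report has been pending longer than the timeout for this bus. */
static bool tx_report_expired(enum hci_type hci, unsigned int pending_ms)
{
	return pending_ms > tx_report_timeout_ms(hci);
}
```

With this shape, a report that has been pending for 600ms would count
as expired on PCIe but not on USB, so PCIe behaviour stays as it is
today.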