Linux wireless drivers development
From: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
To: Hauke Mehrtens <hauke@hauke-m.de>,
	Jeff Johnson <jjohnson@kernel.org>,
	linux-wireless@vger.kernel.org, ath11k@lists.infradead.org,
	Baochen Qiang <quic_bqiang@quicinc.com>
Subject: Re: ath11k: WCN6855 WoWLAN resume leaves RX in unrecoverable reorder state → TCP collapses
Date: Tue, 26 May 2026 09:51:06 +0800
Message-ID: <9e71acb7-38dc-4207-aff5-618abb7b92f3@oss.qualcomm.com>
In-Reply-To: <0fbb0b6e-c849-4e26-9c46-2ac4986f6b52@hauke-m.de>



On 5/25/2026 9:24 PM, Hauke Mehrtens wrote:
> I used AI to help me debug this problem.
> 
> On Lenovo ThinkPad P14s G4 AMD (QCNFA765 / WCN6855 hw2.1), ~1 in 10
> suspend/resume cycles leaves the ath11k RX path delivering MSDUs out of
> order (~16% of TCP segments). TCP cwnd stays at 1-3 MSS and goodput
> collapses to ~3 Mbit/s; UDP on the same link in the same minute pushes
> 100+ Mbit/s.
> 
> This machine is in the DMI quirk list at
> `drivers/net/wireless/ath/ath11k/core.c` that forces `ATH11K_PM_WOW`.
> In WOW mode the firmware is kept alive across suspend; the WOW resume
> path does not re-initialise REO HW or per-TID BA state.
> The PM_WOW quirk was added as a workaround for unexpected-wakeup bug
> https://bugzilla.kernel.org/show_bug.cgi?id=219196
> 
> ## Affected components
> 
> - **Driver:** ath11k_pci
> - **Chip:** WCN6855 hw2.1 (`17cb:1103`, QCNFA765)
> - **Firmware:** `WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41`
>   (fw_version `0x11088c35`, 2024-04-17)
> - **Kernel:** observed on 7.0.9-arch1-1; also present across many
>   earlier kernel versions over the past >1 year on this hardware
> - **Machine:** Lenovo ThinkPad P14s G4 AMD, DMI `21K5CTO1WW` (matches
>   quirk entry "P14s G4 AMD #1" at `core.c:961-966` via `"21K5"`
>   substring)
> 
> ## Reproduce
> 
> 1. Associate to an HE AP (characterised at 6 GHz, HE-MCS 5/6 NSS 2
>    160 MHz, -56 dBm, using MT7915 with OpenWrt 25.12).
> 2. Suspend, wake, test `iperf3` TCP. Repeat. On average within ~10
>    cycles, one resume leaves the link broken.
> 3. In the broken state: `iw dev wlpXsY link` still reports ~1.3 Gbit/s
>    "bitrate". Ping and UDP iperf3 look fine. TCP iperf3 collapses to
>    ~3 Mbit/s with cwnd stuck at 1-3 MSS.
> 
> ## Evidence
> 
> ### iperf3, same link same minute
> 
> ```
> AP -> STA, UDP -b 200M -l 1400 -t 15:
>   sender:   200 Mbit/s, 267876 datagrams
>   receiver: 102 Mbit/s, 137290 received, 130585 "lost"
>   (not real loss; iperf3 UDP counts out-of-window arrivals as lost)
> 
> AP -> STA, TCP -t 15:
>   3.43 Mbit/s, 521 retransmits, cwnd 1.41-5.66 KB throughout
> ```
> 
> ### UDP run: no real loss anywhere
> 
> - `ip -s link` delta: `+267,953 packets`, `0 errors`, `0 dropped`
>   (AP sent 267,876).
> - `/proc/net/snmp` Udp: `RcvbufErrors 0, InErrors 0`.
> - ath11k `pdev_stats` delta: `MSDUs delivered to HTT +267,985`.
> - `soc_dp_stats` entirely zero: no RXDMA / REO / HAL / TCL / backpressure
>   errors of any kind.
> - AP `iw station get`: ~1.3% retry rate, -65 dBm ACK signal,
>   `expected throughput 1049 Mbps`.
> 
> → Air link clean. Host data path clean. Firmware delivered every
> datagram. No drops anywhere.
> 
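
As a cross-check on the UDP figures above, the rates can be recomputed from the datagram counts. This is only a sketch of the arithmetic in Python. The helper name `rate_mbit` is made up for illustration, and the 1400-byte payload and 15 s duration are taken from the quoted iperf3 options.

```python
def rate_mbit(datagrams, payload_bytes=1400, seconds=15):
    """Payload rate in Mbit/s for a datagram count (iperf3 -l 1400 -t 15)."""
    return datagrams * payload_bytes * 8 / seconds / 1e6

# Counts copied from the report above.
sent, received, reported_lost = 267_876, 137_290, 130_585
iface_rx_delta = 267_953  # `ip -s link` packet delta on the STA

print(f"sender   {rate_mbit(sent):.1f} Mbit/s")
print(f"receiver {rate_mbit(received):.1f} Mbit/s")
print(f"unaccounted datagrams: {sent - received - reported_lost}")
print(f"interface counter covers all sent datagrams: {iface_rx_delta >= sent}")
```

The interface counter also includes unrelated traffic, so this shows the numbers are consistent with "nothing was dropped" rather than proving it.
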
> ### TCP socket reorder (`ss -tin` once per second during TCP iperf3)
> 
> ```
>    t (s)    bytes_rx   segs_in   rcv_ooopack
>    0        1,291,653       895          158
>    1        1,717,365     1,189          210
>    2        2,060,541     1,426          274
>    3        2,519,557     1,743          335
>    4        3,050,973     2,110          397
>    5        3,446,277     2,383          450
>    6        3,906,741     2,701          513
> ```
> 
> ~60 ooo packets/s out of ~370 segs/s = **~16% out-of-order**, sustained.
> 
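
For anyone repeating this measurement, the out-of-order share can be derived from the `ss -tin` counters by differencing the first and last sample. The following is a minimal sketch, assuming the seven samples quoted above. The function name `ooo_stats` is hypothetical and not part of any tool.

```python
# (t_s, segs_in, rcv_ooopack) copied from the ss samples quoted above
samples = [
    (0, 895, 158), (1, 1189, 210), (2, 1426, 274), (3, 1743, 335),
    (4, 2110, 397), (5, 2383, 450), (6, 2701, 513),
]

def ooo_stats(samples):
    """Return (segs_per_s, ooo_per_s, ooo_fraction) over the sampled window."""
    (t0, segs0, ooo0), (t1, segs1, ooo1) = samples[0], samples[-1]
    dt = t1 - t0
    d_segs, d_ooo = segs1 - segs0, ooo1 - ooo0
    return d_segs / dt, d_ooo / dt, d_ooo / d_segs

segs_rate, ooo_rate, frac = ooo_stats(samples)
print(f"{segs_rate:.0f} segs/s, {ooo_rate:.0f} ooo/s, {frac:.1%} out of order")
```

Over this particular window the deltas work out to about 300 segs/s and 59 ooo/s, i.e. close to a fifth of the segments, which is the same range as the figure quoted above.
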
> ### Packet-level pattern (`tcpdump` on wlpXsY)
> 
> Seq normalised to 0 at flow start:
> 
> ```
> 22 ms     2896:4344
> 25 ms     4344:5792
> 27 ms     1448:2896           <-- late; fills gap from 5 ms earlier
> 28 ms     5792:7240
> 54 ms     8688:10136
> 55 ms     10136:11584
> 57 ms     7240:8688           <-- late
> 107 ms    26064:27512
> 107 ms    28960:30408
> 108 ms    30408:31856
> 109 ms    27512:28960         <-- late
> 156 ms    57920:59368
> 156 ms    59368:60816
> 157 ms    56472:57920         <-- late
> ```
> 
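
The "late" markers in the excerpt above can be reproduced mechanically: a segment is late if it starts below the highest sequence number already seen. Below is a small Python sketch of that check on the quoted trace. `late_segments` is an illustrative helper, not something tcpdump provides.

```python
# (time_ms, seq_start, seq_end) from the tcpdump excerpt quoted above
trace = [
    (22, 2896, 4344), (25, 4344, 5792), (27, 1448, 2896), (28, 5792, 7240),
    (54, 8688, 10136), (55, 10136, 11584), (57, 7240, 8688),
    (107, 26064, 27512), (107, 28960, 30408), (108, 30408, 31856),
    (109, 27512, 28960),
    (156, 57920, 59368), (156, 59368, 60816), (157, 56472, 57920),
]

def late_segments(trace):
    """Yield segments that start below the highest sequence already seen."""
    highest_end = 0
    for t_ms, start, end in trace:
        if start < highest_end:
            yield t_ms, start, end
        highest_end = max(highest_end, end)

late = list(late_segments(trace))
for t_ms, start, end in late:
    print(f"{t_ms:>4} ms  {start}:{end}  late")
```

On this excerpt it flags the same four segments that are marked by hand.
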
> Fingerprint: A-MPDU subframe lost on first transmission, retried, retry
> arrives 2-5 ms later. Working REO HW would buffer the continuation
> until the missing subframe arrived or the per-TID reorder timeout
> (`HAL_DEFAULT_REO_TIMEOUT_USEC`, 40 ms) expired. Here both continuation
> and retry pass through unordered.
> 
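
To make the expected behaviour concrete, here is a toy software model of a reorder buffer that holds later frames until the missing one arrives or a timeout expires. It is only an illustration in Python, not the REO hardware logic and not ath11k code. The class `ReorderModel` and its fields are invented for this sketch, and the 40 ms value is the timeout quoted above.

```python
from dataclasses import dataclass, field
from typing import Optional

REORDER_TIMEOUT_MS = 40  # per-TID timeout value quoted in the report


@dataclass
class ReorderModel:
    """Toy in-order release buffer (hypothetical, for illustration only)."""
    next_seq: int = 0
    held: dict = field(default_factory=dict)  # seq -> arrival time in ms
    gap_since: Optional[float] = None         # when the current gap was first seen

    def push(self, now_ms, seq):
        """Accept one frame and return the seqs released in order."""
        if seq >= self.next_seq:              # frames older than the window are ignored
            self.held[seq] = now_ms
        return self._drain(now_ms)

    def _drain(self, now_ms):
        released = []
        while self.held:
            if self.next_seq in self.held:
                del self.held[self.next_seq]
                released.append(self.next_seq)
                self.next_seq += 1
                self.gap_since = None
            elif self.gap_since is None:
                self.gap_since = now_ms       # start timing this gap
                break
            elif now_ms - self.gap_since >= REORDER_TIMEOUT_MS:
                self.next_seq = min(self.held)  # give up on the missing frame
                self.gap_since = None
            else:
                break                         # keep holding
        return released


# First burst of the trace, as 1448-byte segment indices: 2, 3, then the retry 1, then 4.
model = ReorderModel(next_seq=1)
for t_ms, seq in [(22, 2), (25, 3), (27, 1), (28, 4)]:
    print(t_ms, model.push(t_ms, seq))
```

In this model segments 2 and 3 are held until the retry of segment 1 arrives at 27 ms, and the stack above would then see 1, 2, 3, 4 in order. The trace in the report shows the opposite, with each frame passed up as it arrives.
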
> ## Diagnosis
> 
> - Air link healthy; host data path clean; REO HW error counters all
>   zero — REO simply isn't enforcing order for this peer's TIDs.
> - dmesg across 3 days of suspend cycles shows zero ath11k re-init
>   activity (no `fw_version` reprint, no `wcn6855 hw2.1` reprint). The
>   firmware instance is the same one from the most recent `modprobe`.
>   `ath11k_core_suspend_wow` / `ath11k_core_resume_wow` neither power
>   down the device nor re-initialise REO.
> - `rmmod` triggers full `ath11k_hif_power_down` + chip re-init on next
>   `modprobe`, which re-runs `ath11k_hw_wcn6855_reo_setup`. This is the
>   only reliable recovery, so the corrupted state lives in firmware /
>   REO HW that the WOW resume path never touches.
> 
> The non-WOW path (`ath11k_core_suspend_default`) does power-down + full
> re-init on resume, re-running `ath11k_dp_srng_common_setup()` →
> `hw_ops->reo_setup()`. The WOW path does not.
> 
> ## Related
> 
> - Bug #219196 — unexpected wakeups; the WOW workaround was added to
>   mitigate this
> - `ce8669a27016` — introduced the WOW quirk table (2025-03-28, Baochen Qiang)
> - `0eb002c93c3b` — added `21K5` / `21K6` (this laptop) to the quirk
>   table (2025-09-29, Mark Pearson)
> - `4015b1972763` — adds Z13/Z16 Gen1 to WOW quirk (Nov 2025)
> 
> Hauke
> 

Can you please try the fix below?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a2451a34afdf563b3102d36a4b6cf335cf813e2


