* [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: LB F @ 2026-03-09 21:48 UTC
To: pkshih; +Cc: linux-wireless, linux-kernel
Hi Ping-Ke,
I am writing to formally report a critical bug that causes a hard
system freeze on laptops equipped with the RTL8821CE WiFi module, and
to propose solutions.
Description:
On an HP laptop equipped with a Realtek RTL8821CE 802.11ac PCIe
adapter (PCI ID: 10ec:c821), the system experiences a hard lockup
(complete freeze of the UI and kernel, sysrq doesn't work, requires
holding the power button) when the WiFi adapter enters the power
saving state.
This issue occurs consistently across multiple Linux distributions and
kernel versions (reproduced on upstream kernel 6.13 and 6.19-rc).
Steps to Reproduce:
1. Use a system with RTL8821CE (pci:10ec:c821).
2. Ensure NetworkManager is configured with wifi.powersave = 3 (or
power saving is enabled via TLP/iw).
3. Connect to a WiFi network and let the system idle.
4. The system will eventually freeze completely.
Workarounds that successfully prevent the freeze:
* Passing disable_lps_deep=y to rtw88_core.
* Passing disable_aspm=y to rtw88_pci (or pcie_aspm=off).
* Disabling WiFi power save via NetworkManager.
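As a minimal sketch, the three workarounds above could be made persistent roughly as follows. The module and parameter names are the ones given in this report; the helper name `rtw88_workaround_snippet` and the file names in the usage note are arbitrary, hypothetical choices, and `wifi.powersave = 2` is NetworkManager's value for "disable". Only one of the three should be needed at a time.

```bash
#!/bin/bash
# Print the config snippet for one workaround; nothing is written to disk here.
# Argument: aspm | lps_deep | nm_powersave
rtw88_workaround_snippet() {
    case "$1" in
        aspm)
            # modprobe.d line: keep ASPM off for the rtw88 PCI glue module
            echo 'options rtw88_pci disable_aspm=y'
            ;;
        lps_deep)
            # modprobe.d line: keep normal LPS but never enter LPS deep mode
            echo 'options rtw88_core disable_lps_deep=y'
            ;;
        nm_powersave)
            # NetworkManager conf.d snippet: 2 means "disable power save"
            printf '[connection]\nwifi.powersave = 2\n'
            ;;
        *)
            echo "unknown workaround: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `rtw88_workaround_snippet aspm | sudo tee /etc/modprobe.d/rtw88-aspm.conf` (hypothetical file name) followed by a reboot would apply the ASPM workaround; if the module is loaded from the initramfs, the initramfs likely needs rebuilding as well.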
Technical Analysis:
The root cause appears to be an unhandled race condition or hardware
bug between the adapter's Leisure Power Save (LPS) deep mode
(LPS_DEEP_MODE_LCLK) and PCIe Active State Power Management (ASPM
L1).
When the firmware drops into LPS_DEEP_MODE_LCLK concurrently with the
PCIe bus entering ASPM L1, the chip fails to handle PCIe Wake
signaling correctly. While there is an existing workaround in
rtw_pci_napi_poll (pci.c:1806) that sets `rtwpci->rx_no_aspm = true`
during NAPI poll for 8821CE, this polling wrapper is insufficient. The
deadlock often occurs during idle states when polling isn't actively
disabling ASPM, but the system suddenly needs to wake the radio.
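To confirm whether ASPM L1 is actually enabled on the adapter's link before and after applying a workaround, one option is to read the `LnkCtl` line of `lspci -vv`. The sketch below is an illustration only: `aspm_state` is a hypothetical helper, and the `LnkCtl: ASPM ...;` line format it parses is assumed from typical pciutils output rather than taken from this report.

```bash
#!/bin/bash
# Extract the ASPM setting from `lspci -vv` output read on stdin.
# Prints e.g. "L1 Enabled" or "Disabled" for each LnkCtl line found.
aspm_state() {
    # Keep only the text between "ASPM " and the first ";" on LnkCtl lines.
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}
```

A possible invocation is `sudo lspci -vv -d 10ec:c821 | aspm_state`, run once with the default settings and once with `disable_aspm=y`.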
Proposed Solutions:
Given that LPS_DEEP_MODE_LCLK seems fundamentally unreliable on 8821ce
PCIe variants when paired with standard Windows-era ASPM
implementations on laptops (HP, Lenovo, ASUS are all affected), the
most robust solution is to strip the unsupported deep sleep flag from
the hardware spec.
```diff
--- a/drivers/net/wireless/realtek/rtw88/rtw8821c.c
+++ b/drivers/net/wireless/realtek/rtw88/rtw8821c.c
@@ -1999,7 +1999,7 @@ struct rtw_chip_info rtw8821c_hw_spec = {
 	.bt_supported = true,
 	.fbtc_has_ext_ctrl = true,
 	.coex_info_hw_supported = true,
-	.lps_deep_mode_supported = BIT(LPS_DEEP_MODE_LCLK),
+	.lps_deep_mode_supported = 0, /* Disabled due to ASPM L1 hard locks */
 	.dpk_supported = true,
 	.pstdma_type = COEX_PSTDMA_FORCE_LPSOFF,
 	.bfee_support = false,
```
Alternatively, a PCI Subsystem-based quirk should be introduced in
rtw_pci_aspm_set() to refuse ASPM BIT_L1_SW_EN transitions for
affected hardware IDs, similar to how CLKREQ issues are handled for
8822C via efuse->rfe_option.
Cross-Reference Analysis of other RTL8821CE Bugs:
After aggregating recent open bug reports for the 8821ce chip on
Bugzilla (https://bugzilla.kernel.org), it is apparent that almost all
of them are victims of the exact same underlying race condition.
1. Bug 215131: System freeze preceded by 'pci bus timeout, check dma
status'. Workaround used: disable_aspm=1.
2. Bug 219830: Log shows 'firmware failed to leave lps state' and
'failed to send h2c command'. A direct smoking gun for LPS Deep mode
freezing.
3. Bug 218697 & Bug 217491: Endless 'timed out to flush queue' floods.
4. Bug 217781 & Bug 216685: Random dropouts and low wireless speed.
Given the volume and age of these unresolved reports, disabling
.lps_deep_mode_supported (or restricting ASPM L1) specifically for
10ec:c821 is desperately needed.
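To compare a saved kernel log against the signatures cited in the bug list above, a small tally can help. This is a sketch, not a diagnostic tool: `count_rtw88_signatures` is a hypothetical helper, and the message strings are simply the ones quoted in this report.

```bash
#!/bin/bash
# Count rtw88 failure signatures in a kernel log read from stdin.
# Prints one "signature: count" line per known message.
count_rtw88_signatures() {
    local log pattern
    log=$(cat)
    for pattern in \
        'pci bus timeout, check dma status' \
        'firmware failed to leave lps state' \
        'failed to send h2c command' \
        'timed out to flush queue'; do
        # grep -c counts matching lines; -F matches the text literally.
        printf '%s: %s\n' "$pattern" \
            "$(printf '%s\n' "$log" | grep -cF -- "$pattern")"
    done
}
```

For example, `journalctl -k -b -1 | count_rtw88_signatures` would tally the messages from the previous boot, assuming the journal survived the freeze.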
System Information:
- Hardware: HP Notebook (SKU: P3S95EA#ACB, Family: 103C_5335KV)
- CPU: Intel Core i3-5005U
- WiFi PCI ID: 10ec:c821, Subsystem: 103c:831a
- Kernel: 6.13 / 6.19
- Driver module: rtw88_8821ce
I am happy to test any patches provided or formally submit the patch
above if maintainers agree it is the right approach. Thank you!
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: Ping-Ke Shih @ 2026-03-10 2:02 UTC
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Steps to Reproduce:
> 1. Use a system with RTL8821CE (pci:10ec:c821).
> 2. Ensure NetworkManager is configured with wifi.powersave = 3 (or
> power saving is enabled via TLP/iw).
> 3. Connect to a WiFi network and let the system idle.
> 4. The system will eventually freeze completely.

Can you dig kernel log (by netconsole or ramoops) if something useful?
I'd like to know this is hardware level freeze or kernel can capture something wrong.

> Workarounds that successfully prevent the freeze:
> * Passing disable_lps_deep=y to rtw88_core.
> * Passing disable_aspm=y to rtw88_pci (or pcie_aspm=off).
> * Disabling WiFi power save via NetworkManager.

Are these totally needed to workaround the problem? Or disable_aspm is enough?
I'd list them in order of power consumption impact: (the topmost is lower impact)
1. disable_aspm=y
2. disable_lps_deep=y
3. disable WiFi power save

If you can do experiments on your platform, we can be easier to decide which
workarounds are adopted.

> When the firmware drops into LPS_DEEP_MODE_LCLK concurrently with the
> PCIe bus entering ASPM L1, the chip fails to handle PCIe Wake
> signaling correctly. While there is an existing workaround in
> rtw_pci_napi_poll (pci.c:1806) that sets `rtwpci->rx_no_aspm = true`
> during NAPI poll for 8821CE, this polling wrapper is insufficient. The
> deadlock often occurs during idle states when polling isn't actively
> disabling ASPM, but the system suddenly needs to wake the radio.

`rtwpci->rx_no_aspm = true` was another workaround years ago on certain
platform. I'd say ASPM has many interoperability problems, even years ago.

But what does 'deadlock' mean? As I know NAPI poll is scheduled by ISR,
and going to receive packets. The rx_no_aspm workaround is to forcely turn
off ASPM during this period.

> Alternatively, a PCI Subsystem-based quirk should be introduced in
> rtw_pci_aspm_set() to refuse ASPM BIT_L1_SW_EN transitions for
> affected hardware IDs, similar to how CLKREQ issues are handled for
> 8822C via efuse->rfe_option.

I'd add a quirk to your platforms, so other platforms can still have better
power consumption.

> I am happy to test any patches provided or formally submit the patch
> above if maintainers agree it is the right approach. Thank you!

We have not modified RTL8821CE for a long time, so I'd add workaround
to specific platform as mentioned above.

Ping-Ke
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: LB F @ 2026-03-10 11:01 UTC
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Ping-Ke,

Thank you for the incredibly fast response and assistance!

> Can you dig kernel log (by netconsole or ramoops) if something useful?
> I'd like to know this is hardware level freeze or kernel can capture something wrong.

I managed to pull a call trace from a historic journald log just
before the system hung. The kernel gets trapped in an IRQ thread
inside `rtw_pci_interrupt_threadfn`, calling up into `mac80211`
`ieee80211_rx_list` before everything freezes. Here is the relevant
snippet:

```text
Call Trace:
<IRQ>
? __alloc_skb+0x23a/0x2a0
? __alloc_skb+0x10c/0x2a0
? __pfx_irq_thread_fn+0x10/0x10
[ ... truncated module list ... ]
Tainted: G W I 6.19.6-2-cachyos #1 PREEMPT(full)
Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020
RIP: 0010:ieee80211_rx_list+0x1012/0x1020 [mac80211]
CPU: 2 UID: 0 PID: 765 Comm: irq/56-rtw88_pc
rtw_pci_interrupt_threadfn+0x239/0x310 [rtw88_pci]
```

It behaves exactly like a PCIe bus deadlock or a hardware fault that
eventually brings down the CPU handling the IRQ.

> Are these totally needed to workaround the problem? Or disable_aspm is enough?
> I'd list them in order of power consumption impact:
> 1. disable_aspm=y
> 2. disable_lps_deep=y
> 3. disable WiFi power save

To verify which parameters are strictly necessary, I performed
isolated testing today. I ensured no other modprobe configs were
active, rebuilt the initramfs, and manually enforced that
`wifi.powersave` was active via `iw dev wlan0 set power_save on`
during all tests (as the OS power management profiles were defaulting
it to off, which initially masked the issue).

I tested each workaround individually across multiple sleep/wake
cycles and active usage:

**Test 1 (ASPM Disabled, LPS Deep Enabled):**
- Kernel parameters: `rtw88_pci disable_aspm=y` (and `rtw88_core
disable_lps_deep=n`)
- Result: Stable. No freezes were observed during usage or transitions
into/out of S3 sleep while power saving was enforced.

**Test 2 (ASPM Enabled, LPS Deep Disabled):**
- Kernel parameters: `rtw88_core disable_lps_deep=y` (and `rtw88_pci
disable_aspm=n`)
- Result: Stable. No freezes were observed under the same forced power
save conditions.

**Conclusion:** It appears we do not need both workarounds
simultaneously for this specific hardware. Using only `disable_aspm=y`
seems to be sufficient to prevent the system freeze. Given your note
about the power consumption impact ranking, this looks like the
optimal path forward.

> But what does 'deadlock' mean? As I know NAPI poll is scheduled by ISR,
> and going to receive packets. The rx_no_aspm workaround is to forcely turn
> off ASPM during this period.

By "deadlock" I meant a hardware-level bus lockup. It seems the
physical RTL8821CE chip itself crashes or hangs the system's PCIe bus
when trying to negotiate waking up from ASPM L1 while simultaneously
existing in `LPS_DEEP_MODE_LCLK`. The `rx_no_aspm` workaround in NAPI
helps during active Rx decoding, but the laptop often freezes while
completely idle, presumably when the AP sends a basic beacon, the chip
attempts to leave LPS Deep + L1, and the hardware simply gives up and
halts the system.

> We have not modified RTL8821CE for a long time, so I'd add workaround
> to specific platform as mentioned above.

Adding a DMI/platform quirk specifically for this laptop to disable
ASPM would be wonderful and deeply appreciated. I agree it is safer
than touching the global flags for hardware that is functioning
correctly out in the wild.

Here is the exact identifying information for my system:

System Vendor: HP
Product Name: HP Notebook
SKU Number: P3S95EA#ACB
Family: 103C_5335KV
PCI ID: 10ec:c821
Subsystem ID: 103c:831a

I am completely ready to test any patch or quirk you send my way.
Thank you so much for your time and helping track this down!

Best regards,
Oleksandr
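The identifying fields listed in the message above are the kind of data a platform quirk would match on. As a rough sketch, they can be read from the kernel's DMI sysfs directory; `print_dmi_identity` is a hypothetical helper, and the optional directory argument exists only so the function can be exercised without real hardware.

```bash
#!/bin/bash
# Print the DMI fields a platform quirk would likely match on.
# $1: DMI sysfs directory (defaults to the standard location).
print_dmi_identity() {
    local dir="${1:-/sys/class/dmi/id}"
    local field
    for field in sys_vendor product_name product_sku product_family; do
        # Skip fields that the firmware does not expose.
        if [ -r "$dir/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dir/$field")"
        fi
    done
}
```

Running `print_dmi_identity` together with `lspci -nnv -d 10ec:c821` should give the vendor, product, SKU, family, and PCI subsystem ID in one place for a bug report.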
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: LB F @ 2026-03-10 15:12 UTC
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Ping-Ke,

Thank you for your guidance. To provide you with the cleanest possible
diagnostic data, we devised a strict testing environment:

1. **Live USB Environment:** We booted a completely fresh Live USB of
CachyOS (Kernel 6.19.6) to eliminate any potential interference from
installed software, TLP profiles, or custom NetworkManager
configurations.
2. **Aggressive Local Logging:** Because the system freeze physically
locks the PCIe bus and disables the Wi-Fi adapter instantly, using
`netconsole` was impossible (the network drops microseconds before the
freeze).

To overcome this, we wrote an "aggressive logger" script that pipes
`dmesg -w` directly to an independent FAT32 USB drive while issuing a
`sync` command twice a second. This bypassed RAM caching and
physically burned the logs to the drive right up to the moment of the
hard freeze. The script we used was:

```bash
#!/bin/bash
LOG_FILE="/run/media/liveuser/LOGS/kernel_freeze.log"
dmesg -w > "$LOG_FILE" &
while true; do
    sync
    sleep 0.5
done
```

3. No workarounds (`disable_aspm=n`, `disable_lps_deep=n`) were active
in this test. We manually enabled power saving (`iw dev wlan0 set
power_save on`) and triggered the freeze via typical web browsing.

Here are the precise, unadulterated logs showing the adapter
successfully connecting to the network, sitting idle for about 10
seconds (presumably entering power-saving states), and then suffering
a fatal firmware lockup right before the PCIe bus froze:

```
[ 304.709201] audit: type=1111 ... op=connection-add-activate ... name="Andrey_5G" ...
[ 305.617785] wlan0: authenticate with 6c:68:a4:1c:97:5b ...
[ 305.660333] wlan0: authenticated
[ 305.661661] wlan0: associate with 6c:68:a4:1c:97:5b (try 1/3)
[ 305.663404] wlan0: associated
[ 305.719997] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 6c:68:a4:1c:97:5b
... (~10 seconds of idle network time) ...
[ 316.907114] rtw88_8821ce 0000:13:00.0: failed to send h2c command
[ 316.911190] rtw88_8821ce 0000:13:00.0: failed to send h2c command
[ 316.921504] rtw88_8821ce 0000:13:00.0: coex request time out
...
[ 349.630952] rtw88_8821ce 0000:13:00.0: failed to send h2c command
[ 349.635023] rtw88_8821ce 0000:13:00.0: failed to send h2c command
[ 357.811235] rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
[ 359.797238] rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
... (repeats indefinitely until hard reset) ...
```

As the logs clearly demonstrate, the adapter authenticates perfectly
but the firmware explicitly fails to leave the LPS state after a brief
idle period, dropping all H2C commands immediately before the
system-wide hard freeze begins.

We will upload the full, unabridged `.log` file to our Bugzilla thread
(Bug 221195) momentarily, but we wanted to provide you with this exact
'smoking gun' trace right away to help identify the root cause.

Please let us know if this information is helpful or if there are any
specific module patches or further tests you would like us to perform
to assist with debugging.

Best regards,
Oleksandr
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: Ping-Ke Shih @ 2026-03-11 2:20 UTC
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> 3. No workarounds (`disable_aspm=n`, `disable_lps_deep=n`) were active
> in this test. We manually enabled power saving (`iw dev wlan0 set
> power_save on`) and triggered the freeze via typical web browsing.
>
> Here are the precise, unadulterated logs showing the adapter
> successfully connecting to the network, sitting idle for about 10
> seconds (presumably entering power-saving states), and then suffering
> a fatal firmware lockup right before the PCIe bus froze:
>
> ```
> [ 304.709201] audit: type=1111 ... op=connection-add-activate ... name="Andrey_5G" ...
> [ 305.617785] wlan0: authenticate with 6c:68:a4:1c:97:5b ...
> [ 305.660333] wlan0: authenticated
> [ 305.661661] wlan0: associate with 6c:68:a4:1c:97:5b (try 1/3)
> [ 305.663404] wlan0: associated
> [ 305.719997] wlan0: Limiting TX power to 30 (30 - 0) dBm as advertised by 6c:68:a4:1c:97:5b
> ... (~10 seconds of idle network time) ...
> [ 316.907114] rtw88_8821ce 0000:13:00.0: failed to send h2c command
> [ 316.911190] rtw88_8821ce 0000:13:00.0: failed to send h2c command
> [ 316.921504] rtw88_8821ce 0000:13:00.0: coex request time out
> ...
> [ 349.630952] rtw88_8821ce 0000:13:00.0: failed to send h2c command
> [ 349.635023] rtw88_8821ce 0000:13:00.0: failed to send h2c command
> [ 357.811235] rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
> [ 359.797238] rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
> ... (repeats indefinitely until hard reset) ...
> ```

Just want to clarify that these logs only appear in test 3, right?
No these logs in test 1/2.

> Please let us know if this information is helpful or if there are any
> specific module patches or further tests you would like us to perform
> to assist with debugging.

Thanks for your detail tests and logs.

With this kind of hardware problem, to dig the cause, we need real hardware
and hardware scope to measure signals. I'd apply quirk or some validations
on RX path. That'd be a better way.

Ping-Ke
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
From: Ping-Ke Shih @ 2026-03-11 2:15 UTC
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> I managed to pull a call trace from a historic journald log just
> before the system hung. The kernel gets trapped in an IRQ thread
> inside `rtw_pci_interrupt_threadfn`, calling up into `mac80211`
> `ieee80211_rx_list` before everything freezes. Here is the relevant
> snippet:
>
> ```text
> Call Trace:
> <IRQ>
> ? __alloc_skb+0x23a/0x2a0
> ? __alloc_skb+0x10c/0x2a0
> ? __pfx_irq_thread_fn+0x10/0x10
> [ ... truncated module list ... ]
> Tainted: G W I 6.19.6-2-cachyos #1 PREEMPT(full)
> Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020
> RIP: 0010:ieee80211_rx_list+0x1012/0x1020 [mac80211]
> CPU: 2 UID: 0 PID: 765 Comm: irq/56-rtw88_pc
> rtw_pci_interrupt_threadfn+0x239/0x310 [rtw88_pci]
> ```
>
> It behaves exactly like a PCIe bus deadlock or a hardware fault that
> eventually brings down the CPU handling the IRQ.

I wonder if there is a malformed data, causing this trace and the leads
kernel freezes. If we can do validation on RX data before calling
ieee80211_rx_list(), maybe trace disappears and everything will be fine?
Even no need workaround.

> **Conclusion:** It appears we do not need both workarounds
> simultaneously for this specific hardware. Using only `disable_aspm=y`
> seems to be sufficient to prevent the system freeze. Given your note
> about the power consumption impact ranking, this looks like the
> optimal path forward.

Let's test my RFT patch to disable ASPM then.

> By "deadlock" I meant a hardware-level bus lockup. It seems the
> physical RTL8821CE chip itself crashes or hangs the system's PCIe bus
> when trying to negotiate waking up from ASPM L1 while simultaneously
> existing in `LPS_DEEP_MODE_LCLK`. The `rx_no_aspm` workaround in NAPI
> helps during active Rx decoding, but the laptop often freezes while
> completely idle, presumably when the AP sends a basic beacon, the chip
> attempts to leave LPS Deep + L1, and the hardware simply gives up and
> halts the system.

I think this is your perspective and induction, right? Did you measure
real hardware signals?

My point is that if this is a hardware-level bus lockup, let's apply
quirk. If some malformed data causing kernel hangs, I'd add sanity check
on RX data, but I don't actually know what we should check for now.

> I am completely ready to test any patch or quirk you send my way.
> Thank you so much for your time and helping track this down!

I sent a RFT [1] for test. Please check if it works on your HP notebook.
If you check rtw88 log, you can see I added similar patch 5 years ago,
and replaced by preferred the change of "rtwpci->rx_no_aspm", which I
think it can only resolve problem on partial notebooks though....

[1] https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek.com/T/#u

Ping-Ke
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-11 2:15 ` Ping-Ke Shih
@ 2026-03-11 2:22 ` Ping-Ke Shih
2026-03-11 11:00 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-11 2:22 UTC (permalink / raw)
To: Ping-Ke Shih, LB F
Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> I sent an RFT [1] for testing. Please check if it works on your HP notebook.
>
> [1] https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek.com/T/#u

Forgot to say. Could you share your full name for me as a reporter
in commit message?
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-11 2:22 ` Ping-Ke Shih
@ 2026-03-11 11:00 ` LB F
2026-03-11 15:22 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-11 11:00 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Ping-Ke,

Thank you for the incredibly fast turnaround and for providing the RFT
patch with the DMI quirk!

First, I want to mention that I am not an IT professional or a
programmer. I am just a regular Linux user who really wants to help
solve this problem. I am trying my best to verify everything
carefully, so please forgive me if my terminology or induction was
slightly off.

To answer your clarifying questions from the previous emails:

> Just want to clarify that these logs only appear in test 3, right?
> No these logs in test 1/2.

Yes, exactly. The `failed to send h2c command` errors only caused a
complete system freeze when no workarounds were active and the adapter
attempted to sleep (Test 3).

> I think this is your perspective and induction, right? Did you measure
> real hardware signals?

You are entirely correct. This is just my induction based solely on
the timing of the logs and system behavior. I do not have access to an
oscilloscope or any hardware diagnostic tools. Given this, I
completely agree that your approach of applying a platform-specific
quirk is the safest and best solution.

> Forgot to say. Could you share your full name for me as a reporter
> in commit message?

My full name is Oleksandr Havrylov. I would be honored to be included
as the reporter in the commit message.

### Recent Baseline Testing Before Your Patch

Before applying your patch today, we ran a few more controlled tests
to double-check our baseline.
We verified that our local workaround
(`modprobe.d disable_aspm=y`) **does indeed keep the system completely
stable** and prevents the hard freeze, even when NetworkManager's
`wifi.powersave` is set to ON (default).

However, we noticed one interesting detail in the kernel logs: while
the system no longer freezes with `disable_aspm=y`, `dmesg` still
constantly logs `firmware failed to leave lps state` and `failed to
send h2c command` when the laptop is completely idle. It seems the
firmware still crashes during LPS, but because ASPM is disabled, the
PCIe bus ignores the crash and the system survives perfectly fine. I
just wanted to mention this for completeness!

### Testing Plan

I have **not** applied your RFT patch just yet. I wanted to make sure
our testing baseline was 100% clean and documented first.

I will compile your patch and perform rigorous testing this evening (I
am in the EET timezone, Ukraine). I will test it with the native
`power_save` fully enabled to ensure your patch successfully prevents
the hard lockups as intended.

I will stay in touch and reply back to this thread with a formal
`Tested-by` confirmation (and any logs if needed) as soon as my
testing is complete. Thank you again for all your help!

Best regards,
Oleksandr Havrylov
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-11 11:00 ` LB F
@ 2026-03-11 15:22 ` LB F
2026-03-12 1:56 ` Ping-Ke Shih
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-11 15:22 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Ping-Ke,

I successfully applied your patch out-of-tree and performed rigorous
testing on the host machine.

I can officially confirm that the patch works flawlessly. The DMI
quirk triggered correctly and successfully prevented the
hardware-level PCIe bus lockups on my HP P3S95EA#ACB.

Testing Environment & Methodology:
- Kernel: CachyOS Linux 6.19.6-2-cachyos x86_64
- Toolchain: Clang/LLVM 21.1.8 (`make CC=clang LLVM=1 modules`)
- Extraction: We fetched the strict
`drivers/net/wireless/realtek/rtw88` sub-tree out of the
torvalds/linux `v6.19` tree utilizing `git sparse-checkout` to cleanly
apply the patch without having to compile the entire 2.5GB+ kernel.
- The resulting `.ko` object files were compressed to `.zst` and
installed successfully over the generic CachyOS system driver objects.

Verification Conditions:
- Removed ALL local workarounds. `disable_aspm=Y` is no longer forced
via `/etc/modprobe.d/` overrides.
- Power saving remains natively ON `wifi.powersave = 3` (managed by
NetworkManager).
- Left the laptop in multiple 5-10 minute complete idle states to
enforce sleep modes.

Post-Boot Log Analysis & Potential Improvement Proposition:
The system remained 100% stable without any kernel panics or UI freezes.
However, I continuously monitored the `dmesg` ring buffer and noticed
an intriguing behavior.
While the laptop sits completely idle
(NetworkManager connected, but no active traffic), the `rtw88` driver
starts flooding the logs with thousands of firmware errors:

[ 1084.746485] rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
[ 1084.749662] rtw88_8821ce 0000:13:00.0: failed to send h2c command
[ 1084.752895] rtw88_8821ce 0000:13:00.0: failed to send h2c command

If my understanding of this architecture is correct, previously, when
ASPM wasn't disabled, this exact failure of the adapter firmware inside
`LPS_DEEP_MODE_LCLK` would violently lock up the PCIe bus and crash
the host. Now, thanks to your DMI ASPM quirk at the `rtw88_pci` level,
the host PCIe controller doesn't enter `L1` and is perfectly shielded
from the adapter locking itself up! The OS handles the timeouts
gracefully and driver recovery prevents a hard freeze.

A question for your consideration: Given the immense volume of these
`h2c` timeout errors (and the underlying firmware's fundamental
inability to cleanly enter/exit its own sleep states without L1
participation on this HP model), do you think it would be beneficial
to *also* dynamically disable LPS Deep sleep when this specific ASPM
quirk is triggered?

For example, dynamically forcing `rtwdev->lps_conf.deep_mode =
LPS_DEEP_MODE_NONE` when the DMI ASPM flag is active, strictly to
prevent the firmware from attempting a sleep cycle that is doomed to
fail and polluting the queues and logs? Perhaps this might also save
microscopic CPU interrupts from continuous H2C polling timeouts?

If you believe that simply letting the driver recover and tolerating
the error spam in `dmesg` is the preferred/safer upstream approach, I
am perfectly happy. The patch functions as advertised and system
stability is unequivocally restored!

Thank you immensely for your rapid debugging and definitive patch for
this long-standing issue and for bringing stability to this model.

Tested-by: Oleksandr Havrylov <goainwo@gmail.com>

*(Note: I was a bit unsure which of the two active mailing list
threads was the most appropriate place for this final report (the
original bug discussion or the new RFT patch submission thread), so I
replied to both just to ensure it is correctly attached to the patch.
Apologies for the duplicate email!)*

Best regards,
Oleksandr Havrylov
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-11 15:22 ` LB F
@ 2026-03-12 1:56 ` Ping-Ke Shih
2026-03-12 21:42 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-12 1:56 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Hi Ping-Ke,
>
> I successfully applied your patch out-of-tree and performed rigorous
> testing on the host machine.
>
> I can officially confirm that the patch works flawlessly. The DMI
> quirk triggered correctly and successfully prevented the
> hardware-level PCIe bus lockups on my HP P3S95EA#ACB.

Thanks for quickly testing my patch. :)

>
> Testing Environment & Methodology:
> - Kernel: CachyOS Linux 6.19.6-2-cachyos x86_64
> - Toolchain: Clang/LLVM 21.1.8 (`make CC=clang LLVM=1 modules`)
> - Extraction: We fetched the strict
> `drivers/net/wireless/realtek/rtw88` sub-tree out of the
> torvalds/linux `v6.19` tree utilizing `git sparse-checkout` to cleanly
> apply the patch without having to compile the entire 2.5GB+ kernel.
> - The resulting `.ko` object files were compressed to `.zst` and
> installed successfully over the generic CachyOS system driver objects.
>
> Verification Conditions:
> - Removed ALL local workarounds. `disable_aspm=Y` is no longer forced
> via `/etc/modprobe.d/` overrides.
> - Power saving remains natively ON `wifi.powersave = 3` (managed by
> NetworkManager).
> - Left the laptop in multiple 5-10 minute complete idle states to
> enforce sleep modes.
>
> Post-Boot Log Analysis & Potential Improvement Proposition:
> The system remained 100% stable without any kernel panics or UI freezes.
> However, I continuously monitored the `dmesg` ring buffer and noticed
> an intriguing behavior.
> While the laptop sits completely idle
> (NetworkManager connected, but no active traffic), the `rtw88` driver
> starts flooding the logs with thousands of firmware errors:
>
> [ 1084.746485] rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
> [ 1084.749662] rtw88_8821ce 0000:13:00.0: failed to send h2c command
> [ 1084.752895] rtw88_8821ce 0000:13:00.0: failed to send h2c command
>
> If my understanding of this architecture is correct, previously, when
> ASPM wasn't disabled, this exact failure of the adapter firmware inside
> `LPS_DEEP_MODE_LCLK` would violently lock up the PCIe bus and crash
> the host. Now, thanks to your DMI ASPM quirk at the `rtw88_pci` level,
> the host PCIe controller doesn't enter `L1` and is perfectly shielded
> from the adapter locking itself up! The OS handles the timeouts
> gracefully and driver recovery prevents a hard freeze.

I'm really not sure how/why the kernel becomes frozen. As I mentioned before,
it might be because of received malformed data and no complete validation
before reporting the RX packet to mac80211.
Not sure if you can try to dig and add some validation?
(The current DMI patch is fine to me.)

>
> A question for your consideration: Given the immense volume of these
> `h2c` timeout errors (and the underlying firmware's fundamental
> inability to cleanly enter/exit its own sleep states without L1
> participation on this HP model), do you think it would be beneficial
> to *also* dynamically disable LPS Deep sleep when this specific ASPM
> quirk is triggered?
>
> For example, dynamically forcing `rtwdev->lps_conf.deep_mode =
> LPS_DEEP_MODE_NONE` when the DMI ASPM flag is active, strictly to
> prevent the firmware from attempting a sleep cycle that is doomed to
> fail and polluting the queues and logs? Perhaps this might also save
> microscopic CPU interrupts from continuous H2C polling timeouts?

Are the 'h2c' timeout messages flooding, or do they appear periodically?
Do they really affect connection stability?
If you change to another AP or connect on the 5GHz band, are the messages
still present?

I think it isn't easy to find out the cause without measuring hardware
signals, since I have seen the message only very rarely. So, I'd adopt your
suggestion (dynamic LPS_DEEP_MODE_NONE) if the test is positive.

>
> If you believe that simply letting the driver recover and tolerating
> the error spam in `dmesg` is the preferred/safer upstream approach, I
> am perfectly happy. The patch functions as advertised and system
> stability is unequivocally restored!
>
> Thank you immensely for your rapid debugging and definitive patch for
> this long-standing issue and for bringing stability to this model.
>
> Tested-by: Oleksandr Havrylov <goainwo@gmail.com>

I will add this to my patch then.

>
> *(Note: I was a bit unsure which of the two active mailing list
> threads was the most appropriate place for this final report (the
> original bug discussion or the new RFT patch submission thread), so I
> replied to both just to ensure it is correctly attached to the patch.
> Apologies for the duplicate email!)*
>

Let's discuss in this thread. For the RFT patch, I suppose you only need to
reply with the test result and give me a Tested-by tag if it works.

By the way, this reply of yours is top-posted, which the mailing list does
not prefer, so I deleted the old discussion. Please avoid this in the future.

Ping-Ke
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-12 1:56 ` Ping-Ke Shih
@ 2026-03-12 21:42 ` LB F
2026-03-13 0:03 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-12 21:42 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> I'm really not sure how/why the kernel becomes frozen. As I mentioned before,
> it might be because of received malformed data and no complete validation
> before reporting the RX packet to mac80211.
> Not sure if you can try to dig and add some validation?

I reviewed both rx.c and pci.c in detail and found a genuine validation
gap specific to the 8821CE chip.

In rtw_pci_rx_napi() (pci.c), the RX path allocates a new skb based on
the pkt_len field from the RX descriptor:

    new_len = pkt_stat.pkt_len + pkt_offset;
    new = dev_alloc_skb(new_len);
    skb_put_data(new, skb->data, new_len);
    /* ... */
    skb_pull(new, pkt_offset);
    ieee80211_rx_napi(rtwdev->hw, NULL, new, napi);

If pkt_stat.pkt_len is zero, new_len equals pkt_offset, skb_put_data
copies only the descriptor header, and skb_pull then removes that
header -- leaving an empty skb (len=0) that is passed unconditionally
to ieee80211_rx_napi() with no length guard.

Protection already exists for the 8703B chip in rtw_rx_fill_rx_status():

    if (rtwdev->chip->id == RTW_CHIP_TYPE_8703B && pkt_stat->pkt_len == 0) {
        rx_status->flag |= RX_FLAG_NO_PSDU;
        rtw_dbg(rtwdev, RTW_DBG_RX, "zero length packet");
    }

No equivalent check exists for RTW_CHIP_TYPE_8821CE.
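
The length arithmetic described above can be modelled in user space. This is only a minimal sketch under stated assumptions: `struct rx_desc_model`, the `model_*` helpers, and the 24-byte header size are hypothetical names and values chosen for illustration and are not part of rtw88. It shows how a zero `pkt_len` leaves nothing once the header is pulled; it does not establish that this path is what freezes the machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two RX descriptor fields discussed above. */
struct rx_desc_model {
	size_t pkt_len;    /* payload length reported by the hardware */
	size_t pkt_offset; /* descriptor header bytes preceding the payload */
};

/* Bytes copied into the new buffer: header plus payload (models new_len). */
size_t model_copy_len(const struct rx_desc_model *d)
{
	return d->pkt_len + d->pkt_offset;
}

/* Bytes remaining once the header is stripped (models the skb_pull step). */
size_t model_len_after_pull(const struct rx_desc_model *d)
{
	return model_copy_len(d) - d->pkt_offset;
}

/* Hypothetical guard: true when the frame carries no payload at all. */
bool model_is_zero_length(const struct rx_desc_model *d)
{
	return d->pkt_len == 0;
}

/* Walks through one empty and one non-empty frame; 24 is an assumed offset. */
void model_demo(void)
{
	struct rx_desc_model empty = { .pkt_len = 0, .pkt_offset = 24 };
	struct rx_desc_model data = { .pkt_len = 100, .pkt_offset = 24 };

	assert(model_copy_len(&empty) == 24);      /* only the header is copied */
	assert(model_len_after_pull(&empty) == 0); /* nothing left to deliver */
	assert(model_is_zero_length(&empty));

	assert(model_len_after_pull(&data) == 100);
	assert(!model_is_zero_length(&data));
}
```

Calling `model_demo()` from a test harness exercises both cases. In the driver, the equivalent decision would presumably be made on `pkt_stat.pkt_len` before the frame is handed to mac80211.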
Removing the chip-id restriction would be a minimal, safe fix for all chips: --- a/rx.c +++ b/rx.c - if (rtwdev->chip->id == RTW_CHIP_TYPE_8703B && pkt_stat->pkt_len == 0) { + if (pkt_stat->pkt_len == 0) { I also checked PHY-level error counters from debugfs during normal operation (phy_info): OFDM cnt (ok, err) = (867, 11) -> 1.3% PHY CRC error rate VHT cnt (ok, err) = (267, 32) -> 10.7% PHY CRC error rate Frames with crc_err are passed to mac80211 with RX_FLAG_FAILED_FCS_CRC set (not dropped by the driver), which is the correct approach. However, I do not believe the freeze is caused by malformed RX data. The freeze occurs deterministically about 10 seconds after the system becomes fully idle with zero active network traffic, which matches the LPS_DEEP_MODE_LCLK entry sequence rather than a random data corruption pattern. The freeze behaviour also disappears entirely when ASPM L1 is disabled (as confirmed by the Live USB logs I provided earlier), which is the hallmark of a PCIe bus gating deadlock, not a data path issue. > Are the 'h2c' timeout messages flooding? or appears periodically? Does it > really affect connection stable? The errors appear periodically in bursts during idle; network connectivity is never affected (parallel ping tests show 0% packet loss). The flooding documented in previous tests (hundreds per minute) was observed under conditions where the LPS state machine had reached a persistent failure mode after extended uptime. In shorter tests from a fresh module load, the errors are sporadic (3-5 per 10 minutes). > If you change another AP or connection on 5GHz band, does the messages > still present? Yes. The issue has persisted for 2 years across 3 completely different Access Points. It is reproducible on 5GHz only (2.4GHz is disabled on all my networks). > I think it isn't easy to find out the cause without measuring hardware > signals, since I saw the message very very rare. 
So, I'd adopt your > suggestion (dynamic LPS_DEEP_MODE_NONE) if the test is positive. The test is definitively positive. Test environment: stock CachyOS 6.19.6 kernel, PCIe ASPM L1 confirmed ENABLED via lspci ('LnkCtl: ASPM L1 Enabled'), no out-of-tree patches. The rtw88 module stack was fully reloaded (including rtw88_core) for each scenario. The disable_lps_deep parameter, which belongs to rtw88_core, was verified via /sys/module/rtw88_core/parameters/ before and after each reload. Test protocol: after module reload and Wi-Fi reconnect (verified via HTTP 204 check), a 5-minute warm-up period elapsed before the 5-minute measurement window began. This ensures the firmware's LPS state machine has fully initialised before results are recorded. Methodology verified: 'modprobe -r rtw88_8821ce' removes only the chip-specific modules, leaving rtw88_core in memory. The correct procedure used was to explicitly also remove rtw88_core, then reload all modules with the desired parameter. Results (battery power, true idle each): disable_lps_deep=N (DEFAULT): Warm-up (5 min cumulative): h2c=4 lps=0 Measurement (5 min): h2c=0 lps=0 [errors are bursty] disable_lps_deep=Y (CONFIRMED via sysfs): Warm-up (5 min cumulative): h2c=0 lps=0 Measurement (5 min): h2c=0 lps=0 ALL 10 minutes: h2c=0 With disable_lps_deep=Y, not a single h2c timeout was recorded across the entire 10-minute observation window (warm-up + measurement). With disable_lps_deep=N, errors appeared within the first 5 minutes of idle. Setting disable_lps_deep=Y completely eliminates the firmware timeout loop, confirming that the root cause is the firmware attempting LPS_DEEP_MODE_LCLK while PCIe constraints prevent it from completing. Dynamic LPS_DEEP_MODE_NONE for the ASPM DMI quirk entry is the correct and complete architectural solution. --- Technical Appendix: RX Validation Audit Findings --- I performed a deep audit of the RX descriptor parsing logic in rx.c and pci.c. 
I found two concrete areas where validation is incomplete for the 8821CE: 1. Out-of-Bounds Read in rtw_pci_rx_napi (pci.c): The DMA buffer size is fixed at ~11.5KB (RTK_PCI_RX_BUF_SIZE). However, the hardware descriptor (W0_PKT_LEN) is 14 bits, allowing it to indicate up to 16KB. The driver calculates new_len = pkt_stat.pkt_len + pkt_offset and calls skb_put_data(new, skb->data, new_len) without checking if new_len exceeds the DMA source buffer. If hardware sends a malformed large length, this leads to an OOB read of adjacent memory. 2. Missing 8821CE guard in rtw_rx_fill_rx_status (rx.c): The check for pkt_len == 0 (which results in an empty SKB being passed to mac80211) is manually restricted to RTW_CHIP_TYPE_8703B: if (rtwdev->chip->id == RTW_CHIP_TYPE_8703B && pkt_stat->pkt_len == 0) Expanding this guard to all chips (or specifically 8821CE) would be safer. While these vulnerabilities exist, I still believe the freeze is PCIe-timing related (LCLK entry/ASPM conflict), as no RX-related warnings or memory corruption traces were found in dmesg prior to the hard freeze. Best regards, Oleksandr Havrylov ^ permalink raw reply [flat|nested] 34+ messages in thread
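The host-side length guard discussed in the message above (reject a
zero pkt_len, and reject a pkt_len + pkt_offset that exceeds the source
buffer, before any copy) can be sketched in plain user-space C. This is
only an illustration, not driver code: example_rx_len_ok(),
EXAMPLE_RX_BUF_SIZE and EXAMPLE_PKT_LEN_MASK are hypothetical names, and
the buffer size is an assumed value in the range the message mentions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real driver defines its own. */
#define EXAMPLE_RX_BUF_SIZE  11478u  /* hypothetical DMA buffer size, bytes */
#define EXAMPLE_PKT_LEN_MASK 0x3fffu /* 14-bit length field, as described */

/*
 * Return true only if a non-empty frame of pkt_len bytes, preceded by
 * pkt_offset bytes of descriptor/header, fits in a buffer of buf_size bytes.
 */
static bool example_rx_len_ok(uint32_t pkt_len, uint32_t pkt_offset,
                              size_t buf_size)
{
    if (pkt_len == 0)
        return false;   /* would leave an empty frame after the pull */
    if (pkt_len > EXAMPLE_PKT_LEN_MASK)
        return false;   /* cannot originate from a 14-bit field */
    if (pkt_offset > buf_size)
        return false;   /* header alone overruns the buffer */
    /* Subtraction cannot underflow because pkt_offset <= buf_size. */
    return pkt_len <= buf_size - pkt_offset;
}
```

A caller would run such a check on the parsed descriptor and drop the
frame when it fails, instead of copying new_len bytes unconditionally.
Whether a check like this belongs in the driver is the maintainer's
decision; the sketch only shows the arithmetic.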
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-12 21:42 ` LB F
@ 2026-03-13 0:03 ` LB F
2026-03-13 0:29 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-13 0:03 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> I'm really not sure how/why kernel becomes frozen. As I mentioned before
> it might because of received malformed data and no complete validation
> before reporting RX packet to mac80211.
> Not sure if you can try to dig and add some validation?

Hi Ping-Ke,

I took your advice and performed a deeper audit of the rtw88 PCI
implementation, focusing on both validation and concurrency. While the
RX gaps I previously mentioned are real, I found two critical
architectural issues in the TX path that likely contribute to the
"hard freezes" and DMA stalls we've seen.

1. Concurrency: TX Descriptor Management Race (pci.c:836)
---------------------------------------------------------
In rtw_pci_tx_write_data(), rtw88 fetches the descriptor address based
on the current write pointer (wp) BEFORE acquiring the irq_lock:

```c
/* drivers/net/wireless/realtek/rtw88/pci.c:836 */
buf_desc = get_tx_buffer_desc(ring, tx_buf_desc_sz);
memset(buf_desc, 0, tx_buf_desc_sz);
/* ... packets are filled ... */
spin_lock_bh(&rtwpci->irq_lock); // [!] Lock is taken too late
```

Since mac80211 can call rtw_ops_tx and rtw_ops_wake_tx_queue (the
latter calling __rtw_tx_work) concurrently on different CPUs --
especially for high-priority AC_VO traffic -- two threads can fetch the
same wp for the same queue simultaneously.

Result: CPU 0 prepares data in slot [N], while CPU 1 simultaneously
zeros out or overwrites slot [N]. This explains why we see intermittent
descriptor corruption and subsequent DMA/firmware hangs.

2. Synchronization: Missing DMA Memory Barrier (pci.c:786)
----------------------------------------------------------
In rtw_pci_tx_kick_off_queue(), the doorbell is hit without a memory
barrier:

```c
/* drivers/net/wireless/realtek/rtw88/pci.c:786 */
rtw_write16(rtwdev, bd_idx, ring->r.wp & TRX_BD_IDX_MASK);
```

For PCIe DMA, it is vital to ensure descriptor RAM writes are visible
to the device before the MMIO register doorbell hits. Standard Linux
practice usually dictates a wmb() here. Without it, the Wi-Fi
controller may read stale or uninitialized memory, leading to the
"failed to leave lps state" timeouts and H2C command failures we've
logged.

3. Confirmed RX Limit Mismatch (rtw8821c.c:254)
-----------------------------------------------
I verified that the hardware is explicitly programmed with a 12KB
limit:

```c
/* drivers/net/wireless/realtek/rtw88/rtw8821c.c:254 */
rtw_write8(rtwdev, REG_RX_PKT_LIMIT, WLAN_RX_PKT_LIMIT_512);
```

Since the driver's RX buffer (RTK_PCI_RX_BUF_SIZE) is only 11.2KB, any
malformed or large packet will result in an OOB read in
rtw_pci_rx_napi().

I believe addressing these three points (TX locking, TX barriers, and
RX buffer consistency) would significantly harden the driver against
the stability issues reported in Bug 221195.

Best regards,
Oleksandr Havrylov
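The ordering concern raised in point 2 above (descriptor contents must
be visible before the doorbell index is published) can be illustrated
with a user-space analogue built on C11 atomics. This is a sketch under
stated assumptions, not the driver's code: struct example_ring,
example_ring_post() and the "doorbell" variable are hypothetical
stand-ins, and a release fence is used here where kernel code would rely
on its own barrier or MMIO accessor semantics.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_RING_SIZE 8u

struct example_desc {
    uint32_t addr;
    uint32_t len;
};

struct example_ring {
    struct example_desc desc[EXAMPLE_RING_SIZE];
    uint32_t wp;               /* producer's write pointer */
    _Atomic uint32_t doorbell; /* stands in for the MMIO index register */
};

/*
 * Fill the slot at the current write pointer, then publish the new index.
 * The release fence orders the descriptor stores before the doorbell
 * store. Returns the slot that was filled. Single producer only: this
 * sketch does not model the locking question from point 1.
 */
static uint32_t example_ring_post(struct example_ring *r, uint32_t addr,
                                  uint32_t len)
{
    uint32_t slot = r->wp;

    memset(&r->desc[slot], 0, sizeof(r->desc[slot]));
    r->desc[slot].addr = addr;
    r->desc[slot].len = len;

    r->wp = (slot + 1) % EXAMPLE_RING_SIZE;

    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->doorbell, r->wp, memory_order_relaxed);
    return slot;
}
```

Whether the real driver needs an explicit barrier at that point depends
on the ordering guarantees of the MMIO write accessor it uses, which
this sketch does not model.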
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-13 0:03 ` LB F
@ 2026-03-13 0:29 ` LB F
2026-03-14 10:52 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-13 0:29 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Ping-Ke,

I apologize for the rapid follow-up and for being perhaps a bit
over-assertive in my previous email. As I continued to dig into the
code, I realized that some of my interpretations of hardware registers
(like REG_RX_PKT_LIMIT) and kernel serialization might be simplified
compared to the real-world complexities you deal with.

I'd like to reframe my previous notes as "curious observations" that I
stumbled upon while testing, and I'd value your professional take on
whether they are relevant:

1. RX Host-Side Validation: While searching for the 12KB limit I
   mentioned, I noticed that in rtw_pci_rx_napi(), the driver uses the
   pkt_len field from the descriptor directly for skb_put_data()
   without checking it against the host buffer size
   (RTK_PCI_RX_BUF_SIZE). Even if the hardware normally clips DMA,
   would it be worth adding a host-side guard there as a "hardening"
   measure against potentially malformed hardware reports?

2. TX Write Pointer (wp) Fetch: I noticed that in
   rtw_pci_tx_write_data(), get_tx_buffer_desc() fetches the wp outside
   the irq_lock. I wasn't sure if mac80211 guarantees that the direct
   TX path and the background worker threads can never collide on the
   same queue, but I thought it was worth mentioning just in case.

3. Memory Barriers: The wmb() point was more of an architectural
   observation regarding PCI best practices for non-x86 platforms. I
   understand x86 is quite forgiving here, but I noticed it was a
   pattern that stood out.

Please treat these as humble suggestions from someone trying to learn
the driver's internals. I didn't mean to imply these were "critical
bugs" without your expert verification.

Thank you for your patience with my technical excitement!

Best regards,
Oleksandr
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-13 0:29 ` LB F
@ 2026-03-14 10:52 ` LB F
2026-03-14 12:39 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-14 10:52 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

After extended testing with your DMI patch applied, the hard freeze is
gone. However, with ASPM disabled but LPS Deep still active, I observe
periodic h2c timeouts during idle which cause occasional WiFi
throughput drops and Bluetooth audio stuttering. When I additionally
set disable_lps_deep=Y, all symptoms disappear completely. This
confirms that combining the ASPM quirk with dynamic LPS_DEEP_MODE_NONE
would be the complete fix. Ready to test an updated patch if you decide
to include this.
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-14 10:52 ` LB F
@ 2026-03-14 12:39 ` LB F
2026-03-15 0:24 ` LB F
2026-03-16 2:50 ` Ping-Ke Shih
0 siblings, 2 replies; 34+ messages in thread
From: LB F @ 2026-03-14 12:39 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> I'd adopt your suggestion (dynamic LPS_DEEP_MODE_NONE) if the test
> is positive.

Hi Ping-Ke,

Following your suggestion, I performed an additional experiment to
validate the dynamic LPS_DEEP_MODE_NONE idea. Please treat this purely
as a field test report -- I am not a kernel developer, and the
implementation below is certainly not upstream-quality. I am sharing it
only in the hope that it helps you design a proper solution.

What I did:

I extended your DMI quirk in pci.c with an additional capability flag
for LPS Deep mode. The only file touched was pci.c (your patch) --
main.c was left completely unmodified. The changes to your patch are as
follows:

    /* 1. Extended the capabilities enum */
    enum rtw88_quirk_dis_pci_caps {
            QUIRK_DIS_PCI_CAP_ASPM,
            QUIRK_DIS_PCI_CAP_LPS_DEEP,  /* test addition */
    };

    /* 2. Extended disable_pci_caps() callback */
    static int disable_pci_caps(const struct dmi_system_id *dmi)
    {
            uintptr_t dis_caps = (uintptr_t)dmi->driver_data;

            if (dis_caps & BIT(QUIRK_DIS_PCI_CAP_ASPM))
                    rtw_pci_disable_aspm = true;

            if (dis_caps & BIT(QUIRK_DIS_PCI_CAP_LPS_DEEP))
                    rtw_disable_lps_deep_mode = true;

            return 1;
    }

    /* 3. Both flags set for the HP P3S95EA#ACB entry */
    .driver_data = (void *)(BIT(QUIRK_DIS_PCI_CAP_ASPM) |
                            BIT(QUIRK_DIS_PCI_CAP_LPS_DEEP)),

I am aware that setting rtw_disable_lps_deep_mode from pci.c is
architecturally impure -- it is a global flag that would affect all
rtw88 devices in a hypothetical multi-adapter system. A proper
per-device solution (e.g. a flag inside struct rtw_dev set during
probe) would be cleaner. I simply used the existing global as the most
straightforward way to validate the concept.

Verification:

Confirmed no rtw88-related entries exist in /etc/modprobe.d/,
/lib/modprobe.d/, or /run/modprobe.d/, ruling out any external
parameter injection. After loading the patched modules, the following
was confirmed via sysfs:

    /sys/module/rtw88_core/parameters/disable_lps_deep_mode = Y
    /sys/module/rtw88_pci/parameters/disable_aspm = Y

This confirms the DMI quirk is the sole source of both values.

Results (10-minute idle observation, battery power, wifi.powersave=3):

  With your ASPM patch alone (LPS Deep still active):
    - periodic "failed to send h2c command" bursts observed
    - occasional WiFi throughput drops and Bluetooth audio stuttering

  With ASPM patch + LPS Deep disabled via the quirk:
    - h2c=0, lps=0 across the entire observation window
    - WiFi throughput stable, Bluetooth audio uninterrupted

The result confirms that disabling LPS Deep Mode in addition to ASPM
completely eliminates the remaining firmware timeout loop on this
platform.

I hope this experiment is useful as a data point. Please feel free to
discard the implementation and design a proper solution -- I am ready
to test any updated patch you send.

Best regards,
Oleksandr Havrylov
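The per-device alternative mentioned in the message above (decode the
quirk bitmask into one device's settings rather than into module-wide
globals) can be sketched in user-space C as follows. Every name here
(struct example_dev_cfg, example_apply_quirks(), EXAMPLE_BIT and the
enum) is hypothetical and chosen for illustration; it is not the
structure used by rtw88.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_BIT(n) ((uintptr_t)1 << (n))

enum example_quirk_cap {
    EXAMPLE_QUIRK_DIS_ASPM,
    EXAMPLE_QUIRK_DIS_LPS_DEEP,
};

/* Hypothetical per-device settings, in place of module-wide globals. */
struct example_dev_cfg {
    bool disable_aspm;
    bool disable_lps_deep;
};

/* Decode a quirk bitmask (as carried in driver_data) into one device. */
static void example_apply_quirks(struct example_dev_cfg *cfg, uintptr_t caps)
{
    if (caps & EXAMPLE_BIT(EXAMPLE_QUIRK_DIS_ASPM))
        cfg->disable_aspm = true;
    if (caps & EXAMPLE_BIT(EXAMPLE_QUIRK_DIS_LPS_DEEP))
        cfg->disable_lps_deep = true;
}
```

With this shape, two adapters in the same system could carry different
settings, which is the property the global flag lacks.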
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-14 12:39 ` LB F
@ 2026-03-15 0:24 ` LB F
2026-03-16 2:55 ` Ping-Ke Shih
2026-03-16 2:50 ` Ping-Ke Shih
1 sibling, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-15 0:24 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Oleksandr Havrylov <goainwo@gmail.com> wrote:
> After extended testing with your DMI patch applied, the hard freeze is
> gone. However, with ASPM disabled but LPS Deep still active, I observe
> periodic h2c timeouts during idle which cause occasional WiFi throughput
> drops and Bluetooth audio stuttering. When I additionally set
> disable_lps_deep=Y, all symptoms disappear completely. This confirms
> that combining the ASPM quirk with dynamic LPS_DEEP_MODE_NONE would be
> the complete fix. Ready to test an updated patch if you decide to
> include this.

Hi Ping-Ke,

While monitoring logs with the current patch applied, I noticed two
things that might be useful.

First, the following message appears each time the driver loads:

    rtw88_8821ce 0000:13:00.0: can't disable ASPM; OS doesn't have ASPM control

This suggests the BIOS retains control over ASPM and prevents any
OS-level override via pci_disable_link_state(). The system remains
stable regardless, which confirms that the rtw_pci_disable_aspm flag
approach in your patch is the correct and effective method here.

Second, during normal operation I observe this warning periodically:

    WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x177/0x1020 [mac80211]

This is the same location that appeared in the call trace just before
the hard freeze. You mentioned earlier that malformed RX data reaching
mac80211 could be a factor. I'm not sure if this warning is related,
but I wanted to flag it in case it is useful for your RX validation
investigation.

No h2c timeouts or firmware errors have been observed. The system
remains fully stable.

Best regards,
Oleksandr Havrylov
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-15 0:24 ` LB F
@ 2026-03-16 2:55 ` Ping-Ke Shih
2026-03-16 20:27 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-16 2:55 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
>
> Oleksandr Havrylov <goainwo@gmail.com> wrote:
> > After extended testing with your DMI patch applied, the hard freeze is
> > gone. However, with ASPM disabled but LPS Deep still active, I observe
> > periodic h2c timeouts during idle which cause occasional WiFi throughput
> > drops and Bluetooth audio stuttering. When I additionally set
> > disable_lps_deep=Y, all symptoms disappear completely. This confirms
> > that combining the ASPM quirk with dynamic LPS_DEEP_MODE_NONE would be
> > the complete fix. Ready to test an updated patch if you decide to
> > include this.
>
> Hi Ping-Ke,
>
> While monitoring logs with the current patch applied, I noticed two
> things that might be useful.
>
> First, the following message appears each time the driver loads:
>
> rtw88_8821ce 0000:13:00.0: can't disable ASPM; OS doesn't have ASPM control
>
> This suggests the BIOS retains control over ASPM and prevents any
> OS-level override via pci_disable_link_state(). The system remains
> stable regardless, which confirms that the rtw_pci_disable_aspm flag
> approach in your patch is the correct and effective method here.

Not sure if this is because the PCIe bridge has no ASPM capability?

>
> Second, during normal operation I observe this warning periodically:
>
> WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x177/0x1020 [mac80211]

LN5491 (kernel v6.19.6) is:

    case RX_ENC_VHT:
            if (WARN_ONCE(status->rate_idx > 11 ||
                          !status->nss ||
                          status->nss > 8,
                          "Rate marked as a VHT rate but data is invalid: MCS: %d, NSS: %d\n",
                          status->rate_idx, status->nss))
                    goto drop;
            break;

It looks like the driver reports an improper VHT nss/rate? But this
warns only once, and your message doesn't look like this one.
Could you check line 5491 of the source code you are using?

Ping-Ke
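The condition quoted above from the RX_ENC_VHT case can be restated as
a small stand-alone predicate, which makes it easy to see why a report
of "MCS: 0, NSS: 0" trips the warning. This is only a user-space mirror
of the quoted condition: example_vht_rate_ok() is a hypothetical name,
and nothing here is taken from mac80211 beyond the three comparisons
shown in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirror of the quoted check: a VHT rate report is accepted only when
 * rate_idx (MCS) is at most 11 and nss is in the range 1..8.
 */
static bool example_vht_rate_ok(uint8_t rate_idx, uint8_t nss)
{
    return !(rate_idx > 11 || nss == 0 || nss > 8);
}
```

A report with nss equal to 0 fails the second comparison regardless of
the MCS value, which is consistent with the frame being dropped.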
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-16 2:55 ` Ping-Ke Shih
@ 2026-03-16 20:27 ` LB F
2026-03-17 1:28 ` Ping-Ke Shih
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-16 20:27 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> Not sure if this is because PCIE bridge has no ASPM capability?

That could indeed be the case -- I do not have a way to confirm
without further hardware-level inspection.

> LN5491 (kernel v6.19.6) is:
> case RX_ENC_VHT:
> if (WARN_ONCE(status->rate_idx > 11 ||
> !status->nss ||
> status->nss > 8,
> "Rate marked as a VHT rate but data is invalid: MCS: %d, NSS: %d\n",
> status->rate_idx, status->nss))
> goto drop;
> break;
> Looks like driver reports improper VHT nss/rate? But this warns once, and
> you message isn't like this.
> Could you check the source code LN5491 you are using?

The file net/mac80211/rx.c is not available on disk on my system
(CachyOS ships only .h files in the headers package), but I located
the exact warning message in journalctl:

    Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0

This confirms that line 5491 in my kernel matches exactly what you
showed from v6.19.6 -- the RX_ENC_VHT case checking for
status->nss == 0. The offset in my trace is slightly different
(+0x183 vs +0x177), which is likely due to CachyOS's LTO/AutoFDO
compiler optimizations.

The warning appeared once in my initial test session:

    Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0
    WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x183/0x1020 [mac80211]

However, in subsequent module reload and reconnect cycles I was unable
to reproduce it. This is consistent with WARN_ONCE behavior -- it
likely fired on the first invalid nss=0 packet after the initial
driver load and has not triggered since. I cannot confirm it as a
reliable symptom.

---

Regarding patch stability: the results below are from testing your
original RFT patch [1], not any newer submission. I want to be
explicit to avoid confusion:

[1] https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek.com/

This is the exact diff I compiled and tested:

--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -2,6 +2,7 @@
 /* Copyright(c) 2018-2019 Realtek Corporation
  */

+#include <linux/dmi.h>
 #include <linux/module.h>
 #include <linux/pci.h>
 #include "main.h"
@@ -1744,6 +1745,34 @@ const struct pci_error_handlers rtw_pci_err_handler = {
 };
 EXPORT_SYMBOL(rtw_pci_err_handler);

+enum rtw88_quirk_dis_pci_caps {
+        QUIRK_DIS_PCI_CAP_ASPM,
+};
+
+static int disable_pci_caps(const struct dmi_system_id *dmi)
+{
+        uintptr_t dis_caps = (uintptr_t)dmi->driver_data;
+
+        if (dis_caps & BIT(QUIRK_DIS_PCI_CAP_ASPM))
+                rtw_pci_disable_aspm = true;
+
+        return 1;
+}
+
+static const struct dmi_system_id rtw88_pci_quirks[] = {
+        {
+                .callback = disable_pci_caps,
+                .ident = "HP Notebook - P3S95EA#ACB",
+                .matches = {
+                        DMI_MATCH(DMI_SYS_VENDOR, "HP"),
+                        DMI_MATCH(DMI_PRODUCT_NAME, "HP Notebook"),
+                        DMI_MATCH(DMI_PRODUCT_SKU, "P3S95EA#ACB"),
+                },
+                .driver_data = (void *)BIT(QUIRK_DIS_PCI_CAP_ASPM),
+        },
+        {}
+};
+
 int rtw_pci_probe(struct pci_dev *pdev,
                   const struct pci_device_id *id)
 {
@@ -1808,6 +1837,7 @@ int rtw_pci_probe(struct pci_dev *pdev,
             bridge && bridge->vendor == PCI_VENDOR_ID_INTEL)
                 rtwpci->rx_no_aspm = true;

+        dmi_check_system(rtw88_pci_quirks);
         rtw_pci_phy_cfg(rtwdev);

         ret = rtw_register_hw(rtwdev, hw);

Results with only this patch applied:

- The hard freeze lockup is gone.
- However, during idle the logs are flooded with:

      rtw88_8821ce 0000:13:00.0: failed to send h2c command
      rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state

- To give a concrete sense of the volume: over an ~80-minute
  observation window after a clean module reload, I recorded
  11,757 "failed to send h2c command" events and 2 "firmware
  failed to leave lps state" events -- approximately 110 errors
  per minute during active periods.
- These errors cause Bluetooth audio stuttering and WiFi
  throughput drops.

When I additionally set disable_lps_deep=Y alongside your ASPM patch,
all h2c errors vanish completely and Bluetooth/WiFi remain fully
stable. This confirms that disabling LPS Deep is necessary for
complete stability on this specific HP SKU.

I also noticed what appears to be a new patch in a separate mailing
list thread. I will test it shortly and report back with the results.

Best regards,
Oleksandr Havrylov
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict) 2026-03-16 20:27 ` LB F @ 2026-03-17 1:28 ` Ping-Ke Shih 2026-03-18 0:00 ` LB F 0 siblings, 1 reply; 34+ messages in thread From: Ping-Ke Shih @ 2026-03-17 1:28 UTC (permalink / raw) To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org LB F <goainwo@gmail.com> wrote: > > Ping-Ke Shih <pkshih@realtek.com> wrote: > > Not sure if this is because PCIE bridge has no ASPM capability? > > That could indeed be the case -- I do not have a way to confirm > without further hardware-level inspection. > > > LN5491 (kernel v6.19.6) is: > > case RX_ENC_VHT: > > if (WARN_ONCE(status->rate_idx > 11 || > > !status->nss || > > status->nss > 8, > > "Rate marked as a VHT rate but data is > invalid: MCS: %d, NSS: %d\n", > > status->rate_idx, status->nss)) > > goto drop; > > break; > > Looks like driver reports improper VHT nss/rate? But this warns once, and > > you message isn't like this. > > Could you check the source code LN5491 you are using? > > The file net/mac80211/rx.c is not available on disk on my system > (CachyOS ships only .h files in the headers package), but I located > the exact warning message in journalctl: > > Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0 > > This confirms that line 5491 in my kernel matches exactly what you > showed from v6.19.6 -- the RX_ENC_VHT case checking for > status->nss == 0. The offset in my trace is slightly different > (+0x183 vs +0x177), which is likely due to CachyOS's LTO/AutoFDO > compiler optimizations. > > The warning appeared once in my initial test session: > > Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0 > WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x183/0x1020 [mac80211] > > However, in subsequent module reload and reconnect cycles I was unable > to reproduce it. 
This is consistent with WARN_ONCE behavior -- it > likely fired on the first invalid nss=0 packet after the initial > driver load and has not triggered since. I cannot confirm it as a > reliable symptom. To reproduce this reliable, you need to remove driver ko and mac80211.ko, and reinstall them. However, you have confirmed this is the symptom. I think only if you want to dig why the rate reported by hardware is weird, otherwise we can ignore this warning. > > --- > > Regarding patch stability: the results below are from testing your > original RFT patch [1], not any newer submission. I want to be > explicit to avoid confusion: > > [1] > https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek. > com/ > > This is the exact diff I compiled and tested: > > --- a/drivers/net/wireless/realtek/rtw88/pci.c > +++ b/drivers/net/wireless/realtek/rtw88/pci.c > @@ -2,6 +2,7 @@ > /* Copyright(c) 2018-2019 Realtek Corporation > */ > > +#include <linux/dmi.h> > #include <linux/module.h> > #include <linux/pci.h> > #include "main.h" > @@ -1744,6 +1745,34 @@ const struct pci_error_handlers rtw_pci_err_handler = { > }; > EXPORT_SYMBOL(rtw_pci_err_handler); > > +enum rtw88_quirk_dis_pci_caps { > + QUIRK_DIS_PCI_CAP_ASPM, > +}; > + > +static int disable_pci_caps(const struct dmi_system_id *dmi) > +{ > + uintptr_t dis_caps = (uintptr_t)dmi->driver_data; > + > + if (dis_caps & BIT(QUIRK_DIS_PCI_CAP_ASPM)) > + rtw_pci_disable_aspm = true; > + > + return 1; > +} > + > +static const struct dmi_system_id rtw88_pci_quirks[] = { > + { > + .callback = disable_pci_caps, > + .ident = "HP Notebook - P3S95EA#ACB", > + .matches = { > + DMI_MATCH(DMI_SYS_VENDOR, "HP"), > + DMI_MATCH(DMI_PRODUCT_NAME, "HP Notebook"), > + DMI_MATCH(DMI_PRODUCT_SKU, "P3S95EA#ACB"), > + }, > + .driver_data = (void *)BIT(QUIRK_DIS_PCI_CAP_ASPM), > + }, > + {} > +}; > + > int rtw_pci_probe(struct pci_dev *pdev, > const struct pci_device_id *id) > { > @@ -1808,6 +1837,7 @@ int rtw_pci_probe(struct 
pci_dev *pdev, > bridge && bridge->vendor == PCI_VENDOR_ID_INTEL) > rtwpci->rx_no_aspm = true; > > + dmi_check_system(rtw88_pci_quirks); > rtw_pci_phy_cfg(rtwdev); > > ret = rtw_register_hw(rtwdev, hw); > > Results with only this patch applied: > > - The hard freeze lockup is gone. > - However, during idle the logs are flooded with: > > rtw88_8821ce 0000:13:00.0: failed to send h2c command > rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state > > - To give a concrete sense of the volume: over an ~80-minute > observation window after a clean module reload, I recorded > 11,757 "failed to send h2c command" events and 2 "firmware > failed to leave lps state" events -- approximately 110 errors > per minute during active periods. > - These errors cause Bluetooth audio stuttering and WiFi > throughput drops. > > When I additionally set disable_lps_deep=Y alongside your ASPM patch, > all h2c errors vanish completely and Bluetooth/WiFi remain fully > stable. This confirms that disabling LPS Deep is necessary for > complete stability on this specific HP SKU. > > I also noticed what appears to be a new patch in a separate mailing > list thread. I will test it shortly and report back with the results. Thanks for your experiments in detail. :) Ping-Ke ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-17 1:28 ` Ping-Ke Shih
@ 2026-03-18 0:00 ` LB F
2026-03-18 0:58 ` Ping-Ke Shih
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-18 0:00 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> To reproduce this reliable, you need to remove driver ko and mac80211.ko,
> and reinstall them.
>
> However, you have confirmed this is the symptom. I think only if you
> want to dig why the rate reported by hardware is weird, otherwise we
> can ignore this warning.

Following your suggestion, I performed a full stack reload including
mac80211.ko and cfg80211.ko, and was able to reproduce the warning:

    [152.226055] Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0
    [152.226057] WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x177/0x1020 [mac80211]
    [152.226336] CPU: 2 UID: 0 PID: 638 Comm: irq/56-rtw_pci Tainted: G IOE 6.19.7-1-cachyos
    [152.226344] Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020

One observation worth mentioning: the warning triggered approximately
72 seconds after initial association, coinciding with a Bluetooth
device connecting to the system. This may suggest the NSS=0 condition
occurs during BT coexistence negotiation rather than during normal
WiFi traffic. I am not sure if this is relevant, but I wanted to
mention it in case it helps narrow down the root cause.

I also noticed the offset is now +0x177, which matches exactly what
you showed from v6.19.6. The earlier +0x183 was likely an artifact of
CachyOS's LTO optimizations while mac80211 had been resident for a
long time.

As you noted, this appears to be a separate issue from the freeze and
h2c timeout problems, so I leave it to your judgment whether it
warrants further investigation.
---

I would like to take this opportunity to thank you sincerely for your
time, patience, and expertise throughout this whole process. From your
very first response to the final v2 patch, your guidance made it
possible to properly identify and resolve a bug that had been causing
real frustration for users of this hardware for a long time.

If any further testing of the rtw88 driver is needed -- whether for
this hardware or for other patches -- I am happy to help to the best
of my abilities and available time. This HP Notebook with RTL8821CE
will remain available for testing whenever it is useful.

Best regards,
Oleksandr Havrylov

On Tue, 17 Mar 2026 at 03:28, Ping-Ke Shih <pkshih@realtek.com> wrote:
>
> LB F <goainwo@gmail.com> wrote:
> >
> > Ping-Ke Shih <pkshih@realtek.com> wrote:
> > > Not sure if this is because PCIE bridge has no ASPM capability?
> >
> > That could indeed be the case -- I do not have a way to confirm
> > without further hardware-level inspection.
> >
> > > LN5491 (kernel v6.19.6) is:
> > > case RX_ENC_VHT:
> > > if (WARN_ONCE(status->rate_idx > 11 ||
> > > !status->nss ||
> > > status->nss > 8,
> > > "Rate marked as a VHT rate but data is
> > invalid: MCS: %d, NSS: %d\n",
> > > status->rate_idx, status->nss))
> > > goto drop;
> > > break;
> > > Looks like driver reports improper VHT nss/rate? But this warns once, and
> > > you message isn't like this.
> > > Could you check the source code LN5491 you are using?
> >
> > The file net/mac80211/rx.c is not available on disk on my system
> > (CachyOS ships only .h files in the headers package), but I located
> > the exact warning message in journalctl:
> >
> > Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0
> >
> > This confirms that line 5491 in my kernel matches exactly what you
> > showed from v6.19.6 -- the RX_ENC_VHT case checking for
> > status->nss == 0.
> > The offset in my trace is slightly different
> > (+0x183 vs +0x177), which is likely due to CachyOS's LTO/AutoFDO
> > compiler optimizations.
> >
> > The warning appeared once in my initial test session:
> >
> > Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0
> > WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x183/0x1020 [mac80211]
> >
> > However, in subsequent module reload and reconnect cycles I was unable
> > to reproduce it. This is consistent with WARN_ONCE behavior -- it
> > likely fired on the first invalid nss=0 packet after the initial
> > driver load and has not triggered since. I cannot confirm it as a
> > reliable symptom.
>
> To reproduce this reliable, you need to remove driver ko and mac80211.ko,
> and reinstall them.
>
> However, you have confirmed this is the symptom. I think only if you
> want to dig why the rate reported by hardware is weird, otherwise we
> can ignore this warning.
>
> >
> > ---
> >
> > Regarding patch stability: the results below are from testing your
> > original RFT patch [1], not any newer submission. I want to be
> > explicit to avoid confusion:
> >
> > [1]
> > https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek.com/
> >
> > This is the exact diff I compiled and tested:
> >
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -2,6 +2,7 @@
> >  /* Copyright(c) 2018-2019 Realtek Corporation
> >   */
> >
> > +#include <linux/dmi.h>
> >  #include <linux/module.h>
> >  #include <linux/pci.h>
> >  #include "main.h"
> > @@ -1744,6 +1745,34 @@ const struct pci_error_handlers rtw_pci_err_handler = {
> >  };
> >  EXPORT_SYMBOL(rtw_pci_err_handler);
> >
> > +enum rtw88_quirk_dis_pci_caps {
> > +	QUIRK_DIS_PCI_CAP_ASPM,
> > +};
> > +
> > +static int disable_pci_caps(const struct dmi_system_id *dmi)
> > +{
> > +	uintptr_t dis_caps = (uintptr_t)dmi->driver_data;
> > +
> > +	if (dis_caps & BIT(QUIRK_DIS_PCI_CAP_ASPM))
> > +		rtw_pci_disable_aspm = true;
> > +
> > +	return 1;
> > +}
> > +
> > +static const struct dmi_system_id rtw88_pci_quirks[] = {
> > +	{
> > +		.callback = disable_pci_caps,
> > +		.ident = "HP Notebook - P3S95EA#ACB",
> > +		.matches = {
> > +			DMI_MATCH(DMI_SYS_VENDOR, "HP"),
> > +			DMI_MATCH(DMI_PRODUCT_NAME, "HP Notebook"),
> > +			DMI_MATCH(DMI_PRODUCT_SKU, "P3S95EA#ACB"),
> > +		},
> > +		.driver_data = (void *)BIT(QUIRK_DIS_PCI_CAP_ASPM),
> > +	},
> > +	{}
> > +};
> > +
> >  int rtw_pci_probe(struct pci_dev *pdev,
> >  		  const struct pci_device_id *id)
> >  {
> > @@ -1808,6 +1837,7 @@ int rtw_pci_probe(struct pci_dev *pdev,
> >  	    bridge && bridge->vendor == PCI_VENDOR_ID_INTEL)
> >  		rtwpci->rx_no_aspm = true;
> >
> > +	dmi_check_system(rtw88_pci_quirks);
> >  	rtw_pci_phy_cfg(rtwdev);
> >
> >  	ret = rtw_register_hw(rtwdev, hw);
> >
> > Results with only this patch applied:
> >
> > - The hard freeze lockup is gone.
> > - However, during idle the logs are flooded with:
> >
> > rtw88_8821ce 0000:13:00.0: failed to send h2c command
> > rtw88_8821ce 0000:13:00.0: firmware failed to leave lps state
> >
> > - To give a concrete sense of the volume: over an ~80-minute
> >   observation window after a clean module reload, I recorded
> >   11,757 "failed to send h2c command" events and 2 "firmware
> >   failed to leave lps state" events -- approximately 110 errors
> >   per minute during active periods.
> > - These errors cause Bluetooth audio stuttering and WiFi
> >   throughput drops.
> >
> > When I additionally set disable_lps_deep=Y alongside your ASPM patch,
> > all h2c errors vanish completely and Bluetooth/WiFi remain fully
> > stable. This confirms that disabling LPS Deep is necessary for
> > complete stability on this specific HP SKU.
> >
> > I also noticed what appears to be a new patch in a separate mailing
> > list thread. I will test it shortly and report back with the results.
>
> Thanks for your experiments in detail. :)
>
> Ping-Ke
>

^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-18 0:00 ` LB F
@ 2026-03-18 0:58 ` Ping-Ke Shih
2026-03-18 23:55 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-18 0:58 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> > To reproduce this reliable, you need to remove driver ko and mac80211.ko,
> > and reinstall them.
> >
> > However, you have confirmed this is the symptom. I think only if you
> > want to dig why the rate reported by hardware is weird, otherwise we
> > can ignore this warning.
>
> Following your suggestion, I performed a full stack reload including
> mac80211.ko and cfg80211.ko, and was able to reproduce the warning:
>
> [152.226055] Rate marked as a VHT rate but data is invalid: MCS: 0, NSS: 0
> [152.226057] WARNING: net/mac80211/rx.c:5491 at
> ieee80211_rx_list+0x177/0x1020 [mac80211]
> [152.226336] CPU: 2 UID: 0 PID: 638 Comm: irq/56-rtw_pci Tainted: G
> IOE 6.19.7-1-cachyos
> [152.226344] Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020
>
> One observation worth mentioning: the warning triggered approximately
> 72 seconds after initial association, coinciding with a Bluetooth
> device connecting to the system. This may suggest the NSS=0 condition
> occurs during BT coexistence negotiation rather than during normal
> WiFi traffic. I am not sure if this is relevant, but I wanted to
> mention it in case it helps narrow down the root cause.
>
> I also noticed the offset is now +0x177, which matches exactly what
> you showed from v6.19.6. The earlier +0x183 was likely an artifact of
> CachyOS's LTO optimizations while mac80211 had been resident for a
> long time.
>
> As you noted, this appears to be a separate issue from the freeze and
> h2c timeout problems, so I leave it to your judgment whether it
> warrants further investigation.

I add a printk to show the case VHT and NSS==0 as below. Please help to
collect the output, and then I can see what it happened.

diff --git a/drivers/net/wireless/realtek/rtw88/rx.c b/drivers/net/wireless/realtek/rtw88/rx.c
index 8b0afaaffaa0..a4e3a3bce748 100644
--- a/drivers/net/wireless/realtek/rtw88/rx.c
+++ b/drivers/net/wireless/realtek/rtw88/rx.c
@@ -230,6 +230,11 @@ static void rtw_rx_fill_rx_status(struct rtw_dev *rtwdev,
 				    &rx_status->nss);
 	}

+	if (rx_status->encoding == RX_ENC_VHT && rx_status->nss == 0) {
+		printk("VHT NSS=0 pkt_stat->rate=0x%x rx_status->band=%d rx_status->rate_idx=%d\n",
+		       pkt_stat->rate, rx_status->band, rx_status->rate_idx);
+	}
+
 	rx_status->flag |= RX_FLAG_MACTIME_START;
 	rx_status->mactime = pkt_stat->tsf_low;

>
> ---
>
> I would like to take this opportunity to thank you sincerely for your
> time, patience, and expertise throughout this whole process. From your
> very first response to the final v2 patch, your guidance made it
> possible to properly identify and resolve a bug that had been causing
> real frustration for users of this hardware for a long time.

I also thanks for your time and help. :)

Ping-Ke

^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-18 0:58 ` Ping-Ke Shih
@ 2026-03-18 23:55 ` LB F
2026-03-19 0:22 ` LB F
2026-03-19 1:24 ` Ping-Ke Shih
0 siblings, 2 replies; 34+ messages in thread
From: LB F @ 2026-03-18 23:55 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> I add a printk to show the case VHT and NSS==0 as below. Please help to
> collect the output, and then I can see what it happened.

Hi Ping-Ke,

I applied your diagnostic patch (using pr_err for maximum log
visibility) and spent the last couple of days testing it on the
affected hardware. The results answer both open questions cleanly.

---

Regarding your earlier question:
> Not sure if this is because PCIE bridge has no ASPM capability?

You were correct. The very beginning of the boot log shows:

[0.177872] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[15.157752] r8169 0000:07:00.0: can't disable ASPM; OS doesn't have ASPM control

The BIOS on this HP laptop uses the ACPI FADT table to globally revoke
OS control over PCIe ASPM before Linux even takes over. This has an
important implication: since ASPM is already disabled at the hardware
level by firmware, the instability on this specific SKU is caused
entirely by LPS Deep Mode, not ASPM itself.

This explains why the ASPM-only quirk (v1 patch) did not stop the h2c
timeouts -- ASPM was never actually active on this machine to begin
with. Disabling LPS Deep Mode via the v2 quirk is what eliminates the
firmware timeout loop entirely.

---

Regarding the VHT NSS=0 diagnostic patch:

During normal idle, active pinging, and heavy VHT throughput
(175.5 Mb/s), the pr_err condition never triggered -- no "VHT NSS=0"
lines appeared in dmesg during active use.

However, the standard WARNING at mac80211/rx.c:5491 does reliably
appear exactly once after a fresh full stack reload (including
mac80211.ko and cfg80211.ko) or after resume from suspend:

[167.708201] WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x177/0x1020 [mac80211]

This suggests the hardware reports a malformed nss=0 VHT rate only
during initial link establishment. Since mac80211 uses WARN_ONCE, it
is suppressed on all subsequent packets.

The diagnostic module remains installed. I will report back immediately
if the pr_err condition is caught, or if any other relevant symptoms
appear.

Best regards,
Oleksandr Havrylov

^ permalink raw reply [flat|nested] 34+ messages in thread
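The once-only behaviour described in the message above can be modelled in user space. This is a minimal sketch, not mac80211 code: the function and counter names are hypothetical, the validity condition is copied from the LN5491 snippet quoted earlier in the thread, and it assumes the "already warned" flag lives in the module's static data, which would explain why a full reload of mac80211.ko makes the warning appear again.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter so a caller can observe how often the text is printed. */
int sketch_warnings_printed;

/*
 * Models the RX_ENC_VHT check: the condition is evaluated for every frame
 * (an invalid frame is reported as invalid every time), but the warning
 * text is emitted only for the first offender.
 */
bool sketch_vht_rate_invalid(unsigned int rate_idx, unsigned int nss)
{
	static bool warned;	/* assumption: reset only when the module is reloaded */
	bool invalid = rate_idx > 11 || nss == 0 || nss > 8;

	if (invalid && !warned) {
		warned = true;
		sketch_warnings_printed++;
		fprintf(stderr,
			"Rate marked as a VHT rate but data is invalid: MCS: %u, NSS: %u\n",
			rate_idx, nss);
	}
	return invalid;
}
```

Calling sketch_vht_rate_invalid(0, 0) repeatedly keeps returning true while the message is printed only once, which matches a single WARNING per module load even if several malformed frames arrive.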
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-18 23:55 ` LB F
@ 2026-03-19 0:22 ` LB F
2026-03-19 0:49 ` Ping-Ke Shih
2026-03-19 1:24 ` Ping-Ke Shih
1 sibling, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-19 0:22 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Hi Ping-Ke,

I successfully collected the output with your diagnostic printk.

Here is the exact log entry triggered when the warning fires:

[ 180.424146] VHT NSS=0 pkt_stat->rate=0x65 rx_status->band=1 rx_status->rate_idx=0
[ 180.424157] WARNING: net/mac80211/rx.c:5491 at ieee80211_rx_list+0x177/0x1020 [mac80211]

Looking at the rtw88 source code, this perfectly explains why `nss` is 0:
1. The hardware/firmware reports `pkt_stat->rate = 0x65` (101 in decimal).
2. `rtw_rx_fill_rx_status()` checks if `pkt_stat->rate >=
   DESC_RATEVHT1SS_MCS0` (which is `0x2c`). Since `0x65 >= 0x2c`, it
   correctly sets `rx_status->encoding = RX_ENC_VHT`.
3. It then calls `rtw_desc_to_mcsrate(pkt_stat->rate,
   &rx_status->rate_idx, &rx_status->nss)`.
4. Inside `rtw_desc_to_mcsrate()`, the value `0x65` falls completely
   outside any known bounds. The highest defined rate in `enum
   rtw_trx_desc_rate` is `DESC_RATEVHT4SS_MCS9` (`0x53`). The HT range
   (`DESC_RATEMCS0` to `DESC_RATEMCS31`) ends at `0x2b`.
5. Because `0x65` matches absolutely none of the `if/else` brackets in
   `rtw_desc_to_mcsrate()`, the function simply returns without mutating
   `mcs` and `nss`.
6. Since `rx_status` was initialized with `memset(rx_status, 0, ...)`
   at the beginning of the function, `rx_status->nss` remains `0`.

So mac80211 complains because the rtw88 driver doesn't know what rate
`0x65` means, leaves NSS at 0, but still flags it as a VHT packet.

Any idea what `0x65` represents from the hardware's perspective? Is it
a firmware bug or a proprietary control/management frame rate index?

Looking forward to your thoughts!

Best regards,
Oleksandr

^ permalink raw reply [flat|nested] 34+ messages in thread
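The six steps in the message above can be reproduced with a small user-space model. This is a sketch only, not the rtw88 source: every `model_*` name is hypothetical, the boundaries 0x2b, 0x2c and 0x53 are the ones quoted in the message, and the start of the HT range (0x0c) and the layout of ten MCS values per spatial stream are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
	MODEL_RATE_MCS0        = 0x0c,	/* assumed start of the HT range */
	MODEL_RATE_MCS31       = 0x2b,	/* end of the HT range (from the message) */
	MODEL_RATE_VHT1SS_MCS0 = 0x2c,	/* first VHT rate (from the message) */
	MODEL_RATE_VHT4SS_MCS9 = 0x53,	/* highest defined rate (from the message) */
};

struct model_rx_status {
	int is_vht;
	uint8_t rate_idx;
	uint8_t nss;
};

/* Stand-in for rtw_desc_to_mcsrate(): writes its outputs only for known ranges. */
void model_desc_to_mcsrate(uint8_t rate, uint8_t *mcs, uint8_t *nss)
{
	assert(mcs && nss);

	if (rate >= MODEL_RATE_VHT1SS_MCS0 && rate <= MODEL_RATE_VHT4SS_MCS9) {
		uint8_t off = rate - MODEL_RATE_VHT1SS_MCS0;

		*nss = off / 10 + 1;	/* assumption: ten MCS values per stream */
		*mcs = off % 10;
	} else if (rate >= MODEL_RATE_MCS0 && rate <= MODEL_RATE_MCS31) {
		*mcs = rate - MODEL_RATE_MCS0;
	}
	/* Step 5: any other value falls through and leaves *mcs and *nss untouched. */
}

/* Stand-in for the relevant part of rtw_rx_fill_rx_status(). */
struct model_rx_status model_fill_rx_status(uint8_t rate)
{
	struct model_rx_status st;

	memset(&st, 0, sizeof(st));		/* step 6: nss starts at 0 */
	if (rate >= MODEL_RATE_VHT1SS_MCS0) {	/* step 2: no upper bound on the check */
		st.is_vht = 1;
		model_desc_to_mcsrate(rate, &st.rate_idx, &st.nss);	/* step 3 */
	}
	return st;
}
```

With this model, model_fill_rx_status(0x65) comes back flagged as VHT with nss still 0 and rate_idx 0, the same combination the diagnostic printk reported, while an in-range value such as 0x53 is decoded into a non-zero stream count.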
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-19 0:22 ` LB F
@ 2026-03-19 0:49 ` Ping-Ke Shih
0 siblings, 0 replies; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-19 0:49 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Hi Ping-Ke,
>
> I successfully collected the output with your diagnostic printk.
>
> Here is the exact log entry triggered when the warning fires:
>
> [ 180.424146] VHT NSS=0 pkt_stat->rate=0x65 rx_status->band=1
> rx_status->rate_idx=0
> [ 180.424157] WARNING: net/mac80211/rx.c:5491 at
> ieee80211_rx_list+0x177/0x1020 [mac80211]
>
> Looking at the rtw88 source code, this perfectly explains why `nss` is 0:
> 1. The hardware/firmware reports `pkt_stat->rate = 0x65` (101 in decimal).
> 2. `rtw_rx_fill_rx_status()` checks if `pkt_stat->rate >=
> DESC_RATEVHT1SS_MCS0` (which is `0x2c`). Since `0x65 >= 0x2c`, it
> correctly sets `rx_status->encoding = RX_ENC_VHT`.
> 3. It then calls `rtw_desc_to_mcsrate(pkt_stat->rate,
> &rx_status->rate_idx, &rx_status->nss)`.
> 4. Inside `rtw_desc_to_mcsrate()`, the value `0x65` falls completely
> outside any known bounds. The highest defined rate in `enum
> rtw_trx_desc_rate` is `DESC_RATEVHT4SS_MCS9` (`0x53`). The HT range
> (`DESC_RATEMCS0` to `DESC_RATEMCS31`) ends at `0x2b`.
> 5. Because `0x65` matches absolutely none of the `if/else` brackets in
> `rtw_desc_to_mcsrate()`, the function simply returns without mutating
> `mcs` and `nss`.
> 6. Since `rx_status` was initialized with `memset(rx_status, 0, ...)`
> at the beginning of the function, `rx_status->nss` remains `0`.
>
> So mac80211 complains because the rtw88 driver doesn't know what rate
> `0x65` means, leaves NSS at 0, but still flags it as a VHT packet.
>
> Any idea what `0x65` represents from the hardware's perspective? Is it
> a firmware bug or a proprietary control/management frame rate index?
>
> Looking forward to your thoughts!

Not sure what hardware get wrong. Let's validate rate when reading
from hardware. Since 1M rate can only 20MHz, I set it together.
Please help to test below. I suppose you can see "weird rate=xxx",
but "WARNING: net/mac80211/rx.c:5491" disappears.

diff --git a/drivers/net/wireless/realtek/rtw88/rx.c b/drivers/net/wireless/realtek/rtw88/rx.c
index 8b0afaaffaa0..3d5e48264fc5 100644
--- a/drivers/net/wireless/realtek/rtw88/rx.c
+++ b/drivers/net/wireless/realtek/rtw88/rx.c
@@ -295,6 +295,12 @@ void rtw_rx_query_rx_desc(struct rtw_dev *rtwdev, void *rx_desc8,
 	pkt_stat->tsf_low = le32_get_bits(rx_desc->w5, RTW_RX_DESC_W5_TSFL);

+	if (pkt_stat->rate >= DESC_RATE_MAX) {
+		printk("weird rate=%d\n", pkt_stat->rate);
+		pkt_stat->rate = DESC_RATE1M;
+		pkt_stat->bw = RTW_CHANNEL_WIDTH_20;
+	}
+
 	/* drv_info_sz is in unit of 8-bytes */
 	pkt_stat->drv_info_sz *= 8;

Ping-Ke

^ permalink raw reply related [flat|nested] 34+ messages in thread
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-18 23:55 ` LB F
2026-03-19 0:22 ` LB F
@ 2026-03-19 1:24 ` Ping-Ke Shih
2026-03-19 23:58 ` LB F
1 sibling, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-19 1:24 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> > I add a printk to show the case VHT and NSS==0 as below. Please help to
> > collect the output, and then I can see what it happened.
>
> Hi Ping-Ke,
>
> I applied your diagnostic patch (using pr_err for maximum log
> visibility) and spent the last couple of days testing it on the
> affected hardware. The results answer both open questions cleanly.
>
> ---
>
> Regarding your earlier question:
> > Not sure if this is because PCIE bridge has no ASPM capability?
>
> You were correct. The very beginning of the boot log shows:
>
> [0.177872] ACPI FADT declares the system doesn't support PCIe ASPM,
> so disable it
> [15.157752] r8169 0000:07:00.0: can't disable ASPM; OS doesn't have
> ASPM control
>
> The BIOS on this HP laptop uses the ACPI FADT table to globally revoke
> OS control over PCIe ASPM before Linux even takes over. This has an
> important implication: since ASPM is already disabled at the hardware
> level by firmware, the instability on this specific SKU is caused
> entirely by LPS Deep Mode, not ASPM itself.

Checking rtw88 code related to rtw_pci_disable_aspm, I found that driver
does check device ASPM capability before configuring ASPM. It looks a
little weird why OS doesn't turn off these capabilities of device.

Maybe we should check the capabilities of PCI bridge side?

>
> This explains why the ASPM-only quirk (v1 patch) did not stop the h2c
> timeouts -- ASPM was never actually active on this machine to begin
> with.
> Disabling LPS Deep Mode via the v2 quirk is what eliminates the
> firmware timeout loop entirely.

I think there are two problems. One is ASPM causing system frozen,
and the other is LPS deep mode causing H2C timeouts. If you turn on
ASPM and disable LPS deep mode, I feel H2C timeout can disappear, but
it might go frozen first though.

Ping-Ke

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-19 1:24 ` Ping-Ke Shih
@ 2026-03-19 23:58 ` LB F
2026-03-20 0:41 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-19 23:58 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> Maybe we should check the capabilities of PCI bridge side?
> I think there are two problems. One is ASPM causing system frozen,
> and the other is LPS deep mode causing H2C timeouts.

Hi Ping-Ke,

You were right on both counts. Here are the PCI bridge capabilities.

The upstream bridge for the RTL8821CE (13:00.0) is:
Intel Corporation Wildcat Point-LP PCI Express Root Port #5 (00:1c.4)

Bridge (00:1c.4):
  LnkCap: Port #5, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
  LnkCtl: ASPM L0s L1 Enabled

WiFi card (13:00.0):
  LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <64us
  LnkCtl: ASPM L0s L1 Enabled

So ASPM L0s and L1 are enabled by the BIOS on both ends of the bus,
despite the ACPI FADT claiming the OS has no ASPM control. ASPM was
active on this machine all along. I apologize for the incorrect earlier
conclusion that ASPM was not active.

This confirms your analysis: there are indeed two separate problems --
ASPM causing the hard freeze, and LPS Deep Mode causing the H2C
timeouts. The v2 patch correctly addresses both.

---

Regarding your rate validation patch:

I applied it (removing the earlier pr_err block and inserting the new
check in rtw_rx_query_rx_desc). The patch compiled and installed
correctly -- verified via strings on the installed .zst module.

I was unable to reproduce the "weird rate" condition or the WARNING
during this test session. The diagnostic module remains installed and
active -- I will report back immediately if I manage to catch it.

Best regards,
Oleksandr Havrylov

^ permalink raw reply [flat|nested] 34+ messages in thread
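The "LnkCtl: ASPM L0s L1 Enabled" lines in the message above are a decoding of the two low bits of the PCIe Link Control register (the ASPM Control field). A minimal sketch of that decoding follows. The `sketch_*` names are hypothetical; the bit values mirror the ASPM Control field (bit 0 for L0s, bit 1 for L1), and the example register value 0x0043 used in the note below is invented for illustration, not read from this machine.

```c
#include <stdint.h>

/* ASPM Control field of the PCIe Link Control register, bits 1:0. */
#define SKETCH_LNKCTL_ASPM_MASK	0x0003u
#define SKETCH_LNKCTL_ASPM_L0S	0x0001u
#define SKETCH_LNKCTL_ASPM_L1	0x0002u

/* Returns non-zero when L0s entry is enabled on this end of the link. */
int sketch_aspm_l0s_enabled(uint16_t lnkctl)
{
	return (lnkctl & SKETCH_LNKCTL_ASPM_L0S) != 0;
}

/* Returns non-zero when L1 entry is enabled on this end of the link. */
int sketch_aspm_l1_enabled(uint16_t lnkctl)
{
	return (lnkctl & SKETCH_LNKCTL_ASPM_L1) != 0;
}

/* Human-readable form, in the style of the lspci lines quoted above. */
const char *sketch_aspm_state(uint16_t lnkctl)
{
	switch (lnkctl & SKETCH_LNKCTL_ASPM_MASK) {
	case 0:
		return "ASPM Disabled";
	case SKETCH_LNKCTL_ASPM_L0S:
		return "ASPM L0s Enabled";
	case SKETCH_LNKCTL_ASPM_L1:
		return "ASPM L1 Enabled";
	default:
		return "ASPM L0s L1 Enabled";
	}
}
```

For a register value such as 0x0043, both low bits are set, so sketch_aspm_state() yields the same "ASPM L0s L1 Enabled" text reported for the bridge and the WiFi card. Reading the field on both ends of the link matters because, as the message notes, the firmware can leave ASPM enabled even when the OS is told it has no control over it.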
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-19 23:58 ` LB F
@ 2026-03-20 0:41 ` LB F
2026-03-20 1:00 ` Ping-Ke Shih
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-20 0:41 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> Not sure what hardware get wrong. Let's validate rate when reading
> from hardware.

Hi Ping-Ke,

One additional observation while monitoring logs with your rate
validation patch installed.

During normal usage with Wi-Fi connected and a Bluetooth A2DP device
connecting to the system, the following message appeared in dmesg:

[180.420000] rtw_8821ce 0000:13:00.0: unused phy status page (11)

Looking at rtw_rx_fill_phy_info() in rx.c, this message is emitted
when the firmware sends a PHY status report with a page number that
the driver does not recognize. In this case page 11 appeared at the
moment the Bluetooth device was establishing its connection.

We have not observed any stability issues or connectivity drops
associated with this message -- the driver appears to handle it
gracefully by ignoring it. We are not sure whether this is related
to the rate=0x65 issue or is simply a separate artifact of BT/Wi-Fi
coexistence on this chip. We wanted to mention it in case it is
useful context.

Best regards,
Oleksandr Havrylov

^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-20 0:41 ` LB F
@ 2026-03-20 1:00 ` Ping-Ke Shih
2026-03-20 1:19 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-20 1:00 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> > Not sure what hardware get wrong. Let's validate rate when reading
> > from hardware.
>
> Hi Ping-Ke,
>
> One additional observation while monitoring logs with your rate
> validation patch installed.
>
> During normal usage with Wi-Fi connected and a Bluetooth A2DP device
> connecting to the system, the following message appeared in dmesg:
>
> [180.420000] rtw_8821ce 0000:13:00.0: unused phy status page (11)
>
> Looking at rtw_rx_fill_phy_info() in rx.c, this message is emitted
> when the firmware sends a PHY status report with a page number that
> the driver does not recognize. In this case page 11 appeared at the
> moment the Bluetooth device was establishing its connection.

It seems like hardware reports incorrect about the PHY status, which
only 0 or 1 is expected. I don't know how it could be. Maybe, we can
ignore this message, or change it to debug level if it appears
frequently and you don't want to see it.

>
> We have not observed any stability issues or connectivity drops
> associated with this message -- the driver appears to handle it
> gracefully by ignoring it. We are not sure whether this is related
> to the rate=0x65 issue or is simply a separate artifact of BT/Wi-Fi
> coexistence on this chip. We wanted to mention it in case it is
> useful context.

Two messages look like hardware goes weird. The report values become
unpredictable. Maybe we need more validation.... However, driver will
become very dirty since I can't conclude a single rule to address them.

Ping-Ke

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-20 1:00 ` Ping-Ke Shih
@ 2026-03-20 1:19 ` LB F
2026-03-20 2:02 ` Ping-Ke Shih
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-20 1:19 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> Not sure what hardware get wrong. Let's validate rate when reading
> from hardware. Since 1M rate can only 20MHz, I set it together.
> Please help to test below. I suppose you can see "weird rate=xxx",
> but "WARNING: net/mac80211/rx.c:5491" disappears.

Hi Ping-Ke,

I can confirm your patch works as expected. Here are the full results.

--- Test environment ---

Kernel: 6.19.7-1-cachyos
Patch: your rate validation patch applied to rtw_rx_query_rx_desc(),
on top of the v2 DMI quirk (ASPM + LPS Deep disabled)

--- Captured log (relevant excerpt) ---

[ 43.046] input: Soundcore Q10i (AVRCP) <-- BT headset connected
[ 111.551] rtw_8821ce 0000:13:00.0: unused phy status page (13)
[ 111.635] weird rate=101
[ 111.635] rtw_8821ce 0000:13:00.0: unused phy status page (7)
[ 111.741] weird rate=102
[ 115.045] weird rate=98
[ 118.371] weird rate=104

--- Analysis ---

1. Timing: the anomalous events began approximately 68 seconds after
   the Bluetooth A2DP headset (Soundcore Q10i) established its
   connection. No anomalies were observed before BT connected.

2. Multiple invalid rate values were captured, not just 0x65:

   weird rate=101 (0x65)
   weird rate=102 (0x66)
   weird rate=98 (0x62)
   weird rate=104 (0x68)

   All four values exceed DESC_RATE_MAX (0x53 = 83 decimal). This
   suggests the hardware occasionally reports a range of out-of-bounds
   rate values during BT/Wi-Fi coexistence, not a single fixed value.

3. The "unused phy status page" messages (pages 13 and 7) appeared
   immediately before and alongside the "weird rate" events.
   As noted in my previous message, only pages 0 and 1 are expected.
   This further suggests the firmware leaks internal coexistence state
   into the RX ring during BT antenna arbitration.

4. Most importantly: the WARNING: net/mac80211/rx.c:5491 did NOT
   appear anywhere in the log. Your rate clamping patch successfully
   intercepts the out-of-bounds values before they propagate to
   mac80211, preventing the invalid VHT NSS=0 warning entirely.

--- Conclusion ---

Your patch achieves the intended result. The "weird rate" printk
confirms the hardware is the source of the invalid values (occurring
during BT coexistence), and the mac80211 WARNING is suppressed.

Please let me know if you need any additional data or further testing.

Best regards,
Oleksandr Havrylov

^ permalink raw reply [flat|nested] 34+ messages in thread
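The clamping behaviour confirmed by the test above can be expressed as a small standalone function. This is a sketch under stated assumptions, not the patch itself: the `sketch_*` names are hypothetical, 0x53 is the highest defined rate quoted in the thread, and the sentinel is assumed to sit one past it (0x54), which is the reading under which a `>=` comparison leaves the highest valid rate untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's constants. */
enum {
	SKETCH_RATE_1M     = 0x00,
	SKETCH_RATE_LAST   = 0x53,			/* highest defined rate, per the thread */
	SKETCH_RATE_MAX    = SKETCH_RATE_LAST + 1,	/* assumption: sentinel after the last rate */
	SKETCH_WIDTH_20MHZ = 0,
};

struct sketch_pkt_stat {
	uint8_t rate;
	uint8_t bw;
};

/*
 * Replaces an out-of-range descriptor rate with the 1M rate and forces
 * 20 MHz bandwidth, since the two are set together in the patch.
 * Returns true when the descriptor was rewritten.
 */
bool sketch_sanitize_rate(struct sketch_pkt_stat *ps)
{
	if (ps->rate < SKETCH_RATE_MAX)
		return false;

	ps->rate = SKETCH_RATE_1M;
	ps->bw = SKETCH_WIDTH_20MHZ;
	return true;
}
```

Each of the four logged values (98, 101, 102, 104) is above 0x53, so all of them would be rewritten by this function, while 0x53 itself passes through unchanged.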
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-20 1:19 ` LB F
@ 2026-03-20 2:02 ` Ping-Ke Shih
2026-03-21 12:07 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-20 2:02 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> > Not sure what hardware get wrong. Let's validate rate when reading
> > from hardware. Since 1M rate can only 20MHz, I set it together.
> > Please help to test below. I suppose you can see "weird rate=xxx",
> > but "WARNING: net/mac80211/rx.c:5491" disappears.
>
> Hi Ping-Ke,
>
> I can confirm your patch works as expected. Here are the full results.
>
> --- Test environment ---
>
> Kernel: 6.19.7-1-cachyos
> Patch: your rate validation patch applied to rtw_rx_query_rx_desc(),
> on top of the v2 DMI quirk (ASPM + LPS Deep disabled)
>
> --- Captured log (relevant excerpt) ---
>
> [ 43.046] input: Soundcore Q10i (AVRCP) <-- BT headset connected
> [ 111.551] rtw_8821ce 0000:13:00.0: unused phy status page (13)
> [ 111.635] weird rate=101
> [ 111.635] rtw_8821ce 0000:13:00.0: unused phy status page (7)
> [ 111.741] weird rate=102
> [ 115.045] weird rate=98
> [ 118.371] weird rate=104
>
> --- Analysis ---
>
> 1. Timing: the anomalous events began approximately 68 seconds after
> the Bluetooth A2DP headset (Soundcore Q10i) established its
> connection. No anomalies were observed before BT connected.
>
> 2. Multiple invalid rate values were captured, not just 0x65:
>
> weird rate=101 (0x65)
> weird rate=102 (0x66)
> weird rate=98 (0x62)
> weird rate=104 (0x68)
>
> All four values exceed DESC_RATE_MAX (0x53 = 83 decimal). This
> suggests the hardware occasionally reports a range of out-of-bounds
> rate values during BT/Wi-Fi coexistence, not a single fixed value.
>
> 3. The "unused phy status page" messages (pages 13 and 7) appeared
> immediately before and alongside the "weird rate" events. As noted
> in my previous message, only pages 0 and 1 are expected. This
> further suggests the firmware leaks internal coexistence state
> into the RX ring during BT antenna arbitration.
>
> 4. Most importantly: the WARNING: net/mac80211/rx.c:5491 did NOT
> appear anywhere in the log. Your rate clamping patch successfully
> intercepts the out-of-bounds values before they propagate to
> mac80211, preventing the invalid VHT NSS=0 warning entirely.
>
> --- Conclusion ---
>
> Your patch achieves the intended result. The "weird rate" printk
> confirms the hardware is the source of the invalid values (occurring
> during BT coexistence), and the mac80211 WARNING is suppressed.
>
> Please let me know if you need any additional data or further testing.

I'll send formal patch (Cc you) for the invalid VHT NSS=0, but not
to handle "unused phy status page". Please give me Tested-by tag on
the patch after I send it.

Ping-Ke

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-20 2:02 ` Ping-Ke Shih
@ 2026-03-21 12:07 ` LB F
2026-03-23 2:01 ` Ping-Ke Shih
0 siblings, 1 reply; 34+ messages in thread
From: LB F @ 2026-03-21 12:07 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Ping-Ke Shih <pkshih@realtek.com> wrote:
> I'll send formal patch (Cc you) for the invalid VHT NSS=0, but not
> to handle "unused phy status page". Please give me Tested-by tag on
> the patch after I send it.

Hi Ping-Ke,

Just a quick update to keep you informed -- no rush on anything.

My kernel updated from 6.19.7 to 6.19.9, which wiped the previously
installed out-of-tree modules. I rebuilt and reinstalled both patches:

1. The v2 DMI quirk (main.h + pci.c) disabling ASPM and LPS Deep
   Mode for the HP P3S95EA#ACB SKU.
2. The rate validation patch (rx.c) clamping out-of-bounds rate
   values before they reach mac80211.

Both patches apply cleanly and the system remains fully stable on
6.19.9. The DMI quirk is confirmed active via sysfs (disable_aspm=Y,
disable_lps_deep=Y) with no manual modprobe overrides.

I am looking forward to your formal patch for the VHT NSS=0 issue and
will provide a Tested-by tag as soon as it arrives. Thank you again
for all your work and patience throughout this process.

Best regards,
Oleksandr Havrylov

^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-21 12:07 ` LB F
@ 2026-03-23 2:01 ` Ping-Ke Shih
2026-03-25 20:38 ` LB F
0 siblings, 1 reply; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-23 2:01 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> > I'll send formal patch (Cc you) for the invalid VHT NSS=0, but not
> > to handle "unused phy status page". Please give me Tested-by tag on
> > the patch after I send it.
>
> Hi Ping-Ke,
>
> Just a quick update to keep you informed -- no rush on anything.
>
> My kernel updated from 6.19.7 to 6.19.9, which wiped the previously
> installed out-of-tree modules. I rebuilt and reinstalled both patches:
>
> 1. The v2 DMI quirk (main.h + pci.c) disabling ASPM and LPS Deep
> Mode for the HP P3S95EA#ACB SKU.
> 2. The rate validation patch (rx.c) clamping out-of-bounds rate
> values before they reach mac80211.
>
> Both patches apply cleanly and the system remains fully stable on
> 6.19.9. The DMI quirk is confirmed active via sysfs (disable_aspm=Y,
> disable_lps_deep=Y) with no manual modprobe overrides.
>
> I am looking forward to your formal patch for the VHT NSS=0 issue and
> will provide a Tested-by tag as soon as it arrives. Thank you again
> for all your work and patience throughout this process.

I sent the VHT NSS=0 patch [1]. Please help to give it a test. Thanks.

[1] https://lore.kernel.org/linux-wireless/20260323015849.9424-1-pkshih@realtek.com/T/#u

Ping-Ke

^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-23 2:01 ` Ping-Ke Shih
@ 2026-03-25 20:38 ` LB F
0 siblings, 0 replies; 34+ messages in thread
From: LB F @ 2026-03-25 20:38 UTC (permalink / raw)
To: Ping-Ke Shih; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

Subject: Cross-platform analysis: RTL8821CE ASPM/LPS instability
affects multiple OEM platforms beyond HP P3S95EA#ACB

Hi Ping-Ke,

First of all, thank you very much for your work on the rtw88 driver
and for the time you spent helping us resolve the issues on our HP
laptop. Both patches -- the v2 DMI quirk (ASPM + LPS Deep) and the
v2 RX rate validation (rx.c) -- have been tested and verified stable
on our system across two kernel updates (6.19.9-1 and 6.19.9-2),
sustained network load (~1.9 GB), and multiple suspend/resume cycles.
The system is now completely free of freezes, h2c errors, and mac80211
warnings. Your patches genuinely solved every issue we had.

While working through this, I noticed that many other users across
different hardware platforms appear to be experiencing the same
problems that your patches resolved for us. I decided to collect and
organize these observations in case they might be useful to you.

Please note that this is an amateur analysis, not a professional one
-- I am just a user trying to help. It is entirely possible that I
have missed nuances or made incorrect assumptions. My only goal is to
share what I found, in case it provides useful data points or sparks
ideas for broader improvements. If any of this is not relevant or not
useful, please feel free to disregard it.

1. KERNEL BUGZILLA: OPEN RTL8821CE REPORTS
==========================================

I reviewed all open RTL8821CE bugs in kernel.org Bugzilla. Three of
the six show symptoms that directly match the root causes addressed
by your patches (ASPM deadlock and LPS Deep h2c failures).

--- Directly correlated with ASPM/LPS patches ---

Bug 215131 - System freeze (ASPM L1 deadlock)
  Title: "Realtek 8821CE makes the system freeze after 9e2fd29864c5
         (rtw88: add napi support)"
  Reporter: Kai-Heng Feng (Canonical)
  Kernel: 5.15+
  Symptoms: Hard freeze preceded by "pci bus timeout, check dma
            status" warnings. RX tag mismatch in
            rtw_pci_dma_check().
  Workaround confirmed by reporter: rtw88_pci.disable_aspm=1
  Reporter note: "disable_aspm=1 is not a viable workaround because
                 it increases power consumption significantly"
  Status: OPEN since 2021-11-24.
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=215131
  Relevance: Identical root cause to Bug 221195. The reporter's
             confirmed workaround (disable_aspm=1) is exactly what
             the DMI quirk implements.

Bug 219830 - h2c/LPS failures + BT crackling
  Title: "rtw88_8821ce (RTL8821CE) slow performance and error
         messages in dmesg"
  Reporter: Bmax Y14 laptop, Fedora 41, kernel 6.13.5
  Symptoms:
    - "failed to send h2c command" (periodic)
    - "firmware failed to leave lps state" (periodic)
    - Lower signal strength vs Windows
    - Bluetooth crackling during audio playback
  Cross-ref: https://github.com/lwfinger/rtw88/issues/306
  Status: OPEN since 2025-03-02.
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=219830
  Relevance: The h2c/lps errors are the same messages we observed
             before the DMI quirk disabled LPS Deep Mode. The BT
             crackling may correlate with the invalid RX rate
             condition addressed by your rx.c validation patch.

Bug 218697 - TX queue flush timeout during scan
  Title: "rtw88_8821ce timed out to flush queue 2"
  Reporter: Arch Linux, kernel 6.8.4 / 6.8.5
  Symptoms:
    - "timed out to flush queue 2" every ~30 seconds
    - "failed to get tx report from firmware"
    - Stack trace: ieee80211_scan_work -> rtw_ops_flush ->
      rtw_mac_flush_queues timeout
  Status: OPEN since 2024-04-08.
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=218697
  Relevance: The flush timeout occurs when the firmware cannot
             respond to TX queue operations -- consistent with
             firmware being stuck in LPS Deep during scan.

--- Potentially related (no confirmed workaround data) ---

Bug 217491 - "timed out to flush queue 1" regression (kernel 6.3)
  Manjaro user. Floods of "timed out to flush queue 1/2".
  Similar pattern to Bug 218697.
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=217491

Bug 217781 - Random sudden dropouts
  Arch user. Random disconnections during streaming/transfers.
  Reproduced on Ubuntu and Fedora (kernels 5.15 to 6.4).
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=217781

Bug 216685 - Low wireless speed
  Reduced throughput vs expected 802.11ac performance.
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=216685

2. SYMPTOM-TO-PATCH MAPPING
=============================

dmesg signature                  Patch that resolves it
-------------------------------  --------------------------------------
Hard system freeze               pci.c DMI quirk (disable ASPM)
"pci bus timeout, check dma"     pci.c DMI quirk (disable ASPM)
"firmware failed to leave lps"   pci.c DMI quirk (disable LPS Deep)
"failed to send h2c command"     pci.c DMI quirk (disable LPS Deep)
"timed out to flush queue N"     pci.c DMI quirk (disable LPS Deep) [1]
"failed to get tx report"        pci.c DMI quirk (disable LPS Deep) [1]
VHT NSS=0 mac80211 WARNING       rx.c rate validation (v2)

Confirmed in bugs: 215131, 219830, 218697, 221195.

[1] Inferred: flush timeout occurs when firmware cannot exit LPS to
    process TX queue operations.

3. AFFECTED HARDWARE
=====================

Current DMI quirk coverage: HP P3S95EA#ACB only.

Platforms confirmed affected in Bugzilla:
  Bug 221195: HP Notebook 81F0 (P3S95EA#ACB)
  Bug 215131: unknown (Canonical upstream testing)
  Bug 219830: Bmax Y14
  Bug 218697: unknown (Arch Linux user)

Platforms reported in community forums as requiring disable_aspm=Y
and/or disable_lps_deep=Y for stable RTL8821CE:
  - HP 17-by4063CL
  - Lenovo IdeaPad S145-15AST, IdeaPad 3, IdeaPad 330S
  - ASUS VivoBook X series
  - Acer Aspire 3/5 series

All use PCI Device ID 10ec:c821 with different Subsystem IDs.

4. LPS_DEEP_MODE_LCLK IN THE rtw88 TREE
=========================================

I verified in the source which chips have the same
lps_deep_mode_supported flag:

Chip file    lps_deep_mode_supported        PCIe variant
-----------  -----------------------------  ------------
rtw8821c.c   BIT(LPS_DEEP_MODE_LCLK)        rtw8821ce ✓
rtw8822c.c   BIT(LPS_DEEP_MODE_LCLK) | PG   rtw8822ce ✓
rtw8822b.c   BIT(LPS_DEEP_MODE_LCLK)        rtw8822be ✓
rtw8814a.c   BIT(LPS_DEEP_MODE_LCLK)        rtw8814ae ✓
rtw8723d.c   0                              rtw8723de ✗
rtw8703b.c   0                              (SDIO) -
rtw8821a.c   0                              (legacy) -

Source references:
  rtw8821c.c:2002
  rtw8822c.c:5365
  rtw8822b.c:2545
  rtw8814a.c:2211
  rtw8723d.c:2144

RTL8822CE community reports (Manjaro, Arch forums) confirm the same
disable_aspm=Y + disable_lps_deep=Y workaround is effective,
consistent with rtw8822c.c having LCLK enabled.

5. COMMUNITY WORKAROUND REFERENCES
====================================

The following are concrete examples of forums and wikis where the
same modprobe workarounds are actively recommended:

Arch Wiki - RTW88 section:
  https://wiki.archlinux.org/title/Network_configuration/Wireless
  (section "RTW88" and "rtl8821ce" under Troubleshooting/Realtek)
  Recommends rtw88-dkms-git and documents the rtw88_8821ce issues.

Arch Wiki - RTW89 section (same page):
  Documents the identical ASPM pattern for the newer RTW89 driver:
    options rtw89_pci disable_aspm_l1=y disable_aspm_l1ss=y
    options rtw89_core disable_ps_mode=y
  This suggests the ASPM/LPS interaction is a systemic Realtek design
  issue, not specific to rtw88 or the 8821CE chip.

Arch Linux Forum - RTL8821CE thread:
  https://bbs.archlinux.org/viewtopic.php?id=273440
  Referenced by the Arch Wiki as the primary rtl8821ce discussion.

lwfinger/rtw88 GitHub:
  https://github.com/lwfinger/rtw88/issues/306
  Directly cross-referenced by Bug 219830. Reporter reports h2c
  errors and investigated antenna hardware (no fault found).

lwfinger/rtw89 GitHub:
  https://github.com/lwfinger/rtw89/issues/275#issuecomment-1784155449
  Same ASPM-disable pattern documented for the newer RTW89 driver on
  HP and Lenovo laptops.

6. OBSERVATIONS
================

a) Three open Bugzilla reporters (215131, 219830, 218697) show
   symptoms identical to those resolved by your patches but have no
   upstream fix available since they are not the HP P3S95EA#ACB.

b) Bug 215131 reporter (Kai-Heng Feng, Canonical) explicitly
   confirmed disable_aspm=1 as a workaround and called it "not
   viable" due to power cost. A DMI quirk for their platform would
   give them a proper fix.

c) The Arch Wiki documents the same ASPM-disable pattern for both
   RTW88 and RTW89 drivers, suggesting the underlying
   hardware/firmware limitation is shared across generations.

d) Asking Bugzilla reporters to provide their DMI data
   (dmidecode -t 1,2) could allow extending the quirk table with
   minimal effort and zero risk to unaffected platforms.

e) The rx.c rate validation patch is chip-agnostic and requires no
   platform-specific consideration.

7. REFERENCES
==============

Kernel Bugzilla:
  https://bugzilla.kernel.org/show_bug.cgi?id=215131
  https://bugzilla.kernel.org/show_bug.cgi?id=219830
  https://bugzilla.kernel.org/show_bug.cgi?id=218697
  https://bugzilla.kernel.org/show_bug.cgi?id=217491
  https://bugzilla.kernel.org/show_bug.cgi?id=217781
  https://bugzilla.kernel.org/show_bug.cgi?id=216685

GitHub:
  https://github.com/lwfinger/rtw88/issues/306
  https://github.com/lwfinger/rtw89/issues/275

Arch Wiki:
  https://wiki.archlinux.org/title/Network_configuration/Wireless

Arch Linux Forum:
  https://bbs.archlinux.org/viewtopic.php?id=273440

Source code (lps_deep_mode_supported):
  drivers/net/wireless/realtek/rtw88/rtw8821c.c:2002
  drivers/net/wireless/realtek/rtw88/rtw8822c.c:5365
  drivers/net/wireless/realtek/rtw88/rtw8822b.c:2545
  drivers/net/wireless/realtek/rtw88/rtw8814a.c:2211
  drivers/net/wireless/realtek/rtw88/rtw8723d.c:2144

Best regards,
Oleksandr Havrylov <goainwo@gmail.com>
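
The symptom-to-patch mapping in section 2 of the message above can be read as a simple substring lookup over dmesg lines. The following is a minimal userspace sketch of that idea, not driver code: the names `fix_kind`, `symptom_rules` and `classify_dmesg_line()` are hypothetical, and only the quoted message fragments come from the table. The hard freeze and the VHT NSS=0 warning have no rule here, because the first leaves no log line and the message does not give the exact text of the second.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fix categories, named after the table's right-hand column. */
enum fix_kind {
    FIX_UNKNOWN,
    FIX_DISABLE_ASPM,
    FIX_DISABLE_LPS_DEEP,
};

struct symptom_rule {
    const char *needle;   /* substring expected somewhere in a dmesg line */
    enum fix_kind fix;    /* quirk the table associates with it */
};

/* Substrings are copied from the dmesg signatures listed in the table. */
static const struct symptom_rule symptom_rules[] = {
    { "pci bus timeout, check dma",   FIX_DISABLE_ASPM },
    { "firmware failed to leave lps", FIX_DISABLE_LPS_DEEP },
    { "failed to send h2c command",   FIX_DISABLE_LPS_DEEP },
    { "timed out to flush queue",     FIX_DISABLE_LPS_DEEP }, /* inferred, see [1] */
    { "failed to get tx report",      FIX_DISABLE_LPS_DEEP }, /* inferred, see [1] */
};

/* Return the first matching category, or FIX_UNKNOWN if no rule matches. */
static enum fix_kind classify_dmesg_line(const char *line)
{
    size_t count = sizeof(symptom_rules) / sizeof(symptom_rules[0]);

    assert(line != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strstr(line, symptom_rules[i].needle) != NULL)
            return symptom_rules[i].fix;
    }
    return FIX_UNKNOWN;
}
```

A match only says which workaround the table associates with a message. It does not confirm the root cause on a given platform, and two of the rules rest on the inference marked [1].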
* RE: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
2026-03-14 12:39 ` LB F
2026-03-15 0:24 ` LB F
@ 2026-03-16 2:50 ` Ping-Ke Shih
1 sibling, 0 replies; 34+ messages in thread
From: Ping-Ke Shih @ 2026-03-16 2:50 UTC (permalink / raw)
To: LB F; +Cc: linux-wireless@vger.kernel.org, linux-kernel@vger.kernel.org

LB F <goainwo@gmail.com> wrote:
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> > I'd adopt your suggestion (dynamic LPS_DEEP_MODE_NONE) if the test
> > is positive.
>
> Hi Ping-Ke,
>
> Following your suggestion, I performed an additional experiment to
> validate the dynamic LPS_DEEP_MODE_NONE idea. Please treat this
> purely as a field test report -- I am not a kernel developer, and the
> implementation below is certainly not upstream-quality. I am sharing
> it only in the hope that it helps you design a proper solution.
>
> What I did:
>
> I extended your DMI quirk in pci.c with an additional capability flag
> for LPS Deep mode. The only file touched was pci.c (your patch) --
> main.c was left completely unmodified.
>
> The changes to your patch are as follows:
>
> /* 1. Extended the capabilities enum */
> enum rtw88_quirk_dis_pci_caps {
>         QUIRK_DIS_PCI_CAP_ASPM,
>         QUIRK_DIS_PCI_CAP_LPS_DEEP, /* test addition */
> };
>
> /* 2. Extended disable_pci_caps() callback */
> static int disable_pci_caps(const struct dmi_system_id *dmi)
> {
>         uintptr_t dis_caps = (uintptr_t)dmi->driver_data;
>
>         if (dis_caps & BIT(QUIRK_DIS_PCI_CAP_ASPM))
>                 rtw_pci_disable_aspm = true;
>
>         if (dis_caps & BIT(QUIRK_DIS_PCI_CAP_LPS_DEEP))
>                 rtw_disable_lps_deep_mode = true;
>
>         return 1;
> }
>
> /* 3. Both flags set for the HP P3S95EA#ACB entry */
> .driver_data = (void *)(BIT(QUIRK_DIS_PCI_CAP_ASPM) |
>                         BIT(QUIRK_DIS_PCI_CAP_LPS_DEEP)),
>
> I am aware that setting rtw_disable_lps_deep_mode from pci.c is
> architecturally impure -- it is a global flag that would affect all
> rtw88 devices in a hypothetical multi-adapter system.
> A proper per-device solution (e.g. a flag inside struct rtw_dev set
> during probe) would be cleaner. I simply used the existing global as
> the most straightforward way to validate the concept.
>
> Verification:
>
> Confirmed no rtw88-related entries exist in /etc/modprobe.d/,
> /lib/modprobe.d/, or /run/modprobe.d/, ruling out any external
> parameter injection.
>
> After loading the patched modules, the following was confirmed via
> sysfs:
>
> /sys/module/rtw88_core/parameters/disable_lps_deep_mode = Y
> /sys/module/rtw88_pci/parameters/disable_aspm = Y
>
> This confirms the DMI quirk is the sole source of both values.
>
> Results (10-minute idle observation, battery power, wifi.powersave=3):
>
> With your ASPM patch alone (LPS Deep still active):
> - periodic "failed to send h2c command" bursts observed
> - occasional WiFi throughput drops and Bluetooth audio stuttering
>
> With ASPM patch + LPS Deep disabled via the quirk:
> - h2c=0, lps=0 across the entire observation window
> - WiFi throughput stable, Bluetooth audio uninterrupted
>
> The result confirms that disabling LPS Deep Mode in addition to ASPM
> completely eliminates the remaining firmware timeout loop on this
> platform.
>
> I hope this experiment is useful as a data point. Please feel free to
> discard the implementation and design a proper solution -- I am ready
> to test any updated patch you send.

Thanks for your analysis of TX/RX paths, and the changes above and
verifications. :)

I'd update the patch as your proposal and send a patch.

For suggestions of TX/RX paths, I only read them a little bit, and I
will study them entirely when I have more free time.

Ping-Ke
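
The quoted report notes that a per-device flag set during probe would be cleaner than the global rtw_disable_lps_deep_mode. A minimal userspace model of that idea is sketched below. It is an illustration only, under loud assumptions: `struct model_dev`, `model_probe()`, `MODEL_BIT()` and the `MODEL_QUIRK_CAP_*` names are hypothetical stand-ins, not rtw88 symbols, and no kernel API is used. The capability bits simply mirror the enum in the quoted test patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, mirroring the enum in the quoted patch. */
enum model_quirk_cap {
    MODEL_QUIRK_CAP_ASPM,
    MODEL_QUIRK_CAP_LPS_DEEP,
};

/* Userspace stand-in for the kernel's BIT() macro. */
#define MODEL_BIT(n) ((uintptr_t)1 << (n))

/* Hypothetical per-device state: each adapter carries its own flags. */
struct model_dev {
    bool disable_aspm;
    bool disable_lps_deep;
};

/*
 * Copy the quirk bits matched for one device into that device only.
 * No global is written, so a second adapter probed with different
 * bits keeps its own settings.
 */
static void model_probe(struct model_dev *dev, uintptr_t quirk_caps)
{
    assert(dev != NULL);
    dev->disable_aspm =
        (quirk_caps & MODEL_BIT(MODEL_QUIRK_CAP_ASPM)) != 0;
    dev->disable_lps_deep =
        (quirk_caps & MODEL_BIT(MODEL_QUIRK_CAP_LPS_DEEP)) != 0;
}
```

In this model, probing one device with both bits set and a second with none leaves the second device's flags false, which is the multi-adapter isolation the global flag cannot provide. How such a flag would actually be threaded through the driver is left open here.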