public inbox for linux-wireless@vger.kernel.org
From: LB F <goainwo@gmail.com>
To: Ping-Ke Shih <pkshih@realtek.com>
Cc: "linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [BUG] wifi: rtw88: Hard system freeze on RTL8821CE when power_save is enabled (LPS/ASPM conflict)
Date: Wed, 11 Mar 2026 13:00:21 +0200
Message-ID: <CALdGYqQn8GGXXjZTsL+a5Mfdmw5HRYB2Jyvqq5M5SUwxK9yd_g@mail.gmail.com>
In-Reply-To: <e6720993c8c14245981432cfa4ae902b@realtek.com>

Hi Ping-Ke,

Thank you for the incredibly fast turnaround and for providing the RFT
patch with the DMI quirk!

First, I want to mention that I am not an IT professional or a
programmer. I am just a regular Linux user who really wants to help
solve this problem. I am trying my best to verify everything
carefully, so please forgive me if my terminology or reasoning was
slightly off.

To answer your clarifying questions from the previous emails:

> Just want to clarify that these logs only appear in test 3, right?
> No these logs in test 1/2.

Yes, exactly. The `failed to send h2c command` errors only caused a
complete system freeze when no workarounds were active and the adapter
attempted to sleep (Test 3).

> I think this is your perspective and induction, right? Did you measure
> real hardware signals?

You are entirely correct. This is only my inference, based solely on
the timing of the logs and system behavior. I do not have access to an
oscilloscope or any hardware diagnostic tools. Given this, I
completely agree that your approach of applying a platform-specific
quirk is the safest and best solution.

> Forgot to say. Could you share your full name for me as a reporter
> in commit message?

My full name is Oleksandr Havrylov. I would be honored to be included
as the reporter in the commit message.

### Recent Baseline Testing Before Your Patch

Before applying your patch today, we ran a few more controlled tests
to double-check our baseline. We verified that our local workaround
(`disable_aspm=y` for `rtw88_pci`, set through modprobe.d) **does
indeed keep the system completely stable** and prevents the hard
freeze, even when NetworkManager's `wifi.powersave` is set to ON
(default).
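
For anyone reproducing this baseline, a minimal sketch of how the
workaround could be confirmed as active before testing. It assumes the
`disable_aspm` parameter of `rtw88_pci` is exposed under
`/sys/module/rtw88_pci/parameters/` and reads back as `Y` or `N`; the
function name and the `sysfs_root` argument are hypothetical and exist
only for illustration.

```python
from pathlib import Path


def aspm_disabled(sysfs_root="/sys"):
    """Report the rtw88_pci disable_aspm setting.

    Returns True or False when the parameter file can be read, and None
    when it cannot (module not loaded, or parameter not exposed).
    The sysfs location is an assumption, not taken from this thread.
    """
    param = (Path(sysfs_root) / "module" / "rtw88_pci"
             / "parameters" / "disable_aspm")
    try:
        value = param.read_text().strip()
    except OSError:
        return None
    # Boolean module parameters are normally reported as "Y" or "N".
    return value in ("Y", "y", "1")


if __name__ == "__main__":
    print("disable_aspm active:", aspm_disabled())
```

A result of `None` would mean the check itself could not run, which is
worth distinguishing from a confirmed `False`.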

However, we noticed one interesting detail in the kernel logs: while
the system no longer freezes with `disable_aspm=y`, `dmesg` still
repeatedly logs `firmware failed to leave lps state` and `failed to
send h2c command` when the laptop is completely idle. My guess is that
the firmware still fails to leave LPS, but with ASPM disabled that
failure no longer locks up the PCIe bus, so the system keeps running.
I just wanted to mention this for completeness!
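
To put a number on "how often" rather than relying on impression, a
rough sketch that counts the two messages in saved kernel log text. The
message fragments are the ones quoted above; the function name is
hypothetical, and matching is plain substring search, so it makes no
assumption about the timestamp or device prefix on each line.

```python
# Message fragments as quoted in this report.
PATTERNS = (
    "firmware failed to leave lps state",
    "failed to send h2c command",
)


def count_rtw88_errors(log_text):
    """Count log lines containing each known rtw88 error fragment."""
    counts = {pattern: 0 for pattern in PATTERNS}
    for line in log_text.splitlines():
        for pattern in PATTERNS:
            if pattern in line:
                counts[pattern] += 1
    return counts
```

The input could be the saved output of `dmesg` or `journalctl -k`, read
from a file and passed in as one string. Comparing the counts between
runs with and without the workaround would show whether the errors
themselves change, or only their consequences.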

### Testing Plan

I have **not** applied your RFT patch just yet. I wanted to make sure
our testing baseline was 100% clean and documented first.

I will compile your patch and perform rigorous testing this evening (I
am in the EET timezone, Ukraine). I will test it with the native
`power_save` fully enabled to ensure your patch successfully prevents
the hard lockups as intended.

I will stay in touch and reply back to this thread with a formal
`Tested-by` confirmation (and any logs if needed) as soon as my
testing is complete. Thank you again for all your help!

Best regards,
Oleksandr Havrylov

On Wed, 11 Mar 2026 at 04:22, Ping-Ke Shih <pkshih@realtek.com> wrote:
>
> Ping-Ke Shih <pkshih@realtek.com> wrote:
> >
> > LB F <goainwo@gmail.com> wrote:
> > >
> > > Hi Ping-Ke,
> > >
> > > Thank you for the incredibly fast response and assistance!
> > >
> > > > Can you dig kernel log (by netconsole or ramoops) if something useful?
> > > > I'd like to know this is hardware level freeze or kernel can capture something wrong.
> > >
> > > I managed to pull a call trace from a historic journald log just
> > > before the system hung. The kernel gets trapped in an IRQ thread
> > > inside `rtw_pci_interrupt_threadfn`, calling up into `mac80211`
> > > `ieee80211_rx_list` before everything freezes. Here is the relevant
> > > snippet:
> > >
> > > ```text
> > > Call Trace:
> > > <IRQ>
> > > ? __alloc_skb+0x23a/0x2a0
> > > ? __alloc_skb+0x10c/0x2a0
> > > ? __pfx_irq_thread_fn+0x10/0x10
> > > [ ... truncated module list ... ]
> > > Tainted: G W I 6.19.6-2-cachyos #1 PREEMPT(full)
> > > Hardware name: HP HP Notebook/81F0, BIOS F.50 11/20/2020
> > > RIP: 0010:ieee80211_rx_list+0x1012/0x1020 [mac80211]
> > > CPU: 2 UID: 0 PID: 765 Comm: irq/56-rtw88_pc
> > > rtw_pci_interrupt_threadfn+0x239/0x310 [rtw88_pci]
> > > ```
> > >
> > > It behaves exactly like a PCIe bus deadlock or a hardware fault that
> > > eventually brings down the CPU handling the IRQ.
> >
> > I wonder if there is a malformed data, causing this trace and the leads
> > kernel freezes. If we can do validation on RX data before calling
> > ieee80211_rx_list(), maybe trace disappears and everything will be fine?
> > Even no need workaround.
> >
> > >
> > > > Are these totally needed to workaround the problem? Or disable_aspm is enough?
> > > > I'd list them in order of power consumption impact:
> > > > 1. disable_aspm=y
> > > > 2. disable_lps_deep=y
> > > > 3. disable WiFi power save
> > >
> > > To verify which parameters are strictly necessary, I performed
> > > isolated testing today. I ensured no other modprobe configs were
> > > active, rebuilt the initramfs, and manually enforced that
> > > `wifi.powersave` was active via `iw dev wlan0 set power_save on`
> > > during all tests (as the OS power management profiles were defaulting
> > > it to off, which initially masked the issue).
> > >
> > > I tested each workaround individually across multiple sleep/wake
> > > cycles and active usage:
> > >
> > > **Test 1 (ASPM Disabled, LPS Deep Enabled):**
> > > - Kernel parameters: `rtw88_pci disable_aspm=y` (and `rtw88_core
> > > disable_lps_deep=n`)
> > > - Result: Stable. No freezes were observed during usage or transitions
> > > into/out of S3 sleep while power saving was enforced.
> > >
> > > **Test 2 (ASPM Enabled, LPS Deep Disabled):**
> > > - Kernel parameters: `rtw88_core disable_lps_deep=y` (and `rtw88_pci
> > > disable_aspm=n`)
> > > - Result: Stable. No freezes were observed under the same forced power
> > > save conditions.
> > >
> > > **Conclusion:** It appears we do not need both workarounds
> > > simultaneously for this specific hardware. Using only `disable_aspm=y`
> > > seems to be sufficient to prevent the system freeze. Given your note
> > > about the power consumption impact ranking, this looks like the
> > > optimal path forward.
> >
> > Let's test my RFT patch to disable ASPM then.
> >
> > >
> > > > But what does 'deadlock' mean? As I know NAPI poll is scheduled by ISR,
> > > > and going to receive packets. The rx_no_aspm workaround is to forcely turn
> > > > off ASPM during this period.
> > >
> > > By "deadlock" I meant a hardware-level bus lockup. It seems the
> > > physical RTL8821CE chip itself crashes or hangs the system's PCIe bus
> > > when trying to negotiate waking up from ASPM L1 while simultaneously
> > > existing in `LPS_DEEP_MODE_LCLK`. The `rx_no_aspm` workaround in NAPI
> > > helps during active Rx decoding, but the laptop often freezes while
> > > completely idle, presumably when the AP sends a basic beacon, the chip
> > > attempts to leave LPS Deep + L1, and the hardware simply gives up and
> > > halts the system.
> >
> > I think this is your perspective and induction, right? Did you measure
> > real hardware signals?
> >
> > My point is that if this is a hardware-level bus lockup, let's apply
> > quirk. If some malformed data causing kernel hangs, I'd add sanity check
> > on RX data, but I don't actually know what we should check for now.
> >
> > >
> > > > We have not modified RTL8821CE for a long time, so I'd add workaround
> > > > to specific platform as mentioned above.
> > >
> > > Adding a DMI/platform quirk specifically for this laptop to disable
> > > ASPM would be wonderful and deeply appreciated. I agree it is safer
> > > than touching the global flags for hardware that is functioning
> > > correctly out in the wild.
> > >
> > > Here is the exact identifying information for my system:
> > >
> > > System Vendor: HP
> > > Product Name: HP Notebook
> > > SKU Number: P3S95EA#ACB
> > > Family: 103C_5335KV
> > > PCI ID: 10ec:c821
> > > Subsystem ID: 103c:831a
> > >
> > > I am completely ready to test any patch or quirk you send my way.
> > > Thank you so much for your time and helping track this down!
> >
> > I sent a RFT [1] for test. Please check if it works on your HP notebook.
> > If you check rtw88 log, you can see I added similar patch 5 years ago,
> > and replaced by preferred the change of "rtwpci->rx_no_aspm", which I
> > think it can only resolve problem on partial notebooks though....
> >
> > [1]
> > https://lore.kernel.org/linux-wireless/20260311020816.7065-1-pkshih@realtek.com/T/#u
>
> Forgot to say. Could you share your full name for me as a reporter
> in commit message?
>
>
