linuxppc-dev.lists.ozlabs.org archive mirror
From: Bagas Sanjaya <bagasdotme@gmail.com>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Wireless <linux-wireless@vger.kernel.org>,
	Linux PCI <linux-pci@vger.kernel.org>,
	Linux PowerPC <linuxppc-dev@lists.ozlabs.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Ping-Ke Shih <pkshih@realtek.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	Jian-Hong Pan <jhp@endlessos.org>
Subject: Fwd: The PCIe AER error flood between PCIe bridge and Realtek's RTL8723BE makes system hang
Date: Sat, 11 Nov 2023 09:07:35 +0700
Message-ID: <60585667-70ca-4ace-8d8f-dbdd8d4428a6@gmail.com>

Hi,

I noticed a bug report on Bugzilla [1]. Quoting from it:

> We have an ASUS X555UQ laptop equipped with Intel i7-6500U CPU and Realtek RTL8723BE PCIe Wireless adapter.
> 
> We tested it with kernel 6.6.  System keeps showing AER error message flood, even hangs up, until rtl8723be's ASPM is disabled.
> 
> kernel: pcieport 0000:00:1c.5: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> kernel: pcieport 0000:00:1c.5:   device [8086:9d15] error status/mask=00000001/00002000
> kernel: pcieport 0000:00:1c.5:    [ 0] RxErr                  (First)
> kernel: pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
> kernel: pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> kernel: pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
> kernel: pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> kernel: pcieport 0000:00:1c.5: AER: Multiple Corrected error received: 0000:00:1c.5
> kernel: pcieport 0000:00:1c.5: AER: can't find device of ID00e5
> 
> Here is the PCI tree:
> $ lspci -tv
> -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers
>            +-02.0  Intel Corporation Skylake GT2 [HD Graphics 520]
>            +-04.0  Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
>            +-14.0  Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller
>            +-14.2  Intel Corporation Sunrise Point-LP Thermal subsystem
>            +-15.0  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0
>            +-15.1  Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1
>            +-16.0  Intel Corporation Sunrise Point-LP CSME HECI #1
>            +-17.0  Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode]
>            +-1c.0-[01]----00.0  NVIDIA Corporation GM108M [GeForce 940MX]
>            +-1c.4-[02]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>            +-1c.5-[03]----00.0  Realtek Semiconductor Co., Ltd. RTL8723BE PCIe Wireless Network Adapter
>            +-1f.0  Intel Corporation Sunrise Point-LP LPC Controller
>            +-1f.2  Intel Corporation Sunrise Point-LP PMC
>            +-1f.3  Intel Corporation Sunrise Point-LP HD Audio
>            \-1f.4  Intel Corporation Sunrise Point-LP SMBus

The reporter then found that it is an ASPM bug:

> Notice a long time ago discussion mail: Dmesg filled with "AER: Corrected error received" [1]
> 
> So, I force write 1 to clear Receiver Error Status bit of Correctable Error Status Register, like
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 9c8fd69ae5ad..39faedd2ec8e 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1141,8 +1160,9 @@ static void aer_isr_one_error(struct aer_rpc *rpc,
>                         e_info.multi_error_valid = 0;
>                 aer_print_port_info(pdev, &e_info);
>  
> -               if (find_source_device(pdev, &e_info))
> -                       aer_process_err_devices(&e_info);
> +               //if (find_source_device(pdev, &e_info))
> +               //      aer_process_err_devices(&e_info);
> +               pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_STATUS, 0x1);
>         }
>  
>         if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
> 
> Then, system should clear the error right away.  However, system still get the AER flood ...
> 
> Seems that we still have to disable rtl8723be's ASPM.

See Bugzilla for the full thread and the attached kernel logs.

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218127

-- 
An old man doll... just what I always wanted! - Clara
