netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller
       [not found] <20250305063035.415717-1-hans.zhang@cixtech.com>
@ 2025-03-05 22:20 ` Bjorn Helgaas
  2025-03-06  3:16   ` hans.zhang
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2025-03-05 22:20 UTC (permalink / raw)
  To: hans.zhang
  Cc: bhelgaas, cix-kernel-upstream, linux-pci, linux-kernel,
	Peter Chen, ChunHao Lin, Heiner Kallweit, nic_swsd, netdev

[+cc r8169 maintainers, since upstream r8169 claims device 0x8125]

On Wed, Mar 05, 2025 at 02:30:35PM +0800, hans.zhang@cixtech.com wrote:
> From: Hans Zhang <hans.zhang@cixtech.com>
> 
> This patch is intended to disable L0s ASPM link state for RTL8125 2.5GbE
> Controller due to the fact that it is possible to corrupt TX data when
> coming back out of L0s on some systems. This quirk uses the ASPM api to
> prevent the ASPM subsystem from re-enabling the L0s state.

Sounds like this should be a documented erratum.  Realtek folks?  Or
maybe an erratum on the other end of the link, which looks like a CIX
Root Port:

  https://admin.pci-ids.ucw.cz/read/PC/1f6c/0001

If it's a CIX Root Port defect, it could affect devices other than
RTL8125.

> And it causes the following AER errors:
>   pcieport 0003:30:00.0: AER: Multiple Corrected error received: 0003:31:00.0
>   pcieport 0003:30:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
>   pcieport 0003:30:00.0:   device [1f6c:0001] error status/mask=00001000/0000e000
>   pcieport 0003:30:00.0:    [12] Timeout
>   r8125 0003:31:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
>   r8125 0003:31:00.0:   device [10ec:8125] error status/mask=00001000/0000e000
>   r8125 0003:31:00.0:    [12] Timeout
>   r8125 0003:31:00.0: AER:   Error of this Agent is reported first

Looks like a driver name of "r8125", but I don't see that upstream.
Is this an out-of-tree driver?

> And the RTL8125 website does not say that it supports L0s. It only supports
> L1 and L1ss.
> 
> RTL8125 website: https://www.realtek.com/Product/Index?id=3962

I don't think it matters what the web site says.  Apparently the
device advertises L0s support via Link Capabilities.

> Signed-off-by: Hans Zhang <hans.zhang@cixtech.com>
> Reviewed-by: Peter Chen <peter.chen@cixtech.com>
> ---
>  drivers/pci/quirks.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 82b21e34c545..5f69bb5ee3ff 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2514,6 +2514,12 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
>  
> +/*
> + * The RTL8125 may experience data corruption issues when transitioning out
> + * of L0S. To prevent this we need to disable L0S on the PCIe link.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REALTEK, 0x8125, quirk_disable_aspm_l0s);
> +
>  static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
>  {
>  	pci_info(dev, "Disabling ASPM L0s/L1\n");
> 
> base-commit: 99fa936e8e4f117d62f229003c9799686f74cebc
> -- 
> 2.47.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller
  2025-03-05 22:20 ` [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller Bjorn Helgaas
@ 2025-03-06  3:16   ` hans.zhang
  2025-03-06  3:32   ` hans.zhang
  2025-03-06 23:00   ` Heiner Kallweit
  2 siblings, 0 replies; 6+ messages in thread
From: hans.zhang @ 2025-03-06  3:16 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, cix-kernel-upstream, linux-pci, linux-kernel,
	Peter Chen, ChunHao Lin, Heiner Kallweit, nic_swsd, netdev



On 2025/3/6 06:20, Bjorn Helgaas wrote:
> [Some people who received this message don't often get email from helgaas@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> EXTERNAL EMAIL
> 
> [+cc r8169 maintainers, since upstream r8169 claims device 0x8125]
> 
> On Wed, Mar 05, 2025 at 02:30:35PM +0800, hans.zhang@cixtech.com wrote:
>> From: Hans Zhang <hans.zhang@cixtech.com>
>>
>> This patch is intended to disable L0s ASPM link state for RTL8125 2.5GbE
>> Controller due to the fact that it is possible to corrupt TX data when
>> coming back out of L0s on some systems. This quirk uses the ASPM api to
>> prevent the ASPM subsystem from re-enabling the L0s state.
> 
> Sounds like this should be a documented erratum.  Realtek folks?  Or
> maybe an erratum on the other end of the link, which looks like a CIX
> Root Port:
> 
>    https://admin.pci-ids.ucw.cz/read/PC/1f6c/0001
> 
> If it's a CIX Root Port defect, it could affect devices other than
> RTL8125.
> 
>> And it causes the following AER errors:
>>    pcieport 0003:30:00.0: AER: Multiple Corrected error received: 0003:31:00.0
>>    pcieport 0003:30:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
>>    pcieport 0003:30:00.0:   device [1f6c:0001] error status/mask=00001000/0000e000
>>    pcieport 0003:30:00.0:    [12] Timeout
>>    r8125 0003:31:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
>>    r8125 0003:31:00.0:   device [10ec:8125] error status/mask=00001000/0000e000
>>    r8125 0003:31:00.0:    [12] Timeout
>>    r8125 0003:31:00.0: AER:   Error of this Agent is reported first
> 
> Looks like a driver name of "r8125", but I don't see that upstream.
> Is this an out-of-tree driver?

I'm terribly sorry. In the r8169 driver, I see the setting for pulling 
the disable L0s. Please discard this patch.

When I enabled ASPM and insmod r8169.ko, it worked fine.

Previously, we used a separate RTL8125 driver from RELTEAK. Compared 
with upstream, it has more functions and is more complete, but their 
drivers do not disable L0s, so this patch is mentioned.

Best regards,
Hans

drivers/net/ethernet/realtek/r8169_main.c

static void rtl_task(struct work_struct *work)
{
	struct rtl8169_private *tp =
		container_of(work, struct rtl8169_private, wk.work);
	int ret;

	if (test_and_clear_bit(RTL_FLAG_TASK_TX_TIMEOUT, tp->wk.flags)) {
		/* if chip isn't accessible, reset bus to revive it */
		if (RTL_R32(tp, TxConfig) == ~0) {
			ret = pci_reset_bus(tp->pci_dev);
			if (ret < 0) {
				netdev_err(tp->dev, "Can't reset secondary PCI bus, detach NIC\n");
				netif_device_detach(tp->dev);
				return;
			}
		}

		/* ASPM compatibility issues are a typical reason for tx timeouts */
		ret = pci_disable_link_state(tp->pci_dev, PCIE_LINK_STATE_L1 |
							  PCIE_LINK_STATE_L0S);
		if (!ret)
			netdev_warn_once(tp->dev, "ASPM disabled on Tx timeout\n");
		goto reset;
	}

	if (test_and_clear_bit(RTL_FLAG_TASK_RESET_PENDING, tp->wk.flags)) {
reset:
		rtl_reset_work(tp);
		netif_wake_queue(tp->dev);
	}
}


> 
>> And the RTL8125 website does not say that it supports L0s. It only supports
>> L1 and L1ss.
>>
>> RTL8125 website: https://www.realtek.com/Product/Index?id=3962
> 
> I don't think it matters what the web site says.  Apparently the
> device advertises L0s support via Link Capabilities.
> 
>> Signed-off-by: Hans Zhang <hans.zhang@cixtech.com>
>> Reviewed-by: Peter Chen <peter.chen@cixtech.com>
>> ---
>>   drivers/pci/quirks.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 82b21e34c545..5f69bb5ee3ff 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -2514,6 +2514,12 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
>>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
>>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
>>
>> +/*
>> + * The RTL8125 may experience data corruption issues when transitioning out
>> + * of L0S. To prevent this we need to disable L0S on the PCIe link.
>> + */
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REALTEK, 0x8125, quirk_disable_aspm_l0s);
>> +
>>   static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
>>   {
>>        pci_info(dev, "Disabling ASPM L0s/L1\n");
>>
>> base-commit: 99fa936e8e4f117d62f229003c9799686f74cebc
>> --
>> 2.47.1
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller
  2025-03-05 22:20 ` [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller Bjorn Helgaas
  2025-03-06  3:16   ` hans.zhang
@ 2025-03-06  3:32   ` hans.zhang
  2025-03-06 16:28     ` Bjorn Helgaas
  2025-03-06 23:00   ` Heiner Kallweit
  2 siblings, 1 reply; 6+ messages in thread
From: hans.zhang @ 2025-03-06  3:32 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, cix-kernel-upstream, linux-pci, linux-kernel,
	Peter Chen, ChunHao Lin, Heiner Kallweit, nic_swsd, netdev



On 2025/3/6 06:20, Bjorn Helgaas wrote:
> Sounds like this should be a documented erratum.  Realtek folks?  Or
> maybe an erratum on the other end of the link, which looks like a CIX
> Root Port:
> 
>    https://admin.pci-ids.ucw.cz/read/PC/1f6c/0001

Hi Bjorn,


Name: CIX P1 CD8180 PCI Express Root Port

0000:90:00.0 PCI bridge [0604]: Device [1f6c:0001]
0001:60:00.0 PCI bridge [0604]: Device [1f6c:0001]
0002:00:00.0 PCI bridge [0604]: Device [1f6c:0001]
0003:30:00.0 PCI bridge [0604]: Device [1f6c:0001]


This URL does not appear right, how should be changed, is it you? Or can 
you tell me who I should call to change it?

The correct answer is:
0000:90:00.0 PCI bridge [0604]: Device [1f6c:0001]
0001:C0:00.0 PCI bridge [0604]: Device [1f6c:0001]
0002:60:00.0 PCI bridge [0604]: Device [1f6c:0001]
0003:30:00.0 PCI bridge [0604]: Device [1f6c:0001]
0004:00:00.0 PCI bridge [0604]: Device [1f6c:0001]

The domain might be random, so whichever controller probes first, it's 
assigned first. The URL currently shows the BDF with one controller 
missing. That's the order in which we're going to controller probe.

Best regards,
Hans



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller
  2025-03-06  3:32   ` hans.zhang
@ 2025-03-06 16:28     ` Bjorn Helgaas
  2025-03-07  1:38       ` hans.zhang
  0 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2025-03-06 16:28 UTC (permalink / raw)
  To: hans.zhang
  Cc: bhelgaas, cix-kernel-upstream, linux-pci, linux-kernel,
	Peter Chen, ChunHao Lin, Heiner Kallweit, nic_swsd, netdev

On Thu, Mar 06, 2025 at 11:32:04AM +0800, hans.zhang wrote:
> On 2025/3/6 06:20, Bjorn Helgaas wrote:
> > Sounds like this should be a documented erratum.  Realtek folks?  Or
> > maybe an erratum on the other end of the link, which looks like a CIX
> > Root Port:
> > 
> >    https://admin.pci-ids.ucw.cz/read/PC/1f6c/0001
> 
> Name: CIX P1 CD8180 PCI Express Root Port
> 
> 0000:90:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 0001:60:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 0002:00:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 0003:30:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 
> 
> This URL does not appear right, how should be changed, is it you? Or can you
> tell me who I should call to change it?
> 
> The correct answer is:
> 0000:90:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 0001:C0:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 0002:60:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 0003:30:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 0004:00:00.0 PCI bridge [0604]: Device [1f6c:0001]

This part of the web page is just commentary.  In this case it's just
an example of what devices might be on some system.  It's not a
requirement that all systems have this many devices or devices at
these addresses.

The only important parts are the Vendor ID, Device ID, and the name
("CIX P1 CD8180 PCI Express Root Port").  If those are correct, no
need to do anything.

> The domain might be random, so whichever controller probes first, it's
> assigned first. The URL currently shows the BDF with one controller missing.
> That's the order in which we're going to controller probe.
> 
> Best regards,
> Hans
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller
  2025-03-05 22:20 ` [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller Bjorn Helgaas
  2025-03-06  3:16   ` hans.zhang
  2025-03-06  3:32   ` hans.zhang
@ 2025-03-06 23:00   ` Heiner Kallweit
  2 siblings, 0 replies; 6+ messages in thread
From: Heiner Kallweit @ 2025-03-06 23:00 UTC (permalink / raw)
  To: Bjorn Helgaas, hans.zhang
  Cc: bhelgaas, cix-kernel-upstream, linux-pci, linux-kernel,
	Peter Chen, ChunHao Lin, nic_swsd, netdev

On 05.03.2025 23:20, Bjorn Helgaas wrote:
> [+cc r8169 maintainers, since upstream r8169 claims device 0x8125]
> 
> On Wed, Mar 05, 2025 at 02:30:35PM +0800, hans.zhang@cixtech.com wrote:
>> From: Hans Zhang <hans.zhang@cixtech.com>
>>
>> This patch is intended to disable L0s ASPM link state for RTL8125 2.5GbE
>> Controller due to the fact that it is possible to corrupt TX data when
>> coming back out of L0s on some systems. This quirk uses the ASPM api to
>> prevent the ASPM subsystem from re-enabling the L0s state.
> 
> Sounds like this should be a documented erratum.  Realtek folks?  Or
> maybe an erratum on the other end of the link, which looks like a CIX
> Root Port:
> 
>   https://admin.pci-ids.ucw.cz/read/PC/1f6c/0001
> 
> If it's a CIX Root Port defect, it could affect devices other than
> RTL8125.
> 
>> And it causes the following AER errors:
>>   pcieport 0003:30:00.0: AER: Multiple Corrected error received: 0003:31:00.0
>>   pcieport 0003:30:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
>>   pcieport 0003:30:00.0:   device [1f6c:0001] error status/mask=00001000/0000e000
>>   pcieport 0003:30:00.0:    [12] Timeout
>>   r8125 0003:31:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
>>   r8125 0003:31:00.0:   device [10ec:8125] error status/mask=00001000/0000e000
>>   r8125 0003:31:00.0:    [12] Timeout
>>   r8125 0003:31:00.0: AER:   Error of this Agent is reported first
> 
> Looks like a driver name of "r8125", but I don't see that upstream.
> Is this an out-of-tree driver?
> 
Yes, this refers to Realtek's out-of-tree r8125 driver.
As stated by Hans, with the r8169 in-tree driver the issue doesn't occur.

>> And the RTL8125 website does not say that it supports L0s. It only supports
>> L1 and L1ss.
>>
>> RTL8125 website: https://www.realtek.com/Product/Index?id=3962
> 
> I don't think it matters what the web site says.  Apparently the
> device advertises L0s support via Link Capabilities.
> 
>> Signed-off-by: Hans Zhang <hans.zhang@cixtech.com>
>> Reviewed-by: Peter Chen <peter.chen@cixtech.com>
>> ---
>>  drivers/pci/quirks.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 82b21e34c545..5f69bb5ee3ff 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -2514,6 +2514,12 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f1, quirk_disable_aspm_l0s);
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x10f4, quirk_disable_aspm_l0s);
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1508, quirk_disable_aspm_l0s);
>>  
>> +/*
>> + * The RTL8125 may experience data corruption issues when transitioning out
>> + * of L0S. To prevent this we need to disable L0S on the PCIe link.
>> + */
>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REALTEK, 0x8125, quirk_disable_aspm_l0s);
>> +
>>  static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
>>  {
>>  	pci_info(dev, "Disabling ASPM L0s/L1\n");
>>
>> base-commit: 99fa936e8e4f117d62f229003c9799686f74cebc
>> -- 
>> 2.47.1
>>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller
  2025-03-06 16:28     ` Bjorn Helgaas
@ 2025-03-07  1:38       ` hans.zhang
  0 siblings, 0 replies; 6+ messages in thread
From: hans.zhang @ 2025-03-07  1:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: bhelgaas, cix-kernel-upstream, linux-pci, linux-kernel,
	Peter Chen, ChunHao Lin, Heiner Kallweit, nic_swsd, netdev



On 2025/3/7 00:28, Bjorn Helgaas wrote:
> [Some people who received this message don't often get email from helgaas@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> EXTERNAL EMAIL
> 
> On Thu, Mar 06, 2025 at 11:32:04AM +0800, hans.zhang wrote:
>> On 2025/3/6 06:20, Bjorn Helgaas wrote:
>>> Sounds like this should be a documented erratum.  Realtek folks?  Or
>>> maybe an erratum on the other end of the link, which looks like a CIX
>>> Root Port:
>>>
>>>     https://admin.pci-ids.ucw.cz/read/PC/1f6c/0001
>>
>> Name: CIX P1 CD8180 PCI Express Root Port
>>
>> 0000:90:00.0 PCI bridge [0604]: Device [1f6c:0001]
>> 0001:60:00.0 PCI bridge [0604]: Device [1f6c:0001]
>> 0002:00:00.0 PCI bridge [0604]: Device [1f6c:0001]
>> 0003:30:00.0 PCI bridge [0604]: Device [1f6c:0001]
>>
>>
>> This URL does not appear right, how should be changed, is it you? Or can you
>> tell me who I should call to change it?
>>
>> The correct answer is:
>> 0000:90:00.0 PCI bridge [0604]: Device [1f6c:0001]
>> 0001:C0:00.0 PCI bridge [0604]: Device [1f6c:0001]
>> 0002:60:00.0 PCI bridge [0604]: Device [1f6c:0001]
>> 0003:30:00.0 PCI bridge [0604]: Device [1f6c:0001]
>> 0004:00:00.0 PCI bridge [0604]: Device [1f6c:0001]
> 
> This part of the web page is just commentary.  In this case it's just
> an example of what devices might be on some system.  It's not a
> requirement that all systems have this many devices or devices at
> these addresses.
> 
> The only important parts are the Vendor ID, Device ID, and the name
> ("CIX P1 CD8180 PCI Express Root Port").  If those are correct, no
> need to do anything.

I see. Thank you very much Bjorn.

Best regards,
Hans


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-03-07  1:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250305063035.415717-1-hans.zhang@cixtech.com>
2025-03-05 22:20 ` [PATCH] PCI: Add PCI quirk to disable L0s ASPM state for RTL8125 2.5GbE Controller Bjorn Helgaas
2025-03-06  3:16   ` hans.zhang
2025-03-06  3:32   ` hans.zhang
2025-03-06 16:28     ` Bjorn Helgaas
2025-03-07  1:38       ` hans.zhang
2025-03-06 23:00   ` Heiner Kallweit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).