linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
@ 2023-02-05 15:48 Maciej W. Rozycki
  2023-02-05 15:48 ` [PATCH v6 1/7] PCI: Export PCI link retrain timeout Maciej W. Rozycki
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:48 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Hi,

 This is v6 of the change to work around a PCIe link training phenomenon 
where a pair of devices both capable of operating at a link speed above 
2.5GT/s seems unable to negotiate the link speed and continues training 
indefinitely with the Link Training bit switching on and off repeatedly 
and the data link layer never reaching the active state.

 Following Bjorn's suggestion from the previous iteration:
<https://lore.kernel.org/lkml/20221109050418.GA529724@bhelgaas/> I have 
moved the workaround into the PCI core.  I have kept the part specific to 
ASMedia (to lift the speed restriction after a successful retrain) within, 
although I find it a good candidate for a standalone quirk.  It seems to 
me we'd have to add additional classes of fixups however to move this part 
to drivers/pci/quirks.c, which I think would be an overkill.  So I've only 
made it explicitly guarded by CONFIG_PCI_QUIRKS; I can see there's prior 
art with this approach.

 In the course of the update I have realised that commit 6b2f1351af56 
("PCI: Wait for device to become ready after secondary bus reset") makes 
no sense and was about to figure out what to do here about it, but then 
found Lukas's recent patch series addressing this issue (thanks, Lukas, 
you made my life easier!), so I have rebased my patch set on top of 
Lukas's:
<https://lore.kernel.org/all/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@wunner.de/>.

 This has resulted in mild ugliness in that `pcie_downstream_link_retrain' 
may be called from `pci_bridge_wait_for_secondary_bus' twice, first time 
via `pcie_wait_for_link_delay' and second time via `pci_dev_wait'.  This 
second call to `pcie_downstream_link_retrain' will do nothing, because for 
`link_active_reporting' devices `pcie_wait_for_link_delay' will have 
ensured the link has gone up or the second call won't have been reached.

 I have also decided to move the initialisation of `link_active_reporting' 
earlier on, so as to have a single way to check for the feature.  This has 
brought an extra patch and its 3 clean-up dependencies into the series.

 This was originally observed in a configuration featuring a downstream 
port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the 
Pericom PI7C9X2G304 Gen 2 switch.  However in the course of review I have 
come to the conclusion that similarly to the earlier similar change to 
U-Boot it is indeed expected to be safe to apply this workaround to any 
downstream port that has failed link negotiation provided that:

1. the port is capable of reporting the data link layer link active 
   status (because unlike U-Boot we cannot busy-loop continuously polling 
   the link training bit),

and:

2. we don't attempt to lift the 2.5GT/s speed restriction, imposed as the
   basis of the workaround, for devices not explicitly known to continue 
   working in that case.

It is expected to be safe because the workaround is applied to a failed 
link, that is one that does not (at the time this code is executed) work 
anyway, so trying to bring it up cannot make the situation worse.

 This has been verified with a SiFive HiFive unmatched board, with and 
without CONFIG_PCI_QUIRKS enabled, booting with or without the workaround 
activated in U-Boot, which covered both the link retraining part of the 
quirk and the lifting of speed restriction already imposed by U-Boot.

 I have also issued resets via sysfs to see how this change behaves.  For 
the problematic link this required a hack to remove a `dev->subordinate' 
check from `pci_parent_bus_reset', which in turn triggered the workaround 
as expected and brought the link up (but otherwise clobbered downstream 
devices as one would inevitably expect).

 I have no way to verify these patches with power management or hot-plug 
events, but owing to Lukas's effort they get into the same infrastructure, 
so I expect the workaround to do its job as expected.  I note that there 
is an extra call to `pcie_wait_for_link' from `pciehp_check_link_status', 
but I expect it to work too.  For `link_active_reporting' devices it will 
call `pcie_downstream_link_retrain' and for`!link_active_reporting' ones 
we have no means to do anything anyway.

 The 3 extra clean-ups were only compile-tested (with PowerPC and x86-64 
configurations, as appropriate), because I have no suitable hardware 
available.

 Please see individual change descriptions for further details.

 Let me know if this is going in the right direction.

  Maciej

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 1/7] PCI: Export PCI link retrain timeout
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
@ 2023-02-05 15:48 ` Maciej W. Rozycki
  2023-02-05 15:49 ` [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:48 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Rename LINK_RETRAIN_TIMEOUT to PCIE_LINK_RETRAIN_TIMEOUT and make it
available via "pci.h" for PCI drivers to use.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
No change from v5.

New change in v5.
---
 drivers/pci/pci.h       |    2 ++
 drivers/pci/pcie/aspm.c |    4 +---
 2 files changed, 3 insertions(+), 3 deletions(-)

linux-pcie-link-retrain-timeout.diff
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -11,6 +11,8 @@
 
 #define PCI_VSEC_ID_INTEL_TBT	0x1234	/* Thunderbolt */
 
+#define PCIE_LINK_RETRAIN_TIMEOUT HZ
+
 extern const unsigned char pcie_link_speed[];
 extern bool pci_early_dump;
 
Index: linux-macro/drivers/pci/pcie/aspm.c
===================================================================
--- linux-macro.orig/drivers/pci/pcie/aspm.c
+++ linux-macro/drivers/pci/pcie/aspm.c
@@ -90,8 +90,6 @@ static const char *policy_str[] = {
 	[POLICY_POWER_SUPERSAVE] = "powersupersave"
 };
 
-#define LINK_RETRAIN_TIMEOUT HZ
-
 /*
  * The L1 PM substate capability is only implemented in function 0 in a
  * multi function device.
@@ -213,7 +211,7 @@ static bool pcie_retrain_link(struct pci
 	}
 
 	/* Wait for link training end. Break out after waiting for timeout */
-	end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
+	end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
 	do {
 		pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
  2023-02-05 15:48 ` [PATCH v6 1/7] PCI: Export PCI link retrain timeout Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
  2023-02-05 15:49 ` [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Make `quirk_enable_clear_retrain_link' `pci_fixup_early' so that any later 
fixups can rely on `clear_retrain_link' to have been already initialised.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
No change from v5.

New change in v5.
---
 drivers/pci/quirks.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

linux-pcie-clear-retrain-link-early.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -2407,9 +2407,9 @@ static void quirk_enable_clear_retrain_l
 	dev->clear_retrain_link = 1;
 	pci_info(dev, "Enable PCIe Retrain Link quirk\n");
 }
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);
 
 static void fixup_rev1_53c810(struct pci_dev *dev)
 {

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
  2023-02-05 15:48 ` [PATCH v6 1/7] PCI: Export PCI link retrain timeout Maciej W. Rozycki
  2023-02-05 15:49 ` [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
  2023-02-05 15:49 ` [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Determine whether Data Link Layer Link Active Reporting is available 
ahead of calling any fixups so that the cached value can be used there 
and later on.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v6.
---
 drivers/pci/probe.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

linux-pcie-link-active-reporting-early.diff
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -819,7 +819,6 @@ static void pci_set_bus_speed(struct pci
 
 		pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
 		bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
-		bridge->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
 
 		pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
 		pcie_update_link_speed(bus, linksta);
@@ -1828,6 +1827,7 @@ int pci_setup_device(struct pci_dev *dev
 	int pos = 0;
 	struct pci_bus_region region;
 	struct resource *res;
+	u32 linkcap;
 
 	hdr_type = pci_hdr_type(dev);
 
@@ -1873,6 +1873,10 @@ int pci_setup_device(struct pci_dev *dev
 	/* "Unknown power state" */
 	dev->current_state = PCI_UNKNOWN;
 
+	/* Set it early to make it available to fixups, etc.  */
+	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
+	dev->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
+
 	/* Early fixups, before probing the BARs */
 	pci_fixup_device(pci_fixup_early, dev);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting'
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (2 preceding siblings ...)
  2023-02-05 15:49 ` [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
  2023-02-05 15:49 ` [PATCH v6 5/7] net/mlx5: " Maciej W. Rozycki
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Use `link_active_reporting' to determine whether Data Link Layer Link 
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with a PPC64LE configuration.

New change in v6.
---
 arch/powerpc/kernel/eeh_pe.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

linux-pcie-link-active-reporting-eeh.diff
Index: linux-macro/arch/powerpc/kernel/eeh_pe.c
===================================================================
--- linux-macro.orig/arch/powerpc/kernel/eeh_pe.c
+++ linux-macro/arch/powerpc/kernel/eeh_pe.c
@@ -671,9 +671,8 @@ static void eeh_bridge_check_link(struct
 	eeh_ops->write_config(edev, cap + PCI_EXP_LNKCTL, 2, val);
 
 	/* Check link */
-	eeh_ops->read_config(edev, cap + PCI_EXP_LNKCAP, 4, &val);
-	if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
-		eeh_edev_dbg(edev, "No link reporting capability (0x%08x) \n", val);
+	if (!edev->pdev->link_active_reporting) {
+		eeh_edev_dbg(edev, "No link reporting capability\n");
 		msleep(1000);
 		return;
 	}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 5/7] net/mlx5: Rely on `link_active_reporting'
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (3 preceding siblings ...)
  2023-02-05 15:49 ` [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
  2023-02-05 15:49 ` [PATCH v6 6/7] PCI: pciehp: " Maciej W. Rozycki
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Use `link_active_reporting' to determine whether Data Link Layer Link 
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with PPC64LE and x86-64 
configurations.

New change in v6.
---
 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

linux-pcie-link-active-reporting-mlx5.diff
Index: linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
===================================================================
--- linux-macro.orig/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -294,7 +294,6 @@ static int mlx5_pci_link_toggle(struct m
 	unsigned long timeout;
 	struct pci_dev *sdev;
 	int cap, err;
-	u32 reg32;
 
 	/* Check that all functions under the pci bridge are PFs of
 	 * this device otherwise fail this function.
@@ -333,11 +332,8 @@ static int mlx5_pci_link_toggle(struct m
 		return err;
 
 	/* Check link */
-	err = pci_read_config_dword(bridge, cap + PCI_EXP_LNKCAP, &reg32);
-	if (err)
-		return err;
-	if (!(reg32 & PCI_EXP_LNKCAP_DLLLARC)) {
-		mlx5_core_warn(dev, "No PCI link reporting capability (0x%08x)\n", reg32);
+	if (!bridge->link_active_reporting) {
+		mlx5_core_warn(dev, "No PCI link reporting capability\n");
 		msleep(1000);
 		goto restore;
 	}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 6/7] PCI: pciehp: Rely on `link_active_reporting'
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (4 preceding siblings ...)
  2023-02-05 15:49 ` [PATCH v6 5/7] net/mlx5: " Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
  2023-02-13 13:53   ` Lukas Wunner
  2023-02-05 15:49 ` [PATCH v6 7/7] PCI: Work around PCIe link training failures Maciej W. Rozycki
  2023-02-19 18:52 ` [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 " Maciej W. Rozycki
  7 siblings, 1 reply; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Use `link_active_reporting' to determine whether Data Link Layer Link 
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with PPC64LE and x86-64
configurations.

New change in v6.
---
 drivers/pci/hotplug/pciehp_hpc.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

linux-pcie-link-active-reporting-hpc.diff
Index: linux-macro/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-macro.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-macro/drivers/pci/hotplug/pciehp_hpc.c
@@ -984,7 +984,7 @@ static inline int pcie_hotplug_depth(str
 struct controller *pcie_init(struct pcie_device *dev)
 {
 	struct controller *ctrl;
-	u32 slot_cap, slot_cap2, link_cap;
+	u32 slot_cap, slot_cap2;
 	u8 poweron;
 	struct pci_dev *pdev = dev->port;
 	struct pci_bus *subordinate = pdev->subordinate;
@@ -1030,9 +1030,6 @@ struct controller *pcie_init(struct pcie
 	if (dmi_first_match(inband_presence_disabled_dmi_table))
 		ctrl->inband_presence_disabled = 1;
 
-	/* Check if Data Link Layer Link Active Reporting is implemented */
-	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
-
 	/* Clear all remaining event bits in Slot Status register. */
 	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
 		PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
@@ -1051,7 +1048,7 @@ struct controller *pcie_init(struct pcie
 		FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
 		FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
 		FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
-		FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
+		FLAG(pdev->link_active_reporting, true),
 		pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
 
 	/*

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 7/7] PCI: Work around PCIe link training failures
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (5 preceding siblings ...)
  2023-02-05 15:49 ` [PATCH v6 6/7] PCI: pciehp: " Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
  2023-02-19 18:52 ` [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 " Maciej W. Rozycki
  7 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

Attempt to handle cases such as with a downstream port of the ASMedia 
ASM2824 PCIe switch where link training never completes and the link 
continues switching between speeds indefinitely with the data link layer 
never reaching the active state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
falling back to 2.5GT/s.

Instead the link continues oscillating between the two speeds, at the 
rate of 34-35 times per second, with link training reported repeatedly 
active ~84% of the time.  Forcibly limiting the target link speed to 
2.5GT/s with the upstream ASM2824 device however makes the two switches 
communicate correctly.  Removing the speed restriction afterwards makes 
the two devices switch to 5.0GT/s then.

Make use of these observations then and detect the inability to train 
the link, by checking for the Data Link Layer Link Active status bit 
being off while the Link Bandwidth Management Status indicating that 
hardware has changed the link speed or width in an attempt to correct 
unreliable link operation.

Restrict the speed to 2.5GT/s then with the Target Link Speed field, 
request a retrain and wait 200ms for the data link to go up.  If this 
turns out successful, then lift the restriction, letting the devices 
negotiate a higher speed.

Also check for a 2.5GT/s speed restriction the firmware may have already 
arranged and lift it too with ports of devices known to continue working 
afterwards, currently the ASM2824 only, that already report their data 
link being up.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
---
Changes from v5:

- Move from a quirk into PCI core and call at device probing, hot-plug,
  reset and resume.  Keep the ASMedia part under CONFIG_PCI_QUIRKS.

- Rely on `dev->link_active_reporting' rather than re-retrieving the 
  capability.

Changes from v4:

- Remove <linux/bug.h> inclusion no longer needed.

- Make the quirk generic based on probing device features rather than 
  specific to the ASM2824 part only; take the Retrain Link bit erratum 
  into account.

- Still lift the 2.5GT/s speed restriction with the ASM2824 only.

- Increase retrain timeout from 200ms to 1s (PCIE_LINK_RETRAIN_TIMEOUT).

- Remove retrain success notification.

- Use PCIe helpers rather than generic PCI functions throughout.

- Trim down and update the wording of the change description for the 
  switch from an ASM2824-specific to a generic fixup.

Changes from v3:

- Remove the <linux/pci_ids.h> entry for the ASM2824.

Changes from v2:

- Regenerate for 5.17-rc2 for a merge conflict.

- Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
  early return.

Changes from v1:

- Regenerate for a merge conflict.
---
 drivers/pci/pci.c   |  154 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/pci/pci.h   |    1 
 drivers/pci/probe.c |    2 
 3 files changed, 152 insertions(+), 5 deletions(-)

linux-pcie-asm2824-manual-retrain.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -859,6 +859,132 @@ int pci_wait_for_pending(struct pci_dev
 	return 0;
 }
 
+/*
+ * Retrain the link of a downstream PCIe port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * repeatedly active ~84% of the time.  Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however.  And more interestingly retraining with a
+ * higher target link speed afterwards lets the two successfully negotiate
+ * 5.0GT/s.
+ *
+ * With the ASM2824 we can rely on the otherwise optional Data Link Layer
+ * Link Active status bit and in the failed link training scenario it will
+ * be off along with the Link Bandwidth Management Status indicating that
+ * hardware has changed the link speed or width in an attempt to correct
+ * unreliable link operation.  For a port that has been left unconnected
+ * both bits will be clear.  So use this information to detect the problem
+ * rather than polling the Link Training bit and watching out for flips or
+ * at least the active status.
+ *
+ * Since the exact nature of the problem isn't known and in principle this
+ * could trigger where an ASM2824 device is downstream rather upstream,
+ * apply this erratum workaround to any downstream ports as long as they
+ * support Link Active reporting and have the Link Control 2 register.
+ * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
+ * request a retrain and wait 200ms for the data link to go up.
+ *
+ * If this turns out successful and we know by the Vendor:Device ID it is
+ * safe to do so, then lift the restriction, letting the devices negotiate
+ * a higher speed.  Also check for a similar 2.5GT/s speed restriction the
+ * firmware may have already arranged and lift it with ports that already
+ * report their data link being up.
+ *
+ * Return 0 if the link has been successfully retrained, otherwise -1.
+ */
+int pcie_downstream_link_retrain(struct pci_dev *dev)
+{
+	static const struct pci_device_id ids[] = {
+		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
+		{}
+	};
+	u16 lnksta, lnkctl2;
+
+	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
+	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
+		return -1;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
+	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+	    PCI_EXP_LNKSTA_LBMS) {
+		unsigned long timeout;
+		u16 lnkctl;
+
+		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
+
+		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+		lnkctl |= PCI_EXP_LNKCTL_RL;
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+		/*
+		 * Due to an erratum in some devices the Retrain Link bit
+		 * needs to be cleared again manually to allow the link
+		 * training to succeed.
+		 */
+		lnkctl &= ~PCI_EXP_LNKCTL_RL;
+		if (dev->clear_retrain_link)
+			pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
+						   lnkctl);
+
+		timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
+		do {
+			pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
+					     &lnksta);
+			if (lnksta & PCI_EXP_LNKSTA_DLLLA)
+				break;
+			usleep_range(10000, 20000);
+		} while (time_before(jiffies, timeout));
+
+		if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
+			pci_info(dev, "retraining failed\n");
+			return -1;
+		}
+	}
+
+	if (IS_ENABLED(CONFIG_PCI_QUIRKS) && (lnksta & PCI_EXP_LNKSTA_DLLLA) &&
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+	    pci_match_id(ids, dev)) {
+		u32 lnkcap;
+		u16 lnkctl;
+
+		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
+		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+		lnkctl |= PCI_EXP_LNKCTL_RL;
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+	}
+
+	return 0;
+}
+
+/* Same as above, but called for a downstream device.  */
+static int pcie_upstream_link_retrain(struct pci_dev *dev)
+{
+	struct pci_dev *bridge;
+
+	bridge = pci_upstream_bridge(dev);
+	if (bridge)
+		return pcie_downstream_link_retrain(bridge);
+	else
+		return -1;
+}
+
 static int pci_acs_enable;
 
 /**
@@ -1148,8 +1274,8 @@ void pci_resume_bus(struct pci_bus *bus)
 
 static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
 {
+	int retrain = 0;
 	int delay = 1;
-	u32 id;
 
 	/*
 	 * After reset, the device should not silently discard config
@@ -1163,21 +1289,37 @@ static int pci_dev_wait(struct pci_dev *
 	 * Command register instead of Vendor ID so we don't have to
 	 * contend with the CRS SV value.
 	 */
-	pci_read_config_dword(dev, PCI_COMMAND, &id);
-	while (PCI_POSSIBLE_ERROR(id)) {
+	for (;;) {
+		u32 id;
+
+		pci_read_config_dword(dev, PCI_COMMAND, &id);
+		if (!PCI_POSSIBLE_ERROR(id)) {
+			if (delay > PCI_RESET_WAIT)
+				pci_info(dev, "ready %dms after %s\n",
+					 delay - 1, reset_type);
+			break;
+		}
+
 		if (delay > timeout) {
 			pci_warn(dev, "not ready %dms after %s; giving up\n",
 				 delay - 1, reset_type);
 			return -ENOTTY;
 		}
 
-		if (delay > PCI_RESET_WAIT)
+		if (delay > PCI_RESET_WAIT) {
+			if (!retrain) {
+				retrain = 1;
+				if (pcie_upstream_link_retrain(dev) == 0) {
+					delay = 1;
+					continue;
+				}
+			}
 			pci_info(dev, "not ready %dms after %s; waiting\n",
 				 delay - 1, reset_type);
+		}
 
 		msleep(delay);
 		delay *= 2;
-		pci_read_config_dword(dev, PCI_COMMAND, &id);
 	}
 
 	if (delay > PCI_RESET_WAIT)
@@ -4901,6 +5043,8 @@ static bool pcie_wait_for_link_delay(str
 		msleep(10);
 		timeout -= 10;
 	}
+	if (active && !ret)
+		ret = pcie_downstream_link_retrain(pdev) == 0;
 	if (active && ret)
 		msleep(delay);
 
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -37,6 +37,7 @@ int pci_mmap_fits(struct pci_dev *pdev,
 		  enum pci_mmap_api mmap_api);
 
 bool pci_reset_supported(struct pci_dev *dev);
+int pcie_downstream_link_retrain(struct pci_dev *dev);
 void pci_init_reset_methods(struct pci_dev *dev);
 int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
 int pci_bus_error_reset(struct pci_dev *dev);
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -2546,6 +2546,8 @@ void pci_device_add(struct pci_dev *dev,
 	dma_set_max_seg_size(&dev->dev, 65536);
 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
 
+	pcie_downstream_link_retrain(dev);
+
 	/* Fix up broken headers */
 	pci_fixup_device(pci_fixup_header, dev);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 6/7] PCI: pciehp: Rely on `link_active_reporting'
  2023-02-05 15:49 ` [PATCH v6 6/7] PCI: pciehp: " Maciej W. Rozycki
@ 2023-02-13 13:53   ` Lukas Wunner
  0 siblings, 0 replies; 10+ messages in thread
From: Lukas Wunner @ 2023-02-13 13:53 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: linux-pci, linux-kernel, Eric Dumazet, Oliver O'Halloran,
	Stefan Roese, Leon Romanovsky, linux-rdma, Jakub Kicinski,
	Paolo Abeni, Jim Wilson, Nicholas Piggin, Alex Williamson,
	Bjorn Helgaas, Mika Westerberg, David Abdurachmanov, linuxppc-dev,
	Mahesh J Salgaonkar, David S. Miller, netdev, Pali Rohár,
	Saeed Mahameed

On Sun, Feb 05, 2023 at 03:49:21PM +0000, Maciej W. Rozycki wrote:
> Use `link_active_reporting' to determine whether Data Link Layer Link 
> Active Reporting is available rather than re-retrieving the capability.
> 
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>

Reviewed-by: Lukas Wunner <lukas@wunner.de>

I believe this should work without the preceding patches in the series,
hence can be applied independently.

Thanks,

Lukas

>  drivers/pci/hotplug/pciehp_hpc.c |    7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> linux-pcie-link-active-reporting-hpc.diff
> Index: linux-macro/drivers/pci/hotplug/pciehp_hpc.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/hotplug/pciehp_hpc.c
> +++ linux-macro/drivers/pci/hotplug/pciehp_hpc.c
> @@ -984,7 +984,7 @@ static inline int pcie_hotplug_depth(str
>  struct controller *pcie_init(struct pcie_device *dev)
>  {
>  	struct controller *ctrl;
> -	u32 slot_cap, slot_cap2, link_cap;
> +	u32 slot_cap, slot_cap2;
>  	u8 poweron;
>  	struct pci_dev *pdev = dev->port;
>  	struct pci_bus *subordinate = pdev->subordinate;
> @@ -1030,9 +1030,6 @@ struct controller *pcie_init(struct pcie
>  	if (dmi_first_match(inband_presence_disabled_dmi_table))
>  		ctrl->inband_presence_disabled = 1;
>  
> -	/* Check if Data Link Layer Link Active Reporting is implemented */
> -	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
> -
>  	/* Clear all remaining event bits in Slot Status register. */
>  	pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
>  		PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> @@ -1051,7 +1048,7 @@ struct controller *pcie_init(struct pcie
>  		FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
>  		FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
>  		FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
> -		FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
> +		FLAG(pdev->link_active_reporting, true),
>  		pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
>  
>  	/*

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
  2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
                   ` (6 preceding siblings ...)
  2023-02-05 15:49 ` [PATCH v6 7/7] PCI: Work around PCIe link training failures Maciej W. Rozycki
@ 2023-02-19 18:52 ` Maciej W. Rozycki
  7 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2023-02-19 18:52 UTC (permalink / raw)
  To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy,
	Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Pali Rohár, David Abdurachmanov, linux-rdma, Mika Westerberg,
	linuxppc-dev, linux-kernel, Alex Williamson, Lukas Wunner,
	linux-pci, Stefan Roese, Jim Wilson, netdev

On Sun, 5 Feb 2023, Maciej W. Rozycki wrote:

>  This is v6 of the change to work around a PCIe link training phenomenon 
> where a pair of devices both capable of operating at a link speed above 
> 2.5GT/s seems unable to negotiate the link speed and continues training 
> indefinitely with the Link Training bit switching on and off repeatedly 
> and the data link layer never reaching the active state.

 Ping for: 
<https://lore.kernel.org/linux-pci/alpine.DEB.2.21.2302022022230.45310@angie.orcam.me.uk/>.

  Maciej

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-02-19 18:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
2023-02-05 15:48 ` [PATCH v6 1/7] PCI: Export PCI link retrain timeout Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 5/7] net/mlx5: " Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 6/7] PCI: pciehp: " Maciej W. Rozycki
2023-02-13 13:53   ` Lukas Wunner
2023-02-05 15:49 ` [PATCH v6 7/7] PCI: Work around PCIe link training failures Maciej W. Rozycki
2023-02-19 18:52 ` [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 " Maciej W. Rozycki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).