* [PATCH v6 1/7] PCI: Export PCI link retrain timeout
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
@ 2023-02-05 15:48 ` Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
` (6 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:48 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
Rename LINK_RETRAIN_TIMEOUT to PCIE_LINK_RETRAIN_TIMEOUT and make it
available via "pci.h" for PCI drivers to use.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
No change from v5.
New change in v5.
---
drivers/pci/pci.h | 2 ++
drivers/pci/pcie/aspm.c | 4 +---
2 files changed, 3 insertions(+), 3 deletions(-)
linux-pcie-link-retrain-timeout.diff
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -11,6 +11,8 @@
#define PCI_VSEC_ID_INTEL_TBT 0x1234 /* Thunderbolt */
+#define PCIE_LINK_RETRAIN_TIMEOUT HZ
+
extern const unsigned char pcie_link_speed[];
extern bool pci_early_dump;
Index: linux-macro/drivers/pci/pcie/aspm.c
===================================================================
--- linux-macro.orig/drivers/pci/pcie/aspm.c
+++ linux-macro/drivers/pci/pcie/aspm.c
@@ -90,8 +90,6 @@ static const char *policy_str[] = {
[POLICY_POWER_SUPERSAVE] = "powersupersave"
};
-#define LINK_RETRAIN_TIMEOUT HZ
-
/*
* The L1 PM substate capability is only implemented in function 0 in a
* multi function device.
@@ -213,7 +211,7 @@ static bool pcie_retrain_link(struct pci
}
/* Wait for link training end. Break out after waiting for timeout */
- end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
+ end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
do {
pcie_capability_read_word(parent, PCI_EXP_LNKSTA, ®16);
if (!(reg16 & PCI_EXP_LNKSTA_LT))
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
2023-02-05 15:48 ` [PATCH v6 1/7] PCI: Export PCI link retrain timeout Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
` (5 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
Make `quirk_enable_clear_retrain_link' `pci_fixup_early' so that any later
fixups can rely on `clear_retrain_link' to have been already initialised.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
No change from v5.
New change in v5.
---
drivers/pci/quirks.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
linux-pcie-clear-retrain-link-early.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -2407,9 +2407,9 @@ static void quirk_enable_clear_retrain_l
dev->clear_retrain_link = 1;
pci_info(dev, "Enable PCIe Retrain Link quirk\n");
}
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);
static void fixup_rev1_53c810(struct pci_dev *dev)
{
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
2023-02-05 15:48 ` [PATCH v6 1/7] PCI: Export PCI link retrain timeout Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
` (4 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
Determine whether Data Link Layer Link Active Reporting is available
ahead of calling any fixups so that the cached value can be used there
and later on.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
New change in v6.
---
drivers/pci/probe.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
linux-pcie-link-active-reporting-early.diff
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -819,7 +819,6 @@ static void pci_set_bus_speed(struct pci
pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
- bridge->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
pcie_update_link_speed(bus, linksta);
@@ -1828,6 +1827,7 @@ int pci_setup_device(struct pci_dev *dev
int pos = 0;
struct pci_bus_region region;
struct resource *res;
+ u32 linkcap;
hdr_type = pci_hdr_type(dev);
@@ -1873,6 +1873,10 @@ int pci_setup_device(struct pci_dev *dev
/* "Unknown power state" */
dev->current_state = PCI_UNKNOWN;
+ /* Set it early to make it available to fixups, etc. */
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
+ dev->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
+
/* Early fixups, before probing the BARs */
pci_fixup_device(pci_fixup_early, dev);
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting'
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
` (2 preceding siblings ...)
2023-02-05 15:49 ` [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 5/7] net/mlx5: " Maciej W. Rozycki
` (3 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
Use `link_active_reporting' to determine whether Data Link Layer Link
Active Reporting is available rather than re-retrieving the capability.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with a PPC64LE configuration.
New change in v6.
---
arch/powerpc/kernel/eeh_pe.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
linux-pcie-link-active-reporting-eeh.diff
Index: linux-macro/arch/powerpc/kernel/eeh_pe.c
===================================================================
--- linux-macro.orig/arch/powerpc/kernel/eeh_pe.c
+++ linux-macro/arch/powerpc/kernel/eeh_pe.c
@@ -671,9 +671,8 @@ static void eeh_bridge_check_link(struct
eeh_ops->write_config(edev, cap + PCI_EXP_LNKCTL, 2, val);
/* Check link */
- eeh_ops->read_config(edev, cap + PCI_EXP_LNKCAP, 4, &val);
- if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
- eeh_edev_dbg(edev, "No link reporting capability (0x%08x) \n", val);
+ if (!edev->pdev->link_active_reporting) {
+ eeh_edev_dbg(edev, "No link reporting capability\n");
msleep(1000);
return;
}
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH v6 5/7] net/mlx5: Rely on `link_active_reporting'
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
` (3 preceding siblings ...)
2023-02-05 15:49 ` [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
2023-02-05 15:49 ` [PATCH v6 6/7] PCI: pciehp: " Maciej W. Rozycki
` (2 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
Use `link_active_reporting' to determine whether Data Link Layer Link
Active Reporting is available rather than re-retrieving the capability.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with PPC64LE and x86-64
configurations.
New change in v6.
---
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
linux-pcie-link-active-reporting-mlx5.diff
Index: linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
===================================================================
--- linux-macro.orig/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -294,7 +294,6 @@ static int mlx5_pci_link_toggle(struct m
unsigned long timeout;
struct pci_dev *sdev;
int cap, err;
- u32 reg32;
/* Check that all functions under the pci bridge are PFs of
* this device otherwise fail this function.
@@ -333,11 +332,8 @@ static int mlx5_pci_link_toggle(struct m
return err;
/* Check link */
- err = pci_read_config_dword(bridge, cap + PCI_EXP_LNKCAP, ®32);
- if (err)
- return err;
- if (!(reg32 & PCI_EXP_LNKCAP_DLLLARC)) {
- mlx5_core_warn(dev, "No PCI link reporting capability (0x%08x)\n", reg32);
+ if (!bridge->link_active_reporting) {
+ mlx5_core_warn(dev, "No PCI link reporting capability\n");
msleep(1000);
goto restore;
}
^ permalink raw reply [flat|nested] 15+ messages in thread* [PATCH v6 6/7] PCI: pciehp: Rely on `link_active_reporting'
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
` (4 preceding siblings ...)
2023-02-05 15:49 ` [PATCH v6 5/7] net/mlx5: " Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
2023-02-13 13:53 ` Lukas Wunner
2023-02-05 15:49 ` [PATCH v6 7/7] PCI: Work around PCIe link training failures Maciej W. Rozycki
2023-02-19 18:52 ` [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 " Maciej W. Rozycki
7 siblings, 1 reply; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
Use `link_active_reporting' to determine whether Data Link Layer Link
Active Reporting is available rather than re-retrieving the capability.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
NB this has been compile-tested only with PPC64LE and x86-64
configurations.
New change in v6.
---
drivers/pci/hotplug/pciehp_hpc.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
linux-pcie-link-active-reporting-hpc.diff
Index: linux-macro/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-macro.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-macro/drivers/pci/hotplug/pciehp_hpc.c
@@ -984,7 +984,7 @@ static inline int pcie_hotplug_depth(str
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
- u32 slot_cap, slot_cap2, link_cap;
+ u32 slot_cap, slot_cap2;
u8 poweron;
struct pci_dev *pdev = dev->port;
struct pci_bus *subordinate = pdev->subordinate;
@@ -1030,9 +1030,6 @@ struct controller *pcie_init(struct pcie
if (dmi_first_match(inband_presence_disabled_dmi_table))
ctrl->inband_presence_disabled = 1;
- /* Check if Data Link Layer Link Active Reporting is implemented */
- pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
-
/* Clear all remaining event bits in Slot Status register. */
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
@@ -1051,7 +1048,7 @@ struct controller *pcie_init(struct pcie
FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
- FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
+ FLAG(pdev->link_active_reporting, true),
pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
/*
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH v6 6/7] PCI: pciehp: Rely on `link_active_reporting'
2023-02-05 15:49 ` [PATCH v6 6/7] PCI: pciehp: " Maciej W. Rozycki
@ 2023-02-13 13:53 ` Lukas Wunner
0 siblings, 0 replies; 15+ messages in thread
From: Lukas Wunner @ 2023-02-13 13:53 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Alex Williamson, Mika Westerberg,
Stefan Roese, Jim Wilson, David Abdurachmanov, Pali Rohár,
linux-pci, linuxppc-dev, linux-rdma, netdev, linux-kernel
On Sun, Feb 05, 2023 at 03:49:21PM +0000, Maciej W. Rozycki wrote:
> Use `link_active_reporting' to determine whether Data Link Layer Link
> Active Reporting is available rather than re-retrieving the capability.
>
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
I believe this should work without the preceding patches in the series,
hence can be applied independently.
Thanks,
Lukas
> drivers/pci/hotplug/pciehp_hpc.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> linux-pcie-link-active-reporting-hpc.diff
> Index: linux-macro/drivers/pci/hotplug/pciehp_hpc.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/hotplug/pciehp_hpc.c
> +++ linux-macro/drivers/pci/hotplug/pciehp_hpc.c
> @@ -984,7 +984,7 @@ static inline int pcie_hotplug_depth(str
> struct controller *pcie_init(struct pcie_device *dev)
> {
> struct controller *ctrl;
> - u32 slot_cap, slot_cap2, link_cap;
> + u32 slot_cap, slot_cap2;
> u8 poweron;
> struct pci_dev *pdev = dev->port;
> struct pci_bus *subordinate = pdev->subordinate;
> @@ -1030,9 +1030,6 @@ struct controller *pcie_init(struct pcie
> if (dmi_first_match(inband_presence_disabled_dmi_table))
> ctrl->inband_presence_disabled = 1;
>
> - /* Check if Data Link Layer Link Active Reporting is implemented */
> - pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
> -
> /* Clear all remaining event bits in Slot Status register. */
> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> @@ -1051,7 +1048,7 @@ struct controller *pcie_init(struct pcie
> FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
> FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
> FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
> - FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
> + FLAG(pdev->link_active_reporting, true),
> pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
>
> /*
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v6 7/7] PCI: Work around PCIe link training failures
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
` (5 preceding siblings ...)
2023-02-05 15:49 ` [PATCH v6 6/7] PCI: pciehp: " Maciej W. Rozycki
@ 2023-02-05 15:49 ` Maciej W. Rozycki
2023-02-19 18:52 ` [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 " Maciej W. Rozycki
7 siblings, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-05 15:49 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
Attempt to handle cases such as with a downstream port of the ASMedia
ASM2824 PCIe switch where link training never completes and the link
continues switching between speeds indefinitely with the data link layer
never reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the
switches are supposed to negotiate the link speed of preferably 5.0GT/s,
falling back to 2.5GT/s.
Instead the link continues oscillating between the two speeds, at the
rate of 34-35 times per second, with link training reported repeatedly
active ~84% of the time. Forcibly limiting the target link speed to
2.5GT/s with the upstream ASM2824 device however makes the two switches
communicate correctly. Removing the speed restriction afterwards makes
the two devices switch to 5.0GT/s then.
Make use of these observations then and detect the inability to train
the link, by checking for the Data Link Layer Link Active status bit
being off while the Link Bandwidth Management Status indicating that
hardware has changed the link speed or width in an attempt to correct
unreliable link operation.
Restrict the speed to 2.5GT/s then with the Target Link Speed field,
request a retrain and wait 200ms for the data link to go up. If this
turns out successful, then lift the restriction, letting the devices
negotiate a higher speed.
Also check for a 2.5GT/s speed restriction the firmware may have already
arranged and lift it too with ports of devices known to continue working
afterwards, currently the ASM2824 only, that already report their data
link being up.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/lkml/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
---
Changes from v5:
- Move from a quirk into PCI core and call at device probing, hot-plug,
reset and resume. Keep the ASMedia part under CONFIG_PCI_QUIRKS.
- Rely on `dev->link_active_reporting' rather than re-retrieving the
capability.
Changes from v4:
- Remove <linux/bug.h> inclusion no longer needed.
- Make the quirk generic based on probing device features rather than
specific to the ASM2824 part only; take the Retrain Link bit erratum
into account.
- Still lift the 2.5GT/s speed restriction with the ASM2824 only.
- Increase retrain timeout from 200ms to 1s (PCIE_LINK_RETRAIN_TIMEOUT).
- Remove retrain success notification.
- Use PCIe helpers rather than generic PCI functions throughout.
- Trim down and update the wording of the change description for the
switch from an ASM2824-specific to a generic fixup.
Changes from v3:
- Remove the <linux/pci_ids.h> entry for the ASM2824.
Changes from v2:
- Regenerate for 5.17-rc2 for a merge conflict.
- Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
early return.
Changes from v1:
- Regenerate for a merge conflict.
---
drivers/pci/pci.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++--
drivers/pci/pci.h | 1
drivers/pci/probe.c | 2
3 files changed, 152 insertions(+), 5 deletions(-)
linux-pcie-asm2824-manual-retrain.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -859,6 +859,132 @@ int pci_wait_for_pending(struct pci_dev
return 0;
}
+/*
+ * Retrain the link of a downstream PCIe port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * repeatedly active ~84% of the time. Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however. And more interestingly retraining with a
+ * higher target link speed afterwards lets the two successfully negotiate
+ * 5.0GT/s.
+ *
+ * With the ASM2824 we can rely on the otherwise optional Data Link Layer
+ * Link Active status bit and in the failed link training scenario it will
+ * be off along with the Link Bandwidth Management Status indicating that
+ * hardware has changed the link speed or width in an attempt to correct
+ * unreliable link operation. For a port that has been left unconnected
+ * both bits will be clear. So use this information to detect the problem
+ * rather than polling the Link Training bit and watching out for flips or
+ * at least the active status.
+ *
+ * Since the exact nature of the problem isn't known and in principle this
+ * could trigger where an ASM2824 device is downstream rather upstream,
+ * apply this erratum workaround to any downstream ports as long as they
+ * support Link Active reporting and have the Link Control 2 register.
+ * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
+ * request a retrain and wait 200ms for the data link to go up.
+ *
+ * If this turns out successful and we know by the Vendor:Device ID it is
+ * safe to do so, then lift the restriction, letting the devices negotiate
+ * a higher speed. Also check for a similar 2.5GT/s speed restriction the
+ * firmware may have already arranged and lift it with ports that already
+ * report their data link being up.
+ *
+ * Return 0 if the link has been successfully retrained, otherwise -1.
+ */
+int pcie_downstream_link_retrain(struct pci_dev *dev)
+{
+ static const struct pci_device_id ids[] = {
+ { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
+ {}
+ };
+ u16 lnksta, lnkctl2;
+
+ if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
+ !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
+ return -1;
+
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
+ pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+ PCI_EXP_LNKSTA_LBMS) {
+ unsigned long timeout;
+ u16 lnkctl;
+
+ pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
+
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+ lnkctl |= PCI_EXP_LNKCTL_RL;
+ lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+ lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+ /*
+ * Due to an erratum in some devices the Retrain Link bit
+ * needs to be cleared again manually to allow the link
+ * training to succeed.
+ */
+ lnkctl &= ~PCI_EXP_LNKCTL_RL;
+ if (dev->clear_retrain_link)
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
+ lnkctl);
+
+ timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
+ do {
+ pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
+ &lnksta);
+ if (lnksta & PCI_EXP_LNKSTA_DLLLA)
+ break;
+ usleep_range(10000, 20000);
+ } while (time_before(jiffies, timeout));
+
+ if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
+ pci_info(dev, "retraining failed\n");
+ return -1;
+ }
+ }
+
+ if (IS_ENABLED(CONFIG_PCI_QUIRKS) && (lnksta & PCI_EXP_LNKSTA_DLLLA) &&
+ (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+ pci_match_id(ids, dev)) {
+ u32 lnkcap;
+ u16 lnkctl;
+
+ pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+ lnkctl |= PCI_EXP_LNKCTL_RL;
+ lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+ lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+ }
+
+ return 0;
+}
+
+/* Same as above, but called for a downstream device. */
+static int pcie_upstream_link_retrain(struct pci_dev *dev)
+{
+ struct pci_dev *bridge;
+
+ bridge = pci_upstream_bridge(dev);
+ if (bridge)
+ return pcie_downstream_link_retrain(bridge);
+ else
+ return -1;
+}
+
static int pci_acs_enable;
/**
@@ -1148,8 +1274,8 @@ void pci_resume_bus(struct pci_bus *bus)
static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
{
+ int retrain = 0;
int delay = 1;
- u32 id;
/*
* After reset, the device should not silently discard config
@@ -1163,21 +1289,37 @@ static int pci_dev_wait(struct pci_dev *
* Command register instead of Vendor ID so we don't have to
* contend with the CRS SV value.
*/
- pci_read_config_dword(dev, PCI_COMMAND, &id);
- while (PCI_POSSIBLE_ERROR(id)) {
+ for (;;) {
+ u32 id;
+
+ pci_read_config_dword(dev, PCI_COMMAND, &id);
+ if (!PCI_POSSIBLE_ERROR(id)) {
+ if (delay > PCI_RESET_WAIT)
+ pci_info(dev, "ready %dms after %s\n",
+ delay - 1, reset_type);
+ break;
+ }
+
if (delay > timeout) {
pci_warn(dev, "not ready %dms after %s; giving up\n",
delay - 1, reset_type);
return -ENOTTY;
}
- if (delay > PCI_RESET_WAIT)
+ if (delay > PCI_RESET_WAIT) {
+ if (!retrain) {
+ retrain = 1;
+ if (pcie_upstream_link_retrain(dev) == 0) {
+ delay = 1;
+ continue;
+ }
+ }
pci_info(dev, "not ready %dms after %s; waiting\n",
delay - 1, reset_type);
+ }
msleep(delay);
delay *= 2;
- pci_read_config_dword(dev, PCI_COMMAND, &id);
}
if (delay > PCI_RESET_WAIT)
@@ -4901,6 +5043,8 @@ static bool pcie_wait_for_link_delay(str
msleep(10);
timeout -= 10;
}
+ if (active && !ret)
+ ret = pcie_downstream_link_retrain(pdev) == 0;
if (active && ret)
msleep(delay);
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -37,6 +37,7 @@ int pci_mmap_fits(struct pci_dev *pdev,
enum pci_mmap_api mmap_api);
bool pci_reset_supported(struct pci_dev *dev);
+int pcie_downstream_link_retrain(struct pci_dev *dev);
void pci_init_reset_methods(struct pci_dev *dev);
int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
int pci_bus_error_reset(struct pci_dev *dev);
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -2546,6 +2546,8 @@ void pci_device_add(struct pci_dev *dev,
dma_set_max_seg_size(&dev->dev, 65536);
dma_set_seg_boundary(&dev->dev, 0xffffffff);
+ pcie_downstream_link_retrain(dev);
+
/* Fix up broken headers */
pci_fixup_device(pci_fixup_header, dev);
^ permalink raw reply [flat|nested] 15+ messages in thread* [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
2023-02-05 15:48 [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
` (6 preceding siblings ...)
2023-02-05 15:49 ` [PATCH v6 7/7] PCI: Work around PCIe link training failures Maciej W. Rozycki
@ 2023-02-19 18:52 ` Maciej W. Rozycki
2023-02-19 19:46 ` Lukas Wunner
7 siblings, 1 reply; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-19 18:52 UTC (permalink / raw)
To: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Saeed Mahameed, Leon Romanovsky, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni
Cc: Alex Williamson, Lukas Wunner, Mika Westerberg, Stefan Roese,
Jim Wilson, David Abdurachmanov, Pali Rohár, linux-pci,
linuxppc-dev, linux-rdma, netdev, linux-kernel
On Sun, 5 Feb 2023, Maciej W. Rozycki wrote:
> This is v6 of the change to work around a PCIe link training phenomenon
> where a pair of devices both capable of operating at a link speed above
> 2.5GT/s seems unable to negotiate the link speed and continues training
> indefinitely with the Link Training bit switching on and off repeatedly
> and the data link layer never reaching the active state.
Ping for:
<https://lore.kernel.org/linux-pci/alpine.DEB.2.21.2302022022230.45310@angie.orcam.me.uk/>.
Maciej
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
2023-02-19 18:52 ` [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 " Maciej W. Rozycki
@ 2023-02-19 19:46 ` Lukas Wunner
2023-02-21 21:46 ` Pali Rohár
2023-02-22 11:54 ` Maciej W. Rozycki
0 siblings, 2 replies; 15+ messages in thread
From: Lukas Wunner @ 2023-02-19 19:46 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Bjorn Helgaas, Nicholas Piggin, Christophe Leroy, Saeed Mahameed,
Leon Romanovsky, Alex Williamson, Stefan Roese, Jim Wilson,
David Abdurachmanov, Pali Rohár, linux-pci,
Philipp Rosenberger
[+cc Philipp]
On Sun, Feb 19, 2023 at 06:52:21PM +0000, Maciej W. Rozycki wrote:
> On Sun, 5 Feb 2023, Maciej W. Rozycki wrote:
> > This is v6 of the change to work around a PCIe link training phenomenon
> > where a pair of devices both capable of operating at a link speed above
> > 2.5GT/s seems unable to negotiate the link speed and continues training
> > indefinitely with the Link Training bit switching on and off repeatedly
> > and the data link layer never reaching the active state.
>
> Ping for:
> <https://lore.kernel.org/linux-pci/alpine.DEB.2.21.2302022022230.45310@angie.orcam.me.uk/>.
Philipp is witnessing similar issues with a Pericom PI7C9X2G404EL
switch below a Broadcom STB host controller: On some rare occasions,
when booting the system the link trains correctly at 5 GT/s and the
switch is accessible without any issues. But most of the time,
the switch is inaccessible on boot. The Broadcom STB host controller
claims not to support Link Active Reporting, but in reality has a
link status indicator in a custom register. It indicates that the
link is up, the link trains to 2.5 GT/s but the switch is inaccessible.
Due to a quirk of the Broadcom STB host controller, ECAM access to
the inaccessible switch raises an unhandled CPU exception and thus
causes a kernel panic, making the issue difficult to debug.
The switch works fine 100% when plugged into a TI Sitara AM64 board
(contains a DesignWare-derived PCIe host controller).
The switch is the same as yours, only with 4 instead of 3 ports.
Perhaps the issue you're seeing isn't specific to the ASMedia switch,
but is due to an oddity of the Pericom switch, that happens to be
triggered by the Broadcom STB host controller as well?
I've cooked up a modified version of patch 7 in your series which
performs the link retraining in the pci-brcmstb.c driver before
performing the first access to the switch. Unfortunately it
didn't result in any kind of improvement. Next step is to hook up
a Teledyne T28 analyzer to see what's going on.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
2023-02-19 19:46 ` Lukas Wunner
@ 2023-02-21 21:46 ` Pali Rohár
2023-02-22 8:40 ` Lukas Wunner
2023-02-22 11:54 ` Maciej W. Rozycki
1 sibling, 1 reply; 15+ messages in thread
From: Pali Rohár @ 2023-02-21 21:46 UTC (permalink / raw)
To: Lukas Wunner
Cc: Maciej W. Rozycki, Bjorn Helgaas, Nicholas Piggin,
Christophe Leroy, Saeed Mahameed, Leon Romanovsky,
Alex Williamson, Stefan Roese, Jim Wilson, David Abdurachmanov,
linux-pci, Philipp Rosenberger
Hello!
On Sunday 19 February 2023 20:46:19 Lukas Wunner wrote:
> [+cc Philipp]
>
> On Sun, Feb 19, 2023 at 06:52:21PM +0000, Maciej W. Rozycki wrote:
> > On Sun, 5 Feb 2023, Maciej W. Rozycki wrote:
> > > This is v6 of the change to work around a PCIe link training phenomenon
> > > where a pair of devices both capable of operating at a link speed above
> > > 2.5GT/s seems unable to negotiate the link speed and continues training
> > > indefinitely with the Link Training bit switching on and off repeatedly
> > > and the data link layer never reaching the active state.
> >
> > Ping for:
> > <https://lore.kernel.org/linux-pci/alpine.DEB.2.21.2302022022230.45310@angie.orcam.me.uk/>.
>
> Philipp is witnessing similar issues with a Pericom PI7C9X2G404EL
> switch below a Broadcom STB host controller: On some rare occasions,
> when booting the system the link trains correctly at 5 GT/s and the
> switch is accessible without any issues. But most of the time,
> the switch is inaccessible on boot. The Broadcom STB host controller
> claims not to support Link Active Reporting, but in reality has a
> link status indicator in a custom register. It indicates that the
> link is up, the link trains to 2.5 GT/s but the switch is inaccessible.
This is interesting. Do you know which layer it indicates that is up?
I can image that PCIe physical layer or data link layer is up but
PCIe transaction layer not yet up and so sending config requests fail.
Existence of custom register may explain that it indicates different
"link up" meaning.
> Due to a quirk of the Broadcom STB host controller, ECAM access to
> the inaccessible switch raises an unhandled CPU exception and thus
> causes a kernel panic, making the issue difficult to debug.
Is this ARM Cortex A53 core and unhandled exception is asynchronous one
with syndrome 0xbf000002?
> The switch works fine 100% when plugged into a TI Sitara AM64 board
> (contains a DesignWare-derived PCIe host controller).
It is really DesignWare? I had an impression that TI uses PCIe IPs from
Cadence, not from DesignWare. And Cadence controllers behave in some
cases different from Designware controllers.
> The switch is the same as yours, only with 4 instead of 3 ports.
> Perhaps the issue you're seeing isn't specific to the ASMedia switch,
> but is due to an oddity of the Pericom switch, that happens to be
> triggered by the Broadcom STB host controller as well?
>
> I've cooked up a modified version of patch 7 in your series which
> performs the link retraining in the pci-brcmstb.c driver before
> performing the first access to the switch. Unfortunately it
> didn't result in any kind of improvement. Next step is to hook up
> a Teledyne T28 analyzer to see what's going on.
Can you use Teledyne T28 for debugging this issue? Because this is
something which can finally show what is happing there.
I would really like to see what switch and controller are sending that
they can result in such buggy and incredible state.
> Thanks,
>
> Lukas
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
2023-02-21 21:46 ` Pali Rohár
@ 2023-02-22 8:40 ` Lukas Wunner
2023-02-22 9:17 ` Pali Rohár
0 siblings, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2023-02-22 8:40 UTC (permalink / raw)
To: Pali Rohár
Cc: Maciej W. Rozycki, Bjorn Helgaas, Nicholas Piggin,
Christophe Leroy, Saeed Mahameed, Leon Romanovsky,
Alex Williamson, Stefan Roese, Jim Wilson, David Abdurachmanov,
linux-pci, Philipp Rosenberger
On Tue, Feb 21, 2023 at 10:46:11PM +0100, Pali Rohár wrote:
> On Sunday 19 February 2023 20:46:19 Lukas Wunner wrote:
> > > On Sun, 5 Feb 2023, Maciej W. Rozycki wrote:
> > > > This is v6 of the change to work around a PCIe link training phenomenon
> > > > where a pair of devices both capable of operating at a link speed above
> > > > 2.5GT/s seems unable to negotiate the link speed and continues training
> > > > indefinitely with the Link Training bit switching on and off repeatedly
> > > > and the data link layer never reaching the active state.
> >
> > Philipp is witnessing similar issues with a Pericom PI7C9X2G404EL
> > switch below a Broadcom STB host controller: On some rare occasions,
> > when booting the system the link trains correctly at 5 GT/s and the
> > switch is accessible without any issues. But most of the time,
> > the switch is inaccessible on boot. The Broadcom STB host controller
> > claims not to support Link Active Reporting, but in reality has a
> > link status indicator in a custom register. It indicates that the
> > link is up, the link trains to 2.5 GT/s but the switch is inaccessible.
>
> This is interesting. Do you know which layer it indicates that is up?
> I can image that PCIe physical layer or data link layer is up but
> PCIe transaction layer not yet up and so sending config requests fail.
> Existence of custom register may explain that it indicates different
> "link up" meaning.
drivers/pci/controller/pcie-brcmstb.c defines the following bits:
#define PCIE_MISC_PCIE_STATUS_PCIE_PORT_MASK 0x80
#define PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK 0x20
#define PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK 0x10
#define PCIE_MISC_PCIE_STATUS_PCIE_LINK_IN_L23_MASK 0x40
And brcm_pcie_link_up() checks that both DL_ACTIVE and PHYLINKUP are set.
A public spec for the Broadcom STB PCIe controller does not seem to exist,
so I do not know what the register bits mean exactly.
> > Due to a quirk of the Broadcom STB host controller, ECAM access to
> > the inaccessible switch raises an unhandled CPU exception and thus
> > causes a kernel panic, making the issue difficult to debug.
>
> Is this ARM Cortex A53 core and unhandled exception is asynchronous one
> with syndrome 0xbf000002?
It's a Cortex A72 and yes the exception looks like this:
SError Interrupt on CPU1, code 0x00000000bf000002 -- SError
I was wondering why we're not checking in the exception handler whether
the accessed address is in ECAM space, and just return from the handler
since such exceptions could be handled by returning "all ones" in
software from the PCI core.
Then again, perhaps there's a method to stop the controller from
raising an exception on ECAM access to an inaccessible device.
If such a method exists (e.g. some register bit), that would
obviously be preferred.
> > The switch works fine 100% when plugged into a TI Sitara AM64 board
> > (contains a DesignWare-derived PCIe host controller).
>
> It is really DesignWare? I had an impression that TI uses PCIe IPs from
> Cadence, not from DesignWare. And Cadence controllers behave in some
> cases different from Designware controllers.
You're right, I was mistaken, it's indeed a Cadence.
> > Next step is to hook up
> > a Teledyne T28 analyzer to see what's going on.
>
> Can you use Teledyne T28 for debugging this issue? Because this is
> something which can finally show what is happing there.
Yes it should be possible to debug this, the analyzer is capable of
logging the link training sequence and present it in a Wireshark-esque
interface.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
2023-02-22 8:40 ` Lukas Wunner
@ 2023-02-22 9:17 ` Pali Rohár
0 siblings, 0 replies; 15+ messages in thread
From: Pali Rohár @ 2023-02-22 9:17 UTC (permalink / raw)
To: Lukas Wunner
Cc: Maciej W. Rozycki, Bjorn Helgaas, Nicholas Piggin,
Christophe Leroy, Saeed Mahameed, Leon Romanovsky,
Alex Williamson, Stefan Roese, Jim Wilson, David Abdurachmanov,
linux-pci, Philipp Rosenberger
On Wednesday 22 February 2023 09:40:33 Lukas Wunner wrote:
> On Tue, Feb 21, 2023 at 10:46:11PM +0100, Pali Rohár wrote:
> > On Sunday 19 February 2023 20:46:19 Lukas Wunner wrote:
> > > > On Sun, 5 Feb 2023, Maciej W. Rozycki wrote:
> > > > > This is v6 of the change to work around a PCIe link training phenomenon
> > > > > where a pair of devices both capable of operating at a link speed above
> > > > > 2.5GT/s seems unable to negotiate the link speed and continues training
> > > > > indefinitely with the Link Training bit switching on and off repeatedly
> > > > > and the data link layer never reaching the active state.
> > >
> > > Philipp is witnessing similar issues with a Pericom PI7C9X2G404EL
> > > switch below a Broadcom STB host controller: On some rare occasions,
> > > when booting the system the link trains correctly at 5 GT/s and the
> > > switch is accessible without any issues. But most of the time,
> > > the switch is inaccessible on boot. The Broadcom STB host controller
> > > claims not to support Link Active Reporting, but in reality has a
> > > link status indicator in a custom register. It indicates that the
> > > link is up, the link trains to 2.5 GT/s but the switch is inaccessible.
> >
> > This is interesting. Do you know which layer it indicates that is up?
> > I can image that PCIe physical layer or data link layer is up but
> > PCIe transaction layer not yet up and so sending config requests fail.
> > Existence of custom register may explain that it indicates different
> > "link up" meaning.
>
> drivers/pci/controller/pcie-brcmstb.c defines the following bits:
>
> #define PCIE_MISC_PCIE_STATUS_PCIE_PORT_MASK 0x80
> #define PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK 0x20
> #define PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK 0x10
> #define PCIE_MISC_PCIE_STATUS_PCIE_LINK_IN_L23_MASK 0x40
>
> And brcm_pcie_link_up() checks that both DL_ACTIVE and PHYLINKUP are set.
>
> A public spec for the Broadcom STB PCIe controller does not seem to exist,
> so I do not know what the register bits mean exactly.
Ok, so then this is question for Broadcom. Without spec we probably
cannot do anything.
>
> > > Due to a quirk of the Broadcom STB host controller, ECAM access to
> > > the inaccessible switch raises an unhandled CPU exception and thus
> > > causes a kernel panic, making the issue difficult to debug.
> >
> > Is this ARM Cortex A53 core and unhandled exception is asynchronous one
> > with syndrome 0xbf000002?
>
> It's a Cortex A72 and yes the exception looks like this:
>
> SError Interrupt on CPU1, code 0x00000000bf000002 -- SError
This is core specific exception and for A53 it means AXI Slave Error.
I guess that on A72 it could mean too. But generally for Aarch64 they
are not defined.
> I was wondering why we're not checking in the exception handler whether
> the accessed address is in ECAM space, and just return from the handler
> since such exceptions could be handled by returning "all ones" in
> software from the PCI core.
We are not checking them because it is asynchronous exception. You do
not know who, when and why caused this exception. Exception may come
also after executing lot of other instructions. It is non-recoverable
fatal exception and it means something like internal core error. The
only reasonable thing to do is to reset CPU.
CPU should not receive AXI Slave Error under non-fatal condition and it
is basically bug in that PCIe controller that it sends such thing to
the CPU core. PCIe controller should for ECAM load/store operations
always returns AXI Slave OK response and on error it should set all-ones
in data part.
The proper way is to find out if PCIe controller does not have some
hidden or debug register which allows to disable sending these errors.
IIRC DesignWare has it, so there is big chance that Broadcom has it too.
But for example Cadence does not have it (yet).
> Then again, perhaps there's a method to stop the controller from
> raising an exception on ECAM access to an inaccessible device.
> If such a method exists (e.g. some register bit), that would
> obviously be preferred.
Other way is map ECAM address memory space in strong ordering mode.
It would cause CPU core to wait during executing of load / store
operation after they completely finish and then AXI Slave Error is
reported as synchronous exception as Data Abort, which is possible.
On ARMv7 it was possible by marking mapping as Device. On ARMv8 it
should be possible too, but on A53 it is unimplemented. Maybe possible
on A72? But it is kind like a hack to workaround total mess.
I will send separate email to Broadcom people who already helped with
one of their PCIe controller in the past. Maybe there is hidden register
debug bit which can turn it off.
>
> > > The switch works fine 100% when plugged into a TI Sitara AM64 board
> > > (contains a DesignWare-derived PCIe host controller).
> >
> > It is really DesignWare? I had an impression that TI uses PCIe IPs from
> > Cadence, not from DesignWare. And Cadence controllers behave in some
> > cases different from Designware controllers.
>
> You're right, I was mistaken, it's indeed a Cadence.
>
>
> > > Next step is to hook up
> > > a Teledyne T28 analyzer to see what's going on.
> >
> > Can you use Teledyne T28 for debugging this issue? Because this is
> > something which can finally show what is happing there.
>
> Yes it should be possible to debug this, the analyzer is capable of
> logging the link training sequence and present it in a Wireshark-esque
> interface.
>
> Thanks,
>
> Lukas
The main issue is that nobody had analyzer for doing it, it is not a
cheap device.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures
2023-02-19 19:46 ` Lukas Wunner
2023-02-21 21:46 ` Pali Rohár
@ 2023-02-22 11:54 ` Maciej W. Rozycki
1 sibling, 0 replies; 15+ messages in thread
From: Maciej W. Rozycki @ 2023-02-22 11:54 UTC (permalink / raw)
To: Lukas Wunner
Cc: Bjorn Helgaas, Nicholas Piggin, Christophe Leroy, Saeed Mahameed,
Leon Romanovsky, Alex Williamson, Stefan Roese, Jim Wilson,
David Abdurachmanov, Pali Rohár, linux-pci,
Philipp Rosenberger
On Sun, 19 Feb 2023, Lukas Wunner wrote:
> > > This is v6 of the change to work around a PCIe link training phenomenon
> > > where a pair of devices both capable of operating at a link speed above
> > > 2.5GT/s seems unable to negotiate the link speed and continues training
> > > indefinitely with the Link Training bit switching on and off repeatedly
> > > and the data link layer never reaching the active state.
> >
> > Ping for:
> > <https://lore.kernel.org/linux-pci/alpine.DEB.2.21.2302022022230.45310@angie.orcam.me.uk/>.
>
> Philipp is witnessing similar issues with a Pericom PI7C9X2G404EL
> switch below a Broadcom STB host controller: On some rare occasions,
> when booting the system the link trains correctly at 5 GT/s and the
> switch is accessible without any issues. But most of the time,
> the switch is inaccessible on boot. The Broadcom STB host controller
> claims not to support Link Active Reporting, but in reality has a
> link status indicator in a custom register. It indicates that the
> link is up, the link trains to 2.5 GT/s but the switch is inaccessible.
Thank you for chiming in.
Note that the U-Boot variant of this workaround referred to by the link
in the change description does not rely on Link Active Reporting, but
busy-loops polling on Link Training status instead and verifies training
remains stable off long enough. We can't do this in Linux of course, but
I guess a variant using link status reported in a vendor-specific register
could be made.
> The switch is the same as yours, only with 4 instead of 3 ports.
> Perhaps the issue you're seeing isn't specific to the ASMedia switch,
> but is due to an oddity of the Pericom switch, that happens to be
> triggered by the Broadcom STB host controller as well?
I have seen two reports from people claiming their devices to be absent
from `lspci' output, an Intel and a Realtek NIC respectively, when plugged
into a slot behing the ASMedia switch, while working just fine in another
system. Neither replied to my request for further information, so it's
not clear to me if it's the same issue, but it may or may not be limited
to Pericom hardware.
As I mentioned in previous iterations of the change there is an option
card available with the ASM2824 switch onboard and dual M.2 slots behind,
which could be used for experimenting. Sold under the StarTech brand as
PEX8M2E2 and under the Ableconn brand as PEXM2-130. And M.2 M-Key to PCIe
slot adapters are widely available. I just can't be persuaded to buy such
an option card, especially as ASMedia have been unhelpful and ignored my
enquiry.
> I've cooked up a modified version of patch 7 in your series which
> performs the link retraining in the pci-brcmstb.c driver before
> performing the first access to the switch. Unfortunately it
> didn't result in any kind of improvement. Next step is to hook up
> a Teledyne T28 analyzer to see what's going on.
That would be most helpful, although given your lack of success with my
workaround it might be a different issue here after all.
Let me know if I could help anyhow. I have a few of these Pericom-based
riser card adapters as spares, though I'm short on high-speed (5GT/s+)
PCIe slots to try them with. Delock has claimed, back in Jul 2021, I'm
the only one to report them the device not to work.
Maciej
^ permalink raw reply [flat|nested] 15+ messages in thread