* CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Uwe Kleine-König @ 2017-01-11 19:49 UTC
To: linux-pci
Cc: Thomas Petazzoni, Russell King, linux-arm-kernel, Andrew Lunn

Hello,

on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
is enabled:

# dmesg | grep ath
[    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
[    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
[    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110

if however PCIEASPM is off, the driver probes correctly and the ath10k
adapter works fine.

I wonder if someone has an idea what needs to be done to fix this
problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
for a distribution kernel where I think PCIEASPM=y is sensible in
general.)

Best regards
Uwe

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
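The `-110` in the probe failure is a negated errno value. On Linux, 110 is `ETIMEDOUT`, which is consistent with the driver giving up while waiting for the device to wake. The following is a minimal userspace sketch for decoding such probe errors. It assumes a Linux/glibc host, and `describe_probe_error()` and `probe_error_is_timeout()` are hypothetical helpers, not part of ath10k or the PCI core.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: kernel probe errors are negated errno values,
 * so flip the sign before handing them to strerror(). */
const char *describe_probe_error(int err)
{
    return strerror(err < 0 ? -err : err);
}

/* Hypothetical helper: true if a probe error is -ETIMEDOUT
 * (110 on Linux; the numeric value is platform-specific). */
int probe_error_is_timeout(int err)
{
    return err == -ETIMEDOUT;
}
```

For example, on glibc `describe_probe_error(-110)` yields the text for `ETIMEDOUT` ("Connection timed out").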
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Bjorn Helgaas @ 2017-01-11 22:02 UTC
To: Uwe Kleine-König
Cc: Thomas Petazzoni, linux-pci, Andrew Lunn, linux-arm-kernel, Russell King

Hi Uwe,

On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> Hello,
>
> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> is enabled:
>
> # dmesg | grep ath
> [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>
> if however PCIEASPM is off, the driver probes correctly and the ath10k
> adapter works fine.
>
> I wonder if someone has an idea what needs to be done to fix this
> problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> for a distribution kernel where I think PCIEASPM=y is sensible in
> general.)

PCIEASPM=y is definitely sensible, and disabling ASPM is OK as a
workaround but is not a fix.

We have several open issues related to ASPM:

  https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working
  https://bugzilla.kernel.org/show_bug.cgi?id=187731 Null pointer dereference in ASPM
  https://bugzilla.kernel.org/show_bug.cgi?id=189951 Enabling ASPM causes NIC performance issue
  https://bugzilla.kernel.org/show_bug.cgi?id=60111 NULL pointer deref in ASPM alloc_pcie_link_state()

I don't recognize yours as being one of these.  Can you open a new
issue and attach the complete dmesg log and "lspci -vv" output?

Is this a regression?

Bjorn
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Uwe Kleine-König @ 2017-01-12 13:18 UTC
To: Bjorn Helgaas
Cc: Thomas Petazzoni, linux-pci, Andrew Lunn, linux-arm-kernel, Russell King

On 01/11/2017 11:02 PM, Bjorn Helgaas wrote:
> Hi Uwe,
>
> On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
>> Hello,
>>
>> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
>> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
>> is enabled:
>>
>> [...]
> We have several open issues related to ASPM:
>
>   https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working
>   https://bugzilla.kernel.org/show_bug.cgi?id=187731 Null pointer dereference in ASPM
>   https://bugzilla.kernel.org/show_bug.cgi?id=189951 Enabling ASPM causes NIC performance issue
>   https://bugzilla.kernel.org/show_bug.cgi?id=60111 NULL pointer deref in ASPM alloc_pcie_link_state()
>
> I don't recognize yours as being one of these.  Can you open a new
> issue and attach the complete dmesg log and "lspci -vv" output?

Done: https://bugzilla.kernel.org/show_bug.cgi?id=192441

> Is this a regression?

As written in the bug report, this also happens on 4.7.

Best regards
Uwe
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Bjorn Helgaas @ 2017-01-12 15:03 UTC
To: Uwe Kleine-König
Cc: Thomas Petazzoni, linux-pci, Russell King, linux-arm-kernel, Andrew Lunn

On Thu, Jan 12, 2017 at 02:18:46PM +0100, Uwe Kleine-König wrote:
> On 01/11/2017 11:02 PM, Bjorn Helgaas wrote:
> > Hi Uwe,
> >
> > On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> >> Hello,
> >>
> >> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> >> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> >> is enabled:
> >>
> >> [...]
> > We have several open issues related to ASPM:
> >
> >   https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working
> >   https://bugzilla.kernel.org/show_bug.cgi?id=187731 Null pointer dereference in ASPM
> >   https://bugzilla.kernel.org/show_bug.cgi?id=189951 Enabling ASPM causes NIC performance issue
> >   https://bugzilla.kernel.org/show_bug.cgi?id=60111 NULL pointer deref in ASPM alloc_pcie_link_state()
> >
> > I don't recognize yours as being one of these.  Can you open a new
> > issue and attach the complete dmesg log and "lspci -vv" output?
>
> Done: https://bugzilla.kernel.org/show_bug.cgi?id=192441

Thanks!  Can you attach a dmesg with CONFIG_PCIEASPM turned off, too?

There are several interesting things going on with that ath10k device,
and not all of them seem ASPM-related:

  pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
  pci 0000:02:00.0: reg 0x10: [mem 0xe8000000-0xe81fffff 64bit]
  pci 0000:02:00.0: reg 0x30: [mem 0xe8200000-0xe820ffff pref]
  pci 0000:02:00.0: of_irq_parse_pci() failed with rc=134
  pci 0000:02:00.0: BAR 0: assigned [mem 0xe0000000-0xe01fffff 64bit]
  pci 0000:02:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
  pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)

  1) We found BAR 0 (reg 0x10) with 0xe8000000, so firmware probably
     programmed it, and it probably works there.

  2) The host bridge window doesn't include that BAR 0 space.
     Unfortunately I don't think we print the initial 00:02.0 bridge
     window leading to bus 02; we only print the new window we assign
     to it.

  3) No idea what the of_irq_parse_pci() issue is.

  4) No idea why the BAR 0 update failed.  Maybe a Marvell config
     accessor problem?

I don't see any connection between these and ASPM, so I'm curious why
things work with CONFIG_PCIEASPM turned off.
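Bjorn's second point is a plain interval check. The BAR 0 range that firmware left behind, 0xe8000000-0xe81fffff, is not inside the root bus window 0xe0000000-0xe7ffffff, while the reassigned range 0xe0000000-0xe01fffff is. The "error updating" lines print the value written followed by the value read back. An all-ones readback is what a config read commonly returns when nothing answers. Below is a small userspace sketch of both checks; `struct mem_range`, `range_contains()` and `looks_like_no_response()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed by the PCI core: [mem start-end]. */
struct mem_range {
    uint64_t start;
    uint64_t end;
};

/* True if 'inner' lies entirely within 'outer'. */
bool range_contains(struct mem_range outer, struct mem_range inner)
{
    return inner.start >= outer.start && inner.end <= outer.end;
}

/* A 32-bit config read returning all ones usually means nobody answered
 * (for example, the link is down); it is a heuristic, not proof. */
bool looks_like_no_response(uint32_t readback)
{
    return readback == 0xffffffffu;
}
```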
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Andrew Lunn @ 2017-01-12 15:24 UTC
To: Bjorn Helgaas
Cc: Uwe Kleine-König, Thomas Petazzoni, linux-pci, linux-arm-kernel, Russell King

>   pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
>   pci 0000:02:00.0: reg 0x10: [mem 0xe8000000-0xe81fffff 64bit]
>   pci 0000:02:00.0: reg 0x30: [mem 0xe8200000-0xe820ffff pref]
>   pci 0000:02:00.0: of_irq_parse_pci() failed with rc=134
>   pci 0000:02:00.0: BAR 0: assigned [mem 0xe0000000-0xe01fffff 64bit]
>   pci 0000:02:00.0: BAR 0: error updating (0xe0000004 != 0xffffffff)
>   pci 0000:02:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>
>   3) No idea what the of_irq_parse_pci() issue is.

134 is 0x86. Could it be:

#define PCIBIOS_DEVICE_NOT_FOUND        0x86

pci-mvebu.c will return this in a few places: mvebu_pcie_wr_conf() and
mvebu_pcie_rd_conf().

Could it come from this call?

        rc = pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);

It looks like pci_read_config_byte() is expected to return a real errno
value, and maybe it is returning PCIBIOS_DEVICE_NOT_FOUND?

        Andrew
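Andrew's arithmetic checks out: 134 decimal is 0x86, the value of `PCIBIOS_DEVICE_NOT_FOUND`. Config accessors such as `pci_read_config_byte()` can return these positive `PCIBIOS_*` codes. Recent kernels provide `pcibios_err_to_errno()` in `include/linux/pci.h` to turn them into negative errno values. The userspace sketch below mirrors that mapping. The constants are reproduced as assumptions and should be checked against the tree in use; `pcibios_to_errno()` is a hypothetical local name.

```c
#include <errno.h>

/* PCIBIOS_* status codes as defined in include/linux/pci.h
 * (values assumed here; verify against your kernel tree). */
#define PCIBIOS_SUCCESSFUL          0x00
#define PCIBIOS_FUNC_NOT_SUPPORTED  0x81
#define PCIBIOS_BAD_VENDOR_ID       0x83
#define PCIBIOS_DEVICE_NOT_FOUND    0x86
#define PCIBIOS_BAD_REGISTER_NUMBER 0x87
#define PCIBIOS_SET_FAILED          0x88
#define PCIBIOS_BUFFER_TOO_SMALL    0x89

/* Userspace mirror of the kernel's pcibios_err_to_errno() mapping. */
int pcibios_to_errno(int err)
{
    if (err <= PCIBIOS_SUCCESSFUL)
        return err; /* 0, or already a negative errno */

    switch (err) {
    case PCIBIOS_FUNC_NOT_SUPPORTED:  return -ENOENT;
    case PCIBIOS_BAD_VENDOR_ID:       return -ENOTTY;
    case PCIBIOS_DEVICE_NOT_FOUND:    return -ENODEV;
    case PCIBIOS_BAD_REGISTER_NUMBER: return -EFAULT;
    case PCIBIOS_SET_FAILED:          return -EIO;
    case PCIBIOS_BUFFER_TOO_SMALL:    return -ENOSPC;
    }
    return -ERANGE; /* unknown PCIBIOS code */
}
```

With this mapping, the `rc=134` from the log would surface as `-ENODEV` instead of a bare positive number.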
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Bjorn Helgaas @ 2017-01-17 15:14 UTC
To: Uwe Kleine-König
Cc: Thomas Petazzoni, linux-pci, Andrew Lunn, linux-arm-kernel, Russell King

On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> Hello,
>
> on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> is enabled:
>
> # dmesg | grep ath
> [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
>
> if however PCIEASPM is off, the driver probes correctly and the ath10k
> adapter works fine.
>
> I wonder if someone has an idea what needs to be done to fix this
> problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> for a distribution kernel where I think PCIEASPM=y is sensible in
> general.)

Can somebody confirm that this system (Marvell Armada 385-based Turris
Omnia) does actually support ASPM in hardware?
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Russell King - ARM Linux @ 2017-01-17 15:25 UTC
To: Bjorn Helgaas
Cc: Uwe Kleine-König, linux-pci, Thomas Petazzoni, linux-arm-kernel, Andrew Lunn

On Tue, Jan 17, 2017 at 09:14:44AM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> > Hello,
> >
> > on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> > ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> > is enabled:
> >
> > # dmesg | grep ath
> > [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> > [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> >
> > if however PCIEASPM is off, the driver probes correctly and the ath10k
> > adapter works fine.
> >
> > I wonder if someone has an idea what needs to be done to fix this
> > problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> > for a distribution kernel where I think PCIEASPM=y is sensible in
> > general.)
>
> Can somebody confirm that this system (Marvell Armada 385-based Turris
> Omnia) does actually support ASPM in hardware?

What sort of "hardware" are you referring to?

From my reading of the specs, ASPM doesn't require any external hardware.
It's all done inside the PCIe root hub and PCIe device.

The PCIe spec specifically prohibits cutting power supplies and clocks to
PCIe devices during L0s and L1, with the exception that the PCIe clock may
be stopped in L1 if CLKREQ# is deasserted.  CLKREQ# handling generally
requires GPIO usage, and as there's no support for that, there's no support
for stopping the PCIe clock in L1.  We do the correct thing there,
preventing the PCI_EXP_LNKCTL_CLKREQ_EN bit being set.

That all said, it would probably be a good idea to throw some printk()
debugging into mvebu_sw_pci_bridge_write() and mvebu_sw_pci_bridge_read()
so we can see what's going on at that level, and maybe also some debug
in mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf() so we can see
what's happening at the PCIe device too.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
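Russell's CLKREQ# point comes down to a single bit in the Link Control register. The following userspace sketch shows what filtering that bit out of writes to an emulated root port's Link Control register could look like. `filter_lnkctl_write()` is hypothetical and is not the pci-mvebu code. The bit values follow the PCIe Link Control layout, which the kernel exposes as `PCI_EXP_LNKCTL_*`.

```c
#include <stdint.h>

/* PCIe Link Control register bits (same values as the kernel's
 * PCI_EXP_LNKCTL_* constants). */
#define LNKCTL_ASPMC     0x0003 /* ASPM Control: L0s and/or L1 */
#define LNKCTL_RL        0x0020 /* Retrain Link */
#define LNKCTL_CCC       0x0040 /* Common Clock Configuration */
#define LNKCTL_CLKREQ_EN 0x0100 /* Enable Clock Power Management (CLKREQ#) */

/* Hypothetical write filter for an emulated root port: with no CLKREQ#
 * wiring, never let software enable clock power management, but pass
 * every other bit (ASPM control, common clock, ...) through unchanged. */
uint16_t filter_lnkctl_write(uint16_t val)
{
    return (uint16_t)(val & ~LNKCTL_CLKREQ_EN);
}
```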
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Bjorn Helgaas @ 2017-01-17 17:46 UTC
To: Russell King - ARM Linux
Cc: Thomas Petazzoni, linux-pci, linux-arm-kernel, Uwe Kleine-König, Andrew Lunn

On Tue, Jan 17, 2017 at 03:25:49PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 09:14:44AM -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> > > Hello,
> > >
> > > on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> > > ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> > > is enabled:
> > >
> > > # dmesg | grep ath
> > > [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> > > [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > > [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > >
> > > if however PCIEASPM is off, the driver probes correctly and the ath10k
> > > adapter works fine.
> > >
> > > I wonder if someone has an idea what needs to be done to fix this
> > > problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> > > for a distribution kernel where I think PCIEASPM=y is sensible in
> > > general.)
> >
> > Can somebody confirm that this system (Marvell Armada 385-based Turris
> > Omnia) does actually support ASPM in hardware?
>
> What sort of "hardware" are you referring to?
>
> From my reading of the specs, ASPM doesn't require any external hardware.
> It's all done inside the PCIe root hub and PCIe device.

Right.  My question is just whether we know that the Marvell PCIe root
hub hardware works correctly with respect to ASPM.  The PCI core isn't
doing anything special for Marvell, so problems here are likely to be
either in the Marvell hardware or in the pci-mvebu.c driver.

> The PCIe spec specifically prohibits cutting power supplies and clocks to
> PCIe devices during L0s and L1, with the exception that the PCIe clock may
> be stopped in L1 if CLKREQ# is deasserted.  CLKREQ# handling generally
> requires GPIO usage, and as there's no support for that, there's no support
> for stopping the PCIe clock in L1.  We do the correct thing there,
> preventing the PCI_EXP_LNKCTL_CLKREQ_EN bit being set.
>
> That all said, it would probably be a good idea to throw some printk()
> debugging into mvebu_sw_pci_bridge_write() and mvebu_sw_pci_bridge_read()
> so we can see what's going on at that level, and maybe also some debug
> in mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf() so we can see
> what's happening at the PCIe device too.

Uwe has already done that; the dmesg logs including this
instrumentation are at
https://bugzilla.kernel.org/show_bug.cgi?id=192441
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Russell King - ARM Linux @ 2017-01-17 17:51 UTC
To: Bjorn Helgaas
Cc: Uwe Kleine-König, linux-pci, Thomas Petazzoni, linux-arm-kernel, Andrew Lunn

On Tue, Jan 17, 2017 at 11:46:49AM -0600, Bjorn Helgaas wrote:
> On Tue, Jan 17, 2017 at 03:25:49PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Jan 17, 2017 at 09:14:44AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Jan 11, 2017 at 08:49:46PM +0100, Uwe Kleine-König wrote:
> > > > Hello,
> > > >
> > > > on a Marvell Armada 385 based machine (Turris Omnia) with 4.9 the
> > > > ath10k driver fails to bind to the matching hardware if CONFIG_PCIEASPM
> > > > is enabled:
> > > >
> > > > # dmesg | grep ath
> > > > [    7.207770] ath10k_pci 0000:02:00.0: Refused to change power state, currently in D3
> > > > [    7.237955] ath10k_pci 0000:02:00.0: failed to wake up device : -110
> > > > [    7.238146] ath10k_pci: probe of 0000:02:00.0 failed with error -110
> > > >
> > > > if however PCIEASPM is off, the driver probes correctly and the ath10k
> > > > adapter works fine.
> > > >
> > > > I wonder if someone has an idea what needs to be done to fix this
> > > > problem. (OK, I could disable PCIEASPM, but I'd like to have a solution
> > > > for a distribution kernel where I think PCIEASPM=y is sensible in
> > > > general.)
> > >
> > > Can somebody confirm that this system (Marvell Armada 385-based Turris
> > > Omnia) does actually support ASPM in hardware?
> >
> > What sort of "hardware" are you referring to?
> >
> > From my reading of the specs, ASPM doesn't require any external hardware.
> > It's all done inside the PCIe root hub and PCIe device.
>
> Right.  My question is just whether we know that the Marvell PCIe root
> hub hardware works correctly with respect to ASPM.  The PCI core isn't
> doing anything special for Marvell, so problems here are likely to be
> either in the Marvell hardware or in the pci-mvebu.c driver.
>
> > The PCIe spec specifically prohibits cutting power supplies and clocks to
> > PCIe devices during L0s and L1, with the exception that the PCIe clock may
> > be stopped in L1 if CLKREQ# is deasserted.  CLKREQ# handling generally
> > requires GPIO usage, and as there's no support for that, there's no support
> > for stopping the PCIe clock in L1.  We do the correct thing there,
> > preventing the PCI_EXP_LNKCTL_CLKREQ_EN bit being set.
> >
> > That all said, it would probably be a good idea to throw some printk()
> > debugging into mvebu_sw_pci_bridge_write() and mvebu_sw_pci_bridge_read()
> > so we can see what's going on at that level, and maybe also some debug
> > in mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf() so we can see
> > what's happening at the PCIe device too.
>
> Uwe has already done that; the dmesg logs including this
> instrumentation are at
> https://bugzilla.kernel.org/show_bug.cgi?id=192441

Grr, <swears about SSL incompatibilities>... wget's the URL and then
uses elinks on it...

Umm, not quite.  He's done mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf()
but not the bridge from the descriptions given on the attachments.
Obviously, it's going to be a lot of work to manufacture the links to
look at each attachment to thoroughly check, so I'm not going to do
that given quite how broken SSL crap is today.

(Try installing elinks and pointing it at the above URL.)
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Russell King - ARM Linux @ 2017-01-17 17:57 UTC
To: Bjorn Helgaas
Cc: Thomas Petazzoni, linux-pci, linux-arm-kernel, Uwe Kleine-König, Andrew Lunn

On Tue, Jan 17, 2017 at 05:51:16PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 11:46:49AM -0600, Bjorn Helgaas wrote:
> > Uwe has already done that; the dmesg logs including this
> > instrumentation are at
> > https://bugzilla.kernel.org/show_bug.cgi?id=192441
>
> Grr, <swears about SSL incompatibilities>... wget's the URL and then
> uses elinks on it...
>
> Umm, not quite.  He's done mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf()
> but not the bridge from the descriptions given on the attachments.
> Obviously, it's going to be a lot of work to manufacture the links to
> look at each attachment to thoroughly check, so I'm not going to do
> that given quite how broken SSL crap is today.
>
> (Try installing elinks and pointing it at the above URL.)

Oh, and looking at some of the debug that's been added:

[    3.646322] mvebu_pcie_rd_conf(where=16, size=4, val=3892314116) => 0
[    3.646325] mvebu_pcie_wr_conf(where=16, size=4, val=4294967295)
[    3.646329] mvebu_pcie_rd_conf(where=16, size=4, val=4292870148) => 0
[    3.646332] mvebu_pcie_wr_conf(where=16, size=4, val=3892314116)

Please print register values in HEX, not decimal.  Same for register
addresses.  Hex is the normal base to print this information, which
the human brain can easily comprehend and translate to bits in a
register.  Decimal values are useless and might as well be encrypted.
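Converted to hex, Russell's four log lines match the usual BAR sizing pattern on BAR 0 (config offset 16 = 0x10):

- read the current value 0xe8000004 (3892314116),
- write 0xffffffff (4294967295),
- read back 0xffe00004 (4292870148),
- restore 0xe8000004.

That readback encodes a 2 MiB, 64-bit memory BAR, consistent with `[mem 0xe8000000-0xe81fffff 64bit]` earlier in the thread. The following is a hypothetical userspace sketch of the decoding. In printk(), a format such as `%#010x` produces the hex output Russell asks for.

```c
#include <stdint.h>

/* Size of a memory BAR from the value read back after writing all ones.
 * The low 4 bits are flag bits, not address bits. For a 64-bit BAR this
 * assumes the upper dword also reads back all ones (size below 4 GiB). */
uint32_t mem_bar_size(uint32_t sized)
{
    uint32_t mask = sized & ~0xfu;
    return ~mask + 1u;
}

/* Memory BAR type field (bits 2:1) == 0b10 marks a 64-bit BAR. */
int mem_bar_is_64bit(uint32_t bar)
{
    return ((bar >> 1) & 0x3u) == 0x2u;
}
```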
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Bjorn Helgaas @ 2017-01-17 18:14 UTC
To: Russell King - ARM Linux
Cc: Thomas Petazzoni, linux-pci, Uwe Kleine-König, linux-arm-kernel, Andrew Lunn

On Tue, Jan 17, 2017 at 05:57:28PM +0000, Russell King - ARM Linux wrote:
> On Tue, Jan 17, 2017 at 05:51:16PM +0000, Russell King - ARM Linux wrote:
> > On Tue, Jan 17, 2017 at 11:46:49AM -0600, Bjorn Helgaas wrote:
> > > Uwe has already done that; the dmesg logs including this
> > > instrumentation are at
> > > https://bugzilla.kernel.org/show_bug.cgi?id=192441
> >
> > Grr, <swears about SSL incompatibilities>... wget's the URL and then
> > uses elinks on it...
> >
> > Umm, not quite.  He's done mvebu_pcie_hw_wr_conf() and mvebu_pcie_hw_rd_conf()
> > but not the bridge from the descriptions given on the attachments.
> > Obviously, it's going to be a lot of work to manufacture the links to
> > look at each attachment to thoroughly check, so I'm not going to do
> > that given quite how broken SSL crap is today.
> >
> > (Try installing elinks and pointing it at the above URL.)
>
> Oh, and looking at some of the debug that's been added:
>
> [    3.646322] mvebu_pcie_rd_conf(where=16, size=4, val=3892314116) => 0
> [    3.646325] mvebu_pcie_wr_conf(where=16, size=4, val=4294967295)
> [    3.646329] mvebu_pcie_rd_conf(where=16, size=4, val=4292870148) => 0
> [    3.646332] mvebu_pcie_wr_conf(where=16, size=4, val=3892314116)
>
> Please print register values in HEX, not decimal.  Same for register
> addresses.  Hex is the normal base to print this information, which
> the human brain can easily comprehend and translate to bits in a
> register.  Decimal values are useless and might as well be encrypted.

The instrumentation has evolved a bit since then.  Latest is below (could
still use improvement, but it does address your suggestions above):

  https://bugzilla.kernel.org/attachment.cgi?id=251691 (CONFIG_PCIEASPM=y)
  https://bugzilla.kernel.org/attachment.cgi?id=251701 (CONFIG_PCIEASPM not set)
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Russell King - ARM Linux @ 2017-01-17 19:34 UTC
To: Bjorn Helgaas
Cc: Thomas Petazzoni, linux-pci, linux-arm-kernel, Uwe Kleine-König, Andrew Lunn

On Tue, Jan 17, 2017 at 12:14:58PM -0600, Bjorn Helgaas wrote:
> The instrumentation has evolved a bit since then.  Latest is below (could
> still use improvement, but it does address your suggestions above):
>
>   https://bugzilla.kernel.org/attachment.cgi?id=251691 (CONFIG_PCIEASPM=y)
>   https://bugzilla.kernel.org/attachment.cgi?id=251701 (CONFIG_PCIEASPM not set)

Thanks.  The point at which things die is when we request a link
retrain - I've augmented the trace with the register names:

pci 0000:02:00.0: rd where=0x074 size=4 val=0x8dc1 (hw)       EXP_DEVCAP

pcie_aspm_configure_common_clock():
pci 0000:02:00.0: rd where=0x082 size=2 val=0x1011 (hw)       EXP_LNKSTA
pci 0000:??:??.?: rd where=0x052 size=2 val=0x1011 (sw)       EXP_LNKSTA
pci 0000:02:00.0: rd where=0x080 size=2 val=0x0 (hw)          EXP_LNKCTL
pci 0000:02:00.0: wr where=0x080 size=2 val=0x40 (hw)         EXP_LNKCTL

Enables common clock configuration on the device.

pci 0000:??:??.?: rd where=0x050 size=2 val=0x40 (sw)         EXP_LNKCTL
pci 0000:??:??.?: wr where=0x050 size=2 val=0x40 (sw)         EXP_LNKCTL

Common clock configuration is already enabled on the root.

pci 0000:??:??.?: rd where=0x050 size=4 val=0x10110040 (sw)   EXP_LNKCTL
pci 0000:??:??.?: wr where=0x050 size=2 val=0x60 (sw)         EXP_LNKCTL

Here we request the retrain, setting bit 5 in the link control register.

pci 0000:??:??.?: rd where=0x050 size=4 val=0x110040 (sw)     EXP_LNKCTL
pci 0000:??:??.?: rd where=0x052 size=2 val=0x811 (sw)        EXP_LNKSTA
pci 0000:??:??.?: rd where=0x052 size=2 val=0x811 (sw)        EXP_LNKSTA

Waiting for the link training bit to clear...

pci 0000:??:??.?: rd where=0x052 size=2 val=0x11 (sw)         EXP_LNKSTA

and it's cleared here - but note that the link is still down.

pci 0000:??:??.?: rd where=0x04c size=4 val=0x3ac12 (sw)      EXP_LNKCAP
pci 0000:??:??.?: rd where=0x050 size=2 val=0x40 (sw)         EXP_LNKCTL

pcie_get_aspm_reg() for the root.

pci 0000:02:00.0: rd where=0x07c size=4 val=0xffffffff (no link)

pcie_get_aspm_reg() for the device (fails).

So, I think the question is... why does asking for a retrain cause the
link to fail and never recover?

Uwe, can you try:

        setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
                0x50.w=0x60

and see whether it remains alive (you can check by reading the root
register 0x52.w - bit 12 should be set once bit 11 clears again).

If that's successful, maybe setting the common clock bit on the PCIe
device is what's causing the problem, in which case:

        setpci -s 02:00.0 0x80.w=0x40
        setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
                0x50.w=0x60

would, I imagine, cause the link to go down.

So, the question this gives us is why the common clock setup is not
working on your platform.  Maybe we need to source the SLC bit in the
link status from DT, though I'd like to understand better what's going
on here first.
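Decoding the register values makes the trace easier to follow. A 32-bit read at offset 0x50 returns Link Control in the low half and Link Status in the high half. So 0x10110040 is LNKCTL 0x0040 (Common Clock Configuration) plus LNKSTA 0x1011. The write of 0x60 is Common Clock Configuration (0x40) plus Retrain Link (0x20). The helpers below are a hypothetical userspace sketch built on the PCIe Link Status bit layout, whose values match the kernel's `PCI_EXP_LNKSTA_*`.

```c
#include <stdint.h>

/* PCIe Link Status bits (same values as the kernel's PCI_EXP_LNKSTA_*). */
#define LNKSTA_CLS 0x000f /* Current Link Speed: 1 = 2.5 GT/s, 2 = 5 GT/s */
#define LNKSTA_NLW 0x03f0 /* Negotiated Link Width, bits 9:4 */
#define LNKSTA_LT  0x0800 /* Link Training */
#define LNKSTA_SLC 0x1000 /* Slot Clock Configuration */

struct lnksta_info {
    unsigned speed;  /* encoded Current Link Speed */
    unsigned width;  /* negotiated number of lanes */
    int training;    /* Link Training bit */
    int slot_clock;  /* Slot Clock Configuration bit */
};

/* A 32-bit read at the Link Control offset returns Link Status in the
 * upper 16 bits. */
uint16_t lnksta_from_dword(uint32_t dw)
{
    return (uint16_t)(dw >> 16);
}

struct lnksta_info decode_lnksta(uint16_t v)
{
    struct lnksta_info info;
    info.speed = v & LNKSTA_CLS;
    info.width = (v & LNKSTA_NLW) >> 4;
    info.training = (v & LNKSTA_LT) != 0;
    info.slot_clock = (v & LNKSTA_SLC) != 0;
    return info;
}
```

Applied to the root port values in the trace:

- 0x1011 decodes as 2.5 GT/s x1 with Slot Clock Configuration set.
- 0x811 has the same speed and width, with Link Training set.
- 0x11 shows training finished, but the Slot Clock Configuration bit is no longer reported.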
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
From: Russell King - ARM Linux @ 2017-01-17 21:02 UTC
To: Bjorn Helgaas
Cc: Thomas Petazzoni, linux-pci, Uwe Kleine-König, linux-arm-kernel, Andrew Lunn

On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
> Uwe, can you try:
>
>         setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
>                 0x50.w=0x60
>
> and see whether it remains alive (you can check by reading the root
> register 0x52.w - bit 12 should be set once bit 11 clears again).

For reference, this I got wrong...

0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).

> If that's successful, maybe setting the common clock bit on the PCIe
> device is what's causing the problem, in which case:
>
>         setpci -s 02:00.0 0x80.w=0x40
>         setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
>                 0x50.w=0x60

Having worked with Uwe over IRC, it seems that any request to retrain
causes the link to go down, either with or without the common clock bit
set:

# setpci -s 2.0 0x50.w=0x60
# setpci -s 2.0 0x52.w
0011
# memtool md 0xf1041a04+4
f1041a04: 00010201
... reboot ...
# setpci -s 2.0 0x50.w=0x20
# memtool md 0xf1041a04+4
f1041a04: 00010201

which doesn't point towards ASPM itself, but the problem is caused by
a side effect of ASPM's setup code which always triggers a retrain.

Bit 5 in that register is documented (at least in the Armada 370 docs
and Armada XP docs I have) as:

  5   RetrnLnk   RW   Retrain Link
                 0x0  This bit forces the device to initiate link retraining.
                      Always returns 0 when read.
                      NOTE: If configured as an Endpoint, this field is
                      reserved and has no effect.

Bjorn, are you aware of similar situations where a request for the PCIe
link to be retrained causes it to fail?

Here, on my Armada 388, I can request a link retrain with or without the
common clock bit set and everything's happy (this is with an ASM1062
SATA mini-PCIe card):

root@clearfog21:~# setpci -s 2.0 0x50.w=0x60
root@clearfog21:~# setpci -s 2.0 0x52.w
0012
root@clearfog21:~# /shared/bin/devmem2 0xf1041a04
Value at address 0xf1041a04: 0x00010100
root@clearfog21:~# setpci -s 2.0 0x50.w=0x20
root@clearfog21:~# setpci -s 2.0 0x52.w
0012
root@clearfog21:~# /shared/bin/devmem2 0xf1041a04
Value at address 0xf1041a04: 0x00010100

One curious thing I have noticed on Armada 388 is this behaviour:

root@clearfog21:~# setpci -s 2.0 0x50.l=0xffff0040 0x50.l 0x50.l=0x0fff0040 0x50.l
10120040
00120040

bit 28 is writable, which goes against the 370/XP docs:

  28  SltClkCfg  RO   Slot Clock Configuration
                 0x1  0 = Independent: The device uses an independent clock,
                      irrespective of the presence of a reference clock on
                      the connector.
                      1 = Reference: The device uses the reference clock
                      that the platform provides.

It seems that this bit is _not_ read-only.
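Here are two quick decodes of the numbers above, as a hypothetical userspace sketch.

- Link state: following Russell's description in this thread, bit 0 of the register at 0xf1041a04 reads 1 when the link is down. Uwe's 0x00010201 therefore means link down, and Russell's 0x00010100 means link up.
- The writable bit: XOR-ing the two Armada 388 read-backs isolates the bit that changed. That is bit 28 of the dword at 0x50, which is bit 12 (Slot Clock Configuration) of Link Status.

The helper names are illustrative only.

```c
#include <stdint.h>

/* Per Russell's description in this thread (not taken from a datasheet):
 * bit 0 of the status register at 0xf1041a04 is 1 when the link is down. */
#define ARMADA_STAT_LINK_DOWN 0x1u

/* Hypothetical helper: nonzero if the status value reports link up. */
int armada_link_up(uint32_t stat)
{
    return (stat & ARMADA_STAT_LINK_DOWN) == 0;
}

/* Bits that differ between two read-backs of the same register. */
uint32_t changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

/* Map a bit of the 32-bit dword at the Link Control offset to its
 * position in Link Status (the upper 16 bits); -1 if it is in LNKCTL. */
int lnksta_bit_from_dword_bit(int dword_bit)
{
    return dword_bit >= 16 ? dword_bit - 16 : -1;
}
```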
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine 2017-01-17 21:02 ` Russell King - ARM Linux @ 2017-01-17 22:22 ` Bjorn Helgaas 2017-01-17 23:37 ` David Daney 0 siblings, 1 reply; 18+ messages in thread From: Bjorn Helgaas @ 2017-01-17 22:22 UTC (permalink / raw) To: Russell King - ARM Linux Cc: Thomas Petazzoni, Andrew Lunn, Uwe Kleine-König, David Daney, linux-pci, linux-arm-kernel [+cc David] On Tue, Jan 17, 2017 at 09:02:58PM +0000, Russell King - ARM Linux wrote: > On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote: > > Uwe, can you try: > > > > setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \ > > 0x50.w=0x60 > > > > and see whether it remains alive (you can check by reading the root > > register 0x52.w - bit 12 should be set once bit 11 clears again. > > For reference, this I got wrong... > > 0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down). > > > If that's successful, maybe setting the common clock bit on the PCIe > > device is what's causing the problem, in which case: > > > > setpci -s 02:00.0 0x80.w=0x40 > > setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \ > > 0x50.w=0x60 > > Having worked with Uwe over IRC, it seems that any request to retrain > causes the link to go down, either with or without the common clock bit > set: > > # setpci -s 2.0 0x50.w=0x60 > # setpci -s 2.0 0x52.w > 0011 > # memtool md 0xf1041a04+4 > f1041a04: 00010201 > ... reboot ... > # setpci -s 2.0 0x50.w=0x20 > # memtool md 0xf1041a04+4 > f1041a04: 00010201 > > which doesn't point towards ASPM itself, but the problem is caused by > a side effect of ASPM's setup code which always triggers a retrain. > > Bit 5 in that register is documented (at least in the Armada 370 docs > and Armada XP docs I have) as: > > 5 RetrnLnk RW Retrain Link > 0x0 This bit forces the device to initiate link retraining. > Always returns 0 when read. 
>                        NOTE: If configured as an Endpoint, this field is
>                        reserved and has no effect.
>
> Bjorn, are you aware of similar situations where a request for the PCIe
> link to be retrained causes it to fail?

The only one that comes to mind is this patch from David (CC'd) that
avoids ASPM-related retrains when we know the link doesn't support ASPM:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3

Side note: it looks like we don't use the recommended retrain
algorithm in the implementation note about avoiding race conditions in
PCIe r3.0, sec 7.8.7.
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 22:22 ` Bjorn Helgaas
@ 2017-01-17 23:37 ` David Daney
  2017-01-18 14:22 ` Bjorn Helgaas
  0 siblings, 1 reply; 18+ messages in thread
From: David Daney @ 2017-01-17 23:37 UTC
To: Bjorn Helgaas, Russell King - ARM Linux
Cc: Thomas Petazzoni, Andrew Lunn, Uwe Kleine-König, David Daney,
    linux-pci, linux-arm-kernel

On 01/17/2017 02:22 PM, Bjorn Helgaas wrote:
> [+cc David]
>
> On Tue, Jan 17, 2017 at 09:02:58PM +0000, Russell King - ARM Linux wrote:
>> On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
>>> Uwe, can you try:
>>>
>>> setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
>>>     0x50.w=0x60
>>>
>>> and see whether it remains alive (you can check by reading the root
>>> register 0x52.w - bit 12 should be set once bit 11 clears again.
>>
>> For reference, this I got wrong...
>>
>> 0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).
>>
>>> If that's successful, maybe setting the common clock bit on the PCIe
>>> device is what's causing the problem, in which case:
>>>
>>> setpci -s 02:00.0 0x80.w=0x40
>>> setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
>>>     0x50.w=0x60
>>
>> Having worked with Uwe over IRC, it seems that any request to retrain
>> causes the link to go down, either with or without the common clock bit
>> set:
>>
>> # setpci -s 2.0 0x50.w=0x60
>> # setpci -s 2.0 0x52.w
>> 0011
>> # memtool md 0xf1041a04+4
>> f1041a04: 00010201
>> ... reboot ...
>> # setpci -s 2.0 0x50.w=0x20
>> # memtool md 0xf1041a04+4
>> f1041a04: 00010201
>>
>> which doesn't point towards ASPM itself, but the problem is caused by
>> a side effect of ASPM's setup code which always triggers a retrain.
>>
>> Bit 5 in that register is documented (at least in the Armada 370 docs
>> and Armada XP docs I have) as:
>>
>> 5    RetrnLnk    RW    Retrain Link
>>                  0x0   This bit forces the device to initiate link retraining.
>>                        Always returns 0 when read.
>>                        NOTE: If configured as an Endpoint, this field is
>>                        reserved and has no effect.
>>
>> Bjorn, are you aware of similar situations where a request for the PCIe
>> link to be retrained causes it to fail?

Link (re)training can fail for several reasons including, but not
limited to:

- Poor signal propagation through the
  chips/packages/boards/connectors, also known as Signal Integrity
  (SI) problems.

- Incorrect implementation, in hardware, of link training protocols
  at either end of the link

Usually, system and PCIe device vendors do a lot of testing and
signal analysis across a variety of configurations with the end goal
being that PCIe looks like a bullet-proof interconnect to the end
consumer.

Unfortunately, sometimes it doesn't work. In these cases, the
vendors of the devices on each end of the link tend to point fingers
at the link partner for being defective in some way.

This patch:

>
> The only one that comes to mind is this patch from David (CC'd) that
> avoids ASPM-related retrains when we know the link doesn't support ASPM:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3
>

Is an attempt to work around the problem from the system (host) end.
If the system vendor knows a priori that a defective PCIe device is
present in the system, the PCIe root port can be configured to
indicate no ASPM is supported, resulting (with the patch) in no link
retraining being attempted.

To me it feels that we need a black list of devices that fail at a
high rate in the link retraining, that when encountered would
disable ASPM on the link where they reside.

Just my $0.02

David Daney

> Side note: it looks like we don't use the recommended retrain
> algorithm in the implementation note about avoiding race conditions in
> PCIe r3.0, sec 7.8.7.
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-17 23:37 ` David Daney
@ 2017-01-18 14:22 ` Bjorn Helgaas
  2017-01-18 17:36 ` David Daney
  0 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2017-01-18 14:22 UTC
To: David Daney
Cc: Thomas Petazzoni, Andrew Lunn, Uwe Kleine-König, David Daney,
    linux-pci, Russell King - ARM Linux, linux-arm-kernel

On Tue, Jan 17, 2017 at 03:37:10PM -0800, David Daney wrote:
> On 01/17/2017 02:22 PM, Bjorn Helgaas wrote:
> >[+cc David]
> >
> >On Tue, Jan 17, 2017 at 09:02:58PM +0000, Russell King - ARM Linux wrote:
> >>On Tue, Jan 17, 2017 at 07:34:14PM +0000, Russell King - ARM Linux wrote:
> >>>Uwe, can you try:
> >>>
> >>>setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> >>>    0x50.w=0x60
> >>>
> >>>and see whether it remains alive (you can check by reading the root
> >>>register 0x52.w - bit 12 should be set once bit 11 clears again.
> >>
> >>For reference, this I got wrong...
> >>
> >>0xf1041a04 bit 0 indicates link status (0 = link up, 1 = link down).
> >>
> >>>If that's successful, maybe setting the common clock bit on the PCIe
> >>>device is what's causing the problem, in which case:
> >>>
> >>>setpci -s 02:00.0 0x80.w=0x40
> >>>setpci -s <whatever-the-id-of-the-root-is-it's-blanked-out-in-the-above> \
> >>>    0x50.w=0x60
> >>
> >>Having worked with Uwe over IRC, it seems that any request to retrain
> >>causes the link to go down, either with or without the common clock bit
> >>set:
> >>
> >># setpci -s 2.0 0x50.w=0x60
> >># setpci -s 2.0 0x52.w
> >>0011
> >># memtool md 0xf1041a04+4
> >>f1041a04: 00010201
> >>... reboot ...
> >># setpci -s 2.0 0x50.w=0x20
> >># memtool md 0xf1041a04+4
> >>f1041a04: 00010201
> >>
> >>which doesn't point towards ASPM itself, but the problem is caused by
> >>a side effect of ASPM's setup code which always triggers a retrain.
> >>
> >>Bit 5 in that register is documented (at least in the Armada 370 docs
> >>and Armada XP docs I have) as:
> >>
> >>5    RetrnLnk    RW    Retrain Link
> >>                 0x0   This bit forces the device to initiate link retraining.
> >>                       Always returns 0 when read.
> >>                       NOTE: If configured as an Endpoint, this field is
> >>                       reserved and has no effect.
> >>
> >>Bjorn, are you aware of similar situations where a request for the PCIe
> >>link to be retrained causes it to fail?
>
>
> Link (re)training can fail for several reasons including, but not
> limited to:
>
> - Poor signal propagation through the
> chips/packages/boards/connectors, also known as Signal Integrity
> (SI) problems.
>
> - Incorrect implementation, in hardware, of link training protocols
> at either end of the link
>
> Usually, system and PCIe device vendors do a lot of testing and
> signal analysis across a variety of configurations with the end goal
> being that PCIe looks like a bullet-proof interconnect to the end
> consumer.
>
> Unfortunately, sometimes it doesn't work. In these cases, the
> vendors of the devices on each end of the link tend to point fingers
> at the link partner for being defective in some way.
>
> This patch:
>
> >
> >The only one that comes to mind is this patch from David (CC'd) that
> >avoids ASPM-related retrains when we know the link doesn't support ASPM:
> >http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3
> >
>
> Is an attempt to work around the problem from the system (host) end.
> If the system vendor knows a priori that a defective PCIe device is
> present in the system, the PCIe root port can be configured to
> indicate no ASPM is supported, resulting (with the patch) in no link
> retraining being attempted.
>
> To me it feels that we need a black list of devices that fail at a
> high rate in the link retraining, that when encountered would
> disable ASPM on the link where they reside.
I should have asked you for details about the defective devices
related to e53f9a28bee3 :)  If we had included that in the changelog,
we would have something to seed a blacklist with.

There are several situations other than ASPM where link retraining is
required per spec (rate change, error handling, etc.), and I guess we'd
have to avoid all of them.  So I suppose e53f9a28bee3 avoids the most
obvious failures, but maybe we could still see issues in those other
cases.

Bjorn
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-18 14:22 ` Bjorn Helgaas
@ 2017-01-18 17:36 ` David Daney
  2017-01-18 17:55 ` Russell King - ARM Linux
  0 siblings, 1 reply; 18+ messages in thread
From: David Daney @ 2017-01-18 17:36 UTC
To: Bjorn Helgaas, David Daney
Cc: Thomas Petazzoni, Andrew Lunn, Uwe Kleine-König, David Daney,
    linux-pci, Russell King - ARM Linux, linux-arm-kernel

On 01/18/2017 06:22 AM, Bjorn Helgaas wrote:
> On Tue, Jan 17, 2017 at 03:37:10PM -0800, David Daney wrote:
[...]
>>
>>
>> Link (re)training can fail for several reasons including, but not
>> limited to:
>>
>> - Poor signal propagation through the
>> chips/packages/boards/connectors, also known as Signal Integrity
>> (SI) problems.
>>
>> - Incorrect implementation, in hardware, of link training protocols
>> at either end of the link
>>
>> Usually, system and PCIe device vendors do a lot of testing and
>> signal analysis across a variety of configurations with the end goal
>> being that PCIe looks like a bullet-proof interconnect to the end
>> consumer.
>>
>> Unfortunately, sometimes it doesn't work. In these cases, the
>> vendors of the devices on each end of the link tend to point fingers
>> at the link partner for being defective in some way.
>>
>> This patch:
>>
>>>
>>> The only one that comes to mind is this patch from David (CC'd) that
>>> avoids ASPM-related retrains when we know the link doesn't support ASPM:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e53f9a28bee3
>>>
>>
>> Is an attempt to work around the problem from the system (host) end.
>> If the system vendor knows a priori that a defective PCIe device is
>> present in the system, the PCIe root port can be configured to
>> indicate no ASPM is supported, resulting (with the patch) in no link
>> retraining being attempted.
>>
>> To me it feels that we need a black list of devices that fail at a
>> high rate in the link retraining, that when encountered would
>> disable ASPM on the link where they reside.
>
> I should have asked you for details about the defective devices
> related to e53f9a28bee3 :)  If we had included that in the changelog,
> we would have something to seed a blacklist with.

The device I saw failing I don't have access to any more, so I don't
know the PCI IDs.  It was a solid-state storage device with a Xilinx
FPGA acting as the PCIe endpoint.  In any event, it would only fail in
about 0.5% of system boots; it could not be made to fail reliably.

The tricky thing here is assigning the blame for failure in link training.
In the case in question we spent many months analysing the analog properties
of the bus and examining/decoding analog scope captures of the failures
before credibly assigning blame to the other guy.  Usually what happens is
the device vendor accurately claims that their device works flawlessly in
conjunction with certain Intel root ports, so the problem must be fixed in
the root port of the failing system.  If you have a black list, you may be
disabling ASPM in systems where it can work without failures.

>
> There are several situations other than ASPM where link retraining is
> required per spec (rate change, error handling, etc), and I guess we'd
> have to avoid all of them.  So I suppose e53f9a28bee3 avoids the most
> obvious failures, but maybe we could still see issues in those other
> cases.
>
> Bjorn
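David's 0.5% figure shows why such failures are so hard to pin down. Treating each boot as an independent trial with failure probability p = 0.005 (an idealisation; real failures may well depend on temperature, board or card), the mean number of boots until the first failure is 1/p = 200, and the number of boots needed for a 95% chance of seeing at least one failure is:

```latex
P(\text{at least one failure in } n \text{ boots}) = 1 - (1 - p)^{n} \ge 0.95
\;\Longleftrightarrow\;
n \ge \frac{\ln 0.05}{\ln (1 - p)} = \frac{\ln 0.05}{\ln 0.995} \approx 597.6,
\qquad\text{so } n = 598 .
```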
* Re: CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine
  2017-01-18 17:36 ` David Daney
@ 2017-01-18 17:55 ` Russell King - ARM Linux
  0 siblings, 0 replies; 18+ messages in thread
From: Russell King - ARM Linux @ 2017-01-18 17:55 UTC
To: David Daney
Cc: Bjorn Helgaas, Thomas Petazzoni, Andrew Lunn, Uwe Kleine-König,
    David Daney, linux-pci, linux-arm-kernel

On Wed, Jan 18, 2017 at 09:36:55AM -0800, David Daney wrote:
> On 01/18/2017 06:22 AM, Bjorn Helgaas wrote:
> The tricky thing here is assigning the blame for failure in link training.
> In the case in question we spent many months analysing the analog properties
> of the bus and examining/decoding analog scope captures of the failures
> before credibly assigning blame to the other guy.  Usually what happens is
> the device vendor accurately claims that their device works flawlessly in
> conjunction with certain Intel root ports, so the problem must be fixed in
> the root port of the failing system.  If you have a black list, you may be
> disabling ASPM in systems where it can work without failures.

So what we need is not a table of just devices, but a combination of
devices... iow, "when root A and endpoint B are combined, retrains need
to be avoided."
Thread overview: 18+ messages
2017-01-11 19:49 CONFIG_PCIEASPM breaks PCIe on Marvell Armada 385 machine Uwe Kleine-König
2017-01-11 22:02 ` Bjorn Helgaas
2017-01-12 13:18 ` Uwe Kleine-König
2017-01-12 15:03 ` Bjorn Helgaas
2017-01-12 15:24 ` Andrew Lunn
2017-01-17 15:14 ` Bjorn Helgaas
2017-01-17 15:25 ` Russell King - ARM Linux
2017-01-17 17:46 ` Bjorn Helgaas
2017-01-17 17:51 ` Russell King - ARM Linux
2017-01-17 17:57 ` Russell King - ARM Linux
2017-01-17 18:14 ` Bjorn Helgaas
2017-01-17 19:34 ` Russell King - ARM Linux
2017-01-17 21:02 ` Russell King - ARM Linux
2017-01-17 22:22 ` Bjorn Helgaas
2017-01-17 23:37 ` David Daney
2017-01-18 14:22 ` Bjorn Helgaas
2017-01-18 17:36 ` David Daney
2017-01-18 17:55 ` Russell King - ARM Linux