* pciehp is broken from 4.10-rc1
@ 2017-02-03 4:11 Yinghai Lu
2017-02-03 5:52 ` Lukas Wunner
2017-02-03 15:09 ` Bjorn Helgaas
0 siblings, 2 replies; 24+ messages in thread
From: Yinghai Lu @ 2017-02-03 4:11 UTC (permalink / raw)
To: Bjorn Helgaas, Rafael J. Wysocki, Lukas Wunner; +Cc: linux-pci@vger.kernel.org
4.9 is working,
sca05-0a81e0db:~ # uname -a
Linux sca05-0a81e0db 4.9.0-yh #28 SMP Thu Feb 2 18:19:00 PST 2017
x86_64 x86_64 x86_64 GNU/Linux
sca05-0a81e0db:~ # echo 0 > /sys/bus/pci/slots/8/power
[ 130.641527] mlx4_core 0000:65:00.0: PME# disabled
[ 132.114003] iommu: Removing device 0000:65:00.0 from group 172
[ 132.133504] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command
0x11f1 (issued 70480 msec ago)
[ 132.216228] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down
[ 132.222477] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event
ignored; already powering off
sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power
[ 175.771846] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up
[ 175.777898] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event
ignored; already powering on
[ 175.956632] pci 0000:65:00.0: [15b3:1003] type 00 class 0x0c0600
[ 175.963581] pci 0000:65:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit]
[ 175.971312] pci 0000:65:00.0: reg 0x18: [mem 0x00000000-0x07ffffff
64bit pref]
[ 175.980100] pci 0000:65:00.0: calling quirk_broken_intx_masking+0x0/0x20
[ 175.987590] calling quirk_broken_intx_masking+0x0/0x20 @ 16793 for
0000:65:00.0
[ 175.995855] pci fixup quirk_broken_intx_masking+0x0/0x20 returned
after 0 usecs for 0000:65:00.0
[ 176.006876] pci 0000:65:00.0: reg 0x134: [mem 0x00000000-0x07ffffff
64bit pref]
[ 176.015045] pci 0000:65:00.0: VF(n) BAR2 space: [mem
0x00000000-0x1ffffffff 64bit pref] (contains BAR2 for 64 VFs)
[ 176.031852] iommu: Adding device 0000:65:00.0 to group 172
[ 176.038263] pci 0000:65:00.0: BAR 2: assigned [mem
0x387800000000-0x387807ffffff 64bit pref]
[ 176.047817] pci 0000:65:00.0: BAR 9: assigned [mem
0x387808000000-0x387a07ffffff 64bit pref]
[ 176.057363] pci 0000:65:00.0: BAR 0: assigned [mem
0xc0000000-0xc00fffff 64bit]
[ 176.065657] pcieport 0000:60:03.2: PCI bridge to [bus 65-67]
[ 176.071983] pcieport 0000:60:03.2: bridge window [io 0xa000-0xafff]
[ 176.079277] pcieport 0000:60:03.2: bridge window [mem
0xc0000000-0xc3ffffff]
[ 176.087348] pcieport 0000:60:03.2: bridge window [mem
0x387800000000-0x387bffffffff 64bit pref]
[ 176.097267] pcieport 0000:60:03.2: Max Payload Size set to 256/
256 (was 256), Max Read Rq 128
[ 176.107253] pci 0000:65:00.0: Max Payload Size set to 256/ 256
(was 128), Max Read Rq 512
[ 176.116910] mlx4_core: Initializing 0000:65:00.0
[ 176.122103] mlx4_core 0000:65:00.0: enabling device (0140 -> 0142)
[ 176.129142] mlx4_core 0000:65:00.0: enabling bus mastering
[ 182.909586] mlx4_core 0000:65:00.0: Old device ETS support detected
[ 182.916585] mlx4_core 0000:65:00.0: Consider upgrading device FW.
[ 183.725530] mlx4_core 0000:65:00.0: PCIe link speed is 8.0GT/s,
device supports 8.0GT/s
[ 183.734471] mlx4_core 0000:65:00.0: PCIe link width is x8, device supports x8
[ 184.073280] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
[ 184.080870] <mlx4_ib> mlx4_ib_add: counter index 1 for port 2 allocated 0
[ 184.202450] RDS/IB: mlx4_0: FMR supported and preferred
sca05-0a81e0db:~ # uname -a
Linux sca05-0a81e0db 4.10.0-rc1-yh #29 SMP Thu Feb 2 18:45:03 PST 2017
x86_64 x86_64 x86_64 GNU/Linux
sca05-0a81e0db:~ # echo 0 > /sys/bus/pci/slots/8/power
[ 141.838027] mlx4_core 0000:65:00.0: PME# disabled
[ 143.279434] iommu: Removing device 0000:65:00.0 from group 172
[ 143.292329] pcieport 0000:60:03.2: PME# enabled
[ 143.297431] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command
0x11f1 (issued 81476 msec ago)
[ 143.337545] pcieport 0000:60:03.2: PME# disabled
[ 143.380359] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down
[ 143.386735] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event
ignored; already powering off
[ 143.445483] pcieport 0000:60:03.2: PME# enabled
[ 143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up
[ 143.999004] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event
queued; currently getting powered off
[ 144.025590] pcieport 0000:60:03.2: PME# disabled
[ 144.133548] pcieport 0000:60:03.2: PME# enabled
[ 144.333603] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled
sca05-0a81e0db:~ # [ 144.357483] pcieport 0000:60:03.2: PME# disabled
[ 144.465566] pcieport 0000:60:03.2: PME# enabled
sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power
[ 221.041664] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled
After reverting
>From 68db9bc814362e7f24371c27d12a4f34477d9356 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas@wunner.de>
Date: Fri, 28 Oct 2016 10:52:06 +0200
Subject: PCI: pciehp: Add runtime PM support for PCIe hotplug ports
the hotplug work again.
Thanks
Yinghai
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: pciehp is broken from 4.10-rc1 2017-02-03 4:11 pciehp is broken from 4.10-rc1 Yinghai Lu @ 2017-02-03 5:52 ` Lukas Wunner 2017-02-04 7:00 ` Yinghai Lu 2017-02-03 15:09 ` Bjorn Helgaas 1 sibling, 1 reply; 24+ messages in thread From: Lukas Wunner @ 2017-02-03 5:52 UTC (permalink / raw) To: Yinghai Lu Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg [cc += Mika] On Thu, Feb 02, 2017 at 08:11:48PM -0800, Yinghai Lu wrote: > 4.9 is working, [...] > sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power > [ 175.771846] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up > [ 175.777898] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event > ignored; already powering on > [ 175.956632] pci 0000:65:00.0: [15b3:1003] type 00 class 0x0c0600 [...] > sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power > [ 221.041664] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled Could you check if the port above 0000:60:03.2 is runtime suspended when you're trying this, i.e. does its power/runtime_status entry in sysfs say "suspended"? Normally the hotplug slot should keep its parents awake so that config space accesses work. And the hotplug port's config space should be accessible in D3 per the spec. If you add pm_runtime_get_sync(&ctrl->pcie->port->dev) in drivers/pci/pciehp_ctrl.c:pciehp_enable_slot() before the call to pciehp_get_power_status(), and a corresponding pm_runtime_put() afterwards, does the issue go away? Thanks, Lukas ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-03 5:52 ` Lukas Wunner @ 2017-02-04 7:00 ` Yinghai Lu 2017-02-04 8:12 ` Lukas Wunner 0 siblings, 1 reply; 24+ messages in thread From: Yinghai Lu @ 2017-02-04 7:00 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg On Thu, Feb 2, 2017 at 9:52 PM, Lukas Wunner <lukas@wunner.de> wrote: > [cc += Mika] > > On Thu, Feb 02, 2017 at 08:11:48PM -0800, Yinghai Lu wrote: >> 4.9 is working, > [...] >> sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power >> [ 175.771846] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up >> [ 175.777898] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event >> ignored; already powering on >> [ 175.956632] pci 0000:65:00.0: [15b3:1003] type 00 class 0x0c0600 > [...] >> sca05-0a81e0db:~ # echo 1 > /sys/bus/pci/slots/8/power >> [ 221.041664] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled > > Could you check if the port above 0000:60:03.2 is runtime suspended > when you're trying this, i.e. does its power/runtime_status entry in > sysfs say "suspended"? yes. > > Normally the hotplug slot should keep its parents awake so that config > space accesses work. And the hotplug port's config space should be > accessible in D3 per the spec. > > If you add pm_runtime_get_sync(&ctrl->pcie->port->dev) in > drivers/pci/pciehp_ctrl.c:pciehp_enable_slot() before the call to > pciehp_get_power_status(), and a corresponding pm_runtime_put() > afterwards, does the issue go away? Still not working. the problem is sca05-0a81e0db:~ # echo 0 > /sys/bus/pci/slots/8/power [ 141.838027] mlx4_core 0000:65:00.0: PME# disabled [ 143.279434] iommu: Removing device 0000:65:00.0 from group 172 [ 143.292329] pcieport 0000:60:03.2: PME# enabled [ 143.297431] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command 0x11f1 (issued 81476 msec ago) [ 143.337545] pcieport 0000:60:03.2: PME# disabled [ 143.380359] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down [ 143.386735] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event ignored; already powering off [ 143.445483] pcieport 0000:60:03.2: PME# enabled [ 143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up [ 143.999004] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event queued; currently getting powered off [ 144.025590] pcieport 0000:60:03.2: PME# disabled [ 144.133548] pcieport 0000:60:03.2: PME# enabled [ 144.333603] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled sca05-0a81e0db:~ # [ 144.357483] pcieport 0000:60:03.2: PME# disabled [ 144.465566] pcieport 0000:60:03.2: PME# enabled we have extra Link Up event queued, while pm_runtime_get_sync/pm_runtime_put ? [ 143.445483] pcieport 0000:60:03.2: PME# enabled [ 143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up Thanks Yinghai ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-04 7:00 ` Yinghai Lu @ 2017-02-04 8:12 ` Lukas Wunner 2017-02-04 18:56 ` Lukas Wunner 0 siblings, 1 reply; 24+ messages in thread From: Lukas Wunner @ 2017-02-04 8:12 UTC (permalink / raw) To: Yinghai Lu Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg On Fri, Feb 03, 2017 at 11:00:19PM -0800, Yinghai Lu wrote: > On Thu, Feb 2, 2017 at 9:52 PM, Lukas Wunner <lukas@wunner.de> wrote: > > Could you check if the port above 0000:60:03.2 is runtime suspended > > when you're trying this, i.e. does its power/runtime_status entry in > > sysfs say "suspended"? > > yes. Huh? That shouldn't happen, the port 0000:60:03.2 should block its parents from runtime suspending (by way of checking is_hotplug_bridge in pci_dev_check_d3cold()). Maybe there's a misunderstanding here. I was referring to the port *above* 0000:60:03.2. I don't know it's device name, it's not apparent from the logs you've posted so far. I've opened a bugzilla entry for this: https://bugzilla.kernel.org/show_bug.cgi?id=193951 Please attach the output of "lspci -vvvvxxxx" and full dmesg output. > > If you add pm_runtime_get_sync(&ctrl->pcie->port->dev) in > > drivers/pci/pciehp_ctrl.c:pciehp_enable_slot() before the call to > > pciehp_get_power_status(), and a corresponding pm_runtime_put() > > afterwards, does the issue go away? > > Still not working. > > the problem is > sca05-0a81e0db:~ # echo 0 > /sys/bus/pci/slots/8/power > [ 141.838027] mlx4_core 0000:65:00.0: PME# disabled > [ 143.279434] iommu: Removing device 0000:65:00.0 from group 172 > [ 143.292329] pcieport 0000:60:03.2: PME# enabled > [ 143.297431] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command > 0x11f1 (issued 81476 msec ago) > [ 143.337545] pcieport 0000:60:03.2: PME# disabled > [ 143.380359] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down > [ 143.386735] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event > ignored; already powering off > [ 143.445483] pcieport 0000:60:03.2: PME# enabled > [ 143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up > [ 143.999004] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event > queued; currently getting powered off > [ 144.025590] pcieport 0000:60:03.2: PME# disabled > [ 144.133548] pcieport 0000:60:03.2: PME# enabled > [ 144.333603] pciehp 0000:60:03.2:pcie004: Slot(8): Already enabled > sca05-0a81e0db:~ # [ 144.357483] pcieport 0000:60:03.2: PME# disabled > [ 144.465566] pcieport 0000:60:03.2: PME# enabled > > we have extra Link Up event queued, while pm_runtime_get_sync/pm_runtime_put ? > [ 143.445483] pcieport 0000:60:03.2: PME# enabled > [ 143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up I notice that with 68db9bc81436 applied, PME is repeatedly enabled and disabled on the port, presumably whenever it switches from D3 to D0 and vice-versa. Perhaps this port sends an interrupt while PME is enabled and the slot is actually occupied, despite it having been disabled via sysfs. That's a case I couldn't test when developing the patch for lack of PME capable hardware. If you comment out the calls to __pci_enable_wake() in drivers/pci/pci.c:pci_finish_runtime_suspend(), does the issue go away? Thanks, Lukas ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-04 8:12 ` Lukas Wunner @ 2017-02-04 18:56 ` Lukas Wunner 2017-02-04 21:44 ` Yinghai Lu 0 siblings, 1 reply; 24+ messages in thread From: Lukas Wunner @ 2017-02-04 18:56 UTC (permalink / raw) To: Yinghai Lu Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: > On Fri, Feb 03, 2017 at 11:00:19PM -0800, Yinghai Lu wrote: > > we have extra Link Up event queued, while pm_runtime_get_sync/pm_runtime_put ? > > [ 143.445483] pcieport 0000:60:03.2: PME# enabled > > [ 143.992915] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up > > I notice that with 68db9bc81436 applied, PME is repeatedly enabled and > disabled on the port, presumably whenever it switches from D3 to D0 > and vice-versa. > > Perhaps this port sends an interrupt while PME is enabled and the slot > is actually occupied, despite it having been disabled via sysfs. Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, so here's a tentative patch. Thanks, Lukas -- >8 -- Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend Since commit 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe hotplug ports") we runtime suspend a hotplug port to D3hot when all its children are runtime suspended or none are present. When runtime suspending the port the PCI core automatically enables PME: pci_pm_runtime_suspend() pci_finish_runtime_suspend() __pci_enable_wake() According to the PCI Express Base Specification, section 6.7.3.4: "Note that PME and Hot-Plug Event interrupts (when both are implemented) always share the same MSI or MSI-X vector [...] If wake generation is required by the associated form factor specification, a hot-plug capable Downstream Port must support generation of a wakeup event (using the PME mechanism) on hotplug events that occur when the system is in a sleep state or the Port is in device state D1, D2, or D3Hot." Thus, if the port is runtime suspended even though it is still occupied, it may immediately be woken by a PME interrupt. One scenario where this happens is if all children of the hotplug port have runtime suspended. Another scenario is power control via sysfs: If a user manually turns the hotplug port off (e.g. to safely remove the card), PME will signal an interrupt for the still-occupied slot, which is interpreted by pciehp as re-insertion of a card. As a result, power control via sysfs is no longer possible. This was observed and reported by Yinghai Lu. PME is in fact unnecessary on hotplug ports: Hotplug can be signaled even in D3hot, and commit 68db9bc81436 ensures that all parents of the hotplug port are kept awake so that interrupts can be delivered. PME would allow us to runtime suspend the parent ports as well, but we do not make use of it because we cannot be sure if PME actually works. Thunderbolt controllers for instance advertise PME capability, but at least on Macs the PME pin is not connected. Since we do not rely on PME for hotplug ports, we may as well not enable it, thereby avoiding its negative side effects. However the present commit deliberately only avoids enabling PME on runtime suspend, the ability to enable it for system sleep is retained. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=193951 Fixes: 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe hotplug ports") Reported-by: Yinghai Lu <yinghai@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> --- drivers/pci/pci.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a881c0d..daf3c02 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2072,7 +2072,13 @@ int pci_finish_runtime_suspend(struct pci_dev *dev) dev->runtime_d3cold = target_state == PCI_D3cold; - __pci_enable_wake(dev, target_state, true, pci_dev_run_wake(dev)); + /* + * Enable PME if dev can generate wakeup events at runtime, but not on + * hotplug ports since PME signals unwanted interrupts if the slot is + * occupied (PCIe Base Specification, section 6.7.3.4). + */ + __pci_enable_wake(dev, target_state, true, pci_dev_run_wake(dev) && + !dev->is_hotplug_bridge); error = pci_set_power_state(dev, target_state); -- 2.11.0 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-04 18:56 ` Lukas Wunner @ 2017-02-04 21:44 ` Yinghai Lu 2017-02-04 23:34 ` Lukas Wunner 0 siblings, 1 reply; 24+ messages in thread From: Yinghai Lu @ 2017-02-04 21:44 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lukas@wunner.de> wrote: > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: > > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, > so here's a tentative patch. > > > -- >8 -- > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend it works: sca05-0a81e0bc:~ # echo 0 > /sys/bus/pci/slots/8/power [ 176.983229] pci_hotplug: power_write_file: power = 0 [ 176.988945] pciehp 0000:60:03.2:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1 [ 176.998482] pciehp 0000:60:03.2:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:65:00 [ 177.008215] mlx4_core 0000:65:00.0: PME# disabled [ 178.817209] iommu: Removing device 0000:65:00.0 from group 172 [ 178.829869] pcieport 0000:60:03.2: root_bridge ACPI_HANDLE ffff9059bb00ce88 : pci0000:60 [ 178.839114] pcieport 0000:60:03.2: pciehp is native [ 178.844717] pci 0000:65:00.0: freeing pci_dev info [ 178.854842] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command 0x11f1 (issued 97032 msec ago) [ 178.865199] pciehp 0000:60:03.2:pcie004: pending interrupts 0x0010 from Slot Status [ 178.873890] pciehp 0000:60:03.2:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 [ 178.937499] pciehp 0000:60:03.2:pcie004: pending interrupts 0x0100 from Slot Status [ 178.946052] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down [ 178.952422] pciehp 0000:60:03.2:pcie004: Slot(8): Link Down event ignored; already powering off [ 179.894895] pciehp 0000:60:03.2:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 [ 179.903941] pciehp 0000:60:03.2:pcie004: pending interrupts 0x0010 from Slot Status sca05-0a81e0bc:~ # sca05-0a81e0bc:~ # sca05-0a81e0bc:~ # echo 1 > /sys/bus/pci/slots/8/power [ 222.136244] pci_hotplug: power_write_file: power = 1 [ 222.141818] pciehp 0000:60:03.2:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 [ 222.151710] pciehp 0000:60:03.2:pcie004: pending interrupts 0x0010 from Slot Status [ 222.160303] pciehp 0000:60:03.2:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 [ 222.169156] pciehp 0000:60:03.2:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 [ 222.178395] pciehp 0000:60:03.2:pcie004: pending interrupts 0x0010 from Slot Status [ 222.204325] pciehp 0000:60:03.2:pcie004: pending interrupts 0x0010 from Slot Status [ 222.784102] pciehp 0000:60:03.2:pcie004: pciehp_check_link_active: lnk_status = f083 [ 222.792750] pciehp 0000:60:03.2:pcie004: pending interrupts 0x0100 from Slot Status [ 222.801297] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up [ 222.807358] pciehp 0000:60:03.2:pcie004: Slot(8): Link Up event ignored; already powering on [ 222.807409] pciehp 0000:60:03.2:pcie004: pciehp_check_link_active: lnk_status = f083 [ 222.968062] pciehp 0000:60:03.2:pcie004: pciehp_check_link_status: lnk_status = f083 [ 222.976922] pci 0000:65:00.0: [15b3:1003] type 00 class 0x0c0600 [ 222.983864] pci 0000:65:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit] [ 222.991598] pci 0000:65:00.0: reg 0x18: [mem 0x00000000-0x07ffffff 64bit pref] [ 222.999979] pci_bus 0000:65: bridge ACPI_HANDLE ffff9019bb0d0438 : 0000:60:03.2 [ 223.010044] pci 0000:65:00.0: reg 0x134: [mem 0x00000000-0x07ffffff 64bit pref] [ 223.018214] pci 0000:65:00.0: VF(n) BAR2 space: [mem 0x00000000-0x1ffffffff 64bit pref] (contains BAR2 for 64 VFs) [ 223.033412] pci 0000:65:00.0: on_all_pcie_path: 1 [ 223.040666] iommu: Adding device 0000:65:00.0 to group 172 [ 223.047042] pci 0000:65:00.0: BAR 2: assigned [mem 0x387bf8000000-0x387bffffffff 64bit pref] [ 223.056589] pci 0000:65:00.0: BAR 9: assigned [mem 0x3879f8000000-0x387bf7ffffff 64bit pref] [ 223.066068] pci 0000:65:00.0: BAR 0: [mem size 0x00100000 64bit] + pref [ 223.073464] pci 0000:65:00.0: BAR 0: assigned [mem 0x3879f7f00000-0x3879f7ffffff 64bit] [ 223.082527] pcieport 0000:60:03.2: PCI bridge to [bus 65-67] [ 223.088864] pcieport 0000:60:03.2: bridge window [io 0xa000-0xafff] [ 223.096161] pcieport 0000:60:03.2: bridge window [mem 0xc0000000-0xc3ffffff] [ 223.104229] pcieport 0000:60:03.2: bridge window [mem 0x387800000000-0x387bffffffff 64bit pref] [ 223.114152] pcieport 0000:60:03.2: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 128 [ 223.124152] pci 0000:65:00.0: Max Payload Size set to 256/ 256 (was 128), Max Read Rq 512 [ 223.133597] pci 0000:65:00.0: calling mellanox_check_broken_intx_masking+0x0/0x130 [ 223.142055] calling mellanox_check_broken_intx_masking+0x0/0x130 @ 16969 for 0000:65:00.0 [ 223.151291] pci fixup mellanox_check_broken_intx_masking+0x0/0x130 returned after 0 usecs for 0000:65:00.0 [ 223.162157] pcieport 0000:60:03.2: root_bridge ACPI_HANDLE ffff9059bb00ce88 : pci0000:60 [ 223.171395] pcieport 0000:60:03.2: pciehp is native [ 223.177183] mlx4_core: Initializing 0000:65:00.0 [ 223.182374] mlx4_core 0000:65:00.0: enabling device (0140 -> 0142) ... [ 231.714671] pciehp 0000:60:03.2:pcie004: pciehp_green_led_on: SLOTCTRL a8 write cmd 100 [ 232.728377] pciehp 0000:60:03.2:pcie004: Timeout on hotplug command 0x11f1 (issued 1016 msec ago) [ 232.738298] pciehp 0000:60:03.2:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-04 21:44 ` Yinghai Lu @ 2017-02-04 23:34 ` Lukas Wunner 2017-02-05 4:22 ` Yinghai Lu 2017-02-06 11:45 ` Rafael J. Wysocki 0 siblings, 2 replies; 24+ messages in thread From: Lukas Wunner @ 2017-02-04 23:34 UTC (permalink / raw) To: Yinghai Lu Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote: > On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lukas@wunner.de> wrote: > > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: > > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, > > so here's a tentative patch. > > > > > > -- >8 -- > > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend > > it works: Thanks a lot for the report and for testing the patch! @Rafael: Bjorn usually wants you to ack PM-related patches. Looking at the patch again I'm now thinking it might be cleaner to move the check for is_hotplug_bridge into pci_dev_run_wake(). Thoughts? Thanks, Lukas ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-04 23:34 ` Lukas Wunner @ 2017-02-05 4:22 ` Yinghai Lu 2017-02-05 5:20 ` Yinghai Lu 2017-02-05 7:34 ` Lukas Wunner 2017-02-06 11:45 ` Rafael J. Wysocki 1 sibling, 2 replies; 24+ messages in thread From: Yinghai Lu @ 2017-02-05 4:22 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg, Linux Kernel Mailing List On Sat, Feb 4, 2017 at 3:34 PM, Lukas Wunner <lukas@wunner.de> wrote: > On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote: >> On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lukas@wunner.de> wrote: >> > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: >> > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, >> > so here's a tentative patch. >> > >> > >> > -- >8 -- >> > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend >> >> it works: > > Thanks a lot for the report and for testing the patch! Wait, Commit 68db9bc still has problem with another server (skylake based), and this patch does not help. sca05-0a81fd8d:~ # echo 0 > /sys/bus/pci/slots/11/power [ 362.721197] pci_hotplug: power_write_file: power = 0 [ 362.726887] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1 [ 362.736431] pciehp 0000:b3:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:b4:00 [ 362.746160] mlx4_core 0000:b4:00.0: PME# disabled [ 364.494033] pcieport 0000:b3:00.0: root_bridge ACPI_HANDLE ffff9e56b8811550 : pci0000:b3 [ 364.503274] pcieport 0000:b3:00.0: pciehp is native [ 364.508863] pci 0000:b4:00.0: freeing pci_dev info [ 364.514718] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 364.523443] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 [ 364.587047] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0108 from Slot Status [ 364.595592] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Down [ 364.602325] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Down event ignored; already powering off [ 365.568415] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 [ 365.569338] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power [ 375.376609] pci_hotplug: power_write_file: power = 1 [ 375.382175] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 [ 375.392695] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 375.401370] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 [ 375.410231] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 [ 375.411071] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 375.445222] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 377.444400] pciehp 0000:b3:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec [ 378.960364] pci 0000:b4:00.0 id reading try 50 times with interval 20 ms to get ffffffff [ 378.969406] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001 [ 378.978059] pciehp 0000:b3:00.0:pcie004: link training error: status 0x5001 [ 378.985834] pciehp 0000:b3:00.0:pcie004: Failed to check link status [ 378.987185] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 378.987253] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 [ 380.000409] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 [ 380.000674] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 380.018020] pciehp 0000:b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 [ 380.019053] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status -bash: echo: write error: Operation not permitted revert commit 68db9bc, also make it working again. Thanks Yinghai ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-05 4:22 ` Yinghai Lu @ 2017-02-05 5:20 ` Yinghai Lu 2017-02-05 7:34 ` Lukas Wunner 1 sibling, 0 replies; 24+ messages in thread From: Yinghai Lu @ 2017-02-05 5:20 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg, Linux Kernel Mailing List On Sat, Feb 4, 2017 at 8:22 PM, Yinghai Lu <yinghai@kernel.org> wrote: > On Sat, Feb 4, 2017 at 3:34 PM, Lukas Wunner <lukas@wunner.de> wrote: >> On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote: >>> On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lukas@wunner.de> wrote: >>> > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: >>> > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, >>> > so here's a tentative patch. >>> > >>> > >>> > -- >8 -- >>> > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend >>> >>> it works: >> >> Thanks a lot for the report and for testing the patch! > > Wait, Commit 68db9bc still has problem with another server (skylake > based), and this patch does not help. > > > sca05-0a81fd8d:~ # echo 0 > /sys/bus/pci/slots/11/power > [ 362.721197] pci_hotplug: power_write_file: power = 0 > [ 362.726887] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: > SLOTCTRL a8 value read 11f1 > [ 362.736431] pciehp 0000:b3:00.0:pcie004: pciehp_unconfigure_device: > domain:bus:dev = 0000:b4:00 > [ 362.746160] mlx4_core 0000:b4:00.0: PME# disabled > [ 364.494033] pcieport 0000:b3:00.0: root_bridge ACPI_HANDLE > ffff9e56b8811550 : pci0000:b3 > [ 364.503274] pcieport 0000:b3:00.0: pciehp is native > [ 364.508863] pci 0000:b4:00.0: freeing pci_dev info > [ 364.514718] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 364.523443] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: > SLOTCTRL a8 write cmd 400 > [ 364.587047] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0108 > from Slot Status > [ 364.595592] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Down > [ 364.602325] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Down event > ignored; already powering off > [ 365.568415] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: > SLOTCTRL a8 write cmd 300 > [ 365.569338] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > > sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power > [ 375.376609] pci_hotplug: power_write_file: power = 1 > [ 375.382175] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: > SLOTCTRL a8 value read 17f1 > [ 375.392695] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 375.401370] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: > SLOTCTRL a8 write cmd 0 > [ 375.410231] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: > SLOTCTRL a8 write cmd 200 > [ 375.411071] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 375.445222] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 377.444400] pciehp 0000:b3:00.0:pcie004: Data Link Layer Link > Active not set in 1000 msec > [ 378.960364] pci 0000:b4:00.0 id reading try 50 times with interval > 20 ms to get ffffffff > [ 378.969406] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: > lnk_status = 5001 > [ 378.978059] pciehp 0000:b3:00.0:pcie004: link training error: status 0x5001 > [ 378.985834] pciehp 0000:b3:00.0:pcie004: Failed to check link status > [ 378.987185] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 378.987253] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: > SLOTCTRL a8 write cmd 400 > [ 380.000409] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: > SLOTCTRL a8 write cmd 300 > [ 380.000674] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 380.018020] pciehp 0000:b3:00.0:pcie004: > pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 > [ 380.019053] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > -bash: echo: write error: Operation not permitted > > revert commit 68db9bc, also make it working again. output after reverting 68db9bc sca05-0a81fd8d:~ # echo 0 > /sys/bus/pci/slots/11/power [ 359.966115] pci_hotplug: power_write_file: power = 0 [ 359.971759] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1 [ 359.981284] pciehp 0000:b3:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:b4:00 [ 359.991017] mlx4_core 0000:b4:00.0: PME# disabled [ 361.579571] pci 0000:b4:00.0: freeing pci_dev info [ 361.585390] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 361.594116] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 [ 361.657705] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0108 from Slot Status [ 361.666268] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Down [ 361.673076] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Down event ignored; already powering off [ 362.621894] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 [ 362.622499] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status sca05-0a81fd8d:~ # sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power [ 368.797970] pci_hotplug: power_write_file: power = 1 [ 368.803544] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 [ 368.813743] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 368.822410] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 [ 368.831280] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 [ 368.832115] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 369.455188] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 [ 369.463844] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0108 from Slot Status [ 369.465786] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 [ 369.481042] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Up [ 369.487219] pciehp 0000:b3:00.0:pcie004: Slot(11): Link Up event ignored; already powering on [ 369.573787] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: lnk_status = f083 [ 369.582664] pci 0000:b4:00.0: [15b3:1003] type 00 class 0x0c0600 [ 369.589626] pci 0000:b4:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit] [ 369.597359] pci 0000:b4:00.0: reg 0x18: [mem 0x00000000-0x07ffffff 64bit pref] [ 369.605749] pci_bus 0000:b4: bridge ACPI_HANDLE ffff9c2fb8817780 : 0000:b3:00.0 [ 369.615407] pci 0000:b4:00.0: reg 0x134: [mem 0x00000000-0x07ffffff 64bit pref] [ 369.623571] pci 0000:b4:00.0: VF(n) BAR2 space: [mem 0x00000000-0x1ffffffff 64bit pref] (contains BAR2 for 64 VFs) [ 369.638820] pci 0000:b4:00.0: on_all_pcie_path: 1 [ 369.644445] pci 0000:b4:00.0: BAR 2: assigned [mem 0x396ff8000000-0x396fffffffff 64bit pref] [ 369.654012] pci 0000:b4:00.0: BAR 9: assigned [mem 0x396df8000000-0x396ff7ffffff 64bit pref] [ 369.663489] pci 0000:b4:00.0: BAR 0: [mem size 0x00100000 64bit] + pref [ 369.670879] pci 0000:b4:00.0: BAR 0: assigned [mem 0xddf00000-0xddffffff 64bit] [ 369.679171] pcieport 0000:b3:00.0: PCI bridge to [bus b4-b7] [ 369.685495] pcieport 0000:b3:00.0: bridge window [io 0xf000-0xffff] [ 369.692791] pcieport 0000:b3:00.0: bridge window [mem 0xdd000000-0xddffffff] [ 369.700857] pcieport 0000:b3:00.0: bridge window [mem 0x396df8000000-0x396fffffffff 64bit pref] [ 369.710778] pcieport 0000:b3:00.0: Max Payload Size set to 256/ 256 (was 256), Max Read Rq 128 [ 369.720776] pci 0000:b4:00.0: Max Payload Size set to 256/ 256 (was 128), Max Read Rq 512 [ 369.730231] pci 0000:b4:00.0: calling mellanox_check_broken_intx_masking+0x0/0x130 [ 369.738691] calling mellanox_check_broken_intx_masking+0x0/0x130 @ 40613 for 0000:b4:00.0 [ 369.747913] pci fixup mellanox_check_broken_intx_masking+0x0/0x130 returned after 0 usecs for 0000:b4:00.0 [ 369.759192] mlx4_core: Initializing 0000:b4:00.0 [ 369.764398] mlx4_core 0000:b4:00.0: enabling device (0000 -> 0002) [ 369.771854] alloc irq_desc for 71 on node 5 [ 369.776904] IOAPIC[31]: Set IRTE entry (P:1 FPD:0 Dst_Mode:1 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:1 Avail:0 Vector:D7 Dest:00143FFF SID:B32C SQ:0 SVT:1) [ 369.792059] IOAPIC[24]: Set routing entry (31-0 -> 0xd7 -> IRQ 71 Mode:1 Active:1 Dest:1327103) ... [ 377.032574] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_on: SLOTCTRL a8 write cmd 100 [ 377.032802] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 377.050076] pciehp 0000:b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd c0 [ 377.050328] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-05 4:22 ` Yinghai Lu 2017-02-05 5:20 ` Yinghai Lu @ 2017-02-05 7:34 ` Lukas Wunner 2017-02-06 10:37 ` Mika Westerberg ` (2 more replies) 1 sibling, 3 replies; 24+ messages in thread From: Lukas Wunner @ 2017-02-05 7:34 UTC (permalink / raw) To: Yinghai Lu Cc: Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg, Linux Kernel Mailing List On Sat, Feb 04, 2017 at 08:22:59PM -0800, Yinghai Lu wrote: > On Sat, Feb 4, 2017 at 3:34 PM, Lukas Wunner <lukas@wunner.de> wrote: > > On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote: > >> On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lukas@wunner.de> wrote: > >> > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: > >> > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, > >> > so here's a tentative patch. > >> > > >> > -- >8 -- > >> > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend > >> > >> it works: > > > > Thanks a lot for the report and for testing the patch! > > Wait, Commit 68db9bc still has problem with another server (skylake > based), and this patch does not help. [...] > sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power > [ 375.376609] pci_hotplug: power_write_file: power = 1 > [ 375.382175] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 > [ 375.392695] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > [ 375.401370] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 > [ 375.410231] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 > [ 375.411071] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > [ 375.445222] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > [ 377.444400] pciehp 0000:b3:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec > [ 378.960364] pci 0000:b4:00.0 id reading try 50 times with interval 20 ms to get ffffffff > [ 378.969406] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001 > [ 378.978059] pciehp 0000:b3:00.0:pcie004: link training error: status 0x5001 > [ 378.985834] pciehp 0000:b3:00.0:pcie004: Failed to check link status > [ 378.987185] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > [ 378.987253] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 > [ 380.000409] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 > [ 380.000674] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > [ 380.018020] pciehp 0000:b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 > [ 380.019053] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status So on this Skylake machine link training fails after resuming from D3hot to D0. One thing that's a bit fishy is that normally the Link Disable bit is cleared when powering on the slot. This results in a debug message in dmesg containg the string "lnk_ctrl = ", and that line is missing from the output you've pasted above, suggesting that the machine is not running a stock v4.10 kernel after all but something else. Could you check why this message is not printed? Could you check with lspci if the Link Disable bit is set before you invoke "echo 1"? This is the call stack: pciehp_sysfs_enable_slot() pciehp_enable_slot() board_added() pciehp_power_on_slot() pciehp_link_enable() __pciehp_link_set() Another theory is that the link is generally unreliable on this machine since the Link Bandwidth Management Status bit is set in the Link Status Register ("lnk_status = 5001"), which according to the spec means: "Hardware has changed Link speed or width to attempt to correct unreliable Link operation, either through an LTSSM timeout or a higher level process. This bit must be set if the Physical Layer reports a speed or width change was initiated by the Downstream component that was not indicated as an autonomous change." In this case it would be good to know which hardware exactly we're dealing with so that we might quirk it to not runtime suspend the port. To that end, could you attach a full dmesg log to the bugzilla entry I've created? https://bugzilla.kernel.org/show_bug.cgi?id=193951 @Mika, Rafael: Are you aware of Skylake machines with unreliable link training, or perhaps errata of Skylake chips related to link training on hotplug ports? Thanks, Lukas ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-05 7:34 ` Lukas Wunner @ 2017-02-06 10:37 ` Mika Westerberg 2017-02-06 11:49 ` Rafael J. Wysocki 2017-02-06 21:35 ` Lukas Wunner 2017-02-06 18:10 ` Bjorn Helgaas [not found] ` <20170206204249.GA679@wunner.de> 2 siblings, 2 replies; 24+ messages in thread From: Mika Westerberg @ 2017-02-06 10:37 UTC (permalink / raw) To: Lukas Wunner Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Linux Kernel Mailing List On Sun, Feb 05, 2017 at 08:34:54AM +0100, Lukas Wunner wrote: > > sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power > > [ 375.376609] pci_hotplug: power_write_file: power = 1 > > [ 375.382175] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 > > [ 375.392695] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 375.401370] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 > > [ 375.410231] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 > > [ 375.411071] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 375.445222] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 377.444400] pciehp 0000:b3:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec > > [ 378.960364] pci 0000:b4:00.0 id reading try 50 times with interval 20 ms to get ffffffff > > [ 378.969406] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001 > > [ 378.978059] pciehp 0000:b3:00.0:pcie004: link training error: status 0x5001 > > [ 378.985834] pciehp 0000:b3:00.0:pcie004: Failed to check link status > > [ 378.987185] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 378.987253] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 > > [ 380.000409] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 > > [ 380.000674] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 380.018020] pciehp 0000:b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 > > [ 380.019053] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status It would be good to see the output when 68db9bc is reverted. Yinghai, can you attach that to the bugzilla but as well? > So on this Skylake machine link training fails after resuming from D3hot > to D0. > > One thing that's a bit fishy is that normally the Link Disable bit is > cleared when powering on the slot. This results in a debug message > in dmesg containg the string "lnk_ctrl = ", and that line is missing > from the output you've pasted above, suggesting that the machine is > not running a stock v4.10 kernel after all but something else. Could > you check why this message is not printed? Could you check with lspci > if the Link Disable bit is set before you invoke "echo 1"? > > This is the call stack: > pciehp_sysfs_enable_slot() > pciehp_enable_slot() > board_added() > pciehp_power_on_slot() > pciehp_link_enable() > __pciehp_link_set() > > Another theory is that the link is generally unreliable on this machine > since the Link Bandwidth Management Status bit is set in the Link Status > Register ("lnk_status = 5001"), which according to the spec means: > > "Hardware has changed Link speed or width to attempt to correct unreliable > Link operation, either through an LTSSM timeout or a higher level process. > This bit must be set if the Physical Layer reports a speed or width change > was initiated by the Downstream component that was not indicated as an > autonomous change." > > In this case it would be good to know which hardware exactly we're dealing > with so that we might quirk it to not runtime suspend the port. To that > end, could you attach a full dmesg log to the bugzilla entry I've created? > https://bugzilla.kernel.org/show_bug.cgi?id=193951 > > @Mika, Rafael: Are you aware of Skylake machines with unreliable link > training, or perhaps errata of Skylake chips related to link training > on hotplug ports? According to the 100-series (the chipset used with Skylake) errata below, I don't see any mentions related to PCIe link training issues. http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/100-series-chipset-spec-update.pdf ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-06 10:37 ` Mika Westerberg @ 2017-02-06 11:49 ` Rafael J. Wysocki 2017-02-06 21:35 ` Lukas Wunner 1 sibling, 0 replies; 24+ messages in thread From: Rafael J. Wysocki @ 2017-02-06 11:49 UTC (permalink / raw) To: Mika Westerberg Cc: Lukas Wunner, Yinghai Lu, Bjorn Helgaas, linux-pci@vger.kernel.org, Linux Kernel Mailing List On Monday, February 06, 2017 12:37:06 PM Mika Westerberg wrote: > On Sun, Feb 05, 2017 at 08:34:54AM +0100, Lukas Wunner wrote: > > > sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power > > > [ 375.376609] pci_hotplug: power_write_file: power = 1 > > > [ 375.382175] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 > > > [ 375.392695] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > > [ 375.401370] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 > > > [ 375.410231] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 > > > [ 375.411071] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > > [ 375.445222] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > > [ 377.444400] pciehp 0000:b3:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec > > > [ 378.960364] pci 0000:b4:00.0 id reading try 50 times with interval 20 ms to get ffffffff > > > [ 378.969406] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001 > > > [ 378.978059] pciehp 0000:b3:00.0:pcie004: link training error: status 0x5001 > > > [ 378.985834] pciehp 0000:b3:00.0:pcie004: Failed to check link status > > > [ 378.987185] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > > [ 378.987253] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 > > > [ 380.000409] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 > > > [ 380.000674] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > > [ 380.018020] pciehp 0000:b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 > > > [ 380.019053] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > It would be good to see the output when 68db9bc is reverted. Yinghai, > can you attach that to the bugzilla but as well? > > > So on this Skylake machine link training fails after resuming from D3hot > > to D0. > > > > One thing that's a bit fishy is that normally the Link Disable bit is > > cleared when powering on the slot. This results in a debug message > > in dmesg containg the string "lnk_ctrl = ", and that line is missing > > from the output you've pasted above, suggesting that the machine is > > not running a stock v4.10 kernel after all but something else. Could > > you check why this message is not printed? Could you check with lspci > > if the Link Disable bit is set before you invoke "echo 1"? > > > > This is the call stack: > > pciehp_sysfs_enable_slot() > > pciehp_enable_slot() > > board_added() > > pciehp_power_on_slot() > > pciehp_link_enable() > > __pciehp_link_set() > > > > Another theory is that the link is generally unreliable on this machine > > since the Link Bandwidth Management Status bit is set in the Link Status > > Register ("lnk_status = 5001"), which according to the spec means: > > > > "Hardware has changed Link speed or width to attempt to correct unreliable > > Link operation, either through an LTSSM timeout or a higher level process. > > This bit must be set if the Physical Layer reports a speed or width change > > was initiated by the Downstream component that was not indicated as an > > autonomous change." > > > > In this case it would be good to know which hardware exactly we're dealing > > with so that we might quirk it to not runtime suspend the port. To that > > end, could you attach a full dmesg log to the bugzilla entry I've created? > > https://bugzilla.kernel.org/show_bug.cgi?id=193951 > > > > @Mika, Rafael: Are you aware of Skylake machines with unreliable link > > training, or perhaps errata of Skylake chips related to link training > > on hotplug ports? > > According to the 100-series (the chipset used with Skylake) errata > below, I don't see any mentions related to PCIe link training issues. > > http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/100-series-chipset-spec-update.pdf Still, it does look like errata to me. At least I don't see what can be done on the software side to avoid this from happening except for leaving the port(s) in question in D0. Thanks, Rafael ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-06 10:37 ` Mika Westerberg 2017-02-06 11:49 ` Rafael J. Wysocki @ 2017-02-06 21:35 ` Lukas Wunner 2017-02-08 13:00 ` Erik Veijola 2017-02-08 17:25 ` Bjorn Helgaas 1 sibling, 2 replies; 24+ messages in thread From: Lukas Wunner @ 2017-02-06 21:35 UTC (permalink / raw) To: Mika Westerberg Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Linux Kernel Mailing List On Mon, Feb 06, 2017 at 12:37:06PM +0200, Mika Westerberg wrote: > On Sun, Feb 05, 2017 at 08:34:54AM +0100, Lukas Wunner wrote: > > @Mika, Rafael: Are you aware of Skylake machines with unreliable link > > training, or perhaps errata of Skylake chips related to link training > > on hotplug ports? > > According to the 100-series (the chipset used with Skylake) errata > below, I don't see any mentions related to PCIe link training issues. > > http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/100-series-chipset-spec-update.pdf Yinghai Lu responded off-list that the hardware in question is an unreleased / secret Intel product, so this particular issue cannot be expected to be documented publicly at this point. Of course this raises the question whether issues with unreleased products can at all be considered valid regressions, given that the final product may not regress. It seems like a novelty to me that patches would get reverted for something like this, but we'll see. Lukas ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-06 21:35 ` Lukas Wunner @ 2017-02-08 13:00 ` Erik Veijola 2017-02-08 17:25 ` Bjorn Helgaas 1 sibling, 0 replies; 24+ messages in thread From: Erik Veijola @ 2017-02-08 13:00 UTC (permalink / raw) To: Lukas Wunner Cc: Mika Westerberg, Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Linux Kernel Mailing List On Mon, Feb 06, 2017 at 10:35:28PM +0100, Lukas Wunner wrote: > On Mon, Feb 06, 2017 at 12:37:06PM +0200, Mika Westerberg wrote: > > On Sun, Feb 05, 2017 at 08:34:54AM +0100, Lukas Wunner wrote: > > > @Mika, Rafael: Are you aware of Skylake machines with unreliable link > > > training, or perhaps errata of Skylake chips related to link training > > > on hotplug ports? > > > > According to the 100-series (the chipset used with Skylake) errata > > below, I don't see any mentions related to PCIe link training issues. > > > > http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/100-series-chipset-spec-update.pdf > > Yinghai Lu responded off-list that the hardware in question is an > unreleased / secret Intel product, so this particular issue cannot > be expected to be documented publicly at this point. > > Of course this raises the question whether issues with unreleased > products can at all be considered valid regressions, given that the > final product may not regress. It seems like a novelty to me that > patches would get reverted for something like this, but we'll see. > > Lukas > Yinghai, we may have a similar system in our lab so we could test this also. What is your setup for doing the test? -Erik ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-06 21:35 ` Lukas Wunner 2017-02-08 13:00 ` Erik Veijola @ 2017-02-08 17:25 ` Bjorn Helgaas 1 sibling, 0 replies; 24+ messages in thread From: Bjorn Helgaas @ 2017-02-08 17:25 UTC (permalink / raw) To: Lukas Wunner Cc: Mika Westerberg, Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Linux Kernel Mailing List On Mon, Feb 06, 2017 at 10:35:28PM +0100, Lukas Wunner wrote: > On Mon, Feb 06, 2017 at 12:37:06PM +0200, Mika Westerberg wrote: > > On Sun, Feb 05, 2017 at 08:34:54AM +0100, Lukas Wunner wrote: > > > @Mika, Rafael: Are you aware of Skylake machines with unreliable link > > > training, or perhaps errata of Skylake chips related to link training > > > on hotplug ports? > > > > According to the 100-series (the chipset used with Skylake) errata > > below, I don't see any mentions related to PCIe link training issues. > > > > http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/100-series-chipset-spec-update.pdf > > Yinghai Lu responded off-list that the hardware in question is an > unreleased / secret Intel product, so this particular issue cannot > be expected to be documented publicly at this point. > > Of course this raises the question whether issues with unreleased > products can at all be considered valid regressions, given that the > final product may not regress. It seems like a novelty to me that > patches would get reverted for something like this, but we'll see. I assume the hardware will eventually be released, and I assume the hardware will not be changed because of this issue. I would like to avoid the situation of having v4.9 but not v4.10 work on this hardware. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-05 7:34 ` Lukas Wunner 2017-02-06 10:37 ` Mika Westerberg @ 2017-02-06 18:10 ` Bjorn Helgaas [not found] ` <20170206204249.GA679@wunner.de> 2 siblings, 0 replies; 24+ messages in thread From: Bjorn Helgaas @ 2017-02-06 18:10 UTC (permalink / raw) To: Lukas Wunner Cc: Yinghai Lu, Bjorn Helgaas, Rafael J. Wysocki, linux-pci@vger.kernel.org, Mika Westerberg, Linux Kernel Mailing List On Sun, Feb 05, 2017 at 08:34:54AM +0100, Lukas Wunner wrote: > On Sat, Feb 04, 2017 at 08:22:59PM -0800, Yinghai Lu wrote: > > On Sat, Feb 4, 2017 at 3:34 PM, Lukas Wunner <lukas@wunner.de> wrote: > > > On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote: > > >> On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lukas@wunner.de> wrote: > > >> > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: > > >> > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, > > >> > so here's a tentative patch. > > >> > > > >> > -- >8 -- > > >> > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend > > >> > > >> it works: > > > > > > Thanks a lot for the report and for testing the patch! > > > > Wait, Commit 68db9bc still has problem with another server (skylake > > based), and this patch does not help. > [...] > > sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power > > [ 375.376609] pci_hotplug: power_write_file: power = 1 > > [ 375.382175] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 > > [ 375.392695] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 375.401370] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 > > [ 375.410231] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 > > [ 375.411071] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 375.445222] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 377.444400] pciehp 0000:b3:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec > > [ 378.960364] pci 0000:b4:00.0 id reading try 50 times with interval 20 ms to get ffffffff > > [ 378.969406] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001 > > [ 378.978059] pciehp 0000:b3:00.0:pcie004: link training error: status 0x5001 > > [ 378.985834] pciehp 0000:b3:00.0:pcie004: Failed to check link status > > [ 378.987185] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 378.987253] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 > > [ 380.000409] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 > > [ 380.000674] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > [ 380.018020] pciehp 0000:b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 > > [ 380.019053] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > So on this Skylake machine link training fails after resuming from D3hot > to D0. > > One thing that's a bit fishy is that normally the Link Disable bit is > cleared when powering on the slot. This results in a debug message > in dmesg containg the string "lnk_ctrl = ", and that line is missing > from the output you've pasted above, suggesting that the machine is > not running a stock v4.10 kernel after all but something else. Could > you check why this message is not printed? Could you check with lspci > if the Link Disable bit is set before you invoke "echo 1"? > > This is the call stack: > pciehp_sysfs_enable_slot() > pciehp_enable_slot() > board_added() > pciehp_power_on_slot() > pciehp_link_enable() > __pciehp_link_set() > > Another theory is that the link is generally unreliable on this machine > since the Link Bandwidth Management Status bit is set in the Link Status > Register ("lnk_status = 5001"), which according to the spec means: > > "Hardware has changed Link speed or width to attempt to correct unreliable > Link operation, either through an LTSSM timeout or a higher level process. > This bit must be set if the Physical Layer reports a speed or width change > was initiated by the Downstream component that was not indicated as an > autonomous change." > > In this case it would be good to know which hardware exactly we're dealing > with so that we might quirk it to not runtime suspend the port. To that > end, could you attach a full dmesg log to the bugzilla entry I've created? > https://bugzilla.kernel.org/show_bug.cgi?id=193951 > > @Mika, Rafael: Are you aware of Skylake machines with unreliable link > training, or perhaps errata of Skylake chips related to link training > on hotplug ports? I think we're prematurely leaping to the conclusion that this is a hardware erratum. I don't have nearly the confidence that pciehp is handling this correctly that you seem to have. If this is a hardware erratum, we should be able to turn off CONFIG_HOTPLUG_PCI_PCIE and drive through this scenario manually with setpci. That sequence would be immensely helpful to any hardware engineers who want to investigate this. I'm hesitant to add a quirk until we have a better understanding of what's going on. Yinghai tripped over this one broken case, but I don't have any reason to believe that's the only one. Bjorn ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <20170206204249.GA679@wunner.de>]
[parent not found: <CAE9FiQXSmB6Cs55nFtdw3rRrVrivwpDGNTwLwYtvWCEe4nsuHg@mail.gmail.com>]
* Re: pciehp is broken from 4.10-rc1 [not found] ` <CAE9FiQXSmB6Cs55nFtdw3rRrVrivwpDGNTwLwYtvWCEe4nsuHg@mail.gmail.com> @ 2017-02-07 6:08 ` Lukas Wunner 2017-02-07 18:08 ` Yinghai Lu 0 siblings, 1 reply; 24+ messages in thread From: Lukas Wunner @ 2017-02-07 6:08 UTC (permalink / raw) To: Yinghai Lu; +Cc: Bjorn Helgaas, linux-pci, Rafael J. Wysocki, Mika Westerberg [cc += linux-pci] On Mon, Feb 06, 2017 at 07:51:08PM -0800, Yinghai Lu wrote: > On Mon, Feb 6, 2017 at 12:42 PM, Lukas Wunner <lukas@wunner.de> wrote: > > On Sun, Feb 05, 2017 at 08:34:54AM +0100, Lukas Wunner wrote: > >> On Sat, Feb 04, 2017 at 08:22:59PM -0800, Yinghai Lu wrote: > >> > Wait, Commit 68db9bc still has problem with another server (skylake > >> > based), and this patch does not help. > >> [...] > >> > sca05-0a81fd8d:~ # echo 1 > /sys/bus/pci/slots/11/power > >> > [ 375.376609] pci_hotplug: power_write_file: power = 1 > >> > [ 375.382175] pciehp 0000:b3:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 > >> > [ 375.392695] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > >> > [ 375.401370] pciehp 0000:b3:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 > >> > [ 375.410231] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 > >> > [ 375.411071] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > >> > [ 375.445222] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > >> > [ 377.444400] pciehp 0000:b3:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec > >> > [ 378.960364] pci 0000:b4:00.0 id reading try 50 times with interval 20 ms to get ffffffff > >> > [ 378.969406] pciehp 0000:b3:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001 > >> > [ 378.978059] pciehp 0000:b3:00.0:pcie004: link training error: status 0x5001 > >> > [ 378.985834] pciehp 0000:b3:00.0:pcie004: Failed to check link status > >> > [ 378.987185] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > >> > [ 378.987253] pciehp 0000:b3:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 > >> > [ 380.000409] pciehp 0000:b3:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 > >> > [ 380.000674] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > >> > [ 380.018020] pciehp 0000:b3:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 > >> > [ 380.019053] pciehp 0000:b3:00.0:pcie004: pending interrupts 0x0010 from Slot Status > >> > >> So on this Skylake machine link training fails after resuming from D3hot > >> to D0. > >> > >> One thing that's a bit fishy is that normally the Link Disable bit is > >> cleared when powering on the slot. This results in a debug message > >> in dmesg containg the string "lnk_ctrl = ", and that line is missing > >> from the output you've pasted above, suggesting that the machine is > >> not running a stock v4.10 kernel after all but something else. Could > >> you check why this message is not printed? Could you check with lspci > >> if the Link Disable bit is set before you invoke "echo 1"? > > > > Could you answer the questions above please? > > link is always enabled, except some BIOS disable the link if the the > card is not present. > > there is one time we disable link when power off the slot., and that > line get removed > to support link event. > > now that enable link is once on BIOS have that disabled during booting. > > also i had one local patch: > during enabling, the function will check if the old value is the same > as new value. > if it the same it will not write again. > > void pcie_link_disable_set(struct pci_dev *dev, int bit) > { > u16 lnk_ctrl, old_lnk_ctrl; > > if (!pci_is_pcie(dev)) > return; > > pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnk_ctrl); > old_lnk_ctrl = lnk_ctrl; > > if (!bit) > lnk_ctrl &= ~PCI_EXP_LNKCTL_LD; > else > lnk_ctrl |= PCI_EXP_LNKCTL_LD; > > if (old_lnk_ctrl == lnk_ctrl) > return; > > pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnk_ctrl); > > dev_printk(KERN_DEBUG, &dev->dev, "%s: lnk_ctrl = %x\n", __func__, > lnk_ctrl); > } > EXPORT_SYMBOL(pcie_link_disable_set); So you're not even running a stock v4.10 kernel? The function you've quoted above is not part of the stock kernel. Where are you calling this from? Please retry with a stock v4.10-rc7 kernel and report back if the issue persists. Sorry to be blunt, but I think it's unreasonable and unfair to report an issue with link training and alleging that it's caused by my patch, and afterwards coming out of the closet that you're using custom patches that modify link training, the very part that is failing. Lukas ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-07 6:08 ` Lukas Wunner @ 2017-02-07 18:08 ` Yinghai Lu 2017-02-08 8:46 ` Yinghai Lu 0 siblings, 1 reply; 24+ messages in thread From: Yinghai Lu @ 2017-02-07 18:08 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Rafael J. Wysocki, Mika Westerberg On Mon, Feb 6, 2017 at 10:08 PM, Lukas Wunner <lukas@wunner.de> wrote: > Please retry with a stock v4.10-rc7 kernel and report back if the issue > persists. sca05-0a81fd7f:~ # echo 0 > /sys/bus/pci/slots/7/power [ 454.072658] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 11f1 [ 454.082194] pciehp 0000:73:00.0:pcie004: pciehp_unconfigure_device: domain:bus:dev = 0000:74:00 [ 454.091931] mlx4_core 0000:74:00.0: PME# disabled [ 455.418084] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 455.426802] pciehp 0000:73:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 [ 455.490381] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0108 from Slot Status [ 455.498938] pciehp 0000:73:00.0:pcie004: Slot(7): Link Down [ 455.505426] pciehp 0000:73:00.0:pcie004: Slot(7): Link Down event ignored; already powering off [ 456.440169] pciehp 0000:73:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 [ 456.441181] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power [ 470.356464] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 [ 470.366662] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 470.375339] pciehp 0000:73:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 [ 470.384199] pciehp 0000:73:00.0:pcie004: __pciehp_link_set: lnk_ctrl = 40 [ 470.391789] pciehp 0000:73:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 [ 470.391966] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 470.428791] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 472.428147] pciehp 0000:73:00.0:pcie004: Data Link Layer Link Active not set in 1000 msec [ 473.944158] pci 0000:74:00.0 id reading try 50 times with interval 20 ms to get ffffffff [ 473.953209] pciehp 0000:73:00.0:pcie004: pciehp_check_link_status: lnk_status = 5001 [ 473.961861] pciehp 0000:73:00.0:pcie004: link training error: status 0x5001 [ 473.969642] pciehp 0000:73:00.0:pcie004: Failed to check link status [ 473.970721] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 473.970818] pciehp 0000:73:00.0:pcie004: pciehp_power_off_slot: SLOTCTRL a8 write cmd 400 [ 475.000272] pciehp 0000:73:00.0:pcie004: pciehp_green_led_off: SLOTCTRL a8 write cmd 300 [ 475.000880] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 475.017879] pciehp 0000:73:00.0:pcie004: pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 [ 475.018386] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status -bash: echo: write error: Operation not permitted ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-07 18:08 ` Yinghai Lu @ 2017-02-08 8:46 ` Yinghai Lu 2017-02-18 23:46 ` Bjorn Helgaas 0 siblings, 1 reply; 24+ messages in thread From: Yinghai Lu @ 2017-02-08 8:46 UTC (permalink / raw) To: Lukas Wunner Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Rafael J. Wysocki, Mika Westerberg On Tue, Feb 07, 2017 at 10:08:13AM -0800, Yinghai Lu wrote: > On Mon, Feb 6, 2017 at 10:08 PM, Lukas Wunner <lukas@wunner.de> wrote: > > Please retry with a stock v4.10-rc7 kernel and report back if the issue > > persists. > > sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power > [ 470.356464] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: > SLOTCTRL a8 value read 17f1 > [ 470.366662] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 470.375339] pciehp 0000:73:00.0:pcie004: pciehp_power_on_slot: > SLOTCTRL a8 write cmd 0 > [ 470.384199] pciehp 0000:73:00.0:pcie004: __pciehp_link_set: lnk_ctrl = 40 > [ 470.391789] pciehp 0000:73:00.0:pcie004: pciehp_green_led_blink: > SLOTCTRL a8 write cmd 200 > [ 470.391966] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 470.428791] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 472.428147] pciehp 0000:73:00.0:pcie004: Data Link Layer Link > Active not set in 1000 msec > [ 473.944158] pci 0000:74:00.0 id reading try 50 times with interval > 20 ms to get ffffffff > [ 473.953209] pciehp 0000:73:00.0:pcie004: pciehp_check_link_status: > lnk_status = 5001 > [ 473.961861] pciehp 0000:73:00.0:pcie004: link training error: status 0x5001 > [ 473.969642] pciehp 0000:73:00.0:pcie004: Failed to check link status > [ 473.970721] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 473.970818] pciehp 0000:73:00.0:pcie004: pciehp_power_off_slot: > SLOTCTRL a8 write cmd 400 > [ 475.000272] pciehp 0000:73:00.0:pcie004: pciehp_green_led_off: > SLOTCTRL a8 write cmd 300 > [ 475.000880] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > [ 475.017879] pciehp 0000:73:00.0:pcie004: > pciehp_set_attention_status: SLOTCTRL a8 write cmd 40 > [ 475.018386] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 > from Slot Status > -bash: echo: write error: Operation not permitted Following change could make it work: --- drivers/pci/hotplug/pciehp_ctrl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Index: linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c =================================================================== --- linux-2.6.orig/drivers/pci/hotplug/pciehp_ctrl.c +++ linux-2.6/drivers/pci/hotplug/pciehp_ctrl.c @@ -89,17 +89,17 @@ static int board_added(struct slot *p_sl struct controller *ctrl = p_slot->ctrl; struct pci_bus *parent = ctrl->pcie->port->subordinate; + pm_runtime_get_sync(&ctrl->pcie->port->dev); if (POWER_CTRL(ctrl)) { /* Power on slot */ retval = pciehp_power_on_slot(p_slot); if (retval) - return retval; + goto err_exit; } pciehp_green_led_blink(p_slot); /* Check link training status */ - pm_runtime_get_sync(&ctrl->pcie->port->dev); retval = pciehp_check_link_status(ctrl); if (retval) { ctrl_err(ctrl, "Failed to check link status\n"); after that change will get: sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power [ 300.949937] pci_hotplug: power_write_file: power = 1 [ 300.955502] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 [ 300.982557] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 300.991171] pciehp 0000:73:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 [ 301.000033] pciehp 0000:73:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 [ 301.009274] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 301.662172] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 [ 301.670827] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0108 from Slot Status [ 301.679376] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up [ 301.685463] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up event ignored; already powering on [ 301.685508] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 [ 302.005967] pciehp 0000:73:00.0:pcie004: pciehp_check_link_status: lnk_status = f083 [ 302.014859] pci 0000:74:00.0: [15b3:1003] type 00 class 0x0c0600 ... that means in commit 68db9bc8 changelog about power on does not need D0 ----- Even turning on slot power doesn't seem to require the port to be in D0, at least the PCIe spec doesn't say so and I confirmed that by testing with a Thunderbolt controller. ----- may not stand on this silicon. Thanks Yinghai ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-08 8:46 ` Yinghai Lu @ 2017-02-18 23:46 ` Bjorn Helgaas 2017-02-19 1:54 ` Yinghai Lu 0 siblings, 1 reply; 24+ messages in thread From: Bjorn Helgaas @ 2017-02-18 23:46 UTC (permalink / raw) To: Yinghai Lu Cc: Lukas Wunner, Bjorn Helgaas, linux-pci@vger.kernel.org, Rafael J. Wysocki, Mika Westerberg On Wed, Feb 08, 2017 at 12:46:26AM -0800, Yinghai Lu wrote: > after that change will get: > > sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power > [ 300.949937] pci_hotplug: power_write_file: power = 1 > [ 300.955502] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 > [ 300.982557] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status Can anybody explain this Command Completed interrupt? I think we should be in this path: pciehp_sysfs_enable_slot pciehp_enable_slot pciehp_get_adapter_status pciehp_get_power_status "SLOTCTRL a8 value read 17f1" (DLLSCE PWR_OFF PWR_IND_OFF CCIE HPIE ATTN_IND_OFF ABPE) board_added pciehp_power_on_slot pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC) "SLOTCTRL a8 write cmd 0" (PWR_ON) pciehp_link_enable __pciehp_link_set I don't see a write to SLTCTL between pciehp_get_power_status() and pciehp_power_on_slot(), so I don't know why we see a Command Completed interrupt. > [ 300.991171] pciehp 0000:73:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 > [ 301.000033] pciehp 0000:73:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 > [ 301.009274] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status > [ 301.662172] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 > [ 301.670827] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0108 from Slot Status > [ 301.679376] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up > [ 301.685463] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up event ignored; already powering on > [ 301.685508] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 > [ 302.005967] pciehp 0000:73:00.0:pcie004: pciehp_check_link_status: lnk_status = f083 > [ 302.014859] pci 0000:74:00.0: [15b3:1003] type 00 class 0x0c0600 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-18 23:46 ` Bjorn Helgaas @ 2017-02-19 1:54 ` Yinghai Lu 2017-02-19 2:53 ` Yinghai Lu 0 siblings, 1 reply; 24+ messages in thread From: Yinghai Lu @ 2017-02-19 1:54 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lukas Wunner, Bjorn Helgaas, linux-pci@vger.kernel.org, Rafael J. Wysocki, Mika Westerberg On Sat, Feb 18, 2017 at 3:46 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > On Wed, Feb 08, 2017 at 12:46:26AM -0800, Yinghai Lu wrote: >> after that change will get: >> >> sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power >> [ 300.949937] pci_hotplug: power_write_file: power = 1 >> [ 300.955502] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 >> [ 300.982557] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status > > Can anybody explain this Command Completed interrupt? I think we should be > in this path: > > pciehp_sysfs_enable_slot > pciehp_enable_slot > pciehp_get_adapter_status > pciehp_get_power_status > "SLOTCTRL a8 value read 17f1" (DLLSCE PWR_OFF PWR_IND_OFF CCIE HPIE ATTN_IND_OFF ABPE) > board_added > pciehp_power_on_slot > pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC) > "SLOTCTRL a8 write cmd 0" (PWR_ON) > pciehp_link_enable > __pciehp_link_set > > I don't see a write to SLTCTL between pciehp_get_power_status() and > pciehp_power_on_slot(), so I don't know why we see a Command Completed > interrupt. > >> [ 300.991171] pciehp 0000:73:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 >> [ 301.000033] pciehp 0000:73:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 >> [ 301.009274] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status >> [ 301.662172] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 >> [ 301.670827] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0108 from Slot Status >> [ 301.679376] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up >> [ 301.685463] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up event ignored; already powering on >> [ 301.685508] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f083 >> [ 302.005967] pciehp 0000:73:00.0:pcie004: pciehp_check_link_status: lnk_status = f083 >> [ 302.014859] pci 0000:74:00.0: [15b3:1003] type 00 class 0x0c0600 That should belong to pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 We print out debug info after cmd write. pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, PCI_EXP_SLTCTL_PWR_ON); should we adjust that print out sequence ? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-19 1:54 ` Yinghai Lu @ 2017-02-19 2:53 ` Yinghai Lu 0 siblings, 0 replies; 24+ messages in thread From: Yinghai Lu @ 2017-02-19 2:53 UTC (permalink / raw) To: Bjorn Helgaas Cc: Lukas Wunner, Bjorn Helgaas, linux-pci@vger.kernel.org, Rafael J. Wysocki, Mika Westerberg [-- Attachment #1: Type: text/plain, Size: 2024 bytes --] On Sat, Feb 18, 2017 at 5:54 PM, Yinghai Lu <yinghai@kernel.org> wrote: > On Sat, Feb 18, 2017 at 3:46 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: >> On Wed, Feb 08, 2017 at 12:46:26AM -0800, Yinghai Lu wrote: >>> after that change will get: >>> >>> sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power >>> [ 300.949937] pci_hotplug: power_write_file: power = 1 >>> [ 300.955502] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 >>> [ 300.982557] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status >> > > That should belong to > pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 > > We print out debug info after cmd write. > pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC); > ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, > pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, > PCI_EXP_SLTCTL_PWR_ON); > > should we adjust that print out sequence ? after adjust with attached patch. sca05-0a81fd7f:~ # echo 1 > /sys/bus/pci/slots/7/power [ 376.779461] pci_hotplug: power_write_file: power = 1 [ 376.785042] pciehp 0000:73:00.0:pcie004: pciehp_get_power_status: SLOTCTRL a8 value read 17f1 [ 376.794577] pciehp 0000:73:00.0:pcie004: pciehp_power_on_slot: SLOTCTRL a8 write cmd 0 [ 376.883811] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 376.892619] pciehp 0000:73:00.0:pcie004: pciehp_green_led_blink: SLOTCTRL a8 write cmd 200 [ 377.011807] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0010 from Slot Status [ 377.500257] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f103 [ 377.503324] pciehp 0000:73:00.0:pcie004: pciehp_check_link_active: lnk_status = f103 [ 377.517559] pciehp 0000:73:00.0:pcie004: pending interrupts 0x0108 from Slot Status [ 377.526113] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up [ 377.532177] pciehp 0000:73:00.0:pcie004: Slot(7): Link Up event ignored; already powering on Thanks Yinghai [-- Attachment #2: pciehp_move_printout_before_write_cmd.patch --] [-- Type: text/x-patch, Size: 4598 bytes --] diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index 026830a..94df18f99 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -475,9 +475,9 @@ void pciehp_set_attention_status(struct slot *slot, u8 value) default: return; } - pcie_write_cmd_nowait(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_cmd); + pcie_write_cmd_nowait(ctrl, slot_cmd, PCI_EXP_SLTCTL_AIC); } void pciehp_green_led_on(struct slot *slot) @@ -487,11 +487,11 @@ void pciehp_green_led_on(struct slot *slot) if (!PWR_LED(ctrl)) return; - pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_ON, - PCI_EXP_SLTCTL_PIC); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, PCI_EXP_SLTCTL_PWR_IND_ON); + pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_ON, + PCI_EXP_SLTCTL_PIC); } void pciehp_green_led_off(struct slot *slot) @@ -501,11 +501,11 @@ void pciehp_green_led_off(struct slot *slot) if (!PWR_LED(ctrl)) return; - pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF, - PCI_EXP_SLTCTL_PIC); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, PCI_EXP_SLTCTL_PWR_IND_OFF); + pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF, + PCI_EXP_SLTCTL_PIC); } void pciehp_green_led_blink(struct slot *slot) @@ -515,11 +515,11 @@ void pciehp_green_led_blink(struct slot *slot) if (!PWR_LED(ctrl)) return; - pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK, - PCI_EXP_SLTCTL_PIC); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, PCI_EXP_SLTCTL_PWR_IND_BLINK); + pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_PWR_IND_BLINK, + PCI_EXP_SLTCTL_PIC); } int pciehp_power_on_slot(struct slot *slot) @@ -536,10 +536,10 @@ int pciehp_power_on_slot(struct slot *slot) PCI_EXP_SLTSTA_PFD); ctrl->power_fault_detected = 0; - pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, PCI_EXP_SLTCTL_PWR_ON); + pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC); retval = pciehp_link_enable(ctrl); if (retval) @@ -552,10 +552,10 @@ void pciehp_power_off_slot(struct slot *slot) { struct controller *ctrl = slot->ctrl; - pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, PCI_EXP_SLTCTL_PWR_OFF); + pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC); } static irqreturn_t pciehp_isr(int irq, void *dev_id) @@ -701,9 +701,9 @@ void pcie_enable_notification(struct controller *ctrl) PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_DLLSCE); - pcie_write_cmd_nowait(ctrl, cmd, mask); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd); + pcie_write_cmd_nowait(ctrl, cmd, mask); } static void pcie_disable_notification(struct controller *ctrl) @@ -714,9 +714,9 @@ static void pcie_disable_notification(struct controller *ctrl) PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE | PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE | PCI_EXP_SLTCTL_DLLSCE); - pcie_write_cmd(ctrl, 0, mask); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0); + pcie_write_cmd(ctrl, 0, mask); } /* @@ -743,18 +743,18 @@ int pciehp_reset_slot(struct slot *slot, int probe) ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE; stat_mask |= PCI_EXP_SLTSTA_DLLSC; - pcie_write_cmd(ctrl, 0, ctrl_mask); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0); + pcie_write_cmd(ctrl, 0, ctrl_mask); if (pciehp_poll_mode) del_timer_sync(&ctrl->poll_timer); pci_reset_bridge_secondary_bus(ctrl->pcie->port); pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask); - pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask); ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__, pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask); + pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask); if (pciehp_poll_mode) int_poll_timeout(ctrl->poll_timer.data); ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-04 23:34 ` Lukas Wunner 2017-02-05 4:22 ` Yinghai Lu @ 2017-02-06 11:45 ` Rafael J. Wysocki 1 sibling, 0 replies; 24+ messages in thread From: Rafael J. Wysocki @ 2017-02-06 11:45 UTC (permalink / raw) To: Lukas Wunner Cc: Yinghai Lu, Bjorn Helgaas, linux-pci@vger.kernel.org, Mika Westerberg On Sunday, February 05, 2017 12:34:43 AM Lukas Wunner wrote: > On Sat, Feb 04, 2017 at 01:44:34PM -0800, Yinghai Lu wrote: > > On Sat, Feb 4, 2017 at 10:56 AM, Lukas Wunner <lukas@wunner.de> wrote: > > > On Sat, Feb 04, 2017 at 09:12:54AM +0100, Lukas Wunner wrote: > > > Section 6.7.3.4 of the PCIe Base spec seems to support the theory above, > > > so here's a tentative patch. > > > > > > > > > -- >8 -- > > > Subject: [PATCH] PCI: pciehp: Don't enable PME on runtime suspend > > > > it works: > > Thanks a lot for the report and for testing the patch! > > > @Rafael: > Bjorn usually wants you to ack PM-related patches. Looking at the > patch again I'm now thinking it might be cleaner to move the check > for is_hotplug_bridge into pci_dev_run_wake(). Thoughts? Yes, pci_dev_run_wake() would be a better place. Thanks, Rafael ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: pciehp is broken from 4.10-rc1 2017-02-03 4:11 pciehp is broken from 4.10-rc1 Yinghai Lu 2017-02-03 5:52 ` Lukas Wunner @ 2017-02-03 15:09 ` Bjorn Helgaas 1 sibling, 0 replies; 24+ messages in thread From: Bjorn Helgaas @ 2017-02-03 15:09 UTC (permalink / raw) To: Yinghai Lu Cc: Bjorn Helgaas, Rafael J. Wysocki, Lukas Wunner, linux-pci@vger.kernel.org, Mika Westerberg, linux-kernel [+cc Mika, linux-kernel] On Thu, Feb 02, 2017 at 08:11:48PM -0800, Yinghai Lu wrote: > 4.9 is working, > ... > After reverting > > From 68db9bc814362e7f24371c27d12a4f34477d9356 Mon Sep 17 00:00:00 2001 > From: Lukas Wunner <lukas@wunner.de> > Date: Fri, 28 Oct 2016 10:52:06 +0200 > Subject: PCI: pciehp: Add runtime PM support for PCIe hotplug ports > > the hotplug work again. I provisionally reverted 68db9bc81436 ("PCI: pciehp: Add runtime PM support for PCIe hotplug ports") on my for-linus branch while this is being debugged. To be clear, this revert is only a last resort to avoid releasing v4.10 with a regression. I hope and assume we'll have a real fix before v4.10 and we'll be able to drop the revert in favor of the fix. Can someone please open a kernel.org bugzilla, mark it as a regression, and attach the complete dmesg log and "lspci -vv" output? Please mention that bugzilla in the changelog of the fix. Bjorn ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2017-02-19 2:53 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-02-03 4:11 pciehp is broken from 4.10-rc1 Yinghai Lu
2017-02-03 5:52 ` Lukas Wunner
2017-02-04 7:00 ` Yinghai Lu
2017-02-04 8:12 ` Lukas Wunner
2017-02-04 18:56 ` Lukas Wunner
2017-02-04 21:44 ` Yinghai Lu
2017-02-04 23:34 ` Lukas Wunner
2017-02-05 4:22 ` Yinghai Lu
2017-02-05 5:20 ` Yinghai Lu
2017-02-05 7:34 ` Lukas Wunner
2017-02-06 10:37 ` Mika Westerberg
2017-02-06 11:49 ` Rafael J. Wysocki
2017-02-06 21:35 ` Lukas Wunner
2017-02-08 13:00 ` Erik Veijola
2017-02-08 17:25 ` Bjorn Helgaas
2017-02-06 18:10 ` Bjorn Helgaas
[not found] ` <20170206204249.GA679@wunner.de>
[not found] ` <CAE9FiQXSmB6Cs55nFtdw3rRrVrivwpDGNTwLwYtvWCEe4nsuHg@mail.gmail.com>
2017-02-07 6:08 ` Lukas Wunner
2017-02-07 18:08 ` Yinghai Lu
2017-02-08 8:46 ` Yinghai Lu
2017-02-18 23:46 ` Bjorn Helgaas
2017-02-19 1:54 ` Yinghai Lu
2017-02-19 2:53 ` Yinghai Lu
2017-02-06 11:45 ` Rafael J. Wysocki
2017-02-03 15:09 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).