* PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
@ 2008-11-28 12:28 Thomas Renninger
2008-12-05 12:41 ` Identified: PCIe ASPM causes machine (HP Compaq 6735s) to sometimes hang in endless loop Thomas Renninger
2008-12-05 13:07 ` [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time Thomas Renninger
0 siblings, 2 replies; 12+ messages in thread
From: Thomas Renninger @ 2008-11-28 12:28 UTC (permalink / raw)
To: linux-kernel; +Cc: jbarnes, shaohua.li, Rafael Wysocki, shemminger, netdev
Hi,
The hang does not always happen.
On the latest vanilla 2.6.28-rc6 it nearly always hangs; on a .27 SUSE kernel
maybe 1 out of 3 times.
I strongly suspect (I am pretty sure now) it is PCIe ASPM.
I tried to compile out:
CONFIG_PCIEAER=y
CONFIG_PCIEASPM=y
With both disabled, the machine survived a reasonable number of reboots, but
with CONFIG_PCIEASPM enabled again it failed to boot on the second try.
The machine is hanging at the network card's PCI initialization rather early,
here is a photo of the hang:
ftp.suse.com/pub/people/trenn/HP_pci_aspm_hang.jpg
This is the network card:
Marvell 11ab:4357 (PCI ID) using the sky2 driver.
Once the machine has booted (even with ASPM enabled), the network device works
properly.
Possibly unrelated, because the machine hangs long before the sky2 driver kicks
in (or does the PCI subsystem already need to set something similar?):
The sky2 driver has some extra ASPM poking in its power_on routine:
/* set all bits to 0 except bits 15..12 and 8 */
reg &= P_ASPM_CONTROL_MSK;
sky2_pci_write32(hw, PCI_DEV_REG4, reg);
It seems to be this Marvell/Yukon card:
CHIP_ID_YUKON_UL_2 = 0xba, /* YUKON-2 Ultra 2 */
Oh wait,
The SubDevice PCI ID (via hwinfo --netcard) is 0xba, but the driver thinks it
is a (via dmesg):
CHIP_ID_YUKON_FE_P = 0xb8, /* YUKON-2 FE+ */
Any ideas from people with more knowledge in the PCI(e) area are very much
appreciated.
Thanks,
Thomas

* Identified: PCIe ASPM causes machine (HP Compaq 6735s) to sometimes hang in endless loop
From: Thomas Renninger @ 2008-12-05 12:41 UTC
To: linux-kernel; +Cc: jbarnes, shaohua.li, Rafael Wysocki, shemminger, netdev

Hi,

I got it. It is an endless loop in drivers/pci/pcie/aspm.c:

On Friday 28 November 2008 13:28:54 Thomas Renninger wrote:
> Hi,
>
> The hang does not always happen.
> On latest vanilla 2.6.28-rc6 it nearly always hangs, on a .27 SUSE kernel
> maybe 1 out for 3 times.
>
> I very much expect (I am pretty sure now) it is PCIE ASPM.
>
> I tried to compile out:
> CONFIG_PCIEAER=y
> CONFIG_PCIEASPM=y
>
> Both disabled survived a reasonable amount of reboots, but when enabling
> CONFIG_PCIEASPM it did not boot anymore on the second try.
>
> The machine is hanging at the network card's PCI initialization rather
> early, here is a photo of the hang:
> ftp.suse.com/pub/people/trenn/HP_pci_aspm_hang.jpg

First, I'd like to know whether I could have solved this more easily.
The machine is a laptop without firewire and serial console. It has a
PCIe slot.
sysrq did not work at this point, because the keyboard is not functional
yet.
Via serial console it probably would have been possible to trigger a
backtrace, you see the function where it loops -> found in 10 minutes,
but as said this machine has no serial port.
Did I miss some nice debug functionality/trick which could have found
this more easily?

After I knew it is PCIe ASPM, I went through the code, found the loop,
added a printk there and some more at other places and got it...

At the end is some info for people who know more about PCI than me.
It would be great if the root cause is found/fixed.
I will provide a patch in a follow-up mail to work around the hang, which
IMO (if reviewed) should go into stable kernels back to when ASPM was
added (was that .26?).

Thanks,

Thomas

In case the machine boots fine I get:
--------
pci 0000:00:04.0: Writing 0x63 to 104
pci 0000:00:04.0: Reading 0x7011 at 106
PCI: bridge 0000:00:04.0 io port: [3000, 4fff]
PCI: bridge 0000:00:04.0 32bit mmio: [93100000, 941fffff]
PCI: bridge 0000:00:04.0 64bit mmio pref: [90000000, 90ffffff]
PCI: bridge 0000:00:07.0 io port: [2000, 2fff]
PCI: bridge 0000:00:07.0 32bit mmio: [92100000, 930fffff]
PCI: bridge 0000:00:07.0 64bit mmio pref: [91000000, 91ffffff]
PCI: 0000:06:00.0 reg 10 64bit mmio: [92000000, 92003fff]
--------

If the machine hangs I get:
--------
pci 0000:00:04.0: Writing 0x63 to 104
pci 0000:00:04.0: Reading 0x7811 at 106
pci 0000:00:04.0: Reading 0x7811 at 106
pci 0000:00:04.0: Reading 0x7811 at 106
pci 0000:00:04.0: Reading 0x7811 at 106
pci 0000:00:04.0: Reading 0x7811 at 106
....
pci 0000:00:04.0: Could not configure ASPM
PCI: bridge 0000:00:04.0 io port: [3000, 4fff]
PCI: bridge 0000:00:04.0 32bit mmio: [93100000, 941fffff]
PCI: bridge 0000:00:04.0 64bit mmio pref: [90000000, 90ffffff]
PCI: bridge 0000:00:07.0 io port: [2000, 2fff]
PCI: bridge 0000:00:07.0 32bit mmio: [92100000, 930fffff]
PCI: bridge 0000:00:07.0 64bit mmio pref: [91000000, 91ffffff]
PCI: 0000:06:00.0 reg 10 64bit mmio: [92000000, 92003fff]
--------

This is the patch for above output:

---
 drivers/pci/pcie/aspm.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-2.6.27/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.27.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6.27/drivers/pci/pcie/aspm.c
@@ -165,6 +165,7 @@ static void pcie_aspm_configure_common_c
 	u16 reg16 = 0;
 	struct pci_dev *child_dev;
 	int same_clock = 1;
+	int loop_count = 0;
 
 	/*
 	 * all functions of a slot should have the same Slot Clock
@@ -210,14 +211,21 @@ static void pcie_aspm_configure_common_c
 	/* retrain link */
 	reg16 |= PCI_EXP_LNKCTL_RL;
 	pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, reg16);
+	dev_printk (KERN_INFO, &pdev->dev, "Writing 0x%x to %d\n",
+		    reg16, pos + PCI_EXP_LNKCTL);
 
 	/* Wait for link training end */
-	while (1) {
+	while (loop_count < 100) {
 		pci_read_config_word(pdev, pos + PCI_EXP_LNKSTA, &reg16);
+		dev_printk (KERN_INFO, &pdev->dev, "Reading 0x%x at %d\n",
+			    reg16, pos + PCI_EXP_LNKSTA);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))
 			break;
 		cpu_relax();
+		loop_count++;
 	}
+	if (loop_count == 100)
+		dev_printk (KERN_WARNING, &pdev->dev, "Could not configure ASPM\n");
 }
 
 /*

=======================================================
Here is the lspci -nn -vv output of the bridge (ASPM poking on that one
makes the machine hang):

00:04.0 PCI bridge [0604]: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 0) [1022:9604] (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 00003000-00004fff
	Memory behind bridge: 93100000-941fffff
	Prefetchable memory behind bridge: 0000000090000000-0000000090ffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ RBE+ FLReset-
		DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Suprise- LLActRep+ BwNot+
		LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
			Slot # 4, PowerLimit 25.000000; Interlock- NoCompl+
		SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit- Count=1/1 Enable+
		Address: fee0300c  Data: 4151
	Capabilities: [b0] Subsystem: Hewlett-Packard Company Device [103c:30e4]
	Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100] Vendor Specific Information <?>
	Capabilities: [110] Virtual Channel <?>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp
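
Editorial note: the two Link Status values in the logs above (0x7011 on a good boot, 0x7811 on a hang) differ in exactly one bit, 0x0800, which is the Link Training bit the loop waits on. The sketch below decodes both values in user space. The bit masks are the Link Control/Link Status layouts from the PCI Express specification (the same values Linux names PCI_EXP_LNKSTA_* and PCI_EXP_LNKCTL_*); the struct and function names are made up for this illustration and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status register fields (PCI Express capability, offset 0x12). */
#define LNKSTA_SPEED_MASK    0x000f  /* current link speed, 1 = 2.5GT/s */
#define LNKSTA_WIDTH_MASK    0x03f0  /* negotiated link width */
#define LNKSTA_WIDTH_SHIFT   4
#define LNKSTA_TRAINING      0x0800  /* link training in progress */
#define LNKSTA_SLOT_CLOCK    0x1000  /* slot clock configuration */
#define LNKSTA_DL_ACTIVE     0x2000  /* data link layer link active */

/* Link Control register bits (offset 0x10) seen in the "Writing 0x63" line. */
#define LNKCTL_ASPM_MASK     0x0003  /* L0s and L1 enable bits */
#define LNKCTL_RETRAIN       0x0020  /* retrain link */
#define LNKCTL_COMMON_CLOCK  0x0040  /* common clock configuration */

/* Hypothetical helper type, for illustration only. */
struct lnksta {
	unsigned speed;
	unsigned width;
	bool training;
	bool slot_clock;
	bool dl_active;
};

static struct lnksta decode_lnksta(uint16_t reg)
{
	struct lnksta s;

	s.speed = reg & LNKSTA_SPEED_MASK;
	s.width = (reg & LNKSTA_WIDTH_MASK) >> LNKSTA_WIDTH_SHIFT;
	s.training = (reg & LNKSTA_TRAINING) != 0;
	s.slot_clock = (reg & LNKSTA_SLOT_CLOCK) != 0;
	s.dl_active = (reg & LNKSTA_DL_ACTIVE) != 0;
	return s;
}

/* Checks the two values from the logs against each other. */
static void lnksta_self_check(void)
{
	struct lnksta good = decode_lnksta(0x7011);
	struct lnksta hung = decode_lnksta(0x7811);

	/* Only the training bit differs between the two boots. */
	assert(!good.training && hung.training);
	assert(good.speed == hung.speed && good.width == hung.width);
	/* 0x63 is ASPM L0s+L1, retrain, and common clock combined. */
	assert((LNKCTL_ASPM_MASK | LNKCTL_RETRAIN | LNKCTL_COMMON_CLOCK) == 0x63);
}
```

The offsets in the log also line up with the lspci dump: the Express capability sits at 0x58, so Link Control is at 0x68 (104) and Link Status at 0x6a (106).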
* [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Thomas Renninger @ 2008-12-05 13:07 UTC
To: linux-kernel
Cc: jbarnes, shaohua.li, Rafael Wysocki, shemminger, netdev, Stable

Hi,

This is intended for review by someone with more PCI experience, first.
If it is considered as safe as I think it is :), it would be great if
this can be picked up and go into stable kernels as well if the feedback
is positive (just checked with .25, the patch still applies there, even
without offset).

Thanks,

Thomas

PCIE: Break out of endless loop waiting for PCI config bits to switch

Makes a Compaq 6735s boot reliably again which hang in the loop
on some boots.

Signed-off-by: Thomas Renninger <trenn@suse.de>

---
 drivers/pci/pcie/aspm.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6.27/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.27.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6.27/drivers/pci/pcie/aspm.c
@@ -165,6 +165,7 @@ static void pcie_aspm_configure_common_c
 	u16 reg16 = 0;
 	struct pci_dev *child_dev;
 	int same_clock = 1;
+	int loop_count = 0;
 
 	/*
 	 * all functions of a slot should have the same Slot Clock
@@ -212,12 +213,15 @@ static void pcie_aspm_configure_common_c
 	pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, reg16);
 
 	/* Wait for link training end */
-	while (1) {
+	while (loop_count < 100) {
 		pci_read_config_word(pdev, pos + PCI_EXP_LNKSTA, &reg16);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))
 			break;
 		cpu_relax();
+		loop_count++;
 	}
+	if (loop_count == 100)
+		dev_printk (KERN_WARNING, &pdev->dev, "Could not configure ASPM\n");
 }
 
 /*
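
Editorial note: the patch above replaces an unbounded `while (1)` with a loop capped at 100 reads. The following user-space sketch shows the same pattern against a fake device; it is not the kernel code. `read_status_fn`, `struct fake_link`, `fake_read`, `wait_for_training_end`, and `run_fake` are all hypothetical names introduced for the illustration, and the status values are taken from the logs earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LNKSTA_TRAINING 0x0800  /* link training in progress */
#define MAX_POLLS       100     /* same cap as the patch above */

/* Hypothetical stand-in for a config-space read of Link Status. */
typedef uint16_t (*read_status_fn)(void *ctx);

/* Returns true if the training bit cleared within max_polls reads. */
static bool wait_for_training_end(read_status_fn read_status, void *ctx,
				  int max_polls)
{
	assert(read_status != NULL && max_polls > 0);

	for (int i = 0; i < max_polls; i++) {
		if (!(read_status(ctx) & LNKSTA_TRAINING))
			return true;
	}
	return false;  /* caller decides how to report and recover */
}

/* Fake link: reports "training" for busy_reads reads; negative = forever. */
struct fake_link {
	int busy_reads;
	int reads;
};

static uint16_t fake_read(void *ctx)
{
	struct fake_link *link = ctx;

	link->reads++;
	if (link->busy_reads < 0 || link->reads <= link->busy_reads)
		return 0x7811;  /* value seen on the hanging boots */
	return 0x7011;          /* value seen on the good boots */
}

struct fake_result {
	bool ok;
	int reads;
};

static struct fake_result run_fake(int busy_reads)
{
	struct fake_link link = { .busy_reads = busy_reads, .reads = 0 };
	struct fake_result res;

	res.ok = wait_for_training_end(fake_read, &link, MAX_POLLS);
	res.reads = link.reads;
	return res;
}
```

One design point worth noting: a read count bounds the number of config accesses, not the elapsed time, so how long 100 iterations take depends on the platform.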
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Matthew Garrett @ 2008-12-05 18:21 UTC
To: Thomas Renninger
Cc: linux-kernel, jbarnes, shaohua.li, Rafael Wysocki, shemminger, netdev, Stable

On Fri, Dec 05, 2008 at 02:07:13PM +0100, Thomas Renninger wrote:

> PCIE: Break out of endless loop waiting for PCI config bits to switch
>
> Makes a Compaq 6735s boot reliably again which hang in the loop
> on some boots.

Which device does it get stuck on?

> +	if (loop_count == 100)
> +		dev_printk (KERN_WARNING, &pdev->dev, "Could not configure ASPM\n");

"ASPM: Could not configure common clock\n"? ASPM should still work,
though with higher latency. It probably also needs to revert the
configuration changes.

--
Matthew Garrett | mjg59@srcf.ucam.org
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Shaohua Li @ 2008-12-08 1:32 UTC
To: Matthew Garrett
Cc: Thomas Renninger, linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

On Sat, 2008-12-06 at 02:21 +0800, Matthew Garrett wrote:
> On Fri, Dec 05, 2008 at 02:07:13PM +0100, Thomas Renninger wrote:
>
> > PCIE: Break out of endless loop waiting for PCI config bits to switch
> >
> > Makes a Compaq 6735s boot reliably again which hang in the loop
> > on some boots.
>
> Which device does it get stuck on?
>
> > +	if (loop_count == 100)
> > +		dev_printk (KERN_WARNING, &pdev->dev, "Could not configure ASPM\n");
>
> "ASPM: Could not configure common clock\n"? ASPM should still work,
> though with higher latency. It probably also needs to revert the
> configuration changes.

Yep, Just undo the pci config writes of pcie_aspm_configure_common_clock
should be fine to me. Maybe an expiration time is ok here.
Does the device work after this?

Thanks,
Shaohua
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Thomas Renninger @ 2008-12-08 14:56 UTC
To: Shaohua Li
Cc: Matthew Garrett, linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

On Monday 08 December 2008 02:32:42 Shaohua Li wrote:
> On Sat, 2008-12-06 at 02:21 +0800, Matthew Garrett wrote:
> > On Fri, Dec 05, 2008 at 02:07:13PM +0100, Thomas Renninger wrote:
> > > PCIE: Break out of endless loop waiting for PCI config bits to switch
> > >
> > > Makes a Compaq 6735s boot reliably again which hang in the loop
> > > on some boots.
> >
> > Which device does it get stuck on?
> >
> > > +	if (loop_count == 100)
> > > +		dev_printk (KERN_WARNING, &pdev->dev, "Could not configure ASPM\n");
> >
> > "ASPM: Could not configure common clock\n"? ASPM should still work,
> > though with higher latency. It probably also needs to revert the
> > configuration changes.
>
> Yep, Just undo the pci config writes of pcie_aspm_configure_common_clock
> should be fine to me. Maybe an expiration time is ok here.
> Does the device work after this?

After not writing back the pci values as done with this patch?
Yes, I think it did. It definitely does with this patch.

What about this one:
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Thomas Renninger @ 2008-12-08 15:04 UTC
To: Shaohua Li
Cc: Matthew Garrett, linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

On Monday 08 December 2008 02:32:42 Shaohua Li wrote:
> On Sat, 2008-12-06 at 02:21 +0800, Matthew Garrett wrote:
> > On Fri, Dec 05, 2008 at 02:07:13PM +0100, Thomas Renninger wrote:
> > > PCIE: Break out of endless loop waiting for PCI config bits to switch
> > >
> > > Makes a Compaq 6735s boot reliably again which hang in the loop
> > > on some boots.
> >
> > Which device does it get stuck on?
> >
> > > +	if (loop_count == 100)
> > > +		dev_printk (KERN_WARNING, &pdev->dev, "Could not configure ASPM\n");
> >
> > "ASPM: Could not configure common clock\n"? ASPM should still work,
> > though with higher latency. It probably also needs to revert the
> > configuration changes.
>
> Yep, Just undo the pci config writes of pcie_aspm_configure_common_clock
> should be fine to me. Maybe an expiration time is ok here.
> Does the device work after this?

After not writing back the pci values as done with this patch?
Yes, I think it did, it's the network card behind the bridge.
It definitely does with this patch.

Thanks,

Thomas

What about this one? (I assume there cannot be more than 256 functions
in a slot. There might be a better value):

PCIe: ASPM: Break out of endless loop waiting for PCI config bits to switch

Makes a Compaq 6735s boot reliably again which hang in the loop
on some boots.
Also correctly recover PCI bits if link training timed out.

Signed-off-by: Thomas Renninger <trenn@suse.de>

---
 drivers/pci/pcie/aspm.c |   29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

Index: linux-2.6.27/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.27.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6.27/drivers/pci/pcie/aspm.c
@@ -16,6 +16,7 @@
 #include <linux/pm.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/jiffies.h>
 #include <linux/pci-aspm.h>
 #include "../pci.h"
 
@@ -161,11 +162,12 @@ static void pcie_check_clock_pm(struct p
  */
 static void pcie_aspm_configure_common_clock(struct pci_dev *pdev)
 {
-	int pos, child_pos;
+	int pos, child_pos, i = 0;
 	u16 reg16 = 0;
 	struct pci_dev *child_dev;
 	int same_clock = 1;
-
+	unsigned long start_jiffies = jiffies;
+	u16 child_regs[256], parent_reg;
 	/*
 	 * all functions of a slot should have the same Slot Clock
 	 * Configuration, so just check one function
@@ -191,16 +193,18 @@ static void pcie_aspm_configure_common_c
 		child_pos = pci_find_capability(child_dev, PCI_CAP_ID_EXP);
 		pci_read_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
 			&reg16);
+		child_regs[i] = reg16;
 		if (same_clock)
 			reg16 |= PCI_EXP_LNKCTL_CCC;
 		else
 			reg16 &= ~PCI_EXP_LNKCTL_CCC;
 		pci_write_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
 			reg16);
+		i++;
 	}
 
 	/* Configure upstream component */
-	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
+	parent_reg = pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
 	if (same_clock)
 		reg16 |= PCI_EXP_LNKCTL_CCC;
 	else
@@ -212,12 +216,29 @@ static void pcie_aspm_configure_common_c
 	pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, reg16);
 
 	/* Wait for link training end */
-	while (1) {
+	/* break out after waiting for 1 second */
+	while ((jiffies - start_jiffies) < HZ) {
 		pci_read_config_word(pdev, pos + PCI_EXP_LNKSTA, &reg16);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))
 			break;
 		cpu_relax();
 	}
+	/* training failed -> recover */
+	if ((jiffies - start_jiffies) >= HZ) {
+		dev_printk (KERN_ERR, &pdev->dev, "ASPM: Could not configure"
+			    " common clock\n");
+		i = 0;
+		list_for_each_entry(child_dev, &pdev->subordinate->devices,
+				    bus_list) {
+			child_pos = pci_find_capability(child_dev,
+							PCI_CAP_ID_EXP);
+			pci_write_config_word(child_dev,
+					      child_pos + PCI_EXP_LNKCTL,
+					      child_regs[i]);
+			i++;
+		}
+		pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, parent_reg);
+	}
 }
 
 /*
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Matthew Garrett @ 2008-12-08 15:09 UTC
To: Thomas Renninger
Cc: Shaohua Li, linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

On Mon, Dec 08, 2008 at 04:04:09PM +0100, Thomas Renninger wrote:

> -	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
> +	parent_reg = pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);

I don't think that does what you think it does :)

--
Matthew Garrett | mjg59@srcf.ucam.org
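
Editorial note on the remark above: pci_read_config_word() hands the register contents back through its pointer argument and uses the return value only as a status code (0 on success). Assigning the return value to parent_reg therefore saves the status, not the Link Control word. The user-space sketch below mimics that calling convention with a hypothetical fake_read_config_word() over a small example array; none of these names are kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in with the same shape as pci_read_config_word():
 * the return value is a status (0 = success), the data goes to *val.
 * cfg is a fake config space of 16-bit words; where is a byte offset.
 */
static int fake_read_config_word(const uint16_t *cfg, int where, uint16_t *val)
{
	assert(cfg != NULL && val != NULL && where >= 0 && where % 2 == 0);
	*val = cfg[where / 2];
	return 0;
}

/* Example contents: the word at byte offset 2 holds 0x0043. */
static const uint16_t example_cfg[] = { 0x0000, 0x0043 };

/* Mirrors the mistake: keeps the status code instead of the register. */
static uint16_t save_wrong(const uint16_t *cfg, int where)
{
	uint16_t reg16;
	uint16_t saved = (uint16_t)fake_read_config_word(cfg, where, &reg16);

	return saved;  /* always 0 on success, whatever the register holds */
}

/* Mirrors the fix: read first, then copy the value that was filled in. */
static uint16_t save_right(const uint16_t *cfg, int where)
{
	uint16_t reg16;

	fake_read_config_word(cfg, where, &reg16);
	return reg16;
}
```

With the wrong variant, the recovery path would write 0 back to the parent's Link Control register on timeout instead of restoring what was there before.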
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Thomas Renninger @ 2008-12-08 15:17 UTC
To: Matthew Garrett
Cc: Shaohua Li, linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

On Monday 08 December 2008 16:09:19 Matthew Garrett wrote:
> On Mon, Dec 08, 2008 at 04:04:09PM +0100, Thomas Renninger wrote:
> > -	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
> > +	parent_reg = pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
>
> I don't think that does what you think it does :)

Hehe, thanks for the quick and detailed review!

This one should be better:

PCIe: ASPM: Break out of endless loop waiting for PCI config bits to switch

Makes a Compaq 6735s boot reliably again which hang in the loop
on some boots.

Signed-off-by: Thomas Renninger <trenn@suse.de>

---
 drivers/pci/pcie/aspm.c |   28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

Index: linux-2.6.27/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.27.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6.27/drivers/pci/pcie/aspm.c
@@ -16,6 +16,7 @@
 #include <linux/pm.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/jiffies.h>
 #include <linux/pci-aspm.h>
 #include "../pci.h"
 
@@ -161,11 +162,12 @@ static void pcie_check_clock_pm(struct p
  */
 static void pcie_aspm_configure_common_clock(struct pci_dev *pdev)
 {
-	int pos, child_pos;
+	int pos, child_pos, i = 0;
 	u16 reg16 = 0;
 	struct pci_dev *child_dev;
 	int same_clock = 1;
-
+	unsigned long start_jiffies = jiffies;
+	u16 child_regs[256], parent_reg;
 	/*
 	 * all functions of a slot should have the same Slot Clock
 	 * Configuration, so just check one function
@@ -191,16 +193,19 @@ static void pcie_aspm_configure_common_c
 		child_pos = pci_find_capability(child_dev, PCI_CAP_ID_EXP);
 		pci_read_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
 			&reg16);
+		child_regs[i] = reg16;
 		if (same_clock)
 			reg16 |= PCI_EXP_LNKCTL_CCC;
 		else
 			reg16 &= ~PCI_EXP_LNKCTL_CCC;
 		pci_write_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
 			reg16);
+		i++;
 	}
 
 	/* Configure upstream component */
 	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
+	parent_reg = reg16;
 	if (same_clock)
 		reg16 |= PCI_EXP_LNKCTL_CCC;
 	else
@@ -212,12 +217,29 @@ static void pcie_aspm_configure_common_c
 	pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, reg16);
 
 	/* Wait for link training end */
-	while (1) {
+	/* break out after waiting for 1 second */
+	while ((jiffies - start_jiffies) < HZ) {
 		pci_read_config_word(pdev, pos + PCI_EXP_LNKSTA, &reg16);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))
 			break;
 		cpu_relax();
 	}
+	/* training failed -> recover */
+	if ((jiffies - start_jiffies) >= HZ) {
+		dev_printk (KERN_ERR, &pdev->dev, "ASPM: Could not configure"
+			    " common clock\n");
+		i = 0;
+		list_for_each_entry(child_dev, &pdev->subordinate->devices,
+				    bus_list) {
+			child_pos = pci_find_capability(child_dev,
+							PCI_CAP_ID_EXP);
+			pci_write_config_word(child_dev,
+					      child_pos + PCI_EXP_LNKCTL,
+					      child_regs[i]);
+			i++;
+		}
+		pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, parent_reg);
+	}
 }
 
 /*
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Shaohua Li @ 2008-12-09 1:19 UTC
To: Thomas Renninger
Cc: Matthew Garrett, linux-kernel@vger.kernel.org, jbarnes@virtuousgeek.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

On Mon, 2008-12-08 at 23:17 +0800, Thomas Renninger wrote:
> On Monday 08 December 2008 16:09:19 Matthew Garrett wrote:
> > On Mon, Dec 08, 2008 at 04:04:09PM +0100, Thomas Renninger wrote:
> > > -	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
> > > +	parent_reg = pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
> >
> > I don't think that does what you think it does :)
>
> Hehe, thanks for the quick and detailed review!
>
> This one should be better:
>
> PCIe: ASPM: Break out of endless loop waiting for PCI config bits to switch
>
> Makes a Compaq 6735s boot reliably again which hang in the loop
> on some boots.
>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
>
> ---
>  drivers/pci/pcie/aspm.c |   28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
>
> Index: linux-2.6.27/drivers/pci/pcie/aspm.c
> ===================================================================
> --- linux-2.6.27.orig/drivers/pci/pcie/aspm.c
> +++ linux-2.6.27/drivers/pci/pcie/aspm.c
> @@ -16,6 +16,7 @@
>  #include <linux/pm.h>
>  #include <linux/init.h>
>  #include <linux/slab.h>
> +#include <linux/jiffies.h>
>  #include <linux/pci-aspm.h>
>  #include "../pci.h"
>
> @@ -161,11 +162,12 @@ static void pcie_check_clock_pm(struct p
>   */
>  static void pcie_aspm_configure_common_clock(struct pci_dev *pdev)
>  {
> -	int pos, child_pos;
> +	int pos, child_pos, i = 0;
>  	u16 reg16 = 0;
>  	struct pci_dev *child_dev;
>  	int same_clock = 1;
> -
> +	unsigned long start_jiffies = jiffies;
> +	u16 child_regs[256], parent_reg;
child_regs[8] should be enough. There should be just one pcie slot under
the port.

>  	/*
>  	 * all functions of a slot should have the same Slot Clock
>  	 * Configuration, so just check one function
> @@ -191,16 +193,19 @@ static void pcie_aspm_configure_common_c
>  		child_pos = pci_find_capability(child_dev, PCI_CAP_ID_EXP);
>  		pci_read_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
>  			&reg16);
> +		child_regs[i] = reg16;
>  		if (same_clock)
>  			reg16 |= PCI_EXP_LNKCTL_CCC;
>  		else
>  			reg16 &= ~PCI_EXP_LNKCTL_CCC;
>  		pci_write_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
>  			reg16);
> +		i++;
>  	}
>
>  	/* Configure upstream component */
>  	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
> +	parent_reg = reg16;
>  	if (same_clock)
>  		reg16 |= PCI_EXP_LNKCTL_CCC;
>  	else
> @@ -212,12 +217,29 @@ static void pcie_aspm_configure_common_c
>  	pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, reg16);
>
>  	/* Wait for link training end */
> -	while (1) {
> +	/* break out after waiting for 1 second */
should we set start_jiffies here?
Otherwise, it's ok to me.

Thanks,
Shaohua
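
Editorial note on the child_regs[8] suggestion above: a PCI device number carries a 3-bit function field, so a single device exposes at most 8 functions, which is where the bound of 8 comes from if only one device sits below the port. The sketch below shows the devfn encoding (the same arithmetic as Linux's PCI_DEVFN, PCI_SLOT, and PCI_FUNC macros, renamed here) together with a bounds-checked save. The bounds check is an assumption about what a defensive variant could look like; the patches in this thread index child_regs[i] without one. `struct saved_lnkctl`, `save_lnkctl`, and `save_many` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as PCI_DEVFN / PCI_SLOT / PCI_FUNC in the Linux headers. */
#define DEVFN(slot, func)  ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define SLOT_OF(devfn)     (((devfn) >> 3) & 0x1f)
#define FUNC_OF(devfn)     ((devfn) & 0x07)

/* 3-bit function number: at most 8 functions per device. */
#define MAX_FUNCS_PER_DEVICE 8

/* Hypothetical container for the saved Link Control values. */
struct saved_lnkctl {
	uint16_t regs[MAX_FUNCS_PER_DEVICE];
	int count;
};

/* Stores one value; refuses instead of writing past the array. */
static bool save_lnkctl(struct saved_lnkctl *s, uint16_t val)
{
	assert(s != NULL);
	if (s->count >= MAX_FUNCS_PER_DEVICE)
		return false;
	s->regs[s->count++] = val;
	return true;
}

/* Tries to save n values and reports how many were accepted. */
static int save_many(int n)
{
	struct saved_lnkctl s = { .count = 0 };

	for (int i = 0; i < n; i++) {
		if (!save_lnkctl(&s, (uint16_t)i))
			break;
	}
	return s.count;
}
```

Whether more than 8 entries can ever appear on the subordinate bus depends on the topology; the thread assumes a single slot under the port.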
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Thomas Renninger @ 2008-12-09 12:05 UTC
To: Shaohua Li, jbarnes@virtuousgeek.org
Cc: Matthew Garrett, linux-kernel@vger.kernel.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

Hi Jesse,

can you add this one, please.
I adjusted the patch to suggestions from Matthew and Shaohua, thus
added their Signed-offs.

This should still go into .28 as it makes machines boot which
now freeze since the ASPM patch was introduced.

Thanks,

Thomas

----

PCIe: ASPM: Break out of endless loop waiting for PCI config bits to switch

Makes a Compaq 6735s boot reliably again which hang in the loop
on some boots. Give the link one second to train, otherwise break out
of the loop and reset the previously set clock bits.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>

---
 drivers/pci/pcie/aspm.c |   29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

Index: linux-2.6.27/drivers/pci/pcie/aspm.c
===================================================================
--- linux-2.6.27.orig/drivers/pci/pcie/aspm.c
+++ linux-2.6.27/drivers/pci/pcie/aspm.c
@@ -16,6 +16,7 @@
 #include <linux/pm.h>
 #include <linux/init.h>
 #include <linux/slab.h>
+#include <linux/jiffies.h>
 #include <linux/pci-aspm.h>
 #include "../pci.h"
 
@@ -161,11 +162,12 @@ static void pcie_check_clock_pm(struct p
  */
 static void pcie_aspm_configure_common_clock(struct pci_dev *pdev)
 {
-	int pos, child_pos;
+	int pos, child_pos, i = 0;
 	u16 reg16 = 0;
 	struct pci_dev *child_dev;
 	int same_clock = 1;
-
+	unsigned long start_jiffies;
+	u16 child_regs[8], parent_reg;
 	/*
 	 * all functions of a slot should have the same Slot Clock
 	 * Configuration, so just check one function
@@ -191,16 +193,19 @@ static void pcie_aspm_configure_common_c
 		child_pos = pci_find_capability(child_dev, PCI_CAP_ID_EXP);
 		pci_read_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
 			&reg16);
+		child_regs[i] = reg16;
 		if (same_clock)
 			reg16 |= PCI_EXP_LNKCTL_CCC;
 		else
 			reg16 &= ~PCI_EXP_LNKCTL_CCC;
 		pci_write_config_word(child_dev, child_pos + PCI_EXP_LNKCTL,
 			reg16);
+		i++;
 	}
 
 	/* Configure upstream component */
 	pci_read_config_word(pdev, pos + PCI_EXP_LNKCTL, &reg16);
+	parent_reg = reg16;
 	if (same_clock)
 		reg16 |= PCI_EXP_LNKCTL_CCC;
 	else
@@ -212,12 +217,30 @@ static void pcie_aspm_configure_common_c
 	pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, reg16);
 
 	/* Wait for link training end */
-	while (1) {
+	/* break out after waiting for 1 second */
+	start_jiffies = jiffies;
+	while ((jiffies - start_jiffies) < HZ) {
 		pci_read_config_word(pdev, pos + PCI_EXP_LNKSTA, &reg16);
 		if (!(reg16 & PCI_EXP_LNKSTA_LT))
 			break;
 		cpu_relax();
 	}
+	/* training failed -> recover */
+	if ((jiffies - start_jiffies) >= HZ) {
+		dev_printk (KERN_ERR, &pdev->dev, "ASPM: Could not configure"
+			    " common clock\n");
+		i = 0;
+		list_for_each_entry(child_dev, &pdev->subordinate->devices,
+				    bus_list) {
+			child_pos = pci_find_capability(child_dev,
+							PCI_CAP_ID_EXP);
+			pci_write_config_word(child_dev,
+					      child_pos + PCI_EXP_LNKCTL,
+					      child_regs[i]);
+			i++;
+		}
+		pci_write_config_word(pdev, pos + PCI_EXP_LNKCTL, parent_reg);
+	}
 }
 
 /*
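
Editorial note: the final patch above does three things in order: it saves the old register values, polls with a one-second deadline sampled right before the loop, and restores the saved values if the deadline passes. The user-space simulation below walks through that flow for the parent register only. `ticks_t`, `struct link_sim`, `timed_out`, `configure_common_clock`, and `run_sim` are hypothetical names, and the simulated clock stands in for jiffies. The unsigned subtraction `now - start` keeps giving the right elapsed time even when the counter wraps, which is also why the `jiffies - start_jiffies` expression in the patch is wrap-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LNKCTL_COMMON_CLOCK 0x0040  /* common clock configuration bit */

typedef uint32_t ticks_t;  /* simulated tick counter, wraps at 2^32 */

/* Wrap-safe: unsigned subtraction yields elapsed ticks modulo 2^32. */
static bool timed_out(ticks_t start, ticks_t now, ticks_t limit)
{
	return (ticks_t)(now - start) >= limit;
}

/* Simulated link: a register, a "training never ends" flag, a clock. */
struct link_sim {
	uint16_t lnkctl;
	bool stuck;
	ticks_t now;
};

/* Returns true on success; on timeout the old register value is restored. */
static bool configure_common_clock(struct link_sim *l, ticks_t timeout)
{
	uint16_t saved = l->lnkctl;  /* save before modifying */
	ticks_t start;

	assert(timeout > 0);
	l->lnkctl |= LNKCTL_COMMON_CLOCK;

	start = l->now;  /* sample the clock right before polling */
	while (!timed_out(start, l->now, timeout)) {
		if (!l->stuck)
			return true;  /* training finished */
		l->now++;             /* one simulated tick passes */
	}

	l->lnkctl = saved;  /* training failed -> recover */
	return false;
}

/* Runs one simulation starting just before the counter wraps. */
static uint16_t run_sim(bool stuck)
{
	struct link_sim l = {
		.lnkctl = 0x0003,  /* ASPM L0s+L1 enabled, no common clock */
		.stuck = stuck,
		.now = 0xfffffff0u,
	};

	configure_common_clock(&l, 1000);
	return l.lnkctl;
}
```

The start value near 0xffffffff is chosen on purpose so that the stuck case crosses the wrap point while waiting.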
* Re: [PATCH] PCIe ASPM causes machine (HP Compaq 6735s) to sometimes freeze hard at boot at PCI initialization time
From: Jesse Barnes @ 2008-12-09 23:05 UTC
To: Thomas Renninger
Cc: Shaohua Li, Matthew Garrett, linux-kernel@vger.kernel.org, Rafael Wysocki, shemminger@linux-foundation.org, netdev@vger.kernel.org, Stable@kernel.org

On Tuesday, December 09, 2008 4:05 am Thomas Renninger wrote:
> Hi Jesse,
>
> can you add this one, please.
> I adjusted the patch to suggestions from Matthew and Shaohua, thus
> added their Signed-offs.
>
> This should still go into .28 as it makes machines boot which
> now freeze since the ASPM patch was introduced.

Ok, just pushed it. Thanks.

Jesse