* [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
@ 2017-08-15 15:24 Ding Tianhong
2017-08-15 17:03 ` Bjorn Helgaas
2017-08-17 4:59 ` Michael Ellerman
0 siblings, 2 replies; 9+ messages in thread
From: Ding Tianhong @ 2017-08-15 15:24 UTC (permalink / raw)
To: leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw,
l.stach, amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
alexander.duyck, eric.dumazet, linux-arm-kernel, netdev,
linux-pci, linux-kernel, linuxarm
Cc: Ding Tianhong
Eric reported an oops when booting the system after applying
commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
[ 4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[ 4.253011] PGD 0
[ 4.253011] P4D 0
[ 4.253011]
[ 4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 4.262015] Modules linked in:
[ 4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[ 4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[ 4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000
[ 4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[ 4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246
[ 4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006
[ 4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000
[ 4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000
[ 4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0
[ 4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818
[ 4.331002] FS: 0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000
[ 4.339002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0
[ 4.351002] Call Trace:
[ 4.354012] ? pci_configure_device+0x19f/0x570
[ 4.359002] ? pci_conf1_read+0xb8/0xf0
[ 4.363002] ? raw_pci_read+0x23/0x40
[ 4.366011] ? pci_read+0x2c/0x30
[ 4.370014] ? pci_read_config_word+0x67/0x70
[ 4.374012] pci_device_add+0x28/0x230
[ 4.378012] ? pci_vpd_f0_read+0x50/0x80
[ 4.382014] pci_scan_single_device+0x96/0xc0
[ 4.386012] pci_scan_slot+0x79/0xf0
[ 4.389001] pci_scan_child_bus+0x31/0x180
[ 4.394014] acpi_pci_root_create+0x1c6/0x240
[ 4.398013] pci_acpi_scan_root+0x15f/0x1b0
[ 4.402012] acpi_pci_root_add+0x2e6/0x400
[ 4.406012] ? acpi_evaluate_integer+0x37/0x60
[ 4.411002] acpi_bus_attach+0xdf/0x200
[ 4.415002] acpi_bus_attach+0x6a/0x200
[ 4.418014] acpi_bus_attach+0x6a/0x200
[ 4.422013] acpi_bus_scan+0x38/0x70
[ 4.426011] acpi_scan_init+0x10c/0x271
[ 4.429001] acpi_init+0x2fa/0x348
[ 4.433004] ? acpi_sleep_proc_init+0x2d/0x2d
[ 4.437001] do_one_initcall+0x43/0x169
[ 4.441001] kernel_init_freeable+0x1d0/0x258
[ 4.445003] ? rest_init+0xe0/0xe0
[ 4.449001] kernel_init+0xe/0x150
====================== cut here =============================
It looks like pci_find_pcie_root_port() was trying to
find the Root Port for a PCI device that is already a Root
Port. In that case highest_pcie_bridge stays NULL and is
passed to pci_pcie_type(), which dereferences it and triggers
the oops. Check highest_pcie_bridge for NULL to fix this problem.
Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
drivers/pci/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..7e2022f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 		bridge = pci_upstream_bridge(bridge);
 	}
 
-	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
+	if (highest_pcie_bridge &&
+	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
 		return NULL;
 
 	return highest_pcie_bridge;
--
1.8.3.1
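To make the failure mode in this patch concrete, here is a minimal userspace sketch of the walk in pci_find_pcie_root_port() with the NULL guard applied. Everything named model_* is a hypothetical stand-in, not the kernel's struct pci_dev or its helpers. The loop body (recording each PCIe bridge as the highest one seen) is assumed from the surrounding diff context, and the enum values are meant to mirror the PCI_EXP_TYPE_* constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; values mirror PCI_EXP_TYPE_*. */
enum model_pcie_type {
	MODEL_TYPE_ENDPOINT  = 0x0,	/* PCI_EXP_TYPE_ENDPOINT */
	MODEL_TYPE_ROOT_PORT = 0x4,	/* PCI_EXP_TYPE_ROOT_PORT */
	MODEL_TYPE_RC_END    = 0x9	/* PCI_EXP_TYPE_RC_END */
};

/* Hypothetical, heavily reduced model of struct pci_dev. */
struct model_dev {
	struct model_dev *upstream;	/* NULL when the device sits on the root bus */
	int is_pcie;			/* models pci_is_pcie() */
	int type;			/* models pci_pcie_type() */
};

/* Like pci_pcie_type(), this reads through its argument unconditionally. */
int model_pcie_type(const struct model_dev *dev)
{
	assert(dev != NULL);	/* the kernel has no such check; it oopses instead */
	return dev->type;
}

/* Sketch of the patched function: walk upstream, then test the top bridge. */
struct model_dev *model_find_root_port(struct model_dev *dev)
{
	struct model_dev *bridge, *highest = NULL;

	bridge = dev->upstream;
	while (bridge && bridge->is_pcie) {
		highest = bridge;	/* assumed loop body, not shown in the diff */
		bridge = bridge->upstream;
	}

	/* Without the "highest &&" guard, a device with no upstream
	 * bridge would reach model_pcie_type(NULL). */
	if (highest && model_pcie_type(highest) != MODEL_TYPE_ROOT_PORT)
		return NULL;

	return highest;
}
```

With this version, a device that has no upstream PCIe bridge yields NULL instead of a crash, and an endpoint below a Root Port still yields that Root Port.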
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-15 15:24 [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device Ding Tianhong
@ 2017-08-15 17:03 ` Bjorn Helgaas
2017-08-16 0:26 ` David Miller
2017-08-16 19:33 ` Thierry Reding
2017-08-17 4:59 ` Michael Ellerman
1 sibling, 2 replies; 9+ messages in thread
From: Bjorn Helgaas @ 2017-08-15 17:03 UTC (permalink / raw)
To: Ding Tianhong
Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
will.deacon, linuxarm, alexander.duyck, ashok.raj, eric.dumazet,
jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira, netdev,
linux-kernel, David.Laight, Suravee.Suthikulpanit, robin.murphy,
davem, l.stach

On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> Eric report a oops when booting the system after applying
> the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> ...

> It looks like the pci_find_pcie_root_port() was trying to
> find the Root Port for the PCI device which is the Root
> Port already, it will return NULL and trigger the problem,
> so check the highest_pcie_bridge to fix thie problem.

The problem was actually with a Root Complex Integrated Endpoint that
has no upstream PCIe device:

00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
	Subsystem: Intel Corporation Device 0e2a
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- RBE- FLReset-
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes

> Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")

This also

  Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")

which added pci_find_pcie_root_port(). Prior to this Relaxed Ordering
series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
only applied to non-integrated endpoints, so we didn't trip over the
bug.

> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
> drivers/pci/pci.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..7e2022f 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  		bridge = pci_upstream_bridge(bridge);
>  	}
>
> -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> +	if (highest_pcie_bridge &&
> +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
>  		return NULL;
>
>  	return highest_pcie_bridge;
> --

I think structuring the fix as follows is a little more readable:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc3456dc1..587cd7623ed8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,10 +522,11 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 		bridge = pci_upstream_bridge(bridge);
 	}
 
-	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
-		return NULL;
+	if (highest_pcie_bridge &&
+	    pci_pcie_type(highest_pcie_bridge) == PCI_EXP_TYPE_ROOT_PORT)
+		return highest_pcie_bridge;
 
-	return highest_pcie_bridge;
+	return NULL;
 }
 EXPORT_SYMBOL(pci_find_pcie_root_port);
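The two shapes of the final check discussed in this message (a guarded negative test versus a positive test with NULL as the fall-through) should behave identically. The sketch below, using a hypothetical one-field model_dev in place of struct pci_dev, compares them for a NULL bridge and for every value the 4-bit PCIe device/port type field can hold. It is an illustration of the equivalence, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TYPE_ROOT_PORT 0x4	/* mirrors PCI_EXP_TYPE_ROOT_PORT */

/* Hypothetical stand-in for struct pci_dev; only the port type matters here. */
struct model_dev {
	int type;
};

/* Shape of the originally posted fix: NULL guard added to the negative test. */
const struct model_dev *check_negative(const struct model_dev *highest)
{
	if (highest && highest->type != MODEL_TYPE_ROOT_PORT)
		return NULL;
	return highest;
}

/* Restructured shape: positive test first, NULL as the fall-through. */
const struct model_dev *check_positive(const struct model_dev *highest)
{
	if (highest && highest->type == MODEL_TYPE_ROOT_PORT)
		return highest;
	return NULL;
}

/* Returns 1 if both shapes agree for NULL and for every 4-bit type value. */
int forms_agree(void)
{
	struct model_dev d;
	int t;

	/* Sanity check: a NULL bridge never reaches the type comparison. */
	assert(check_positive(NULL) == NULL);

	if (check_negative(NULL) != check_positive(NULL))
		return 0;

	for (t = 0x0; t <= 0xf; t++) {
		d.type = t;
		if (check_negative(&d) != check_positive(&d))
			return 0;
	}
	return 1;
}
```

Since the results match in every case, the choice between the two is purely about readability: the positive form keeps the success path and the single `return NULL` visibly separate.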
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-15 17:03 ` Bjorn Helgaas
@ 2017-08-16 0:26 ` David Miller
2017-08-16 19:33 ` Thierry Reding
1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2017-08-16 0:26 UTC (permalink / raw)
To: helgaas
Cc: dingtianhong, leedom, ashok.raj, bhelgaas, werner, ganeshgr,
asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw,
l.stach, amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
catalin.marinas, will.deacon, mark.rutland, robin.murphy,
alexander.duyck, eric.dumazet, linux-arm-kernel, netdev, linux-pci,
linux-kernel, linuxarm

From: Bjorn Helgaas <helgaas@kernel.org>
Date: Tue, 15 Aug 2017 12:03:31 -0500

> On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
>> Eric report a oops when booting the system after applying
>> the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
>> ...
>
>> It looks like the pci_find_pcie_root_port() was trying to
>> find the Root Port for the PCI device which is the Root
>> Port already, it will return NULL and trigger the problem,
>> so check the highest_pcie_bridge to fix thie problem.
>
> The problem was actually with a Root Complex Integrated Endpoint that
> has no upstream PCIe device:
>
> 00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
> 	Subsystem: Intel Corporation Device 0e2a
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> 		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> 			ExtTag- RBE- FLReset-
> 		DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
> 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 128 bytes
>
>> Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
>
> This also
>
>   Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
>
> which added pci_find_pcie_root_port(). Prior to this Relaxed Ordering
> series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
> only applied to non-integrated endpoints, so we didn't trip over the
> bug.
 ...
> I think structuring the fix as follows is a little more readable:
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc3456dc1..587cd7623ed8 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c

I've integrated all of this feedback and the other Fixes: tag and
applied it to 'net', thanks.
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-15 17:03 ` Bjorn Helgaas
2017-08-16 0:26 ` David Miller
@ 2017-08-16 19:33 ` Thierry Reding
2017-08-16 20:02 ` Bjorn Helgaas
2017-08-17 5:12 ` Michael Ellerman
1 sibling, 2 replies; 9+ messages in thread
From: Thierry Reding @ 2017-08-16 19:33 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Ding Tianhong, mark.rutland, gabriele.paoloni, asit.k.mallick,
catalin.marinas, will.deacon, linuxarm, alexander.duyck, ashok.raj,
eric.dumazet, jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw,
leedom, patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
robin.murphy, davem, l.stach

On Tue, Aug 15, 2017 at 12:03:31PM -0500, Bjorn Helgaas wrote:
> On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> > Eric report a oops when booting the system after applying
> > the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> > ...
>
> > It looks like the pci_find_pcie_root_port() was trying to
> > find the Root Port for the PCI device which is the Root
> > Port already, it will return NULL and trigger the problem,
> > so check the highest_pcie_bridge to fix thie problem.
>
> The problem was actually with a Root Complex Integrated Endpoint that
> has no upstream PCIe device:
>
> 00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
> 	Subsystem: Intel Corporation Device 0e2a
> 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> 		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> 			ExtTag- RBE- FLReset-
> 		DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
> 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 128 bytes

I've started seeing this crash on Tegra K1 as well. Here's the device
for which it oopses:

00:02.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 391
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00001000-00001fff [size=4K]
	Memory behind bridge: 13000000-130fffff [size=1M]
	Prefetchable memory behind bridge: 0000000020000000-00000000200fffff [size=1M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: NVIDIA Corporation TegraK1 PCIe x1 Bridge
	Capabilities: [48] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
		Address: 000000fcfffff000 Data: 0000
	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s, Exit Latency L0s <512ns
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
		SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
			AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
			AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Kernel driver in use: pcieport

> > Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
>
> This also
>
>   Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
>
> which added pci_find_pcie_root_port(). Prior to this Relaxed Ordering
> series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
> only applied to non-integrated endpoints, so we didn't trip over the
> bug.
>
> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> > ---
> > drivers/pci/pci.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index af0cc34..7e2022f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
> >  		bridge = pci_upstream_bridge(bridge);
> >  	}
> >
> > -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > +	if (highest_pcie_bridge &&
> > +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> >  		return NULL;
> >
> >  	return highest_pcie_bridge;
> > --
>
> I think structuring the fix as follows is a little more readable:
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc3456dc1..587cd7623ed8 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,10 +522,11 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  		bridge = pci_upstream_bridge(bridge);
>  	}
>
> -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> -		return NULL;
> +	if (highest_pcie_bridge &&
> +	    pci_pcie_type(highest_pcie_bridge) == PCI_EXP_TYPE_ROOT_PORT)
> +		return highest_pcie_bridge;
>
> -	return highest_pcie_bridge;
> +	return NULL;
>  }
>  EXPORT_SYMBOL(pci_find_pcie_root_port);

In case of Tegra, dev actually points to the root port. Now if I read
the above code correctly, highest_pcie_bridge will still be NULL in that
case, which in turn will return NULL from pci_find_pcie_root_port(). But
shouldn't it really return dev?

The patch that I used to fix the issue is this:

--->8---
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2c712dcfd37d..dd56c1c05614 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -514,7 +514,7 @@ EXPORT_SYMBOL(pci_find_resource);
  */
 struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 {
-	struct pci_dev *bridge, *highest_pcie_bridge = NULL;
+	struct pci_dev *bridge, *highest_pcie_bridge = dev;
 
 	bridge = pci_upstream_bridge(dev);
 	while (bridge && pci_is_pcie(bridge)) {
--->8---

That works correctly if this function ends up being called on the PCIe
root port, though perhaps that's not what this function is supposed to
do. It's somewhat unclear from the kerneldoc what the function should
be doing when called on a root port device itself.

Thierry
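The effect of changing the initializer from NULL to dev can be seen in a small userspace sketch. All names here are hypothetical stand-ins rather than kernel API: model_dev reduces struct pci_dev to three fields, and the start_at_dev parameter exists only to switch between the two initializers for comparison. The loop body is assumed from the diff context, and the final check uses the positive form quoted earlier in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the kernel's PCI_EXP_TYPE_* constants. */
enum {
	MODEL_TYPE_ENDPOINT  = 0x0,	/* PCI_EXP_TYPE_ENDPOINT */
	MODEL_TYPE_ROOT_PORT = 0x4,	/* PCI_EXP_TYPE_ROOT_PORT */
	MODEL_TYPE_RC_END    = 0x9	/* PCI_EXP_TYPE_RC_END */
};

/* Hypothetical, heavily reduced model of struct pci_dev. */
struct model_dev {
	struct model_dev *upstream;	/* NULL when the device sits on the root bus */
	int is_pcie;			/* models pci_is_pcie() */
	int type;			/* models pci_pcie_type() */
};

/*
 * start_at_dev == 0 models "highest_pcie_bridge = NULL",
 * start_at_dev != 0 models "highest_pcie_bridge = dev".
 */
struct model_dev *model_find_root_port(struct model_dev *dev, int start_at_dev)
{
	struct model_dev *bridge;
	struct model_dev *highest = start_at_dev ? dev : NULL;

	assert(dev != NULL);

	bridge = dev->upstream;
	while (bridge && bridge->is_pcie) {
		highest = bridge;	/* assumed loop body, not shown in the diff */
		bridge = bridge->upstream;
	}

	if (highest && highest->type == MODEL_TYPE_ROOT_PORT)
		return highest;

	return NULL;
}
```

In this model, a Root Port passed in directly is returned as its own Root Port only with the dev initializer. A Root Complex Integrated Endpoint still yields NULL either way, because its own type is not Root Port, and an endpoint below a Root Port is unaffected by the change.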
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-16 19:33 ` Thierry Reding
@ 2017-08-16 20:02 ` Bjorn Helgaas
2017-08-16 20:59 ` David Miller
2017-08-17 5:12 ` Michael Ellerman
1 sibling, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2017-08-16 20:02 UTC (permalink / raw)
To: Thierry Reding
Cc: mark.rutland, gabriele.paoloni, asit.k.mallick, catalin.marinas,
will.deacon, linuxarm, alexander.duyck, ashok.raj, eric.dumazet,
jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw, leedom,
patrick.j.cramer, Ding Tianhong, bhelgaas, werner, linux-arm-kernel,
amira, netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
robin.murphy, davem, l.stach

On Wed, Aug 16, 2017 at 09:33:03PM +0200, Thierry Reding wrote:
> On Tue, Aug 15, 2017 at 12:03:31PM -0500, Bjorn Helgaas wrote:
> > On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> > > Eric report a oops when booting the system after applying
> > > the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> > > ...
> >
> > > It looks like the pci_find_pcie_root_port() was trying to
> > > find the Root Port for the PCI device which is the Root
> > > Port already, it will return NULL and trigger the problem,
> > > so check the highest_pcie_bridge to fix thie problem.
> >
> > The problem was actually with a Root Complex Integrated Endpoint that
> > has no upstream PCIe device:
> >
> > 00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
> > 	Subsystem: Intel Corporation Device 0e2a
> > 	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > 	Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> > 		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> > 			ExtTag- RBE- FLReset-
> > 		DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
> > 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> > 			MaxPayload 128 bytes, MaxReadReq 128 bytes
>
> I've started seeing this crash on Tegra K1 as well. Here's the device
> for which it oopses:
>
> 00:02.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1) (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 391
> 	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> 	I/O behind bridge: 00001000-00001fff [size=4K]
> 	Memory behind bridge: 13000000-130fffff [size=1M]
> 	Prefetchable memory behind bridge: 0000000020000000-00000000200fffff [size=1M]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> 	BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [40] Subsystem: NVIDIA Corporation TegraK1 PCIe x1 Bridge
> 	Capabilities: [48] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
> 		Address: 000000fcfffff000 Data: 0000
> 	Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
> 		Mapping Address Base: 00000000fee00000
> 	Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
> 		DevCap: MaxPayload 128 bytes, PhantFunc 0
> 			ExtTag+ RBE+
> 		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> 			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
> 			MaxPayload 128 bytes, MaxReadReq 512 bytes
> 		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
> 		LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s, Exit Latency L0s <512ns
> 			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
> 		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
> 		SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 			Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
> 		SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> 			Control: AttnInd Off, PwrInd On, Power- Interlock-
> 		SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> 			Changed: MRL- PresDet+ LinkState+
> 		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
> 		RootCap: CRSVisible-
> 		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> 		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
> 			AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
> 			AtomicOpsCtl: ReqEn- EgressBlck-
> 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> 			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
> 			EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Kernel driver in use: pcieport
>
> > > Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
> >
> > This also
> >
> >   Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
> >
> > which added pci_find_pcie_root_port(). Prior to this Relaxed Ordering
> > series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
> > only applied to non-integrated endpoints, so we didn't trip over the
> > bug.
> >
> > > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> > > ---
> > > drivers/pci/pci.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index af0cc34..7e2022f 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
> > >  		bridge = pci_upstream_bridge(bridge);
> > >  	}
> > >
> > > -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > > +	if (highest_pcie_bridge &&
> > > +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > >  		return NULL;
> > >
> > >  	return highest_pcie_bridge;
> > > --
> >
> > I think structuring the fix as follows is a little more readable:
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index af0cc3456dc1..587cd7623ed8 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -522,10 +522,11 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
> >  		bridge = pci_upstream_bridge(bridge);
> >  	}
> >
> > -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > -		return NULL;
> > +	if (highest_pcie_bridge &&
> > +	    pci_pcie_type(highest_pcie_bridge) == PCI_EXP_TYPE_ROOT_PORT)
> > +		return highest_pcie_bridge;
> >
> > -	return highest_pcie_bridge;
> > +	return NULL;
> >  }
> >  EXPORT_SYMBOL(pci_find_pcie_root_port);
>
> In case of Tegra, dev actually points to the root port. Now if I read
> the above code correctly, highest_pcie_bridge will still be NULL in that
> case, which in turn will return NULL from pci_find_pcie_root_port(). But
> shouldn't it really return dev?
>
> The patch that I used to fix the issue is this:
>
> --->8---
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2c712dcfd37d..dd56c1c05614 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -514,7 +514,7 @@ EXPORT_SYMBOL(pci_find_resource);
>   */
>  struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  {
> -	struct pci_dev *bridge, *highest_pcie_bridge = NULL;
> +	struct pci_dev *bridge, *highest_pcie_bridge = dev;
>
>  	bridge = pci_upstream_bridge(dev);
>  	while (bridge && pci_is_pcie(bridge)) {
> --->8---
>
> That works correctly if this function ends up being called on the PCIe
> root port, though perhaps that's not what this function is supposed to
> do. It's somewhat unclear from the kerneldoc what the function should
> be doing when called on a root port device itself.

Your fix looks right to me.
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-16 20:02 ` Bjorn Helgaas
@ 2017-08-16 20:59 ` David Miller
2017-08-17 1:14 ` Ding Tianhong
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2017-08-16 20:59 UTC (permalink / raw)
To: helgaas
Cc: thierry.reding, dingtianhong, mark.rutland, gabriele.paoloni,
asit.k.mallick, catalin.marinas, will.deacon, linuxarm,
alexander.duyck, ashok.raj, eric.dumazet, jeffrey.t.kirsher,
linux-pci, ganeshgr, Bob.Shaw, leedom, patrick.j.cramer, bhelgaas,
werner, linux-arm-kernel, amira, netdev, linux-kernel, David.Laight,
Suravee.Suthikulpanit, robin.murphy, l.stach

From: Bjorn Helgaas <helgaas@kernel.org>
Date: Wed, 16 Aug 2017 15:02:37 -0500

> Your fix looks right to me.

Someone please submit this fix formally because this change is now in
Linus's tree.

Thank you.
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-16 20:59 ` David Miller
@ 2017-08-17 1:14 ` Ding Tianhong
0 siblings, 0 replies; 9+ messages in thread
From: Ding Tianhong @ 2017-08-17 1:14 UTC (permalink / raw)
To: David Miller, helgaas
Cc: thierry.reding, mark.rutland, gabriele.paoloni, asit.k.mallick,
catalin.marinas, will.deacon, linuxarm, alexander.duyck, ashok.raj,
eric.dumazet, jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw,
leedom, patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
robin.murphy, l.stach

On 2017/8/17 4:59, David Miller wrote:
> From: Bjorn Helgaas <helgaas@kernel.org>
> Date: Wed, 16 Aug 2017 15:02:37 -0500
>
>> Your fix looks right to me.
>
> Someone please submit this fix formally because this change is now in
> Linus's tree.
>

I will send it.

> Thank you.
>
> .
>
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-16 19:33 ` Thierry Reding
2017-08-16 20:02 ` Bjorn Helgaas
@ 2017-08-17 5:12 ` Michael Ellerman
1 sibling, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2017-08-17 5:12 UTC (permalink / raw)
To: Thierry Reding, Bjorn Helgaas
Cc: Ding Tianhong, mark.rutland, gabriele.paoloni, asit.k.mallick,
catalin.marinas, will.deacon, linuxarm, alexander.duyck, ashok.raj,
eric.dumazet, jeffrey.t.kirsher, linux-pci, ganeshgr, Bob.Shaw,
leedom, patrick.j.cramer, bhelgaas, werner, linux-arm-kernel, amira,
netdev, linux-kernel, David.Laight, Suravee.Suthikulpanit,
robin.murphy, davem, l.stach

Thierry Reding <thierry.reding@gmail.com> writes:
...
>
> In case of Tegra, dev actually points to the root port. Now if I read
> the above code correctly, highest_pcie_bridge will still be NULL in that
> case, which in turn will return NULL from pci_find_pcie_root_port(). But
> shouldn't it really return dev?
>
> The patch that I used to fix the issue is this:
>
> --->8---
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2c712dcfd37d..dd56c1c05614 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -514,7 +514,7 @@ EXPORT_SYMBOL(pci_find_resource);
>   */
>  struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  {
> -	struct pci_dev *bridge, *highest_pcie_bridge = NULL;
> +	struct pci_dev *bridge, *highest_pcie_bridge = dev;
>
>  	bridge = pci_upstream_bridge(dev);
>  	while (bridge && pci_is_pcie(bridge)) {
> --->8---
>
> That works correctly if this function ends up being called on the PCIe
> root port, though perhaps that's not what this function is supposed to
> do. It's somewhat unclear from the kerneldoc what the function should
> be doing when called on a root port device itself.

That also works for me on powerpc (oops reported up thread).

cheers
* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
2017-08-15 15:24 [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device Ding Tianhong
2017-08-15 17:03 ` Bjorn Helgaas
@ 2017-08-17 4:59 ` Michael Ellerman
1 sibling, 0 replies; 9+ messages in thread
From: Michael Ellerman @ 2017-08-17 4:59 UTC (permalink / raw)
To: Ding Tianhong, leedom, ashok.raj, bhelgaas, helgaas, werner, ganeshgr,
asit.k.mallick, patrick.j.cramer, Suravee.Suthikulpanit, Bob.Shaw,
l.stach, amira, gabriele.paoloni, David.Laight, jeffrey.t.kirsher,
catalin.marinas, will.deacon, mark.rutland, robin.murphy, davem,
alexander.duyck, eric.dumazet, linux-arm-kernel, netdev,
linux-pci, linux-kernel, linuxarm, linuxppc-dev
Cc: Ding Tianhong
Ding Tianhong <dingtianhong@huawei.com> writes:
> Eric report a oops when booting the system after applying
> the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
I'm seeing a similar oops on powerpc:
[ 0.177242] pci_bus 0015:70: root bus resource [bus 70-ff]
[ 0.178012] Unable to handle kernel paging request for data at address 0x00000050
[ 0.178017] Faulting instruction address: 0xc0000000005f84b4
[ 0.178022] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.178024] SMP NR_CPUS=2048
[ 0.178025] NUMA
[ 0.178028] pSeries
[ 0.178031] Modules linked in:
[ 0.178036] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.13.0-rc4-gcc-6.3.1-00167-ga99b646afa8a #407
[ 0.178040] task: c0000003f7400000 task.stack: c0000003f7480000
[ 0.178043] NIP: c0000000005f84b4 LR: c0000000005f5ccc CTR: 0000000000000000
[ 0.178046] REGS: c0000003f74836d0 TRAP: 0380 Tainted: G W (4.13.0-rc4-gcc-6.3.1-00167-ga99b646afa8a)
[ 0.178050] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>
[ 0.178057] CR: 48000842 XER: 2000000f
[ 0.178061] CFAR: c0000000005f840c SOFTE: 1
[ 0.178061] GPR00: c0000000005f5cb4 c0000003f7483950 c000000000fa0000 0000000000000000
[ 0.178061] GPR04: 0000000000000001 0000000000000028 c0000003f7483820 f000000000ff6360
[ 0.178061] GPR08: 00000003fe2f0000 0000000000000000 c0000003f5759000 0000000002001001
[ 0.178061] GPR12: 0000000000000010 c00000000fd80000 c00000000000db08 0000000000000000
[ 0.178061] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.178061] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.178061] GPR24: 0000000000000000 c000000000c5f680 c0000003f756b678 c0000003f5759000
[ 0.178061] GPR28: 0000000000000030 c0000003f756b098 c0000003f5759000 c0000003f756b000
[ 0.178110] NIP [c0000000005f84b4] pci_find_pcie_root_port+0xb4/0xd0
[ 0.178114] LR [c0000000005f5ccc] pci_device_add+0x32c/0x470
[ 0.178117] Call Trace:
[ 0.178120] [c0000003f7483950] [c0000000005f5cb4] pci_device_add+0x314/0x470 (unreliable)
[ 0.178126] [c0000003f74839f0] [c00000000005b85c] of_create_pci_dev+0x35c/0x400
[ 0.178130] [c0000003f7483ab0] [c00000000005ba14] __of_scan_bus+0x114/0x1e0
[ 0.178135] [c0000003f7483b20] [c000000000059a9c] pcibios_scan_phb+0x23c/0x270
[ 0.178140] [c0000003f7483bc0] [c000000000d8057c] pcibios_init+0x84/0xdc
[ 0.178144] [c0000003f7483c40] [c00000000000d680] do_one_initcall+0x60/0x1c0
[ 0.178149] [c0000003f7483d00] [c000000000d74454] kernel_init_freeable+0x2c4/0x3a0
[ 0.178153] [c0000003f7483dc0] [c00000000000db24] kernel_init+0x24/0x150
[ 0.178158] [c0000003f7483e30] [c00000000000bc28] ret_from_kernel_thread+0x5c/0xb4
...
And the patch below fixes it. Thanks.
cheers
> ====================== cut here =============================
>
> It looks like the pci_find_pcie_root_port() was trying to
> find the Root Port for the PCI device which is the Root
> Port already, it will return NULL and trigger the problem,
> so check the highest_pcie_bridge to fix thie problem.
>
> Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
> drivers/pci/pci.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..7e2022f 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  		bridge = pci_upstream_bridge(bridge);
>  	}
>
> -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> +	if (highest_pcie_bridge &&
> +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
>  		return NULL;
>
>  	return highest_pcie_bridge;
> --
> 1.8.3.1
^ permalink raw reply [flat|nested] 9+ messages in thread
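The NULL guard in the quoted patch can be modelled outside the kernel. The sketch below is a simplified illustration under stated assumptions: `fake_dev`, `enum fake_port_type`, and `find_root_port_guarded()` are hypothetical names made up for this example, not kernel types or APIs. Only the control flow mirrors the patched function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port types; the kernel uses PCI_EXP_TYPE_* constants instead. */
enum fake_port_type { FAKE_ENDPOINT, FAKE_ROOT_PORT, FAKE_DOWNSTREAM };

/* Hypothetical stand-in for struct pci_dev. */
struct fake_dev {
	struct fake_dev *upstream;	/* NULL when there is no upstream bridge */
	bool is_pcie;
	enum fake_port_type type;
};

static struct fake_dev *find_root_port_guarded(struct fake_dev *dev)
{
	struct fake_dev *bridge, *highest = NULL;

	bridge = dev->upstream;
	while (bridge && bridge->is_pcie) {
		highest = bridge;
		bridge = bridge->upstream;
	}

	/* The guard: only inspect highest when the walk found a bridge. */
	if (highest && highest->type != FAKE_ROOT_PORT)
		return NULL;

	return highest;
}

/* Small self-check covering the case that oopsed before the guard. */
static void demo_guarded(void)
{
	struct fake_dev root = { .upstream = NULL, .is_pcie = true, .type = FAKE_ROOT_PORT };
	struct fake_dev ep = { .upstream = &root, .is_pcie = true, .type = FAKE_ENDPOINT };

	assert(find_root_port_guarded(&ep) == &root);
	/* No upstream bridge: highest stays NULL and is returned as-is. */
	assert(find_root_port_guarded(&root) == NULL);
}
```

In this model the function returns NULL when called on the root port itself, without dereferencing a NULL pointer, so any caller has to be prepared for a NULL result.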
end of thread, other threads:[~2017-08-17 5:12 UTC | newest]
Thread overview: 9+ messages
2017-08-15 15:24 [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device Ding Tianhong
2017-08-15 17:03 ` Bjorn Helgaas
2017-08-16 0:26 ` David Miller
2017-08-16 19:33 ` Thierry Reding
2017-08-16 20:02 ` Bjorn Helgaas
2017-08-16 20:59 ` David Miller
2017-08-17 1:14 ` Ding Tianhong
2017-08-17 5:12 ` Michael Ellerman
2017-08-17 4:59 ` Michael Ellerman