* PCIe device issue since v6.1.16
@ 2023-09-19 14:17 Schroeder, Chad
2023-09-19 14:59 ` Lukas Wunner
0 siblings, 1 reply; 5+ messages in thread
From: Schroeder, Chad @ 2023-09-19 14:17 UTC (permalink / raw)
To: bhelgaas@google.com; +Cc: linux-pci@vger.kernel.org, Schroeder, Chad
Recently, I discovered an issue with a PCIe device and recent kernels.
Background:
Using a Linux host system, the PCIe device, a Torrent QN16e PCIe 16-128 Channel QAM Modulator, is passed through to a FreeBSD guest.
Until recently, this was working as expected.
Issue:
Starting with kernel v6.1.16, when the guest domain was started, the kernel would immediately report errors via vfio-pci.
kernel: vfio-pci 0000:65:00.0: vfio_bar_restore: reset recovery - restoring BARs
The guest would boot and its driver would load. Then, when guest user-space, would access the device, a PCI system error (PERR) was raised,
as reported by the impi event log, and the hardware itself would suffer a catastrophic event, cycling the server.
Discovery:
After researching the issue, I found the commit that lead system error:
https://lore.kernel.org/all/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@wunner.de/
Specifically, this removal:
- Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which
is now performed by pci_bridge_wait_for_secondary_bus(). A static
delay this long is only necessary for Conventional PCI, so modern
PCIe systems benefit from shorter reset times as a side effect.
Resolution:
I reintroduced the 1 second delay to pci_reset_secondary_bus, recompiled and installed and the system is now working as expected.
void pci_reset_secondary_bus(struct pci_dev *dev)
{
u16 ctrl;
pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
/*
* PCI spec v3.0 7.6.4.2 requires minimum Trst of 1ms. Double
* this to 2ms to ensure that we meet the minimum requirement.
*/
msleep(2);
ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
ssleep(1);
}
PCIe Device:
https://www.videopropulsion.com/content/torrent-qn16e-pcie-16-128-channel-qam-modulator
$ lspci
0000:65:00.0 Multimedia controller: Genroco, Inc Device 0004 (rev 01)
Subsystem: Genroco, Inc Device 0004
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 349
NUMA node: 0
IOMMU group: 1
Region 0: Memory at e0e00000 (32-bit, non-prefetchable) [size=32K]
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75W
DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <4us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [200 v1] Vendor Specific Information: ID=1172 Rev=0 Len=044 <?>
Kernel driver in use: vfio-pci
Regards,
Chad Schroeder
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: PCIe device issue since v6.1.16
2023-09-19 14:17 PCIe device issue since v6.1.16 Schroeder, Chad
@ 2023-09-19 14:59 ` Lukas Wunner
2023-09-19 15:08 ` Schroeder, Chad
0 siblings, 1 reply; 5+ messages in thread
From: Lukas Wunner @ 2023-09-19 14:59 UTC (permalink / raw)
To: Schroeder, Chad; +Cc: bhelgaas@google.com, linux-pci@vger.kernel.org
Hi Chad,
On Tue, Sep 19, 2023 at 02:17:29PM +0000, Schroeder, Chad wrote:
> After researching the issue, I found the commit that lead system error:
>
> https://lore.kernel.org/all/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@wunner.de/
>
> Specifically, this removal:
>
> - Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which
> is now performed by pci_bridge_wait_for_secondary_bus(). A static
> delay this long is only necessary for Conventional PCI, so modern
> PCIe systems benefit from shorter reset times as a side effect.
Thanks for the report and sorry for the breakage.
This endpoint device only supports Gen1 speed (2.5GT/s) and does not support
Data Link Layer Link Active Reporting. I have a suspicion that I neglected
to take this case into account in pci_bridge_wait_for_secondary_bus().
To better understand what's going on, could you also provide "lspci -vvv"
output of the parent bridge above 0000:65:00.0 (i.e. of the bridge whose
secondary bus is 65)?
Thanks!
Lukas
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: PCIe device issue since v6.1.16
2023-09-19 14:59 ` Lukas Wunner
@ 2023-09-19 15:08 ` Schroeder, Chad
2023-09-19 18:41 ` Lukas Wunner
0 siblings, 1 reply; 5+ messages in thread
From: Schroeder, Chad @ 2023-09-19 15:08 UTC (permalink / raw)
To: Lukas Wunner
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org, Schroeder, Chad
Hi Lukas,
Thanks for the reply. Here you go:
$ lspci
0000:64:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode])
Subsystem: Super Micro Computer Inc Device 0953
Physical Slot: 2
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 30
NUMA node: 0
IOMMU group: 0
Bus: primary=64, secondary=65, subordinate=65, sec-latency=0
I/O behind bridge: f000-0fff [disabled] [16-bit]
Memory behind bridge: e0e00000-e0efffff [size=1M] [32-bit]
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled] [64-bit]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity+ SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Subsystem: Super Micro Computer Inc Device 0953
Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit-
Address: fee00038 Data: 0000
Masking: 00000002 Pending: 00000000
Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag+ RBE+
DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #5, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <16us
ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #2, PowerLimit 75W; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd On, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet+ LinkState+
RootCap: CRSVisible+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal+ PMEIntEna+ CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BCD, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd+
AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled, ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?>
Capabilities: [110 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
Capabilities: [148 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES+ TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn- NFERptEn- FERptEn-
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
Capabilities: [1d0 v1] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?>
Capabilities: [250 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: LaneErr at lane: 15
Capabilities: [280 v1] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?>
Capabilities: [298 v1] Vendor Specific Information: ID=0007 Rev=0 Len=024 <?>
Capabilities: [300 v1] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?>
Kernel driver in use: pcieport
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: PCIe device issue since v6.1.16
2023-09-19 15:08 ` Schroeder, Chad
@ 2023-09-19 18:41 ` Lukas Wunner
2023-09-20 15:36 ` Schroeder, Chad
0 siblings, 1 reply; 5+ messages in thread
From: Lukas Wunner @ 2023-09-19 18:41 UTC (permalink / raw)
To: Schroeder, Chad; +Cc: bhelgaas@google.com, linux-pci@vger.kernel.org
Hi Chad,
On Tue, Sep 19, 2023 at 03:08:08PM +0000, Schroeder, Chad wrote:
> 0000:64:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode])
[...]
> Bus: primary=64, secondary=65, subordinate=65, sec-latency=0
[...]
> Capabilities: [90] Express (v2) Root Port (Slot+), MSI 00
[...]
> LnkCap: Port #5, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <16us
> ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
> LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x1
> TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
pci_bridge_secondary_bus_reset() calls pcibios_reset_secondary_bus()
to perform the SBR, then calls pci_bridge_wait_for_secondary_bus()
to perform the delays prescribed by the spec.
At the bottom of pci_bridge_wait_for_secondary_bus(), there's an if-else
clause and based on the lspci output quoted above, I'd expect that control
flow enters the else branch (because the SkyLake Root Port supports more
than 5 GT/s, namely 8 GT/s.
The function then calls pcie_wait_for_link_delay() to wait for the link
to come up, then waits 100 msec per PCIe r6.1 sec 6.6.1.
Afterwards it calls pci_dev_wait() to poll the Vendor ID register of the
VideoPropulsion card.
My guess is that the VideoPropulsion card requires a longer delay than
100 msec before its config space is first accessed. The PCI system errors
that you mention are probably raised by the card because it is not ready
yet to handle those config space accesses.
Since this is a PCIe r1.0 card, I've checked whether PCIe r1.0 has
longer delays after reset than contemporary revisions of the PCIe
Base Spec. But that's not the case. PCIe r1.0 sec 7.6 says:
"To allow components to perform internal initialization, system
software must wait for at least 100 ms from the end of a reset
(cold/warm/hot) before it is permitted to issue Configuration
Requests to PCI Express devices
[...]
The Root Complex and/or system software must allow 1.0s (+50% / -0%)
after a reset (hot/warm/cold), before it may determine that a device
which fails to return a Successful Completion status for a valid
Configuration Request is a broken device"
Those timing requirements are essentially identical to what contemporary
PCIe revisions prescribe. It's also what the code in the kernel follows.
Which leads me to believe that the longer delay before the first config
space access required by this particular card is a quirk. So I'm
proposing the following patch:
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 20ac67d..3cbff71 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5993,3 +5993,9 @@ static void dpc_log_size(struct pci_dev *dev)
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2f, dpc_log_size);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a31, dpc_log_size);
#endif
+
+static void pci_fixup_d3cold_delay_1sec(struct pci_dev *pdev)
+{
+ pdev->d3cold_delay = 1000;
+}
+DECLARE_PCI_FIXUP_FINAL(0x5555, 0x0004, pci_fixup_d3cold_delay_1sec);
Could you test if applying this on top of v6.1.16 fixes the issue?
(Apply with "patch -p1 < filename.patch" at the top of the kernel
source tree.)
Thanks,
Lukas
^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: PCIe device issue since v6.1.16
2023-09-19 18:41 ` Lukas Wunner
@ 2023-09-20 15:36 ` Schroeder, Chad
0 siblings, 0 replies; 5+ messages in thread
From: Schroeder, Chad @ 2023-09-20 15:36 UTC (permalink / raw)
To: Lukas Wunner
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org, Schroeder, Chad
> Which leads me to believe that the longer delay before the first config
> space access required by this particular card is a quirk. So I'm
> proposing the following patch:
>
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 20ac67d..3cbff71 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5993,3 +5993,9 @@ static void dpc_log_size(struct pci_dev *dev)
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2f, dpc_log_size);
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a31,
> dpc_log_size);
> #endif
> +
> +static void pci_fixup_d3cold_delay_1sec(struct pci_dev *pdev)
> +{
> + pdev->d3cold_delay = 1000;
> +}
> +DECLARE_PCI_FIXUP_FINAL(0x5555, 0x0004, pci_fixup_d3cold_delay_1sec);
>
>
> Could you test if applying this on top of v6.1.16 fixes the issue?
> (Apply with "patch -p1 < filename.patch" at the top of the kernel
> source tree.)
>
> Thanks,
>
> Lukas
I've applied the quirk patch to v6.5.4 and it does indeed elicit the same successful result as the original ssleep delay.
The device and server are functioning as expected.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-09-20 15:36 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-19 14:17 PCIe device issue since v6.1.16 Schroeder, Chad
2023-09-19 14:59 ` Lukas Wunner
2023-09-19 15:08 ` Schroeder, Chad
2023-09-19 18:41 ` Lukas Wunner
2023-09-20 15:36 ` Schroeder, Chad
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.