* Re: [Bug 215027] New: "PCIe Bus Error: severity=Corrected, type=Physical Layer" flood on Intel VMD + Samsung NVMe combination
[not found] <bug-215027-41252@https.bugzilla.kernel.org/>
@ 2021-11-15 21:20 ` Bjorn Helgaas
2021-11-15 21:52 ` Keith Busch
0 siblings, 1 reply; 2+ messages in thread
From: Bjorn Helgaas @ 2021-11-15 21:20 UTC (permalink / raw)
To: linux-pci
Cc: bjorn, Naveen Naidu, Krzysztof Wilczyński, Keith Busch,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Nirmal Patel,
Jonathan Derrick
[+cc Naveen, NVMe, VMD folks]
On Mon, Nov 15, 2021 at 07:17:01AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215027
>
> Bug ID: 215027
> Summary: "PCIe Bus Error: severity=Corrected, type=Physical
> Layer" flood on Intel VMD + Samsung NVMe combination
> Product: Drivers
> Version: 2.5
> Kernel Version: mainline, linux-next
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: PCI
> Assignee: drivers_pci@kernel-bugs.osdl.org
> Reporter: kai.heng.feng@canonical.com
> Regression: No
>
> The following tests (and any combination of them) don't help:
> - Change NVMe LTR value to 0 or any other number
> - Disable NVMe APST
> - Disable PCIe ASPM
> - Any version of kernel, including linux-next
> - "Fix long standing AER Error Handling Issues" patch series [1]
>
> [1]
> https://lore.kernel.org/linux-pci/cover.1635179600.git.naveennaidu479@gmail.com/
Thanks a lot for the report, Kai-Heng. It's on v5.15, which is good,
and not marked as a regression. Samples from dmesg:
[ 0.408995] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[ 0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
[ 0.412207] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
[ 1.367220] vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
[ 1.490742] vmd 0000:00:0e.0: Bound to PCI domain 10000
[ 1.569083] nvme nvme0: pci function 10000:e1:00.0
[ 1.571421] pcieport 10000:e0:06.0: can't derive routing for PCI INT A
[ 1.573997] nvme 10000:e1:00.0: PCI INT A: not connected
[ 1.579028] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
[ 1.584839] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver)
[ 1.587454] nvme 10000:e1:00.0: device [144d:a80a] error status/mask=00000001/0000e000
[ 1.589502] nvme 10000:e1:00.0: [ 0] RxErr
[ 1.589813] nvme nvme0: Shutdown timeout set to 10 seconds
[ 1.591509] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
[ 1.595252] pcieport 10000:e0:06.0: AER: can't find device of IDe100
[ 1.597213] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
...
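For readers decoding the sample above: the "error status/mask" pair and the "IDe100" value can be unpacked by hand. The following is a minimal sketch, not kernel code. The bit names are assumed to match the labels the Linux AER driver prints (such as "RxErr"), the bit positions are those of the PCIe AER Correctable Error Status register, and the helper names decode_cor_bits and decode_source_id are hypothetical, introduced only for illustration.

```python
# Sketch only: decode the AER values seen in the dmesg sample.
# Assumption: names mirror the labels printed by the Linux AER driver;
# positions are the Correctable Error Status register bits.
AER_COR_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_cor_bits(value):
    """Return the names of the correctable-error bits set in value."""
    return [name for bit, name in sorted(AER_COR_BITS.items())
            if value & (1 << bit)]


def decode_source_id(source_id):
    """Split a 16-bit requester ID into bus:device.function."""
    bus = (source_id >> 8) & 0xFF
    devfn = source_id & 0xFF
    return f"{bus:02x}:{devfn >> 3:02x}.{devfn & 0x7}"


if __name__ == "__main__":
    # Values copied from the log: status/mask=00000001/0000e000, IDe100
    status, mask = 0x00000001, 0x0000E000
    print("status:", decode_cor_bits(status))
    print("masked:", decode_cor_bits(mask))
    print("source:", decode_source_id(0xE100))
```

Under these assumptions, status 00000001 has only bit 0 (RxErr) set, which agrees with the "[ 0] RxErr" line, and the source ID e100 corresponds to e1:00.0, the NVMe function named in the surrounding messages.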
* Re: [Bug 215027] New: "PCIe Bus Error: severity=Corrected, type=Physical Layer" flood on Intel VMD + Samsung NVMe combination
From: Keith Busch @ 2021-11-15 21:52 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-pci, bjorn, Naveen Naidu, Krzysztof Wilczyński,
Jens Axboe, Christoph Hellwig, Sagi Grimberg, Nirmal Patel,
Jonathan Derrick
On Mon, Nov 15, 2021 at 03:20:50PM -0600, Bjorn Helgaas wrote:
> [+cc Naveen, NVMe, VMD folks]
>
> On Mon, Nov 15, 2021 at 07:17:01AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215027
> >
> > Bug ID: 215027
> > Summary: "PCIe Bus Error: severity=Corrected, type=Physical
> > Layer" flood on Intel VMD + Samsung NVMe combination
> > Product: Drivers
> > Version: 2.5
> > Kernel Version: mainline, linux-next
> > Hardware: All
> > OS: Linux
> > Tree: Mainline
> > Status: NEW
> > Severity: normal
> > Priority: P1
> > Component: PCI
> > Assignee: drivers_pci@kernel-bugs.osdl.org
> > Reporter: kai.heng.feng@canonical.com
> > Regression: No
> >
> > The following tests (and any combination of them) don't help:
> > - Change NVMe LTR value to 0 or any other number
> > - Disable NVMe APST
> > - Disable PCIe ASPM
> > - Any version of kernel, including linux-next
> > - "Fix long standing AER Error Handling Issues" patch series [1]
> >
> > [1]
> > https://lore.kernel.org/linux-pci/cover.1635179600.git.naveennaidu479@gmail.com/
>
> Thanks a lot for the report, Kai-Heng. It's on v5.15, which is good,
> and not marked as a regression. Samples from dmesg:
>
> [ 0.408995] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
> [ 0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
> [ 0.412207] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
> [ 1.367220] vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
> [ 1.490742] vmd 0000:00:0e.0: Bound to PCI domain 10000
> [ 1.569083] nvme nvme0: pci function 10000:e1:00.0
> [ 1.571421] pcieport 10000:e0:06.0: can't derive routing for PCI INT A
> [ 1.573997] nvme 10000:e1:00.0: PCI INT A: not connected
> [ 1.579028] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
> [ 1.584839] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver)
> [ 1.587454] nvme 10000:e1:00.0: device [144d:a80a] error status/mask=00000001/0000e000
> [ 1.589502] nvme 10000:e1:00.0: [ 0] RxErr
> [ 1.589813] nvme nvme0: Shutdown timeout set to 10 seconds
> [ 1.591509] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
> [ 1.595252] pcieport 10000:e0:06.0: AER: can't find device of IDe100
> [ 1.597213] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
> ...
Just for testing purposes, does it still produce the repeated error
messages if you disable VMD?