public inbox for linux-pci@vger.kernel.org
* Re: [Bug 215027] New: "PCIe Bus Error: severity=Corrected, type=Physical Layer" flood on Intel VMD + Samsung NVMe combination
From: Bjorn Helgaas @ 2021-11-15 21:20 UTC
  To: linux-pci
  Cc: bjorn, Naveen Naidu, Krzysztof Wilczyński, Keith Busch,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Nirmal Patel,
	Jonathan Derrick

[+cc Naveen, NVMe, VMD folks]

On Mon, Nov 15, 2021 at 07:17:01AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215027
> 
>             Bug ID: 215027
>            Summary: "PCIe Bus Error: severity=Corrected, type=Physical
>                     Layer" flood on Intel VMD + Samsung NVMe combination
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: mainline, linux-next
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: PCI
>           Assignee: drivers_pci@kernel-bugs.osdl.org
>           Reporter: kai.heng.feng@canonical.com
>         Regression: No
> 
> The following tests (and any combination of them) don't help:
> - Change NVMe LTR value to 0 or any other number
> - Disable NVMe APST
> - Disable PCIe ASPM
> - Try other kernel versions, including linux-next
> - "Fix long standing AER Error Handling Issues" patch series [1]
> 
> [1]
> https://lore.kernel.org/linux-pci/cover.1635179600.git.naveennaidu479@gmail.com/

Thanks a lot for the report, Kai-Heng.  The report is against v5.15,
which is recent enough to be useful, and it is not marked as a
regression.  Samples from dmesg:

  [    0.408995] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
  [    0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
  [    0.412207] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
  [    1.367220] vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
  [    1.490742] vmd 0000:00:0e.0: Bound to PCI domain 10000
  [    1.569083] nvme nvme0: pci function 10000:e1:00.0
  [    1.571421] pcieport 10000:e0:06.0: can't derive routing for PCI INT A
  [    1.573997] nvme 10000:e1:00.0: PCI INT A: not connected
  [    1.579028] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
  [    1.584839] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver)
  [    1.587454] nvme 10000:e1:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
  [    1.589502] nvme 10000:e1:00.0:    [ 0] RxErr
  [    1.589813] nvme nvme0: Shutdown timeout set to 10 seconds
  [    1.591509] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
  [    1.595252] pcieport 10000:e0:06.0: AER: can't find device of IDe100
  [    1.597213] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
  ...
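
For anyone reading along, the "error status/mask=00000001/0000e000"
pair and the "IDe100" in the log above can be decoded by hand.  Below is
a minimal Python sketch, assuming the standard bit layout of the AER
Correctable Error Status/Mask registers and the short names the Linux
AER driver prints for them.  The helper names (decode_correctable,
decode_source_id) are made up for illustration and are not kernel
functions.

```python
# Bit positions in the AER Correctable Error Status/Mask registers,
# labelled with the short names seen in Linux AER log lines.
CORRECTABLE_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def decode_correctable(value):
    """Return the names of the bits set in a status or mask value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_source_id(source_id):
    """Split a 16-bit requester ID into (bus, device, function)."""
    return (source_id >> 8) & 0xFF, (source_id >> 3) & 0x1F, source_id & 0x7


# Values copied from the dmesg sample above.
status, mask = (int(x, 16) for x in "00000001/0000e000".split("/"))
print("status:", decode_correctable(status))
print("mask:  ", decode_correctable(mask))
print("source: %02x:%02x.%x" % decode_source_id(0xE100))
```

Run against the values above, the status decodes to RxErr only, which
matches the "[ 0] RxErr" line, and the mask covers NonFatalErr,
CorrIntErr and HeaderOF, so RxErr is not masked.  The ID e100 splits
into bus e1, device 00, function 0, i.e. the same e1:00.0 function as
the NVMe device (the domain is not part of the 16-bit ID).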


* Re: [Bug 215027] New: "PCIe Bus Error: severity=Corrected, type=Physical Layer" flood on Intel VMD + Samsung NVMe combination
From: Keith Busch @ 2021-11-15 21:52 UTC
  To: Bjorn Helgaas
  Cc: linux-pci, bjorn, Naveen Naidu, Krzysztof Wilczyński,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Nirmal Patel,
	Jonathan Derrick

On Mon, Nov 15, 2021 at 03:20:50PM -0600, Bjorn Helgaas wrote:
> [+cc Naveen, NVMe, VMD folks]
> 
> On Mon, Nov 15, 2021 at 07:17:01AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215027
> > 
> >             Bug ID: 215027
> >            Summary: "PCIe Bus Error: severity=Corrected, type=Physical
> >                     Layer" flood on Intel VMD + Samsung NVMe combination
> >            Product: Drivers
> >            Version: 2.5
> >     Kernel Version: mainline, linux-next
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: PCI
> >           Assignee: drivers_pci@kernel-bugs.osdl.org
> >           Reporter: kai.heng.feng@canonical.com
> >         Regression: No
> > 
> > The following tests (and any combination of them) don't help:
> > - Change NVMe LTR value to 0 or any other number
> > - Disable NVMe APST
> > - Disable PCIe ASPM
> > - Try other kernel versions, including linux-next
> > - "Fix long standing AER Error Handling Issues" patch series [1]
> > 
> > [1]
> > https://lore.kernel.org/linux-pci/cover.1635179600.git.naveennaidu479@gmail.com/
> 
> Thanks a lot for the report, Kai-Heng.  The report is against v5.15,
> which is recent enough to be useful, and it is not marked as a
> regression.  Samples from dmesg:
> 
>   [    0.408995] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
>   [    0.410076] acpi PNP0A08:00: _OSC: platform does not support [AER]
>   [    0.412207] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug SHPCHotplug PME PCIeCapability LTR]
>   [    1.367220] vmd 0000:00:0e.0: PCI host bridge to bus 10000:e0
>   [    1.490742] vmd 0000:00:0e.0: Bound to PCI domain 10000
>   [    1.569083] nvme nvme0: pci function 10000:e1:00.0
>   [    1.571421] pcieport 10000:e0:06.0: can't derive routing for PCI INT A
>   [    1.573997] nvme 10000:e1:00.0: PCI INT A: not connected
>   [    1.579028] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   [    1.584839] nvme 10000:e1:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver)
>   [    1.587454] nvme 10000:e1:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
>   [    1.589502] nvme 10000:e1:00.0:    [ 0] RxErr
>   [    1.589813] nvme nvme0: Shutdown timeout set to 10 seconds
>   [    1.591509] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   [    1.595252] pcieport 10000:e0:06.0: AER: can't find device of IDe100
>   [    1.597213] pcieport 10000:e0:06.0: AER: Corrected error received: 10000:e1:00.0
>   ...

Just for testing purposes, do the repeated error messages still appear
if you disable VMD?
