From: Hans-Peter Jansen <hpj@urpla.net>
To: linux-kernel@vger.kernel.org
Subject: Re: AMD PCI Bridge: Hardware error from APEI
Date: Sat, 11 Jul 2020 18:32:21 +0200
Message-ID: <7789846.exkBBusKcl@xrated>
In-Reply-To: <10516708.ThPc60jCtT@xrated>
On Tuesday, 7 July 2020, 08:56:41 CEST, you wrote:
> On Saturday, 27 June 2020, 20:23:35 CEST, you wrote:
> > Dear hackers of the order of the penguins,
> >
> > we're facing a disturbing issue here after swapping the motherboard of a
> > mission-critical system:
> >
> > Specs:
> > ASUS KNPA-U16 with an AMD EPYC 7261, 2x32 GB Kingston KSM26RD4/32MEI
> > (officially supported RAM modules)
> >
> > openSUSE 15.1, Kernel 5.7.5
>
> Not sure how to proceed with this one.
>
> After 9½ days of uptime, it has accumulated about 34,000 incidents:
>
> [...]
>
> Needless to say, this is no permanent solution.
>
> Any ideas anybody?
After moving the Digital Devices Max S8 4/8 to a different PCIe slot, the error
has moved as well:
2020-07-11T18:25:34.380002+02:00 tyrex kernel: [ 889.223783] {20}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
2020-07-11T18:25:34.380025+02:00 tyrex kernel: [ 889.223787] {20}[Hardware Error]: It has been corrected by h/w and requires no further action
2020-07-11T18:25:34.380028+02:00 tyrex kernel: [ 889.223789] {20}[Hardware Error]: event severity: corrected
2020-07-11T18:25:34.380031+02:00 tyrex kernel: [ 889.223791] {20}[Hardware Error]: Error 0, type: corrected
2020-07-11T18:25:34.380032+02:00 tyrex kernel: [ 889.223793] {20}[Hardware Error]: fru_text: PcieError
2020-07-11T18:25:34.380034+02:00 tyrex kernel: [ 889.223795] {20}[Hardware Error]: section_type: PCIe error
2020-07-11T18:25:34.380577+02:00 tyrex kernel: [ 889.223796] {20}[Hardware Error]: port_type: 4, root port
2020-07-11T18:25:34.380586+02:00 tyrex kernel: [ 889.223798] {20}[Hardware Error]: version: 0.2
2020-07-11T18:25:34.380588+02:00 tyrex kernel: [ 889.223800] {20}[Hardware Error]: command: 0x0407, status: 0x0010
2020-07-11T18:25:34.380590+02:00 tyrex kernel: [ 889.223802] {20}[Hardware Error]: device_id: 0000:40:03.1
2020-07-11T18:25:34.380591+02:00 tyrex kernel: [ 889.223803] {20}[Hardware Error]: slot: 16
2020-07-11T18:25:34.380593+02:00 tyrex kernel: [ 889.223804] {20}[Hardware Error]: secondary_bus: 0x41
2020-07-11T18:25:34.380595+02:00 tyrex kernel: [ 889.223806] {20}[Hardware Error]: vendor_id: 0x1022, device_id: 0x1453
2020-07-11T18:25:34.380597+02:00 tyrex kernel: [ 889.223808] {20}[Hardware Error]: class_code: 060400
2020-07-11T18:25:34.380599+02:00 tyrex kernel: [ 889.223810] {20}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0012
2020-07-11T18:25:34.380601+02:00 tyrex kernel: [ 889.223908] pcieport 0000:40:03.1: AER: aer_status: 0x00001000, aer_mask: 0x00006000
2020-07-11T18:25:34.380603+02:00 tyrex kernel: [ 889.223912] pcieport 0000:40:03.1: AER: [12] Timeout
2020-07-11T18:25:34.380605+02:00 tyrex kernel: [ 889.223915] pcieport 0000:40:03.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID
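For reference, the aer_status and aer_mask values in the last lines can be decoded by hand. The following is a minimal sketch, not kernel code: the bit names are taken from the Correctable Error Status register layout of the PCIe AER capability, and the helper name decode_correctable is made up for illustration.

```python
# Bit positions of the AER Correctable Error Status / Mask registers,
# as laid out in the PCIe Advanced Error Reporting capability.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_correctable(value):
    """Hypothetical helper: return (bit, name) pairs for every bit set in value."""
    return [
        (bit, AER_CORRECTABLE_BITS.get(bit, "reserved/unknown"))
        for bit in range(32)
        if value & (1 << bit)
    ]


# Values copied from the log line for pcieport 0000:40:03.1.
aer_status = 0x00001000
aer_mask = 0x00006000

# Errors that are set and not masked, i.e. the ones that get reported.
reported = decode_correctable(aer_status & ~aer_mask)
# Error types the mask register suppresses.
masked = decode_correctable(aer_mask)

for bit, name in reported:
    print(f"reported: [{bit}] {name}")
for bit, name in masked:
    print(f"masked:   [{bit}] {name}")
```

Under these assumptions, the only bit set in aer_status is bit 12, which matches the "[12] Timeout" line above, while the mask covers bits 13 and 14.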
It looks like the system is creating such bridge devices on demand:
40:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 39, NUMA node 2
Bus: primary=40, secondary=41, subordinate=41, sec-latency=0
I/O behind bridge: None
Memory behind bridge: e5d00000-e5dfffff [size=1M]
Prefetchable memory behind bridge: None
Capabilities: [50] Power Management version 3
Capabilities: [58] Express Root Port (Slot+), MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [270] #19
Capabilities: [2a0] Access Control Services
Capabilities: [370] L1 PM Substates
Capabilities: [380] Downstream Port Containment
Capabilities: [3c4] #23
Kernel driver in use: pcieport
in order to handle:
41:00.0 Multimedia controller: Digital Devices GmbH Max
Subsystem: Digital Devices GmbH Max S8 4/8
Flags: bus master, fast devsel, latency 0, IRQ 181, NUMA node 2
Memory at e5d00000 (64-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 3
Capabilities: [70] MSI: Enable- Count=1/2 Maskable- 64bit+
Capabilities: [90] Express Endpoint, MSI 00
Capabilities: [100] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
Kernel driver in use: ddbridge
Kernel modules: ddbridge
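To make the link between the APEI record and the two lspci entries explicit, here is a small sketch that maps the logged device_id and secondary_bus onto lspci-style addresses. The function names are hypothetical, and placing the endpoint at device 0, function 0 on the secondary bus is an assumption that happens to match the 41:00.0 entry above.

```python
def parse_bdf(address):
    """Split 'DDDD:BB:DD.F' into integer (domain, bus, device, function)."""
    domain, bus, devfn = address.split(":")
    device, function = devfn.split(".")
    return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)


def lspci_name(bus, device, function):
    """Format a bus/device/function triple the way lspci prints it."""
    return f"{bus:02x}:{device:02x}.{function:x}"


# Values copied from the APEI record.
domain, bus, device, function = parse_bdf("0000:40:03.1")
secondary_bus = 0x41

bridge_name = lspci_name(bus, device, function)
# Assumption: the card sits at device 0, function 0 behind the root port.
endpoint_name = lspci_name(secondary_bus, 0, 0)

print(bridge_name, "->", endpoint_name)
```

With the values from the log, this yields the root port 40:03.1 and the Max S8 at 41:00.0, i.e. the error is reported on the bridge directly in front of the card.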
Hrmpf.
Pete
Thread overview: 5+ messages
2020-06-27 18:23 AMD PCI Bridge: Hardware error from APEI Hans-Peter Jansen
2020-07-07 6:56 ` Hans-Peter Jansen
2020-07-11 16:32 ` Hans-Peter Jansen [this message]
2020-07-15 8:11 ` Hans-Peter Jansen
2021-05-15 17:11 ` Hans-Peter Jansen