From: Hans-Peter Jansen <hpj@urpla.net>
To: linux-kernel@vger.kernel.org
Subject: AMD PCI Bridge: Hardware error from APEI
Date: Sat, 27 Jun 2020 20:23:35 +0200
Message-ID: <2559180.cUrjzdZFCD@xrated>

Dear hacker from the order of the penguins,

We're facing a disturbing issue after swapping the motherboard of a
mission-critical system:

Jun 27 20:05:29 server kernel: {10}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Jun 27 20:05:29 server kernel: {10}[Hardware Error]: It has been corrected by h/w and requires no further action
Jun 27 20:05:29 server kernel: {10}[Hardware Error]: event severity: corrected
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:  Error 0, type: corrected
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:  fru_text: PcieError
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   section_type: PCIe error
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   port_type: 4, root port
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   version: 0.2
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   command: 0x0407, status: 0x0010
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   device_id: 0000:60:03.1
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   slot: 19
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   secondary_bus: 0x62
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1453
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   class_code: 060400
Jun 27 20:05:29 server kernel: {10}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0012
Jun 27 20:05:29 server kernel: pcieport 0000:60:03.1: AER: aer_status: 0x00001000, aer_mask: 0x00006000
Jun 27 20:05:29 server kernel: pcieport 0000:60:03.1: AER:    [12] Timeout               
Jun 27 20:05:29 server kernel: pcieport 0000:60:03.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID


Jun 27 20:05:51 server kernel: {11}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
Jun 27 20:05:51 server kernel: {11}[Hardware Error]: It has been corrected by h/w and requires no further action
Jun 27 20:05:51 server kernel: {11}[Hardware Error]: event severity: corrected
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:  Error 0, type: corrected
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:  fru_text: PcieError
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   section_type: PCIe error
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   port_type: 4, root port
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   version: 0.2
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   command: 0x0407, status: 0x0010
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   device_id: 0000:60:03.1
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   slot: 19
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   secondary_bus: 0x62
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   vendor_id: 0x1022, device_id: 0x1453
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   class_code: 060400
Jun 27 20:05:51 server kernel: {11}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0012
Jun 27 20:05:51 server kernel: pcieport 0000:60:03.1: AER: aer_status: 0x00001000, aer_mask: 0x00006000
Jun 27 20:05:51 server kernel: pcieport 0000:60:03.1: AER:    [12] Timeout               
Jun 27 20:05:51 server kernel: pcieport 0000:60:03.1: AER: aer_layer=Data Link Layer, aer_agent=Transmitter ID
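Both events above carry the same AER correctable-error status and mask. As a reading aid, here is an illustrative decoder (not a kernel API); the bit table follows the correctable-error layout of the PCIe Base Specification, and the names AER_COR_BITS, decode_aer_cor and decode_report are hypothetical. It assumes that bits set in aer_mask are masked and therefore not reported.

```python
from __future__ import annotations

# AER Correctable Error Status register bits (PCIe Base Specification names;
# the short labels the kernel prints are given in the comments).
AER_COR_BITS = {
    0: "Receiver Error",             # RxErr
    6: "Bad TLP",                    # BadTLP
    7: "Bad DLLP",                   # BadDLLP
    8: "REPLAY_NUM Rollover",        # Rollover
    12: "Replay Timer Timeout",      # Timeout
    13: "Advisory Non-Fatal Error",  # NonFatalErr
    14: "Corrected Internal Error",  # CorrIntErr
    15: "Header Log Overflow",       # HeaderOF
}


def decode_aer_cor(value: int) -> list[str]:
    """Return the names of the known correctable-error bits set in value."""
    return [name for bit, name in sorted(AER_COR_BITS.items()) if value & (1 << bit)]


def decode_report(aer_status: int, aer_mask: int) -> dict[str, list[str]]:
    """Split a logged aer_status/aer_mask pair into unmasked and masked errors."""
    return {
        "reported": decode_aer_cor(aer_status & ~aer_mask),
        "masked": decode_aer_cor(aer_mask),
    }


if __name__ == "__main__":
    # Values copied from the pcieport lines of both events.
    print(decode_report(0x00001000, 0x00006000))
    # {'reported': ['Replay Timer Timeout'], 'masked': ['Advisory Non-Fatal Error', 'Corrected Internal Error']}
```

Bit 12 (Replay Timer Timeout) is a Data Link Layer error detected by the transmitting side of the link, which is consistent with the aer_layer and aer_agent fields in the log.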


60:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 44, NUMA node 3
        Bus: primary=60, secondary=62, subordinate=62, sec-latency=0
        I/O behind bridge: None
        Memory behind bridge: e3600000-e36fffff [size=1M]
        Prefetchable memory behind bridge: None
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Root Port (Slot+), MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
        Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [270] #19
        Capabilities: [2a0] Access Control Services
        Capabilities: [370] L1 PM Substates
        Capabilities: [380] Downstream Port Containment
        Capabilities: [3c4] #23
        Kernel driver in use: pcieport
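The raw command, status, secondary_status and bridge control values in the APEI records for this bridge can be decoded the same way. A minimal sketch, assuming the standard type 1 (PCI-to-PCI bridge) header bit layout; only a subset of bits is tabulated, and the table and function names are hypothetical:

```python
from __future__ import annotations

# Subsets of the PCI config-space bit layouts (type 1 / bridge header).
COMMAND_BITS = {0: "I/O Space", 1: "Memory Space", 2: "Bus Master",
                6: "Parity Error Response", 8: "SERR# Enable",
                10: "Interrupt Disable"}
STATUS_BITS = {3: "Interrupt Status", 4: "Capabilities List",
               8: "Master Data Parity Error", 11: "Signaled Target Abort",
               12: "Received Target Abort", 13: "Received Master Abort",
               14: "Signaled System Error", 15: "Detected Parity Error"}
# Secondary status: same error bits, but bit 14 means "Received System Error".
SEC_STATUS_BITS = {8: "Master Data Parity Error", 11: "Signaled Target Abort",
                   12: "Received Target Abort", 13: "Received Master Abort",
                   14: "Received System Error", 15: "Detected Parity Error"}
BRIDGE_CONTROL_BITS = {0: "Parity Error Response", 1: "SERR# Enable",
                       2: "ISA Enable", 3: "VGA Enable", 4: "VGA 16-bit Decode",
                       5: "Master Abort Mode", 6: "Secondary Bus Reset"}


def decode(value: int, table: dict[int, str]) -> list[str]:
    """Return the names of all tabulated bits that are set in value."""
    return [name for bit, name in sorted(table.items()) if value & (1 << bit)]


# Values copied from the {10}/{11} records for 0000:60:03.1.
print(decode(0x0407, COMMAND_BITS))         # ['I/O Space', 'Memory Space', 'Bus Master', 'Interrupt Disable']
print(decode(0x0010, STATUS_BITS))          # ['Capabilities List']
print(decode(0x2000, SEC_STATUS_BITS))      # ['Received Master Abort']
print(decode(0x0012, BRIDGE_CONTROL_BITS))  # ['SERR# Enable', 'VGA 16-bit Decode']
```

On the live system, `lspci -vv -s 60:03.1` prints the same fields already decoded (Control, Status, Secondary status, BridgeCtl), which is the more authoritative check.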

It's probably related to a satellite receiver card, since the errors only
appeared after plugging it in:

62:00.0 Multimedia controller: Digital Devices GmbH Max
        Subsystem: Digital Devices GmbH Max S8 4/8
        Flags: bus master, fast devsel, latency 0, IRQ 161, NUMA node 3
        Memory at e3600000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [50] Power Management version 3
        Capabilities: [70] MSI: Enable- Count=1/2 Maskable- 64bit+
        Capabilities: [90] Express Endpoint, MSI 00
        Capabilities: [100] Vendor Specific Information: ID=0000 Rev=0 Len=00c <?>
        Kernel driver in use: ddbridge
        Kernel modules: ddbridge
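The two lspci listings are consistent with that: 62:00.0 sits on the bridge's secondary bus (62), and its 64K BAR lies inside the bridge's 1M memory window. A small hypothetical check of that topology, using only the numbers printed above (Bridge, Endpoint and is_behind are illustrative names, not part of any tool):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bridge:
    secondary: int    # first bus number behind the bridge
    subordinate: int  # last bus number behind the bridge
    mem_base: int     # start of the non-prefetchable memory window
    mem_limit: int    # end of the window (inclusive)


@dataclass
class Endpoint:
    bus: int
    bar_base: int
    bar_size: int


def is_behind(ep: Endpoint, br: Bridge) -> bool:
    """True if the endpoint's bus and its BAR both fall within the bridge's ranges."""
    bus_ok = br.secondary <= ep.bus <= br.subordinate
    bar_end = ep.bar_base + ep.bar_size - 1
    bar_ok = br.mem_base <= ep.bar_base and bar_end <= br.mem_limit
    return bus_ok and bar_ok


# Numbers taken from the lspci output for 60:03.1 and 62:00.0.
bridge_60_03_1 = Bridge(secondary=0x62, subordinate=0x62,
                        mem_base=0xE3600000, mem_limit=0xE36FFFFF)
card_62_00_0 = Endpoint(bus=0x62, bar_base=0xE3600000, bar_size=64 * 1024)
print(is_behind(card_62_00_0, bridge_60_03_1))  # True
```

`lspci -tv` shows the same relationship as a device tree.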

Specs:
ASUS KNPA-U16 with an AMD EPYC 7261, 2x32 GB Kingston KSM26RD4/32MEI 
(officially supported RAM modules)

openSUSE Leap 15.1, kernel 5.7.5

Cheers,
Pete



Thread overview: 5+ messages
2020-06-27 18:23 Hans-Peter Jansen [this message]
2020-07-07  6:56 ` AMD PCI Bridge: Hardware error from APEI Hans-Peter Jansen
2020-07-11 16:32   ` Hans-Peter Jansen
2020-07-15  8:11     ` Hans-Peter Jansen
2021-05-15 17:11       ` Hans-Peter Jansen