From: "Chen, Gong" <gong.chen@linux.intel.com>
To: tony.luck@intel.com, bp@alien8.de
Cc: linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org
Subject: Extended H/W error log driver
Date: Fri, 11 Oct 2013 02:32:38 -0400
Message-ID: <1381473166-29303-1-git-send-email-gong.chen@linux.intel.com>

[PATCH 1/8] ACPI, APEI, CPER: Fix status check during error printing
[PATCH 2/8] ACPI, CPER: Update cper info
[PATCH 3/8] ACPI, x86: Extended error log driver for x86 platform
[PATCH 4/8] DMI: Parse memory device (type 17) in SMBIOS
[PATCH 5/8] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
[PATCH 6/8] ACPI, APEI, CPER: Enhance memory reporting capability
[PATCH 7/8] ACPI, APEI, CPER: Cleanup CPER memory error output format
[PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver

This patch series adds an enhanced MCA event logging driver provided by Intel.
Please refer to this link: http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html

Certain usages such as Predictive Failure Analysis (PFA) require more
information about the error than what can be described in processor
machine check banks. Most server processors log additional information
about the error in processor uncore registers. Since the addresses
and layout of these registers vary widely from one processor to another,
system software cannot readily make use of them. To complicate matters
further, some of the additional error information cannot be constructed
without detailed knowledge about platform topology. This enhanced MCA
logging driver allows firmware to provide additional error information
to the MCE/CMCI handler and thus addresses this gap.

After applying this patch series, when a corrected memory error occurs,
we get the following information:

dmesg output:

[56005.785917] {3}Hardware error detected on CPU0
[56005.785959] {3}event severity: corrected
[56005.785975] {3}sub_event[0], severity: corrected
[56005.785977] {3}section_type: memory error
[56005.785981] {3}physical_address: 0x0000000851fe0000
[56005.786027] {3}DIMM location: Memriser1 CHANNEL A DIMM 0
[56005.786154] {4}Hardware error detected on CPU0
[56005.786159] {4}event severity: corrected
[56005.786162] {4}sub_event[0], severity: corrected
[56005.786166] {4}section_type: memory error
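
For anyone who wants to post-process these records, here is a minimal
sketch (not part of the series) that groups the dmesg lines by their {N}
record number and splits the "key: value" lines into fields. The helper
names group_records() and fields() are hypothetical, and the line layout
is assumed from the sample above only; it may differ in other revisions.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [56005.785981] {3}physical_address: 0x0000000851fe0000
# Layout assumed from the sample dmesg output in this mail.
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\{(?P<seq>\d+)\}(?P<msg>.*)$")


def group_records(lines):
    """Group dmesg lines by the {N} record number: {seq: [msg, ...]}."""
    records = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            records[int(m.group("seq"))].append(m.group("msg").strip())
    return dict(records)


def fields(messages):
    """Split 'key: value' messages into a dict; other lines are skipped."""
    out = {}
    for msg in messages:
        key, sep, value = msg.partition(": ")
        if sep:
            out[key] = value
    return out


sample = """\
[56005.785917] {3}Hardware error detected on CPU0
[56005.785959] {3}event severity: corrected
[56005.785975] {3}sub_event[0], severity: corrected
[56005.785977] {3}section_type: memory error
[56005.785981] {3}physical_address: 0x0000000851fe0000
[56005.786027] {3}DIMM location: Memriser1 CHANNEL A DIMM 0
[56005.786154] {4}Hardware error detected on CPU0
""".splitlines()

records = group_records(sample)
rec3 = fields(records[3])
print(rec3["physical_address"], "->", rec3["DIMM location"])
```

The banner line ("Hardware error detected on CPU0") has no "key: value"
form, so fields() leaves it out; it stays available in the grouped record.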


trace output:

# tracer: nop
#
# entries-in-buffer/entries-written: 4/4   #P:120
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
...
...
          <idle>-0     [000] d.h. 56068.488759: extlog_mem_event: 3 corrected errors:unknown on Memriser1 CHANNEL A DIMM 0(FRU: 00000000-0000-0000-0000-000000000000  physical addr: 0x0000000851fe0000 node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 28927 column: 1296)
          <idle>-0     [000] d.h. 56068.488834: extlog_mem_event: 4 corrected errors:unknown
...
...
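
To illustrate how a userspace consumer might pick the fields out of an
extlog_mem_event line, here is a rough sketch. The regular expressions
are derived from the sample trace lines above only, and parse_event() is
a hypothetical helper; the field layout could change if the trace format
is revised during review.

```python
import re

# Layout assumed from the sample trace output in this mail:
#   extlog_mem_event: <seq> <severity> errors:<type> on <dimm>(<detail>)
# The " on <dimm>(<detail>)" part is optional (see the second sample line).
EVENT_RE = re.compile(
    r"extlog_mem_event:\s+(?P<seq>\d+)\s+(?P<severity>\w+) errors:(?P<etype>\w+)"
    r"(?: on (?P<dimm>[^(]+)\((?P<detail>.*)\))?"
)


def parse_event(line):
    """Return a dict of fields for one trace line, or None if it is not an event."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    event = {
        "seq": int(m.group("seq")),
        "severity": m.group("severity"),
        "type": m.group("etype"),
    }
    detail = m.group("detail")
    if detail is not None:
        event["dimm"] = m.group("dimm").strip()
        addr = re.search(r"physical addr: (0x[0-9a-fA-F]+)", detail)
        if addr:
            event["physical_addr"] = int(addr.group(1), 16)
        # Numeric location fields as they appear in the sample line.
        for key, value in re.findall(
            r"\b(node|card|module|rank|bank|row|column): (\d+)", detail
        ):
            event[key] = int(value)
    return event


full_line = (
    "<idle>-0     [000] d.h. 56068.488759: extlog_mem_event: 3 corrected "
    "errors:unknown on Memriser1 CHANNEL A DIMM 0(FRU: "
    "00000000-0000-0000-0000-000000000000  physical addr: 0x0000000851fe0000 "
    "node: 0 card: 0 module: 0 rank: 0 bank: 0 row: 28927 column: 1296)"
)
short_line = "<idle>-0     [000] d.h. 56068.488834: extlog_mem_event: 4 corrected errors:unknown"

event = parse_event(full_line)
short_event = parse_event(short_line)
print(event["dimm"], hex(event["physical_addr"]), event["row"], event["column"])
```

Lines without location details, such as the second sample line, yield
only the sequence number, severity and error type.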

The dmesg output is trimmed to keep only the most important data. The
trace output contains most of the data. I am not sure whether all fields
are meaningful to users. Some fields, such as FRU ID/FRU TEXT, depend on
the BIOS manufacturer. Comments on which fields are needed are welcome.



Thread overview: 85+ messages
2013-10-11  6:32 Chen, Gong [this message]
2013-10-11  6:32 ` [PATCH 1/8] ACPI, APEI, CPER: Fix status check during error printing Chen, Gong
2013-10-11  8:50   ` Borislav Petkov
2013-10-11  6:32 ` [PATCH 2/8] ACPI, CPER: Update cper info Chen, Gong
2013-10-11  9:06   ` Borislav Petkov
2013-10-11 15:47     ` Borislav Petkov
2013-10-16  1:57       ` Joe Perches
2013-10-16  2:46         ` Chen Gong
2013-10-16  3:10           ` Joe Perches
2013-10-15 18:17   ` Naveen N. Rao
2013-10-16  1:39     ` Chen Gong
2013-10-17 12:21       ` Naveen N. Rao
2013-10-18 11:06         ` Naveen N. Rao
2013-10-11  6:32 ` [PATCH 3/8] ACPI, x86: Extended error log driver for x86 platform Chen, Gong
2013-10-11 15:24   ` Borislav Petkov
2013-10-14  3:16     ` Chen Gong
2013-10-14 10:26       ` Borislav Petkov
2013-10-14 13:03         ` Chen Gong
2013-10-14 13:28           ` Borislav Petkov
2013-10-14 16:50         ` Tony Luck
2013-10-14 17:07           ` Borislav Petkov
2013-10-14 17:16             ` Tony Luck
2013-10-11  6:32 ` [PATCH 4/8] DMI: Parse memory device (type 17) in SMBIOS Chen, Gong
2013-10-11 15:40   ` Borislav Petkov
2013-10-14  3:21     ` Chen Gong
2013-10-14 10:30       ` Borislav Petkov
2013-10-15 19:00   ` Naveen N. Rao
2013-10-11  6:32 ` [PATCH 5/8] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error Chen, Gong
2013-10-11 15:41   ` Borislav Petkov
2013-10-15 17:26   ` Naveen N. Rao
2013-10-16  1:35     ` Chen Gong
2013-10-11  6:32 ` [PATCH 6/8] ACPI, APEI, CPER: Enhance memory reporting capability Chen, Gong
2013-10-11 15:49   ` Borislav Petkov
2013-10-15 19:18   ` Naveen N. Rao
2013-10-11  6:32 ` [PATCH 7/8] ACPI, APEI, CPER: Cleanup CPER memory error output format Chen, Gong
2013-10-11 16:02   ` Borislav Petkov
2013-10-14  4:55     ` Chen Gong
2013-10-14 10:36       ` Borislav Petkov
2013-10-14 17:12         ` Tony Luck
2013-10-14 18:47           ` Borislav Petkov
2013-10-14 21:03             ` Tony Luck
2013-10-14 21:50               ` Borislav Petkov
2013-10-15  9:18                 ` Chen Gong
2013-10-15 10:13                   ` Borislav Petkov
2013-10-15 11:28           ` Naveen N. Rao
2013-10-15 11:41           ` Naveen N. Rao
2013-10-15 12:29             ` Borislav Petkov
2013-10-15 16:42               ` Joe Perches
2013-10-15 16:49                 ` Tony Luck
2013-10-15 16:56                   ` Borislav Petkov
2013-10-11  6:32 ` [PATCH 8/8] ACPI / trace: Add trace interface for eMCA driver Chen, Gong
2013-10-11  7:52   ` Borislav Petkov
2013-10-11 16:14   ` Borislav Petkov
2013-10-14  7:07     ` Chen Gong
2013-10-15 16:54   ` Naveen N. Rao
2013-10-15 17:00     ` Borislav Petkov
2013-10-15 17:30       ` Naveen N. Rao
2013-10-15 17:47         ` Borislav Petkov
2013-10-16  0:43         ` Mauro Carvalho Chehab
2013-10-16  9:16           ` Borislav Petkov
2013-10-16 10:35             ` Mauro Carvalho Chehab
2013-10-16 10:42               ` Borislav Petkov
2013-10-16 11:55                 ` Mauro Carvalho Chehab
2013-10-16 12:20                   ` Borislav Petkov
2013-10-16 20:47                   ` Luck, Tony
2013-10-17 10:34                     ` Mauro Carvalho Chehab
2013-10-17 21:35                       ` Luck, Tony
2013-10-16 20:35               ` Luck, Tony
2013-10-17 10:32                 ` Mauro Carvalho Chehab
2013-10-16  9:50     ` Chen Gong
2013-10-16 10:49       ` Borislav Petkov
2013-10-18 11:04         ` Naveen N. Rao
2013-10-11  7:00 ` Extended H/W error log driver Joe Perches
2013-10-11  8:04 ` Borislav Petkov
2013-10-11 14:54   ` Luck, Tony
2013-10-11 15:27     ` Borislav Petkov
2013-10-14  6:49   ` Chen Gong
2013-10-14 10:55     ` Borislav Petkov
2013-10-15  4:07       ` Chen Gong
2013-10-15  9:28         ` Borislav Petkov
2013-10-15 16:15           ` Tony Luck
2013-10-15 19:10             ` Naveen N. Rao
2013-10-15 19:23               ` Borislav Petkov
2013-10-17 12:07                 ` Naveen N. Rao
2013-10-17 13:04                   ` Borislav Petkov
