From: Borislav Petkov <bp@alien8.de>
To: "Chen, Gong" <gong.chen@linux.intel.com>,
Steven Rostedt <rostedt@goodmis.org>
Cc: tony.luck@intel.com, joe@perches.com,
naveen.n.rao@linux.vnet.ibm.com, arozansk@redhat.com,
linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/9] Extended H/W error log driver
Date: Wed, 16 Oct 2013 18:05:50 +0200
Message-ID: <20131016160550.GG13608@pd.tnic>
In-Reply-To: <1381935366-11731-1-git-send-email-gong.chen@linux.intel.com>

On Wed, Oct 16, 2013 at 10:55:57AM -0400, Chen, Gong wrote:
> [PATCH v2 1/9] ACPI, APEI, CPER: Fix status check during error printing
> [PATCH v2 2/9] ACPI, CPER: Update cper info
> [PATCH v2 3/9] bitops: Introduce a more generic BITMASK macro
> [PATCH v2 4/9] ACPI, x86: Extended error log driver for x86 platform
> [PATCH v2 5/9] DMI: Parse memory device (type 17) in SMBIOS
> [PATCH v2 6/9] ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
> [PATCH v2 7/9] ACPI, APEI, CPER: Enhance memory reporting capability
> [PATCH v2 8/9] ACPI, APEI, CPER: Cleanup CPER memory error output format
> [PATCH v2 9/9] ACPI / trace: Add trace interface for eMCA driver
>
> This patch series adds an enhanced MCA event logging driver provided by Intel.
> Please refer to this link: http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
>
> Certain usages such as Predictive Failure Analysis (PFA) require more
> information about the error than what can be described in processor
> machine check banks. Most server processors log additional information
> about the error in processor uncore registers. Since the addresses
> and layout of these registers vary widely from one processor to another,
> system software cannot readily make use of them. To complicate matters
> further, some of the additionalerror information cannot be constructed

Missing space between "additional" and "error".

> without detailed knowledge about platform topology. This enhanced MCA
> logging driver allows firmware to provide additional error information
> to MCE/CMCI handler and thus addresses this gap.

This paragraph is a very good description of the feature and should
actually go into the Kconfig help text in patch 4/9.
>
> After applying this patch series, when a memory corrected error happens,
> we can get following information:
>
> dmesg output:
>
> [ 949.545817] {1}Hardware error detected on CPU15
> [ 949.549786] {1}event severity: corrected
> [ 949.549786] {1} Error 0, type: corrected
> [ 949.549786] {1} section_type: memory error
> [ 949.549786] {1} physical_address: 0x0000001057eb0000
> [ 949.549786] {1} DIMM location: Memriser3 CHANNEL A DIMM 0
> [ 949.549786] {1}Above error has been corrected by h/w and require no further action
> [ 949.549786] mce: [Hardware Error]: Machine check events logged
> [ 1010.902124] {2}Hardware error detected on CPU15
> [ 1010.906064] {2}event severity: corrected
> [ 1010.906064] {2} Error 0, type: corrected
> [ 1010.906064] {2} section_type: memory error
> [ 1010.906064] {2} physical_address: 0x0000001057eb0000
> [ 1010.906064] {2} DIMM location: Memriser3 CHANNEL A DIMM 0
> [ 1010.906064] {2}Above error has been corrected by h/w and require no further action
> [ 1010.906064] mce: [Hardware Error]: Machine check events logged

Yep, that looks very good, almost. One nit: can you move the action line
up, right after the header line, like this:
> [ 949.545817] {1}Hardware error detected on CPU15
> [ 949.549786] {1}It has been corrected by h/w and requires no further action
<here come the error details>

I mean, this is only the printk output: when there's a userspace consumer
of the tracepoint, none of this goes to dmesg. But when there's no
userspace consumer, the dmesg output should still be readable and
understandable.
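
To make the suggested ordering concrete, here's a rough userspace sketch
(plain snprintf() instead of printk(), and only a subset of the fields).
This is not the actual CPER printing code: struct mem_err_record,
format_corrected_mem_err() and action_line_first() are made-up names for
illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified view of one corrected memory error record.
 * Field names are illustrative only; this is not the CPER layout.
 */
struct mem_err_record {
	unsigned int seq;            /* the "{N}" prefix seen in dmesg */
	int cpu;
	unsigned long long phys_addr;
	const char *dimm_location;
};

/*
 * Format the record in the suggested order: header line, then the
 * action line, then the error details. Like snprintf(), this returns
 * the number of characters that would have been written (the output
 * is truncated if len is too small).
 */
static int format_corrected_mem_err(char *buf, size_t len,
				    const struct mem_err_record *r)
{
	assert(buf != NULL && r != NULL);

	return snprintf(buf, len,
			"{%u}Hardware error detected on CPU%d\n"
			"{%u}It has been corrected by h/w and requires no further action\n"
			"{%u} section_type: memory error\n"
			"{%u} physical_address: 0x%016llx\n"
			"{%u} DIMM location: %s\n",
			r->seq, r->cpu,
			r->seq,
			r->seq,
			r->seq, r->phys_addr,
			r->seq, r->dimm_location);
}

/* Return 1 if the action line comes before the first detail line. */
static int action_line_first(const char *buf)
{
	const char *action = strstr(buf, "requires no further action");
	const char *details = strstr(buf, "section_type:");

	return action != NULL && details != NULL && action < details;
}
```

E.g., a record with seq 1, CPU 15, address 0x1057eb0000 and location
"Memriser3 CHANNEL A DIMM 0" yields the header, then the action line,
then the section type, physical address and DIMM location lines, so a
human reading dmesg sees right away that nothing needs to be done.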

> The trace output format still needs further discussion. In the last
> patch (trace interface support) I had to keep the previous Kconfig
> format because I found that once I put the trace_event interface in the
> module, it doesn't work. I will post another trace patch (it only works
> when acpi_extlog is built in) for your reference.

I think that to be able to define TRACE_EVENTs in modules, you need what's
described in https://lwn.net/Articles/383362/

Steve, is that still true?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--