From: Borislav Petkov <bp@alien8.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
"Chen, Gong" <gong.chen@linux.intel.com>,
"joe@perches.com" <joe@perches.com>,
"m.chehab@samsung.com" <m.chehab@samsung.com>,
"arozansk@redhat.com" <arozansk@redhat.com>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform
Date: Fri, 18 Oct 2013 23:27:41 +0200
Message-ID: <20131018212741.GA26049@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F31D41E37@ORSMSX106.amr.corp.intel.com>
On Fri, Oct 18, 2013 at 08:57:22PM +0000, Luck, Tony wrote:
> Long term ... I'd be happy to see mce_log() go away. But we need to
> have a robust, well tested replacement in place for some time before
> such a move is up for discussion.
Basically a userspace daemon consuming the tracepoint or, in the plural,
tracepoints.
> Yes - double error reporting should be avoided.
Right.
> Our first platforms to implement this only do so for memory errors.
> This could change in the future (the UEFI appendix N error record has
> defined sub-sections for lots of types of errors).
Ok.
> Currently EDAC hooked into the mce event notification chain provides a
> return code to indicate whether it completely processed the error, or
> whether to fall through to the rest of mce_log():
>
> 	if (ret == NOTIFY_STOP)
> 		return;
>
> Having both EDAC and this new extended error log both registered on this
> chain would probably not be helpful in most cases.
Not only that - you don't need EDAC because all the information is in
the MCA registers and the eMCA supplement, if there is one.
EDAC would be used on older systems which don't sport eMCA.
Now, concerning the current situation, we probably want to do something
like this:
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1b04123f3d9..382c78eaf474 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -154,6 +154,10 @@ void mce_log(struct mce *mce)
 	/* Emit the trace record: */
 	trace_mce_record(mce);
 
+	if (mce_ext_err_print)
+		if (mce_ext_err_print(NULL, mce->extcpu, mce->bank))
+			return;
+
 	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
 	if (ret == NOTIFY_STOP)
 		return;
--
Right, we've moved the eMCA print thingie to mce_log() so that we get a
chance to run the first tracepoint (TP) issuing the raw MCA registers and
then run the eMCA TP as a follow-up.
We've taught mce_ext_err_print() to return a true/false retval to denote:
* true: it has collected data successfully, no need to go down the reporting
chain
* false: eMCA failed somehow, pass the error further down the chain and
trigger mcelog in userspace.
How does that sound?
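To make the intended control flow concrete, here is a small userspace sketch of the same idea. It is only a model under stated assumptions, not kernel code: struct mce_sketch, ext_err_print, decoder_chain(), log_sketch(), emca_ok() and emca_fail() are hypothetical stand-ins for struct mce, mce_ext_err_print, the x86_mce_decoder_chain call and mce_log(). The only thing it tries to show is the true/false contract above: the raw record always goes out first, and the fallback chain runs only when the hook is absent or reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mce; only the fields used here. */
struct mce_sketch {
	unsigned int extcpu;
	int bank;
};

/* Optional hook: stays NULL until an extended-log driver installs it. */
static bool (*ext_err_print)(const char *pfx, unsigned int cpu, int bank);

/* Counts how often the fallback reporting chain was reached. */
static int chain_calls;

/* Stand-in for the decoder chain / mcelog path. */
static void decoder_chain(const struct mce_sketch *m)
{
	chain_calls++;
	printf("chain: cpu %u bank %d\n", m->extcpu, m->bank);
}

/* Model of the proposed mce_log() ordering. */
static void log_sketch(const struct mce_sketch *m)
{
	assert(m != NULL);

	/* Step 1: the raw record is always emitted first. */
	printf("raw record: cpu %u bank %d\n", m->extcpu, m->bank);

	/* Step 2: if the hook exists and succeeds, stop here. */
	if (ext_err_print && ext_err_print(NULL, m->extcpu, m->bank))
		return;

	/* Step 3: no hook, or the hook failed: fall through to the chain. */
	decoder_chain(m);
}

/* Hook that reports success: the chain must be skipped. */
static bool emca_ok(const char *pfx, unsigned int cpu, int bank)
{
	(void)pfx;
	printf("eMCA record: cpu %u bank %d\n", cpu, bank);
	return true;
}

/* Hook that reports failure: the chain must still run. */
static bool emca_fail(const char *pfx, unsigned int cpu, int bank)
{
	(void)pfx; (void)cpu; (void)bank;
	return false;
}
```

Calling log_sketch() with ext_err_print left NULL or set to emca_fail reaches decoder_chain(); with emca_ok installed it returns right after the eMCA record, which is the "no double reporting" behaviour discussed above.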
> Not sure if we should handle that with user education to not load both
> an EDAC and ext_log driver or if there should be some enforcement.
Definitely enforcement. The flags thing I was telling you about recently
could be one way to do it.
> trace_mce_record() dumps the raw data from the machine check banks. I
> think there may still be a case for having this. Analysis tools that
> look at this trace as well should be smart enough to connect the dots.
Yes, sure. The more non-overlapping data we get, the better.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.