From: Borislav Petkov <bp@alien8.de>
To: Corey Minyard <minyard@acm.org>
Cc: Corey Minyard <cminyard@mvista.com>,
"Luck, Tony" <tony.luck@intel.com>,
Steven Rostedt <rostedt@goodmis.org>,
"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>
Subject: Re: [PATCH][RT] x86: Fix an RT MCE crash
Date: Fri, 1 Jul 2016 09:20:50 +0200
Message-ID: <20160701072050.GA4593@pd.tnic>
In-Reply-To: <5775A181.2050404@acm.org>
On Thu, Jun 30, 2016 at 05:47:29PM -0500, Corey Minyard wrote:
> You are right, I enabled that on the tip of master and I get the
> following spewing out for a while:
>
> EDAC MC0: 27843 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
> (channel:1 slot:0 page:0x102c offset:0x180 grain:32 syndrome:0x0 - OVERFLOW
> area:DRAM err_code:0001:0091 socket:0 ha:0 channel_mask:2 rank:0)
>
> So there's apparently something broken in the hardware.
Yeah, DIMM0 on your socket 0 is generating a bunch of correctable errors
and might go bad soon, the stress being on "might". You could replace
it.
> That sounds like a bit much.
Actually, you probably would need only a couple:
1. 648ed94038c0 ("x86/mce: Provide a lockless memory pool to save error records")
2. 061120aed708 ("x86/mce: Don't use percpu workqueues")
- that one is unrelated but should be nice for RT as it gets rid of percpu
workqueues and I know RT hates them :)
3. fd4cf79fcc4b ("x86/mce: Remove the MCE ring for Action Optional errors")
- this one connects the genpool to MCE
4. f29a7aff4bd6 ("x86/mce: Avoid potential deadlock due to printk() in MCE context")
- and this is the last one which I meant earlier.
So that's 4 patches, more or less.
Now, you're in the perfect position to test those because you *actually*
have a real-life system which generates those errors so it is the
perfect candidate for testing the backports. And you should test them
with the failing DIMM still in place, of course.
HTH.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.