From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: xlpang@redhat.com, x86@kernel.org, linux-kernel@vger.kernel.org,
kexec@lists.infradead.org, Ingo Molnar <mingo@redhat.com>,
Dave Young <dyoung@redhat.com>,
Prarit Bhargava <prarit@redhat.com>,
Junichi Nomura <j-nomura@ce.jp.nec.com>,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
Date: Mon, 23 Jan 2017 09:40:09 -0800
Message-ID: <20170123174008.GA4945@intel.com>
In-Reply-To: <20170123145056.fyraeehjfnwmmfb6@pd.tnic>

On Mon, Jan 23, 2017 at 03:50:56PM +0100, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
> > One possible timing sequence would be:
> > 1st kernel running on multiple cpus panicked
> > then the crash dump code starts
> > the crash dump code stops the others cpus except the crashing one
> > 2nd kernel boots up on the crash cpu with "nr_cpus=1"
> > some broadcasted mce comes on some cpu amongst the other cpus(not the crashing cpu)
>
> Where does this broadcasted MCE come from?
>
> The crash dump code triggered it? Or it happened before the panic()?
>
> Are you talking about an *actual* sequence which you're experiencing on
> real hw or is this something hypothetical?

If the system had experienced some memory corruption, but
recovered ... then there would be some pages sitting around
that the old kernel had marked as POISON and stopped using.
The kexec'd kernel doesn't know about these, so may touch that
memory while taking a crash dump ... and then you have a
broadcast machine check (on older[1] Intel CPUs that don't support
local machine check).

This is hard to work around. You really need all the CPUs to
have set CR4.MCE=1 (if any didn't, then they will force a reset
when they see the machine check). Also you need to make sure that
they jump to the copy of do_machine_check() in the new kernel, not
the old kernel.
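
To make the two conditions concrete, here is a minimal userspace sketch
that models them. It is not kernel code: struct cpu_snapshot, the
handler_in_new_kernel flag, enum mce_outcome and broadcast_mce_outcome()
are hypothetical names made up for illustration. The only hardware fact
it relies on is that CR4.MCE is bit 6 of CR4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CR4 bit 6 is the Machine-Check Enable flag (CR4.MCE). */
#define CR4_MCE (UINT64_C(1) << 6)

/* Hypothetical per-CPU snapshot for this sketch; not a kernel structure. */
struct cpu_snapshot {
	uint64_t cr4;               /* CR4 value on this CPU */
	bool handler_in_new_kernel; /* #MC vector reaches the new kernel's handler */
};

enum mce_outcome {
	MCE_HANDLED,       /* every CPU can take the machine check in the new kernel */
	MCE_RESET,         /* some CPU has CR4.MCE=0, so the machine resets */
	MCE_STALE_HANDLER  /* some CPU would jump into the old kernel's handler */
};

/* Model what a broadcast machine check would do given the state of all CPUs. */
static enum mce_outcome
broadcast_mce_outcome(const struct cpu_snapshot *cpus, size_t n)
{
	enum mce_outcome outcome = MCE_HANDLED;

	for (size_t i = 0; i < n; i++) {
		/* A single CPU without CR4.MCE set is enough to force a reset. */
		if (!(cpus[i].cr4 & CR4_MCE))
			return MCE_RESET;
		/* CR4.MCE is set, but the handler still belongs to the old kernel. */
		if (!cpus[i].handler_in_new_kernel)
			outcome = MCE_STALE_HANDLER;
	}
	return outcome;
}
```

In this model the reset case takes priority: a stale handler only matters
once every CPU has CR4.MCE set.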

A while ago I played with the nr_cpus=N code to have it bring
all the CPUs far enough online to get the machine check initialization
done, then any extras above "N" just go back offline again.
But I never got this to work reliably.
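
The intended end state of that experiment can be sketched as a toy
simulation. This only shows the ordering described above (bring up,
initialize machine check, take the extras offline); it says nothing
about why the real attempt was unreliable. struct sim_cpu and
bring_up_with_limit() are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CR4 bit 6 is the Machine-Check Enable flag (CR4.MCE). */
#define CR4_MCE (UINT64_C(1) << 6)

/* Hypothetical simulated CPU; not a kernel structure. */
struct sim_cpu {
	uint64_t cr4;
	bool online;
};

/*
 * Bring every CPU far enough up to do machine check init, then take
 * the ones at index >= limit (the nr_cpus=N value) offline again.
 * Returns the number of CPUs left online.
 */
static size_t
bring_up_with_limit(struct sim_cpu *cpus, size_t total, size_t limit)
{
	size_t online = 0;

	for (size_t i = 0; i < total; i++) {
		cpus[i].online = true;   /* partial bring-up */
		cpus[i].cr4 |= CR4_MCE;  /* machine check initialization */

		if (i >= limit)
			cpus[i].online = false; /* extra CPU goes back offline, CR4.MCE stays set */
		else
			online++;
	}
	return online;
}
```

With total=4 and limit=1 the model leaves one CPU online, and all four
have CR4.MCE set.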

-Tony

[1] older == all released ones, at the moment.