From: Naoya Horiguchi <nao.horiguchi@gmail.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Tony Luck <tony.luck@intel.com>, Vivek Goyal <vgoyal@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Junichi Nomura <j-nomura@ce.jp.nec.com>,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Subject: Re: [PATCH 1/2] x86: mce: kdump: use under_crashdumping to turn off MCE in all CPUs together
Date: Tue, 24 Feb 2015 00:41:11 +0900
Message-ID: <54EB4A17.6020800@gmail.com>
In-Reply-To: <20150223135842.GA22753@pd.tnic>
On 02/23/2015 10:58 PM, Borislav Petkov wrote:
> On Mon, Feb 23, 2015 at 10:01:50PM +0900, Naoya Horiguchi wrote:
>> userspace. What end users see is like these timeout messages:
>> - "Timeout: Not all CPUs entered broadcast exception handler",
>> - "Timeout: Subject CPUs unable to finish machine check processing",
>> - "Timeout: Monarch CPU unable to finish machine check processing", or
>> - "Timeout: Monarch CPU did not finish machine check processing".
>> These are informative for developers like us, but confusing for end users.
>
> Those messages won't go out if tolerant level is > 1 AFAICT and from
> looking at mce_timed_out() and the machine wouldn't panic, for that
> matter.
Sorry, I misread the code, and you're right. Please ignore this part.
>
> So what is the actual problem you're seeing?
>
> Cores timeoutting when a machine check happens during entering kdump or
> you not wanting cores to panic due to a machine check while the machine
> enters kdump?
What I saw was that once in hundreds of kdump and reboot cycles, we hit a
kdump failure and a panic with the "Timeout synchronizing machine check
over CPUs" message.
A panic is OK if the MCE is severe enough, but I don't think a panic due to
this synchronization timeout is good, because it is not related to the MCE's
nature (like the victim component or the type of error) or its severity, so
even recoverable MCEs could trigger this panic. This timeout is just an
artifact of the current kdump code, so I think we can/should avoid it.
Anyway, your suggestion of raising tolerant to 3 should solve this problem,
so I'll take this approach in the next post.
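
To illustrate why raising tolerant should help, here is a simplified
userspace sketch of the decision as I understand it. This is not the actual
mce.c code: sync_wait_step(), SPIN_UNIT_NS and the enum are names made up
for illustration, and the "tolerant <= 1" threshold is only assumed from your
reading of mce_timed_out() above.

```c
#include <stdbool.h>

/* Hypothetical polling interval for the illustration, in nanoseconds. */
#define SPIN_UNIT_NS 100LL

enum sync_result {
	SYNC_KEEP_WAITING,	/* budget left: poll the other CPUs again */
	SYNC_GIVE_UP,		/* timed out but tolerated: go on unsynchronized */
	SYNC_PANIC,		/* timed out and not tolerated: panic */
};

/*
 * One polling step while waiting for the other CPUs.
 *
 * Assumption: the timeout panics only when tolerant <= 1. Note that the
 * outcome depends only on the remaining budget and on tolerant, never on
 * the severity of the machine check itself.
 */
static enum sync_result sync_wait_step(long long *budget_ns, int tolerant)
{
	if (*budget_ns < SPIN_UNIT_NS)
		return tolerant <= 1 ? SYNC_PANIC : SYNC_GIVE_UP;

	*budget_ns -= SPIN_UNIT_NS;
	return SYNC_KEEP_WAITING;
}
```

In this model, an exhausted budget with tolerant = 1 ends in SYNC_PANIC even
for a recoverable error, while tolerant = 3 ends in SYNC_GIVE_UP.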
Thanks,
Naoya Horiguchi