public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Tony Luck <tony.luck@intel.com>,
	Prarit Bhargava <prarit@redhat.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Junichi Nomura <j-nomura@ce.jp.nec.com>,
	Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Subject: Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump
Date: Thu, 9 Apr 2015 10:00:30 +0200
Message-ID: <20150409080030.GA4713@gmail.com>
In-Reply-To: <20150409061346.GA25434@pd.tnic>


* Borislav Petkov <bp@alien8.de> wrote:

> Btw, Ingo had some reservations about this. Ingo?

Yeah, so my concerns are the following:

> kexec disables (or "shoots down") all CPUs other than the crashing 
> CPU before entering the 2nd kernel. However, MCA is still enabled so 
> if an MCE happens and broadcasts to the CPUs after the main thread 
> starts the 2nd kernel (which might not initialize its MCE handler 
> yet, or might decide not to enable it) the MCE handler runs only on 
> the other CPUs (not on the main thread) leading to kernel panic 
> during MCE synchronization. The user-visible effect of this bug is a 
> kdump failure.

So the thing is, when we boot up the second kernel there will be a 
window where the old handler isn't valid (because the new kernel has 
its own pagetables, etc.) and the new handler is not installed yet.

If an MCE hits that window, it's bad luck. (unless the bootup sequence 
is rearchitected significantly to allow cross-kernel inheritance of 
MCE handlers.)

So I think we can ignore _that_ race.

The more significant question is: what happens when an MCE arrives 
while the kdump is proceeding? Kdumps can take a long time to finish 
when there's a lot of RAM.

But ... since the 'shootdown' is analogous to a CPU hotplug CPU-down 
sequence, I suppose that the existing MCE code should already properly 
handle the case where an MCE arrives on a (supposedly) dead CPU, 
right? In that case installing a separate MCE handler looks like the 
wrong thing.

> Our standard MCE handler do_machine_check() assumes a bunch of 
> things about system's status and it's hard to alter it to cover 
> kexec/kdump context, so add another, kdump-specific one and switch 
> to it.

So I don't like this principle either: 'our current code is a mess 
that might not work, so add a new one'.

> Note that this problem exists since current MCE handler was 
> implemented in 2.6.32, and recently commit 716079f66eac ("mce: Panic 
> when a core has reached a timeout") made it more visible by changing 
> the default behavior of the synchronization timeout from "ignore" to 
> "panic".

Looks like that's the real problem. How about having the kdump crash 
dumper set it back to 'ignore' again when we crash, and also 
double-checking how we handle the various corner cases?
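
To make the suggestion concrete, here is a rough user-space model of 
it, not kernel code: the crash path flips the synchronization-timeout 
policy back to 'ignore' before anything else, so a CPU that never 
shows up for the MCE rendezvous is no longer fatal. Every name in it 
(sync_timeout_policy, mce_model, model_crash_begin(), 
model_timeout_is_fatal()) is made up for illustration and does not 
correspond to an existing kernel symbol.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: what to do when the MCE
 * rendezvous times out waiting for the other CPUs.
 */
enum sync_timeout_policy {
	SYNC_TIMEOUT_IGNORE,	/* carry on without the missing CPUs */
	SYNC_TIMEOUT_PANIC,	/* treat a missing CPU as fatal */
};

struct mce_model {
	enum sync_timeout_policy policy;
	bool crashing;		/* set once the crash path has started */
};

/* Modelled entry into the crash path: relax the timeout policy. */
static void model_crash_begin(struct mce_model *m)
{
	assert(m);
	m->crashing = true;
	m->policy = SYNC_TIMEOUT_IGNORE;
}

/* Returns true if a rendezvous timeout should panic the machine. */
static bool model_timeout_is_fatal(const struct mce_model *m)
{
	assert(m);
	return m->policy == SYNC_TIMEOUT_PANIC;
}
```

With the policy at 'panic', model_timeout_is_fatal() returns true; 
after model_crash_begin() it returns false. The model only covers the 
policy switch; it says nothing about the corner cases mentioned above, 
which would still need checking against the real handler.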

Thanks,

	Ingo
