From: Ingo Molnar <mingo@kernel.org>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Borislav Petkov <bp@alien8.de>, Tony Luck <tony.luck@intel.com>,
Prarit Bhargava <prarit@redhat.com>,
Vivek Goyal <vgoyal@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Junichi Nomura <j-nomura@ce.jp.nec.com>,
Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Subject: Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump
Date: Thu, 9 Apr 2015 11:13:21 +0200
Message-ID: <20150409091321.GA9811@gmail.com>
In-Reply-To: <20150409083908.GA25764@hori1.linux.bs1.fc.nec.co.jp>

* Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
> On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
> >
> > * Borislav Petkov <bp@alien8.de> wrote:
> >
> > > Btw, Ingo had some reservations about this. Ingo?
> >
> > Yeah, so my concerns are the following:
> >
> > > kexec disables (or "shoots down") all CPUs other than the crashing
> > > CPU before entering the 2nd kernel. However, MCA is still enabled, so
> > > if an MCE happens and is broadcast to the CPUs after the main thread
> > > starts the 2nd kernel (which might not have initialized its MCE handler
> > > yet, or might decide not to enable it), the MCE handler runs only on
> > > the other CPUs (not on the main thread), leading to a kernel panic
> > > during MCE synchronization. The user-visible effect of this bug is a
> > > kdump failure.
> >
> > So the thing is, when we boot up the second kernel there will be a
> > window where the old handler isn't valid (because the new kernel has
> > its own pagetables, etc.) and the new handler is not installed yet.
> >
> > If an MCE hits that window, it's bad luck (unless the bootup sequence
> > is rearchitected significantly to allow cross-kernel inheritance of
> > MCE handlers).
> >
> > So I think we can ignore _that_ race.
> >
> > The more significant question is: what happens when an MCE arrives
> > while the kdump is proceeding, given that kdumps can take a long time
> > to finish when there's a lot of RAM?
>
> Without this patch, an MCE makes idle CPUs wake up undesirably and
> needlessly run the MCE handler, which disturbs memory and so harms
> the kdump. This patch improves not only the transition phase but
> also that window.

The way the kdump code stops CPUs already 'disturbs' the state of
those CPUs.

> > But ... since the 'shootdown' is analogous to a CPU hotplug
> > CPU-down sequence, I suppose that the existing MCE code should
> > already properly handle the case where an MCE arrives on a
> > (supposedly) dead CPU, right?
>
> Currently not. Tony mentioned an idea about it (although it is not
> included in this patch).
>
> > In that case installing a separate MCE handler looks like the
> > wrong thing.
>
> One difference between kdump and CPU offline is whether we need to handle
> MCEs at that point or not. In the CPU offline situation, running CPUs have
> to continue their normal operation, so it's important to handle MCEs (i.e.
> log them and/or take recovery action), and I think that should be done in
> our main MCE handler, do_machine_check().

I disagree: if offline CPUs are still active and can produce MCEs, then
they should be reported regardless of whether they were shot down by
the CPU hotplug code or by kdump.

> But that's not the case in the kdump situation (logging or recovery is
> no longer possible or necessary), so it seems to make sense to me to
> separate the handler.

I disagree: for example, logging to the screen is still possible and
should be done if there's an uncorrectable error.

So I agree that MCE policy should be made non-fatal during kdump, but
I disagree that it needs a separate handler: it should be part of the
regular MCE handling routines to handle kdump gracefully.

Thanks,
Ingo