From: Greg KH <gregkh@linuxfoundation.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
Tianyu Lan <Tianyu.Lan@microsoft.com>,
Konrad Wilk <konrad.wilk@oracle.com>,
Josh Poimboeuf <jpoimboe@redhat.com>,
Mukesh Ojha <mojha@codeaurora.org>,
Peter Zijlstra <peterz@infradead.org>,
Jiri Kosina <jkosina@suse.cz>, Rik van Riel <riel@surriel.com>,
Andy Lutomirski <luto@kernel.org>,
Micheal Kelley <michael.h.kelley@microsoft.com>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Borislav Petkov <bp@alien8.de>,
x86@kernel.org
Subject: Re: [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
Date: Wed, 27 Mar 2019 10:12:42 +0900 [thread overview]
Message-ID: <20190327011242.GA3099@kroah.com> (raw)
In-Reply-To: <20190326163811.503390616@linutronix.de>
On Tue, Mar 26, 2019 at 05:36:05PM +0100, Thomas Gleixner wrote:
> Tianyu reported a crash in a CPU hotplug teardown callback when booting a
> kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
> parameter.
>
> It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
> forever in case that a bringup callback fails. Unfortunately this issue was
> not recognized when the CPU hotplug code was reworked, so the shortcoming
> just stayed in place.
>
> When a bringup callback fails, the CPU hotplug code rolls back the
> operation and takes the CPU offline.
>
> The 'nosmt' command line argument uses a bringup failure to abort the
> bringup of SMT sibling CPUs. This partial bringup is required due to the
> MCE misdesign on Intel CPUs.
>
> With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
> CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
> teardown of a CPU including the synchronizations in various facilities like
> RCU, NOHZ and others.
>
> As a consequence the teardown callbacks which must be executed on the
> outgoing CPU within stop machine with interrupts disabled are executed on
> the control CPU in interrupt enabled and preemptible context causing the
> kernel to crash and burn. The pre state machine code has a different
> failure mode which is more subtle and resulting in a less obvious use after
> free crash because the control side frees resources which are still in use
> by the undead CPU.
>
> But this is not a x86 only problem. Any architecture which supports the
> SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
> likely to be triggered because in 99.99999% of the cases all bringup
> callbacks succeed.
>
> The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
> all architectures as the following architectures have either no hotplug
> support at all or not all subarchitectures support it:
>
> alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
>
> Crashing the kernel in such a situation is not an acceptable state
> either.
>
> Implement a minimal rollback variant by limiting the teardown to the point
> where all regular teardown callbacks have been invoked and leave the CPU in
> the 'dead' idle state. This has the following consequences:
>
> - the CPU is brought down to the point where the stop_machine takedown
> would happen.
>
> - the CPU stays there forever and is idle
>
> - The CPU is cleared in the CPU active mask, but not in the CPU online
> mask which is a legit state.
>
> - Interrupts are not forced away from the CPU
>
> - All facilities which only look at online mask would still see it, but
> that is the case during normal hotplug/unplug operations as well. It's
> just a (way) longer time frame.
>
> This will expose issues, which haven't been exposed before or only seldom,
> because now the normally transient state of being non active but online is
> a permanent state. In testing this exposed already an issue vs. work queues
> where the vmstat code schedules work on the almost dead CPU which ends up
> in an unbound workqueue and triggers 'preemtible context' warnings. This is
> not a problem of this change, it merily exposes an already existing issue.
> Still this is better than crashing fully without a chance to debug it.
>
> This is mainly thought as workaround for those architectures which do not
> support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
>
> Fixes: 2e1a3483ce74 ("cpu/hotplug: Split out the state walk into functions")
> Reported-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Tested-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> Cc: Konrad Wilk <konrad.wilk@oracle.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Mukesh Ojha <mojha@codeaurora.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
> Cc: K. Y. Srinivasan <kys@microsoft.com>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: stable@vger.kernel.org
> ---
> kernel/cpu.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
next prev parent reply other threads:[~2019-03-27 1:12 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-26 16:36 [patch 0/2] cpu/hotplug: Prevent damage with SMP=y and HOTPLUG_CPU=n Thomas Gleixner
2019-03-26 16:36 ` [patch 1/2] cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n Thomas Gleixner
2019-03-27 1:12 ` Greg KH [this message]
2019-03-28 12:38 ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
2019-03-26 16:36 ` [patch 2/2] x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y Thomas Gleixner
2019-03-27 1:12 ` Greg KH
2019-03-28 12:39 ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190327011242.GA3099@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=Tianyu.Lan@microsoft.com \
--cc=bp@alien8.de \
--cc=jkosina@suse.cz \
--cc=jpoimboe@redhat.com \
--cc=konrad.wilk@oracle.com \
--cc=kys@microsoft.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=michael.h.kelley@microsoft.com \
--cc=mojha@codeaurora.org \
--cc=peterz@infradead.org \
--cc=riel@surriel.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.