From: Pingfan Liu <kernelfans@gmail.com>
To: Dave Young <dyoung@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Baoquan He <bhe@redhat.com>,
	tony.luck@intel.com, Peter Zijlstra <peterz@infradead.org>,
	kexec@lists.infradead.org,
	Jacob Pan <jacob.jun.pan@linux.intel.com>,
	x86@kernel.org, Michal Hocko <mhocko@suse.com>,
	linux-kernel@vger.kernel.org, Daniel Drake <drake@endlessm.com>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Qian Cai <cai@lca.pw>,
	Thomas Gleixner <tglx@linutronix.de>,
	Xunlei Pang <xlpang@linux.alibaba.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Eric Biederman <ebiederm@xmission.com>
Subject: Re: [PATCH 0/4] x86/mce: protect nr_cpus from rebooting by broadcast mce
Date: Wed, 7 Aug 2019 15:52:26 +0800	[thread overview]
Message-ID: <20190807075226.GA10392@mypc> (raw)
In-Reply-To: <20190807025843.GA4776@dhcp-128-65.nay.redhat.com>

On Wed, Aug 07, 2019 at 11:00:41AM +0800, Dave Young wrote:
> Add Tony and Xunlei in cc.
> On 08/05/19 at 04:58pm, Pingfan Liu wrote:
> > This series includes two related groups:
> > [1-3/4]: protect nr_cpus from rebooting by broadcast mce
> > [4/4]: improve "kexec -l" robustness against broadcast mce
> > 
> > When I tried to fix [1], Thomas raised a concern about the vulnerability of
> > nr_cpus to unexpected reboots caused by broadcast MCE. After analysis, I think
> > only the first case below suffers from such reboots. [1-3/4] aims to fix that
> > issue.
> 
> I have not read or understood the MCE details, but we previously had an
> MCE problem, which Xunlei fixed in the commit below:
> commit 5bc329503e8191c91c4c40836f062ef771d8ba83
> Author: Xunlei Pang <xlpang@redhat.com>
> Date:   Mon Mar 13 10:50:19 2017 +0100
> 
>     x86/mce: Handle broadcasted MCE gracefully with kexec
> 
> I wonder whether this is the same issue. The old discussion is in the
> thread below:
> https://lore.kernel.org/patchwork/patch/753530/
> 
> Tony raised similar questions, but I'm not sure if it is still a problem
> or it has been fixed.
> 
Xunlei's patch is a precondition for stability in case 2: boot up by
"kexec -p nr_cpus=".

For cases 1 and 3, extra effort is needed.

Thanks,
	Pingfan
> > 
> > *** Background ***
> > 
> > On x86, all logical CPUs are required to have CR4.MCE=1 set. Otherwise, a
> > broadcast MCE observing CR4.MCE=0b on any core will shut down the machine.
> > 
> > The 'nosmt' option already complies with the above rule thanks to Thomas's
> > patch. For details, refer to 506a66f3748 (Revert "x86/apic: Ignore secondary
> > threads if nosmt=force")
> > 
> > For the nr_cpus option, however, the exposure to broadcast MCE is a little
> > more complicated, and can be divided into three cases.
> > 
> > -1. Boot up by BIOS. Since nothing has set CR4.MCE=1 on the capped cpus,
> > nr_cpus risks a reboot by broadcast MCE.
> > 
> > -2. Boot up by "kexec -p nr_cpus=". Since the 1st kernel set CR4.MCE=1 on all
> > cpus before kexec -p, nr_cpus is free of rebooting by broadcast MCE.
> > Furthermore, the crashed kernel's wreckage, including page tables and text, is
> > not touched by the capture kernel. Hence if an MCE event happens on a capped
> > cpu, do_machine_check->__mc_check_crashing_cpu() runs smoothly and returns
> > immediately, and the capped cpu stays pinned on "halt".
> > 
> > -3. Boot up by "kexec -l nr_cpus=". As with "kexec -p", it is free of rebooting
> > by broadcast MCE. But the 1st kernel's wreckage is discarded and changed, so
> > when capped cpus execute do_machine_check(), they may break the new kernel.
> > This is not related to broadcast MCE, and needs an extra fix.
> > 
> > *** Solution ***
> > "nr_cpus" cannot follow the same approach as "nosmt", because nr_cpus limits
> > the allocation of the percpu area and some other kthread memory, which is
> > critical to the cpu hotplug framework. Instead, this series develops a
> > dedicated SIPI callback, make_capped_cpu_stable(), for capped cpus, which does
> > not rely on the percpu area to work.
> > 
> > [1]: https://lkml.org/lkml/2019/7/5/3
> > 
> > To: Gleixner <tglx@linutronix.de>
> > To: Andy Lutomirski <luto@kernel.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > To: x86@kernel.org
> > Cc: Masami Hiramatsu <mhiramat@kernel.org>
> > Cc: Qian Cai <cai@lca.pw>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Daniel Drake <drake@endlessm.com>
> > Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Eric Biederman <ebiederm@xmission.com>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: kexec@lists.infradead.org
> > 
> > ---
> > Pingfan Liu (4):
> >   x86/apic: correct the ENO in generic_processor_info()
> >   x86/apic: record capped cpu in generic_processor_info()
> >   x86/smp: send capped cpus to a stable state when smp_init()
> >   x86/smp: disallow MCE handler on rebooting AP
> > 
> >  arch/x86/include/asm/apic.h  |  1 +
> >  arch/x86/include/asm/smp.h   |  3 ++
> >  arch/x86/kernel/apic/apic.c  | 23 ++++++++----
> >  arch/x86/kernel/cpu/common.c |  7 ++++
> >  arch/x86/kernel/smp.c        |  8 +++++
> >  arch/x86/kernel/smpboot.c    | 83 ++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/smp.c                 |  6 ++++
> >  7 files changed, 124 insertions(+), 7 deletions(-)
> > 
> > -- 
> > 2.7.5
> > 
> 
> Thanks
> Dave
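
For readers following the three cases in the quoted cover letter, the rule and
the case split can be sketched as a small userspace C model. This is only an
illustration, not code from the series: the enum boot_path labels and all
three function names are hypothetical, and the cr4[] array merely stands in
for the per-CPU CR4 values. The one hardware fact used is that CR4.MCE is
bit 6 of CR4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CR4.MCE is bit 6 of the CR4 control register. */
#define CR4_MCE_BIT (UINT64_C(1) << 6)

/* Hypothetical labels for the three boot paths in the cover letter. */
enum boot_path {
	BOOT_BIOS,     /* case 1: boot up by BIOS */
	BOOT_KEXEC_P,  /* case 2: boot up by "kexec -p nr_cpus=" */
	BOOT_KEXEC_L,  /* case 3: boot up by "kexec -l nr_cpus=" */
};

/*
 * Background rule: a broadcast MCE is survivable only if every logical
 * CPU has CR4.MCE=1. cr4[] is a stand-in for the per-CPU CR4 values,
 * including CPUs that nr_cpus left out.
 */
bool broadcast_mce_safe(const uint64_t *cr4, size_t ncpus)
{
	assert(cr4 != NULL && ncpus > 0);

	for (size_t i = 0; i < ncpus; i++) {
		if (!(cr4[i] & CR4_MCE_BIT))
			return false;
	}
	return true;
}

/*
 * Only case 1 leaves CR4.MCE unset on capped CPUs; in cases 2 and 3
 * the 1st kernel already set it before kexec.
 */
bool capped_cpus_risk_shutdown(enum boot_path path)
{
	return path == BOOT_BIOS;
}

/*
 * Only case 3 discards the 1st kernel's text and page tables, so a
 * capped CPU entering do_machine_check() may damage the new kernel.
 */
bool capped_cpus_may_break_new_kernel(enum boot_path path)
{
	return path == BOOT_KEXEC_L;
}
```

Under this model, case 2 returns false from both predicates, which matches
the point that Xunlei's patch is enough for "kexec -p" while cases 1 and 3
need extra work.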
