From: Borislav Petkov <bp@alien8.de>
To: Ashok Raj <ashok.raj@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org,
Tony Luck <tony.luck@intel.com>
Subject: Re: [Patch V0] x86, mce: Ensure offline CPU's don't participate in mce rendezvous process.
Date: Fri, 4 Dec 2015 15:34:04 +0100
Message-ID: <20151204143404.GF21177@pd.tnic>
In-Reply-To: <1449188170-3909-1-git-send-email-ashok.raj@intel.com>
On Thu, Dec 03, 2015 at 07:16:10PM -0500, Ashok Raj wrote:
> Linux has logical cpu offline capability. That can be triggered by:
>
> # echo 0 > /sys/devices/system/cpu/cpuX/online
>
> In Intel Architecture, MCE's are broadcasted to all CPUs in the system.
>
> This includes the CPUs marked offline by Linux. Unless the CPU's were removed
> via an ACPI notification, in which case the cpu's are removed from the
> cpu_present_map.
>
> This patch ensures offline CPU's don't participate in MCE rendezvous, but
> simply perform clearing some status bits to ensure a second MCE wont cause
> automatic shutdown.
>
> Without the patch, mce_start will increment mce_callin, but mce_start would
> wait for all online_cpus. So offline cpu's should avoid participating in the
> rendezvous process.
>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index c5b0d56..82a0c8b 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -998,6 +998,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  	u64 recover_paddr = ~0ull;
>  	int flags = MF_ACTION_REQUIRED;
>  	int lmce = 0;
> +	unsigned int cpu = smp_processor_id();
>
>  	ist_enter(regs);
>
> @@ -1008,6 +1009,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>
>  	mce_gather_info(&m, regs);
>
> +	/*
> +	 * if this cpu is offline, just bail out.
> +	 * TBD: looking into adding any logs this offline CPU might have,
> +	 * to be collected and reported by the rendezvous master.
> +	 */
> +	if (cpu_is_offline(cpu) && (m.mcgstatus & MCG_STATUS_RIPV))
> +		goto out;
This CPU - it being offline and all - is not doing the minimal amount of
work possible IMO.
Why does it have to do ist_enter(), this_cpu_inc(mce_exception_count),
etc?
IMO, the only thing it should do is this:
	if (cpu_is_offline(smp_processor_id())) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return;
	}
and that should be at the very beginning of do_machine_check(), so
that the hardware is happy. As far as Linux is concerned, the CPU is
offline, so no data structures on it are valid.
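To illustrate the ordering argued for here, a minimal userspace sketch
follows. It is not kernel code and makes several assumptions: every
model_* name is hypothetical, the arrays merely stand in for
cpu_online_mask, the per-CPU MSR_IA32_MCG_STATUS, mce_exception_count
and mce_callin, and the online path is reduced to its bookkeeping. The
bit positions follow the architectural MCG_STATUS layout (RIPV is bit 0,
MCIP is bit 2). The point it tries to show: with the offline check
first, an offline CPU clears its status and touches no accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; none of these are the kernel's definitions. */
#define MODEL_NR_CPUS		4
#define MODEL_MCG_STATUS_RIPV	(1ULL << 0)
#define MODEL_MCG_STATUS_MCIP	(1ULL << 2)

static bool     model_cpu_online[MODEL_NR_CPUS];	/* ~ cpu_online_mask */
static uint64_t model_mcg_status[MODEL_NR_CPUS];	/* ~ per-CPU MCG_STATUS */
static unsigned model_exception_count[MODEL_NR_CPUS];	/* ~ mce_exception_count */
static unsigned model_callin;				/* ~ mce_callin */

/* Model of the handler with the offline check as the very first step. */
static void model_do_machine_check(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	if (!model_cpu_online[cpu]) {
		/* Offline: clear MCIP so a second MCE does not shut down. */
		model_mcg_status[cpu] = 0;
		return;
	}

	/* Online: do the accounting and check in to the rendezvous. */
	model_exception_count[cpu]++;
	model_callin++;
	model_mcg_status[cpu] = 0;
}

/*
 * Model of a broadcast MCE: every CPU, online or not, sees MCIP set and
 * enters the handler. Returns how many CPUs checked in.
 */
static unsigned model_run_broadcast(void)
{
	unsigned int cpu;

	model_callin = 0;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_mcg_status[cpu] = MODEL_MCG_STATUS_MCIP |
					MODEL_MCG_STATUS_RIPV;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_do_machine_check(cpu);

	return model_callin;
}
```

With CPUs 0 and 1 marked online and 2 and 3 left offline,
model_run_broadcast() returns 2: only the online CPUs reach the
rendezvous counter, and the offline ones end up with a cleared status
and an untouched exception count.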
Hmmm?
P.S.: please don't put stable@ on CC; add it as a "CC: " line in the
SOB section instead.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.