From: Borislav Petkov <bp@alien8.de>
To: Tony Luck <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Dan Williams <dan.j.williams@intel.com>,
elliott@hpe.com, Brian Gerst <brgerst@gmail.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-nvdimm@ml01.01.org, x86@kernel.org
Subject: Re: [PATCH v10 4/4] x86: Create a new synthetic cpu capability for machine check recovery
Date: Sun, 7 Feb 2016 18:10:41 +0100
Message-ID: <20160207171041.GG5862@pd.tnic>
In-Reply-To: <97426a50c5667bb81a28340b820b371d7fadb6fa.1454618190.git.tony.luck@intel.com>
On Fri, Jan 29, 2016 at 04:00:19PM -0800, Tony Luck wrote:
> The Intel Software Developer Manual describes bit 24 in the MCG_CAP
> MSR:
> MCG_SER_P (software error recovery support present) flag,
> bit 24 - Indicates (when set) that the processor supports
> software error recovery
> But only some models with this capability bit set will actually
> generate recoverable machine checks.
>
> Check the model name and set a synthetic capability bit. Provide
> a command line option to set this bit anyway in case the kernel
> doesn't recognise the model name.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> Documentation/x86/x86_64/boot-options.txt | 4 ++++
> arch/x86/include/asm/cpufeature.h | 1 +
> arch/x86/include/asm/mce.h | 1 +
> arch/x86/kernel/cpu/mcheck/mce.c | 11 +++++++++++
> 4 files changed, 17 insertions(+)
>
> diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
> index 68ed3114c363..8423c04ae7b3 100644
> --- a/Documentation/x86/x86_64/boot-options.txt
> +++ b/Documentation/x86/x86_64/boot-options.txt
> @@ -60,6 +60,10 @@ Machine check
> threshold to 1. Enabling this may make memory predictive failure
> analysis less effective if the bios sets thresholds for memory
> errors since we will not see details for all errors.
> + mce=recovery
> + Tell the kernel that this system can generate recoverable
> + machine checks (useful when the kernel doesn't recognize
> + the cpuid x86_model_id[])
I'd say "Force-enable generation of recoverable MCEs."
and not mention implementation details in the description text.
> nomce (for compatibility with i386): same as mce=off
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 7ad8c9464297..06c6c2d2fea0 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -106,6 +106,7 @@
> #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */
> #define X86_FEATURE_EAGER_FPU ( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
> #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
> +#define X86_FEATURE_MCE_RECOVERY ( 3*32+31) /* cpu has recoverable machine checks */
>
> /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
> #define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 2ea4527e462f..18d2ba9c8e44 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -113,6 +113,7 @@ struct mca_config {
> bool ignore_ce;
> bool disabled;
> bool ser;
> + bool recovery;
> bool bios_cmci_threshold;
> u8 banks;
> s8 bootlog;
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 905f3070f412..16a3d0e29f84 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1696,6 +1696,15 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
> return;
> }
>
> + /*
> + * MCG_CAP.MCG_SER_P is necessary but not sufficient to know
> + * whether this processor will actually generate recoverable
> + * machine checks. Check to see if this is an E7 model Xeon.
> + */
> + if (mca_cfg.recovery || (mca_cfg.ser &&
> + !strncmp(c->x86_model_id, "Intel(R) Xeon(R) CPU E7-", 24)))
Eeww, a model string check :-(
Lemme guess: those E7s can't be represented by a range of
model/steppings, can they?
Similar to the AMD_MODEL_RANGE() thing in cpu/amd.c, for example.
In any case, that chunk belongs in the Intel part of
__mcheck_cpu_apply_quirks().
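
To make the comparison concrete, here is a minimal user-space sketch of the two matching styles: the brand-string prefix check from the patch and a family/model/stepping range table. It is an illustration under stated assumptions, not kernel code. struct cpu_id, struct model_range, recovery_models[], in_model_range(), is_e7_by_name() and mce_recovery_enabled() are hypothetical names, and the model and stepping values in the table are placeholders, not real E7 identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the few struct cpuinfo_x86 fields used here. */
struct cpu_id {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
	char model_id[64];
};

/* Hypothetical table entry: one family/model with an inclusive stepping range. */
struct model_range {
	unsigned int family;
	unsigned int model;
	unsigned int step_min;
	unsigned int step_max;
};

/* Placeholder values for illustration only, NOT real E7 model numbers. */
static const struct model_range recovery_models[] = {
	{ 6, 0x01, 0x0, 0xf },
	{ 6, 0x02, 0x2, 0x4 },
};

/* Range-based match: true if the CPU falls inside any table entry. */
static bool in_model_range(const struct cpu_id *c)
{
	size_t i;

	assert(c != NULL);
	for (i = 0; i < sizeof(recovery_models) / sizeof(recovery_models[0]); i++) {
		const struct model_range *r = &recovery_models[i];

		if (c->family == r->family && c->model == r->model &&
		    c->stepping >= r->step_min && c->stepping <= r->step_max)
			return true;
	}
	return false;
}

/* The check from the patch: compare the first 24 characters of the brand string. */
static bool is_e7_by_name(const struct cpu_id *c)
{
	return !strncmp(c->model_id, "Intel(R) Xeon(R) CPU E7-", 24);
}

/*
 * Same shape as the condition in the patch: force mirrors mca_cfg.recovery
 * (mce=recovery on the command line), ser mirrors mca_cfg.ser.
 */
static bool mce_recovery_enabled(bool force, bool ser, bool model_match)
{
	return force || (ser && model_match);
}
```

Either matcher can feed model_match. The range table only helps if the parts that generate recoverable machine checks have family/model/stepping values distinct from the ones that don't, which is exactly the open question above.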
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.