* [PATCH 2/2] mcelog: Print the PPIN in machine check records when it is available
From: Luck, Tony @ 2016-11-18  0:35 UTC
To: Andi Kleen
Cc: Tony Luck, linux-kernel, Boris Petkov

From: Tony Luck <tony.luck@intel.com>

Intel Xeons from Ivy Bridge onwards support a processor identification
number. Kernels v4.9 and higher include it in the "mce" record.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 mcelog.c | 3 +++
 mcelog.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/mcelog.c b/mcelog.c
index 7214a0d23f65..e79996db9b5b 100644
--- a/mcelog.c
+++ b/mcelog.c
@@ -441,6 +441,9 @@ static void dump_mce(struct mce *m, unsigned recordlen)
 	if (n > 0)
 		Wprintf("\n");
+	if (recordlen >= offsetof(struct mce, ppin) && m->ppin)
+		n += Wprintf("PPIN %llx\n", m->ppin);
+
 	if (recordlen >= offsetof(struct mce, cpuid) && m->cpuid) {
 		u32 fam, mod;
 		parse_cpuid(m->cpuid, &fam, &mod);
diff --git a/mcelog.h b/mcelog.h
index 254b3a092fba..9a54077e5474 100644
--- a/mcelog.h
+++ b/mcelog.h
@@ -31,6 +31,9 @@ struct mce {
 	__u32 socketid;		/* CPU socket ID */
 	__u32 apicid;		/* CPU initial apic ID */
 	__u64 mcgcap;		/* MCGCAP MSR: machine check capabilities of CPU */
+	__u64 synd;		/* MCA_SYND MSR: only valid on SMCA systems */
+	__u64 ipid;		/* MCA_IPID MSR: only valid on SMCA systems */
+	__u64 ppin;		/* Protected Processor Inventory Number */
 };

 #define X86_VENDOR_INTEL 0
-- 
2.7.4
* [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available
From: Luck, Tony @ 2016-11-18  0:35 UTC
To: Boris Petkov
Cc: Tony Luck, linux-kernel, Andi Kleen

From: Tony Luck <tony.luck@intel.com>

Intel Xeons from Ivy Bridge onwards support a processor identification
number. On systems that have it, include it in the machine check record.
I'm told that this would be helpful for users that run large data centers
with multi-socket servers to keep track of which CPUs are seeing errors.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/msr-index.h |  4 ++++
 arch/x86/include/uapi/asm/mce.h  |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 78f3760ca1f2..710273c617b8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -37,6 +37,10 @@
 #define EFER_FFXSR		(1<<_EFER_FFXSR)

 /* Intel MSRs. Some also available on other CPUs */
+
+#define MSR_PPIN_CTL			0x0000004e
+#define MSR_PPIN			0x0000004f
+
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index 69a6e07e3149..eb6247a7009b 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -28,6 +28,7 @@ struct mce {
 	__u64 mcgcap;	/* MCGCAP MSR: machine check capabilities of CPU */
 	__u64 synd;	/* MCA_SYND MSR: only valid on SMCA systems */
 	__u64 ipid;	/* MCA_IPID MSR: only valid on SMCA systems */
+	__u64 ppin;	/* Protected Processor Inventory Number */
 };

 #define MCE_GET_RECORD_LEN   _IOR('M', 1, int)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a7fdf453d895..eb9ce5023da3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -43,6 +43,7 @@
 #include <linux/export.h>
 #include <linux/jump_label.h>

+#include <asm/intel-family.h>
 #include <asm/processor.h>
 #include <asm/traps.h>
 #include <asm/tlbflush.h>
@@ -122,6 +123,9 @@ static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
  */
 ATOMIC_NOTIFIER_HEAD(x86_mce_decoder_chain);

+/* Some Intel Xeons support per socket protected processor inventory number */
+static bool have_ppin;
+
 /* Do initial initialization of a struct mce */
 void mce_setup(struct mce *m)
 {
@@ -135,6 +139,8 @@ void mce_setup(struct mce *m)
 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
 	m->apicid = cpu_data(m->extcpu).initial_apicid;
 	rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
+	if (have_ppin)
+		rdmsrl(MSR_PPIN, m->ppin);
 }

 DEFINE_PER_CPU(struct mce, injectm);
@@ -2134,8 +2140,37 @@ static int __init mcheck_enable(char *str)
 }
 __setup("mce", mcheck_enable);

+static void mcheck_intel_ppin_init(void)
+{
+	unsigned long long msr_ppin_ctl;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return;
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_IVYBRIDGE_X:
+	case INTEL_FAM6_HASWELL_X:
+	case INTEL_FAM6_BROADWELL_XEON_D:
+	case INTEL_FAM6_BROADWELL_X:
+	case INTEL_FAM6_SKYLAKE_X:
+		if (rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl))
+			return;
+		if (msr_ppin_ctl == 1) {
+			pr_info("PPIN available but disabled\n");
+			return;
+		}
+		/* if PPIN is disabled, but not locked, try to enable */
+		if (msr_ppin_ctl == 0) {
+			wrmsrl_safe(MSR_PPIN_CTL, 2);
+			rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl);
+		}
+		if (msr_ppin_ctl == 2)
+			have_ppin = 1;
+	}
+}
+
 int __init mcheck_init(void)
 {
+	mcheck_intel_ppin_init();
 	mcheck_intel_therm_init();
 	mce_register_decode_chain(&mce_srao_nb);
 	mcheck_vendor_init_severity();
-- 
2.7.4
* Re: [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available
From: Borislav Petkov @ 2016-11-18 13:00 UTC
To: Luck, Tony
Cc: linux-kernel, Andi Kleen

On Thu, Nov 17, 2016 at 04:35:48PM -0800, Luck, Tony wrote:
> From: Tony Luck <tony.luck@intel.com>
>
> Intel Xeons from Ivy Bridge onwards support a processor identification
> number. On systems that have it, include it in the machine check record.
> I'm told that this would be helpful for users that run large data centers
> with multi-socket servers to keep track of which CPUs are seeing errors.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/msr-index.h |  4 ++++
>  arch/x86/include/uapi/asm/mce.h  |  1 +
>  arch/x86/kernel/cpu/mcheck/mce.c | 35 +++++++++++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)

...

> @@ -2134,8 +2140,37 @@ static int __init mcheck_enable(char *str)
>  }
>  __setup("mce", mcheck_enable);
>
> +static void mcheck_intel_ppin_init(void)

So this functionality could all be moved to arch/x86/kernel/cpu/intel.c
where you could set an artificial X86_FEATURE_PPIN and get rid of the
have_ppin var.

> +{
> +	unsigned long long msr_ppin_ctl;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> +		return;

Then, that check can go.

> +	switch (boot_cpu_data.x86_model) {
> +	case INTEL_FAM6_IVYBRIDGE_X:
> +	case INTEL_FAM6_HASWELL_X:
> +	case INTEL_FAM6_BROADWELL_XEON_D:
> +	case INTEL_FAM6_BROADWELL_X:
> +	case INTEL_FAM6_SKYLAKE_X:
> +		if (rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl))
> +			return;

I don't think you need to check models - if the RDMSR fails, you're
done.

> +		if (msr_ppin_ctl == 1) {

& BIT_ULL(0)

for future robustness in case those other reserved bits get used.

> +			pr_info("PPIN available but disabled\n");

We don't care, do we?

> +			return;
> +		}
> +		/* if PPIN is disabled, but not locked, try to enable */
> +		if (msr_ppin_ctl == 0) {

Also, properly masked off. There are [63:2] reserved bits which might be
assigned someday.

> +			wrmsrl_safe(MSR_PPIN_CTL, 2);
> +			rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl);

Why aren't we programming a number here? Or are users supposed to do
that?

If so, please design a proper sysfs interface and not make them use
msr-tools.

> +		}
> +		if (msr_ppin_ctl == 2)
> +			have_ppin = 1;

set_cpu_cap(c, X86_FEATURE_PPIN);

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
* Re: [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available
From: Luck, Tony @ 2016-11-18 16:42 UTC
To: Borislav Petkov
Cc: linux-kernel, Andi Kleen

On Fri, Nov 18, 2016 at 02:00:22PM +0100, Borislav Petkov wrote:
> So this functionality could all be moved to arch/x86/kernel/cpu/intel.c
> where you could set an artificial X86_FEATURE_PPIN and get rid of the
> have_ppin var.

Ok - will do.

> I don't think you need to check models - if the RDMSR fails, you're
> done.

Other models may use this MSR number for some other purpose. So the
read might succeed, but what I get might be something else entirely.

Technically with the model check I shouldn't have to use the _safe
versions ... but I'm paranoid that some SKUs might not implement this.

> & BIT_ULL(0)
>
> for future robustness in case those other reserved bits get used.

Unlikely ... but paranoia is good (see above about using rdmsr_safe).

> > +			pr_info("PPIN available but disabled\n");
>
> We don't care, do we?

Probably not ... there might be a BIOS setting, but the user that finds
they aren't getting PPIN in their logs could diagnose by making their
own rdmsr checks ... will delete this pr_info().

> Also, properly masked off. There are [63:2] reserved bits which might be
> assigned someday.

Ok.

> Why aren't we programming a number here? Or are users supposed to do
> that?
>
> If so, please design a proper sysfs interface and not make them use
> msr-tools.

The PPIN is programmed at the fab. To the user it is just a handy unique
number. I think Intel can decode it back to which fab and production run
this chip came from (useful to us if there are many chips reporting some
error).

> set_cpu_cap(c, X86_FEATURE_PPIN);

Yes - that looks prettier.

Thanks

-Tony
* Re: [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available
From: Andi Kleen @ 2016-11-18 17:02 UTC
To: Borislav Petkov
Cc: Luck, Tony, linux-kernel

Borislav Petkov <bp@suse.de> writes:
> > +static void mcheck_intel_ppin_init(void)
>
> So this functionality could all be moved to arch/x86/kernel/cpu/intel.c
> where you could set an artificial X86_FEATURE_PPIN and get rid of the
> have_ppin var.

That means that a tiny kernel that compiles out machine check
functionality has this unnecessary code.

In general it doesn't make any sense to define a FEATURE flag for a
single user. It's better to just check it where it is needed.

-Andi
* Re: [PATCH 1/2] x86/mce: Include the PPIN in machine check records when it is available
From: Borislav Petkov @ 2016-11-18 17:45 UTC
To: Andi Kleen
Cc: Luck, Tony, linux-kernel

On Fri, Nov 18, 2016 at 09:02:56AM -0800, Andi Kleen wrote:
> In general it doesn't make any sense to define a FEATURE flag for
> a single user. It's better to just check it where it is needed.

Then the whole thing should go into mce_intel.c.

-- 
Regards/Gruss,
    Boris.
* [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available
From: Luck, Tony @ 2016-11-18 17:48 UTC
To: Borislav Petkov
Cc: Tony Luck, Andi Kleen, Ashok Raj, linux-kernel

From: Tony Luck <tony.luck@intel.com>

Intel Xeons from Ivy Bridge onwards support a processor identification
number set in the factory. To the user this is a handy unique number to
identify a particular cpu. Intel can decode this to the fab/production
run to track errors. On systems that have it, include it in the machine
check record. I'm told that this would be helpful for users that run
large data centers with multi-socket servers to keep track of which
CPUs are seeing errors.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---

Boris:
	Moved feature detection to mce_intel.c
	Use feature bit.
	Don't spam console if feature is disabled
	Program defensively against future bits in MSR_PPIN_CTL
	Updated commit comment to note the PPIN is set in factory

Andi:
	Dynamic feature bits don't impact tiny kernels (well we
	are using one *bit* so this could contribute to NCAPINTS
	someday needing to be increased).

 arch/x86/include/asm/cpufeatures.h     |  1 +
 arch/x86/include/asm/msr-index.h       |  4 ++++
 arch/x86/include/uapi/asm/mce.h        |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c       |  3 +++
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 29 +++++++++++++++++++++++++++++
 5 files changed, 38 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index a39629206864..d625b651e526 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -193,6 +193,7 @@
 #define X86_FEATURE_HW_PSTATE	( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */

+#define X86_FEATURE_INTEL_PPIN	( 7*32+14) /* Intel Processor Inventory Number */
 #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
 #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 78f3760ca1f2..710273c617b8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -37,6 +37,10 @@
 #define EFER_FFXSR		(1<<_EFER_FFXSR)

 /* Intel MSRs. Some also available on other CPUs */
+
+#define MSR_PPIN_CTL			0x0000004e
+#define MSR_PPIN			0x0000004f
+
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index 69a6e07e3149..eb6247a7009b 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -28,6 +28,7 @@ struct mce {
 	__u64 mcgcap;	/* MCGCAP MSR: machine check capabilities of CPU */
 	__u64 synd;	/* MCA_SYND MSR: only valid on SMCA systems */
 	__u64 ipid;	/* MCA_IPID MSR: only valid on SMCA systems */
+	__u64 ppin;	/* Protected Processor Inventory Number */
 };

 #define MCE_GET_RECORD_LEN   _IOR('M', 1, int)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a7fdf453d895..cc6d877db88c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -43,6 +43,7 @@
 #include <linux/export.h>
 #include <linux/jump_label.h>

+#include <asm/intel-family.h>
 #include <asm/processor.h>
 #include <asm/traps.h>
 #include <asm/tlbflush.h>
@@ -135,6 +136,8 @@ void mce_setup(struct mce *m)
 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
 	m->apicid = cpu_data(m->extcpu).initial_apicid;
 	rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
+	if (this_cpu_has(X86_FEATURE_INTEL_PPIN))
+		rdmsrl(MSR_PPIN, m->ppin);
 }

 DEFINE_PER_CPU(struct mce, injectm);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 1defb8ea882c..b2601c96fc3e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -11,6 +11,8 @@
 #include <linux/sched.h>
 #include <linux/cpumask.h>
 #include <asm/apic.h>
+#include <asm/cpufeature.h>
+#include <asm/intel-family.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 #include <asm/mce.h>
@@ -464,11 +466,38 @@ static void intel_clear_lmce(void)
 	wrmsrl(MSR_IA32_MCG_EXT_CTL, val);
 }

+static void intel_ppin_init(struct cpuinfo_x86 *c)
+{
+	unsigned long long msr_ppin_ctl;
+
+	switch (c->x86_model) {
+	case INTEL_FAM6_IVYBRIDGE_X:
+	case INTEL_FAM6_HASWELL_X:
+	case INTEL_FAM6_BROADWELL_XEON_D:
+	case INTEL_FAM6_BROADWELL_X:
+	case INTEL_FAM6_SKYLAKE_X:
+		if (rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl))
+			return;
+		if ((msr_ppin_ctl & 3ul) == 1ul) {
+			/* PPIN available but disabled */
+			return;
+		}
+		/* if PPIN is disabled, but not locked, try to enable */
+		if (msr_ppin_ctl == 0) {
+			wrmsrl_safe(MSR_PPIN_CTL, 2ul);
+			rdmsrl_safe(MSR_PPIN_CTL, &msr_ppin_ctl);
+		}
+		if ((msr_ppin_ctl & 3ul) == 2ul)
+			set_cpu_cap(c, X86_FEATURE_INTEL_PPIN);
+	}
+}
+
 void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
 	intel_init_cmci();
 	intel_init_lmce();
+	intel_ppin_init(c);
 }

 void mce_intel_feature_clear(struct cpuinfo_x86 *c)
-- 
2.7.4
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available
From: Borislav Petkov @ 2016-11-23 11:48 UTC
To: Luck, Tony
Cc: Andi Kleen, Ashok Raj, linux-kernel

On Fri, Nov 18, 2016 at 09:48:36AM -0800, Luck, Tony wrote:
> From: Tony Luck <tony.luck@intel.com>
>
> Intel Xeons from Ivy Bridge onwards support a processor identification
> number set in the factory. To the user this is a handy unique number to
> identify a particular cpu. Intel can decode this to the fab/production
> run to track errors. On systems that have it, include it in the machine
> check record. I'm told that this would be helpful for users that run
> large data centers with multi-socket servers to keep track of which
> CPUs are seeing errors.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> Boris:
> 	Moved feature detection to mce_intel.c
> 	Use feature bit.
> 	Don't spam console if feature is disabled
> 	Program defensively against future bits in MSR_PPIN_CTL
> 	Updated commit comment to note the PPIN is set in factory
>
> Andi:
> 	Dynamic feature bits don't impact tiny kernels (well we
> 	are using one *bit* so this could contribute to NCAPINTS
> 	someday needing to be increased).
>
>  arch/x86/include/asm/cpufeatures.h     |  1 +
>  arch/x86/include/asm/msr-index.h       |  4 ++++
>  arch/x86/include/uapi/asm/mce.h        |  1 +
>  arch/x86/kernel/cpu/mcheck/mce.c       |  3 +++
>  arch/x86/kernel/cpu/mcheck/mce_intel.c | 29 +++++++++++++++++++++++++++++
>  5 files changed, 38 insertions(+)

Applied with some minor fixups:

---
From: Tony Luck <tony.luck@intel.com>
Date: Fri, 18 Nov 2016 09:48:36 -0800
Subject: [PATCH] x86/mce: Include the PPIN in MCE records when available

Intel Xeons from Ivy Bridge onwards support a processor identification
number set in the factory. To the user this is a handy unique number to
identify a particular CPU. Intel can decode this to the fab/production
run to track errors. On systems that have it, include it in the machine
check record. I'm told that this would be helpful for users that run
large data centers with multi-socket servers to keep track of which
CPUs are seeing errors.

Boris:
 * Add some clarifying comments and spacing.
 * Mask out [63:2] in the disabled-but-not-locked case
 * Call the MSR variable "val" for more readability.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1479491316-11716-1-git-send-email-tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/cpufeatures.h     |  1 +
 arch/x86/include/asm/msr-index.h       |  4 ++++
 arch/x86/include/uapi/asm/mce.h        |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c       |  4 ++++
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 37 ++++++++++++++++++++++++++++++++++
 5 files changed, 47 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index a39629206864..d625b651e526 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -193,6 +193,7 @@
 #define X86_FEATURE_HW_PSTATE	( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */

+#define X86_FEATURE_INTEL_PPIN	( 7*32+14) /* Intel Processor Inventory Number */
 #define X86_FEATURE_INTEL_PT	( 7*32+15) /* Intel Processor Trace */
 #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */
 #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 78f3760ca1f2..710273c617b8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -37,6 +37,10 @@
 #define EFER_FFXSR		(1<<_EFER_FFXSR)

 /* Intel MSRs. Some also available on other CPUs */
+
+#define MSR_PPIN_CTL			0x0000004e
+#define MSR_PPIN			0x0000004f
+
 #define MSR_IA32_PERFCTR0		0x000000c1
 #define MSR_IA32_PERFCTR1		0x000000c2
 #define MSR_FSB_FREQ			0x000000cd
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index 69a6e07e3149..eb6247a7009b 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -28,6 +28,7 @@ struct mce {
 	__u64 mcgcap;	/* MCGCAP MSR: machine check capabilities of CPU */
 	__u64 synd;	/* MCA_SYND MSR: only valid on SMCA systems */
 	__u64 ipid;	/* MCA_IPID MSR: only valid on SMCA systems */
+	__u64 ppin;	/* Protected Processor Inventory Number */
 };

 #define MCE_GET_RECORD_LEN   _IOR('M', 1, int)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aab96f8d52b0..a3cb27af4f9b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -43,6 +43,7 @@
 #include <linux/export.h>
 #include <linux/jump_label.h>

+#include <asm/intel-family.h>
 #include <asm/processor.h>
 #include <asm/traps.h>
 #include <asm/tlbflush.h>
@@ -135,6 +136,9 @@ void mce_setup(struct mce *m)
 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
 	m->apicid = cpu_data(m->extcpu).initial_apicid;
 	rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap);
+
+	if (this_cpu_has(X86_FEATURE_INTEL_PPIN))
+		rdmsrl(MSR_PPIN, m->ppin);
 }

 DEFINE_PER_CPU(struct mce, injectm);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index be0b2fad47c5..1faefb696af8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -11,6 +11,8 @@
 #include <linux/sched.h>
 #include <linux/cpumask.h>
 #include <asm/apic.h>
+#include <asm/cpufeature.h>
+#include <asm/intel-family.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 #include <asm/mce.h>
@@ -464,11 +466,46 @@ static void intel_clear_lmce(void)
 	wrmsrl(MSR_IA32_MCG_EXT_CTL, val);
 }

+static void intel_ppin_init(struct cpuinfo_x86 *c)
+{
+	unsigned long long val;
+
+	/*
+	 * Even if testing the presence of the MSR would be enough, we don't
+	 * want to risk the situation where other models reuse this MSR for
+	 * other purposes.
+	 */
+	switch (c->x86_model) {
+	case INTEL_FAM6_IVYBRIDGE_X:
+	case INTEL_FAM6_HASWELL_X:
+	case INTEL_FAM6_BROADWELL_XEON_D:
+	case INTEL_FAM6_BROADWELL_X:
+	case INTEL_FAM6_SKYLAKE_X:
+		if (rdmsrl_safe(MSR_PPIN_CTL, &val))
+			return;
+
+		if ((val & 3ul) == 1ul) {
+			/* PPIN available but disabled: */
+			return;
+		}
+
+		/* if PPIN is disabled, but not locked, try to enable: */
+		if (!(val & 3ul)) {
+			wrmsrl_safe(MSR_PPIN_CTL, val | 2ul);
+			rdmsrl_safe(MSR_PPIN_CTL, &val);
+		}
+
+		if ((val & 3ul) == 2ul)
+			set_cpu_cap(c, X86_FEATURE_INTEL_PPIN);
+	}
+}
+
 void mce_intel_feature_init(struct cpuinfo_x86 *c)
 {
 	intel_init_thermal(c);
 	intel_init_cmci();
 	intel_init_lmce();
+	intel_ppin_init(c);
 }

 void mce_intel_feature_clear(struct cpuinfo_x86 *c)
-- 
2.10.0
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available
From: Henrique de Moraes Holschuh @ 2016-11-23 13:29 UTC
To: Borislav Petkov
Cc: Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel

On Wed, 23 Nov 2016, Borislav Petkov wrote:
> +		/* if PPIN is disabled, but not locked, try to enable: */
> +		if (!(val & 3ul)) {
> +			wrmsrl_safe(MSR_PPIN_CTL, val | 2ul);
> +			rdmsrl_safe(MSR_PPIN_CTL, &val);
> +		}

Actually, since this thing is supposed to be opt-in [through UEFI
config] for a good reason (privacy), IMHO it would make more sense to:

1. Assuming we can do it, always lock it when it is found to be unlocked
   at kernel boot.

2. Not attempt to change its state from disabled to enabled *unless*
   given a command line parameter authorizing it. A kconfig-based
   solution for default+command line override would also work well IMHO,
   if it makes more sense.

This would keep the feature opt-in as it is supposed to be, while making
it "safer" on firmware that leaves it unlocked after boot, and would
still allow owners of systems that leave it unlocked to change its state
at boot. Everyone ends up happy...

-- 
  Henrique Holschuh
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available
From: Borislav Petkov @ 2016-11-23 13:37 UTC
To: Henrique de Moraes Holschuh
Cc: Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel

On Wed, Nov 23, 2016 at 11:29:51AM -0200, Henrique de Moraes Holschuh wrote:
> 1. Assuming we can do it, always lock it when it is found to be unlocked
>    at kernel boot.

Because...?

> 2. Not attempt to change its state from disabled to enabled *unless*
>    given a command line parameter authorizing it. A kconfig-based
>    solution for default+command line override would also work well IMHO,
>    if it makes more sense.

You can't reenable it:

"LockOut (R/WO)
Set 1 to prevent further writes to MSR_PPIN_CTL. Writing 1 to
MSR_PPINCTL[bit 0] is permitted only if MSR_PPIN_CTL[bit 1] is
clear, Default is 0."

-- 
Regards/Gruss,
    Boris.
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available
From: Borislav Petkov @ 2016-11-23 14:05 UTC
To: Henrique de Moraes Holschuh
Cc: Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel

On Wed, Nov 23, 2016 at 02:37:23PM +0100, Borislav Petkov wrote:
> You can't reenable it:
>
> "LockOut (R/WO)
> Set 1 to prevent further writes to MSR_PPIN_CTL. Writing 1 to
> MSR_PPINCTL[bit 0] is permitted only if MSR_PPIN_CTL[bit 1] is
> clear, Default is 0."

Well, almost.

"Enable_PPIN (R/W)
If 1, enables MSR_PPIN to be accessible using RDMSR. Once set,
attempt to write 1 to MSR_PPIN_CTL[bit 0] will cause #GP.
If 0, an attempt to read MSR_PPIN will cause #GP. Default is 0."

Frankly, I don't get what the deal behind that locking out is. And it
says that BIOS should provide an opt-in so that agent can read the PPIN
and then that agent should *disable* it again by writing 01b to the CTL
MSR.

But then the first paragraph above says that the write
MSR_PPIN_CTL[0]=1b will #GP because MSR_PPIN_CTL[1] will be 1 for the
agent to read out MSR_PPIN first.

I guess we need to write a 00b first to disable PPIN and then write 01b
to lock it out.

So AFAIU, the steps will be:

* BIOS writes 10b
* agent reads MSR_PPIN
* agent writes 00b to disable MSR_PPIN
* agent writes 01b because bit 1 is clear now and it won't #GP.

Meh...

-- 
Regards/Gruss,
    Boris.
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available
From: Tony Luck @ 2016-11-23 16:42 UTC
To: Borislav Petkov
Cc: Henrique de Moraes Holschuh, Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel

If the BIOS writes 10b, then PPIN is disabled and will remain so until
the processor is reset. Bit 1 is a one way trip, it can be set by s/w,
but not cleared again.

All this is because of the huge stink last time Intel tried to add a
serial number to CPUs a decade and a half ago. The lockout bit is so
that this can be turned off in a way that you can be sure that it can't
be turned on again.

-Tony

Sent from my iPhone

> On Nov 23, 2016, at 06:05, Borislav Petkov <bp@suse.de> wrote:
>
> Frankly, I don't get what the deal behind that locking out is. And it
> says that BIOS should provide an opt-in so that agent can read the PPIN
> and then that agent should *disable* it again by writing 01b to the CTL
> MSR.
>
> I guess we need to write a 00b first to disable PPIN and then write 01b
> to lock it out.
>
> So AFAIU, the steps will be:
>
> * BIOS writes 10b
> * agent reads MSR_PPIN
> * agent writes 00b to disable MSR_PPIN
> * agent writes 01b because bit 1 is clear now and it won't #GP.
>
> Meh...
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available 2016-11-23 16:42 ` Tony Luck @ 2016-11-23 16:55 ` Borislav Petkov 0 siblings, 0 replies; 17+ messages in thread From: Borislav Petkov @ 2016-11-23 16:55 UTC (permalink / raw) To: Tony Luck Cc: Henrique de Moraes Holschuh, Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel On Wed, Nov 23, 2016 at 08:42:40AM -0800, Tony Luck wrote: > If the BIOS writes 10b, then PPIN is disabled and will remain so until > the processor is reset. Bit 1 is a one way trip, it can be set by s/w, > but not cleared again. 10b means bit 1, i.e., Enable_PPIN is set, right? Which actually *enables* PPIN. Or am I confused again? Otherwise, this explains the "Once set" wording - if Enable_PPIN is 1, there's no changing until next reboot. > All this is because of the huge stink last time Intel tried to add > a serial number to CPUs a decade and a half ago. It certainly rang a bell when you sent v1. :-) > The lockout bit is so that this can be turned off in a way that you > can be sure that it can't be turned on again. ... in order to protect ourselves from root doing wrmsr? Or why are we doing this? -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available 2016-11-23 13:37 ` Borislav Petkov 2016-11-23 14:05 ` Borislav Petkov @ 2016-11-23 17:29 ` Henrique de Moraes Holschuh 2016-11-23 20:56 ` Tony Luck 1 sibling, 1 reply; 17+ messages in thread From: Henrique de Moraes Holschuh @ 2016-11-23 17:29 UTC (permalink / raw) To: Borislav Petkov; +Cc: Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel On Wed, 23 Nov 2016, Borislav Petkov wrote: > On Wed, Nov 23, 2016 at 11:29:51AM -0200, Henrique de Moraes Holschuh wrote: > > 1. Assuming we can do it, always lock it when it is found to be unlocked > > at kernel boot. > > Because...? Privacy, and the fact that /dev/cpu/msr exists and is enabled on almost all general-use distros. > > 2. Not attempt to change its state from disabled to enabled *unless* > > given a command line parameter authorizing it. A kconfig-based > > solution for default+command line override would also work well IMHO, > > if it makes more sense. > > You can't reenable it: Yeah, I just found the description for that thing in the IA32 manual. It can be disabled + unlocked, disabled + locked, or enabled + unlocked. Once locked, it will stay disabled until the next reboot. However, the manual makes it clear we are _not_ supposed to leave it enabled + unlocked. Apparently, we're supposed to do our business and disable+lock it (i.e. enable, read and store/process, disable+lock). Looks like it is supposed to be used in a way that protects privacy by making it very hard for general use software to depend on it existing and being enabled. -- Henrique Holschuh ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available 2016-11-23 17:29 ` Henrique de Moraes Holschuh @ 2016-11-23 20:56 ` Tony Luck 2016-11-24 11:20 ` Henrique de Moraes Holschuh 0 siblings, 1 reply; 17+ messages in thread From: Tony Luck @ 2016-11-23 20:56 UTC (permalink / raw) To: Henrique de Moraes Holschuh Cc: Borislav Petkov, Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel IMHO people who really care should find the BIOS option and disable it there. Having Linux take responsibility seems a little weird. If we do go that route it should be early in setup_arch() before any callbacks to other subsystems to avoid an endless game of whack-a-mole. I also wonder about the level of outrage this time around. The feature has been sitting there for three full generations: Ivy Bridge (tick), Haswell (tock) and another tick for Broadwell. Do privacy folks not read each new SDM from cover to cover? Sent from my iPhone > On Nov 23, 2016, at 09:29, Henrique de Moraes Holschuh <hmh@hmh.eng.br> wrote: > >> On Wed, 23 Nov 2016, Borislav Petkov wrote: >>> On Wed, Nov 23, 2016 at 11:29:51AM -0200, Henrique de Moraes Holschuh wrote: >>> 1. Assuming we can do it, always lock it when it is found to be unlocked >>> at kernel boot. >> >> Because...? > > Privacy, and the fact that /dev/cpu/msr exists and is enabled on > almost all general-use distros. > >>> 2. Not attempt to change its state from disabled to enabled *unless* >>> given a command line parameter authorizing it. A kconfig-based >>> solution for default+command line override would also work well IMHO, >>> if it makes more sense. >> >> You can't reenable it: > > Yeah, I just found the description for that thing in the IA32 manual. > > It can be disabled + unlocked, disabled + locked, or enabled + unlocked. > Once locked, it will stay disabled until the next reboot. > > However, the manual makes it clear we are _not_ supposed to leave it > enabled + unlocked. 
Apparently, we're supposed to do our business and > disable+lock it (i.e. enable, read and store/process, disable+lock). > > Looks like it is supposed to be used in a way that protects privacy by > making it very hard for general use software to depend on it existing > and being enabled. > > -- > Henrique Holschuh ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2] x86/mce: Include the PPIN in machine check records when it is available 2016-11-23 20:56 ` Tony Luck @ 2016-11-24 11:20 ` Henrique de Moraes Holschuh 0 siblings, 0 replies; 17+ messages in thread From: Henrique de Moraes Holschuh @ 2016-11-24 11:20 UTC (permalink / raw) To: Tony Luck Cc: Borislav Petkov, Luck, Tony, Andi Kleen, Ashok Raj, linux-kernel On Wed, 23 Nov 2016, Tony Luck wrote: > IMHO people who really care should find the BIOS option and disable it > there. That can also be said about *enabling* it, I think (see below). > Having Linux take responsibility seems a little weird. If we do go Not really. The currently proposed patch *enables* PPIN if it is found to be disabled but unlocked. That pretty much means Linux _would_ take the responsibility, the blame, and the outrage of privacy advocates (if any). If we enable it, it is our fault, plain and simple. > I also wonder about the level of outrage this time around. The feature > has been sitting there for three full generations: Ivybridge (tick), > Haswell (tock) and another tick for Broadwell. Do privacy folks not > read each new SDM from cover to cover? I very much doubt so :-) And it would take a very thorough and careful read of the SDM changes to find it, if you are not searching for it by name. But even if the privacy advocates did read the SDM changelogs very carefully and took notice of it, the PPIN feature clearly looks like it was designed to protect the privacy of anyone that did not specifically want it enabled. 1. PPIN is disabled on hard reset (as far as I can tell). 2. BIOS/UEFI ships it disabled by default, as recommended by the SDM ("opt-in" feature). Although it should have recommended that it be *locked* disabled by default, thus *ensuring* opt-in. 3. Opt-in bias is enforced in hardware (the firmware cannot lock the feature in an enabled state). 4. 
Access violations (read when disabled, unlock, etc.) will raise a #GP, thus getting the operating system/firmware crash handler involved immediately. The expected usecase is, as described in the IA32 SDM: a trusted asset agent will enable, read the PPIN, and lock it disabled afterwards. That "lock it disabled" would get in the way of general abuse of the feature by random ISVs. I think the architecture / hardware / microcode people @intel covered their angle really well on this. Anyone who raises a ruckus on the fact that PPIN exists (as described in the SDM) is not going to look very reasonable. I recommend that the Linux kernel should take the same stance as the intel hardware/microcode team did: don't enable it by default, don't make it easy for any ISVs to abuse it without positive opt-in action from the local system admin. This is why I also recommend that the kernel should always lock it disabled -- whether we read the PPIN for kernel use (when PPIN was enabled by the BIOS[1]) or not. It indeed *is* the kernel taking responsibility for side-stepping the whole "rdmsr is for ring 0" architectural security model due to unfiltered /dev/cpu/msr. [1] I personally have nothing against an override, e.g. a kernel command-line parameter, that allows the kernel to enable PPIN when the BIOS left it unlocked, as long as it is not done by default. -- Henrique Holschuh ^ permalink raw reply [flat|nested] 17+ messages in thread
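The "enable, read, then lock it disabled" agent flow described in that message can be sketched in user-space C. This is a hypothetical model only: `ctl_msr` and `ppin_msr` are plain variables standing in for MSR_PPIN_CTL and MSR_PPIN accesses, the PPIN value is made up, and the function name is invented for illustration.

```c
#include <assert.h>

/* Stand-ins for RDMSR/WRMSR on MSR_PPIN_CTL and MSR_PPIN. */
static unsigned long long ctl_msr;                           /* 00b at reset */
static unsigned long long ppin_msr = 0x1122334455667788ULL;  /* made-up PPIN */

/* Hypothetical trusted agent: read the PPIN if possible, then leave
 * the control MSR locked disabled (01b). Returns the PPIN, or 0 if it
 * could not be read. */
static unsigned long long ppin_read_and_lock(void)
{
	unsigned long long val = ctl_msr, ppin = 0;

	if (val & 1ULL)              /* locked out: nothing we can do    */
		return 0;
	if (!(val & 3ULL))           /* disabled but unlocked: enable it */
		ctl_msr = val = 2ULL;
	if ((val & 3ULL) == 2ULL)    /* enabled: RDMSR MSR_PPIN is safe  */
		ppin = ppin_msr;
	ctl_msr = 0ULL;              /* disable first; writing 01b while
	                              * Enable_PPIN is set would #GP     */
	ctl_msr = 1ULL;              /* then set LockOut                 */
	return ppin;
}
```

Note the two-step disable-then-lock at the end: it mirrors the conclusion reached earlier in the thread that 00b must be written before 01b.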
* [tip:ras/core] x86/mce: Include the PPIN in MCE records when available 2016-11-23 11:48 ` Borislav Petkov 2016-11-23 13:29 ` Henrique de Moraes Holschuh @ 2016-11-23 15:58 ` tip-bot for Tony Luck 1 sibling, 0 replies; 17+ messages in thread From: tip-bot for Tony Luck @ 2016-11-23 15:58 UTC (permalink / raw) To: linux-tip-commits Cc: hpa, linux-edac, bp, tony.luck, mingo, linux-kernel, tglx, x86, ashok.raj Commit-ID: 3f5a7896a5096fd50030a04d4c3f28a7441e30a5 Gitweb: http://git.kernel.org/tip/3f5a7896a5096fd50030a04d4c3f28a7441e30a5 Author: Tony Luck <tony.luck@intel.com> AuthorDate: Fri, 18 Nov 2016 09:48:36 -0800 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Wed, 23 Nov 2016 16:51:52 +0100 x86/mce: Include the PPIN in MCE records when available Intel Xeons from Ivy Bridge onwards support a processor identification number set in the factory. To the user this is a handy unique number to identify a particular CPU. Intel can decode this to the fab/production run to track errors. On systems that have it, include it in the machine check record. I'm told that this would be helpful for users that run large data centers with multi-socket servers to keep track of which CPUs are seeing errors. Boris: * Add some clarifying comments and spacing. * Mask out [63:2] in the disabled-but-not-locked case * Call the MSR variable "val" for more readability. 
Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20161123114855.njguoaygp3qnbkia@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/msr-index.h | 4 ++++ arch/x86/include/uapi/asm/mce.h | 1 + arch/x86/kernel/cpu/mcheck/mce.c | 4 ++++ arch/x86/kernel/cpu/mcheck/mce_intel.c | 37 ++++++++++++++++++++++++++++++++++ 5 files changed, 47 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index a396292..d625b65 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -193,6 +193,7 @@ #define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ +#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ #define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ #define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 78f3760..710273c 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -37,6 +37,10 @@ #define EFER_FFXSR (1<<_EFER_FFXSR) /* Intel MSRs. 
Some also available on other CPUs */ + +#define MSR_PPIN_CTL 0x0000004e +#define MSR_PPIN 0x0000004f + #define MSR_IA32_PERFCTR0 0x000000c1 #define MSR_IA32_PERFCTR1 0x000000c2 #define MSR_FSB_FREQ 0x000000cd diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h index 69a6e07..eb6247a 100644 --- a/arch/x86/include/uapi/asm/mce.h +++ b/arch/x86/include/uapi/asm/mce.h @@ -28,6 +28,7 @@ struct mce { __u64 mcgcap; /* MCGCAP MSR: machine check capabilities of CPU */ __u64 synd; /* MCA_SYND MSR: only valid on SMCA systems */ __u64 ipid; /* MCA_IPID MSR: only valid on SMCA systems */ + __u64 ppin; /* Protected Processor Inventory Number */ }; #define MCE_GET_RECORD_LEN _IOR('M', 1, int) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index aab96f8..a3cb27a 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -43,6 +43,7 @@ #include <linux/export.h> #include <linux/jump_label.h> +#include <asm/intel-family.h> #include <asm/processor.h> #include <asm/traps.h> #include <asm/tlbflush.h> @@ -135,6 +136,9 @@ void mce_setup(struct mce *m) m->socketid = cpu_data(m->extcpu).phys_proc_id; m->apicid = cpu_data(m->extcpu).initial_apicid; rdmsrl(MSR_IA32_MCG_CAP, m->mcgcap); + + if (this_cpu_has(X86_FEATURE_INTEL_PPIN)) + rdmsrl(MSR_PPIN, m->ppin); } DEFINE_PER_CPU(struct mce, injectm); diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c index be0b2fa..190b3e6 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c @@ -11,6 +11,8 @@ #include <linux/sched.h> #include <linux/cpumask.h> #include <asm/apic.h> +#include <asm/cpufeature.h> +#include <asm/intel-family.h> #include <asm/processor.h> #include <asm/msr.h> #include <asm/mce.h> @@ -464,11 +466,46 @@ static void intel_clear_lmce(void) wrmsrl(MSR_IA32_MCG_EXT_CTL, val); } +static void intel_ppin_init(struct cpuinfo_x86 *c) +{ + unsigned long long val; + + /* + 
* Even if testing the presence of the MSR would be enough, we don't + * want to risk the situation where other models reuse this MSR for + * other purposes. + */ + switch (c->x86_model) { + case INTEL_FAM6_IVYBRIDGE_X: + case INTEL_FAM6_HASWELL_X: + case INTEL_FAM6_BROADWELL_XEON_D: + case INTEL_FAM6_BROADWELL_X: + case INTEL_FAM6_SKYLAKE_X: + if (rdmsrl_safe(MSR_PPIN_CTL, &val)) + return; + + if ((val & 3UL) == 1UL) { + /* PPIN available but disabled: */ + return; + } + + /* If PPIN is disabled, but not locked, try to enable: */ + if (!(val & 3UL)) { + wrmsrl_safe(MSR_PPIN_CTL, val | 2UL); + rdmsrl_safe(MSR_PPIN_CTL, &val); + } + + if ((val & 3UL) == 2UL) + set_cpu_cap(c, X86_FEATURE_INTEL_PPIN); + } +} + void mce_intel_feature_init(struct cpuinfo_x86 *c) { intel_init_thermal(c); intel_init_cmci(); intel_init_lmce(); + intel_ppin_init(c); } void mce_intel_feature_clear(struct cpuinfo_x86 *c) ^ permalink raw reply related [flat|nested] 17+ messages in thread
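The state handling in intel_ppin_init() above reduces to a decision on the low two bits of MSR_PPIN_CTL. As a sanity check, here is a user-space model of that decision; it is a sketch, not kernel code — `*ctl` is a plain variable standing in for the rdmsrl_safe()/wrmsrl_safe() accesses, and the function name is invented.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of the MSR_PPIN_CTL handling in intel_ppin_init():
 * returns true when the kernel would set X86_FEATURE_INTEL_PPIN. */
static bool ppin_usable(unsigned long long *ctl)
{
	unsigned long long val = *ctl;

	if ((val & 3ULL) == 1ULL)    /* PPIN present but disabled+locked */
		return false;

	if (!(val & 3ULL))           /* disabled, not locked: try enable */
		*ctl = val | 2ULL;

	/* Feature is usable only if Enable_PPIN stuck and LockOut is clear. */
	return (*ctl & 3ULL) == 2ULL;
}
```

The four reachable states behave as in the commit: 00b gets enabled and reported usable, 01b stays locked out, 10b is usable as-is, and 11b (which real hardware should not produce) is rejected.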
end of thread, other threads:[~2016-11-24 11:21 UTC | newest] Thread overview: 17+ messages -- 2016-11-18 0:35 [PATCH 2/2] mcelog: Print the PPIN in machine check records when it is available Luck, Tony 2016-11-18 0:35 ` [PATCH 1/2] x86/mce: Include " Luck, Tony 2016-11-18 13:00 ` Borislav Petkov 2016-11-18 16:42 ` Luck, Tony 2016-11-18 17:02 ` Andi Kleen 2016-11-18 17:45 ` Borislav Petkov 2016-11-18 17:48 ` [PATCH v2] " Luck, Tony 2016-11-23 11:48 ` Borislav Petkov 2016-11-23 13:29 ` Henrique de Moraes Holschuh 2016-11-23 13:37 ` Borislav Petkov 2016-11-23 14:05 ` Borislav Petkov 2016-11-23 16:42 ` Tony Luck 2016-11-23 16:55 ` Borislav Petkov 2016-11-23 17:29 ` Henrique de Moraes Holschuh 2016-11-23 20:56 ` Tony Luck 2016-11-24 11:20 ` Henrique de Moraes Holschuh 2016-11-23 15:58 ` [tip:ras/core] x86/mce: Include the PPIN in MCE records when available tip-bot for Tony Luck