* [PATCH 0/3] Fix MCE handling for AMD multi-node processors
@ 2014-12-22 20:10 Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 1/3] x86,amd: Refactor amd cpu topology functions for " Aravind Gopalakrishnan
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2014-12-22 20:10 UTC (permalink / raw)
To: tglx, mingo, hpa, tony.luck, bp, dougthompson, mchehab, x86,
linux-kernel, linux-edac
Cc: dave.hansen, mgorman, bp, riel, jacob.w.shin,
Aravind Gopalakrishnan
When a MCE happens that is to be logged onto bank 4 of AMD multi-node
processors, they are reported only to corresponding node base core of
the cpu on which the error occurred.
Refer D18F3x44[NbMcaToMstCpuEn] on BKDGs of Fam10h and later for
clarifications on the reporting of MC4 errors only to NBC MSRs.
We don't have the exception handler wired up to handle this currently.
As a consequence, do_machine_check only runs on the core on which error
occurred and (since according to the BKDGs, reads to MC4_STATUS MSR of
non-NBC will simply RAZ) the exception is ignored for the core.
This is a problem as now we have dropped MCEs.
I tested this on Fam10h and Fam15h using mce_amd_inj and by triggering
a real HW MCE using Boris' new interface; And can confirm the behavior.
This patch set fixes the issue by looking at the NBC MSRs when bank 4
errors happen on AMD multi node processors.
Patch 1: Refactor AMD cpu topology functions so that we can get some
relevant info that we need to use in EDAC, MC handler routines
Patch 2: The fix to our problem
Patch 3: Modify mce_amd_inj interfaces to write to only NBC for bank 4
errors. Only then will they be picked up for error handling.
Aravind Gopalakrishnan (3):
x86,amd: Refactor amd cpu topology functions for multi-node processors
x86, mce: Handle AMD MCE on bank4 on NBC for multi-node processors
edac, mce_amd_inj: Inject errors only on NBC for bank 4 errors
arch/x86/include/asm/processor.h | 1 +
arch/x86/kernel/cpu/amd.c | 78 ++++++++++++++----
arch/x86/kernel/cpu/mcheck/mce.c | 167 +++++++++++++++++++++++++++++++++++----
drivers/edac/mce_amd_inj.c | 21 ++++-
4 files changed, 235 insertions(+), 32 deletions(-)
--
2.0.2
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/3] x86,amd: Refactor amd cpu topology functions for multi-node processors
2014-12-22 20:10 [PATCH 0/3] Fix MCE handling for AMD multi-node processors Aravind Gopalakrishnan
@ 2014-12-22 20:10 ` Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 2/3] x86, mce: Handle AMD MCE on bank4 on NBC " Aravind Gopalakrishnan
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2014-12-22 20:10 UTC (permalink / raw)
To: tglx, mingo, hpa, tony.luck, bp, dougthompson, mchehab, x86,
linux-kernel, linux-edac
Cc: dave.hansen, mgorman, bp, riel, jacob.w.shin,
Aravind Gopalakrishnan
We would like to be able to provide a mechanism for finding the NBC for
each node in multi-node platforms. For this, we will need the number of
nodes per processor.
So, factor this calculation out of amd_get_topology() and provide separate
getter function 'amd_get_nbc_for_node' for obtaining NBC for a node.
Another useful calculation is to find the NBC of a node on which a given cpu
is on. Provide a getter for this as well.
- We are using it in EDAC drivers, MCE handlers right away, so marked
extern.
- We currently don't use amd_get_nbc_for_node outside of here, so
marking it static. We can make this extern as and when need arises.
While at it, reverse the return condition in amd_get_topology
and save an indent level.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
arch/x86/include/asm/processor.h | 1 +
arch/x86/kernel/cpu/amd.c | 78 ++++++++++++++++++++++++++++++++--------
2 files changed, 65 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a092a0c..6e1dc06 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -979,6 +979,7 @@ static inline int mpx_disable_management(struct task_struct *tsk)
#endif /* CONFIG_X86_INTEL_MPX */
extern u16 amd_get_nb_id(int cpu);
+extern unsigned int amd_get_nbc_for_cpu(int cpu);
static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves)
{
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index a220239..0c0eae5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -281,6 +281,28 @@ static int nearby_node(int apicid)
}
#endif
+#ifdef CONFIG_X86_HT
+static u32 amd_get_num_nodes(struct cpuinfo_x86 *c)
+{
+ u32 nodes = 1;
+
+ if (cpu_has_topoext) {
+ u32 ecx;
+
+ ecx = cpuid_ecx(0x8000001e);
+ nodes = ((ecx >> 8) & 7) + 1;
+ } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
+ u64 value;
+
+ rdmsrl(MSR_FAM10H_NODE_ID, value);
+ nodes = ((value >> 3) & 7) + 1;
+ }
+
+ return nodes;
+}
+EXPORT_SYMBOL_GPL(amd_get_num_nodes);
+#endif
+
/*
* Fixup core topology information for
* (1) AMD multi-node processors
@@ -291,15 +313,18 @@ static int nearby_node(int apicid)
static void amd_get_topology(struct cpuinfo_x86 *c)
{
u32 nodes, cores_per_cu = 1;
+ u32 cores_per_node;
+ u32 cus_per_node;
u8 node_id;
int cpu = smp_processor_id();
/* get information required for multi-node processors */
+ nodes = amd_get_num_nodes(c);
+
if (cpu_has_topoext) {
u32 eax, ebx, ecx, edx;
cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
- nodes = ((ecx >> 8) & 7) + 1;
node_id = ecx & 7;
/* get compute unit information */
@@ -310,27 +335,24 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
u64 value;
rdmsrl(MSR_FAM10H_NODE_ID, value);
- nodes = ((value >> 3) & 7) + 1;
node_id = value & 7;
} else
return;
/* fixup multi-node processor information */
- if (nodes > 1) {
- u32 cores_per_node;
- u32 cus_per_node;
+ if (nodes < 2)
+ return;
- set_cpu_cap(c, X86_FEATURE_AMD_DCM);
- cores_per_node = c->x86_max_cores / nodes;
- cus_per_node = cores_per_node / cores_per_cu;
+ set_cpu_cap(c, X86_FEATURE_AMD_DCM);
+ cores_per_node = c->x86_max_cores / nodes;
+ cus_per_node = cores_per_node / cores_per_cu;
- /* store NodeID, use llc_shared_map to store sibling info */
- per_cpu(cpu_llc_id, cpu) = node_id;
+ /* store NodeID, use llc_shared_map to store sibling info */
+ per_cpu(cpu_llc_id, cpu) = node_id;
- /* core id has to be in the [0 .. cores_per_node - 1] range */
- c->cpu_core_id %= cores_per_node;
- c->compute_unit_id %= cus_per_node;
- }
+ /* core id has to be in the [0 .. cores_per_node - 1] range */
+ c->cpu_core_id %= cores_per_node;
+ c->compute_unit_id %= cus_per_node;
}
#endif
@@ -365,6 +387,34 @@ u16 amd_get_nb_id(int cpu)
}
EXPORT_SYMBOL_GPL(amd_get_nb_id);
+#ifdef CONFIG_X86_HT
+static u32 amd_get_nbc_for_node(int node_id)
+{
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+ u32 nodes, cores_per_node;
+
+ nodes = amd_get_num_nodes(c);
+ cores_per_node = c->x86_max_cores / nodes;
+
+ return cores_per_node * node_id;
+}
+EXPORT_SYMBOL_GPL(amd_get_nbc_for_node);
+#endif
+
+#ifdef CONFIG_X86_HT
+unsigned int amd_get_nbc_for_cpu(int cpu)
+{
+ unsigned int nbc = 0;
+
+ /* for multi-node */
+ if (test_cpu_cap(&boot_cpu_data, X86_FEATURE_AMD_DCM))
+ nbc = amd_get_nbc_for_node(amd_get_nb_id(cpu));
+
+ return nbc;
+}
+EXPORT_SYMBOL_GPL(amd_get_nbc_for_cpu);
+#endif
+
static void srat_detect_node(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_NUMA
--
2.0.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/3] x86, mce: Handle AMD MCE on bank4 on NBC for multi-node processors
2014-12-22 20:10 [PATCH 0/3] Fix MCE handling for AMD multi-node processors Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 1/3] x86,amd: Refactor amd cpu topology functions for " Aravind Gopalakrishnan
@ 2014-12-22 20:10 ` Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 3/3] edac, mce_amd_inj: Inject errors only on NBC for bank 4 errors Aravind Gopalakrishnan
2014-12-22 20:15 ` [PATCH 0/3] Fix MCE handling for AMD multi-node processors Borislav Petkov
3 siblings, 0 replies; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2014-12-22 20:10 UTC (permalink / raw)
To: tglx, mingo, hpa, tony.luck, bp, dougthompson, mchehab, x86,
linux-kernel, linux-edac
Cc: dave.hansen, mgorman, bp, riel, jacob.w.shin,
Aravind Gopalakrishnan
The problem is that when MC4 error happens on AMD multi-node processors,
they are reported only to corresponding node base core of the cpu
on which the error occurred.
We don't have the exception handler wired up to handle this currently.
As a consequence, do_machine_check only runs on the core on which error
occurred and (since according to the BKDGs, reads to MC4_STATUS MSR of
non-NBC will simply RAZ) the exception is ignored for the core.
Now, when an exception _does_ happen on the node base core, that gets
handled properly. I tested this on Fam10h and Fam15h using mce_amd_inj
and by triggering a real HW MCE using Boris' new interface; And can
confirm the behavior. (Of course, with mce_amd_inj I could only inject
on the NBC as BKDGs state that writes are ignored. The following patch
addresses this problem)
This patch fixes the issue by reading (or writing) only to the NBC MSR
for bank 4 errors on multi node processors for Fam10h onwards.
For this, we need to rdmsrl_on_cpu and I have provided wrappers that
also supports the injection interface used by mce-inject.c
The handling of CE also has to be corrected as a result of this.
- Since MC4 errors are only processed on NBCs, we can theoritically
have multiple number of CEs in flight waiting to be handled (as many
as there are nodes on the system)
- Providing an array to save cpu on which CEs occur
- Later, when the poller runs, it only need to verify if the current
cpu is the saved one for the node on which current cpu is on. If yes,
handle the error.
Misc changes:
- Modify mce_read_aux to use bank value off existing mce struct argument
- modifying mce_clear_state to take a cpu argument as it will be needed
for clearing correct MSR STATUS register if we need_amd_nbc_handling
Testing details:
- Tested the patch on
- multi node: Fam10h, Fam15h Model 00h
- single node: Fam15h Model 60h
- Before patch, no error injected to non NBC is handled by MCE handler
- After patch, UE are handled immediately by looking at correct NBC
- For CEs cpu values are stored and handled by polling mechanism
later on the correct cpu on which error happened.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 167 +++++++++++++++++++++++++++++++++++----
1 file changed, 153 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d2c6116..c5eedbd 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -87,6 +87,12 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
static DEFINE_PER_CPU(struct mce, mces_seen);
static int cpu_missing;
+/* AMD fam10h and later can only access NBC MSRs for MC4 errors */
+static int need_amd_nbc_handling;
+
+/* multiple CEs can be in flight on the per-node MSR waiting to be handled */
+static int *amd_saved_ce_on_cpu;
+
/* CMCI storm detection filter */
static DEFINE_PER_CPU(unsigned long, mce_polled_error);
@@ -428,6 +434,45 @@ static void mce_wrmsrl(u32 msr, u64 v)
wrmsrl(msr, v);
}
+/* MSR per-cpu access wrappers */
+static u64 mce_rdmsrl_on_cpu(unsigned int cpu, u32 msr)
+{
+ struct mce *reg = per_cpu_ptr(&injectm, cpu);
+ u64 v;
+
+ if (reg->finished) {
+ int offset = msr_to_offset(msr);
+
+ if (offset < 0)
+ return 0;
+ return *(u64 *)((char *)per_cpu_ptr(&injectm, cpu) + offset);
+ }
+
+ if (rdmsrl_on_cpu(cpu, msr, &v)) {
+ WARN_ONCE(1, "mce: Unable to read msr %d!\n", msr);
+ v = 0;
+ }
+
+ return v;
+}
+
+static void mce_wrmsrl_on_cpu(unsigned int cpu, u32 msr, u64 v)
+{
+ struct mce *reg = per_cpu_ptr(&injectm, cpu);
+
+ if (reg->finished) {
+ int offset = msr_to_offset(msr);
+
+ if (offset >= 0)
+ *(u64 *)((char *)per_cpu_ptr(&injectm, cpu) +
+ offset) = v;
+ return;
+ }
+
+ if (wrmsrl_on_cpu(cpu, msr, v))
+ WARN_ONCE(1, "mce: Unable to write to msr %d!\n", msr);
+}
+
/*
* Collect all global (w.r.t. this processor) status about this machine
* check into our "mce" struct so that we can use it later to assess
@@ -557,12 +602,19 @@ static void mce_report_event(struct pt_regs *regs)
/*
* Read ADDR and MISC registers.
*/
-static void mce_read_aux(struct mce *m, int i)
+static void mce_read_aux(struct mce *m)
{
if (m->status & MCI_STATUS_MISCV)
- m->misc = mce_rdmsrl(MSR_IA32_MCx_MISC(i));
+ m->misc = mce_rdmsrl(MSR_IA32_MCx_MISC(m->bank));
if (m->status & MCI_STATUS_ADDRV) {
- m->addr = mce_rdmsrl(MSR_IA32_MCx_ADDR(i));
+ if (m->bank == 4 && need_amd_nbc_handling) {
+ unsigned int nbc = amd_get_nbc_for_cpu(m->extcpu);
+
+ m->addr = mce_rdmsrl_on_cpu(nbc,
+ MSR_IA32_MCx_ADDR(m->bank));
+ } else {
+ m->addr = mce_rdmsrl(MSR_IA32_MCx_ADDR(m->bank));
+ }
/*
* Mask the reported address by the reported granularity.
@@ -627,6 +679,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
{
struct mce m;
int severity;
+ u16 nid = 0;
int i;
this_cpu_inc(mce_poll_count);
@@ -643,7 +696,19 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
m.tsc = 0;
barrier();
- m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
+ if (i == 4 && need_amd_nbc_handling) {
+ nid = amd_get_nb_id(m.extcpu);
+ if (amd_saved_ce_on_cpu[nid] != -1 &&
+ amd_saved_ce_on_cpu[nid] != m.extcpu)
+ continue;
+ else
+ m.status = mce_rdmsrl_on_cpu(
+ amd_get_nbc_for_cpu(m.extcpu),
+ MSR_IA32_MCx_STATUS(i));
+ } else {
+ m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
+ }
+
if (!(m.status & MCI_STATUS_VAL))
continue;
@@ -658,7 +723,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
(m.status & (mca_cfg.ser ? MCI_STATUS_S : MCI_STATUS_UC)))
continue;
- mce_read_aux(&m, i);
+ mce_read_aux(&m);
if (!(flags & MCP_TIMESTAMP))
m.tsc = 0;
@@ -686,7 +751,21 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
/*
* Clear state for this bank.
*/
- mce_wrmsrl(MSR_IA32_MCx_STATUS(i), 0);
+ if (i == 4 &&
+ need_amd_nbc_handling &&
+ amd_saved_ce_on_cpu[nid] != -1 &&
+ amd_saved_ce_on_cpu[nid] == m.extcpu) {
+ mce_wrmsrl_on_cpu(amd_get_nbc_for_cpu(m.extcpu),
+ MSR_IA32_MCx_STATUS(i),
+ 0);
+ /*
+ * We also need to reset
+ * saved_ce_on_cpu for future CEs
+ */
+ amd_saved_ce_on_cpu[nid] = -1;
+ } else {
+ mce_wrmsrl(MSR_IA32_MCx_STATUS(i), 0);
+ }
}
/*
@@ -708,7 +787,13 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
int i, ret = 0;
for (i = 0; i < mca_cfg.banks; i++) {
- m->status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
+ if (i == 4 && need_amd_nbc_handling)
+ m->status = mce_rdmsrl_on_cpu(
+ amd_get_nbc_for_cpu(m->extcpu),
+ MSR_IA32_MCx_STATUS(i));
+ else
+ m->status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
+
if (m->status & MCI_STATUS_VAL) {
__set_bit(i, validp);
if (quirk_no_way_out)
@@ -992,13 +1077,23 @@ static int mce_usable_address(struct mce *m)
return 1;
}
-static void mce_clear_state(unsigned long *toclear)
+/*
+ * If need_amd_nbc_handling is set, then we'll need a cpu
+ * value to be able to clean the status on the correct core
+ */
+static void mce_clear_state(unsigned long *toclear, unsigned int cpu)
{
int i;
for (i = 0; i < mca_cfg.banks; i++) {
- if (test_bit(i, toclear))
- mce_wrmsrl(MSR_IA32_MCx_STATUS(i), 0);
+ if (test_bit(i, toclear)) {
+ if (i == 4 && need_amd_nbc_handling)
+ mce_wrmsrl_on_cpu(cpu,
+ MSR_IA32_MCx_STATUS(i),
+ 0);
+ else
+ mce_wrmsrl(MSR_IA32_MCx_STATUS(i), 0);
+ }
}
}
@@ -1081,6 +1176,10 @@ void do_machine_check(struct pt_regs *regs, long error_code)
* error.
*/
int kill_it = 0;
+
+ /* for saving nbc value for AMD systems that need_amd_nbc_handling */
+ unsigned int nbc = 0;
+
DECLARE_BITMAP(toclear, MAX_NR_BANKS);
DECLARE_BITMAP(valid_banks, MAX_NR_BANKS);
char *msg = "Unknown";
@@ -1125,7 +1224,14 @@ void do_machine_check(struct pt_regs *regs, long error_code)
m.addr = 0;
m.bank = i;
- m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
+ if (i == 4 && need_amd_nbc_handling) {
+ nbc = amd_get_nbc_for_cpu(m.extcpu);
+ m.status = mce_rdmsrl_on_cpu(nbc,
+ MSR_IA32_MCx_STATUS(i));
+ } else {
+ m.status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i));
+ }
+
if ((m.status & MCI_STATUS_VAL) == 0)
continue;
@@ -1133,9 +1239,22 @@ void do_machine_check(struct pt_regs *regs, long error_code)
* Non uncorrected or non signaled errors are handled by
* machine_check_poll. Leave them alone, unless this panics.
*/
+
+ /*
+ * Also, If we are on AMD system and need_amd_nbc_handling is
+ * set, then keep track of the cpu number on which error
+ * happened so that polling mechanism can handle it on the
+ * correct core later on
+ */
if (!(m.status & (cfg->ser ? MCI_STATUS_S : MCI_STATUS_UC)) &&
- !no_way_out)
+ !no_way_out) {
+ if (i == 4 && need_amd_nbc_handling) {
+ u16 nid = amd_get_nb_id(m.extcpu);
+
+ amd_saved_ce_on_cpu[nid] = m.extcpu;
+ }
continue;
+ }
/*
* Set taint even when machine check was not enabled.
@@ -1160,7 +1279,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
continue;
}
- mce_read_aux(&m, i);
+ mce_read_aux(&m);
/*
* Action optional error. Queue address for later processing.
@@ -1184,7 +1303,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
m = *final;
if (!no_way_out)
- mce_clear_state(toclear);
+ mce_clear_state(toclear, nbc);
/*
* Do most of the synchronization with other CPUs.
@@ -1559,6 +1678,26 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
/* This should be disabled by the BIOS, but isn't always */
if (c->x86_vendor == X86_VENDOR_AMD) {
+ if (c->x86 >= 0x10) {
+ int n_nodes = num_online_nodes();
+ int i;
+
+ /* we only need to initialize once */
+ if (!amd_saved_ce_on_cpu) {
+ amd_saved_ce_on_cpu = (int *)
+ kzalloc(sizeof(int) * n_nodes,
+ GFP_KERNEL);
+
+ if (!amd_saved_ce_on_cpu)
+ return -ENOMEM;
+
+ for (i = 0; i < n_nodes; i++)
+ amd_saved_ce_on_cpu[i] = -1;
+
+ need_amd_nbc_handling = 1;
+ }
+ }
+
if (c->x86 == 15 && cfg->banks > 4) {
/*
* disable GART TBL walk error reporting, which
--
2.0.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/3] edac, mce_amd_inj: Inject errors only on NBC for bank 4 errors
2014-12-22 20:10 [PATCH 0/3] Fix MCE handling for AMD multi-node processors Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 1/3] x86,amd: Refactor amd cpu topology functions for " Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 2/3] x86, mce: Handle AMD MCE on bank4 on NBC " Aravind Gopalakrishnan
@ 2014-12-22 20:10 ` Aravind Gopalakrishnan
2014-12-22 20:15 ` [PATCH 0/3] Fix MCE handling for AMD multi-node processors Borislav Petkov
3 siblings, 0 replies; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2014-12-22 20:10 UTC (permalink / raw)
To: tglx, mingo, hpa, tony.luck, bp, dougthompson, mchehab, x86,
linux-kernel, linux-edac
Cc: dave.hansen, mgorman, bp, riel, jacob.w.shin,
Aravind Gopalakrishnan
For Fam10h onwards, bank4 MCE are reported only to the node base
cores. This patch modifies the do_inject code path to take care
of this.
Refer D18F3x44[NbMcaToMstCpuEn] on BKDGs of Fam10h and later for
clarifications on the reporting of MC4 errors only to NBC MSRs.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
drivers/edac/mce_amd_inj.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index 0bd91a8..d4aa14f 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -126,6 +126,7 @@ static void do_inject(void)
{
u64 mcg_status = 0;
unsigned int cpu = i_mce.extcpu;
+ int saved_cpu = -1;
u8 b = i_mce.bank;
if (!(i_mce.inject_flags & MCJ_EXCEPTION)) {
@@ -143,22 +144,34 @@ static void do_inject(void)
if (!(i_mce.status & MCI_STATUS_PCC))
mcg_status |= MCG_STATUS_RIPV;
- toggle_hw_mce_inject(cpu, true);
-
wrmsr_on_cpu(cpu, MSR_IA32_MCG_STATUS,
(u32)mcg_status, (u32)(mcg_status >> 32));
+ /*
+ * for Fam10h+, if bank = 4, then we should write to NBC MSR
+ * for multi-node processors else to core 0 for single node processors
+ */
+ if (boot_cpu_data.x86 >= 0x10 && b == 4) {
+ saved_cpu = cpu;
+ cpu = amd_get_nbc_for_cpu(cpu);
+ }
+
+ toggle_hw_mce_inject(cpu, true);
+
wrmsr_on_cpu(cpu, MSR_IA32_MCx_STATUS(b),
(u32)i_mce.status, (u32)(i_mce.status >> 32));
wrmsr_on_cpu(cpu, MSR_IA32_MCx_ADDR(b),
(u32)i_mce.addr, (u32)(i_mce.addr >> 32));
+ toggle_hw_mce_inject(cpu, false);
+
+ if (saved_cpu != -1)
+ cpu = saved_cpu;
+
wrmsr_on_cpu(cpu, MSR_IA32_MCx_MISC(b),
(u32)i_mce.misc, (u32)(i_mce.misc >> 32));
- toggle_hw_mce_inject(cpu, false);
-
smp_call_function_single(cpu, trigger_mce, NULL, 0);
err:
--
2.0.2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors
2014-12-22 20:10 [PATCH 0/3] Fix MCE handling for AMD multi-node processors Aravind Gopalakrishnan
` (2 preceding siblings ...)
2014-12-22 20:10 ` [PATCH 3/3] edac, mce_amd_inj: Inject errors only on NBC for bank 4 errors Aravind Gopalakrishnan
@ 2014-12-22 20:15 ` Borislav Petkov
2014-12-22 20:56 ` Aravind Gopalakrishnan
3 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2014-12-22 20:15 UTC (permalink / raw)
To: Aravind Gopalakrishnan
Cc: tglx, mingo, hpa, tony.luck, dougthompson, mchehab, x86,
linux-kernel, linux-edac, dave.hansen, mgorman, bp, riel,
jacob.w.shin
On Mon, Dec 22, 2014 at 02:10:09PM -0600, Aravind Gopalakrishnan wrote:
> When a MCE happens that is to be logged onto bank 4 of AMD multi-node
> processors, they are reported only to corresponding node base core of
> the cpu on which the error occurred.
>
> Refer D18F3x44[NbMcaToMstCpuEn] on BKDGs of Fam10h and later for
Let me try to understand this correctly:
Does that mean that we could fix this by simply doing:
D18F3x44[NbMcaToMstCpuEn]=0b
on each NB?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors
2014-12-22 20:15 ` [PATCH 0/3] Fix MCE handling for AMD multi-node processors Borislav Petkov
@ 2014-12-22 20:56 ` Aravind Gopalakrishnan
2014-12-22 23:19 ` Borislav Petkov
0 siblings, 1 reply; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2014-12-22 20:56 UTC (permalink / raw)
To: Borislav Petkov
Cc: tglx, mingo, hpa, tony.luck, dougthompson, mchehab, x86,
linux-kernel, linux-edac, dave.hansen, mgorman, bp, riel,
jacob.w.shin
On 12/22/2014 2:15 PM, Borislav Petkov wrote:
> On Mon, Dec 22, 2014 at 02:10:09PM -0600, Aravind Gopalakrishnan wrote:
>> When a MCE happens that is to be logged onto bank 4 of AMD multi-node
>> processors, they are reported only to corresponding node base core of
>> the cpu on which the error occurred.
>>
>> Refer D18F3x44[NbMcaToMstCpuEn] on BKDGs of Fam10h and later for
> Let me try to understand this correctly:
>
> Does that mean that we could fix this by simply doing:
>
> D18F3x44[NbMcaToMstCpuEn]=0b
>
> on each NB?
>
Not quite..
When this field is 0, BKDG says the error may be reported to the core
that originated the request *if applicable and known*
Looking at the error signatures table for MC4 (Part 2),
we can see only some errors have 'ErrCoreId' column as valid
Besides, if IO originated the request, then it is reported only to NBC.
So, to take care of all these cases, I am just following one approach
here: and that is to look at NBC MSRs for any bank 4 errors.
(It seems to be what the BKDG recommends anyway as BIOS by default
should set D18F3x44[NbMcaToMstCpuEn])
Thanks,
-Aravind.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors
2014-12-22 20:56 ` Aravind Gopalakrishnan
@ 2014-12-22 23:19 ` Borislav Petkov
2014-12-23 19:41 ` Aravind Gopalakrishnan
0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2014-12-22 23:19 UTC (permalink / raw)
To: Aravind Gopalakrishnan
Cc: tglx, mingo, hpa, tony.luck, dougthompson, mchehab, x86,
linux-kernel, linux-edac, dave.hansen, mgorman, bp, riel,
jacob.w.shin
On Mon, Dec 22, 2014 at 02:56:47PM -0600, Aravind Gopalakrishnan wrote:
> On 12/22/2014 2:15 PM, Borislav Petkov wrote:
> >On Mon, Dec 22, 2014 at 02:10:09PM -0600, Aravind Gopalakrishnan wrote:
> >>When a MCE happens that is to be logged onto bank 4 of AMD multi-node
> >>processors, they are reported only to corresponding node base core of
> >>the cpu on which the error occurred.
> >>
> >>Refer D18F3x44[NbMcaToMstCpuEn] on BKDGs of Fam10h and later for
> >Let me try to understand this correctly:
> >
> >Does that mean that we could fix this by simply doing:
> >
> >D18F3x44[NbMcaToMstCpuEn]=0b
> >
> >on each NB?
> >
>
> Not quite..
> When this field is 0, BKDG says the error may be reported to the core that
> originated the request *if applicable and known*
> Looking at the error signatures table for MC4 (Part 2),
> we can see only some errors have 'ErrCoreId' column as valid
>
> Besides, if IO originated the request, then it is reported only to NBC.
>
> So, to take care of all these cases, I am just following one approach here:
> and that is to look at NBC MSRs for any bank 4 errors.
> (It seems to be what the BKDG recommends anyway as BIOS by default should
> set D18F3x44[NbMcaToMstCpuEn])
Then in that case you have to check the case where
D18F3x44[NbMcaToMstCpuEn] is 0 for whatever reason (some BIOS forgot to
set it or whatever) and to set it again.
Then, upon a quick scan, your patches are adding a lot of vendor-specific
stuff which doesn't belong in the #MC handler, should probably be
wrapped or so, no good idea right now.
Then, you're using rd/wrmsr_on_cpu which does smp_call_function_single()
which can deadlock in atomic context and #MC is one.
Also, the math in amd_get_nbc_for_node() is too fragile and will break
the moment some BIOS renumbers cores to accomodate some other OS.
In any case, I won't be able to take a detailed look soon with the
holidays coming up.
Also, I'm wondering if this can't be solved much more elegantly
by detecting that condition (bank == 4) in the #MC handler and
issuing an IPI before exiting it using irq_work which will schedule
do_machine_check on the NBC. And that should be even easier to do since
we're moving the #MC handler out of the IST and to the normal kernel
stack for 3.20, which would make this endeavor pretty cheap.
Anyway, just a couple of thoughts...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors
2014-12-22 23:19 ` Borislav Petkov
@ 2014-12-23 19:41 ` Aravind Gopalakrishnan
2015-01-06 23:54 ` Aravind Gopalakrishnan
0 siblings, 1 reply; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2014-12-23 19:41 UTC (permalink / raw)
To: Borislav Petkov
Cc: tglx, mingo, hpa, tony.luck, dougthompson, mchehab, x86,
linux-kernel, linux-edac, dave.hansen, mgorman, bp, riel,
jacob.w.shin
On 12/22/2014 5:19 PM, Borislav Petkov wrote:
> On Mon, Dec 22, 2014 at 02:56:47PM -0600, Aravind Gopalakrishnan wrote:
>> On 12/22/2014 2:15 PM, Borislav Petkov wrote:
>>> On Mon, Dec 22, 2014 at 02:10:09PM -0600, Aravind Gopalakrishnan wrote:
>>>> When a MCE happens that is to be logged onto bank 4 of AMD multi-node
>>>> processors, they are reported only to corresponding node base core of
>>>> the cpu on which the error occurred.
>>>>
>>>> Refer D18F3x44[NbMcaToMstCpuEn] on BKDGs of Fam10h and later for
>>> Let me try to understand this correctly:
>>>
>>> Does that mean that we could fix this by simply doing:
>>>
>>> D18F3x44[NbMcaToMstCpuEn]=0b
>>>
>>> on each NB?
>>>
>> Not quite..
>> When this field is 0, BKDG says the error may be reported to the core that
>> originated the request *if applicable and known*
>> Looking at the error signatures table for MC4 (Part 2),
>> we can see only some errors have 'ErrCoreId' column as valid
>>
>> Besides, if IO originated the request, then it is reported only to NBC.
>>
>> So, to take care of all these cases, I am just following one approach here:
>> and that is to look at NBC MSRs for any bank 4 errors.
>> (It seems to be what the BKDG recommends anyway as BIOS by default should
>> set D18F3x44[NbMcaToMstCpuEn])
> Then in that case you have to check the case where
> D18F3x44[NbMcaToMstCpuEn] is 0 for whatever reason (some BIOS forgot to
> set it or whatever) and to set it again.
Okay.
> Then, upon a quick scan, your patches are adding a lot of vendor-specific
> stuff which doesn't belong in the #MC handler, should probably be
> wrapped or so, no good idea right now.
>
> Then, you're using rd/wrmsr_on_cpu which does smp_call_function_single()
> which can deadlock in atomic context and #MC is one.
>
> Also, the math in amd_get_nbc_for_node() is too fragile and will break
> the moment some BIOS renumbers cores to accomodate some other OS.
>
> In any case, I won't be able to take a detailed look soon with the
> holidays coming up.
>
> Also, I'm wondering if this can't be solved much more elegantly
> by detecting that condition (bank == 4) in the #MC handler and
> issuing an IPI before exiting it using irq_work which will schedule
> do_machine_check on the NBC. And that should be even easier to do since
> we're moving the #MC handler out of the IST and to the normal kernel
> stack for 3.20, which would make this endeavor pretty cheap.
Ok. I'll look into this approach too over the holidays and we can
restart the discussion
at a more convenient time.
> Anyway, just a couple of thoughts...
>
Thanks,
-Aravind.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors
2014-12-23 19:41 ` Aravind Gopalakrishnan
@ 2015-01-06 23:54 ` Aravind Gopalakrishnan
2015-01-07 17:06 ` Borislav Petkov
0 siblings, 1 reply; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2015-01-06 23:54 UTC (permalink / raw)
To: Borislav Petkov
Cc: tglx, mingo, hpa, tony.luck, dougthompson, mchehab, x86,
linux-kernel, linux-edac, dave.hansen, mgorman, bp, riel,
jacob.w.shin
On 12/23/2014 1:41 PM, Aravind Gopalakrishnan wrote:
> On 12/22/2014 5:19 PM, Borislav Petkov wrote:
>> On Mon, Dec 22, 2014 at 02:56:47PM -0600, Aravind Gopalakrishnan wrote:
>>> On 12/22/2014 2:15 PM, Borislav Petkov wrote:
>>>> On Mon, Dec 22, 2014 at 02:10:09PM -0600, Aravind Gopalakrishnan
>>>> wrote:
>>>>> When a MCE happens that is to be logged onto bank 4 of AMD multi-node
>>>>> processors, they are reported only to corresponding node base core of
>>>>> the cpu on which the error occurred.
>>>>>
>>>>> Refer D18F3x44[NbMcaToMstCpuEn] on BKDGs of Fam10h and later for
>>>> Let me try to understand this correctly:
>>>>
>>>> Does that mean that we could fix this by simply doing:
>>>>
>>>> D18F3x44[NbMcaToMstCpuEn]=0b
>>>>
>>>> on each NB?
>>>>
>>> Not quite..
>>> When this field is 0, BKDG says the error may be reported to the
>>> core that
>>> originated the request *if applicable and known*
>>> Looking at the error signatures table for MC4 (Part 2),
>>> we can see only some errors have 'ErrCoreId' column as valid
>>>
>>> Besides, if IO originated the request, then it is reported only to NBC.
>>>
>>> So, to take care of all these cases, I am just following one
>>> approach here:
>>> and that is to look at NBC MSRs for any bank 4 errors.
>>> (It seems to be what the BKDG recommends anyway as BIOS by default
>>> should
>>> set D18F3x44[NbMcaToMstCpuEn])
>> Then in that case you have to check the case where
>> D18F3x44[NbMcaToMstCpuEn] is 0 for whatever reason (some BIOS forgot to
>> set it or whatever) and to set it again.
>
Hi Boris,
It seems my earlier understanding of hardware behavior was not
completely right.
Here are some clarifications I have received after some internal discussion-
When D18F3x44[NBMstToMstCpuEn] is set, the interrupt is also routed to
the NBC.
This was not immediately clear to me from the description for the field
in the BKDG.
The BKDG states that errors are reported to the NBC and also that
status, addr, ctl
MSRs for MC4 are only accessible from the NBC.
I took this to understand that the error info is written to the NBC MSRs
while
the #MC could be generated from the non-NBC.
Now, given that setting NBMstToMstCpuEn ensures #MC is generated only on
NBC for MC4 errors,
we don't have a problem to solve in the #MC handler code.
So, we can discard patch2 of the series,
But we still need to change the error injection interfaces in mce_amd_inj:
mce_amd_inj triggers a #MC on the cpu number that the user specifies on
debugfs.
For any error other than MC4 errors, this is fine.
But we should really be triggering #MC only on NBC for MC4 errors.
and also make sure we use NBC to write to the status/addr MSRs as only
NBC has access to
these MSRs.
And for this, we will still need some the changes introduced in patch 1
and patch 3 entirely.
And since calculating a corresponding NBC for a given core will be
needed only in mce_amd_inj,
I can move those calculations to mce_amd_inj.
(Or if you prefer it be in a common location then I can leave it as it
as too)
>>
>> Also, the math in amd_get_nbc_for_node() is too fragile and will break
>> the moment some BIOS renumbers cores to accomodate some other OS.
>>
>
Could you please clarify this?
For cores_per_node I am using c->x86_max_cores which is a value obtained
from cpuid_ecx(0x80000008);
and NodesPerProcessor is obtained from cpuid_ecx(0x8000001e); (or
MSR_FAM10H_NODE_ID)
These values should still be consistent for other OS too right?
Also, these are the values that we have currently in amd_get_topology().
I simply refactored the code..
Thanks,
-Aravind.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors
2015-01-06 23:54 ` Aravind Gopalakrishnan
@ 2015-01-07 17:06 ` Borislav Petkov
2015-01-08 0:18 ` Aravind Gopalakrishnan
0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2015-01-07 17:06 UTC (permalink / raw)
To: Aravind Gopalakrishnan
Cc: tglx, mingo, hpa, tony.luck, dougthompson, mchehab, x86,
linux-kernel, linux-edac, dave.hansen, mgorman, bp, riel,
jacob.w.shin
On Tue, Jan 06, 2015 at 05:54:15PM -0600, Aravind Gopalakrishnan wrote:
> Hi Boris,
> It seems my earlier understanding of hardware behavior was not completely
> right.
> Here are some clarifications I have received after some internal discussion-
> When D18F3x44[NBMstToMstCpuEn] is set, the interrupt is also routed to the
> NBC.
Good :)
> This was not immediately clear to me from the description for the field in
> the BKDG.
> The BKDG states that errors are reported to the NBC and also that status,
> addr, ctl
> MSRs for MC4 are only accessible from the NBC.
> I took this to understand that the error info is written to the NBC MSRs
> while
> the #MC could be generated from the non-NBC.
>
> Now, given that setting NBMstToMstCpuEn ensures #MC is generated only on NBC
> for MC4 errors,
> we don't have a problem to solve in the #MC handler code.
> So, we can discard patch2 of the series,
>
> But we still need to change the error injection interfaces in mce_amd_inj:
> mce_amd_inj triggers a #MC on the cpu number that the user specifies on
> debugfs.
> For any error other than MC4 errors, this is fine.
> But we should really be triggering #MC only on NBC for MC4 errors.
Why?
As you said yourself, the errors get reported on the NBC. Where they get
*triggered* is a different story.
We do injection as it is described in "2.15.2 Error Injection and
Simulation" in F15h BKDG, for example. Reporting of the thusly injected
bank4 error goes to the NBC.
I don't see the need to fix anything in the code as it is.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] Fix MCE handling for AMD multi-node processors
2015-01-07 17:06 ` Borislav Petkov
@ 2015-01-08 0:18 ` Aravind Gopalakrishnan
0 siblings, 0 replies; 11+ messages in thread
From: Aravind Gopalakrishnan @ 2015-01-08 0:18 UTC (permalink / raw)
To: Borislav Petkov
Cc: tglx, mingo, hpa, tony.luck, dougthompson, mchehab, x86,
linux-kernel, linux-edac, dave.hansen, mgorman, bp, riel,
jacob.w.shin
On 1/7/2015 11:06 AM, Borislav Petkov wrote:
> On Tue, Jan 06, 2015 at 05:54:15PM -0600, Aravind Gopalakrishnan wrote:
>> But we still need to change the error injection interfaces in mce_amd_inj:
>> mce_amd_inj triggers a #MC on the cpu number that the user specifies on
>> debugfs.
>> For any error other than MC4 errors, this is fine.
>> But we should really be triggering #MC only on NBC for MC4 errors.
> Why?
>
> As you said yourself, the errors get reported on the NBC. Where they get
> *triggered* is a different story.
Apologies if I was not clear earlier. Let me try to address the issue again-
I shall be verbose for sake of clarity here..
The bank 4 MSRs are per-node and per-node MSR are shared between cores
in a node.
So, technically, all cores of the same node have access to the MSR.
But, since D18F3x44[NBMstToMstCpuEn] is set, access is restricted to
only the NBC.
And, BKDG states that-
reads of these MSRs from other cores return 0’s and writes are ignored.
Now, with mce_amd_inj interface as it is right now, we basically
wrmsr_on_cpu()
to the MCx_[status|addr|misc] registers using the cpu value user
specifies at /sys/kernel/debug/mce-inject/cpu.
For a bank4 error (assume a UC case here) to a non-NBC (say core 6 of
first node in a multi-node platform),
mce_amd_inj will simply wrmsr_on_cpu(6,...).
Since writes are ignored, we basically don't populate any info on the
MSRs and when you trigger_mce on cpu 6,
do_machine_check will try to read status MSR for cpu6 which causes RAZ
and you basically would not see any output on dmesg.
(This is why I had originally thought we had dropped MCEs)
If the same error were to be introduced on a NBC (core 0 in the above
example),
(i.e), user were to provide cpu number 0 on
/sys/kernel/debug/mce-inject/cpu; then we would see output on dmesg.
This is because writes from cpu 0 to the MSR will go through.
This is the correction I have made in patch 3 where, for bank = 4, I
find the NBC for the given cpu and write the MSRs using the nbc value.
(I still need to modify the patch to also trigger #MC on the NBC)
Also, just to clarify any terminology issues:
'reporting' of errors means: active notification of errors to software
via machine check exceptions.
(as defined by BKDG in the section "Error Detection, Action, Logging,
and Reporting".
It's section 2.13.1.3 on a F15h M0h-0fh BKDG rev 3.14 for me.
section number might vary for you depending on the document version you
are referring to..)
> We do injection as it is described in "2.15.2 Error Injection and
> Simulation" in F15h BKDG, for example. Reporting of the thusly injected
> bank4 error goes to the NBC.
>
>
Just want to clarify some (potential) terminology issues here too:
"Error injection" is causing a DRAM error by writing to D18F3xB8 and
D18F3xBC.
If a DRAM error were to be introduced by using above method, then HW
should correctly 'report' the error to NBC.
"Error simulation" is basically what we are doing in mce_amd_inj.
But before we drive a #MC, we should honor the rules specified in the
BKDG wrt writing of MSRs IMHO. (specifically for the bank=4 case)
Thanks,
-Aravind.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-01-08 0:35 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-22 20:10 [PATCH 0/3] Fix MCE handling for AMD multi-node processors Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 1/3] x86,amd: Refactor amd cpu topology functions for " Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 2/3] x86, mce: Handle AMD MCE on bank4 on NBC " Aravind Gopalakrishnan
2014-12-22 20:10 ` [PATCH 3/3] edac, mce_amd_inj: Inject errors only on NBC for bank 4 errors Aravind Gopalakrishnan
2014-12-22 20:15 ` [PATCH 0/3] Fix MCE handling for AMD multi-node processors Borislav Petkov
2014-12-22 20:56 ` Aravind Gopalakrishnan
2014-12-22 23:19 ` Borislav Petkov
2014-12-23 19:41 ` Aravind Gopalakrishnan
2015-01-06 23:54 ` Aravind Gopalakrishnan
2015-01-07 17:06 ` Borislav Petkov
2015-01-08 0:18 ` Aravind Gopalakrishnan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox