* [v7,3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES
From: Alexandru Gagniuc @ 2018-05-25 15:53 UTC (permalink / raw)
To: linux-acpi
Cc: alex_gagniuc, austin_bolen, shyam_iyer, Alexandru Gagniuc,
Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, x86, Rafael J. Wysocki, Len Brown,
Mauro Carvalho Chehab, Robert Moore, Erik Schmauss, Tyler Baicar,
Will Deacon, James Morse, Jonathan (Zhixiong) Zhang, Dongjiu Geng,
linux-edac, linux-kernel, devel
As previously noted, the policy to panic on any "Fatal" GHES error is
not suitable for several classes of errors. The most notable is
error containment. The correct policy is to achieve identical behavior
to native error handling -- i.e. when not reported through GHES. This,
in special cases, may not be possible, as we have to exit NMIs, which
requires these special considerations
PCIe AER errors are contained and reported at the root port. On DPC
capable hardware, containment can be done by all downstream ports. DPC
also has the added advantage of preventing future errors. Since these
errors stop at the root port, we can do all the work we need to exit
NMI and reach the error handler.
This patch does away with the mindless crashing of the system, and
correctly invokes the AER handler. When AER is not enabled, or the
firmware doesn't provide sufficient information to identify the source
of the error, the original panic() behavior is maintained.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/acpi/apei/ghes.c | 43 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1b22e18168f5..f7126f6d8d52 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -425,7 +425,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
* GHES_SEV_RECOVERABLE -> AER_NONFATAL
* GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
* These both need to be reported and recovered from by the AER driver.
- * GHES_SEV_FATAL does not make it to this handler
+ * GHES_SEV_FATAL -> AER_FATAL
*/
static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
{
@@ -837,6 +837,45 @@ static inline void ghes_sea_remove(struct ghes *ghes) { }
static struct llist_head ghes_estatus_llist;
static struct irq_work ghes_proc_irq_work;
+/* PCIe AER errors are safe if AER section contains enough info. */
+static int ghes_pcie_has_safe_handler(struct acpi_hest_generic_data *gdata)
+{
+ struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+ if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+ pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO &&
+ IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER))
+ return true;
+
+ return false;
+}
+
+/*
+ * Do we have an error handler that we can safely reach? We're concerned with
+ * being able to notify an error handler by crossing the NMI/IRQ boundary,
+ * being able to schedule_work, and so forth.
+ */
+static int ghes_has_fatal_handler(struct ghes *ghes)
+{
+ int worst_sev, sec_sev;
+ bool safe = true;
+ struct acpi_hest_generic_data *gdata;
+ const guid_t *section_type;
+ const struct acpi_hest_generic_status *estatus = ghes->estatus;
+
+ apei_estatus_for_each_section(estatus, gdata) {
+ section_type = (guid_t *)gdata->section_type;
+
+ if (guid_equal(section_type, &CPER_SEC_PCIE))
+ safe = ghes_pcie_has_safe_handler(gdata);
+
+ if (!safe)
+ break;
+ }
+
+ return safe;
+}
+
/*
* NMI may be triggered on any CPU, so ghes_in_nmi is used for
* having only one concurrent reader.
@@ -944,7 +983,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
}
sev = ghes_cper_severity(ghes->estatus->error_severity);
- if (sev >= GHES_SEV_FATAL) {
+ if ((sev >= GHES_SEV_FATAL) && !ghes_has_fatal_handler(ghes)) {
oops_begin();
ghes_print_queued_estatus();
__ghes_panic(ghes);
^ permalink raw reply related
* [v7,2/3] acpi: apei: Rename ghes_severity() to ghes_cper_severity()
From: Alexandru Gagniuc @ 2018-05-25 15:53 UTC (permalink / raw)
To: linux-acpi
Cc: alex_gagniuc, austin_bolen, shyam_iyer, Alexandru Gagniuc,
Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, x86, Rafael J. Wysocki, Len Brown,
Mauro Carvalho Chehab, Robert Moore, Erik Schmauss, Tyler Baicar,
Will Deacon, James Morse, Jonathan (Zhixiong) Zhang, Dongjiu Geng,
linux-edac, linux-kernel, devel
ghes_severity() is a misnomer in this case, as it implies the severity
of the entire GHES structure. Instead, it maps one CPER value to one
GHES_SEV* value. ghes_cper_severity() is clearer.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/acpi/apei/ghes.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 26a41bbe222b..1b22e18168f5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -271,7 +271,7 @@ static void ghes_fini(struct ghes *ghes)
unmap_gen_v2(ghes);
}
-static inline int ghes_severity(int severity)
+static inline int ghes_cper_severity(int severity)
{
switch (severity) {
case CPER_SEV_INFORMATIONAL:
@@ -388,7 +388,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
unsigned long pfn;
int flags = -1;
- int sec_sev = ghes_severity(gdata->error_severity);
+ int sec_sev = ghes_cper_severity(gdata->error_severity);
struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
@@ -467,10 +467,10 @@ static void ghes_do_proc(struct ghes *ghes,
guid_t *fru_id = &NULL_UUID_LE;
char *fru_text = "";
- sev = ghes_severity(estatus->error_severity);
+ sev = ghes_cper_severity(estatus->error_severity);
apei_estatus_for_each_section(estatus, gdata) {
sec_type = (guid_t *)gdata->section_type;
- sec_sev = ghes_severity(gdata->error_severity);
+ sec_sev = ghes_cper_severity(gdata->error_severity);
if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
fru_id = (guid_t *)gdata->fru_id;
@@ -511,7 +511,7 @@ static void __ghes_print_estatus(const char *pfx,
char pfx_seq[64];
if (pfx == NULL) {
- if (ghes_severity(estatus->error_severity) <=
+ if (ghes_cper_severity(estatus->error_severity) <=
GHES_SEV_CORRECTED)
pfx = KERN_WARNING;
else
@@ -533,7 +533,7 @@ static int ghes_print_estatus(const char *pfx,
static DEFINE_RATELIMIT_STATE(ratelimit_uncorrected, 5*HZ, 2);
struct ratelimit_state *ratelimit;
- if (ghes_severity(estatus->error_severity) <= GHES_SEV_CORRECTED)
+ if (ghes_cper_severity(estatus->error_severity) <= GHES_SEV_CORRECTED)
ratelimit = &ratelimit_corrected;
else
ratelimit = &ratelimit_uncorrected;
@@ -704,9 +704,8 @@ static int ghes_proc(struct ghes *ghes)
if (rc)
goto out;
- if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_FATAL) {
+ if (ghes_cper_severity(ghes->estatus->error_severity) >= GHES_SEV_FATAL)
__ghes_panic(ghes);
- }
if (!ghes_estatus_cached(ghes->estatus)) {
if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
@@ -944,7 +943,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
ret = NMI_HANDLED;
}
- sev = ghes_severity(ghes->estatus->error_severity);
+ sev = ghes_cper_severity(ghes->estatus->error_severity);
if (sev >= GHES_SEV_FATAL) {
oops_begin();
ghes_print_queued_estatus();
^ permalink raw reply related
* [v7,1/3] acpi: apei: Rename GHES_SEV_PANIC to GHES_SEV_FATAL
From: Alexandru Gagniuc @ 2018-05-25 15:53 UTC (permalink / raw)
To: linux-acpi
Cc: alex_gagniuc, austin_bolen, shyam_iyer, Alexandru Gagniuc,
Tony Luck, Borislav Petkov, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, x86, Rafael J. Wysocki, Len Brown,
Mauro Carvalho Chehab, Robert Moore, Erik Schmauss, Tyler Baicar,
Will Deacon, James Morse, Jonathan (Zhixiong) Zhang, Dongjiu Geng,
linux-edac, linux-kernel, devel
'GHES_SEV_PANIC' implies that the kernel must panic. That was true
many years ago when fatal errors could not be handled and recovered.
However, this is no longer the case with PCIe AER and DPC errors. The
latter class of errors are contained at the hardware level.
'GHES_SEV_PANIC' is confusing because it implies a policy to crash the
system on fatal errors. Drop this questionable policy, and rename the
enum to 'GHES_SEV_FATAL' to better convey the meaning.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
arch/x86/kernel/cpu/mcheck/mce-apei.c | 2 +-
drivers/acpi/apei/ghes.c | 11 +++++------
drivers/edac/ghes_edac.c | 2 +-
include/acpi/ghes.h | 2 +-
4 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce-apei.c b/arch/x86/kernel/cpu/mcheck/mce-apei.c
index 2eee85379689..cbec89f5cdf0 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-apei.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-apei.c
@@ -53,7 +53,7 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
if (severity >= GHES_SEV_RECOVERABLE)
m.status |= MCI_STATUS_UC;
- if (severity >= GHES_SEV_PANIC) {
+ if (severity >= GHES_SEV_FATAL) {
m.status |= MCI_STATUS_PCC;
m.tsc = rdtsc();
}
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1efefe919555..26a41bbe222b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -281,10 +281,10 @@ static inline int ghes_severity(int severity)
case CPER_SEV_RECOVERABLE:
return GHES_SEV_RECOVERABLE;
case CPER_SEV_FATAL:
- return GHES_SEV_PANIC;
+ return GHES_SEV_FATAL;
default:
/* Unknown, go panic */
- return GHES_SEV_PANIC;
+ return GHES_SEV_FATAL;
}
}
@@ -425,8 +425,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
* GHES_SEV_RECOVERABLE -> AER_NONFATAL
* GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
* These both need to be reported and recovered from by the AER driver.
- * GHES_SEV_PANIC does not make it to this handling since the kernel must
- * panic.
+ * GHES_SEV_FATAL does not make it to this handler
*/
static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
{
@@ -705,7 +704,7 @@ static int ghes_proc(struct ghes *ghes)
if (rc)
goto out;
- if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
+ if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_FATAL) {
__ghes_panic(ghes);
}
@@ -946,7 +945,7 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
}
sev = ghes_severity(ghes->estatus->error_severity);
- if (sev >= GHES_SEV_PANIC) {
+ if (sev >= GHES_SEV_FATAL) {
oops_begin();
ghes_print_queued_estatus();
__ghes_panic(ghes);
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 68b6ee18bea6..8455758327d4 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -220,7 +220,7 @@ void ghes_edac_report_mem_error(struct ghes *ghes, int sev,
case GHES_SEV_RECOVERABLE:
type = HW_EVENT_ERR_UNCORRECTED;
break;
- case GHES_SEV_PANIC:
+ case GHES_SEV_FATAL:
type = HW_EVENT_ERR_FATAL;
break;
default:
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c866ee0..322f7ede24bd 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -49,7 +49,7 @@ enum {
GHES_SEV_NO = 0x0,
GHES_SEV_CORRECTED = 0x1,
GHES_SEV_RECOVERABLE = 0x2,
- GHES_SEV_PANIC = 0x3,
+ GHES_SEV_FATAL = 0x3,
};
/* From drivers/edac/ghes_edac.c */
^ permalink raw reply related
* mcelog: Fix "--ascii" parsing to cope with change in kernel output since v4.10
From: Andi Kleen @ 2018-05-22 17:07 UTC (permalink / raw)
To: Tony Luck; +Cc: Borislav Petkov, Jeffrin Thalakkottoor, linux-edac
> Fix the parsing code to treat the word "Exception" as optional.
Applied thanks.
-Andi
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* mcelog: Fix "--ascii" parsing to cope with change in kernel output since v4.10
From: Luck, Tony @ 2018-05-22 16:39 UTC (permalink / raw)
To: Andi Kleen; +Cc: Tony Luck, Borislav Petkov, Jeffrin Thalakkottoor, linux-edac
Linux kernel now includes this change:
commit cd9c57cad3fe ("x86/MCE: Dump MCE to dmesg if no consumers")
which has this fragment:
- pr_emerg(HW_ERR "CPU %d: Machine Check Exception: %Lx Bank %d: %016Lx\n",
- m->extcpu, m->mcgstatus, m->bank, m->status);
+ pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
+ m->extcpu,
+ (m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
+ m->mcgstatus, m->bank, m->status);
that switched from always printing the word "Exception" to only
doing so for errors that were logged during aan exception (i.e. when
IA32_MSR_MCG_STATUS.MCIP is set to one). This is helpful to human readers
of console logs to distinguish between errors that caused machine checks
from errors that were found later polling machine check banks. But it
broke the "--ascii" parsing code in mcelog(8) :-(
The change went upstream 15 months ago in v4.10. So the horse has
left the stable and is over the hills and far away.
Fix the parsing code to treat the word "Exception" as optional.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
mcelog.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mcelog.c b/mcelog.c
index 8c09665810ba..2d46afbf9a41 100644
--- a/mcelog.c
+++ b/mcelog.c
@@ -719,7 +719,7 @@ restart:
if (!strncmp(s, "CPU ", 4)) {
unsigned cpu = 0, bank = 0;
n = sscanf(s,
- "CPU %u: Machine Check Exception: %16Lx Bank %d: %016Lx%n",
+ "CPU %u: Machine Check%*[ :Ec-x]%16Lx Bank %d: %016Lx%n",
&cpu,
&m.mcgstatus,
&bank,
^ permalink raw reply related
* [08/11] x86, memory_failure: introduce {set, clear}_mce_nospec()
From: Dan Williams @ 2018-05-22 14:40 UTC (permalink / raw)
To: linux-nvdimm
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, Tony Luck,
Borislav Petkov, linux-edac, hch, linux-mm, linux-fsdevel
Currently memory_failure() returns zero if the error was handled. On
that result mce_unmap_kpfn() is called to zap the page out of the kernel
linear mapping to prevent speculative fetches of potentially poisoned
memory. However, in the case of dax mapped devmap pages the page may be
in active permanent use by the device driver, so it cannot be unmapped
from the kernel.
Instead of marking the page not present, marking the page UC should
be sufficient for preventing poison from being pre-fetched into the
cache. Convert mce_unmap_pfn() to set_mce_nospec() remapping the page as
UC, to hide it from speculative accesses.
Given that that persistent memory errors can be cleared by the driver,
include a facility to restore the page to cacheable operation,
clear_mce_nospec().
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <linux-edac@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
arch/x86/include/asm/set_memory.h | 29 ++++++++++++++++++++++
arch/x86/kernel/cpu/mcheck/mce-internal.h | 15 -----------
arch/x86/kernel/cpu/mcheck/mce.c | 38 ++---------------------------
include/linux/set_memory.h | 14 +++++++++++
4 files changed, 46 insertions(+), 50 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index bd090367236c..debc1fee1457 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -88,4 +88,33 @@ extern int kernel_set_to_readonly;
void set_kernel_text_rw(void);
void set_kernel_text_ro(void);
+#ifdef CONFIG_X86_64
+/*
+ * Mark the linear address as UC to disable speculative pre-fetches into
+ * potentially poisoned memory.
+ */
+static inline int set_mce_nospec(unsigned long pfn)
+{
+ int rc;
+
+ rc = set_memory_uc((unsigned long) __va(PFN_PHYS(pfn)), 1);
+ if (rc)
+ pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
+ return rc;
+}
+#define set_mce_nospec set_mce_nospec
+
+/* Restore full speculative operation to the pfn. */
+static inline int clear_mce_nospec(unsigned long pfn)
+{
+ return set_memory_wb((unsigned long) __va(PFN_PHYS(pfn)), 1);
+}
+#define clear_mce_nospec clear_mce_nospec
+#else
+/*
+ * Few people would run a 32-bit kernel on a machine that supports
+ * recoverable errors because they have too much memory to boot 32-bit.
+ */
+#endif
+
#endif /* _ASM_X86_SET_MEMORY_H */
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 374d1aa66952..ceb67cd5918f 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -113,21 +113,6 @@ static inline void mce_register_injector_chain(struct notifier_block *nb) { }
static inline void mce_unregister_injector_chain(struct notifier_block *nb) { }
#endif
-#ifndef CONFIG_X86_64
-/*
- * On 32-bit systems it would be difficult to safely unmap a poison page
- * from the kernel 1:1 map because there are no non-canonical addresses that
- * we can use to refer to the address without risking a speculative access.
- * However, this isn't much of an issue because:
- * 1) Few unmappable pages are in the 1:1 map. Most are in HIGHMEM which
- * are only mapped into the kernel as needed
- * 2) Few people would run a 32-bit kernel on a machine that supports
- * recoverable errors because they have too much memory to boot 32-bit.
- */
-static inline void mce_unmap_kpfn(unsigned long pfn) {}
-#define mce_unmap_kpfn mce_unmap_kpfn
-#endif
-
struct mca_config {
bool dont_log_ce;
bool cmci_disabled;
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 42cf2880d0ed..a0fbf0a8b7e6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -42,6 +42,7 @@
#include <linux/irq_work.h>
#include <linux/export.h>
#include <linux/jump_label.h>
+#include <linux/set_memory.h>
#include <asm/intel-family.h>
#include <asm/processor.h>
@@ -50,7 +51,6 @@
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/reboot.h>
-#include <asm/set_memory.h>
#include "mce-internal.h"
@@ -108,10 +108,6 @@ static struct irq_work mce_irq_work;
static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
-#ifndef mce_unmap_kpfn
-static void mce_unmap_kpfn(unsigned long pfn);
-#endif
-
/*
* CPU/chipset specific EDAC code can register a notifier call here to print
* MCE errors in a human-readable form.
@@ -602,7 +598,7 @@ static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
pfn = mce->addr >> PAGE_SHIFT;
if (!memory_failure(pfn, 0))
- mce_unmap_kpfn(pfn);
+ set_mce_nospec(pfn);
}
return NOTIFY_OK;
@@ -1070,38 +1066,10 @@ static int do_memory_failure(struct mce *m)
if (ret)
pr_err("Memory error not recovered");
else
- mce_unmap_kpfn(m->addr >> PAGE_SHIFT);
+ set_mce_nospec(m->addr >> PAGE_SHIFT);
return ret;
}
-#ifndef mce_unmap_kpfn
-static void mce_unmap_kpfn(unsigned long pfn)
-{
- unsigned long decoy_addr;
-
- /*
- * Unmap this page from the kernel 1:1 mappings to make sure
- * we don't log more errors because of speculative access to
- * the page.
- * We would like to just call:
- * set_memory_np((unsigned long)pfn_to_kaddr(pfn), 1);
- * but doing that would radically increase the odds of a
- * speculative access to the poison page because we'd have
- * the virtual address of the kernel 1:1 mapping sitting
- * around in registers.
- * Instead we get tricky. We create a non-canonical address
- * that looks just like the one we want, but has bit 63 flipped.
- * This relies on set_memory_np() not checking whether we passed
- * a legal address.
- */
-
- decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
-
- if (set_memory_np(decoy_addr, 1))
- pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
-}
-#endif
-
/*
* The actual machine check handler. This only handles real
* exceptions when something got corrupted coming in through int 18.
diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h
index da5178216da5..2a986d282a97 100644
--- a/include/linux/set_memory.h
+++ b/include/linux/set_memory.h
@@ -17,6 +17,20 @@ static inline int set_memory_x(unsigned long addr, int numpages) { return 0; }
static inline int set_memory_nx(unsigned long addr, int numpages) { return 0; }
#endif
+#ifndef set_mce_nospec
+static inline int set_mce_nospec(unsigned long pfn)
+{
+ return 0;
+}
+#endif
+
+#ifndef clear_mce_nospec
+static inline int clear_mce_nospec(unsigned long pfn)
+{
+ return 0;
+}
+#endif
+
#ifndef CONFIG_ARCH_HAS_MEM_ENCRYPT
static inline int set_memory_encrypted(unsigned long addr, int numpages)
{
^ permalink raw reply related
* EDAC, ghes: Make platform-based whitelisting x86-only
From: Tyler Baicar @ 2018-05-22 2:10 UTC (permalink / raw)
To: Borislav Petkov
Cc: James Morse, Zhengqiang, mchehab, toshi.kani, linux-edac,
linuxarm, linux-arm-kernel@lists.infradead.org
On 5/21/2018 4:44 PM, Borislav Petkov wrote:
> On Mon, May 21, 2018 at 04:34:11PM -0400, Tyler Baicar wrote:
>> Agreed. Sounds good to me.
> Meaning you're sending me a patch soon or ...?
>
> :-D
>
Yes, I'll put a patch together and send it out tomorrow.
^ permalink raw reply
* EDAC, ghes: Make platform-based whitelisting x86-only
From: Borislav Petkov @ 2018-05-21 20:44 UTC (permalink / raw)
To: Tyler Baicar
Cc: James Morse, Zhengqiang, mchehab, toshi.kani, linux-edac,
linuxarm, linux-arm-kernel@lists.infradead.org
On Mon, May 21, 2018 at 04:34:11PM -0400, Tyler Baicar wrote:
> Agreed. Sounds good to me.
Meaning you're sending me a patch soon or ...?
:-D
^ permalink raw reply
* EDAC, ghes: Make platform-based whitelisting x86-only
From: Tyler Baicar @ 2018-05-21 20:34 UTC (permalink / raw)
To: Borislav Petkov
Cc: James Morse, Zhengqiang, mchehab, toshi.kani, linux-edac,
linuxarm, linux-arm-kernel@lists.infradead.org
On 5/21/2018 1:15 PM, Borislav Petkov wrote:
> On Mon, May 21, 2018 at 09:48:23AM -0400, Tyler Baicar wrote:
>> I don't see an issue with not printing out the long BIOS statement, but the
>> number of DIMM sockets print could still be useful.
> Well, if you wanna dump the number of DIMMs - then maybe that line
> should issue unconditionally. However, "DIMM sockets" is silly - it
> should simply say:
>
> "%d DIMMs detected"
>
> or so.
Agreed. Sounds good to me.
^ permalink raw reply
* EDAC, ghes: Make platform-based whitelisting x86-only
From: Borislav Petkov @ 2018-05-21 17:15 UTC (permalink / raw)
To: Tyler Baicar
Cc: James Morse, Zhengqiang, mchehab, toshi.kani, linux-edac,
linuxarm, linux-arm-kernel@lists.infradead.org
On Mon, May 21, 2018 at 09:48:23AM -0400, Tyler Baicar wrote:
> I don't see an issue with not printing out the long BIOS statement, but the
> number of DIMM sockets print could still be useful.
Well, if you wanna dump the number of DIMMs - then maybe that line
should issue unconditionally. However, "DIMM sockets" is silly - it
should simply say:
"%d DIMMs detected"
or so.
^ permalink raw reply
* EDAC, ghes: Make platform-based whitelisting x86-only
From: Tyler Baicar @ 2018-05-21 13:48 UTC (permalink / raw)
To: Borislav Petkov, James Morse
Cc: Zhengqiang, mchehab, toshi.kani, linux-edac, linuxarm,
linux-arm-kernel@lists.infradead.org
On 5/18/2018 7:20 AM, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
>
> ARM machines all have DMI tables so if they request hw error reporting
> through GHES, then the driver should be able to detect DIMMs and report
> errors successfully (famous last words :)).
>
> Make the platform-based list x86-specific so that ghes_edac can load on
> ARM.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Reviewed-by: James Morse <james.morse@arm.com>
> Tested-by: James Morse <james.morse@arm.com>
> Cc: Qiang Zheng <zhengqiang10@huawei.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Was it intentional to make idx=0 so that the pr_info later in this function no
longer hits?
I don't see an issue with not printing out the long BIOS statement, but the
number of DIMM sockets print could still be useful.
Thanks,
Tyler
^ permalink raw reply
* EDAC, ghes: Make platform-based whitelisting x86-only
From: Zhengqiang @ 2018-05-21 9:39 UTC (permalink / raw)
To: Borislav Petkov, James Morse
Cc: Tyler Baicar, mchehab, toshi.kani, linux-edac, linuxarm,
linux-arm-kernel@lists.infradead.org
Thanks, it works for me.
On 2018/5/18 19:20, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
>
> ARM machines all have DMI tables so if they request hw error reporting
> through GHES, then the driver should be able to detect DIMMs and report
> errors successfully (famous last words :)).
>
> Make the platform-based list x86-specific so that ghes_edac can load on
> ARM.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Reviewed-by: James Morse <james.morse@arm.com>
> Tested-by: James Morse <james.morse@arm.com>
> Cc: Qiang Zheng <zhengqiang10@huawei.com>
> Link: https://lkml.kernel.org/r/1526039543-180996-1-git-send-email-zhengqiang10@huawei.com
Tested-by: Qiang Zheng <zhengqiang10@huawei.com>
> ---
> drivers/edac/ghes_edac.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 863fbf3db29f..473aeec4b1da 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> struct mem_ctl_info *mci;
> struct edac_mc_layer layers[1];
> struct ghes_edac_dimm_fill dimm_fill;
> - int idx;
> + int idx = -1;
>
> - /* Check if safe to enable on this system */
> - idx = acpi_match_platform_list(plat_list);
> - if (!force_load && idx < 0)
> - return -ENODEV;
> + if (IS_ENABLED(CONFIG_X86)) {
> + /* Check if safe to enable on this system */
> + idx = acpi_match_platform_list(plat_list);
> + if (!force_load && idx < 0)
> + return -ENODEV;
> + } else {
> + idx = 0;
> + }
>
> /*
> * We have only one logical memory controller to which all DIMMs belong.
>
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* EDAC, ghes: Make platform-based whitelisting x86-only
From: Borislav Petkov @ 2018-05-18 11:20 UTC (permalink / raw)
To: James Morse
Cc: Zhengqiang, Tyler Baicar, mchehab, toshi.kani, linux-edac,
linuxarm, linux-arm-kernel@lists.infradead.org
From: Borislav Petkov <bp@suse.de>
ARM machines all have DMI tables so if they request hw error reporting
through GHES, then the driver should be able to detect DIMMs and report
errors successfully (famous last words :)).
Make the platform-based list x86-specific so that ghes_edac can load on
ARM.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Cc: Qiang Zheng <zhengqiang10@huawei.com>
Link: https://lkml.kernel.org/r/1526039543-180996-1-git-send-email-zhengqiang10@huawei.com
---
drivers/edac/ghes_edac.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 863fbf3db29f..473aeec4b1da 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
struct mem_ctl_info *mci;
struct edac_mc_layer layers[1];
struct ghes_edac_dimm_fill dimm_fill;
- int idx;
+ int idx = -1;
- /* Check if safe to enable on this system */
- idx = acpi_match_platform_list(plat_list);
- if (!force_load && idx < 0)
- return -ENODEV;
+ if (IS_ENABLED(CONFIG_X86)) {
+ /* Check if safe to enable on this system */
+ idx = acpi_match_platform_list(plat_list);
+ if (!force_load && idx < 0)
+ return -ENODEV;
+ } else {
+ idx = 0;
+ }
/*
* We have only one logical memory controller to which all DIMMs belong.
^ permalink raw reply related
* ghes_edac: enable HIP08 platform edac driver
From: Borislav Petkov @ 2018-05-18 11:11 UTC (permalink / raw)
To: James Morse
Cc: Zhengqiang, Tyler Baicar, mchehab, toshi.kani, linux-edac,
linuxarm, linux-arm-kernel@lists.infradead.org
On Thu, May 17, 2018 at 07:02:18PM +0100, James Morse wrote:
> v4.17-rc5 has 'return 0' here. Wouldn't this change means no ghes can be
> registered unless ghes_edac is also supported by the platform?
> Shouldn't this be '0' for a silent failure?
https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/commit/?h=for-next&id=cc7f3f132658289b6661ab8294ab08a9d32ea026
> Tested on Seattle and some cranky homebrew-no-DMI firmware:
> Tested-by: James Morse <james.morse@arm.com>
>
> With the ENODEV/0 thing above:
> Reviewed-by: James Morse <james.morse@arm.com>
Thanks, adding.
^ permalink raw reply
* ghes_edac: enable HIP08 platform edac driver
From: Zhengqiang @ 2018-05-18 7:13 UTC (permalink / raw)
To: James Morse, Borislav Petkov, Tyler Baicar
Cc: mchehab, toshi.kani, linux-edac, linuxarm,
linux-arm-kernel@lists.infradead.org
On 2018/5/18 2:02, James Morse wrote:
> Hi guys,
>
> Tyler, Zhengqiang, I assume all your shipped platforms with HEST->GHES entries
> also have DMI tables.
>
Sure, Our ARM64 platform have DMI tables. thanks.
>
> On 16/05/18 19:29, Borislav Petkov wrote:
>> On Wed, May 16, 2018 at 02:38:38PM +0100, James Morse wrote:
>>> XGene has its own edac driver, but it doesn't probe when booted via ACPI so
>>> won't conflict with ghes_edac.
>>
>> Actually it will. EDAC core can have only one EDAC driver loaded. Don't
>> ask me why - it has been that way since forever.
>
> By won't probe I mean it only works on DT systems:
>
> | static const struct of_device_id xgene_edac_of_match[] = {
> | { .compatible = "apm,xgene-edac" },
> | {},
> | };
>
> | .driver = {
> | .name = "xgene-edac",
> | .of_match_table = xgene_edac_of_match,
> | },
>
> To work on a system with GHES it would need an 'struct acpi_device_id' to
> describe the HID (?) and populate driver's acpi_match_table.
>
>
>> We can change it some
>> day but frankly, I don't see reasoning for it. One driver can easily
>> manage *all* error sources on a system, I'd say.
>
> I agree, there is no reason to support two at the same time, if this happens
> then there is probably something wrong with the platform (e.g. races with
> firmware reading the same hardware registers), so we should make some noise.
>
> Xgene's edac driver would be a good example of this, it looks like it reads data
> from some mmio region, if something else is doing the same we're going to make a
> mess.
>
>
>>> So I think we're good to make the whitelist x86 only.
>>> Your diff-hunk makes 'idx=-1', so we always get the 'Unfortunately' warning. I'd
>>> like to suppress this unless force_load has been used.
>>
>> Yeah, we should handle that differently for ARM. Toshi added the idx
>> thing in
>>
>> 5deed6b6a479 ("EDAC, ghes: Add platform check")
>>
>> to dump this when the platform is not whitelisted. So let's do that:
>>
>> ---
>> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
>> index 863fbf3db29f..473aeec4b1da 100644
>> --- a/drivers/edac/ghes_edac.c
>> +++ b/drivers/edac/ghes_edac.c
>> @@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
>> struct mem_ctl_info *mci;
>> struct edac_mc_layer layers[1];
>> struct ghes_edac_dimm_fill dimm_fill;
>> - int idx;
>> + int idx = -1;
>>
>> - /* Check if safe to enable on this system */
>> - idx = acpi_match_platform_list(plat_list);
>> - if (!force_load && idx < 0)
>> - return -ENODEV;
>
> v4.17-rc5 has 'return 0' here. Wouldn't this change means no ghes can be
> registered unless ghes_edac is also supported by the platform?
> Shouldn't this be '0' for a silent failure?
>
>
>> + if (IS_ENABLED(CONFIG_X86)) {
>> + /* Check if safe to enable on this system */
>> + idx = acpi_match_platform_list(plat_list);
>> + if (!force_load && idx < 0)
>> + return -ENODEV;
>> + } else {
>> + idx = 0;
>> + }
>>
>> /*
>> * We have only one logical memory controller to which all DIMMs belong.
>
> Tested on Seattle and some cranky homebrew-no-DMI firmware:
> Tested-by: James Morse <james.morse@arm.com>
>
> With the ENODEV/0 thing above:
> Reviewed-by: James Morse <james.morse@arm.com>
>
>
>>> It looks like even the oldest Arm64 ACPI systems have dmi tables, so we can
>>> probably require DMI or the 'force' flag.
>>
>> Well, with the hunk above it would still do ghes_edac_count_dimms() on
>> ARM and if it fails to find something, it will set fake, which is a good
>> sanity-check as it screams loudly. :)
>
>
> Thanks,
>
> James
>
> .
>
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [3/3] x86/MCE/AMD: Get address from already initialized block
From: Boris Petkov @ 2018-05-17 19:33 UTC (permalink / raw)
To: Johannes Hirte
Cc: Ghannam, Yazen, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
On Thu, May 17, 2018 at 09:29:23PM +0200, Johannes Hirte wrote:
> Works as expected on my Carrizo.
Thanks, I'll add your Tested-by.
^ permalink raw reply
* [3/3] x86/MCE/AMD: Get address from already initialized block
From: Johannes Hirte @ 2018-05-17 19:29 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ghannam, Yazen, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
On 2018 Mai 17, Borislav Petkov wrote:
> On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> > Maybe I'm missing something, but those RDMSR IPSs don't happen on
> > pre-SMCA systems, right? So the caching should be avoided here, cause
> > the whole lookup looks more expensive to me than the simple switch-block
> > in get_block_address.
>
> Yeah, and we should simply cache all those MSR values as I suggested then.
>
> The patch at the end should fix your issue.
>
Works as expected on my Carrizo.
^ permalink raw reply
* [2/2] x86/MCE/AMD: Read MCx_MISC block addresses on any CPU
From: Boris Petkov @ 2018-05-17 18:31 UTC (permalink / raw)
To: Ghannam, Yazen
Cc: Johannes Hirte, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
From: Borislav Petkov <bp@suse.de>
We used rdmsr_safe_on_cpu() to make sure we're reading the proper CPU's
MISC block addresses. However, that caused trouble with CPU hotplug due
to the _on_cpu() helper issuing an IPI while IRQs are disabled.
But we don't have to do that: the block addresses are the same on any
CPU so we can read them on any CPU. (What practically happens is, we
read them on the BSP and cache them, and for later reads, we service
them from the cache).
Suggested-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index c8e038800591..f591b01930db 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -436,8 +436,7 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
wrmsr(MSR_CU_DEF_ERR, low, high);
}
-static u32 smca_get_block_address(unsigned int cpu, unsigned int bank,
- unsigned int block)
+static u32 smca_get_block_address(unsigned int bank, unsigned int block)
{
u32 low, high;
u32 addr = 0;
@@ -456,13 +455,13 @@ static u32 smca_get_block_address(unsigned int cpu, unsigned int bank,
* For SMCA enabled processors, BLKPTR field of the first MISC register
* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
*/
- if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
+ if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
goto out;
if (!(low & MCI_CONFIG_MCAX))
goto out;
- if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
+ if (!rdmsr_safe(MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
(low & MASK_BLKPTR_LO))
addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
@@ -471,7 +470,7 @@ static u32 smca_get_block_address(unsigned int cpu, unsigned int bank,
return addr;
}
-static u32 get_block_address(unsigned int cpu, u32 current_addr, u32 low, u32 high,
+static u32 get_block_address(u32 current_addr, u32 low, u32 high,
unsigned int bank, unsigned int block)
{
u32 addr = 0, offset = 0;
@@ -480,7 +479,7 @@ static u32 get_block_address(unsigned int cpu, u32 current_addr, u32 low, u32 hi
return addr;
if (mce_flags.smca)
- return smca_get_block_address(cpu, bank, block);
+ return smca_get_block_address(bank, block);
/* Fall back to method we used for older processors: */
switch (block) {
@@ -558,7 +557,7 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
smca_configure(bank, cpu);
for (block = 0; block < NR_BLOCKS; ++block) {
- address = get_block_address(cpu, address, low, high, bank, block);
+ address = get_block_address(address, low, high, bank, block);
if (!address)
break;
@@ -1175,7 +1174,7 @@ static int allocate_threshold_blocks(unsigned int cpu, unsigned int bank,
if (err)
goto out_free;
recurse:
- address = get_block_address(cpu, address, low, high, bank, ++block);
+ address = get_block_address(address, low, high, bank, ++block);
if (!address)
return 0;
^ permalink raw reply related
* [1/2] x86/MCE/AMD: Cache SMCA MISC block addresses
From: Boris Petkov @ 2018-05-17 18:30 UTC (permalink / raw)
To: Ghannam, Yazen
Cc: Johannes Hirte, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
From: Borislav Petkov <bp@suse.de>
... into a global, two-dimensional array and service subsequent reads
from that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs
with IRQs disabled).
In addition, this fixes a KASAN slab-out-of-bounds read due to wrong
usage of the bank->blocks pointer.
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Fixes: 27bd59502702 ("x86/mce/AMD: Get address from already initialized block")
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
---
arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++++++++++++++--------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f7666eef4a87..c8e038800591 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = {
[SMCA_SMU] = { "smu", "System Management Unit" },
};
+static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
+{
+ [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
+};
+
const char *smca_get_name(enum smca_bank_types t)
{
if (t >= N_SMCA_BANK_TYPES)
@@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int cpu, unsigned int bank,
if (!block)
return MSR_AMD64_SMCA_MCx_MISC(bank);
+ /* Check our cache first: */
+ if (smca_bank_addrs[bank][block] != -1)
+ return smca_bank_addrs[bank][block];
+
/*
* For SMCA enabled processors, BLKPTR field of the first MISC register
* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
*/
if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
- return addr;
+ goto out;
if (!(low & MCI_CONFIG_MCAX))
- return addr;
+ goto out;
if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
(low & MASK_BLKPTR_LO))
- return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+ addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+out:
+ smca_bank_addrs[bank][block] = addr;
return addr;
}
@@ -468,18 +479,6 @@ static u32 get_block_address(unsigned int cpu, u32 current_addr, u32 low, u32 hi
if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
return addr;
- /* Get address from already initialized block. */
- if (per_cpu(threshold_banks, cpu)) {
- struct threshold_bank *bankp = per_cpu(threshold_banks, cpu)[bank];
-
- if (bankp && bankp->blocks) {
- struct threshold_block *blockp = &bankp->blocks[block];
-
- if (blockp)
- return blockp->address;
- }
- }
-
if (mce_flags.smca)
return smca_get_block_address(cpu, bank, block);
^ permalink raw reply related
* ghes_edac: enable HIP08 platform edac driver
From: James Morse @ 2018-05-17 18:02 UTC (permalink / raw)
To: Borislav Petkov, Zhengqiang, Tyler Baicar
Cc: mchehab, toshi.kani, linux-edac, linuxarm,
linux-arm-kernel@lists.infradead.org
Hi guys,
Tyler, Zhengqiang, I assume all your shipped platforms with HEST->GHES entries
also have DMI tables.
On 16/05/18 19:29, Borislav Petkov wrote:
> On Wed, May 16, 2018 at 02:38:38PM +0100, James Morse wrote:
>> XGene has its own edac driver, but it doesn't probe when booted via ACPI so
>> won't conflict with ghes_edac.
>
> Actually it will. EDAC core can have only one EDAC driver loaded. Don't
> ask me why - it has been that way since forever.
By won't probe I mean it only works on DT systems:
| static const struct of_device_id xgene_edac_of_match[] = {
| { .compatible = "apm,xgene-edac" },
| {},
| };
| .driver = {
| .name = "xgene-edac",
| .of_match_table = xgene_edac_of_match,
| },
To work on a system with GHES it would need an 'struct acpi_device_id' to
describe the HID (?) and populate driver's acpi_match_table.
> We can change it some
> day but frankly, I don't see reasoning for it. One driver can easily
> manage *all* error sources on a system, I'd say.
I agree, there is no reason to support two at the same time, if this happens
then there is probably something wrong with the platform (e.g. races with
firmware reading the same hardware registers), so we should make some noise.
Xgene's edac driver would be a good example of this, it looks like it reads data
from some mmio region, if something else is doing the same we're going to make a
mess.
>> So I think we're good to make the whitelist x86 only.
>> Your diff-hunk makes 'idx=-1', so we always get the 'Unfortunately' warning. I'd
>> like to suppress this unless force_load has been used.
>
> Yeah, we should handle that differently for ARM. Toshi added the idx
> thing in
>
> 5deed6b6a479 ("EDAC, ghes: Add platform check")
>
> to dump this when the platform is not whitelisted. So let's do that:
>
> ---
> diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> index 863fbf3db29f..473aeec4b1da 100644
> --- a/drivers/edac/ghes_edac.c
> +++ b/drivers/edac/ghes_edac.c
> @@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
> struct mem_ctl_info *mci;
> struct edac_mc_layer layers[1];
> struct ghes_edac_dimm_fill dimm_fill;
> - int idx;
> + int idx = -1;
>
> - /* Check if safe to enable on this system */
> - idx = acpi_match_platform_list(plat_list);
> - if (!force_load && idx < 0)
> - return -ENODEV;
v4.17-rc5 has 'return 0' here. Wouldn't this change means no ghes can be
registered unless ghes_edac is also supported by the platform?
Shouldn't this be '0' for a silent failure?
> + if (IS_ENABLED(CONFIG_X86)) {
> + /* Check if safe to enable on this system */
> + idx = acpi_match_platform_list(plat_list);
> + if (!force_load && idx < 0)
> + return -ENODEV;
> + } else {
> + idx = 0;
> + }
>
> /*
> * We have only one logical memory controller to which all DIMMs belong.
Tested on Seattle and some cranky homebrew-no-DMI firmware:
Tested-by: James Morse <james.morse@arm.com>
With the ENODEV/0 thing above:
Reviewed-by: James Morse <james.morse@arm.com>
>> It looks like even the oldest Arm64 ACPI systems have dmi tables, so we can
>> probably require DMI or the 'force' flag.
>
> Well, with the hunk above it would still do ghes_edac_count_dimms() on
> ARM and if it fails to find something, it will set fake, which is a good
> sanity-check as it screams loudly. :)
Thanks,
James
---
To unsubscribe from this list: send the line "unsubscribe linux-edac" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [3/3] x86/MCE/AMD: Get address from already initialized block
From: Yazen Ghannam @ 2018-05-17 14:05 UTC (permalink / raw)
To: Borislav Petkov
Cc: Johannes Hirte, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
> -----Original Message-----
> From: Borislav Petkov <bp@suse.de>
> Sent: Thursday, May 17, 2018 9:44 AM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: Johannes Hirte <johannes.hirte@datenkhaos.de>; linux-
> edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com;
> x86@kernel.org
> Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> block
>
> On Thu, May 17, 2018 at 01:04:19PM +0000, Ghannam, Yazen wrote:
...
>
> I check PFEH is enabled how?
>
If MISC0 is RAZ then you can assume PFEH is enabled. There should be a BIOS
option to disable it.
BTW, I just tried you patch with PFEH disabled and it seems to work fine.
...
> > Since we're caching the values during init, we can drop all the
> > *_on_cpu() calls. What do you think?
>
> Well, if they're all the same on all CPUs, sure. That's your call.
>
Let's drop them. We won't need them since we're caching the values during
init. And the init code is run on the target CPU.
We can just make smca_bank_addrs[][] into per_cpu when we need to support
different values on different CPUs.
Thanks,
Yazen
^ permalink raw reply
* [3/3] x86/MCE/AMD: Get address from already initialized block
From: Boris Petkov @ 2018-05-17 13:44 UTC (permalink / raw)
To: Ghannam, Yazen
Cc: Johannes Hirte, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
On Thu, May 17, 2018 at 01:04:19PM +0000, Ghannam, Yazen wrote:
> Yes, you're right. I thought using the existing data structures would work, but it
> seems I messed up the implementation.
Not only that - your idea wouldn't fly because the per-CPU stuff you
were using gets torn down when the CPU goes offline so you can't use
them on resume because they're not there yet.
> Banks 15 and 16 should have an address for block 1 also. Do you have PFEH
> enabled on your system? That would cause MISC0 to be RAZ so we won't
> get the MISC1 address. I'll try it myself also and let you know.
I check PFEH is enabled how?
> I think this good for now. We'll probably need to change it in the
> future, but maybe we can clean up all the thresholding blocks code and
> make it simpler when we do change it.
Ok.
> This hunk could go above the !block. Though maybe the macro is lighter than the
> array lookup. It'll work either way, but I'm just thinking out loud.
Yeah, it doesn't matter in that case.
> Since we're caching the values during init, we can drop all the
> *_on_cpu() calls. What do you think?
Well, if they're all the same on all CPUs, sure. That's your call.
^ permalink raw reply
* [3/3] x86/MCE/AMD: Get address from already initialized block
From: Yazen Ghannam @ 2018-05-17 13:04 UTC (permalink / raw)
To: Borislav Petkov, Johannes Hirte
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
tony.luck@intel.com, x86@kernel.org
> -----Original Message-----
> From: Borislav Petkov <bp@suse.de>
> Sent: Thursday, May 17, 2018 6:41 AM
> To: Johannes Hirte <johannes.hirte@datenkhaos.de>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-
> edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com;
> x86@kernel.org
> Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> block
>
> On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> > Maybe I'm missing something, but those RDMSR IPSs don't happen on
> > pre-SMCA systems, right? So the caching should be avoided here, cause
> > the whole lookup looks more expensive to me than the simple switch-block
> > in get_block_address.
>
> Yeah, and we should simply cache all those MSR values as I suggested then.
>
Yes, you're right. I thought using the existing data structures would work, but it
seems I messed up the implementation.
> The patch at the end should fix your issue.
>
> Yazen, so I'm working on the assumption that all addresses are the same
> on every core - I dumped them and it looks like this:
>
> 128 bank: 00, block: 0 : 0xc0002003
> 128 bank: 00, block: 1 : 0x0
> 128 bank: 01, block: 0 : 0xc0002013
> 128 bank: 01, block: 1 : 0x0
> 128 bank: 02, block: 0 : 0xc0002023
> 128 bank: 02, block: 1 : 0x0
> 128 bank: 03, block: 0 : 0xc0002033
> 128 bank: 03, block: 1 : 0x0
> 128 bank: 04, block: 0 : 0x0
> 128 bank: 05, block: 0 : 0xc0002053
> 128 bank: 05, block: 1 : 0x0
> 128 bank: 06, block: 0 : 0xc0002063
> 128 bank: 06, block: 1 : 0x0
> 128 bank: 07, block: 0 : 0xc0002073
> 128 bank: 07, block: 1 : 0x0
> 128 bank: 08, block: 0 : 0xc0002083
> 128 bank: 08, block: 1 : 0x0
> 128 bank: 09, block: 0 : 0xc0002093
> 128 bank: 09, block: 1 : 0x0
> 128 bank: 10, block: 0 : 0xc00020a3
> 128 bank: 10, block: 1 : 0x0
> 128 bank: 11, block: 0 : 0xc00020b3
> 128 bank: 11, block: 1 : 0x0
> 128 bank: 12, block: 0 : 0xc00020c3
> 128 bank: 12, block: 1 : 0x0
> 128 bank: 13, block: 0 : 0xc00020d3
> 128 bank: 13, block: 1 : 0x0
> 128 bank: 14, block: 0 : 0xc00020e3
> 128 bank: 14, block: 1 : 0x0
> 128 bank: 15, block: 0 : 0xc00020f3
> 128 bank: 15, block: 1 : 0x0
> 128 bank: 16, block: 0 : 0xc0002103
> 128 bank: 16, block: 1 : 0x0
Banks 15 and 16 should have an address for block 1 also. Do you have PFEH
enabled on your system? That would cause MISC0 to be RAZ so we won't
get the MISC1 address. I'll try it myself also and let you know.
> 128 bank: 17, block: 0 : 0xc0002113
> 128 bank: 17, block: 1 : 0x0
> 128 bank: 18, block: 0 : 0xc0002123
> 128 bank: 18, block: 1 : 0x0
> 128 bank: 19, block: 0 : 0xc0002133
> 128 bank: 19, block: 1 : 0x0
> 128 bank: 20, block: 0 : 0xc0002143
> 128 bank: 20, block: 1 : 0x0
> 128 bank: 21, block: 0 : 0xc0002153
> 128 bank: 21, block: 1 : 0x0
> 128 bank: 22, block: 0 : 0xc0002163
> 128 bank: 22, block: 1 : 0x0
>
> so you have 128 times the same address on a 128 core Zen box.
>
> If that is correct, then we can use this nice simplification and cache
> them globally and not per-CPU. Seems to work here. Scream if something's
> amiss.
>
I think this good for now. We'll probably need to change it in the future, but
maybe we can clean up all the thresholding blocks code and make it simpler
when we do change it.
> Thx.
>
> ---
> From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00
> 2001
> From: Borislav Petkov <bp@susede>
> Date: Thu, 17 May 2018 10:46:26 +0200
> Subject: [PATCH] array caching
>
> Not-Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
> arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++++++++++++++--------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f7666eef4a87..c8e038800591 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = {
> [SMCA_SMU] = { "smu", "System Management Unit" },
> };
>
> +static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
> +{
> + [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
> +};
> +
> const char *smca_get_name(enum smca_bank_types t)
> {
> if (t >= N_SMCA_BANK_TYPES)
> @@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int
> cpu, unsigned int bank,
> if (!block)
> return MSR_AMD64_SMCA_MCx_MISC(bank);
>
> + /* Check our cache first: */
> + if (smca_bank_addrs[bank][block] != -1)
> + return smca_bank_addrs[bank][block];
> +
This hunk could go above the !block. Though maybe the macro is lighter than the
array lookup. It'll work either way, but I'm just thinking out loud.
> /*
> * For SMCA enabled processors, BLKPTR field of the first MISC register
> * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-
> 4).
> */
> if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank),
> &low, &high))
> - return addr;
> + goto out;
>
> if (!(low & MCI_CONFIG_MCAX))
> - return addr;
> + goto out;
>
> if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank),
> &low, &high) &&
> (low & MASK_BLKPTR_LO))
> - return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
> + addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
>
> +out:
> + smca_bank_addrs[bank][block] = addr;
> return addr;
> }
>
Since we're caching the values during init, we can drop all the *_on_cpu() calls.
What do you think?
Thanks,
Yazen
^ permalink raw reply
* [3/3] x86/MCE/AMD: Get address from already initialized block
From: Boris Petkov @ 2018-05-17 10:41 UTC (permalink / raw)
To: Johannes Hirte
Cc: Ghannam, Yazen, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> Maybe I'm missing something, but those RDMSR IPSs don't happen on
> pre-SMCA systems, right? So the caching should be avoided here, cause
> the whole lookup looks more expensive to me than the simple switch-block
> in get_block_address.
Yeah, and we should simply cache all those MSR values as I suggested then.
The patch at the end should fix your issue.
Yazen, so I'm working on the assumption that all addresses are the same
on every core - I dumped them and it looks like this:
128 bank: 00, block: 0 : 0xc0002003
128 bank: 00, block: 1 : 0x0
128 bank: 01, block: 0 : 0xc0002013
128 bank: 01, block: 1 : 0x0
128 bank: 02, block: 0 : 0xc0002023
128 bank: 02, block: 1 : 0x0
128 bank: 03, block: 0 : 0xc0002033
128 bank: 03, block: 1 : 0x0
128 bank: 04, block: 0 : 0x0
128 bank: 05, block: 0 : 0xc0002053
128 bank: 05, block: 1 : 0x0
128 bank: 06, block: 0 : 0xc0002063
128 bank: 06, block: 1 : 0x0
128 bank: 07, block: 0 : 0xc0002073
128 bank: 07, block: 1 : 0x0
128 bank: 08, block: 0 : 0xc0002083
128 bank: 08, block: 1 : 0x0
128 bank: 09, block: 0 : 0xc0002093
128 bank: 09, block: 1 : 0x0
128 bank: 10, block: 0 : 0xc00020a3
128 bank: 10, block: 1 : 0x0
128 bank: 11, block: 0 : 0xc00020b3
128 bank: 11, block: 1 : 0x0
128 bank: 12, block: 0 : 0xc00020c3
128 bank: 12, block: 1 : 0x0
128 bank: 13, block: 0 : 0xc00020d3
128 bank: 13, block: 1 : 0x0
128 bank: 14, block: 0 : 0xc00020e3
128 bank: 14, block: 1 : 0x0
128 bank: 15, block: 0 : 0xc00020f3
128 bank: 15, block: 1 : 0x0
128 bank: 16, block: 0 : 0xc0002103
128 bank: 16, block: 1 : 0x0
128 bank: 17, block: 0 : 0xc0002113
128 bank: 17, block: 1 : 0x0
128 bank: 18, block: 0 : 0xc0002123
128 bank: 18, block: 1 : 0x0
128 bank: 19, block: 0 : 0xc0002133
128 bank: 19, block: 1 : 0x0
128 bank: 20, block: 0 : 0xc0002143
128 bank: 20, block: 1 : 0x0
128 bank: 21, block: 0 : 0xc0002153
128 bank: 21, block: 1 : 0x0
128 bank: 22, block: 0 : 0xc0002163
128 bank: 22, block: 1 : 0x0
so you have 128 times the same address on a 128 core Zen box.
If that is correct, then we can use this nice simplification and cache
them globally and not per-CPU. Seems to work here. Scream if something's
amiss.
Thx.
---
From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@susede>
Date: Thu, 17 May 2018 10:46:26 +0200
Subject: [PATCH] array caching
Not-Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++++++++++++++--------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index f7666eef4a87..c8e038800591 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = {
[SMCA_SMU] = { "smu", "System Management Unit" },
};
+static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
+{
+ [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
+};
+
const char *smca_get_name(enum smca_bank_types t)
{
if (t >= N_SMCA_BANK_TYPES)
@@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int cpu, unsigned int bank,
if (!block)
return MSR_AMD64_SMCA_MCx_MISC(bank);
+ /* Check our cache first: */
+ if (smca_bank_addrs[bank][block] != -1)
+ return smca_bank_addrs[bank][block];
+
/*
* For SMCA enabled processors, BLKPTR field of the first MISC register
* (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-4).
*/
if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
- return addr;
+ goto out;
if (!(low & MCI_CONFIG_MCAX))
- return addr;
+ goto out;
if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank), &low, &high) &&
(low & MASK_BLKPTR_LO))
- return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+ addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
+out:
+ smca_bank_addrs[bank][block] = addr;
return addr;
}
@@ -468,18 +479,6 @@ static u32 get_block_address(unsigned int cpu, u32 current_addr, u32 low, u32 hi
if ((bank >= mca_cfg.banks) || (block >= NR_BLOCKS))
return addr;
- /* Get address from already initialized block. */
- if (per_cpu(threshold_banks, cpu)) {
- struct threshold_bank *bankp = per_cpu(threshold_banks, cpu)[bank];
-
- if (bankp && bankp->blocks) {
- struct threshold_block *blockp = &bankp->blocks[block];
-
- if (blockp)
- return blockp->address;
- }
- }
-
if (mce_flags.smca)
return smca_get_block_address(cpu, bank, block);
^ permalink raw reply related
* [3/3] x86/MCE/AMD: Get address from already initialized block
From: Johannes Hirte @ 2018-05-17 6:49 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ghannam, Yazen, linux-edac@vger.kernel.org,
linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org
On 2018 Mai 17, Borislav Petkov wrote:
> On Tue, May 15, 2018 at 11:39:54AM +0200, Johannes Hirte wrote:
> > The out-of-bound access happens in get_block_address:
> >
> > if (bankp && bankp->blocks) {
> > struct threshold_block *blockp blockp = &bankp->blocks[block];
> >
> > with block=1. This doesn't exists. I don't even find any array here.
> > There is a linked list, created in allocate_threshold_blocks. On my
> > system I get 17 lists with one element each.
>
> Yes, what a mess this is. ;-\
>
> There's no such thing as ->blocks[block] array. We assign simply the
> threshold_block to it in allocate_threshold_blocks:
>
> per_cpu(threshold_banks, cpu)[bank]->blocks = b;
>
> And I can't say the design of this thing is really friendly but it is
> still no excuse that I missed that during review. Grrr.
>
> So, Yazen, what really needs to happen here is to iterate the
> bank->blocks->miscj list to find the block you're looking for and return
> its address, the opposite to this here:
>
> if (per_cpu(threshold_banks, cpu)[bank]->blocks) {
> list_add(&b->miscj,
> &per_cpu(threshold_banks, cpu)[bank]->blocks->miscj);
> } else {
> per_cpu(threshold_banks, cpu)[bank]->blocks = b;
> }
>
> and don't forget to look at ->blocks itself.
>
> And then you need to make sure that searching for block addresses still
> works when resuming from suspend so that you can avoid the RDMSR IPIs.
>
Maybe I'm missing something, but those RDMSR IPSs don't happen on
pre-SMCA systems, right? So the caching should be avoided here, cause
the whole lookup looks more expensive to me than the simple switch-block
in get_block_address.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox