[PATCH v2 0/4] MCE wrapper and support for new SMCA syndrome MSRs

* [PATCH v2 0/4] MCE wrapper and support for new SMCA syndrome MSRs
@ 2024-06-25 19:56 Avadhut Naik
  2024-06-25 19:56 ` [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Avadhut Naik @ 2024-06-25 19:56 UTC
  To: x86, linux-edac, linux-trace-kernel, linux-acpi
  Cc: linux-kernel, bp, tony.luck, rafael, tglx, mingo, rostedt, lenb,
	mchehab, james.morse, airlied, yazen.ghannam, john.allen,
	avadnaik

This patchset adds a new wrapper for struct mce to keep it from bloating
and to export vendor-specific error information. Additionally, support is
introduced for two new "syndrome" MSRs used in newer AMD Scalable
MCA (SMCA) systems. Also, a new "FRU Text in MCA" feature that uses these
new "syndrome" MSRs has been added.

Patch 1 adds the new wrapper structure mce_hw_err for the struct mce
while also modifying the mce_record tracepoint to use the new wrapper.

Patch 2 adds support for the new "syndrome" registers. They are read/printed
wherever the existing MCA_SYND register is used.

Patch 3 updates the function that pulls MCA information from UEFI x86
Common Platform Error Records (CPERs) to handle systems that support the
new registers.

Patch 4 adds support to the AMD MCE decoder module to detect and use the
"FRU Text in MCA" feature which leverages the new registers.

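As a rough illustration of the calling convention that Patch 1 moves to,
here is a minimal userspace sketch. The trimmed struct mce and the
example_report() and example_decode() helpers are hypothetical stand-ins,
not kernel code: the producer zeroes the whole wrapper, fills the embedded
record, and passes the wrapper on; the consumer receives a void pointer to
the wrapper and reaches the record through its m member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for the kernel's struct mce. */
struct mce {
	uint64_t status;
	uint64_t addr;
	uint8_t  bank;
};

/* Wrapper as added by patch 1: for now it only embeds struct mce. */
struct mce_hw_err {
	struct mce m;
};

/* In this sketch the embedded record is the first member of the wrapper. */
static_assert(offsetof(struct mce_hw_err, m) == 0, "m is the first member");

/*
 * Hypothetical consumer using the same "void *data" convention as the
 * decode-chain callbacks: data now points at the wrapper, not at struct mce.
 */
static int example_decode(void *data)
{
	struct mce_hw_err *err = data;
	struct mce *m;

	if (!err)
		return -1;

	m = &err->m;
	return m->bank;
}

/* Hypothetical producer: zero the wrapper, fill the record, hand over the wrapper. */
static int example_report(uint64_t status, uint8_t bank)
{
	struct mce_hw_err err;
	struct mce *m = &err.m;

	memset(&err, 0, sizeof(err));
	m->status = status;
	m->bank = bank;

	return example_decode(&err);
}
```

Checking the wrapper pointer before taking the address of its m member
keeps the NULL test meaningful, since &err->m is computed from err itself.
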
NOTE:

This set was initially submitted as part of the larger MCA Updates set.

v1: https://lore.kernel.org/linux-edac/20231118193248.1296798-1-yazen.ghannam@amd.com/
v2: https://lore.kernel.org/linux-edac/20240404151359.47970-1-yazen.ghannam@amd.com/

However, since the MCA Updates set has been split up into smaller sets,
this set, going forward, will be submitted independently.

Having said that, this set depends on and applies cleanly on top of
the below two sets.

[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

Changes in v2:
 - Drop dependencies on sets [1] and [2] above and rebase on top of
   tip/master. (Boris)

Avadhut Naik (2):
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Yazen Ghannam (2):
  x86/mce/apei: Handle variable register array size
  EDAC/mce_amd: Add support for FRU Text in MCA

 arch/x86/include/asm/mce.h              |  20 ++-
 arch/x86/kernel/cpu/mce/amd.c           |  33 ++--
 arch/x86/kernel/cpu/mce/apei.c          | 119 ++++++++++----
 arch/x86/kernel/cpu/mce/core.c          | 201 ++++++++++++++----------
 arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
 arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
 arch/x86/kernel/cpu/mce/inject.c        |   4 +-
 arch/x86/kernel/cpu/mce/internal.h      |   4 +-
 drivers/acpi/acpi_extlog.c              |   2 +-
 drivers/acpi/nfit/mce.c                 |   2 +-
 drivers/edac/i7core_edac.c              |   2 +-
 drivers/edac/igen6_edac.c               |   2 +-
 drivers/edac/mce_amd.c                  |  27 +++-
 drivers/edac/pnd2_edac.c                |   2 +-
 drivers/edac/sb_edac.c                  |   2 +-
 drivers/edac/skx_common.c               |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
 drivers/ras/amd/fmpm.c                  |   2 +-
 drivers/ras/cec.c                       |   2 +-
 include/trace/events/mce.h              |  51 +++---
 20 files changed, 316 insertions(+), 185 deletions(-)

base-commit: 4fe5c16f5e5e0bd1a71a5ac79b5870f91b6b8e81
-- 
2.34.1

* [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-06-25 19:56 [PATCH v2 0/4] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
@ 2024-06-25 19:56 ` Avadhut Naik
  2024-06-26 10:44   ` Borislav Petkov
  2024-06-25 19:56 ` [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Avadhut Naik @ 2024-06-25 19:56 UTC
  To: x86, linux-edac, linux-trace-kernel, linux-acpi
  Cc: linux-kernel, bp, tony.luck, rafael, tglx, mingo, rostedt, lenb,
	mchehab, james.morse, airlied, yazen.ghannam, john.allen,
	avadnaik

Currently, exporting additional machine check error information
involves adding new fields at the end of struct mce. This additional
information can then be consumed through mcelog or the tracepoint.

However, as new MSRs are being added (and will be added in the future)
by CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some
CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
different CPU vendors may export the additional information in varying
sizes.

The problem is particularly acute since struct mce is exposed to
userspace as part of the UAPI. Its bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. This
prevents its ballooning since vendor-specific data, if any, can now be
exported through a union within the wrapper structure and through
__dynamic_array in the mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.
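
A minimal userspace sketch of the intended layout follows. The trimmed
struct mce, the "vendor" union, its "amd" member, and the synd1/synd2
field names are assumptions for illustration; this patch itself only adds
the "m" member. The point is that vendor data grows the kernel-internal
wrapper while the size of the UAPI record stays the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the UAPI-visible record; the real one has many more fields. */
struct mce {
	uint64_t status;
	uint64_t addr;
	uint64_t synd;
};

/*
 * Kernel-internal wrapper. The "vendor" union and its "amd" member are
 * assumed here; they are not part of this patch.
 */
struct mce_hw_err {
	struct mce m;

	union {
		struct {
			uint64_t synd1;
			uint64_t synd2;
		} amd;
	} vendor;
};

/* The UAPI record is embedded unchanged as the first member in this sketch. */
static_assert(offsetof(struct mce_hw_err, m) == 0, "m is the first member");

/* Bytes that vendor data adds to the wrapper without touching struct mce. */
static size_t vendor_overhead(void)
{
	return sizeof(struct mce_hw_err) - sizeof(struct mce);
}
```

Because the union is only as large as its largest member, a second
vendor's data would share that space instead of adding to it.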

Note: Some Checkpatch checks have been ignored to maintain coding style.

[Yazen: Add last commit message paragraph.]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/include/asm/mce.h              |   6 +-
 arch/x86/kernel/cpu/mce/amd.c           |  29 ++--
 arch/x86/kernel/cpu/mce/apei.c          |  54 +++----
 arch/x86/kernel/cpu/mce/core.c          | 178 +++++++++++++-----------
 arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
 arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
 arch/x86/kernel/cpu/mce/inject.c        |   4 +-
 arch/x86/kernel/cpu/mce/internal.h      |   4 +-
 drivers/acpi/acpi_extlog.c              |   2 +-
 drivers/acpi/nfit/mce.c                 |   2 +-
 drivers/edac/i7core_edac.c              |   2 +-
 drivers/edac/igen6_edac.c               |   2 +-
 drivers/edac/mce_amd.c                  |   2 +-
 drivers/edac/pnd2_edac.c                |   2 +-
 drivers/edac/sb_edac.c                  |   2 +-
 drivers/edac/skx_common.c               |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
 drivers/ras/amd/fmpm.c                  |   2 +-
 drivers/ras/cec.c                       |   2 +-
 include/trace/events/mce.h              |  42 +++---
 20 files changed, 199 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..e955edb22897 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,6 +187,10 @@ enum mce_notifier_prios {
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
 };
 
+struct mce_hw_err {
+	struct mce m;
+};
+
 struct notifier_block;
 extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
@@ -222,7 +226,7 @@ static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 #endif
 
 void mce_setup(struct mce *m);
-void mce_log(struct mce *m);
+void mce_log(struct mce_hw_err *err);
 DECLARE_PER_CPU(struct device *, mce_device);
 
 /* Maximum number of MCA banks per CPU. */
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9a0133ef7e20..cb7dc0b1aa50 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -778,29 +778,32 @@ bool amd_mce_usable_address(struct mce *m)
 
 static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 {
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 
-	mce_setup(&m);
+	memset(&err, 0, sizeof(struct mce_hw_err));
 
-	m.status = status;
-	m.misc   = misc;
-	m.bank   = bank;
-	m.tsc	 = rdtsc();
+	mce_setup(m);
 
-	if (m.status & MCI_STATUS_ADDRV) {
-		m.addr = addr;
+	m->status = status;
+	m->misc   = misc;
+	m->bank   = bank;
+	m->tsc	 = rdtsc();
 
-		smca_extract_err_addr(&m);
+	if (m->status & MCI_STATUS_ADDRV) {
+		m->addr = addr;
+
+		smca_extract_err_addr(m);
 	}
 
 	if (mce_flags.smca) {
-		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
+		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
 
-		if (m.status & MCI_STATUS_SYNDV)
-			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
+		if (m->status & MCI_STATUS_SYNDV)
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
 	}
 
-	mce_log(&m);
+	mce_log(&err);
 }
 
 DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7f7309ff67d0..b8f4e75fb8a7 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -28,9 +28,12 @@
 
 void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 {
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 	int lsb;
 
+	memset(&err, 0, sizeof(struct mce_hw_err));
+
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
 		return;
 
@@ -44,30 +47,33 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 	else
 		lsb = PAGE_SHIFT;
 
-	mce_setup(&m);
-	m.bank = -1;
+	mce_setup(m);
+	m->bank = -1;
 	/* Fake a memory read error with unknown channel */
-	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
-	m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
+	m->status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
+	m->misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
 
 	if (severity >= GHES_SEV_RECOVERABLE)
-		m.status |= MCI_STATUS_UC;
+		m->status |= MCI_STATUS_UC;
 
 	if (severity >= GHES_SEV_PANIC) {
-		m.status |= MCI_STATUS_PCC;
-		m.tsc = rdtsc();
+		m->status |= MCI_STATUS_PCC;
+		m->tsc = rdtsc();
 	}
 
-	m.addr = mem_err->physical_addr;
-	mce_log(&m);
+	m->addr = mem_err->physical_addr;
+	mce_log(&err);
 }
 EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 	unsigned int cpu;
-	struct mce m;
+
+	memset(&err, 0, sizeof(struct mce_hw_err));
 
 	if (!boot_cpu_has(X86_FEATURE_SMCA))
 		return -EINVAL;
@@ -97,29 +103,29 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	if (ctx_info->reg_arr_size < 48)
 		return -EINVAL;
 
-	mce_setup(&m);
+	mce_setup(m);
 
-	m.extcpu = -1;
-	m.socketid = -1;
+	m->extcpu = -1;
+	m->socketid = -1;
 
 	for_each_possible_cpu(cpu) {
 		if (cpu_data(cpu).topo.initial_apicid == lapic_id) {
-			m.extcpu = cpu;
-			m.socketid = cpu_data(m.extcpu).topo.pkg_id;
+			m->extcpu = cpu;
+			m->socketid = cpu_data(m->extcpu).topo.pkg_id;
 			break;
 		}
 	}
 
-	m.apicid = lapic_id;
-	m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m.status = *i_mce;
-	m.addr = *(i_mce + 1);
-	m.misc = *(i_mce + 2);
+	m->apicid = lapic_id;
+	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
+	m->status = *i_mce;
+	m->addr = *(i_mce + 1);
+	m->misc = *(i_mce + 2);
 	/* Skipping MCA_CONFIG */
-	m.ipid = *(i_mce + 4);
-	m.synd = *(i_mce + 5);
+	m->ipid = *(i_mce + 4);
+	m->synd = *(i_mce + 5);
 
-	mce_log(&m);
+	mce_log(&err);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b85ec7a4ec9e..6225143b9b14 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -88,7 +88,7 @@ struct mca_config mca_cfg __read_mostly = {
 	.monarch_timeout = -1
 };
 
-static DEFINE_PER_CPU(struct mce, mces_seen);
+static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
 static unsigned long mce_need_notify;
 
 /*
@@ -136,9 +136,9 @@ void mce_setup(struct mce *m)
 DEFINE_PER_CPU(struct mce, injectm);
 EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
-void mce_log(struct mce *m)
+void mce_log(struct mce_hw_err *err)
 {
-	if (!mce_gen_pool_add(m))
+	if (!mce_gen_pool_add(err))
 		irq_work_queue(&mce_irq_work);
 }
 EXPORT_SYMBOL_GPL(mce_log);
@@ -159,8 +159,10 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
 
-static void __print_mce(struct mce *m)
+static void __print_mce(struct mce_hw_err *err)
 {
+	struct mce *m = &err->m;
+
 	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
 		 m->extcpu,
 		 (m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
@@ -202,9 +204,11 @@ static void __print_mce(struct mce *m)
 		m->microcode);
 }
 
-static void print_mce(struct mce *m)
+static void print_mce(struct mce_hw_err *err)
 {
-	__print_mce(m);
+	struct mce *m = &err->m;
+
+	__print_mce(err);
 
 	if (m->cpuvendor != X86_VENDOR_AMD && m->cpuvendor != X86_VENDOR_HYGON)
 		pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
@@ -239,7 +243,7 @@ static const char *mce_dump_aux_info(struct mce *m)
 	return NULL;
 }
 
-static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
+static noinstr void mce_panic(const char *msg, struct mce_hw_err *final, char *exp)
 {
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
@@ -270,20 +274,22 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	pending = mce_gen_pool_prepare_records();
 	/* First print corrected ones that are still unlogged */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC)) {
-			print_mce(m);
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
 	}
 	/* Now print uncorrected but with the final one last */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC))
 			continue;
-		if (!final || mce_cmp(m, final)) {
-			print_mce(m);
+		if (!final || mce_cmp(m, &final->m)) {
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
@@ -291,12 +297,12 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (final) {
 		print_mce(final);
 		if (!apei_err)
-			apei_err = apei_write_mce(final);
+			apei_err = apei_write_mce(&final->m);
 	}
 	if (exp)
 		pr_emerg(HW_ERR "Machine check: %s\n", exp);
 
-	memmsg = mce_dump_aux_info(final);
+	memmsg = mce_dump_aux_info(&final->m);
 	if (memmsg)
 		pr_emerg(HW_ERR "Machine check: %s\n", memmsg);
 
@@ -311,9 +317,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 		 * panic.
 		 */
 		if (kexec_crash_loaded()) {
-			if (final && (final->status & MCI_STATUS_ADDRV)) {
+			if (final && (final->m.status & MCI_STATUS_ADDRV)) {
 				struct page *p;
-				p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+				p = pfn_to_online_page(final->m.addr >> PAGE_SHIFT);
 				if (p)
 					SetPageHWPoison(p);
 			}
@@ -562,13 +568,13 @@ EXPORT_SYMBOL_GPL(mce_is_correctable);
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
 	/* Emit the trace record: */
-	trace_mce_record(m);
+	trace_mce_record(err);
 
 	set_bit(0, &mce_need_notify);
 
@@ -585,7 +591,8 @@ static struct notifier_block early_nb = {
 static int uc_decode_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
+	struct mce *mce = &err->m;
 	unsigned long pfn;
 
 	if (!mce || !mce_usable_address(mce))
@@ -612,13 +619,13 @@ static struct notifier_block mce_uc_nb = {
 static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
-	if (mca_cfg.print_all || !m->kflags)
-		__print_mce(m);
+	if (mca_cfg.print_all || !(err->m.kflags))
+		__print_mce(err);
 
 	return NOTIFY_DONE;
 }
@@ -680,26 +687,29 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 	int i;
 
+	memset(&err, 0, sizeof(struct mce_hw_err));
+
 	this_cpu_inc(mce_poll_count);
 
-	mce_gather_info(&m, NULL);
+	mce_gather_info(m, NULL);
 
 	if (flags & MCP_TIMESTAMP)
-		m.tsc = rdtsc();
+		m->tsc = rdtsc();
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
 			continue;
 
-		m.misc = 0;
-		m.addr = 0;
-		m.bank = i;
+		m->misc = 0;
+		m->addr = 0;
+		m->bank = i;
 
 		barrier();
-		m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
+		m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
 
 		/*
 		 * Update storm tracking here, before checking for the
@@ -709,17 +719,17 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * storm status.
 		 */
 		if (!mca_cfg.cmci_disabled)
-			mce_track_storm(&m);
+			mce_track_storm(m);
 
 		/* If this entry is not valid, ignore it */
-		if (!(m.status & MCI_STATUS_VAL))
+		if (!(m->status & MCI_STATUS_VAL))
 			continue;
 
 		/*
 		 * If we are logging everything (at CPU online) or this
 		 * is a corrected error, then we must log it.
 		 */
-		if ((flags & MCP_UC) || !(m.status & MCI_STATUS_UC))
+		if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
 			goto log_it;
 
 		/*
@@ -729,20 +739,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * everything else.
 		 */
 		if (!mca_cfg.ser) {
-			if (m.status & MCI_STATUS_UC)
+			if (m->status & MCI_STATUS_UC)
 				continue;
 			goto log_it;
 		}
 
 		/* Log "not enabled" (speculative) errors */
-		if (!(m.status & MCI_STATUS_EN))
+		if (!(m->status & MCI_STATUS_EN))
 			goto log_it;
 
 		/*
 		 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
 		 * UC == 1 && PCC == 0 && S == 0
 		 */
-		if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S))
+		if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
 			goto log_it;
 
 		/*
@@ -756,20 +766,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (flags & MCP_DONTLOG)
 			goto clear_it;
 
-		mce_read_aux(&m, i);
-		m.severity = mce_severity(&m, NULL, NULL, false);
+		mce_read_aux(m, i);
+		m->severity = mce_severity(m, NULL, NULL, false);
 		/*
 		 * Don't get the IP here because it's unlikely to
 		 * have anything to do with the actual error location.
 		 */
 
-		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
+		if (mca_cfg.dont_log_ce && !mce_usable_address(m))
 			goto clear_it;
 
 		if (flags & MCP_QUEUE_LOG)
-			mce_gen_pool_add(&m);
+			mce_gen_pool_add(&err);
 		else
-			mce_log(&m);
+			mce_log(&err);
 
 clear_it:
 		/*
@@ -1005,6 +1015,7 @@ static noinstr int mce_timed_out(u64 *t, const char *msg)
 static void mce_reign(void)
 {
 	int cpu;
+	struct mce_hw_err *err = NULL;
 	struct mce *m = NULL;
 	int global_worst = 0;
 	char *msg = NULL;
@@ -1015,11 +1026,13 @@ static void mce_reign(void)
 	 * Grade the severity of the errors of all the CPUs.
 	 */
 	for_each_possible_cpu(cpu) {
-		struct mce *mtmp = &per_cpu(mces_seen, cpu);
+		struct mce_hw_err *etmp = &per_cpu(hw_errs_seen, cpu);
+		struct mce *mtmp = &etmp->m;
 
 		if (mtmp->severity > global_worst) {
 			global_worst = mtmp->severity;
-			m = &per_cpu(mces_seen, cpu);
+			err = &per_cpu(hw_errs_seen, cpu);
+			m = &err->m;
 		}
 	}
 
@@ -1031,7 +1044,7 @@ static void mce_reign(void)
 	if (m && global_worst >= MCE_PANIC_SEVERITY) {
 		/* call mce_severity() to get "msg" for panic */
 		mce_severity(m, NULL, &msg, true);
-		mce_panic("Fatal machine check", m, msg);
+		mce_panic("Fatal machine check", err, msg);
 	}
 
 	/*
@@ -1048,11 +1061,11 @@ static void mce_reign(void)
 		mce_panic("Fatal machine check from unknown source", NULL, NULL);
 
 	/*
-	 * Now clear all the mces_seen so that they don't reappear on
+	 * Now clear all the hw_errs_seen so that they don't reappear on
 	 * the next mce.
 	 */
 	for_each_possible_cpu(cpu)
-		memset(&per_cpu(mces_seen, cpu), 0, sizeof(struct mce));
+		memset(&per_cpu(hw_errs_seen, cpu), 0, sizeof(struct mce_hw_err));
 }
 
 static atomic_t global_nwo;
@@ -1256,12 +1269,13 @@ static noinstr bool mce_check_crashing_cpu(void)
 }
 
 static __always_inline int
-__mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
+__mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
 		unsigned long *toclear, unsigned long *valid_banks, int no_way_out,
 		int *worst)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
 	struct mca_config *cfg = &mca_cfg;
+	struct mce *m = &err->m;
 	int severity, i, taint = 0;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -1317,7 +1331,7 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
 		 * done in #MC context, where instrumentation is disabled.
 		 */
 		instrumentation_begin();
-		mce_log(m);
+		mce_log(err);
 		instrumentation_end();
 
 		if (severity > *worst) {
@@ -1387,8 +1401,9 @@ static void kill_me_never(struct callback_head *cb)
 		set_mce_nospec(pfn);
 }
 
-static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
+static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(struct callback_head *))
 {
+	struct mce *m = &err->m;
 	int count = ++current->mce_count;
 
 	/* First call, save all the details */
@@ -1402,11 +1417,12 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba
 
 	/* Ten is likely overkill. Don't expect more than two faults before task_work() */
 	if (count > 10)
-		mce_panic("Too many consecutive machine checks while accessing user data", m, msg);
+		mce_panic("Too many consecutive machine checks while accessing user data",
+			  err, msg);
 
 	/* Second or later call, make sure page address matches the one from first call */
 	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
-		mce_panic("Consecutive machine checks to different user pages", m, msg);
+		mce_panic("Consecutive machine checks to different user pages", err, msg);
 
 	/* Do not call task_work_add() more than once */
 	if (count > 1)
@@ -1455,8 +1471,14 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	int worst = 0, order, no_way_out, kill_current_task, lmce, taint = 0;
 	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS) = { 0 };
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS) = { 0 };
-	struct mce m, *final;
+	struct mce_hw_err *final;
+	struct mce_hw_err err;
 	char *msg = NULL;
+	struct mce *m;
+
+	memset(&err, 0, sizeof(struct mce_hw_err));
+
+	m = &err.m;
 
 	if (unlikely(mce_flags.p5))
 		return pentium_machine_check(regs);
@@ -1494,13 +1516,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
 
 	this_cpu_inc(mce_exception_count);
 
-	mce_gather_info(&m, regs);
-	m.tsc = rdtsc();
+	mce_gather_info(m, regs);
+	m->tsc = rdtsc();
 
-	final = this_cpu_ptr(&mces_seen);
-	*final = m;
+	final = this_cpu_ptr(&hw_errs_seen);
+	final->m = *m;
 
-	no_way_out = mce_no_way_out(&m, &msg, valid_banks, regs);
+	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
 
 	barrier();
 
@@ -1509,15 +1531,15 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * Assume the worst for now, but if we find the
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
-	if (!(m.mcgstatus & MCG_STATUS_RIPV))
+	if (!(m->mcgstatus & MCG_STATUS_RIPV))
 		kill_current_task = 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
 	 */
-	if (m.cpuvendor == X86_VENDOR_INTEL ||
-	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
-		lmce = m.mcgstatus & MCG_STATUS_LMCES;
+	if (m->cpuvendor == X86_VENDOR_INTEL ||
+	    m->cpuvendor == X86_VENDOR_ZHAOXIN)
+		lmce = m->mcgstatus & MCG_STATUS_LMCES;
 
 	/*
 	 * Local machine check may already know that we have to panic.
@@ -1528,12 +1550,12 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 */
 	if (lmce) {
 		if (no_way_out)
-			mce_panic("Fatal local machine check", &m, msg);
+			mce_panic("Fatal local machine check", &err, msg);
 	} else {
 		order = mce_start(&no_way_out);
 	}
 
-	taint = __mc_scan_banks(&m, regs, final, toclear, valid_banks, no_way_out, &worst);
+	taint = __mc_scan_banks(&err, regs, &final->m, toclear, valid_banks, no_way_out, &worst);
 
 	if (!no_way_out)
 		mce_clear_state(toclear);
@@ -1548,7 +1570,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 				no_way_out = worst >= MCE_PANIC_SEVERITY;
 
 			if (no_way_out)
-				mce_panic("Fatal machine check on current CPU", &m, msg);
+				mce_panic("Fatal machine check on current CPU", &err, msg);
 		}
 	} else {
 		/*
@@ -1560,8 +1582,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * make sure we have the right "msg".
 		 */
 		if (worst >= MCE_PANIC_SEVERITY) {
-			mce_severity(&m, regs, &msg, true);
-			mce_panic("Local fatal machine check!", &m, msg);
+			mce_severity(m, regs, &msg, true);
+			mce_panic("Local fatal machine check!", &err, msg);
 		}
 	}
 
@@ -1579,16 +1601,16 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		goto out;
 
 	/* Fault was in user mode and we need to take some action */
-	if ((m.cs & 3) == 3) {
+	if ((m->cs & 3) == 3) {
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		if (!mce_usable_address(&m))
-			queue_task_work(&m, msg, kill_me_now);
+		if (!mce_usable_address(m))
+			queue_task_work(&err, msg, kill_me_now);
 		else
-			queue_task_work(&m, msg, kill_me_maybe);
+			queue_task_work(&err, msg, kill_me_maybe);
 
-	} else if (m.mcgstatus & MCG_STATUS_SEAM_NR) {
+	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {
 		/*
 		 * Saved RIP on stack makes it look like the machine check
 		 * was taken in the kernel on the instruction following
@@ -1600,8 +1622,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * not occur there. Mark the page as poisoned so it won't
 		 * be added to free list when the guest is terminated.
 		 */
-		if (mce_usable_address(&m)) {
-			struct page *p = pfn_to_online_page(m.addr >> PAGE_SHIFT);
+		if (mce_usable_address(m)) {
+			struct page *p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
 
 			if (p)
 				SetPageHWPoison(p);
@@ -1616,13 +1638,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * corresponding exception handler which would do that is the
 		 * proper one.
 		 */
-		if (m.kflags & MCE_IN_KERNEL_RECOV) {
+		if (m->kflags & MCE_IN_KERNEL_RECOV) {
 			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
-				mce_panic("Failed kernel mode recovery", &m, msg);
+				mce_panic("Failed kernel mode recovery", &err, msg);
 		}
 
-		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, msg, kill_me_never);
+		if (m->kflags & MCE_IN_KERNEL_COPYIN)
+			queue_task_work(&err, msg, kill_me_never);
 	}
 
 out:
diff --git a/arch/x86/kernel/cpu/mce/dev-mcelog.c b/arch/x86/kernel/cpu/mce/dev-mcelog.c
index a05ac0716ecf..4a0e3bb4a4fb 100644
--- a/arch/x86/kernel/cpu/mce/dev-mcelog.c
+++ b/arch/x86/kernel/cpu/mce/dev-mcelog.c
@@ -36,7 +36,7 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
 static int dev_mce_log(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	unsigned int entry;
 
 	if (mce->kflags & MCE_HANDLED_CEC)
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index 4284749ec803..3337ea5c428d 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -31,15 +31,15 @@ static LLIST_HEAD(mce_event_llist);
  */
 static bool is_duplicate_mce_record(struct mce_evt_llist *t, struct mce_evt_llist *l)
 {
+	struct mce_hw_err *err1, *err2;
 	struct mce_evt_llist *node;
-	struct mce *m1, *m2;
 
-	m1 = &t->mce;
+	err1 = &t->err;
 
 	llist_for_each_entry(node, &l->llnode, llnode) {
-		m2 = &node->mce;
+		err2 = &node->err;
 
-		if (!mce_cmp(m1, m2))
+		if (!mce_cmp(&err1->m, &err2->m))
 			return true;
 	}
 	return false;
@@ -73,9 +73,9 @@ struct llist_node *mce_gen_pool_prepare_records(void)
 
 void mce_gen_pool_process(struct work_struct *__unused)
 {
+	struct mce_hw_err *err;
 	struct llist_node *head;
 	struct mce_evt_llist *node, *tmp;
-	struct mce *mce;
 
 	head = llist_del_all(&mce_event_llist);
 	if (!head)
@@ -83,8 +83,8 @@ void mce_gen_pool_process(struct work_struct *__unused)
 
 	head = llist_reverse_order(head);
 	llist_for_each_entry_safe(node, tmp, head, llnode) {
-		mce = &node->mce;
-		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
+		err = &node->err;
+		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, err);
 		gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
 	}
 }
@@ -94,11 +94,11 @@ bool mce_gen_pool_empty(void)
 	return llist_empty(&mce_event_llist);
 }
 
-int mce_gen_pool_add(struct mce *mce)
+int mce_gen_pool_add(struct mce_hw_err *err)
 {
 	struct mce_evt_llist *node;
 
-	if (filter_mce(mce))
+	if (filter_mce(&err->m))
 		return -EINVAL;
 
 	if (!mce_evt_pool)
@@ -110,7 +110,7 @@ int mce_gen_pool_add(struct mce *mce)
 		return -ENOMEM;
 	}
 
-	memcpy(&node->mce, mce, sizeof(*mce));
+	memcpy(&node->err, err, sizeof(*err));
 	llist_add(&node->llnode, &mce_event_llist);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 49ed3428785d..c65a5c4e2f22 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -502,6 +502,7 @@ static void prepare_msrs(void *info)
 
 static void do_inject(void)
 {
+	struct mce_hw_err err;
 	u64 mcg_status = 0;
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
@@ -517,7 +518,8 @@ static void do_inject(void)
 		i_mce.status |= MCI_STATUS_SYNDV;
 
 	if (inj_type == SW_INJ) {
-		mce_log(&i_mce);
+		err.m = i_mce;
+		mce_log(&err);
 		return;
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f03969e6..c79cb5b00e4c 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -26,12 +26,12 @@ extern struct blocking_notifier_head x86_mce_decoder_chain;
 
 struct mce_evt_llist {
 	struct llist_node llnode;
-	struct mce mce;
+	struct mce_hw_err err;
 };
 
 void mce_gen_pool_process(struct work_struct *__unused);
 bool mce_gen_pool_empty(void);
-int mce_gen_pool_add(struct mce *mce);
+int mce_gen_pool_add(struct mce_hw_err *err);
 int mce_gen_pool_init(void);
 struct llist_node *mce_gen_pool_prepare_records(void);
 
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index ca87a0939135..4864191918db 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -134,7 +134,7 @@ static int print_extlog_rcd(const char *pfx,
 static int extlog_print(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	int	bank = mce->bank;
 	int	cpu = mce->extcpu;
 	struct acpi_hest_generic_status *estatus, *tmp;
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index d48a388b796e..b917988db794 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -13,7 +13,7 @@
 static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct acpi_nfit_desc *acpi_desc;
 	struct nfit_spa *nfit_spa;
 
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 91e0a88ef904..d1e47cba0ff2 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -1810,7 +1810,7 @@ static void i7core_check_error(struct mem_ctl_info *mci, struct mce *m)
 static int i7core_mce_check_error(struct notifier_block *nb, unsigned long val,
 				  void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct i7core_dev *i7_dev;
 	struct mem_ctl_info *mci;
 
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index dbe9fe5f2ca6..7579c4fa6f18 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -911,7 +911,7 @@ static int ecclog_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 static int ecclog_mce_handler(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	char *type;
 
 	if (mce->kflags & MCE_HANDLED_CEC)
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 8130c3dc64da..c5fae99de781 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -792,7 +792,7 @@ static const char *decode_error_status(struct mce *m)
 static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;
 
diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 2afcd148fcf8..e2fb2d75af04 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -1366,7 +1366,7 @@ static void pnd2_unregister_mci(struct mem_ctl_info *mci)
  */
 static int pnd2_mce_check_error(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct mem_ctl_info *mci;
 	struct dram_addr daddr;
 	char *type;
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 26cca5a9322d..0c4e45245153 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -3255,7 +3255,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 static int sbridge_mce_check_error(struct notifier_block *nb, unsigned long val,
 				   void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct mem_ctl_info *mci;
 	char *type;
 
diff --git a/drivers/edac/skx_common.c b/drivers/edac/skx_common.c
index 27996b7924c8..7de89c24e06f 100644
--- a/drivers/edac/skx_common.c
+++ b/drivers/edac/skx_common.c
@@ -633,7 +633,7 @@ static bool skx_error_in_mem(const struct mce *m)
 int skx_mce_check_error(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *mce = (struct mce *)data;
+	struct mce *mce = &((struct mce_hw_err *)data)->m;
 	struct decoded_addr res;
 	struct mem_ctl_info *mci;
 	char *type;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1adc81a55734..d8cd44eb7e73 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -3824,7 +3824,7 @@ static struct amdgpu_device *find_adev(uint32_t node_id)
 static int amdgpu_bad_page_notifier(struct notifier_block *nb,
 				    unsigned long val, void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;
 	struct amdgpu_device *adev = NULL;
 	uint32_t gpu_id = 0;
 	uint32_t umc_inst = 0, ch_inst = 0;
diff --git a/drivers/ras/amd/fmpm.c b/drivers/ras/amd/fmpm.c
index 271dfad05d68..d3ce41a46ac4 100644
--- a/drivers/ras/amd/fmpm.c
+++ b/drivers/ras/amd/fmpm.c
@@ -400,7 +400,7 @@ static void retire_dram_row(u64 addr, u64 id, u32 cpu)
 
 static int fru_handle_mem_poison(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;
 	struct fru_rec *rec;
 
 	if (!mce_is_memory_error(m))
diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index e440b15fbabc..be785746f587 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -534,7 +534,7 @@ static int __init create_debugfs_nodes(void)
 static int cec_notifier(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce *m = &((struct mce_hw_err *)data)->m;
 
 	if (!m)
 		return NOTIFY_DONE;
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index f0f7b3cb2041..65aba1afcd07 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -19,9 +19,9 @@
 
 TRACE_EVENT(mce_record,
 
-	TP_PROTO(struct mce *m),
+	TP_PROTO(struct mce_hw_err *err),
 
-	TP_ARGS(m),
+	TP_ARGS(err),
 
 	TP_STRUCT__entry(
 		__field(	u64,		mcgcap		)
@@ -46,25 +46,25 @@ TRACE_EVENT(mce_record,
 	),
 
 	TP_fast_assign(
-		__entry->mcgcap		= m->mcgcap;
-		__entry->mcgstatus	= m->mcgstatus;
-		__entry->status		= m->status;
-		__entry->addr		= m->addr;
-		__entry->misc		= m->misc;
-		__entry->synd		= m->synd;
-		__entry->ipid		= m->ipid;
-		__entry->ip		= m->ip;
-		__entry->tsc		= m->tsc;
-		__entry->ppin		= m->ppin;
-		__entry->walltime	= m->time;
-		__entry->cpu		= m->extcpu;
-		__entry->cpuid		= m->cpuid;
-		__entry->apicid		= m->apicid;
-		__entry->socketid	= m->socketid;
-		__entry->cs		= m->cs;
-		__entry->bank		= m->bank;
-		__entry->cpuvendor	= m->cpuvendor;
-		__entry->microcode	= m->microcode;
+		__entry->mcgcap		= err->m.mcgcap;
+		__entry->mcgstatus	= err->m.mcgstatus;
+		__entry->status		= err->m.status;
+		__entry->addr		= err->m.addr;
+		__entry->misc		= err->m.misc;
+		__entry->synd		= err->m.synd;
+		__entry->ipid		= err->m.ipid;
+		__entry->ip		= err->m.ip;
+		__entry->tsc		= err->m.tsc;
+		__entry->ppin		= err->m.ppin;
+		__entry->walltime	= err->m.time;
+		__entry->cpu		= err->m.extcpu;
+		__entry->cpuid		= err->m.cpuid;
+		__entry->apicid		= err->m.apicid;
+		__entry->socketid	= err->m.socketid;
+		__entry->cs		= err->m.cs;
+		__entry->bank		= err->m.bank;
+		__entry->cpuvendor	= err->m.cpuvendor;
+		__entry->microcode	= err->m.microcode;
 	),
 
 	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",
-- 
2.34.1


* [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-06-25 19:56 [PATCH v2 0/4] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
  2024-06-25 19:56 ` [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
@ 2024-06-25 19:56 ` Avadhut Naik
  2024-06-26 11:10   ` Borislav Petkov
  2024-06-25 19:56 ` [PATCH v2 3/4] x86/mce/apei: Handle variable register array size Avadhut Naik
  2024-06-25 19:56 ` [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
  3 siblings, 1 reply; 21+ messages in thread
From: Avadhut Naik @ 2024-06-25 19:56 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel, linux-acpi
  Cc: linux-kernel, bp, tony.luck, rafael, tglx, mingo, rostedt, lenb,
	mchehab, james.morse, airlied, yazen.ghannam, john.allen,
	avadnaik

AMD's Scalable MCA systems viz. Genoa will include two new registers:
MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition
to the existing MCA_SYND register. The data within the registers is
considered valid if MCA_STATUS[SyndV] is set.

Add fields for these registers as vendor-specific error information
in struct mce_hw_err. Save and print these registers wherever
MCA_STATUS[SyndV]/MCA_SYND is currently used.

Also, modify the mce_record tracepoint to export these new registers
through __dynamic_array. The size of this __dynamic_array is currently
determined with the sizeof() operator. If needed in the future, the size
of the vendor-specific error information can instead be cached as part
of struct mce_hw_err.

Note: Checkpatch warnings/errors are ignored to maintain coding style.

[Yazen: Drop Yazen's Co-developed-by tag and move SoB tag.]

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
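
Illustration only, not part of the patch: a minimal userspace sketch of the
per-bank addressing and SyndV gating described above. It assumes each SMCA
bank occupies a block of 0x10 MSRs (as the MCx macros in the diff imply) and
that MCI_STATUS_SYNDV is bit 53 of MCA_STATUS, as in asm/mce.h. The names
smca_mcx_synd1(), smca_mcx_synd2(), read_amd_synd(), struct amd_vendor_info,
rdmsr_fn and fake_rdmsr() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bank 0 MSR numbers, taken from the defines added by this patch. */
#define SMCA_MC0_SYND1		0xc000200eU
#define SMCA_MC0_SYND2		0xc000200fU

/* Assumption: SyndV is bit 53 of MCA_STATUS, as in asm/mce.h. */
#define MCI_STATUS_SYNDV	(1ULL << 53)

/* Hypothetical mirror of the "amd" member of union vendor_info. */
struct amd_vendor_info {
	uint64_t synd1;
	uint64_t synd2;
};

/* Hypothetical MSR accessor type; the kernel uses rdmsrl()/mce_rdmsrl(). */
typedef uint64_t (*rdmsr_fn)(uint32_t msr);

/* Each bank's registers sit 0x10 MSRs after the previous bank's. */
static uint32_t smca_mcx_synd1(unsigned int bank)
{
	return SMCA_MC0_SYND1 + 0x10U * bank;
}

static uint32_t smca_mcx_synd2(unsigned int bank)
{
	return SMCA_MC0_SYND2 + 0x10U * bank;
}

/* Read SYND1/SYND2 only when MCA_STATUS[SyndV] says they are valid. */
static void read_amd_synd(uint64_t status, unsigned int bank,
			  rdmsr_fn rdmsr, struct amd_vendor_info *vi)
{
	assert(rdmsr != NULL && vi != NULL);

	/* The sketch zeroes the fields itself instead of relying on a memset. */
	vi->synd1 = 0;
	vi->synd2 = 0;

	if (!(status & MCI_STATUS_SYNDV))
		return;

	vi->synd1 = rdmsr(smca_mcx_synd1(bank));
	vi->synd2 = rdmsr(smca_mcx_synd2(bank));
}

/* Stand-in for a real MSR read: returns the MSR number itself. */
static uint64_t fake_rdmsr(uint32_t msr)
{
	return msr;
}
```

With the stub accessor, read_amd_synd(MCI_STATUS_SYNDV, 1, fake_rdmsr, &vi)
stores the bank 1 MSR numbers in vi, while a status without SyndV leaves both
fields zero.
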
 arch/x86/include/asm/mce.h     | 12 ++++++++++++
 arch/x86/kernel/cpu/mce/amd.c  |  5 ++++-
 arch/x86/kernel/cpu/mce/core.c | 24 +++++++++++++++++-------
 drivers/edac/mce_amd.c         | 10 +++++++---
 include/trace/events/mce.h     |  9 +++++++--
 5 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e955edb22897..2b43ba37bbda 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -122,6 +122,9 @@
 #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
 #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
 #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
+/* Registers MISC2 to MISC4 are at offsets B to D. */
+#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
+#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
 #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
@@ -132,6 +135,8 @@
 #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
+#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
 
 #define XEC(x, mask)			(((x) >> 16) & mask)
 
@@ -189,6 +194,13 @@ enum mce_notifier_prios {
 
 struct mce_hw_err {
 	struct mce m;
+
+	union vendor_info {
+		struct {
+			u64 synd1;
+			u64 synd2;
+		} amd;
+	} vi;
 };
 
 struct notifier_block;
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index cb7dc0b1aa50..fc69d244ca7f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -799,8 +799,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vi.amd.synd1);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vi.amd.synd2);
+		}
 	}
 
 	mce_log(&err);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6225143b9b14..3bb0f8b39f97 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -189,6 +189,10 @@ static void __print_mce(struct mce_hw_err *err)
 	if (mce_flags.smca) {
 		if (m->synd)
 			pr_cont("SYND %llx ", m->synd);
+		if (err->vi.amd.synd1)
+			pr_cont("SYND1 %llx ", err->vi.amd.synd1);
+		if (err->vi.amd.synd2)
+			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
 	}
@@ -639,8 +643,10 @@ static struct notifier_block mce_default_nb = {
 /*
  * Read ADDR and MISC registers.
  */
-static noinstr void mce_read_aux(struct mce *m, int i)
+static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 {
+	struct mce *m = &err->m;
+
 	if (m->status & MCI_STATUS_MISCV)
 		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
 
@@ -662,8 +668,11 @@ static noinstr void mce_read_aux(struct mce *m, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+			err->vi.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
+			err->vi.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
+		}
 	}
 }
 
@@ -766,7 +775,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (flags & MCP_DONTLOG)
 			goto clear_it;
 
-		mce_read_aux(m, i);
+		mce_read_aux(&err, i);
 		m->severity = mce_severity(m, NULL, NULL, false);
 		/*
 		 * Don't get the IP here because it's unlikely to
@@ -903,9 +912,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
+static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
 					  struct pt_regs *regs)
 {
+	struct mce *m = &err->m;
 	char *tmp = *msg;
 	int i;
 
@@ -923,7 +933,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 
 		m->bank = i;
 		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
-			mce_read_aux(m, i);
+			mce_read_aux(err, i);
 			*msg = tmp;
 			return 1;
 		}
@@ -1321,7 +1331,7 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
 		if (severity == MCE_NO_SEVERITY)
 			continue;
 
-		mce_read_aux(m, i);
+		mce_read_aux(err, i);
 
 		/* assuming valid severity level != 0 */
 		m->severity = severity;
@@ -1522,7 +1532,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	final = this_cpu_ptr(&hw_errs_seen);
 	final->m = *m;
 
-	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
+	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
 
 	barrier();
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index c5fae99de781..69e12cb2f0de 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -792,7 +792,8 @@ static const char *decode_error_status(struct mce *m)
 static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
-	struct mce *m = &((struct mce_hw_err *)data)->m;
+	struct mce_hw_err *err = (struct mce_hw_err *)data;
+	struct mce *m = &err->m;
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;
 
@@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
+		if (m->status & MCI_STATUS_SYNDV) {
+			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
+			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+				 err->vi.amd.synd1, err->vi.amd.synd2);
+		}
 
 		pr_cont("\n");
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 65aba1afcd07..9e7211eddbca 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -43,6 +43,8 @@ TRACE_EVENT(mce_record,
 		__field(	u8,		bank		)
 		__field(	u8,		cpuvendor	)
 		__field(	u32,		microcode	)
+		__field(	u8,		len	)
+		__dynamic_array(u8, v_data, sizeof(err->vi))
 	),
 
 	TP_fast_assign(
@@ -65,9 +67,11 @@ TRACE_EVENT(mce_record,
 		__entry->bank		= err->m.bank;
 		__entry->cpuvendor	= err->m.cpuvendor;
 		__entry->microcode	= err->m.microcode;
+		__entry->len		= sizeof(err->vi);
+		memcpy(__get_dynamic_array(v_data), &err->vi, sizeof(err->vi));
 	),
 
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s",
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
@@ -83,7 +87,8 @@ TRACE_EVENT(mce_record,
 		__entry->walltime,
 		__entry->socketid,
 		__entry->apicid,
-		__entry->microcode)
+		__entry->microcode,
+		__print_array(__get_dynamic_array(v_data), __entry->len / 8, 8))
 );
 
 #endif /* _TRACE_MCE_H */
-- 
2.34.1


* [PATCH v2 3/4] x86/mce/apei: Handle variable register array size
  2024-06-25 19:56 [PATCH v2 0/4] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
  2024-06-25 19:56 ` [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
  2024-06-25 19:56 ` [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
@ 2024-06-25 19:56 ` Avadhut Naik
  2024-06-26 11:57   ` Borislav Petkov
  2024-06-25 19:56 ` [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
  3 siblings, 1 reply; 21+ messages in thread
From: Avadhut Naik @ 2024-06-25 19:56 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel, linux-acpi
  Cc: linux-kernel, bp, tony.luck, rafael, tglx, mingo, rostedt, lenb,
	mchehab, james.morse, airlied, yazen.ghannam, john.allen,
	avadnaik

From: Yazen Ghannam <yazen.ghannam@amd.com>

The ACPI Boot Error Record Table (BERT) is used by the kernel to report
errors that occurred in a previous boot. On some modern AMD systems,
these errors are reported through the x86 Common Platform Error Record
(CPER) format, which consists of one or more Processor Context
Information Structures. Each context structure provides a starting
address and represents an x86 MSR range whose data is a contiguous set
of MSRs starting from, and including, the starting address.

It's common, for AMD systems that implement this behavior, that the
MSR range represents the MCAX register space used for the Scalable MCA
feature. The apei_smca_report_x86_error() function decodes and passes
this information through the MCE notifier chain. However, this function
assumes a fixed register size based on the original HW/FW implementation.

This assumption breaks with the addition of two new MCAX registers viz.
MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
MCAX register space, so they won't be included when decoding the CPER
data.

Rework apei_smca_report_x86_error() to support a variable register array
size. This covers any case where the MSR context information starts at
the MCAX address for MCA_STATUS and ends at any other register within
the MCAX register space.

Add code comments indicating the MCAX register at each offset.

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
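
Illustration only, not part of the patch: a minimal userspace sketch of the
variable-size decode described above. It assumes the register array starts
at MCA_STATUS and follows the offsets listed in the diff's code comments.
MCA_CONFIG is skipped here, as it is at this point in the series. The names
struct smca_regs, SMCA_MAX_REGS and smca_decode() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the registers this sketch extracts. */
struct smca_regs {
	uint64_t status, addr, misc, ipid, synd, synd1, synd2;
};

/* MCA_STATUS through MCA_SYND2, counted from MCA_STATUS. */
#define SMCA_MAX_REGS	15

/*
 * Decode a register array of reg_arr_size bytes (8 bytes per register).
 * Returns 0 on success, -1 if the array holds no complete register.
 */
static int smca_decode(const uint64_t *regs, size_t reg_arr_size,
		       struct smca_regs *out)
{
	size_t n = reg_arr_size / 8;

	assert(regs != NULL && out != NULL);

	if (n == 0)
		return -1;
	if (n > SMCA_MAX_REGS)
		n = SMCA_MAX_REGS;

	memset(out, 0, sizeof(*out));

	/* Enter at the last register present and fall through to the first. */
	switch (n) {
	case 15: out->synd2 = regs[14];	/* fall through */
	case 14: out->synd1 = regs[13];	/* fall through */
	case 13:			/* MCA_MISC4 */
	case 12:			/* MCA_MISC3 */
	case 11:			/* MCA_MISC2 */
	case 10:			/* MCA_MISC1 */
	case 9:				/* MCA_DEADDR */
	case 8:				/* MCA_DESTAT */
	case 7:				/* reserved */
	case 6:  out->synd = regs[5];	/* fall through */
	case 5:  out->ipid = regs[4];	/* fall through */
	case 4:				/* MCA_CONFIG, not saved here */
	case 3:  out->misc = regs[2];	/* fall through */
	case 2:  out->addr = regs[1];	/* fall through */
	case 1:  out->status = regs[0];
	}

	return 0;
}
```

A 48-byte array (the old fixed size) fills status through synd and leaves
synd1/synd2 zero, while a 120-byte array fills all fields. Shorter arrays
fill only the leading registers instead of being rejected.
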
 arch/x86/kernel/cpu/mce/apei.c | 73 +++++++++++++++++++++++++++-------
 1 file changed, 59 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index b8f4e75fb8a7..7a15f0ca1bd1 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -69,9 +69,9 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
+	unsigned int cpu, num_registers;
 	struct mce_hw_err err;
 	struct mce *m = &err.m;
-	unsigned int cpu;
 
 	memset(&err, 0, sizeof(struct mce_hw_err));
 
@@ -91,16 +91,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		return -EINVAL;
 
 	/*
-	 * The register array size must be large enough to include all the
-	 * SMCA registers which need to be extracted.
-	 *
 	 * The number of registers in the register array is determined by
 	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-	 * The register layout is fixed and currently the raw data in the
-	 * register array includes 6 SMCA registers which the kernel can
-	 * extract.
+	 * Ensure that the array size includes at least 1 register.
 	 */
-	if (ctx_info->reg_arr_size < 48)
+	num_registers = ctx_info->reg_arr_size >> 3;
+	if (!num_registers)
 		return -EINVAL;
 
 	mce_setup(m);
@@ -118,12 +114,61 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 
 	m->apicid = lapic_id;
 	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m->status = *i_mce;
-	m->addr = *(i_mce + 1);
-	m->misc = *(i_mce + 2);
-	/* Skipping MCA_CONFIG */
-	m->ipid = *(i_mce + 4);
-	m->synd = *(i_mce + 5);
+
+	/*
+	 * The SMCA register layout is fixed and includes 16 registers.
+	 * The end of the array may be variable, but the beginning is known.
+	 * Switch on the number of registers. Cap the number of registers to
+	 * expected max (15).
+	 */
+	if (num_registers > 15)
+		num_registers = 15;
+
+	switch (num_registers) {
+	/* MCA_SYND2 */
+	case 15:
+		err.vi.amd.synd2 = *(i_mce + 14);
+		fallthrough;
+	/* MCA_SYND1 */
+	case 14:
+		err.vi.amd.synd1 = *(i_mce + 13);
+		fallthrough;
+	/* MCA_MISC4 */
+	case 13:
+	/* MCA_MISC3 */
+	case 12:
+	/* MCA_MISC2 */
+	case 11:
+	/* MCA_MISC1 */
+	case 10:
+	/* MCA_DEADDR */
+	case 9:
+	/* MCA_DESTAT */
+	case 8:
+	/* reserved */
+	case 7:
+	/* MCA_SYND */
+	case 6:
+		m->synd = *(i_mce + 5);
+		fallthrough;
+	/* MCA_IPID */
+	case 5:
+		m->ipid = *(i_mce + 4);
+		fallthrough;
+	/* MCA_CONFIG */
+	case 4:
+	/* MCA_MISC0 */
+	case 3:
+		m->misc = *(i_mce + 2);
+		fallthrough;
+	/* MCA_ADDR */
+	case 2:
+		m->addr = *(i_mce + 1);
+		fallthrough;
+	/* MCA_STATUS */
+	case 1:
+		m->status = *i_mce;
+	}
 
 	mce_log(&err);
 
-- 
2.34.1


* [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-06-25 19:56 [PATCH v2 0/4] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
                   ` (2 preceding siblings ...)
  2024-06-25 19:56 ` [PATCH v2 3/4] x86/mce/apei: Handle variable register array size Avadhut Naik
@ 2024-06-25 19:56 ` Avadhut Naik
  2024-06-26 12:04   ` Borislav Petkov
  3 siblings, 1 reply; 21+ messages in thread
From: Avadhut Naik @ 2024-06-25 19:56 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel, linux-acpi
  Cc: linux-kernel, bp, tony.luck, rafael, tglx, mingo, rostedt, lenb,
	mchehab, james.morse, airlied, yazen.ghannam, john.allen,
	avadnaik

From: Yazen Ghannam <yazen.ghannam@amd.com>

A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).

The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
covers multiple devices, for example, a Unified Memory Controller (UMC)
bank that manages two DIMMs.

Print the FRU Text string, if available, when decoding an MCA error.

Also, add a field for the MCA_CONFIG MSR in struct mce_hw_err as
vendor-specific error information and save the value of the MSR. This
value can then be exported through the tracepoint so that userspace tools
like rasdaemon can print the FRU Text, if available.

Note: Checkpatch checks/warnings are ignored to maintain coding style.

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
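
Illustration only, not part of the patch: a minimal userspace sketch of how
a consumer might rebuild the FRU Text from the two syndrome values, in the
same way as the decoder change in the diff. It assumes the text is the 8 raw
bytes of MCA_SYND1 followed by the 8 raw bytes of MCA_SYND2 in memory order,
and that the caller has already checked MCA_STATUS[SyndV]. The names
amd_frutext() and FRUTEXT_LEN are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* McaFruTextInMca, MCA_CONFIG[9], as defined by this patch. */
#define MCI_CONFIG_FRUTEXT	(1ULL << 9)

/* Two 8-byte registers; the output buffer needs one extra byte for NUL. */
#define FRUTEXT_LEN		16

/*
 * Copy the FRU Text into out (FRUTEXT_LEN + 1 bytes, always terminated).
 * Returns 1 if the bank advertises FRU Text, 0 otherwise (out is empty).
 */
static int amd_frutext(uint64_t config, uint64_t synd1, uint64_t synd2,
		       char *out)
{
	assert(out != NULL);

	memset(out, 0, FRUTEXT_LEN + 1);

	if (!(config & MCI_CONFIG_FRUTEXT))
		return 0;

	/* Raw byte copy, so the result follows the host's byte order. */
	memcpy(&out[0], &synd1, 8);
	memcpy(&out[8], &synd2, 8);

	return 1;
}
```

A tool reading the tracepoint's vendor data could call this with the saved
config, synd1 and synd2 values and fall back to printing the raw syndromes
when it returns 0.
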
 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/mce/amd.c  |  1 +
 arch/x86/kernel/cpu/mce/apei.c |  2 ++
 arch/x86/kernel/cpu/mce/core.c |  3 +++
 drivers/edac/mce_amd.c         | 21 ++++++++++++++-------
 5 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 2b43ba37bbda..c6dea9c12498 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
  *  - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
+#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
@@ -199,6 +200,7 @@ struct mce_hw_err {
 		struct {
 			u64 synd1;
 			u64 synd2;
+			u64 config;
 		} amd;
 	} vi;
 };
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index fc69d244ca7f..f690905aa04f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -798,6 +798,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
+		rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vi.amd.config);
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7a15f0ca1bd1..ba8947983dc7 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -157,6 +157,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		fallthrough;
 	/* MCA_CONFIG */
 	case 4:
+		err.vi.amd.config = *(i_mce + 3);
+		fallthrough;
 	/* MCA_MISC0 */
 	case 3:
 		m->misc = *(i_mce + 2);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 3bb0f8b39f97..cbd10e499a28 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -195,6 +195,8 @@ static void __print_mce(struct mce_hw_err *err)
 			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
+		if (err->vi.amd.config)
+			pr_cont("CONFIG %llx ", err->vi.amd.config);
 	}
 
 	pr_cont("\n");
@@ -667,6 +669,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
+		err->vi.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 69e12cb2f0de..6ae6b89b1a1e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce_hw_err *err = (struct mce_hw_err *)data;
 	struct mce *m = &err->m;
 	unsigned int fam = x86_family(m->cpuid);
+	u64 mca_config = err->vi.amd.config;
 	int ecc;
 
 	if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
-
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
 
 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vi.amd.synd1, err->vi.amd.synd2);
+			if (mca_config & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+
+				memset(frutext, 0, sizeof(frutext));
+				memcpy(&frutext[0], &err->vi.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vi.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			} else {
+				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+					 err->vi.amd.synd1, err->vi.amd.synd2);
+			}
 		}
 
 		pr_cont("\n");
-- 
2.34.1


* Re: [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-06-25 19:56 ` [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
@ 2024-06-26 10:44   ` Borislav Petkov
  2024-06-26 17:11     ` Luck, Tony
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2024-06-26 10:44 UTC (permalink / raw)
  To: tony.luck
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-acpi,
	linux-kernel, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, avadnaik

On Tue, Jun 25, 2024 at 02:56:21PM -0500, Avadhut Naik wrote:
> Currently, exporting new additional machine check error information
> involves adding new fields for the same at the end of the struct mce.
> This additional information can then be consumed through mcelog or
> tracepoint.
> 
> However, as new MSRs are being added (and will be added in the future)
> by CPU vendors on their newer CPUs with additional machine check error
> information to be exported, the size of struct mce will balloon on some
> CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
> different CPU vendors may export the additional information in varying
> sizes.
> 
> The problem particularly intensifies since struct mce is exposed to
> userspace as part of UAPI. Its bloating through vendor-specific data
> should be avoided to limit the information being sent out to userspace.
> 
> Add a new structure mce_hw_err to wrap the existing struct mce. The same
> will prevent its ballooning since vendor-specific data, if any, can now be
> exported through a union within the wrapper structure and through
> __dynamic_array in mce_record tracepoint.
> 
> Furthermore, new internal kernel fields can be added to the wrapper
> struct without impacting the user space API.
> 
> Note: Some Checkpatch checks have been ignored to maintain coding style.
> 
> [Yazen: Add last commit message paragraph.]
> 
> Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
>  arch/x86/include/asm/mce.h              |   6 +-
>  arch/x86/kernel/cpu/mce/amd.c           |  29 ++--
>  arch/x86/kernel/cpu/mce/apei.c          |  54 +++----
>  arch/x86/kernel/cpu/mce/core.c          | 178 +++++++++++++-----------
>  arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
>  arch/x86/kernel/cpu/mce/genpool.c       |  20 +--
>  arch/x86/kernel/cpu/mce/inject.c        |   4 +-
>  arch/x86/kernel/cpu/mce/internal.h      |   4 +-
>  drivers/acpi/acpi_extlog.c              |   2 +-
>  drivers/acpi/nfit/mce.c                 |   2 +-
>  drivers/edac/i7core_edac.c              |   2 +-
>  drivers/edac/igen6_edac.c               |   2 +-
>  drivers/edac/mce_amd.c                  |   2 +-
>  drivers/edac/pnd2_edac.c                |   2 +-
>  drivers/edac/sb_edac.c                  |   2 +-
>  drivers/edac/skx_common.c               |   2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   2 +-
>  drivers/ras/amd/fmpm.c                  |   2 +-
>  drivers/ras/cec.c                       |   2 +-
>  include/trace/events/mce.h              |  42 +++---
>  20 files changed, 199 insertions(+), 162 deletions(-)

Ok, did some minor massaging but otherwise looks ok now.

Tony, any comments? You ok with this, would that fit any Intel-specific vendor
fields too or do you need some additional Intel-specific changes?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-06-25 19:56 ` [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
@ 2024-06-26 11:10   ` Borislav Petkov
  2024-06-26 17:24     ` Naik, Avadhut
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2024-06-26 11:10 UTC (permalink / raw)
  To: Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, avadnaik

On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
> AMD's Scalable MCA systems viz. Genoa will include two new registers:

"viz."?

Not a lot of people outside of AMD know what Genoa is. Zen4 is probably a lot
more widespread.

> MCA_SYND1 and MCA_SYND2.
> 
> These registers will include supplemental error information in addition
> to the existing MCA_SYND register. The data within the registers is
> considered valid if MCA_STATUS[SyndV] is set.

From here...

> Add fields for these registers as vendor-specific error information
> in struct mce_hw_err. Save and print these registers wherever
> MCA_STATUS[SyndV]/MCA_SYND is currently used.
> 
> Also, modify the mce_record tracepoint to export these new registers
> through __dynamic_array. While the sizeof() operator has been used to
> determine the size of this __dynamic_array, the same, if needed in the
> future can be substituted by caching the size of vendor-specific error
> information as part of struct mce_hw_err.

... to here this text explains what the patch does. I guess it is time for my
boilerplate text again:

Do not talk about *what* the patch is doing in the commit message - that
should be obvious from the diff itself. Rather, concentrate on the *why*
it needs to be done.

Imagine one fine day you're doing git archeology, you find the place in
the code about which you want to find out why it was changed the way it 
is now.

You do git annotate <filename> ... find the line, see the commit id and
you do:

git show <commit id>

You read the commit message and there's just gibberish and nothing's
explaining *why* that change was done. And you start scratching your
head, trying to figure out why. Because the damn commit message is worth
sh*t.

This happens to us maintainers at least once a week. Well, I don't want
that to happen in my tree anymore.

You catch my drift? :)

So, now, how are those new syndromes going to be used in the tracepoint and
why do we want them there?

> Note: Checkpatch warnings/errors are ignored to maintain coding style.

This goes...

> 
> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]

Yes, you did but now your SOB chain is wrong:

> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>

This tells me Avadhut is the author, Yazen handled it and he's sending it to
me. But nope, he isn't. So it needs another Avadhut SOB underneath.

Audit all patches pls.

> ---

... right under those three "---" as such notes do not belong in the commit
message. Remember that for the future.

>  arch/x86/include/asm/mce.h     | 12 ++++++++++++
>  arch/x86/kernel/cpu/mce/amd.c  |  5 ++++-
>  arch/x86/kernel/cpu/mce/core.c | 24 +++++++++++++++++-------
>  drivers/edac/mce_amd.c         | 10 +++++++---
>  include/trace/events/mce.h     |  9 +++++++--
>  5 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index e955edb22897..2b43ba37bbda 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -122,6 +122,9 @@
>  #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
>  #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
>  #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
> +/* Registers MISC2 to MISC4 are at offsets B to D. */
> +#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
> +#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
>  #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
> @@ -132,6 +135,8 @@
>  #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
>  #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
> +#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
> +#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
>  
>  #define XEC(x, mask)			(((x) >> 16) & mask)
>  
> @@ -189,6 +194,13 @@ enum mce_notifier_prios {
>  
>  struct mce_hw_err {
>  	struct mce m;
> +
> +	union vendor_info {
> +		struct {
> +			u64 synd1;
> +			u64 synd2;
> +		} amd;

I presume the intent here is for Intel or other vendors to add their
vendor-specific stuff here too?

I'm also expecting that shared fields will be promoted up to the common struct
namespace. Pls add a short comment explaining what the goal with that struct
is.

> +	} vi;

Call that "vendor" so that in the code you can have

	err.vendor.amd.

or

	err.vendor.intel.

and so on so that it is perfectly clear what this is.
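A minimal userspace sketch of the naming suggested above, assuming stand-in
types. The two-field struct mce and the helper set_amd_synd() are hypothetical
and exist only for illustration; the kernel would use u64 and the real
struct mce.

```c
#include <stdint.h>

/* Stand-in for the kernel's struct mce; only two fields are shown. */
struct mce {
	uint64_t status;
	uint64_t synd;
};

struct mce_hw_err {
	struct mce m;

	/* Vendor-specific error information, one member struct per vendor. */
	union {
		struct {
			uint64_t synd1;	/* MCA_SYND1 */
			uint64_t synd2;	/* MCA_SYND2 */
		} amd;
	} vendor;
};

/* Hypothetical helper: store both supplemental syndrome values. */
void set_amd_synd(struct mce_hw_err *err, uint64_t synd1, uint64_t synd2)
{
	err->vendor.amd.synd1 = synd1;
	err->vendor.amd.synd2 = synd2;
}
```

With the union named "vendor", every access site spells out whose data it
touches, and a future err.vendor.intel member would sit next to the AMD one.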

>  };
>  
>  struct notifier_block;
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index cb7dc0b1aa50..fc69d244ca7f 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -799,8 +799,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
>  	if (mce_flags.smca) {
>  		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
>  
> -		if (m->status & MCI_STATUS_SYNDV)
> +		if (m->status & MCI_STATUS_SYNDV) {
>  			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vi.amd.synd1);
> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vi.amd.synd2);
> +		}
>  	}
>  
>  	mce_log(&err);
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 6225143b9b14..3bb0f8b39f97 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -189,6 +189,10 @@ static void __print_mce(struct mce_hw_err *err)
>  	if (mce_flags.smca) {
>  		if (m->synd)
>  			pr_cont("SYND %llx ", m->synd);
> +		if (err->vi.amd.synd1)
> +			pr_cont("SYND1 %llx ", err->vi.amd.synd1);
> +		if (err->vi.amd.synd2)
> +			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
>  		if (m->ipid)
>  			pr_cont("IPID %llx ", m->ipid);
>  	}
> @@ -639,8 +643,10 @@ static struct notifier_block mce_default_nb = {
>  /*
>   * Read ADDR and MISC registers.
>   */
> -static noinstr void mce_read_aux(struct mce *m, int i)
> +static noinstr void mce_read_aux(struct mce_hw_err *err, int i)

This whole conversion to struct mce_hw_err here belongs logically into patch
1.

>  {
> +	struct mce *m = &err->m;
> +
>  	if (m->status & MCI_STATUS_MISCV)
>  		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
>  
> @@ -662,8 +668,11 @@ static noinstr void mce_read_aux(struct mce *m, int i)
>  	if (mce_flags.smca) {
>  		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
>  
> -		if (m->status & MCI_STATUS_SYNDV)
> +		if (m->status & MCI_STATUS_SYNDV) {
>  			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
> +			err->vi.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
> +			err->vi.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
> +		}
>  	}
>  }
>  
> @@ -766,7 +775,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  		if (flags & MCP_DONTLOG)
>  			goto clear_it;
>  
> -		mce_read_aux(m, i);
> +		mce_read_aux(&err, i);
>  		m->severity = mce_severity(m, NULL, NULL, false);
>  		/*
>  		 * Don't get the IP here because it's unlikely to
> @@ -903,9 +912,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
>   * Do a quick check if any of the events requires a panic.
>   * This decides if we keep the events around or clear them.
>   */
> -static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
> +static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
>  					  struct pt_regs *regs)
>  {
> +	struct mce *m = &err->m;
>  	char *tmp = *msg;
>  	int i;
>  
> @@ -923,7 +933,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
>  
>  		m->bank = i;
>  		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
> -			mce_read_aux(m, i);
> +			mce_read_aux(err, i);
>  			*msg = tmp;
>  			return 1;
>  		}
> @@ -1321,7 +1331,7 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
>  		if (severity == MCE_NO_SEVERITY)
>  			continue;
>  
> -		mce_read_aux(m, i);
> +		mce_read_aux(err, i);
>  
>  		/* assuming valid severity level != 0 */
>  		m->severity = severity;
> @@ -1522,7 +1532,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  	final = this_cpu_ptr(&hw_errs_seen);
>  	final->m = *m;
>  
> -	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
> +	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
>  
>  	barrier();
>  
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index c5fae99de781..69e12cb2f0de 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -792,7 +792,8 @@ static const char *decode_error_status(struct mce *m)
>  static int
>  amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  {
> -	struct mce *m = &((struct mce_hw_err *)data)->m;
> +	struct mce_hw_err *err = (struct mce_hw_err *)data;
> +	struct mce *m = &err->m;
>  	unsigned int fam = x86_family(m->cpuid);
>  	int ecc;
>  
> @@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
>  		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
>  
> -		if (m->status & MCI_STATUS_SYNDV)
> -			pr_cont(", Syndrome: 0x%016llx", m->synd);
> +		if (m->status & MCI_STATUS_SYNDV) {
> +			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> +			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> +				 err->vi.amd.synd1, err->vi.amd.synd2);
> +		}
>  
>  		pr_cont("\n");
>  
> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
> index 65aba1afcd07..9e7211eddbca 100644
> --- a/include/trace/events/mce.h
> +++ b/include/trace/events/mce.h
> @@ -43,6 +43,8 @@ TRACE_EVENT(mce_record,
>  		__field(	u8,		bank		)
>  		__field(	u8,		cpuvendor	)
>  		__field(	u32,		microcode	)
> +		__field(	u8,		len	)
> +		__dynamic_array(u8, v_data, sizeof(err->vi))
>  	),
>  
>  	TP_fast_assign(
> @@ -65,9 +67,11 @@ TRACE_EVENT(mce_record,
>  		__entry->bank		= err->m.bank;
>  		__entry->cpuvendor	= err->m.cpuvendor;
>  		__entry->microcode	= err->m.microcode;
> +		__entry->len		= sizeof(err->vi);
> +		memcpy(__get_dynamic_array(v_data), &err->vi, sizeof(err->vi));

So that vendor data layout - is that ABI too? Or are we free to shuffle the
fields around in the future or even remove some?

This all needs to be specified somewhere explicitly so that nothing relies on
that layout.

And I'm not sure that that's enough because when userspace tools start using
them, then they're practically an ABI so you can't change them even if you
wanted to.

So is libtraceevent or all the other libraries going to parse this as a blob
and it is always going to remain such?

But then the tools which interpret it need to know its layout and if it
changes, perhaps check kernel version which then becomes RealUgly(tm).

So you might just as well dump the separate fields one by one, without
a dynamic array.

Or do a dynamic array but specify that the layout of struct
mce_hw_err.vendor.amd is cast in stone so that we're all clear on what goes
where.

Questions over questions...
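
To make the layout concern concrete, here is a rough sketch of what a
userspace consumer of the v_data blob might end up doing. Everything in it is
an assumption for illustration: struct amd_vendor_data and parse_amd_v_data()
are hypothetical names, not libtraceevent or rasdaemon API, and the blob is
assumed to hold SYND1 followed by SYND2 in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed layout of the vendor blob: SYND1 followed by SYND2. */
struct amd_vendor_data {
	uint64_t synd1;
	uint64_t synd2;
};

/*
 * Hypothetical consumer-side helper: copy the blob out of a trace record.
 * Returns 0 on success, -1 if the arguments are invalid or the record is
 * shorter than the assumed layout.
 */
int parse_amd_v_data(const uint8_t *buf, size_t len, struct amd_vendor_data *out)
{
	if (!buf || !out || len < sizeof(*out))
		return -1;

	memcpy(out, buf, sizeof(*out));
	return 0;
}
```

Once a tool hard-codes offsets like this, reordering or removing fields on the
kernel side silently breaks it, which is why the layout would have to be
treated as fixed.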

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v2 3/4] x86/mce/apei: Handle variable register array size
  2024-06-25 19:56 ` [PATCH v2 3/4] x86/mce/apei: Handle variable register array size Avadhut Naik
@ 2024-06-26 11:57   ` Borislav Petkov
  2024-06-26 17:28     ` Naik, Avadhut
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2024-06-26 11:57 UTC (permalink / raw)
  To: Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, avadnaik

On Tue, Jun 25, 2024 at 02:56:23PM -0500, Avadhut Naik wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> ACPI Boot Error Record Table (BERT) is being used by the kernel to
> report errors that occurred in a previous boot. On some modern AMD
> systems, these very errors within the BERT are reported through the
> x86 Common Platform Error Record (CPER) format which consists of one
> or more Processor Context Information Structures. These context
> structures provide a starting address and represent an x86 MSR range
> in which the data constitutes a contiguous set of MSRs starting from,
> and including the starting address.
> 
> It's common, for AMD systems that implement this behavior, that the
> MSR range represents the MCAX register space used for the Scalable MCA
> feature. The apei_smca_report_x86_error() function decodes and passes
> this information through the MCE notifier chain. However, this function
> assumes a fixed register size based on the original HW/FW implementation.
> 
> This assumption breaks with the addition of two new MCAX registers viz.
> MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
> MCAX register space, so they won't be included when decoding the CPER
> data.
> 
> Rework apei_smca_report_x86_error() to support a variable register array
> size. This covers any case where the MSR context information starts at
> the MCAX address for MCA_STATUS and ends at any other register within
> the MCAX register space.
> 
> Add code comments indicating the MCAX register at each offset.
> 
> [Yazen: Add Avadhut as co-developer for wrapper changes.]
> 
> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>

This needs Avadhut's SOB after Yazen's.

Touchups on top:

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7a15f0ca1bd1..6bbeb29125a9 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -69,7 +69,7 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
-	unsigned int cpu, num_registers;
+	unsigned int cpu, num_regs;
 	struct mce_hw_err err;
 	struct mce *m = &err.m;
 
@@ -93,10 +93,10 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	/*
 	 * The number of registers in the register array is determined by
 	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-	 * Ensure that the array size includes at least 1 register.
+	 * Sanity-check registers array size.
 	 */
-	num_registers = ctx_info->reg_arr_size >> 3;
-	if (!num_registers)
+	num_regs = ctx_info->reg_arr_size >> 3;
+	if (!num_regs)
 		return -EINVAL;
 
 	mce_setup(m);
@@ -118,13 +118,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	/*
 	 * The SMCA register layout is fixed and includes 16 registers.
 	 * The end of the array may be variable, but the beginning is known.
-	 * Switch on the number of registers. Cap the number of registers to
-	 * expected max (15).
+	 * Cap the number of registers to expected max (15).
 	 */
-	if (num_registers > 15)
-		num_registers = 15;
+	if (num_regs > 15)
+		num_regs = 15;
 
-	switch (num_registers) {
+	switch (num_regs) {
 	/* MCA_SYND2 */
 	case 15:
 		err.vi.amd.synd2 = *(i_mce + 14);
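
The cap-then-switch pattern in the hunk above can be sketched in plain C as
follows. This is a simplified illustration, not the kernel function:
struct smca_decoded and decode_smca_regs() are hypothetical names, only
MCA_STATUS, MCA_SYND1 and MCA_SYND2 are decoded, and index 13 for MCA_SYND1
is an assumption inferred from index 14 for MCA_SYND2 and the adjacent MSR
addresses defined in patch 2.

```c
#include <stdint.h>
#include <string.h>

#define SMCA_MAX_REGS	15	/* register array starts at MCA_STATUS */

struct smca_decoded {
	uint64_t status;
	uint64_t synd1;
	uint64_t synd2;
};

/*
 * Decode a register array of variable length. Larger arrays fall through
 * to the cases for smaller ones, so every register that is present gets
 * picked up and every missing one stays zero.
 */
int decode_smca_regs(const uint64_t *regs, unsigned int num_regs,
		     struct smca_decoded *out)
{
	if (!regs || !out || !num_regs)
		return -1;

	/* Never read past the last register this code knows about. */
	if (num_regs > SMCA_MAX_REGS)
		num_regs = SMCA_MAX_REGS;

	memset(out, 0, sizeof(*out));

	switch (num_regs) {
	/* MCA_SYND2 */
	case 15:
		out->synd2 = regs[14];
		/* fall through */
	/* MCA_SYND1 (index assumed, see above) */
	case 14:
		out->synd1 = regs[13];
		/* fall through */
	/* Indices 1..12 are omitted from this sketch. */
	default:
		out->status = regs[0];
	}

	return 0;
}
```

An array holding 14 registers yields synd1 but leaves synd2 zero, and an
oversized array is capped rather than rejected.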

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-06-25 19:56 ` [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
@ 2024-06-26 12:04   ` Borislav Petkov
  2024-06-26 18:00     ` Naik, Avadhut
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2024-06-26 12:04 UTC (permalink / raw)
  To: Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, avadnaik

On Tue, Jun 25, 2024 at 02:56:24PM -0500, Avadhut Naik wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> A new "FRU Text in MCA" feature is defined where the Field Replaceable
> Unit (FRU) Text for a device is represented by a string in the new
> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> 
> The FRU Text is populated dynamically for each individual error state
> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> covers multiple devices, for example, a Unified Memory Controller (UMC)
> bank that manages two DIMMs.
> 

From here...

> Print the FRU Text string, if available, when decoding an MCA error.
> 
> Also, add field for MCA_CONFIG MSR in struct mce_hw_err as vendor specific
> error information and save the value of the MSR. The very value can then be
> exported through tracepoint for userspace tools like rasdaemon to print FRU
> Text, if available.
> 
>  Note: Checkpatch checks/warnings are ignored to maintain coding style.

... to here goes into the trash can, except the explanation of why MCA_CONFIG
is being logged as part of the error.

> [Yazen: Add Avadhut as co-developer for wrapper changes. ]
> 
> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>

Ditto as for patch 3.

> ---

> @@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  
>  		if (m->status & MCI_STATUS_SYNDV) {
>  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> -				 err->vi.amd.synd1, err->vi.amd.synd2);
> +			if (mca_config & MCI_CONFIG_FRUTEXT) {
> +				char frutext[17];
> +
> +				memset(frutext, 0, sizeof(frutext));

Why are you clearing it if you're overwriting it immediately?
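
One possible way to avoid the memset, sketched here as standalone C under the
assumption that its only purpose is NUL termination: the two copies fill bytes
0 to 15 of the 17-byte buffer, so only the last byte needs to be set.
fru_text_from_synd() is a hypothetical helper name and not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build the 16-character FRU text from the raw bytes of the two syndrome
 * values. The caller provides a buffer of at least 17 bytes.
 */
void fru_text_from_synd(uint64_t synd1, uint64_t synd2, char out[17])
{
	memcpy(&out[0], &synd1, 8);
	memcpy(&out[8], &synd2, 8);

	/* Bytes 0..15 were just overwritten; terminate the string. */
	out[16] = '\0';
}
```
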

> +				memcpy(&frutext[0], &err->vi.amd.synd1, 8);
> +				memcpy(&frutext[8], &err->vi.amd.synd2, 8);
> +
> +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
> +			} else {
> +				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> +					 err->vi.amd.synd1, err->vi.amd.synd2);
> +			}
>  		}
>  
>  		pr_cont("\n");
> -- 
> 2.34.1
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* RE: [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-06-26 10:44   ` Borislav Petkov
@ 2024-06-26 17:11     ` Luck, Tony
  2024-06-26 18:10       ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2024-06-26 17:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Avadhut Naik, x86@kernel.org, linux-edac@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, rafael@kernel.org,
	tglx@linutronix.de, mingo@redhat.com, rostedt@goodmis.org,
	lenb@kernel.org, mchehab@kernel.org, james.morse@arm.com,
	airlied@gmail.com, yazen.ghannam@amd.com, john.allen@amd.com,
	avadnaik@amd.com

> Tony, any comments? You ok with this, would that fit any Intel-specific vendor
> fields too or do you need some additional Intel-specific changes?

It looks easy enough to add any Intel specific bits to the union later.

Is there any way that the trace event could be "smarter" about what vendor specific
information to include based on boot_cpu_data.x86_vendor? As currently written,
Intel systems are going to see 3*u64 decoded into ASCII that are all zero. Not a
huge deal, I think it will just look like "0x0,0x0,0x0".

-Tony


* [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-06-26 11:10   ` Borislav Petkov
@ 2024-06-26 17:24     ` Naik, Avadhut
  2024-06-26 18:18       ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Naik, Avadhut @ 2024-06-26 17:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, Avadhut Naik



On 6/26/2024 06:10, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
>> AMD's Scalable MCA systems viz. Genoa will include two new registers:
> 
> "viz."?
> 
Right. Will mention Zen4 instead of Genoa.

> Not a lot of people outside of AMD know what Genoa is. Zen4 is probably a lot
> more widespread.
> 
>> MCA_SYND1 and MCA_SYND2.
>>
>> These registers will include supplemental error information in addition
>> to the existing MCA_SYND register. The data within the registers is
>> considered valid if MCA_STATUS[SyndV] is set.
> 
> From here...
> 
>> Add fields for these registers as vendor-specific error information
>> in struct mce_hw_err. Save and print these registers wherever
>> MCA_STATUS[SyndV]/MCA_SYND is currently used.
>>
>> Also, modify the mce_record tracepoint to export these new registers
>> through __dynamic_array. While the sizeof() operator has been used to
>> determine the size of this __dynamic_array, the same, if needed in the
>> future can be substituted by caching the size of vendor-specific error
>> information as part of struct mce_hw_err.
> 
> ... to here this text explains what the patch does. I guess it is time for my
> boilerplate text again:
> 
> Do not talk about *what* the patch is doing in the commit message - that
> should be obvious from the diff itself. Rather, concentrate on the *why*
> it needs to be done.
> 
> Imagine one fine day you're doing git archeology, you find the place in
> the code about which you want to find out why it was changed the way it 
> is now.
> 
> You do git annotate <filename> ... find the line, see the commit id and
> you do:
> 
> git show <commit id>
> 
> You read the commit message and there's just gibberish and nothing's
> explaining *why* that change was done. And you start scratching your
> head, trying to figure out why. Because the damn commit message is worth
> sh*t.
> 
> This happens to us maintainers at least once a week. Well, I don't want
> that to happen in my tree anymore.
> 
> You catch my drift? :)
> 
> So, now, how are those new syndromes going to be used in the tracepoint and
> why do we want them there?
> 
Yes, I catch your drift. Will reword the commit message to explain that the
new syndrome registers are going to be exported through the tracepoint
in a dynamic array, as they are vendor-specific, so that userspace error
decoding tools can retrieve the supplemental error information within them.

>> Note: Checkpatch warnings/errors are ignored to maintain coding style.
> 
> This goes...
> 
>>
>> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
> 
> Yes, you did but now your SOB chain is wrong:
> 
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> This tells me Avadhut is the author, Yazen handled it and he's sending it to
> me. But nope, he isn't. So it needs another Avadhut SOB underneath.
> 
> Audit all patches pls.
> 
Wasn't aware of this chronology. Thanks for this information!
So, IIUC, the sequence for this patch should be as follows?

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

>> ---
> 
> ... right under those three "---" as such notes do not belong in the commit
> message. Remember that for the future.
> 
Okay. Will move the note here.

>>  arch/x86/include/asm/mce.h     | 12 ++++++++++++
>>  arch/x86/kernel/cpu/mce/amd.c  |  5 ++++-
>>  arch/x86/kernel/cpu/mce/core.c | 24 +++++++++++++++++-------
>>  drivers/edac/mce_amd.c         | 10 +++++++---
>>  include/trace/events/mce.h     |  9 +++++++--
>>  5 files changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index e955edb22897..2b43ba37bbda 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -122,6 +122,9 @@
>>  #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
>>  #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
>>  #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
>> +/* Registers MISC2 to MISC4 are at offsets B to D. */
>> +#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
>> +#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
>>  #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
>> @@ -132,6 +135,8 @@
>>  #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
>>  #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
>> +#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
>> +#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
>>  
>>  #define XEC(x, mask)			(((x) >> 16) & mask)
>>  
>> @@ -189,6 +194,13 @@ enum mce_notifier_prios {
>>  
>>  struct mce_hw_err {
>>  	struct mce m;
>> +
>> +	union vendor_info {
>> +		struct {
>> +			u64 synd1;
>> +			u64 synd2;
>> +		} amd;
> 
> I presume the intent here is for Intel or other vendors to add their
> vendor-specific stuff here too?
> 
> I'm also expecting that shared fields will be promoted up to the common struct
> namespace. Pls add a short comment explaining what the goal with that struct
> is.
> 
Yes, other vendors can export their vendor-specific data through their own
structure within the union. Yes, shared fields can be promoted to the common
structure. Will add a comment to explain the end goal.

>> +	} vi;
> 
> Call that "vendor" so that in the code you can have
> 
> 	err.vendor.amd.
> 
> or
> 
> 	err.vendor.intel.
> 
> and so on so that it is perfectly clear what this is.
> 
Will do.

>>  };
>>  
>>  struct notifier_block;
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index cb7dc0b1aa50..fc69d244ca7f 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -799,8 +799,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
>>  	if (mce_flags.smca) {
>>  		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> +		if (m->status & MCI_STATUS_SYNDV) {
>>  			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
>> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vi.amd.synd1);
>> +			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vi.amd.synd2);
>> +		}
>>  	}
>>  
>>  	mce_log(&err);
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index 6225143b9b14..3bb0f8b39f97 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -189,6 +189,10 @@ static void __print_mce(struct mce_hw_err *err)
>>  	if (mce_flags.smca) {
>>  		if (m->synd)
>>  			pr_cont("SYND %llx ", m->synd);
>> +		if (err->vi.amd.synd1)
>> +			pr_cont("SYND1 %llx ", err->vi.amd.synd1);
>> +		if (err->vi.amd.synd2)
>> +			pr_cont("SYND2 %llx ", err->vi.amd.synd2);
>>  		if (m->ipid)
>>  			pr_cont("IPID %llx ", m->ipid);
>>  	}
>> @@ -639,8 +643,10 @@ static struct notifier_block mce_default_nb = {
>>  /*
>>   * Read ADDR and MISC registers.
>>   */
>> -static noinstr void mce_read_aux(struct mce *m, int i)
>> +static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
> 
> This whole conversion to struct mce_hw_err here belongs logically into patch
> 1.
> 
Had considered this. But struct mce_hw_err *err wouldn't really be used in
mce_read_aux() in patch 1. Only struct mce m, which is already available, will
be used.
Hence, deferred the change to this patch where usage of struct mce_hw_err *err
is actually introduced in mce_read_aux().

Do you prefer having this change in patch 1 instead?

>>  {
>> +	struct mce *m = &err->m;
>> +
>>  	if (m->status & MCI_STATUS_MISCV)
>>  		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
>>  
>> @@ -662,8 +668,11 @@ static noinstr void mce_read_aux(struct mce *m, int i)
>>  	if (mce_flags.smca) {
>>  		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> +		if (m->status & MCI_STATUS_SYNDV) {
>>  			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
>> +			err->vi.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
>> +			err->vi.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
>> +		}
>>  	}
>>  }
>>  
>> @@ -766,7 +775,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>>  		if (flags & MCP_DONTLOG)
>>  			goto clear_it;
>>  
>> -		mce_read_aux(m, i);
>> +		mce_read_aux(&err, i);
>>  		m->severity = mce_severity(m, NULL, NULL, false);
>>  		/*
>>  		 * Don't get the IP here because it's unlikely to
>> @@ -903,9 +912,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
>>   * Do a quick check if any of the events requires a panic.
>>   * This decides if we keep the events around or clear them.
>>   */
>> -static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
>> +static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
>>  					  struct pt_regs *regs)
>>  {
>> +	struct mce *m = &err->m;
>>  	char *tmp = *msg;
>>  	int i;
>>  
>> @@ -923,7 +933,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
>>  
>>  		m->bank = i;
>>  		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
>> -			mce_read_aux(m, i);
>> +			mce_read_aux(err, i);
>>  			*msg = tmp;
>>  			return 1;
>>  		}
>> @@ -1321,7 +1331,7 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs, struct mce *final,
>>  		if (severity == MCE_NO_SEVERITY)
>>  			continue;
>>  
>> -		mce_read_aux(m, i);
>> +		mce_read_aux(err, i);
>>  
>>  		/* assuming valid severity level != 0 */
>>  		m->severity = severity;
>> @@ -1522,7 +1532,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>>  	final = this_cpu_ptr(&hw_errs_seen);
>>  	final->m = *m;
>>  
>> -	no_way_out = mce_no_way_out(m, &msg, valid_banks, regs);
>> +	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
>>  
>>  	barrier();
>>  
>> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
>> index c5fae99de781..69e12cb2f0de 100644
>> --- a/drivers/edac/mce_amd.c
>> +++ b/drivers/edac/mce_amd.c
>> @@ -792,7 +792,8 @@ static const char *decode_error_status(struct mce *m)
>>  static int
>>  amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  {
>> -	struct mce *m = &((struct mce_hw_err *)data)->m;
>> +	struct mce_hw_err *err = (struct mce_hw_err *)data;
>> +	struct mce *m = &err->m;
>>  	unsigned int fam = x86_family(m->cpuid);
>>  	int ecc;
>>  
>> @@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
>>  		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
>>  
>> -		if (m->status & MCI_STATUS_SYNDV)
>> -			pr_cont(", Syndrome: 0x%016llx", m->synd);
>> +		if (m->status & MCI_STATUS_SYNDV) {
>> +			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
>> +			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
>> +				 err->vi.amd.synd1, err->vi.amd.synd2);
>> +		}
>>  
>>  		pr_cont("\n");
>>  
>> diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
>> index 65aba1afcd07..9e7211eddbca 100644
>> --- a/include/trace/events/mce.h
>> +++ b/include/trace/events/mce.h
>> @@ -43,6 +43,8 @@ TRACE_EVENT(mce_record,
>>  		__field(	u8,		bank		)
>>  		__field(	u8,		cpuvendor	)
>>  		__field(	u32,		microcode	)
>> +		__field(	u8,		len	)
>> +		__dynamic_array(u8, v_data, sizeof(err->vi))
>>  	),
>>  
>>  	TP_fast_assign(
>> @@ -65,9 +67,11 @@ TRACE_EVENT(mce_record,
>>  		__entry->bank		= err->m.bank;
>>  		__entry->cpuvendor	= err->m.cpuvendor;
>>  		__entry->microcode	= err->m.microcode;
>> +		__entry->len		= sizeof(err->vi);
>> +		memcpy(__get_dynamic_array(v_data), &err->vi, sizeof(err->vi));
> 
> So that vendor data layout - is that ABI too? Or are we free to shuffle the
> fields around in the future or even remove some?
> 
> This all needs to be specified somewhere explicitly so that nothing relies on
> that layout.
> 
> And I'm not sure that that's enough because when userspace tools start using
> them, then they're practically an ABI so you can't change them even if you
> wanted to.
> 
> So is libtraceevent or all the other libraries going to parse this as a blob
> and it is always going to remain such?
> 
> But then the tools which interpret it need to know its layout and if it
> changes, perhaps check kernel version which then becomes RealUgly(tm).
> 
> So you might just as well dump the separate fields one by one, without
> a dynamic array.
> 
> Or do a dynamic array but specify that their layout in struct
> mce_hw_err.vendor.amd are cast in stone so that we're all clear on what goes
> where.
> 
> Questions over questions...
> 
Should we document this where struct mce_hw_err is defined, in
arch/x86/include/asm/mce.h? Or do you have any other recommendations?

IIUC, the libtraceevent library relies on tracepoint's format in tracefs. Below
is the format with this patchset incorporated.

[root avadnaik]# cat /sys/kernel/debug/tracing/events/mce/mce_record/format 
name: mce_record
ID: 113
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:u64 mcgcap;       offset:8;       size:8; signed:0;
        field:u64 mcgstatus;    offset:16;      size:8; signed:0;
        field:u64 status;       offset:24;      size:8; signed:0;
        field:u64 addr; offset:32;      size:8; signed:0;
        field:u64 misc; offset:40;      size:8; signed:0;
        field:u64 synd; offset:48;      size:8; signed:0;
        field:u64 ipid; offset:56;      size:8; signed:0;
        field:u64 ip;   offset:64;      size:8; signed:0;
        field:u64 tsc;  offset:72;      size:8; signed:0;
        field:u64 ppin; offset:80;      size:8; signed:0;
        field:u64 walltime;     offset:88;      size:8; signed:0;
        field:u32 cpu;  offset:96;      size:4; signed:0;
        field:u32 cpuid;        offset:100;     size:4; signed:0;
        field:u32 apicid;       offset:104;     size:4; signed:0;
        field:u32 socketid;     offset:108;     size:4; signed:0;
        field:u8 cs;    offset:112;     size:1; signed:0;
        field:u8 bank;  offset:113;     size:1; signed:0;
        field:u8 cpuvendor;     offset:114;     size:1; signed:0;
        field:u32 microcode;    offset:116;     size:4; signed:0;
        field:u8 len;   offset:120;     size:1; signed:0;
        field:__data_loc u8[] v_data;   offset:124;     size:4; signed:0;

print fmt: "CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s", REC->cpu, REC->mcgcap, REC->mcgstatus, REC->bank, REC->status, REC->ipid, REC->addr, REC->misc, REC->synd, REC->cs, REC->ip, REC->tsc, REC->ppin, REC->cpuvendor, REC->cpuid, REC->walltime, REC->socketid, REC->apicid, REC->microcode, __print_array(__get_dynamic_array(v_data), REC->len / 8, 8)

So, yes, the tools which interpret the vendor data need to be aware of its layout
if things like FRUTEXT are to be decoded from the data.

Just FYI, the patch adding support for this in rasdaemon has already been merged:
https://github.com/mchehab/rasdaemon/pull/122/commits/926c2b39c6386d0a1bf4232977f9fd7e37850361
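
For anyone reading the format above without libtraceevent at hand: v_data is a
__data_loc field, so the 4 bytes at offset 124 are not the array itself but a
packed locator. Below is a minimal sketch of how a consumer could resolve it,
assuming the usual __data_loc encoding (offset from the start of the record in
the low 16 bits, length in bytes in the high 16 bits). The names struct dyn_loc,
decode_data_loc() and find_v_data() are made up for illustration and are not
part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Location of a dynamic array inside a raw trace record (illustrative type). */
struct dyn_loc {
	uint16_t offset;	/* byte offset from the start of the record */
	uint16_t len;		/* array length in bytes */
};

/* Split a __data_loc word: low 16 bits are the offset, high 16 bits the length. */
static struct dyn_loc decode_data_loc(uint32_t word)
{
	struct dyn_loc loc = {
		.offset = (uint16_t)(word & 0xffff),
		.len    = (uint16_t)(word >> 16),
	};

	return loc;
}

/*
 * Hypothetical helper: return a pointer to the dynamic array payload in a raw
 * record, or NULL if the locator or the payload would run past the record.
 * field_off is the offset of the __data_loc field (124 in the format above).
 */
static const uint8_t *find_v_data(const uint8_t *rec, size_t rec_len,
				  size_t field_off, uint16_t *len)
{
	struct dyn_loc loc;
	uint32_t word;

	if (field_off + sizeof(word) > rec_len)
		return NULL;

	/* memcpy() avoids an unaligned 32-bit load from the byte buffer. */
	memcpy(&word, rec + field_off, sizeof(word));
	loc = decode_data_loc(word);

	if ((size_t)loc.offset + loc.len > rec_len)
		return NULL;

	*len = loc.len;
	return rec + loc.offset;
}
```

In practice, tools would normally let libtraceevent do this lookup rather than
open-coding it; the sketch only shows why the array's position in the record can
move without breaking consumers, while the layout inside the array cannot.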

> Thx.
> 

-- 
Thanks,
Avadhut Naik

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/4] x86/mce/apei: Handle variable register array size
  2024-06-26 11:57   ` Borislav Petkov
@ 2024-06-26 17:28     ` Naik, Avadhut
  0 siblings, 0 replies; 21+ messages in thread
From: Naik, Avadhut @ 2024-06-26 17:28 UTC (permalink / raw)
  To: Borislav Petkov, Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen



On 6/26/2024 06:57, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:23PM -0500, Avadhut Naik wrote:
>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>> ACPI Boot Error Record Table (BERT) is being used by the kernel to
>> report errors that occurred in a previous boot. On some modern AMD
>> systems, these very errors within the BERT are reported through the
>> x86 Common Platform Error Record (CPER) format which consists of one
>> or more Processor Context Information Structures. These context
>> structures provide a starting address and represent an x86 MSR range
>> in which the data constitutes a contiguous set of MSRs starting from,
>> and including the starting address.
>>
>> It's common, for AMD systems that implement this behavior, that the
>> MSR range represents the MCAX register space used for the Scalable MCA
>> feature. The apei_smca_report_x86_error() function decodes and passes
>> this information through the MCE notifier chain. However, this function
>> assumes a fixed register size based on the original HW/FW implementation.
>>
>> This assumption breaks with the addition of two new MCAX registers viz.
>> MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
>> MCAX register space, so they won't be included when decoding the CPER
>> data.
>>
>> Rework apei_smca_report_x86_error() to support a variable register array
>> size. This covers any case where the MSR context information starts at
>> the MCAX address for MCA_STATUS and ends at any other register within
>> the MCAX register space.
>>
>> Add code comments indicating the MCAX register at each offset.
>>
>> [Yazen: Add Avadhut as co-developer for wrapper changes.]
>>
>> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> This needs Avadhut's SOB after Yazen's.
> 
Will do. Will change to the below format:

Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

> Touchups ontop:
> 
> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
> index 7a15f0ca1bd1..6bbeb29125a9 100644
> --- a/arch/x86/kernel/cpu/mce/apei.c
> +++ b/arch/x86/kernel/cpu/mce/apei.c
> @@ -69,7 +69,7 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
>  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  {
>  	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
> -	unsigned int cpu, num_registers;
> +	unsigned int cpu, num_regs;
>  	struct mce_hw_err err;
>  	struct mce *m = &err.m;
>  
> @@ -93,10 +93,10 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  	/*
>  	 * The number of registers in the register array is determined by
>  	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
> -	 * Ensure that the array size includes at least 1 register.
> +	 * Sanity-check registers array size.
>  	 */
> -	num_registers = ctx_info->reg_arr_size >> 3;
> -	if (!num_registers)
> +	num_regs = ctx_info->reg_arr_size >> 3;
> +	if (!num_regs)
>  		return -EINVAL;
>  
>  	mce_setup(m);
> @@ -118,13 +118,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  	/*
>  	 * The SMCA register layout is fixed and includes 16 registers.
>  	 * The end of the array may be variable, but the beginning is known.
> -	 * Switch on the number of registers. Cap the number of registers to
> -	 * expected max (15).
> +	 * Cap the number of registers to expected max (15).
>  	 */
> -	if (num_registers > 15)
> -		num_registers = 15;
> +	if (num_regs > 15)
> +		num_regs = 15;
>  
> -	switch (num_registers) {
> +	switch (num_regs) {
>  	/* MCA_SYND2 */
>  	case 15:
>  		err.vi.amd.synd2 = *(i_mce + 14);
> 
Will incorporate these.
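
To make the capped fall-through decode easier to picture, here is a standalone
sketch of the same idea in plain C. It is not the kernel function: struct
smca_regs and decode_smca_array() are hypothetical, and only the MCA_SYND2
offset (14) is taken from the diff above. The remaining offsets are assumed
from the MCAX register order, with the array starting at MCA_STATUS.

```c
#include <stdint.h>

/* Illustrative subset of the decoded state; not the kernel's struct mce_hw_err. */
struct smca_regs {
	uint64_t status, addr, misc, ipid, synd, synd1, synd2;
};

/*
 * Decode a register array that starts at MCA_STATUS.
 * size_bytes is the Register Array Size from the context structure.
 * Returns 0 on success, -1 if the array does not hold one full register.
 */
static int decode_smca_array(const uint64_t *regs, unsigned int size_bytes,
			     struct smca_regs *out)
{
	unsigned int num_regs = size_bytes >> 3;

	if (!num_regs)
		return -1;

	/* Cap to the expected maximum so that longer arrays are still decoded. */
	if (num_regs > 15)
		num_regs = 15;

	switch (num_regs) {
	case 15:	/* MCA_SYND2 */
		out->synd2 = regs[14];
		/* fall through */
	case 14:	/* MCA_SYND1 (assumed offset) */
		out->synd1 = regs[13];
		/* fall through */
	case 13: case 12: case 11: case 10:
	case 9: case 8: case 7:
		/* Registers in between are not captured in this sketch. */
	case 6:		/* MCA_SYND (assumed offset) */
		out->synd = regs[5];
		/* fall through */
	case 5:		/* MCA_IPID (assumed offset) */
		out->ipid = regs[4];
		/* fall through */
	case 4:		/* MCA_CONFIG: skipped in this sketch */
	case 3:		/* MCA_MISC0 (assumed offset) */
		out->misc = regs[2];
		/* fall through */
	case 2:		/* MCA_ADDR (assumed offset) */
		out->addr = regs[1];
		/* fall through */
	case 1:		/* MCA_STATUS */
		out->status = regs[0];
	}

	return 0;
}
```

Entering the switch at the highest available register and falling through means
a shorter array from older firmware fills only the fields it actually carries,
and the rest stay at whatever the caller initialized them to.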

-- 
Thanks,
Avadhut Naik

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-06-26 12:04   ` Borislav Petkov
@ 2024-06-26 18:00     ` Naik, Avadhut
  2024-06-26 18:20       ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Naik, Avadhut @ 2024-06-26 18:00 UTC (permalink / raw)
  To: Borislav Petkov, Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen



On 6/26/2024 07:04, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 02:56:24PM -0500, Avadhut Naik wrote:
>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>
>> A new "FRU Text in MCA" feature is defined where the Field Replaceable
>> Unit (FRU) Text for a device is represented by a string in the new
>> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
>> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>>
>> The FRU Text is populated dynamically for each individual error state
>> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
>> covers multiple devices, for example, a Unified Memory Controller (UMC)
>> bank that manages two DIMMs.
>>
> 
> From here...
> 
>> Print the FRU Text string, if available, when decoding an MCA error.
>>
>> Also, add field for MCA_CONFIG MSR in struct mce_hw_err as vendor specific
>> error information and save the value of the MSR. The very value can then be
>> exported through tracepoint for userspace tools like rasdaemon to print FRU
>> Text, if available.
>>
>>  Note: Checkpatch checks/warnings are ignored to maintain coding style.
> 
> ... to here goes into the trash can except what MCA_CONFIG is for being logged
> as part of the error.
> 
Will do.

>> [Yazen: Add Avadhut as co-developer for wrapper changes. ]
>>
>> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Ditto as for patch 3.
> 
Will do.
>> ---
> 
>> @@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  
>>  		if (m->status & MCI_STATUS_SYNDV) {
>>  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
>> -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
>> -				 err->vi.amd.synd1, err->vi.amd.synd2);
>> +			if (mca_config & MCI_CONFIG_FRUTEXT) {
>> +				char frutext[17];
>> +
>> +				memset(frutext, 0, sizeof(frutext));
> 
> Why are you clearing it if you're overwriting it immediately?
> 
Since it's a local variable, I wanted to ensure that the memory is zeroed out to prevent
any issues with the %s specifier used later on.
Would you recommend removing that and using an initializer for the string instead?

>> +				memcpy(&frutext[0], &err->vi.amd.synd1, 8);
>> +				memcpy(&frutext[8], &err->vi.amd.synd2, 8);
>> +
>> +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
>> +			} else {
>> +				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
>> +					 err->vi.amd.synd1, err->vi.amd.synd2);
>> +			}
>>  		}
>>  
>>  		pr_cont("\n");
>> -- 
>> 2.34.1
>>
> 

-- 
Thanks,
Avadhut Naik

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-06-26 17:11     ` Luck, Tony
@ 2024-06-26 18:10       ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2024-06-26 18:10 UTC (permalink / raw)
  To: Luck, Tony, Steven Rostedt
  Cc: Avadhut Naik, x86@kernel.org, linux-edac@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, rafael@kernel.org,
	tglx@linutronix.de, mingo@redhat.com, rostedt@goodmis.org,
	lenb@kernel.org, mchehab@kernel.org, james.morse@arm.com,
	airlied@gmail.com, yazen.ghannam@amd.com, john.allen@amd.com,
	avadnaik@amd.com

On Wed, Jun 26, 2024 at 05:11:29PM +0000, Luck, Tony wrote:
> > Tony, any comments? You ok with this, would that fit any Intel-specific vendor
> > fields too or do you need some additional Intel-specific changes?
> 
> It looks easy enough to add any Intel specific bits to the union later.
> 
> Is there any way that the trace event could be "smarter" about what vendor specific
> information to include based on boot_cpu_data.x86_vendor? As currently written
> Intel systems are going to see 3*u64 decoded into ascii, that are all zero. Not a
> huge deal, I think it will just look like "0x0,0x0,0x0"

Hmm, good question.

Yo, Steve, is there a way to do conditional things in a TP?

For example:

@@ -83,7 +87,8 @@ TRACE_EVENT(mce_record,
                __entry->walltime,
                __entry->socketid,
                __entry->apicid,
-               __entry->microcode)
+               __entry->microcode,

	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
		__print_array(__get_dynamic_array(v_data), __entry->len / 8, 8))

i.e., print that array only when on an AMD system.

I'm sure this won't fly as it is macro magic - this is just to show the
intent...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-06-26 17:24     ` Naik, Avadhut
@ 2024-06-26 18:18       ` Borislav Petkov
  2024-07-09  6:27         ` Naik, Avadhut
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2024-06-26 18:18 UTC (permalink / raw)
  To: Naik, Avadhut
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, Avadhut Naik

On Wed, Jun 26, 2024 at 12:24:20PM -0500, Naik, Avadhut wrote:
> 
> 
> On 6/26/2024 06:10, Borislav Petkov wrote:
> > On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
> >> AMD's Scalable MCA systems viz. Genoa will include two new registers:
> > 
> > "viz."?
> > 
> Right. Will mention Zen4 instead of Genoa.

I still don't know what "viz." means...

> Yes, I catch your drift. Will reword the commit message to explain that the
> new syndrome registers are going to be exported through the tracepoint
> in a dynamic array, as they are vendor-specific, so that userspace error
> decoding tools can retrieve the supplemental error information within them.

Again, why?

Why is it important to have them in the tracepoint?

> >> Note: Checkpatch warnings/errors are ignored to maintain coding style.
> > 
> > This goes...
> > 
> >>
> >> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
> > 
> > Yes, you did but now your SOB chain is wrong:
> > 
> >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> >> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > 
> > This tells me Avadhut is the author, Yazen handled it and he's sending it to
> > me. But nope, he isn't. So it needs another Avadhut SOB underneath.
> > 
> > Audit all patches pls.
> > 
> Wasn't aware of this chronology. Thanks for this information!

Well, there's documentation for that which you should've read already, before
sending patches:

https://kernel.org/doc/html/latest/process/development-process.html

and

https://kernel.org/doc/html/latest/process/submitting-patches.html

especially.

> So, IIUC, the sequence for this patch should be as follows?
> 
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

Yes, now I leave it to you to explain why. Hint: it is in those docs above.

> 
> >> ---
> > 
> > ... right under those three "---" as such notes do not belong in the commit
> > message. Remember that for the future.
> > 
> Okay. Will move the note here.

Or remove it completely. checkpatch is crap - I know. No need to have it in
every patch.

> Had considered this. But struct mce_hw_err *err wouldn't really be used in
> mce_read_aux() in patch 1. Only struct mce m, which is already available, will
> be used.

So?

> Hence, deferred the change to this patch where usage of struct mce_hw_err *err
> is actually introduced in mce_read_aux().
> 
> Do you prefer having this change in patch 1 instead?

I prefer a patch to contain one logical and complete change only. Because this
makes review easier. You should try reviewing patches sometimes too and you'll
know.

> > So that vendor data layout - is that ABI too? Or are we free to shuffle the
> > fields around in the future or even remove some?
> > 
> > This all needs to be specified somewhere explicitly so that nothing relies on
> > that layout.
> > 
> > And I'm not sure that that's enough because when userspace tools start using
> > them, then they're practically an ABI so you can't change them even if you
> > wanted to.
> > 
> > So is libtraceevent or all the other libraries going to parse this as a blob
> > and it is always going to remain such?
> > 
> > But then the tools which interpret it need to know its layout and if it
> > changes, perhaps check kernel version which then becomes RealUgly(tm).
> > 
> > So you might just as well dump the separate fields one by one, without
> > a dynamic array.
> > 
> > Or do a dynamic array but specify that their layout in struct
> > mce_hw_err.vendor.amd are cast in stone so that we're all clear on what goes
> > where.
> > 
> > Questions over questions...
> > 
> Should we document this where struct mce_hw_err is defined, in
> arch/x86/include/asm/mce.h? Or do you have any other recommendations?

I don't know. If I knew I wouldn't have questions which you can read again and
try to answer.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-06-26 18:00     ` Naik, Avadhut
@ 2024-06-26 18:20       ` Borislav Petkov
  2024-06-27 16:20         ` Yazen Ghannam
  2024-07-09  6:29         ` Naik, Avadhut
  0 siblings, 2 replies; 21+ messages in thread
From: Borislav Petkov @ 2024-06-26 18:20 UTC (permalink / raw)
  To: Naik, Avadhut
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-acpi,
	linux-kernel, tony.luck, rafael, tglx, mingo, rostedt, lenb,
	mchehab, james.morse, airlied, yazen.ghannam, john.allen

On Wed, Jun 26, 2024 at 01:00:30PM -0500, Naik, Avadhut wrote:
> > 
> > Why are you clearing it if you're overwriting it immediately?
> > 
> Since its a local variable, wanted to ensure that the memory is zeroed out to prevent
> any issues with the %s specifier, used later on.

What issues?

> Would you recommend removing that and using initializer instead for the string?

I'd recommend looking at what the code does and then really thinking whether
that makes any sense.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-06-26 18:20       ` Borislav Petkov
@ 2024-06-27 16:20         ` Yazen Ghannam
  2024-07-09  6:29         ` Naik, Avadhut
  1 sibling, 0 replies; 21+ messages in thread
From: Yazen Ghannam @ 2024-06-27 16:20 UTC (permalink / raw)
  To: Borislav Petkov, Avadhut Naik
  Cc: Naik, Avadhut, Avadhut Naik, x86, linux-edac, linux-trace-kernel,
	linux-acpi, linux-kernel, tony.luck, rafael, tglx, mingo, rostedt,
	lenb, mchehab, james.morse, airlied, john.allen

On Wed, Jun 26, 2024 at 08:20:13PM +0200, Borislav Petkov wrote:
> On Wed, Jun 26, 2024 at 01:00:30PM -0500, Naik, Avadhut wrote:
> > > 
> > > Why are you clearing it if you're overwriting it immediately?
> > > 
> > Since its a local variable, wanted to ensure that the memory is zeroed out to prevent
> > any issues with the %s specifier, used later on.
> 
> What issues?
> 
> > Would you recommend removing that and using initializer instead for the string?
> 
> I'd recommend looking at what the code does and then really thinking whether
> that makes any sense.
>

We need to make sure the string is NULL-terminated. So the memset()
could be replaced with this:

	frutext[16] = '\0';

Or better yet, maybe we can use scnprintf() or similar.
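
A small userspace sketch of the first option, to show why the full memset() is
redundant: the two memcpy() calls already overwrite bytes 0 through 15, so only
byte 16 still needs a defined value. build_frutext() and FRUTEXT_LEN are
hypothetical names used for illustration, not something from the patch.

```c
#include <stdint.h>
#include <string.h>

#define FRUTEXT_LEN 16	/* two 8-byte registers */

/*
 * Copy the raw SYND1/SYND2 bytes into buf and terminate the result.
 * buf must hold at least FRUTEXT_LEN + 1 bytes.
 */
static void build_frutext(char *buf, uint64_t synd1, uint64_t synd2)
{
	memcpy(&buf[0], &synd1, 8);
	memcpy(&buf[8], &synd2, 8);

	/* Only the last byte is not written by the copies above. */
	buf[FRUTEXT_LEN] = '\0';
}
```

If the registers happen to contain an embedded zero byte, the printed string
simply ends there, which is harmless for %s.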

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-06-26 18:18       ` Borislav Petkov
@ 2024-07-09  6:27         ` Naik, Avadhut
  2024-07-10  9:38           ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Naik, Avadhut @ 2024-07-09  6:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, Avadhut Naik



On 6/26/2024 13:18, Borislav Petkov wrote:
> On Wed, Jun 26, 2024 at 12:24:20PM -0500, Naik, Avadhut wrote:
>>
>>
>> On 6/26/2024 06:10, Borislav Petkov wrote:
>>> On Tue, Jun 25, 2024 at 02:56:22PM -0500, Avadhut Naik wrote:
>>>> AMD's Scalable MCA systems viz. Genoa will include two new registers:
>>>
>>> "viz."?
>>>
>> Right. Will mention Zen4 instead of Genoa.
> 
> I still don't know what "viz." means...
> 
IIUC, it's an abbreviation of a Latin word and is used as a synonym for "namely"
or "that is to say".
Might not be the best choice in this case. Will change it.

>> Yes, I catch your drift. Will reword the commit message to explain that the
>> new syndrome registers are going to be exported through the tracepoint
>> in a dynamic array, as they are vendor-specific, so that userspace error
>> decoding tools can retrieve the supplemental error information within them.
> 
> Again, why?
> 
> Why is it important to have them in the tracepoint?
> 
Userspace error decoding tools like rasdaemon gather related hardware error
information through the tracepoints. As such, it's important to have these two
registers in the tracepoint so that tools like rasdaemon can parse them
and output the supplemental error information, like FRU Text, contained in them.
Yes, the registers are also being output through dmesg, but printk messages
are not an ABI.
The proper way to export these registers is through the tracepoint.

>>>> Note: Checkpatch warnings/errors are ignored to maintain coding style.
>>>
>>> This goes...
>>>
>>>>
>>>> [Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]
>>>
>>> Yes, you did but now your SOB chain is wrong:
>>>
>>>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>>>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>>>
>>> This tells me Avadhut is the author, Yazen handled it and he's sending it to
>>> me. But nope, he isn't. So it needs another Avadhut SOB underneath.
>>>
>>> Audit all patches pls.
>>>
>> Wasn't aware of this chronology. Thanks for this information!
> 
> Well, there's documentation for that which you should've read already, before
> sending patches:
> 
> https://kernel.org/doc/html/latest/process/development-process.html
> 
> and
> 
> https://kernel.org/doc/html/latest/process/submitting-patches.html
> 
> especially.
> 
>> So, IIUC, the sequence for this patch should be as follows?
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> 
> Yes, now I leave it to you to explain why. Hint: it is in those docs above.
> 
Got it. The first SoB entry is that of the primary author. The successive SoBs are
from the people handling and transporting the patch.
IOW, the route taken by a patch as it's propagated to maintainers, and eventually
to Linus, should be evident from the SoB chain.

>>
>>>> ---
>>>
>>> ... right under those three "---" as such notes do not belong in the commit
>>> message. Remember that for the future.
>>>
>> Okay. Will move the note here.
> 
> Or remove it completely. checkpatch is crap - I know. No need to have it in
> every patch.
> 
Okay. Will remove the note altogether.
It's also present in the commit descriptions of other patches in this set.
Will remove it from there as well.

>> Had considered this. But struct mce_hw_err *err wouldn't really be used in
>> mce_read_aux() in patch 1. Only struct mce m, which is already available, will
>> be used.
> 
> So?
> 
>> Hence, deferred the change to this patch where usage of struct mce_hw_err *err
>> is actually introduced in mce_read_aux().
>>
>> Do you prefer having this change in patch 1 instead?
> 
> I prefer a patch to contain one logical and complete change only. Because this
> makes review easier. You should try reviewing patches sometimes too and you'll
> know.
> 
Understood. Will move this to patch1.

>>> So that vendor data layout - is that ABI too? Or are we free to shuffle the
>>> fields around in the future or even remove some?
>>>
>>> This all needs to be specified somewhere explicitly so that nothing relies on
>>> that layout.
>>>
>>> And I'm not sure that that's enough because when userspace tools start using
>>> them, then they're practically an ABI so you can't change them even if you
>>> wanted to.
>>>
>>> So is libtraceevent or all the other libraries going to parse this as a blob
>>> and it is always going to remain such?
>>>
>>> But then the tools which interpret it need to know its layout and if it
>>> changes, perhaps check kernel version which then becomes RealUgly(tm).
>>>
>>> So you might just as well dump the separate fields one by one, without
>>> a dynamic array.
>>>
>>> Or do a dynamic array but specify that their layout in struct
>>> mce_hw_err.vendor.amd are cast in stone so that we're all clear on what goes
>>> where.
>>>
>>> Questions over questions...
>>>
>> Should we document this where struct mce_hw_err is defined, in
>> arch/x86/include/asm/mce.h? Or do you have any other recommendations?
> 
> I don't know. If I knew I wouldn't have questions which you can read again and
> try to answer.
> 
IIUC, at least for now, the libtraceevent library parses the entire vendor data
array as a blob. Rather, a pointer to the array in the raw tracepoint record along
with its length is returned by the library's tep_get_field_raw() API.

This very API has been used for implementing support for these registers and FRU
Text in the rasdaemon.

https://github.com/mchehab/rasdaemon/pull/122

Thus, the position of the array within the tracepoint and its length can be changed
in the future.

Its layout, however, is a completely different matter. At least for AMD, it shouldn't
be changed. New fields, if any, should be added at the end.

The underlying reason for this is the FRU text feature.

With this set, the first two elements of the vendor data dynamic array are SYND 1/2
registers while the third element is MCA_CONFIG (added through patch 4 of the set).
Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are interpreted
as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.

Thus, we depend on the array's layout for accurate FRU Text decoding in rasdaemon.

Hope this answers some of your questions!
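
The dependency described above can be sketched roughly as follows. This is not
the rasdaemon code; it is a simplified illustration that assumes the blob holds
three u64 values in the order SYND1, SYND2, MCA_CONFIG. The names struct
amd_vendor_info, parse_amd_blob(), amd_frutext() and MCA_CONFIG_FRUTEXT are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MCA_CONFIG_FRUTEXT (1ULL << 9)	/* MCA_CONFIG[9], McaFruTextInMca */

/* Assumed blob layout: SYND1, SYND2, MCA_CONFIG, each a u64, in that order. */
struct amd_vendor_info {
	uint64_t synd1;
	uint64_t synd2;
	uint64_t config;
};

/*
 * Copy the known leading fields out of the blob. A longer blob is accepted,
 * so fields appended at the end later on do not break this parser.
 */
static bool parse_amd_blob(const uint8_t *blob, size_t len,
			   struct amd_vendor_info *vi)
{
	if (len < sizeof(*vi))
		return false;

	memcpy(vi, blob, sizeof(*vi));
	return true;
}

/* Produce the FRU Text only when MCA_CONFIG says SYND1/2 carry a string. */
static bool amd_frutext(const struct amd_vendor_info *vi, char out[17])
{
	if (!(vi->config & MCA_CONFIG_FRUTEXT))
		return false;

	memcpy(out, &vi->synd1, 8);
	memcpy(out + 8, &vi->synd2, 8);
	out[16] = '\0';
	return true;
}
```

The length check is written as "at least" rather than "exactly" on purpose:
that is what makes appending new fields at the end safe for existing tools,
while reordering or removing fields would silently misdecode the text.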

-- 
Thanks,
Avadhut Naik

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-06-26 18:20       ` Borislav Petkov
  2024-06-27 16:20         ` Yazen Ghannam
@ 2024-07-09  6:29         ` Naik, Avadhut
  1 sibling, 0 replies; 21+ messages in thread
From: Naik, Avadhut @ 2024-07-09  6:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-acpi,
	linux-kernel, tony.luck, rafael, tglx, mingo, rostedt, lenb,
	mchehab, james.morse, airlied, yazen.ghannam, john.allen



On 6/26/2024 13:20, Borislav Petkov wrote:
> On Wed, Jun 26, 2024 at 01:00:30PM -0500, Naik, Avadhut wrote:
>>>
>>> Why are you clearing it if you're overwriting it immediately?
>>>
>> Since its a local variable, wanted to ensure that the memory is zeroed out to prevent
>> any issues with the %s specifier, used later on.
> 
> What issues?
> 
It's a locally defined string of 17 bytes. We are doing memcpy() into the first 16 bytes.
Don't we need to ensure that it is NULL-terminated to prevent undefined behavior when it's
given to pr_emerg()? Am I missing something here?

>> Would you recommend removing that and using initializer instead for the string?
> 
> I'd recommend looking at what the code does and then really thinking whether
> that makes any sense.
> 

-- 
Thanks,
Avadhut Naik

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-07-09  6:27         ` Naik, Avadhut
@ 2024-07-10  9:38           ` Borislav Petkov
  2024-07-10 22:59             ` Naik, Avadhut
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2024-07-10  9:38 UTC (permalink / raw)
  To: Naik, Avadhut
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, Avadhut Naik

On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote:
> IIUC, its an abbreviation of a Latin word and is used as a synonym for "namely"
> or "that is to say".
> Might not be the best choice in this case. Will change it.

I learn new stuff every day:

https://en.wikipedia.org/wiki/Viz.

> Userspace error decoding tools like the rasdaemon gather related hardware error
> information through the tracepoints. As such, its important to have these two
> registers in the tracepoint so that the tools like rasdaemon can parse them
> and output the supplemental error information like FRU Text contained in them.

Put *that* in the commit message - do not explain what the patch does.

> Got it. The first SoB entry is of the primary author. The successive SoB's are
> from the people handling and transporting the patch.

Exactly!

> IOW, the route taken by a patch, as its propagated, to maintainers and eventually
> to Linus, should be evident from the SoB chain.

You got it.

> With this set, the first two elements of the vendor data dynamic array are SYND 1/2
> registers while the third element is MCA_CONFIG (added through patch 4 of the set).
> Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are interpreted
> as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.
> 
> Thus, we depend on array's layout for accurate FRU Text decoding in the rasdaemon.

So it sounds to me like we want to document and thus freeze the
vendor-specific blob layout because tools are going to be using and parsing
it. And this will spare us the kernel version checks.

And new additions to that AMD-specific blob will come at the end and will
have to be documented too.

That sounds like an ok compromise to me...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-07-10  9:38           ` Borislav Petkov
@ 2024-07-10 22:59             ` Naik, Avadhut
  0 siblings, 0 replies; 21+ messages in thread
From: Naik, Avadhut @ 2024-07-10 22:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: x86, linux-edac, linux-trace-kernel, linux-acpi, linux-kernel,
	tony.luck, rafael, tglx, mingo, rostedt, lenb, mchehab,
	james.morse, airlied, yazen.ghannam, john.allen, Avadhut Naik



On 7/10/2024 04:38, Borislav Petkov wrote:
> On Tue, Jul 09, 2024 at 01:27:25AM -0500, Naik, Avadhut wrote:
> 
>> Userspace error decoding tools like rasdaemon gather related hardware error
>> information through the tracepoints. As such, it's important to have these two
>> registers in the tracepoint so that tools like rasdaemon can parse them
>> and output the supplemental error information, like FRU Text, contained in them.
> 
> Put *that* in the commit message - do not explain what the patch does.
> 
Will do.
 
>> With this set, the first two elements of the vendor data dynamic array are SYND 1/2
>> registers while the third element is MCA_CONFIG (added through patch 4 of the set).
>> Now, in rasdaemon, SYND1/2 register contents (i.e. first two fields) are interpreted
>> as FRU Text only if BIT(9) of MCA_CONFIG (third field) is set.
>>
>> Thus, we depend on the array's layout for accurate FRU Text decoding in rasdaemon.
> 
> So it sounds to me like we want to document and thus freeze the
> vendor-specific blob layout because tools are going to be using and parsing
> it. And this will spare us the kernel version checks.
> 
> And new additions to that AMD-specific blob will come at the end and will
> have to be documented too.
> 
> That sounds like an ok compromise to me...
> 
> Thx.
>
Sounds good!
Is it okay to document this where the new wrapper and vendor-specific data
structures are being defined, in arch/x86/include/asm/mce.h?
A similar approach has been taken for struct mce.
Or do you have any other recommendations?

-- 
Thanks,
Avadhut Naik


end of thread, other threads:[~2024-07-10 22:59 UTC | newest]

Thread overview: 21+ messages
2024-06-25 19:56 [PATCH v2 0/4] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
2024-06-25 19:56 ` [PATCH v2 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
2024-06-26 10:44   ` Borislav Petkov
2024-06-26 17:11     ` Luck, Tony
2024-06-26 18:10       ` Borislav Petkov
2024-06-25 19:56 ` [PATCH v2 2/4] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
2024-06-26 11:10   ` Borislav Petkov
2024-06-26 17:24     ` Naik, Avadhut
2024-06-26 18:18       ` Borislav Petkov
2024-07-09  6:27         ` Naik, Avadhut
2024-07-10  9:38           ` Borislav Petkov
2024-07-10 22:59             ` Naik, Avadhut
2024-06-25 19:56 ` [PATCH v2 3/4] x86/mce/apei: Handle variable register array size Avadhut Naik
2024-06-26 11:57   ` Borislav Petkov
2024-06-26 17:28     ` Naik, Avadhut
2024-06-25 19:56 ` [PATCH v2 4/4] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
2024-06-26 12:04   ` Borislav Petkov
2024-06-26 18:00     ` Naik, Avadhut
2024-06-26 18:20       ` Borislav Petkov
2024-06-27 16:20         ` Yazen Ghannam
2024-07-09  6:29         ` Naik, Avadhut
