[PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs

* [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs
@ 2024-10-22 19:36 Avadhut Naik
  2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Avadhut Naik @ 2024-10-22 19:36 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel
  Cc: linux-kernel, bp, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, avadhut.naik

This patchset adds a new wrapper for struct mce to prevent it from bloating
and to export vendor-specific error information. It also introduces support
for two new "syndrome" MSRs used in newer AMD Scalable MCA (SMCA) systems
and adds a new "FRU Text in MCA" feature that uses these MSRs.

Patch 1 adds the new wrapper structure mce_hw_err for the struct mce
while also modifying the mce_record tracepoint to use the new wrapper.
 
Patch 2 introduces a new helper function, __print_dynamic_array(), for
logging dynamic arrays through tracepoints.
 
Patch 3 adds support for the new "syndrome" registers. They are read/printed
wherever the existing MCA_SYND register is used.
 
Patch 4 updates the function that pulls MCA information from UEFI x86
Common Platform Error Records (CPERs) to handle systems that support the
new registers.
 
Patch 5 adds support to the AMD MCE decoder module to detect and use the
"FRU Text in MCA" feature which leverages the new registers.
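
As a rough illustration of what Patch 2 is for, the sketch below formats a
variable-length byte buffer one u8 element at a time. It is a userspace
stand-in only: format_u8_array() and its output format are assumptions made
for this example and are not the __print_dynamic_array() implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper, not the kernel implementation: format
 * buf_len bytes as "{0x..,0x..}", treating each element as one byte.
 * Returns the number of elements formatted, or -1 if out is too small.
 */
static int format_u8_array(char *out, size_t out_len,
			   const uint8_t *buf, size_t buf_len)
{
	size_t pos, i;
	int n;

	n = snprintf(out, out_len, "{");
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < buf_len; i++) {
		/* Separate elements with a comma, except before the first. */
		n = snprintf(out + pos, out_len - pos, "%s0x%x",
			     i ? "," : "", (unsigned int)buf[i]);
		if (n < 0 || (size_t)n >= out_len - pos)
			return -1;
		pos += (size_t)n;
	}

	n = snprintf(out + pos, out_len - pos, "}");
	if (n < 0 || (size_t)n >= out_len - pos)
		return -1;

	return (int)buf_len;
}
```

The element count comes from the buffer length alone, which is the same idea
as dropping a separate "len" field when the length is already known from the
array's metadata.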
 
NOTE:

This set was initially submitted as part of the larger MCA Updates set.

v1: https://lore.kernel.org/linux-edac/20231118193248.1296798-1-yazen.ghannam@amd.com/
v2: https://lore.kernel.org/linux-edac/20240404151359.47970-1-yazen.ghannam@amd.com/

However, since the MCA Updates set has been split up into smaller sets,
this set, going forward, will be submitted independently.

Having said that, this set depends on and applies cleanly on top of the
two sets below.

[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

Changes in v2: 
 - Drop dependencies on sets [1] and [2] above and rebase on top of
   tip/master.
  
Changes in v3:
 - Move wrapper changes required in mce_read_aux() and mce_no_way_out()
   from second patch to the first patch.
 - Modify commit messages for second and fourth patch per feedback
   received.
 - Add comments to explain purpose of the new wrapper structure.
 - Incorporate suggested touchup in the third patch.
 - Remove call to memset() for the frutext string in the fourth patch.
   Instead, just ensure that the string is NUL-terminated.
 - Fix SoB chains on all patches to properly reflect the patch path.

Changes in v4:
 - Resolve kernel test robot's warning on the use of memset() in
   do_machine_check().
 - Rebase on top of tip/master to avoid merge conflicts.

Changes in v5:
 - Introduce a new helper function __print_dynamic_array() for logging
   dynamic arrays through tracepoints.
 - Remove "len" field from the modified mce_record tracepoint since the
   length of a dynamic array can be fetched from its metadata.
 - Substitute __print_array() with __print_dynamic_array().

Changes in v6:
 - Introduce to_mce_hw_err macro to eliminate changes required in notifier
   chain callback functions.
 - Use the above macro in amd_decode_mce() notifier chain callback.
 - Change third parameter of __mc_scan_banks() to a pointer to the new
   wrapper, struct mce_hw_err.
 - Rebase on top of tip/master.

Changes in v7:
 - Fix initialization of struct mce_hw_err *final in do_machine_check().
 - Add parentheses around el_size parameter in __print_dynamic_array
   macro.
 - Add Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> tag.
 - Change second parameter of __print_dynamic_array from 8 to sizeof(u8)
   to ensure that the dynamic array is parsed using a u8 pointer instead
   of u64 pointer.
 - Rebase on top of tip/master.
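
For context on the v7 parenthesization change, the sketch below shows the
general hazard with an unparenthesized macro parameter. NR_ELEMS_UNSAFE and
NR_ELEMS are hypothetical macros written for this example; they are not the
macros touched by this series.

```c
#include <stddef.h>

/* Hypothetical macros: count the elements in a buffer of len bytes. */
#define NR_ELEMS_UNSAFE(len, el_size)	((len) / el_size)
#define NR_ELEMS(len, el_size)		((len) / (el_size))

static size_t nr_elems_unsafe(void)
{
	/* Expands to ((16) / 1 + 1), which evaluates to 17. */
	return NR_ELEMS_UNSAFE(16, 1 + 1);
}

static size_t nr_elems_safe(void)
{
	/* Expands to ((16) / (1 + 1)), which evaluates to 8. */
	return NR_ELEMS(16, 1 + 1);
}
```

With a plain constant or sizeof(u8) as the argument both forms agree, so the
parentheses only matter once a caller passes a compound expression.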

Links:
v1: https://lore.kernel.org/linux-edac/20240530211620.1829453-1-avadhut.naik@amd.com/
v2: https://lore.kernel.org/linux-edac/20240625195624.2565741-1-avadhut.naik@amd.com/
v3: https://lore.kernel.org/linux-edac/20240730185406.3709876-1-avadhut.naik@amd.com/T/#t
v4: https://lore.kernel.org/linux-edac/20240815211635.1336721-1-avadhut.naik@amd.com/
v5: https://lore.kernel.org/linux-edac/20241001181617.604573-1-avadhut.naik@amd.com/
v6: https://lore.kernel.org/linux-edac/20241016064021.2773618-1-avadhut.naik@amd.com/

Avadhut Naik (2):
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Steven Rostedt (1):
  tracing: Add __print_dynamic_array() helper

Yazen Ghannam (2):
  x86/mce/apei: Handle variable register array size
  EDAC/mce_amd: Add support for FRU Text in MCA

 arch/x86/include/asm/mce.h                 |  38 +++-
 arch/x86/include/uapi/asm/mce.h            |   3 +-
 arch/x86/kernel/cpu/mce/amd.c              |  31 +--
 arch/x86/kernel/cpu/mce/apei.c             | 109 ++++++++---
 arch/x86/kernel/cpu/mce/core.c             | 217 ++++++++++++---------
 arch/x86/kernel/cpu/mce/genpool.c          |  18 +-
 arch/x86/kernel/cpu/mce/inject.c           |   4 +-
 arch/x86/kernel/cpu/mce/internal.h         |   4 +-
 drivers/edac/mce_amd.c                     |  25 ++-
 include/trace/events/mce.h                 |  49 ++---
 include/trace/stages/stage3_trace_output.h |   8 +
 include/trace/stages/stage7_class_define.h |   1 +
 samples/trace_events/trace-events-sample.h |   7 +-
 13 files changed, 334 insertions(+), 180 deletions(-)


base-commit: d7ec15ce8bdc955ce383123c4f01ad0a8155fb90
-- 
2.43.0


* [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
@ 2024-10-22 19:36 ` Avadhut Naik
  2024-10-24  2:21   ` Zhuo, Qiuxu
  2024-10-30 13:32   ` Borislav Petkov
  2024-10-22 19:36 ` [PATCH v7 2/5] tracing: Add __print_dynamic_array() helper Avadhut Naik
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Avadhut Naik @ 2024-10-22 19:36 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel
  Cc: linux-kernel, bp, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, avadhut.naik

Currently, exporting additional machine check error information involves
adding new fields at the end of struct mce. This additional information
can then be consumed through mcelog or the tracepoint.

However, as new MSRs are being added (and will be added in the future)
by CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some
CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
different CPU vendors may export the additional information in varying
sizes.

The problem is compounded by the fact that struct mce is exposed to
userspace as part of UAPI. Its bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. This
prevents struct mce from ballooning since vendor-specific data, if any, can
now be exported through a union within the wrapper structure and through
__dynamic_array in the mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.
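
A minimal userspace sketch of the wrapper idea follows. The struct layouts
are trimmed down and the vendor_data field and first_vendor_word() helper are
hypothetical, added only to show how code that is handed the inner struct mce
can get back to the wrapper; container_of() here is a simplified stand-in for
the kernel macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down record; the real struct mce has more fields. */
struct mce {
	uint64_t status;
	uint64_t addr;
};

/* Wrapper: kernel-internal or vendor-specific data sits next to m. */
struct mce_hw_err {
	struct mce m;
	uint64_t vendor_data[2];	/* hypothetical field, for illustration */
};

#define to_mce_hw_err(mce) container_of(mce, struct mce_hw_err, m)

/* A callback that only receives the inner record can still reach the wrapper. */
static uint64_t first_vendor_word(struct mce *m)
{
	struct mce_hw_err *err = to_mce_hw_err(m);

	return err->vendor_data[0];
}
```

Because existing consumers keep seeing a plain struct mce pointer, the UAPI
layout is untouched while the wrapper remains free to grow.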

[Yazen: Add last commit message paragraph.]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.

Changes in v3:
1. Move wrapper changes required in mce_read_aux() and mce_no_way_out()
to this patch from the second patch.
2. Fix SoB chain to properly reflect the patch path.

Changes in v4:
1. Rebase on top of tip/master to avoid merge conflicts.
2. Resolve kernel test robot's warning on the use of memset() in
do_machine_check().

Changes in v5:
1. No changes except rebasing on top of tip/master.

Changes in v6:
1. Rebase on top of tip/master.
2. Introduce to_mce_hw_err macro to eliminate changes required in notifier
chain callback functions, especially callback functions of EDAC drivers.
3. Change third parameter of __mc_scan_banks() to a pointer to the new
wrapper structure and make the required changes accordingly.

Changes in v7:
1. Rebase on top of tip/master.
2. Fix initialization of struct mce_hw_err *final in do_machine_check().
---
 arch/x86/include/asm/mce.h         |  14 +-
 arch/x86/kernel/cpu/mce/amd.c      |  27 ++--
 arch/x86/kernel/cpu/mce/apei.c     |  45 ++++---
 arch/x86/kernel/cpu/mce/core.c     | 205 ++++++++++++++++-------------
 arch/x86/kernel/cpu/mce/genpool.c  |  18 +--
 arch/x86/kernel/cpu/mce/inject.c   |   4 +-
 arch/x86/kernel/cpu/mce/internal.h |   4 +-
 include/trace/events/mce.h         |  42 +++---
 8 files changed, 200 insertions(+), 159 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3b9970117a0f..4e45f45673a3 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,6 +187,16 @@ enum mce_notifier_prios {
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
 };
 
+/**
+ * struct mce_hw_err - Hardware Error Record.
+ * @m:		Machine Check record.
+ */
+struct mce_hw_err {
+	struct mce m;
+};
+
+#define	to_mce_hw_err(mce) container_of(mce, struct mce_hw_err, m)
+
 struct notifier_block;
 extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
@@ -221,8 +231,8 @@ static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 					     u64 lapic_id) { return -EINVAL; }
 #endif
 
-void mce_prep_record(struct mce *m);
-void mce_log(struct mce *m);
+void mce_prep_record(struct mce_hw_err *err);
+void mce_log(struct mce_hw_err *err);
 DECLARE_PER_CPU(struct device *, mce_device);
 
 /* Maximum number of MCA banks per CPU. */
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 14bf8c232e45..5b4d266500b2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -778,29 +778,30 @@ bool amd_mce_usable_address(struct mce *m)
 
 static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 {
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 
-	mce_prep_record(&m);
+	mce_prep_record(&err);
 
-	m.status = status;
-	m.misc   = misc;
-	m.bank   = bank;
-	m.tsc	 = rdtsc();
+	m->status = status;
+	m->misc   = misc;
+	m->bank   = bank;
+	m->tsc	 = rdtsc();
 
-	if (m.status & MCI_STATUS_ADDRV) {
-		m.addr = addr;
+	if (m->status & MCI_STATUS_ADDRV) {
+		m->addr = addr;
 
-		smca_extract_err_addr(&m);
+		smca_extract_err_addr(m);
 	}
 
 	if (mce_flags.smca) {
-		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
+		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
 
-		if (m.status & MCI_STATUS_SYNDV)
-			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
+		if (m->status & MCI_STATUS_SYNDV)
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
 	}
 
-	mce_log(&m);
+	mce_log(&err);
 }
 
 DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 3885fe05f01e..7f582b4ca1ca 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -28,7 +28,8 @@
 
 void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 {
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m;
 	int lsb;
 
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
@@ -44,22 +45,23 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 	else
 		lsb = PAGE_SHIFT;
 
-	mce_prep_record(&m);
-	m.bank = -1;
+	mce_prep_record(&err);
+	m = &err.m;
+	m->bank = -1;
 	/* Fake a memory read error with unknown channel */
-	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
-	m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
+	m->status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
+	m->misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
 
 	if (severity >= GHES_SEV_RECOVERABLE)
-		m.status |= MCI_STATUS_UC;
+		m->status |= MCI_STATUS_UC;
 
 	if (severity >= GHES_SEV_PANIC) {
-		m.status |= MCI_STATUS_PCC;
-		m.tsc = rdtsc();
+		m->status |= MCI_STATUS_PCC;
+		m->tsc = rdtsc();
 	}
 
-	m.addr = mem_err->physical_addr;
-	mce_log(&m);
+	m->addr = mem_err->physical_addr;
+	mce_log(&err);
 }
 EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 
@@ -67,8 +69,9 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
 	bool apicid_found = false;
+	struct mce_hw_err err;
 	unsigned int cpu;
-	struct mce m;
+	struct mce *m;
 
 	if (!boot_cpu_has(X86_FEATURE_SMCA))
 		return -EINVAL;
@@ -108,18 +111,20 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	if (!apicid_found)
 		return -EINVAL;
 
-	mce_prep_record_common(&m);
-	mce_prep_record_per_cpu(cpu, &m);
+	m = &err.m;
+	memset(&err, 0, sizeof(struct mce_hw_err));
+	mce_prep_record_common(m);
+	mce_prep_record_per_cpu(cpu, m);
 
-	m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m.status = *i_mce;
-	m.addr = *(i_mce + 1);
-	m.misc = *(i_mce + 2);
+	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
+	m->status = *i_mce;
+	m->addr = *(i_mce + 1);
+	m->misc = *(i_mce + 2);
 	/* Skipping MCA_CONFIG */
-	m.ipid = *(i_mce + 4);
-	m.synd = *(i_mce + 5);
+	m->ipid = *(i_mce + 4);
+	m->synd = *(i_mce + 5);
 
-	mce_log(&m);
+	mce_log(&err);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2a938f429c4d..3611366d56b7 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -88,7 +88,7 @@ struct mca_config mca_cfg __read_mostly = {
 	.monarch_timeout = -1
 };
 
-static DEFINE_PER_CPU(struct mce, mces_seen);
+static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
 static unsigned long mce_need_notify;
 
 /*
@@ -119,8 +119,6 @@ BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
 void mce_prep_record_common(struct mce *m)
 {
-	memset(m, 0, sizeof(struct mce));
-
 	m->cpuid	= cpuid_eax(1);
 	m->cpuvendor	= boot_cpu_data.x86_vendor;
 	m->mcgcap	= __rdmsr(MSR_IA32_MCG_CAP);
@@ -138,9 +136,12 @@ void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
 	m->socketid	= topology_physical_package_id(cpu);
 }
 
-/* Do initial initialization of a struct mce */
-void mce_prep_record(struct mce *m)
+/* Do initial initialization of struct mce_hw_err */
+void mce_prep_record(struct mce_hw_err *err)
 {
+	struct mce *m = &err->m;
+
+	memset(err, 0, sizeof(struct mce_hw_err));
 	mce_prep_record_common(m);
 	mce_prep_record_per_cpu(smp_processor_id(), m);
 }
@@ -148,9 +149,9 @@ void mce_prep_record(struct mce *m)
 DEFINE_PER_CPU(struct mce, injectm);
 EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
-void mce_log(struct mce *m)
+void mce_log(struct mce_hw_err *err)
 {
-	if (!mce_gen_pool_add(m))
+	if (!mce_gen_pool_add(err))
 		irq_work_queue(&mce_irq_work);
 }
 EXPORT_SYMBOL_GPL(mce_log);
@@ -171,8 +172,10 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
 
-static void __print_mce(struct mce *m)
+static void __print_mce(struct mce_hw_err *err)
 {
+	struct mce *m = &err->m;
+
 	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
 		 m->extcpu,
 		 (m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
@@ -214,9 +217,11 @@ static void __print_mce(struct mce *m)
 		m->microcode);
 }
 
-static void print_mce(struct mce *m)
+static void print_mce(struct mce_hw_err *err)
 {
-	__print_mce(m);
+	struct mce *m = &err->m;
+
+	__print_mce(err);
 
 	if (m->cpuvendor != X86_VENDOR_AMD && m->cpuvendor != X86_VENDOR_HYGON)
 		pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
@@ -251,7 +256,7 @@ static const char *mce_dump_aux_info(struct mce *m)
 	return NULL;
 }
 
-static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
+static noinstr void mce_panic(const char *msg, struct mce_hw_err *final, char *exp)
 {
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
@@ -282,20 +287,22 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	pending = mce_gen_pool_prepare_records();
 	/* First print corrected ones that are still unlogged */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC)) {
-			print_mce(m);
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
 	}
 	/* Now print uncorrected but with the final one last */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC))
 			continue;
-		if (!final || mce_cmp(m, final)) {
-			print_mce(m);
+		if (!final || mce_cmp(m, &final->m)) {
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
@@ -303,12 +310,12 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (final) {
 		print_mce(final);
 		if (!apei_err)
-			apei_err = apei_write_mce(final);
+			apei_err = apei_write_mce(&final->m);
 	}
 	if (exp)
 		pr_emerg(HW_ERR "Machine check: %s\n", exp);
 
-	memmsg = mce_dump_aux_info(final);
+	memmsg = mce_dump_aux_info(&final->m);
 	if (memmsg)
 		pr_emerg(HW_ERR "Machine check: %s\n", memmsg);
 
@@ -323,9 +330,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 		 * panic.
 		 */
 		if (kexec_crash_loaded()) {
-			if (final && (final->status & MCI_STATUS_ADDRV)) {
+			if (final && (final->m.status & MCI_STATUS_ADDRV)) {
 				struct page *p;
-				p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+				p = pfn_to_online_page(final->m.addr >> PAGE_SHIFT);
 				if (p)
 					SetPageHWPoison(p);
 			}
@@ -445,16 +452,18 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
  * check into our "mce" struct so that we can use it later to assess
  * the severity of the problem as we read per-bank specific details.
  */
-static noinstr void mce_gather_info(struct mce *m, struct pt_regs *regs)
+static noinstr void mce_gather_info(struct mce_hw_err *err, struct pt_regs *regs)
 {
+	struct mce *m;
 	/*
 	 * Enable instrumentation around mce_prep_record() which calls external
 	 * facilities.
 	 */
 	instrumentation_begin();
-	mce_prep_record(m);
+	mce_prep_record(err);
 	instrumentation_end();
 
+	m = &err->m;
 	m->mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
 	if (regs) {
 		/*
@@ -574,13 +583,13 @@ EXPORT_SYMBOL_GPL(mce_is_correctable);
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = to_mce_hw_err(data);
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
 	/* Emit the trace record: */
-	trace_mce_record(m);
+	trace_mce_record(err);
 
 	set_bit(0, &mce_need_notify);
 
@@ -624,13 +633,13 @@ static struct notifier_block mce_uc_nb = {
 static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = to_mce_hw_err(data);
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
-	if (mca_cfg.print_all || !m->kflags)
-		__print_mce(m);
+	if (mca_cfg.print_all || !(err->m.kflags))
+		__print_mce(err);
 
 	return NOTIFY_DONE;
 }
@@ -644,8 +653,10 @@ static struct notifier_block mce_default_nb = {
 /*
  * Read ADDR and MISC registers.
  */
-static noinstr void mce_read_aux(struct mce *m, int i)
+static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 {
+	struct mce *m = &err->m;
+
 	if (m->status & MCI_STATUS_MISCV)
 		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
 
@@ -692,26 +703,28 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m;
 	int i;
 
 	this_cpu_inc(mce_poll_count);
 
-	mce_gather_info(&m, NULL);
+	mce_gather_info(&err, NULL);
+	m = &err.m;
 
 	if (flags & MCP_TIMESTAMP)
-		m.tsc = rdtsc();
+		m->tsc = rdtsc();
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
 			continue;
 
-		m.misc = 0;
-		m.addr = 0;
-		m.bank = i;
+		m->misc = 0;
+		m->addr = 0;
+		m->bank = i;
 
 		barrier();
-		m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
+		m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
 
 		/*
 		 * Update storm tracking here, before checking for the
@@ -721,17 +734,17 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * storm status.
 		 */
 		if (!mca_cfg.cmci_disabled)
-			mce_track_storm(&m);
+			mce_track_storm(m);
 
 		/* If this entry is not valid, ignore it */
-		if (!(m.status & MCI_STATUS_VAL))
+		if (!(m->status & MCI_STATUS_VAL))
 			continue;
 
 		/*
 		 * If we are logging everything (at CPU online) or this
 		 * is a corrected error, then we must log it.
 		 */
-		if ((flags & MCP_UC) || !(m.status & MCI_STATUS_UC))
+		if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
 			goto log_it;
 
 		/*
@@ -741,20 +754,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * everything else.
 		 */
 		if (!mca_cfg.ser) {
-			if (m.status & MCI_STATUS_UC)
+			if (m->status & MCI_STATUS_UC)
 				continue;
 			goto log_it;
 		}
 
 		/* Log "not enabled" (speculative) errors */
-		if (!(m.status & MCI_STATUS_EN))
+		if (!(m->status & MCI_STATUS_EN))
 			goto log_it;
 
 		/*
 		 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
 		 * UC == 1 && PCC == 0 && S == 0
 		 */
-		if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S))
+		if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
 			goto log_it;
 
 		/*
@@ -768,20 +781,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		if (flags & MCP_DONTLOG)
 			goto clear_it;
 
-		mce_read_aux(&m, i);
-		m.severity = mce_severity(&m, NULL, NULL, false);
+		mce_read_aux(&err, i);
+		m->severity = mce_severity(m, NULL, NULL, false);
 		/*
 		 * Don't get the IP here because it's unlikely to
 		 * have anything to do with the actual error location.
 		 */
 
-		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
+		if (mca_cfg.dont_log_ce && !mce_usable_address(m))
 			goto clear_it;
 
 		if (flags & MCP_QUEUE_LOG)
-			mce_gen_pool_add(&m);
+			mce_gen_pool_add(&err);
 		else
-			mce_log(&m);
+			mce_log(&err);
 
 clear_it:
 		/*
@@ -905,9 +918,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
+static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
 					  struct pt_regs *regs)
 {
+	struct mce *m = &err->m;
 	char *tmp = *msg;
 	int i;
 
@@ -925,7 +939,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 
 		m->bank = i;
 		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
-			mce_read_aux(m, i);
+			mce_read_aux(err, i);
 			*msg = tmp;
 			return 1;
 		}
@@ -1017,6 +1031,7 @@ static noinstr int mce_timed_out(u64 *t, const char *msg)
 static void mce_reign(void)
 {
 	int cpu;
+	struct mce_hw_err *err = NULL;
 	struct mce *m = NULL;
 	int global_worst = 0;
 	char *msg = NULL;
@@ -1027,11 +1042,13 @@ static void mce_reign(void)
 	 * Grade the severity of the errors of all the CPUs.
 	 */
 	for_each_possible_cpu(cpu) {
-		struct mce *mtmp = &per_cpu(mces_seen, cpu);
+		struct mce_hw_err *etmp = &per_cpu(hw_errs_seen, cpu);
+		struct mce *mtmp = &etmp->m;
 
 		if (mtmp->severity > global_worst) {
 			global_worst = mtmp->severity;
-			m = &per_cpu(mces_seen, cpu);
+			err = &per_cpu(hw_errs_seen, cpu);
+			m = &err->m;
 		}
 	}
 
@@ -1043,7 +1060,7 @@ static void mce_reign(void)
 	if (m && global_worst >= MCE_PANIC_SEVERITY) {
 		/* call mce_severity() to get "msg" for panic */
 		mce_severity(m, NULL, &msg, true);
-		mce_panic("Fatal machine check", m, msg);
+		mce_panic("Fatal machine check", err, msg);
 	}
 
 	/*
@@ -1060,11 +1077,11 @@ static void mce_reign(void)
 		mce_panic("Fatal machine check from unknown source", NULL, NULL);
 
 	/*
-	 * Now clear all the mces_seen so that they don't reappear on
+	 * Now clear all the hw_errs_seen so that they don't reappear on
 	 * the next mce.
 	 */
 	for_each_possible_cpu(cpu)
-		memset(&per_cpu(mces_seen, cpu), 0, sizeof(struct mce));
+		memset(&per_cpu(hw_errs_seen, cpu), 0, sizeof(struct mce_hw_err));
 }
 
 static atomic_t global_nwo;
@@ -1268,12 +1285,13 @@ static noinstr bool mce_check_crashing_cpu(void)
 }
 
 static __always_inline int
-__mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
-		unsigned long *toclear, unsigned long *valid_banks, int no_way_out,
-		int *worst)
+__mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs,
+		struct mce_hw_err *final, unsigned long *toclear,
+		unsigned long *valid_banks, int no_way_out, int *worst)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
 	struct mca_config *cfg = &mca_cfg;
+	struct mce *m = &err->m;
 	int severity, i, taint = 0;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -1319,7 +1337,7 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
 		if (severity == MCE_NO_SEVERITY)
 			continue;
 
-		mce_read_aux(m, i);
+		mce_read_aux(err, i);
 
 		/* assuming valid severity level != 0 */
 		m->severity = severity;
@@ -1329,17 +1347,17 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
 		 * done in #MC context, where instrumentation is disabled.
 		 */
 		instrumentation_begin();
-		mce_log(m);
+		mce_log(err);
 		instrumentation_end();
 
 		if (severity > *worst) {
-			*final = *m;
+			*final = *err;
 			*worst = severity;
 		}
 	}
 
 	/* mce_clear_state will clear *final, save locally for use later */
-	*m = *final;
+	*err = *final;
 
 	return taint;
 }
@@ -1399,8 +1417,9 @@ static void kill_me_never(struct callback_head *cb)
 		set_mce_nospec(pfn);
 }
 
-static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
+static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(struct callback_head *))
 {
+	struct mce *m = &err->m;
 	int count = ++current->mce_count;
 
 	/* First call, save all the details */
@@ -1414,11 +1433,12 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba
 
 	/* Ten is likely overkill. Don't expect more than two faults before task_work() */
 	if (count > 10)
-		mce_panic("Too many consecutive machine checks while accessing user data", m, msg);
+		mce_panic("Too many consecutive machine checks while accessing user data",
+			  err, msg);
 
 	/* Second or later call, make sure page address matches the one from first call */
 	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
-		mce_panic("Consecutive machine checks to different user pages", m, msg);
+		mce_panic("Consecutive machine checks to different user pages", err, msg);
 
 	/* Do not call task_work_add() more than once */
 	if (count > 1)
@@ -1467,8 +1487,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	int worst = 0, order, no_way_out, kill_current_task, lmce, taint = 0;
 	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS) = { 0 };
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS) = { 0 };
-	struct mce m, *final;
+	struct mce_hw_err *final;
+	struct mce_hw_err err;
 	char *msg = NULL;
+	struct mce *m;
 
 	if (unlikely(mce_flags.p5))
 		return pentium_machine_check(regs);
@@ -1506,13 +1528,14 @@ noinstr void do_machine_check(struct pt_regs *regs)
 
 	this_cpu_inc(mce_exception_count);
 
-	mce_gather_info(&m, regs);
-	m.tsc = rdtsc();
+	mce_gather_info(&err, regs);
+	m = &err.m;
+	m->tsc = rdtsc();
 
-	final = this_cpu_ptr(&mces_seen);
-	*final = m;
+	final = this_cpu_ptr(&hw_errs_seen);
+	*final = err;
 
-	no_way_out = mce_no_way_out(&m, &msg, valid_banks, regs);
+	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
 
 	barrier();
 
@@ -1521,15 +1544,15 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * Assume the worst for now, but if we find the
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
-	if (!(m.mcgstatus & MCG_STATUS_RIPV))
+	if (!(m->mcgstatus & MCG_STATUS_RIPV))
 		kill_current_task = 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
 	 */
-	if (m.cpuvendor == X86_VENDOR_INTEL ||
-	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
-		lmce = m.mcgstatus & MCG_STATUS_LMCES;
+	if (m->cpuvendor == X86_VENDOR_INTEL ||
+	    m->cpuvendor == X86_VENDOR_ZHAOXIN)
+		lmce = m->mcgstatus & MCG_STATUS_LMCES;
 
 	/*
 	 * Local machine check may already know that we have to panic.
@@ -1540,12 +1563,12 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 */
 	if (lmce) {
 		if (no_way_out)
-			mce_panic("Fatal local machine check", &m, msg);
+			mce_panic("Fatal local machine check", &err, msg);
 	} else {
 		order = mce_start(&no_way_out);
 	}
 
-	taint = __mc_scan_banks(&m, regs, final, toclear, valid_banks, no_way_out, &worst);
+	taint = __mc_scan_banks(&err, regs, final, toclear, valid_banks, no_way_out, &worst);
 
 	if (!no_way_out)
 		mce_clear_state(toclear);
@@ -1560,7 +1583,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 				no_way_out = worst >= MCE_PANIC_SEVERITY;
 
 			if (no_way_out)
-				mce_panic("Fatal machine check on current CPU", &m, msg);
+				mce_panic("Fatal machine check on current CPU", &err, msg);
 		}
 	} else {
 		/*
@@ -1572,8 +1595,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * make sure we have the right "msg".
 		 */
 		if (worst >= MCE_PANIC_SEVERITY) {
-			mce_severity(&m, regs, &msg, true);
-			mce_panic("Local fatal machine check!", &m, msg);
+			mce_severity(m, regs, &msg, true);
+			mce_panic("Local fatal machine check!", &err, msg);
 		}
 	}
 
@@ -1591,16 +1614,16 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		goto out;
 
 	/* Fault was in user mode and we need to take some action */
-	if ((m.cs & 3) == 3) {
+	if ((m->cs & 3) == 3) {
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		if (!mce_usable_address(&m))
-			queue_task_work(&m, msg, kill_me_now);
+		if (!mce_usable_address(m))
+			queue_task_work(&err, msg, kill_me_now);
 		else
-			queue_task_work(&m, msg, kill_me_maybe);
+			queue_task_work(&err, msg, kill_me_maybe);
 
-	} else if (m.mcgstatus & MCG_STATUS_SEAM_NR) {
+	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {
 		/*
 		 * Saved RIP on stack makes it look like the machine check
 		 * was taken in the kernel on the instruction following
@@ -1612,8 +1635,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * not occur there. Mark the page as poisoned so it won't
 		 * be added to free list when the guest is terminated.
 		 */
-		if (mce_usable_address(&m)) {
-			struct page *p = pfn_to_online_page(m.addr >> PAGE_SHIFT);
+		if (mce_usable_address(m)) {
+			struct page *p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
 
 			if (p)
 				SetPageHWPoison(p);
@@ -1628,13 +1651,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * corresponding exception handler which would do that is the
 		 * proper one.
 		 */
-		if (m.kflags & MCE_IN_KERNEL_RECOV) {
+		if (m->kflags & MCE_IN_KERNEL_RECOV) {
 			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
-				mce_panic("Failed kernel mode recovery", &m, msg);
+				mce_panic("Failed kernel mode recovery", &err, msg);
 		}
 
-		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, msg, kill_me_never);
+		if (m->kflags & MCE_IN_KERNEL_COPYIN)
+			queue_task_work(&err, msg, kill_me_never);
 	}
 
 out:
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index 4284749ec803..504d89724ecd 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -31,15 +31,15 @@ static LLIST_HEAD(mce_event_llist);
  */
 static bool is_duplicate_mce_record(struct mce_evt_llist *t, struct mce_evt_llist *l)
 {
+	struct mce_hw_err *err1, *err2;
 	struct mce_evt_llist *node;
-	struct mce *m1, *m2;
 
-	m1 = &t->mce;
+	err1 = &t->err;
 
 	llist_for_each_entry(node, &l->llnode, llnode) {
-		m2 = &node->mce;
+		err2 = &node->err;
 
-		if (!mce_cmp(m1, m2))
+		if (!mce_cmp(&err1->m, &err2->m))
 			return true;
 	}
 	return false;
@@ -73,9 +73,9 @@ struct llist_node *mce_gen_pool_prepare_records(void)
 
 void mce_gen_pool_process(struct work_struct *__unused)
 {
+	struct mce *mce;
 	struct llist_node *head;
 	struct mce_evt_llist *node, *tmp;
-	struct mce *mce;
 
 	head = llist_del_all(&mce_event_llist);
 	if (!head)
@@ -83,7 +83,7 @@ void mce_gen_pool_process(struct work_struct *__unused)
 
 	head = llist_reverse_order(head);
 	llist_for_each_entry_safe(node, tmp, head, llnode) {
-		mce = &node->mce;
+		mce = &node->err.m;
 		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
 		gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
 	}
@@ -94,11 +94,11 @@ bool mce_gen_pool_empty(void)
 	return llist_empty(&mce_event_llist);
 }
 
-int mce_gen_pool_add(struct mce *mce)
+int mce_gen_pool_add(struct mce_hw_err *err)
 {
 	struct mce_evt_llist *node;
 
-	if (filter_mce(mce))
+	if (filter_mce(&err->m))
 		return -EINVAL;
 
 	if (!mce_evt_pool)
@@ -110,7 +110,7 @@ int mce_gen_pool_add(struct mce *mce)
 		return -ENOMEM;
 	}
 
-	memcpy(&node->mce, mce, sizeof(*mce));
+	memcpy(&node->err, err, sizeof(*err));
 	llist_add(&node->llnode, &mce_event_llist);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 49ed3428785d..c65a5c4e2f22 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -502,6 +502,7 @@ static void prepare_msrs(void *info)
 
 static void do_inject(void)
 {
+	struct mce_hw_err err;
 	u64 mcg_status = 0;
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
@@ -517,7 +518,8 @@ static void do_inject(void)
 		i_mce.status |= MCI_STATUS_SYNDV;
 
 	if (inj_type == SW_INJ) {
-		mce_log(&i_mce);
+		err.m = i_mce;
+		mce_log(&err);
 		return;
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 43c7f3b71df5..84f810598231 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -26,12 +26,12 @@ extern struct blocking_notifier_head x86_mce_decoder_chain;
 
 struct mce_evt_llist {
 	struct llist_node llnode;
-	struct mce mce;
+	struct mce_hw_err err;
 };
 
 void mce_gen_pool_process(struct work_struct *__unused);
 bool mce_gen_pool_empty(void);
-int mce_gen_pool_add(struct mce *mce);
+int mce_gen_pool_add(struct mce_hw_err *err);
 int mce_gen_pool_init(void);
 struct llist_node *mce_gen_pool_prepare_records(void);
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index f0f7b3cb2041..65aba1afcd07 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -19,9 +19,9 @@
 
 TRACE_EVENT(mce_record,
 
-	TP_PROTO(struct mce *m),
+	TP_PROTO(struct mce_hw_err *err),
 
-	TP_ARGS(m),
+	TP_ARGS(err),
 
 	TP_STRUCT__entry(
 		__field(	u64,		mcgcap		)
@@ -46,25 +46,25 @@ TRACE_EVENT(mce_record,
 	),
 
 	TP_fast_assign(
-		__entry->mcgcap		= m->mcgcap;
-		__entry->mcgstatus	= m->mcgstatus;
-		__entry->status		= m->status;
-		__entry->addr		= m->addr;
-		__entry->misc		= m->misc;
-		__entry->synd		= m->synd;
-		__entry->ipid		= m->ipid;
-		__entry->ip		= m->ip;
-		__entry->tsc		= m->tsc;
-		__entry->ppin		= m->ppin;
-		__entry->walltime	= m->time;
-		__entry->cpu		= m->extcpu;
-		__entry->cpuid		= m->cpuid;
-		__entry->apicid		= m->apicid;
-		__entry->socketid	= m->socketid;
-		__entry->cs		= m->cs;
-		__entry->bank		= m->bank;
-		__entry->cpuvendor	= m->cpuvendor;
-		__entry->microcode	= m->microcode;
+		__entry->mcgcap		= err->m.mcgcap;
+		__entry->mcgstatus	= err->m.mcgstatus;
+		__entry->status		= err->m.status;
+		__entry->addr		= err->m.addr;
+		__entry->misc		= err->m.misc;
+		__entry->synd		= err->m.synd;
+		__entry->ipid		= err->m.ipid;
+		__entry->ip		= err->m.ip;
+		__entry->tsc		= err->m.tsc;
+		__entry->ppin		= err->m.ppin;
+		__entry->walltime	= err->m.time;
+		__entry->cpu		= err->m.extcpu;
+		__entry->cpuid		= err->m.cpuid;
+		__entry->apicid		= err->m.apicid;
+		__entry->socketid	= err->m.socketid;
+		__entry->cs		= err->m.cs;
+		__entry->bank		= err->m.bank;
+		__entry->cpuvendor	= err->m.cpuvendor;
+		__entry->microcode	= err->m.microcode;
 	),
 
 	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",
-- 
2.43.0



* [PATCH v7 2/5] tracing: Add __print_dynamic_array() helper
  2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
  2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
@ 2024-10-22 19:36 ` Avadhut Naik
  2024-10-22 19:36 ` [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Avadhut Naik @ 2024-10-22 19:36 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel
  Cc: linux-kernel, bp, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, avadhut.naik

From: Steven Rostedt <rostedt@goodmis.org>

When printing a dynamic array in a trace event, the method is rather ugly.
It has the format of:

  __print_array(__get_dynamic_array(array),
            __get_dynamic_array_len(array) / el_size, el_size)

Since dynamic arrays are known to the tracing infrastructure, create a
helper macro that does the above for you.

  __print_dynamic_array(array, el_size)

Which would expand to the same output.
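
As a rough userspace illustration (not kernel code), the element count the
helper hands to __print_array() is just the byte length of the dynamic array
divided by el_size. The function name dyn_array_count() below is hypothetical
and only mirrors that arithmetic:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the count argument that
 * __print_dynamic_array() passes to __print_array(): the byte
 * length of the dynamic array divided by the element size.
 */
static size_t dyn_array_count(size_t len_bytes, size_t el_size)
{
	assert(el_size != 0);	/* a zero element size is meaningless */
	return len_bytes / el_size;
}
```

For a 16-byte array, an el_size of 8 yields 2 elements, while an el_size of
1 yields 16, so the choice of el_size decides how the same bytes are walked.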

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v5:
Patch introduced in the series.

Changes in v6:
No changes other than rebasing on top of tip/master.

Changes in v7:
1. Rebase on top of tip/master.
2. Add parentheses around the el_size parameter in the macro.
3. Remove Cc (to Avadhut Naik) tag.
4. Add Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> tag.
---
 include/trace/stages/stage3_trace_output.h | 8 ++++++++
 include/trace/stages/stage7_class_define.h | 1 +
 samples/trace_events/trace-events-sample.h | 7 ++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/trace/stages/stage3_trace_output.h b/include/trace/stages/stage3_trace_output.h
index c1fb1355d309..1e7b0bef95f5 100644
--- a/include/trace/stages/stage3_trace_output.h
+++ b/include/trace/stages/stage3_trace_output.h
@@ -119,6 +119,14 @@
 		trace_print_array_seq(p, array, count, el_size);	\
 	})
 
+#undef __print_dynamic_array
+#define __print_dynamic_array(array, el_size)				\
+	({								\
+		__print_array(__get_dynamic_array(array),		\
+			      __get_dynamic_array_len(array) / (el_size), \
+			      (el_size));				\
+	})
+
 #undef __print_hex_dump
 #define __print_hex_dump(prefix_str, prefix_type,			\
 			 rowsize, groupsize, buf, len, ascii)		\
diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h
index bcb960d16fc0..fcd564a590f4 100644
--- a/include/trace/stages/stage7_class_define.h
+++ b/include/trace/stages/stage7_class_define.h
@@ -22,6 +22,7 @@
 #undef __get_rel_cpumask
 #undef __get_rel_sockaddr
 #undef __print_array
+#undef __print_dynamic_array
 #undef __print_hex_dump
 #undef __get_buf
 
diff --git a/samples/trace_events/trace-events-sample.h b/samples/trace_events/trace-events-sample.h
index 55f9a3da92d5..999f78d380ae 100644
--- a/samples/trace_events/trace-events-sample.h
+++ b/samples/trace_events/trace-events-sample.h
@@ -319,7 +319,7 @@ TRACE_EVENT(foo_bar,
 		__assign_cpumask(cpum, cpumask_bits(mask));
 	),
 
-	TP_printk("foo %s %d %s %s %s %s %s (%s) (%s) %s", __entry->foo, __entry->bar,
+	TP_printk("foo %s %d %s %s %s %s %s %s (%s) (%s) %s", __entry->foo, __entry->bar,
 
 /*
  * Notice here the use of some helper functions. This includes:
@@ -363,6 +363,11 @@ TRACE_EVENT(foo_bar,
 		  __print_array(__get_dynamic_array(list),
 				__get_dynamic_array_len(list) / sizeof(int),
 				sizeof(int)),
+
+/*     A shortcut is to use __print_dynamic_array for dynamic arrays */
+
+		  __print_dynamic_array(list, sizeof(int)),
+
 		  __get_str(str), __get_str(lstr),
 		  __get_bitmask(cpus), __get_cpumask(cpum),
 		  __get_str(vstr))
-- 
2.43.0



* [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
  2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
  2024-10-22 19:36 ` [PATCH v7 2/5] tracing: Add __print_dynamic_array() helper Avadhut Naik
@ 2024-10-22 19:36 ` Avadhut Naik
  2024-10-24  2:25   ` Zhuo, Qiuxu
  2024-10-22 19:36 ` [PATCH v7 4/5] x86/mce/apei: Handle variable register array size Avadhut Naik
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Avadhut Naik @ 2024-10-22 19:36 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel
  Cc: linux-kernel, bp, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, avadhut.naik

Starting with Zen4, AMD's Scalable MCA systems incorporate two new
registers: MCA_SYND1 and MCA_SYND2.

These registers carry supplemental error information in addition to
that in the existing MCA_SYND register. The data within these registers
is considered valid only if MCA_STATUS[SyndV] is set.
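
The validity rule can be sketched in userspace C as follows. The structures
are heavily simplified stand-ins for struct mce and struct mce_hw_err, the
helper amd_get_extra_synd() is hypothetical, and MCI_STATUS_SYNDV is assumed
to be bit 53 as in the kernel headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_SYNDV	(1ULL << 53)	/* assumed: syndrome valid bit */

/* Simplified stand-ins; the real structures carry many more fields. */
struct mce {
	uint64_t status;
	uint64_t synd;
};

struct mce_hw_err {
	struct mce m;
	union {
		struct {
			uint64_t synd1;	/* MCA_SYND1 MSR */
			uint64_t synd2;	/* MCA_SYND2 MSR */
		} amd;
	} vendor;
};

/* Userspace equivalent of container_of(mce, struct mce_hw_err, m). */
static struct mce_hw_err *to_mce_hw_err(struct mce *m)
{
	return (struct mce_hw_err *)((char *)m - offsetof(struct mce_hw_err, m));
}

/* Hypothetical helper: report SYND1/SYND2 only when SyndV is set. */
static int amd_get_extra_synd(struct mce *m, uint64_t *synd1, uint64_t *synd2)
{
	struct mce_hw_err *err = to_mce_hw_err(m);

	assert(synd1 && synd2);

	if (!(m->status & MCI_STATUS_SYNDV))
		return 0;

	*synd1 = err->vendor.amd.synd1;
	*synd2 = err->vendor.amd.synd2;
	return 1;
}
```

The pointer arithmetic in to_mce_hw_err() is only safe when the struct mce
really is embedded in a struct mce_hw_err, which is what the wrapper from the
first patch is meant to guarantee for records on the notifier chain.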

Userspace error decoding tools like the rasdaemon gather related hardware
error information through the tracepoints. As such, these two registers
should be exported through the mce_record tracepoint so that tools like
rasdaemon can parse them and output the supplemental error information
like FRU Text contained in them.

[Yazen: Drop Yazen's Co-developed-by tag and move SoB tag.]

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.

Changes in v3:
1. Move wrapper changes required in mce_read_aux() and mce_no_way_out()
from this patch to the first patch.
2. Add comments to explain the new wrapper's purpose.
3. Modify commit message per feedback received.
4. Fix SoB chain to properly reflect the patch path.

Changes in v4:
1. Rebase on top of tip/master to avoid merge conflicts.

Changes in v5:
1. Remove "len" field since the length of a dynamic array can be fetched
from its metadata.
2. Substitute __print_array() with __print_dynamic_array().

Changes in v6:
1. Rebase on top of tip/master.
2. Use the newly introduced to_mce_hw_err macro in amd_decode_mce().

Changes in v7:
1. Rebase on top of tip/master.
2. Change second parameter of __print_dynamic_array from 8 to sizeof(u8)
to ensure that the dynamic array is parsed using a u8 pointer instead of
u64 pointer.
---
 arch/x86/include/asm/mce.h      | 22 ++++++++++++++++++++++
 arch/x86/include/uapi/asm/mce.h |  3 ++-
 arch/x86/kernel/cpu/mce/amd.c   |  5 ++++-
 arch/x86/kernel/cpu/mce/core.c  |  9 ++++++++-
 drivers/edac/mce_amd.c          |  8 ++++++--
 include/trace/events/mce.h      |  7 +++++--
 6 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4e45f45673a3..c2466b20fe79 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -122,6 +122,9 @@
 #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
 #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
 #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
+/* Registers MISC2 to MISC4 are at offsets B to D. */
+#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
+#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
 #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
@@ -132,6 +135,8 @@
 #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
+#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
 
 #define XEC(x, mask)			(((x) >> 16) & mask)
 
@@ -190,9 +195,26 @@ enum mce_notifier_prios {
 /**
  * struct mce_hw_err - Hardware Error Record.
  * @m:		Machine Check record.
+ * @vendor:	Vendor-specific error information.
+ *
+ * Vendor-specific fields should not be added to struct mce.
+ * Instead, vendors should export their vendor-specific data
+ * through their structure in the vendor union below.
+ *
+ * AMD's vendor data is parsed by error decoding tools for
+ * supplemental error information. Thus, current offsets of
+ * existing fields must be maintained.
+ * Only add new fields at the end of AMD's vendor structure.
  */
 struct mce_hw_err {
 	struct mce m;
+
+	union vendor_info {
+		struct {
+			u64 synd1;		/* MCA_SYND1 MSR */
+			u64 synd2;		/* MCA_SYND2 MSR */
+		} amd;
+	} vendor;
 };
 
 #define	to_mce_hw_err(mce) container_of(mce, struct mce_hw_err, m)
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index db9adc081c5a..cb6b48a7c22b 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -8,7 +8,8 @@
 /*
  * Fields are zero when not available. Also, this struct is shared with
  * userspace mcelog and thus must keep existing fields at current offsets.
- * Only add new fields to the end of the structure
+ * Only add new, shared fields to the end of the structure.
+ * Do not add vendor-specific fields.
  */
 struct mce {
 	__u64 status;		/* Bank's MCi_STATUS MSR */
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5b4d266500b2..6ca80fff1fea 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -797,8 +797,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vendor.amd.synd1);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vendor.amd.synd2);
+		}
 	}
 
 	mce_log(&err);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 3611366d56b7..fca23fe16abe 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -202,6 +202,10 @@ static void __print_mce(struct mce_hw_err *err)
 	if (mce_flags.smca) {
 		if (m->synd)
 			pr_cont("SYND %llx ", m->synd);
+		if (err->vendor.amd.synd1)
+			pr_cont("SYND1 %llx ", err->vendor.amd.synd1);
+		if (err->vendor.amd.synd2)
+			pr_cont("SYND2 %llx ", err->vendor.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
 	}
@@ -678,8 +682,11 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+			err->vendor.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
+			err->vendor.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
+		}
 	}
 }
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 8130c3dc64da..194d9fd47d20 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -793,6 +793,7 @@ static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
 	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = to_mce_hw_err(m);
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;
 
@@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
+		if (m->status & MCI_STATUS_SYNDV) {
+			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
+			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+		}
 
 		pr_cont("\n");
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 65aba1afcd07..c1c50df9ecfd 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -43,6 +43,7 @@ TRACE_EVENT(mce_record,
 		__field(	u8,		bank		)
 		__field(	u8,		cpuvendor	)
 		__field(	u32,		microcode	)
+		__dynamic_array(u8, v_data, sizeof(err->vendor))
 	),
 
 	TP_fast_assign(
@@ -65,9 +66,10 @@ TRACE_EVENT(mce_record,
 		__entry->bank		= err->m.bank;
 		__entry->cpuvendor	= err->m.cpuvendor;
 		__entry->microcode	= err->m.microcode;
+		memcpy(__get_dynamic_array(v_data), &err->vendor, sizeof(err->vendor));
 	),
 
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s",
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
@@ -83,7 +85,8 @@ TRACE_EVENT(mce_record,
 		__entry->walltime,
 		__entry->socketid,
 		__entry->apicid,
-		__entry->microcode)
+		__entry->microcode,
+		__print_dynamic_array(v_data, sizeof(u8)))
 );
 
 #endif /* _TRACE_MCE_H */
-- 
2.43.0



* [PATCH v7 4/5] x86/mce/apei: Handle variable register array size
  2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
                   ` (2 preceding siblings ...)
  2024-10-22 19:36 ` [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
@ 2024-10-22 19:36 ` Avadhut Naik
  2024-10-24  5:25   ` Zhuo, Qiuxu
  2024-10-22 19:36 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
  2024-10-29 18:14 ` [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Naik, Avadhut
  5 siblings, 1 reply; 25+ messages in thread
From: Avadhut Naik @ 2024-10-22 19:36 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel
  Cc: linux-kernel, bp, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, avadhut.naik

From: Yazen Ghannam <yazen.ghannam@amd.com>

The kernel uses the ACPI Boot Error Record Table (BERT) to report errors
that occurred in a previous boot. On some modern AMD systems, these
errors are reported through the x86 Common Platform Error Record (CPER)
format, which consists of one or more Processor Context Information
Structures. Each context structure provides a starting address and
represents an x86 MSR range: its data is a contiguous set of MSRs
beginning at, and including, the starting address.

On AMD systems that implement this behavior, the MSR range commonly
represents the MCAX register space used for the Scalable MCA feature.
The apei_smca_report_x86_error() function decodes and passes this
information through the MCE notifier chain. However, this function
assumes a fixed register array size based on the original HW/FW
implementation.

This assumption breaks with the addition of two new MCAX registers viz.
MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
MCAX register space, so they won't be included when decoding the CPER
data.

Rework apei_smca_report_x86_error() to support a variable register array
size. This covers any case where the MSR context information starts at
the MCAX address for MCA_STATUS and ends at any other register within
the MCAX register space.

Add code comments indicating the MCAX register at each offset.
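
The variable-size handling can be sketched in userspace C as below. This is
an illustration under assumptions, not the patch itself: struct smca_regs and
smca_parse_ctx() are hypothetical names, and the array is taken to start at
MCA_STATUS as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the registers the sketch extracts. */
struct smca_regs {
	uint64_t status, addr, misc, ipid, synd, synd1, synd2;
};

/*
 * regs points at the register array, which starts at MCA_STATUS.
 * reg_arr_size is in bytes; each register is 8 bytes wide.
 * Returns 0 on success, -1 if the array holds no complete register.
 */
static int smca_parse_ctx(const uint64_t *regs, uint32_t reg_arr_size,
			  struct smca_regs *out)
{
	unsigned int num_regs = reg_arr_size >> 3;

	assert(regs && out);

	if (!num_regs)
		return -1;

	/* Starting from MCA_STATUS, at most 15 registers are expected. */
	if (num_regs > 15)
		num_regs = 15;

	switch (num_regs) {
	case 15: out->synd2 = regs[14];	/* MCA_SYND2 */
		/* fall through */
	case 14: out->synd1 = regs[13];	/* MCA_SYND1 */
		/* fall through */
	case 13: case 12: case 11: case 10:	/* MCA_MISC4..MCA_MISC1 */
	case 9: case 8: case 7:		/* MCA_DEADDR, MCA_DESTAT, reserved */
	case 6: out->synd = regs[5];	/* MCA_SYND */
		/* fall through */
	case 5: out->ipid = regs[4];	/* MCA_IPID */
		/* fall through */
	case 4:				/* MCA_CONFIG, skipped */
	case 3: out->misc = regs[2];	/* MCA_MISC0 */
		/* fall through */
	case 2: out->addr = regs[1];	/* MCA_ADDR */
		/* fall through */
	case 1: out->status = regs[0];	/* MCA_STATUS */
	}

	return 0;
}
```

Entering the switch at the highest available register and falling through
means a shorter array simply leaves the later fields untouched, so the caller
should zero the output structure first.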

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.

Changes in v3:
1. Incorporate suggested touchup.
2. Fix SoB chain to properly reflect the patch path.

Changes in v4:
1. Rebase on top of tip/master to avoid merge conflicts.

Changes in v5:
1. No changes except rebasing on top of tip/master.

Changes in v6:
1. No changes except rebasing on top of tip/master.

Changes in v7:
1. No changes except rebasing on top of tip/master.
---
 arch/x86/kernel/cpu/mce/apei.c | 72 +++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7f582b4ca1ca..0a89947e47bc 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -68,9 +68,9 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
+	unsigned int cpu, num_regs;
 	bool apicid_found = false;
 	struct mce_hw_err err;
-	unsigned int cpu;
 	struct mce *m;
 
 	if (!boot_cpu_has(X86_FEATURE_SMCA))
@@ -89,16 +89,12 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		return -EINVAL;
 
 	/*
-	 * The register array size must be large enough to include all the
-	 * SMCA registers which need to be extracted.
-	 *
 	 * The number of registers in the register array is determined by
 	 * Register Array Size/8 as defined in UEFI spec v2.8, sec N.2.4.2.2.
-	 * The register layout is fixed and currently the raw data in the
-	 * register array includes 6 SMCA registers which the kernel can
-	 * extract.
+	 * Sanity-check registers array size.
 	 */
-	if (ctx_info->reg_arr_size < 48)
+	num_regs = ctx_info->reg_arr_size >> 3;
+	if (!num_regs)
 		return -EINVAL;
 
 	for_each_possible_cpu(cpu) {
@@ -117,12 +113,60 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	mce_prep_record_per_cpu(cpu, m);
 
 	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m->status = *i_mce;
-	m->addr = *(i_mce + 1);
-	m->misc = *(i_mce + 2);
-	/* Skipping MCA_CONFIG */
-	m->ipid = *(i_mce + 4);
-	m->synd = *(i_mce + 5);
+
+	/*
+	 * The SMCA register layout is fixed and includes 16 registers.
+	 * The end of the array may be variable, but the beginning is known.
+	 * Cap the number of registers to expected max (15).
+	 */
+	if (num_regs > 15)
+		num_regs = 15;
+
+	switch (num_regs) {
+	/* MCA_SYND2 */
+	case 15:
+		err.vendor.amd.synd2 = *(i_mce + 14);
+		fallthrough;
+	/* MCA_SYND1 */
+	case 14:
+		err.vendor.amd.synd1 = *(i_mce + 13);
+		fallthrough;
+	/* MCA_MISC4 */
+	case 13:
+	/* MCA_MISC3 */
+	case 12:
+	/* MCA_MISC2 */
+	case 11:
+	/* MCA_MISC1 */
+	case 10:
+	/* MCA_DEADDR */
+	case 9:
+	/* MCA_DESTAT */
+	case 8:
+	/* reserved */
+	case 7:
+	/* MCA_SYND */
+	case 6:
+		m->synd = *(i_mce + 5);
+		fallthrough;
+	/* MCA_IPID */
+	case 5:
+		m->ipid = *(i_mce + 4);
+		fallthrough;
+	/* MCA_CONFIG */
+	case 4:
+	/* MCA_MISC0 */
+	case 3:
+		m->misc = *(i_mce + 2);
+		fallthrough;
+	/* MCA_ADDR */
+	case 2:
+		m->addr = *(i_mce + 1);
+		fallthrough;
+	/* MCA_STATUS */
+	case 1:
+		m->status = *i_mce;
+	}
 
 	mce_log(&err);
 
-- 
2.43.0



* [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
                   ` (3 preceding siblings ...)
  2024-10-22 19:36 ` [PATCH v7 4/5] x86/mce/apei: Handle variable register array size Avadhut Naik
@ 2024-10-22 19:36 ` Avadhut Naik
  2024-10-24  5:49   ` Zhuo, Qiuxu
                     ` (2 more replies)
  2024-10-29 18:14 ` [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Naik, Avadhut
  5 siblings, 3 replies; 25+ messages in thread
From: Avadhut Naik @ 2024-10-22 19:36 UTC (permalink / raw)
  To: x86, linux-edac, linux-trace-kernel
  Cc: linux-kernel, bp, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, avadhut.naik

From: Yazen Ghannam <yazen.ghannam@amd.com>

A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).

The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
covers multiple devices, for example, a Unified Memory Controller (UMC)
bank that manages two DIMMs.

Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
exported through the mce_record tracepoint so that userspace tools like
the rasdaemon can determine if FRU Text has been reported through the
MCA_SYND1 and MCA_SYND2 registers and output it.
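
A userspace tool could reassemble the string along these lines. This is a
minimal sketch, assuming the three values have already been obtained from the
tracepoint's vendor data; amd_fru_text() is a hypothetical helper, while
MCI_CONFIG_FRUTEXT matches the bit described above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MCI_CONFIG_FRUTEXT	(1ULL << 9)	/* McaFruTextInMca */

/*
 * Hypothetical helper: copy the 16 bytes held in MCA_SYND1 and
 * MCA_SYND2 into out and NUL-terminate the result.
 * Returns 1 if FRU Text is advertised in config, 0 otherwise.
 */
static int amd_fru_text(uint64_t config, uint64_t synd1, uint64_t synd2,
			char out[17])
{
	assert(out != NULL);

	if (!(config & MCI_CONFIG_FRUTEXT))
		return 0;

	memcpy(&out[0], &synd1, 8);
	memcpy(&out[8], &synd2, 8);
	out[16] = '\0';

	return 1;
}
```

When the bit is clear, the two registers hold ordinary syndrome data and
should be printed as hexadecimal values instead.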

[Yazen: Add Avadhut as co-developer for wrapper changes.]

Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.

Changes in v3:
1. Modify commit message per feedback provided.
2. Remove call to memset() for the string frutext. Instead, just ensure
that it is NUL-terminated.
3. Fix SoB chain to properly reflect the patch path.

Changes in v4:
1. Rebase on top of tip/master to avoid merge conflicts.

Changes in v5:
1. No changes except rebasing on top of tip/master.

Changes in v6:
1. No changes except rebasing on top of tip/master.

Changes in v7:
1. No changes except rebasing on top of tip/master.
---
 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/mce/amd.c  |  1 +
 arch/x86/kernel/cpu/mce/apei.c |  2 ++
 arch/x86/kernel/cpu/mce/core.c |  3 +++
 drivers/edac/mce_amd.c         | 21 ++++++++++++++-------
 5 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index c2466b20fe79..72a69ad7d692 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
  *  - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
+#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
@@ -213,6 +214,7 @@ struct mce_hw_err {
 		struct {
 			u64 synd1;		/* MCA_SYND1 MSR */
 			u64 synd2;		/* MCA_SYND2 MSR */
+			u64 config;		/* MCA_CONFIG MSR */
 		} amd;
 	} vendor;
 };
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6ca80fff1fea..65ace034af08 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -796,6 +796,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
+		rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vendor.amd.config);
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 0a89947e47bc..19a1c72fc2bf 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -155,6 +155,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		fallthrough;
 	/* MCA_CONFIG */
 	case 4:
+		err.vendor.amd.config = *(i_mce + 3);
+		fallthrough;
 	/* MCA_MISC0 */
 	case 3:
 		m->misc = *(i_mce + 2);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index fca23fe16abe..bc5e67306f77 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -208,6 +208,8 @@ static void __print_mce(struct mce_hw_err *err)
 			pr_cont("SYND2 %llx ", err->vendor.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
+		if (err->vendor.amd.config)
+			pr_cont("CONFIG %llx ", err->vendor.amd.config);
 	}
 
 	pr_cont("\n");
@@ -681,6 +683,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
+		err->vendor.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 194d9fd47d20..d69a1466f0bc 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce *m = (struct mce *)data;
 	struct mce_hw_err *err = to_mce_hw_err(m);
 	unsigned int fam = x86_family(m->cpuid);
+	u64 mca_config = err->vendor.amd.config;
 	int ecc;
 
 	if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
-
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
 
 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			if (mca_config & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+
+				frutext[16] = '\0';
+				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			} else {
+				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+					 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			}
 		}
 
 		pr_cont("\n");
-- 
2.43.0



* RE: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
@ 2024-10-24  2:21   ` Zhuo, Qiuxu
  2024-10-30 13:32   ` Borislav Petkov
  1 sibling, 0 replies; 25+ messages in thread
From: Zhuo, Qiuxu @ 2024-10-24  2:21 UTC (permalink / raw)
  To: Avadhut Naik, x86@kernel.org, linux-edac@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, bp@alien8.de, Luck, Tony,
	tglx@linutronix.de, mingo@redhat.com, rostedt@goodmis.org,
	mchehab@kernel.org, yazen.ghannam@amd.com, john.allen@amd.com

> From: Avadhut Naik <avadhut.naik@amd.com>
> [...]
> Subject: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export
> vendor specific info
> 
> Currently, exporting new additional machine check error information involves
> adding new fields for the same at the end of the struct mce.
> This additional information can then be consumed through mcelog or
> tracepoint.
> 
> However, as new MSRs are being added (and will be added in the future) by
> CPU vendors on their newer CPUs with additional machine check error
> information to be exported, the size of struct mce will balloon on some CPUs,
> unnecessarily, since those fields are vendor-specific. Moreover, different CPU
> vendors may export the additional information in varying sizes.
> 
> The problem particularly intensifies since struct mce is exposed to userspace
> as part of UAPI. Its bloating through vendor-specific data should be avoided
> to limit the information being sent out to userspace.
> 
> Add a new structure mce_hw_err to wrap the existing struct mce. The same
> will prevent its ballooning since vendor-specifc data, if any, can now be
> exported through a union within the wrapper structure and through
> __dynamic_array in mce_record tracepoint.
> 
> Furthermore, new internal kernel fields can be added to the wrapper struct
> without impacting the user space API.
> 
> [Yazen: Add last commit message paragraph.]
> 
> Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
> Changes in v2:
> [1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-
> yazen.ghannam@amd.com/
> [2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-
> yazen.ghannam@amd.com/
> 
> 1. Drop dependencies on sets [1] and [2] above and rebase on top of
> tip/master.
> 
> Changes in v3:
> 1. Move wrapper changes required in mce_read_aux() and
> mce_no_way_out() to this patch from the second patch.
> 2. Fix SoB chain to properly reflect the patch path.
> 
> Changes in v4:
> 1. Rebase on top of tip/master to avoid merge conflicts.
> 2. Resolve kernel test robot's warning on the use of memset() in
> do_machine_check().
> 
> Changes in v5:
> 1. No changes except rebasing on top of tip/master.
> 
> Changes in v6:
> 1. Rebase on top of tip/master.
> 2. Introduce to_mce_hw_err macro to eliminate changes required in notifier
> chain callback functions, especially callback functions of EDAC drivers.
> 3. Change third parameter of __mc_scan_banks() to a pointer to the new
> wrapper structure and make the required changes accordingly.
> 
> Changes in v7:
> 1. Rebase on top of tip/master.
> 2. Fix initialization of struct mce_hw_err *final in do_machine_check().

As my comments were resolved in v6 and v7,

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

-Qiuxu
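
The wrapper idea described in the commit message can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the `_sketch` type and function names are hypothetical, the structs are heavily trimmed stand-ins for the real ones, and the pointer arithmetic only mimics the container_of idiom that a macro like to_mce_hw_err presumably relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UAPI-visible record; the real struct mce has many more fields. */
struct mce_sketch {
	uint64_t status;
	uint64_t synd;
};

/*
 * Hypothetical wrapper: vendor-specific and kernel-internal data sit next to
 * the embedded record, so the record itself does not have to grow.
 */
struct mce_hw_err_sketch {
	struct mce_sketch m;
	union {
		struct {
			uint64_t synd1;
			uint64_t synd2;
		} amd;
	} vendor;
};

/* Recover the wrapper from a pointer to its embedded member (container_of idiom). */
static inline struct mce_hw_err_sketch *to_hw_err_sketch(struct mce_sketch *m)
{
	assert(m != NULL);
	return (struct mce_hw_err_sketch *)((char *)m -
			offsetof(struct mce_hw_err_sketch, m));
}
```

A callback that only receives a pointer to the embedded record can reach the vendor data this way, which is why existing notifier chain callbacks would not need a new signature.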



* RE: [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
  2024-10-22 19:36 ` [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
@ 2024-10-24  2:25   ` Zhuo, Qiuxu
  0 siblings, 0 replies; 25+ messages in thread
From: Zhuo, Qiuxu @ 2024-10-24  2:25 UTC (permalink / raw)
  To: Avadhut Naik, x86@kernel.org, linux-edac@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, bp@alien8.de, Luck, Tony,
	tglx@linutronix.de, mingo@redhat.com, rostedt@goodmis.org,
	mchehab@kernel.org, yazen.ghannam@amd.com, john.allen@amd.com

> From: Avadhut Naik <avadhut.naik@amd.com>
> [...]
> Subject: [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new
> MCA_SYND{1,2} registers
> 
> Starting with Zen4, AMD's Scalable MCA systems incorporate two new
> registers: MCA_SYND1 and MCA_SYND2.
> 
> These registers will include supplemental error information in addition to the
> existing MCA_SYND register. The data within these registers is considered valid
> if MCA_STATUS[SyndV] is set.
> 
> Userspace error decoding tools like the rasdaemon gather related hardware
> error information through the tracepoints. As such, these two registers should
> be exported through the mce_record tracepoint so that tools like rasdaemon
> can parse them and output the supplemental error information like FRU Text
> contained in them.
> 
> [Yazen: Drop Yazen's Co-developed-by tag and move SoB tag.]
> 
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
> Changes in v2:
> [1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-
> yazen.ghannam@amd.com/
> [2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-
> yazen.ghannam@amd.com/
> 
> 1. Drop dependencies on sets [1] and [2] above and rebase on top of
> tip/master.
> 
> Changes in v3:
> 1. Move wrapper changes required in mce_read_aux() and
> mce_no_way_out() from this patch to the first patch.
> 2. Add comments to explain the new wrapper's purpose.
> 3. Modify commit message per feedback received.
> 4. Fix SoB chain to properly reflect the patch path.
> 
> Changes in v4:
> 1. Rebase on top of tip/master to avoid merge conflicts.
> 
> Changes in v5:
> 1. Remove "len" field since the length of a dynamic array can be fetched from
> its metadata.
> 2. Substitute __print_array() with __print_dynamic_array().
> 
> Changes in v6:
> 1. Rebase on top of tip/master.
> 2. Use the newly introduced to_mce_hw_err macro in amd_decode_mce().
> 
> Changes in v7:
> 1. Rebase on top of tip/master.
> 2. Change second parameter of __print_dynamic_array from 8 to sizeof(u8)
> to ensure that the dynamic array is parsed using a u8 pointer instead of
> u64 pointer.

As my comments were resolved in v6 and v7,

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

- Qiuxu


* RE: [PATCH v7 4/5] x86/mce/apei: Handle variable register array size
  2024-10-22 19:36 ` [PATCH v7 4/5] x86/mce/apei: Handle variable register array size Avadhut Naik
@ 2024-10-24  5:25   ` Zhuo, Qiuxu
  0 siblings, 0 replies; 25+ messages in thread
From: Zhuo, Qiuxu @ 2024-10-24  5:25 UTC (permalink / raw)
  To: Avadhut Naik, x86@kernel.org, linux-edac@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, bp@alien8.de, Luck, Tony,
	tglx@linutronix.de, mingo@redhat.com, rostedt@goodmis.org,
	mchehab@kernel.org, yazen.ghannam@amd.com, john.allen@amd.com

> From: Avadhut Naik <avadhut.naik@amd.com>
> Sent: Wednesday, October 23, 2024 3:37 AM
> To: x86@kernel.org; linux-edac@vger.kernel.org; linux-trace-
> kernel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; bp@alien8.de; Luck, Tony
> <tony.luck@intel.com>; Zhuo, Qiuxu <qiuxu.zhuo@intel.com>;
> tglx@linutronix.de; mingo@redhat.com; rostedt@goodmis.org;
> mchehab@kernel.org; yazen.ghannam@amd.com; john.allen@amd.com;
> avadhut.naik@amd.com
> Subject: [PATCH v7 4/5] x86/mce/apei: Handle variable register array size
> 
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> ACPI Boot Error Record Table (BERT) is being used by the kernel to report
> errors that occurred in a previous boot. On some modern AMD systems,
> these very errors within the BERT are reported through the
> x86 Common Platform Error Record (CPER) format which consists of one or
> more Processor Context Information Structures. These context structures
> provide a starting address and represent an x86 MSR range in which the data
> constitutes a contiguous set of MSRs starting from, and including the starting
> address.
> 
> It's common, for AMD systems that implement this behavior, that the MSR
> range represents the MCAX register space used for the Scalable MCA feature.
> The apei_smca_report_x86_error() function decodes and passes this
> information through the MCE notifier chain. However, this function assumes
> a fixed register size based on the original HW/FW implementation.
> 
> This assumption breaks with the addition of two new MCAX registers viz.
> MCA_SYND1 and MCA_SYND2. These registers are added at the end of the
> MCAX register space, so they won't be included when decoding the CPER
> data.
> 
> Rework apei_smca_report_x86_error() to support a variable register array
> size. This covers any case where the MSR context information starts at the
> MCAX address for MCA_STATUS and ends at any other register within the
> MCAX register space.
> 
> Add code comments indicating the MCAX register at each offset.
> 
> [Yazen: Add Avadhut as co-developer for wrapper changes.]
> 
> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

LGTM.

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
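
The variable-size handling described above can be sketched in plain C as a switch on the register count that falls through from the highest register present down to MCA_STATUS. This is a minimal sketch, not the patch itself: the `_sketch` names are hypothetical, only the first four registers are modelled, the MCA_MISC0 and MCA_CONFIG offsets follow the hunk quoted in this thread, and the MCA_ADDR offset is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the MCAX registers carried in a CPER context. */
struct smca_regs_sketch {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
	uint64_t config;
};

/*
 * Decode however many registers the context provides. Registers that are
 * not present stay zero. Returns 0 on success, -1 if the array is empty.
 */
static int smca_decode_sketch(const uint64_t *i_mce, size_t num_regs,
			      struct smca_regs_sketch *out)
{
	assert(i_mce != NULL && out != NULL);

	*out = (struct smca_regs_sketch){ 0 };

	if (num_regs == 0)
		return -1;

	/* This sketch knows about four registers only; ignore the rest. */
	if (num_regs > 4)
		num_regs = 4;

	switch (num_regs) {
	/* MCA_CONFIG */
	case 4:
		out->config = i_mce[3];
		/* fall through */
	/* MCA_MISC0 */
	case 3:
		out->misc = i_mce[2];
		/* fall through */
	/* MCA_ADDR (offset assumed) */
	case 2:
		out->addr = i_mce[1];
		/* fall through */
	/* MCA_STATUS */
	case 1:
		out->status = i_mce[0];
	}

	return 0;
}
```

Entering the switch at the highest register present means a shorter context simply skips the registers it does not carry, and appending new registers later only needs new cases at the top.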




* RE: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-22 19:36 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
@ 2024-10-24  5:49   ` Zhuo, Qiuxu
  2024-10-30 16:05   ` Borislav Petkov
  2024-10-30 16:15   ` Borislav Petkov
  2 siblings, 0 replies; 25+ messages in thread
From: Zhuo, Qiuxu @ 2024-10-24  5:49 UTC (permalink / raw)
  To: Avadhut Naik, x86@kernel.org, linux-edac@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, bp@alien8.de, Luck, Tony,
	tglx@linutronix.de, mingo@redhat.com, rostedt@goodmis.org,
	mchehab@kernel.org, yazen.ghannam@amd.com, john.allen@amd.com

> From: Avadhut Naik <avadhut.naik@amd.com>
> [...]
> Subject: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
> 
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> A new "FRU Text in MCA" feature is defined where the Field Replaceable Unit
> (FRU) Text for a device is represented by a string in the new
> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> 
> The FRU Text is populated dynamically for each individual error state
> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> covers multiple devices, for example, a Unified Memory Controller (UMC)
> bank that manages two DIMMs.
> 
> Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
> exported through the mce_record tracepoint so that userspace tools like the
> rasdaemon can determine if FRU Text has been reported through the
> MCA_SYND1 and MCA_SYND2 registers and output it.
> 
> [Yazen: Add Avadhut as co-developer for wrapper changes.]
> 
> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>

LGTM.

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>



* Re: [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs
  2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
                   ` (4 preceding siblings ...)
  2024-10-22 19:36 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
@ 2024-10-29 18:14 ` Naik, Avadhut
  2024-10-29 18:27   ` Borislav Petkov
  5 siblings, 1 reply; 25+ messages in thread
From: Naik, Avadhut @ 2024-10-29 18:14 UTC (permalink / raw)
  To: bp
  Cc: linux-kernel, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, linux-trace-kernel,
	Avadhut Naik, x86, linux-edac

Hi,

Any further feedback on this set?
If not, can this please be considered for merging in?

On 10/22/2024 14:36, Avadhut Naik wrote:
> This patchset adds a new wrapper for struct mce to prevent its bloating
> and export vendor specific error information. Additionally, support is
> also introduced for two new "syndrome" MSRs used in newer AMD Scalable
> MCA (SMCA) systems. Also, a new "FRU Text in MCA" feature that uses these
> new "syndrome" MSRs has been added.
[...]
> base-commit: d7ec15ce8bdc955ce383123c4f01ad0a8155fb90

-- 
Thanks,
Avadhut Naik


* Re: [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs
  2024-10-29 18:14 ` [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Naik, Avadhut
@ 2024-10-29 18:27   ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2024-10-29 18:27 UTC (permalink / raw)
  To: Naik, Avadhut
  Cc: linux-kernel, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, yazen.ghannam, john.allen, linux-trace-kernel,
	Avadhut Naik, x86, linux-edac

On Tue, Oct 29, 2024 at 01:14:16PM -0500, Naik, Avadhut wrote:
> Any further feedback on this set?
> If not, can this please be considered for merging in?

You're on the to-review list which is ever-growing. :-(

While waiting, how about you review some other people's code? Whatever you
find interesting on lkml or somewhere else...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
  2024-10-24  2:21   ` Zhuo, Qiuxu
@ 2024-10-30 13:32   ` Borislav Petkov
  2024-10-30 16:35     ` Naik, Avadhut
  1 sibling, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2024-10-30 13:32 UTC (permalink / raw)
  To: Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-kernel, tony.luck,
	qiuxu.zhuo, tglx, mingo, rostedt, mchehab, yazen.ghannam,
	john.allen

On Tue, Oct 22, 2024 at 07:36:27PM +0000, Avadhut Naik wrote:
> Currently, exporting new additional machine check error information
> involves adding new fields for the same at the end of the struct mce.
> This additional information can then be consumed through mcelog or
> tracepoint.
> 
> However, as new MSRs are being added (and will be added in the future)
> by CPU vendors on their newer CPUs with additional machine check error
> information to be exported, the size of struct mce will balloon on some
> CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
> different CPU vendors may export the additional information in varying
> sizes.
> 
> The problem particularly intensifies since struct mce is exposed to
> userspace as part of UAPI. Its bloating through vendor-specific data
> should be avoided to limit the information being sent out to userspace.
> 
> Add a new structure mce_hw_err to wrap the existing struct mce. The same
> will prevent its ballooning since vendor-specifc data, if any, can now be

Unknown word [vendor-specifc] in commit message.

Please introduce a spellchecker into your patch creation workflow.

Also:

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

	struct long_struct_name *descriptive_name;
	unsigned long foo, bar;
	unsigned int tmp;
	int ret;

The above is faster to parse than the reverse ordering::

	int ret;
	unsigned int tmp;
	unsigned long foo, bar;
	struct long_struct_name *descriptive_name;

And even more so than random ordering::

	unsigned long foo, bar;
	int ret;
	struct long_struct_name *descriptive_name;
	unsigned int tmp;

diff on top of yours:

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 3611366d56b7..28e28b69d84d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1030,11 +1030,11 @@ static noinstr int mce_timed_out(u64 *t, const char *msg)
  */
 static void mce_reign(void)
 {
-	int cpu;
 	struct mce_hw_err *err = NULL;
 	struct mce *m = NULL;
 	int global_worst = 0;
 	char *msg = NULL;
+	int cpu;
 
 	/*
 	 * This CPU is the Monarch and the other CPUs have run
@@ -1291,8 +1291,8 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs,
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
 	struct mca_config *cfg = &mca_cfg;
-	struct mce *m = &err->m;
 	int severity, i, taint = 0;
+	struct mce *m = &err->m;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		arch___clear_bit(i, toclear);
@@ -1419,8 +1419,8 @@ static void kill_me_never(struct callback_head *cb)
 
 static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(struct callback_head *))
 {
-	struct mce *m = &err->m;
 	int count = ++current->mce_count;
+	struct mce *m = &err->m;
 
 	/* First call, save all the details */
 	if (count == 1) {
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index 504d89724ecd..d0be6dda0c14 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -73,9 +73,9 @@ struct llist_node *mce_gen_pool_prepare_records(void)
 
 void mce_gen_pool_process(struct work_struct *__unused)
 {
-	struct mce *mce;
-	struct llist_node *head;
 	struct mce_evt_llist *node, *tmp;
+	struct llist_node *head;
+	struct mce *mce;
 
 	head = llist_del_all(&mce_event_llist);
 	if (!head)
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index c65a5c4e2f22..313fe682db33 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -502,9 +502,9 @@ static void prepare_msrs(void *info)
 
 static void do_inject(void)
 {
+	unsigned int cpu = i_mce.extcpu;
 	struct mce_hw_err err;
 	u64 mcg_status = 0;
-	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
 
 	i_mce.tsc = rdtsc_ordered();

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-22 19:36 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
  2024-10-24  5:49   ` Zhuo, Qiuxu
@ 2024-10-30 16:05   ` Borislav Petkov
  2024-10-30 16:15   ` Borislav Petkov
  2 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2024-10-30 16:05 UTC (permalink / raw)
  To: Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-kernel, tony.luck,
	qiuxu.zhuo, tglx, mingo, rostedt, mchehab, yazen.ghannam,
	john.allen

On Tue, Oct 22, 2024 at 07:36:31PM +0000, Avadhut Naik wrote:
> @@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  
>  		if (m->status & MCI_STATUS_SYNDV) {
>  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> -				 err->vendor.amd.synd1, err->vendor.amd.synd2);
> +			if (mca_config & MCI_CONFIG_FRUTEXT) {
> +				char frutext[17];
> +
> +				frutext[16] = '\0';
> +				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
> +				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
> +
> +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
> +			} else {
> +				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> +					 err->vendor.amd.synd1, err->vendor.amd.synd2);
> +			}
>  		}

Right, so let's turn this into:

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bc5e67306f77..edc2c8033de8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -208,8 +208,6 @@ static void __print_mce(struct mce_hw_err *err)
 			pr_cont("SYND2 %llx ", err->vendor.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
-		if (err->vendor.amd.config)
-			pr_cont("CONFIG %llx ", err->vendor.amd.config);
 	}
 
 	pr_cont("\n");
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index d69a1466f0bc..62fcd92bf9d2 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -858,9 +858,6 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
 
 				pr_emerg(HW_ERR "FRU Text: %s", frutext);
-			} else {
-				pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-					 err->vendor.amd.synd1, err->vendor.amd.synd2);
 			}
 		}
 
and simply treat synd1 and synd2 as FRU text. I don't want to expose
mca_config to userspace yet but use it in the RAS code only. If a case appears
that we want to really expose it to userspace, we can talk about a proper
design then.

This patch doesn't make it part of the tracepoint either so...
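
For reference, the FRU text handling discussed here can be sketched as standalone C. This is an illustration under stated assumptions rather than the kernel code: the `_sketch` names and FRUTEXT_LEN are hypothetical, the bit position follows the MCA_CONFIG[9] description in the commit message, and the byte order assumes a little-endian host such as x86, where copying the two registers back to back yields the string in order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two 64-bit registers carry up to 16 characters of FRU text. */
#define FRUTEXT_LEN 16

/* McaFruTextInMca is described as MCA_CONFIG[9]. */
static inline int has_fru_text_sketch(uint64_t mca_config)
{
	return (int)((mca_config >> 9) & 1);
}

/*
 * Concatenate the raw bytes of SYND1 and SYND2 and NUL-terminate the result.
 * The caller provides a buffer of at least FRUTEXT_LEN + 1 bytes.
 */
static void fru_text_sketch(uint64_t synd1, uint64_t synd2,
			    char out[FRUTEXT_LEN + 1])
{
	assert(out != NULL);

	memcpy(&out[0], &synd1, 8);
	memcpy(&out[8], &synd2, 8);
	out[FRUTEXT_LEN] = '\0';
}
```

The explicit terminator matters because hardware is not guaranteed to pad the registers with zeros: a full 16-character string would otherwise run past the end of the buffer when printed with %s.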

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-22 19:36 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
  2024-10-24  5:49   ` Zhuo, Qiuxu
  2024-10-30 16:05   ` Borislav Petkov
@ 2024-10-30 16:15   ` Borislav Petkov
  2024-10-30 16:31     ` Yazen Ghannam
  2 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2024-10-30 16:15 UTC (permalink / raw)
  To: Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-kernel, tony.luck,
	qiuxu.zhuo, tglx, mingo, rostedt, mchehab, yazen.ghannam,
	john.allen

On Tue, Oct 22, 2024 at 07:36:31PM +0000, Avadhut Naik wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> A new "FRU Text in MCA" feature is defined where the Field Replaceable
> Unit (FRU) Text for a device is represented by a string in the new
> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> 
> The FRU Text is populated dynamically for each individual error state
> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> covers multiple devices, for example, a Unified Memory Controller (UMC)
> bank that manages two DIMMs.
> 
> Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
> exported through the mce_record tracepoint so that userspace tools like
> the rasdaemon can determine if FRU Text has been reported through the
> MCA_SYND1 and MCA_SYND2 registers and output it.

IOW:

Author: Yazen Ghannam <yazen.ghannam@amd.com>
Date:   Tue Oct 22 19:36:31 2024 +0000

    EDAC/mce_amd: Add support for FRU text in MCA
    
    A new "FRU Text in MCA" feature is defined where the Field Replaceable
    Unit (FRU) Text for a device is represented by a string in the new
    MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
    bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
    
    The FRU Text is populated dynamically for each individual error state
    (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
    covers multiple devices, for example, a Unified Memory Controller (UMC)
    bank that manages two DIMMs.
    
    If SYND1 and SYND2 are !NULL, then userspace can assume that they
    contain FRU text information. If they will report other information in
    the future, then a way of communicating the info type contained must be
    devised.
    
      [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
      [ bp: Do not expose MCA_CONFIG to userspace yet. ]
    
    Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
    Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
    Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4d936ee20e24..649a901ad563 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
  *  - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
+#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
@@ -212,6 +213,7 @@ struct mce_hw_err {
 		struct {
 			u64 synd1;		/* MCA_SYND1 MSR */
 			u64 synd2;		/* MCA_SYND2 MSR */
+			u64 config;		/* MCA_CONFIG MSR */
 		} amd;
 	} vendor;
 };
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6ca80fff1fea..65ace034af08 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -796,6 +796,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
+		rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vendor.amd.config);
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 0a89947e47bc..19a1c72fc2bf 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -155,6 +155,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 		fallthrough;
 	/* MCA_CONFIG */
 	case 4:
+		err.vendor.amd.config = *(i_mce + 3);
+		fallthrough;
 	/* MCA_MISC0 */
 	case 3:
 		m->misc = *(i_mce + 2);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index fca23fe16abe..edc2c8033de8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -681,6 +681,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
+		err->vendor.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 194d9fd47d20..62fcd92bf9d2 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce *m = (struct mce *)data;
 	struct mce_hw_err *err = to_mce_hw_err(m);
 	unsigned int fam = x86_family(m->cpuid);
+	u64 mca_config = err->vendor.amd.config;
 	int ecc;
 
 	if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
-
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
 
 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +850,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			if (mca_config & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+
+				frutext[16] = '\0';
+				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			}
 		}
 
 		pr_cont("\n");


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-30 16:15   ` Borislav Petkov
@ 2024-10-30 16:31     ` Yazen Ghannam
  2024-10-30 16:49       ` Naik, Avadhut
  2024-10-30 16:50       ` Borislav Petkov
  0 siblings, 2 replies; 25+ messages in thread
From: Yazen Ghannam @ 2024-10-30 16:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-kernel,
	tony.luck, qiuxu.zhuo, tglx, mingo, rostedt, mchehab, john.allen

On Wed, Oct 30, 2024 at 05:15:50PM +0100, Borislav Petkov wrote:
> On Tue, Oct 22, 2024 at 07:36:31PM +0000, Avadhut Naik wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> > 
> > A new "FRU Text in MCA" feature is defined where the Field Replaceable
> > Unit (FRU) Text for a device is represented by a string in the new
> > MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> > bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> > 
> > The FRU Text is populated dynamically for each individual error state
> > (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> > covers multiple devices, for example, a Unified Memory Controller (UMC)
> > bank that manages two DIMMs.
> > 
> > Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
> > exported through the mce_record tracepoint so that userspace tools like
> > the rasdaemon can determine if FRU Text has been reported through the
> > MCA_SYND1 and MCA_SYND2 registers and output it.
> 
> IOW:
> 
> Author: Yazen Ghannam <yazen.ghannam@amd.com>
> Date:   Tue Oct 22 19:36:31 2024 +0000
> 
>     EDAC/mce_amd: Add support for FRU text in MCA
>     
>     A new "FRU Text in MCA" feature is defined where the Field Replaceable
>     Unit (FRU) Text for a device is represented by a string in the new
>     MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
>     bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>     
>     The FRU Text is populated dynamically for each individual error state
>     (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
>     covers multiple devices, for example, a Unified Memory Controller (UMC)
>     bank that manages two DIMMs.
>     
>     If SYND1 and SYND2 are !NULL, then userspace can assume that they
>     contain FRU text information. If they will report other information in
>     the future, then a way of communicating the info type contained must be
>     devised.
>     
>       [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
>       [ bp: Do not expose MCA_CONFIG to userspace yet. ]

The entire struct mce_hw_err gets exposed through the mce tracepoint in
patch 3 of this set.

>     
>     Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>     Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
>     Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>     Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
>     Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 4d936ee20e24..649a901ad563 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -61,6 +61,7 @@
>   *  - TCC bit is present in MCx_STATUS.
>   */
>  #define MCI_CONFIG_MCAX		0x1
> +#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
>  #define MCI_IPID_MCATYPE	0xFFFF0000
>  #define MCI_IPID_HWID		0xFFF
>  
> @@ -212,6 +213,7 @@ struct mce_hw_err {
>  		struct {
>  			u64 synd1;		/* MCA_SYND1 MSR */
>  			u64 synd2;		/* MCA_SYND2 MSR */
> +			u64 config;		/* MCA_CONFIG MSR */

Anything that is added here will automatically show up in the
tracepoint, since it's a dynamic array. That was one of the reasons to
do the wrapper struct idea, right?

>  		} amd;
>  	} vendor;
>  };
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 6ca80fff1fea..65ace034af08 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -796,6 +796,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
>  
>  	if (mce_flags.smca) {
>  		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
> +		rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vendor.amd.config);
>  
>  		if (m->status & MCI_STATUS_SYNDV) {
>  			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
> index 0a89947e47bc..19a1c72fc2bf 100644
> --- a/arch/x86/kernel/cpu/mce/apei.c
> +++ b/arch/x86/kernel/cpu/mce/apei.c
> @@ -155,6 +155,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>  		fallthrough;
>  	/* MCA_CONFIG */
>  	case 4:
> +		err.vendor.amd.config = *(i_mce + 3);
> +		fallthrough;
>  	/* MCA_MISC0 */
>  	case 3:
>  		m->misc = *(i_mce + 2);
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index fca23fe16abe..edc2c8033de8 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -681,6 +681,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
>  
>  	if (mce_flags.smca) {
>  		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
> +		err->vendor.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
>  
>  		if (m->status & MCI_STATUS_SYNDV) {
>  			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 194d9fd47d20..62fcd92bf9d2 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  	struct mce *m = (struct mce *)data;
>  	struct mce_hw_err *err = to_mce_hw_err(m);
>  	unsigned int fam = x86_family(m->cpuid);
> +	u64 mca_config = err->vendor.amd.config;
>  	int ecc;
>  
>  	if (m->kflags & MCE_HANDLED_CEC)
> @@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
>  
>  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> -		u32 low, high;
> -		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> -
> -		if (!rdmsr_safe(addr, &low, &high) &&
> -		    (low & MCI_CONFIG_MCAX))
> +		if (mca_config & MCI_CONFIG_MCAX)
>  			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
>  
>  		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
> @@ -853,8 +850,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  
>  		if (m->status & MCI_STATUS_SYNDV) {
>  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> -				 err->vendor.amd.synd1, err->vendor.amd.synd2);
> +			if (mca_config & MCI_CONFIG_FRUTEXT) {
> +				char frutext[17];
> +
> +				frutext[16] = '\0';
> +				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
> +				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
> +
> +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
> +			}
>  		}
>  
>  		pr_cont("\n");
> 
>

The only changes I see are dropping a couple of kernel prints. I think
that's probably okay. But I don't think that's what you intend by not
exposing MCA_CONFIG to user space.

Thanks,
Yazen

* Re: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-10-30 13:32   ` Borislav Petkov
@ 2024-10-30 16:35     ` Naik, Avadhut
  2024-10-30 16:48       ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Naik, Avadhut @ 2024-10-30 16:35 UTC (permalink / raw)
  To: Borislav Petkov, Avadhut Naik
  Cc: x86, linux-edac, linux-trace-kernel, linux-kernel, tony.luck,
	qiuxu.zhuo, tglx, mingo, rostedt, mchehab, yazen.ghannam,
	john.allen



On 10/30/2024 08:32, Borislav Petkov wrote:
> On Tue, Oct 22, 2024 at 07:36:27PM +0000, Avadhut Naik wrote:
>> Currently, exporting new additional machine check error information
>> involves adding new fields for the same at the end of the struct mce.
>> This additional information can then be consumed through mcelog or
>> tracepoint.
>>
>> However, as new MSRs are being added (and will be added in the future)
>> by CPU vendors on their newer CPUs with additional machine check error
>> information to be exported, the size of struct mce will balloon on some
>> CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
>> different CPU vendors may export the additional information in varying
>> sizes.
>>
>> The problem particularly intensifies since struct mce is exposed to
>> userspace as part of UAPI. Its bloating through vendor-specific data
>> should be avoided to limit the information being sent out to userspace.
>>
>> Add a new structure mce_hw_err to wrap the existing struct mce. The same
>> will prevent its ballooning since vendor-specifc data, if any, can now be
> 
> Unknown word [vendor-specifc] in commit message.
> 
> Please introduce a spellchecker into your patch creation workflow.
> 
Will fix this.

> Also:
> 
> The tip-tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
> 
> 	struct long_struct_name *descriptive_name;
> 	unsigned long foo, bar;
> 	unsigned int tmp;
> 	int ret;
> 
> The above is faster to parse than the reverse ordering::
> 
> 	int ret;
> 	unsigned int tmp;
> 	unsigned long foo, bar;
> 	struct long_struct_name *descriptive_name;
> 
> And even more so than random ordering::
> 
> 	unsigned long foo, bar;
> 	int ret;
> 	struct long_struct_name *descriptive_name;
> 	unsigned int tmp;
> 
> diff ontop of yours:
> 
> ---
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 3611366d56b7..28e28b69d84d 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1030,11 +1030,11 @@ static noinstr int mce_timed_out(u64 *t, const char *msg)
>   */
>  static void mce_reign(void)
>  {
> -	int cpu;
>  	struct mce_hw_err *err = NULL;
>  	struct mce *m = NULL;
>  	int global_worst = 0;
>  	char *msg = NULL;
> +	int cpu;
>  
>  	/*
>  	 * This CPU is the Monarch and the other CPUs have run
> @@ -1291,8 +1291,8 @@ __mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs,
>  {
>  	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
>  	struct mca_config *cfg = &mca_cfg;
> -	struct mce *m = &err->m;
>  	int severity, i, taint = 0;
> +	struct mce *m = &err->m;
>  
>  	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
>  		arch___clear_bit(i, toclear);
> @@ -1419,8 +1419,8 @@ static void kill_me_never(struct callback_head *cb)
>  
>  static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(struct callback_head *))
>  {
> -	struct mce *m = &err->m;
>  	int count = ++current->mce_count;
> +	struct mce *m = &err->m;
>  
>  	/* First call, save all the details */
>  	if (count == 1) {
> diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
> index 504d89724ecd..d0be6dda0c14 100644
> --- a/arch/x86/kernel/cpu/mce/genpool.c
> +++ b/arch/x86/kernel/cpu/mce/genpool.c
> @@ -73,9 +73,9 @@ struct llist_node *mce_gen_pool_prepare_records(void)
>  
>  void mce_gen_pool_process(struct work_struct *__unused)
>  {
> -	struct mce *mce;
> -	struct llist_node *head;
>  	struct mce_evt_llist *node, *tmp;
> +	struct llist_node *head;
> +	struct mce *mce;
>  
>  	head = llist_del_all(&mce_event_llist);
>  	if (!head)
> diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
> index c65a5c4e2f22..313fe682db33 100644
> --- a/arch/x86/kernel/cpu/mce/inject.c
> +++ b/arch/x86/kernel/cpu/mce/inject.c
> @@ -502,9 +502,9 @@ static void prepare_msrs(void *info)
>  
>  static void do_inject(void)
>  {
> +	unsigned int cpu = i_mce.extcpu;
>  	struct mce_hw_err err;
>  	u64 mcg_status = 0;
> -	unsigned int cpu = i_mce.extcpu;
>  	u8 b = i_mce.bank;
>  
>  	i_mce.tsc = rdtsc_ordered();
> 

Ack for the suggestions.

-- 
Thanks,
Avadhut Naik

* Re: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-10-30 16:35     ` Naik, Avadhut
@ 2024-10-30 16:48       ` Borislav Petkov
  2024-10-30 16:50         ` Naik, Avadhut
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2024-10-30 16:48 UTC (permalink / raw)
  To: Naik, Avadhut
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-kernel,
	tony.luck, qiuxu.zhuo, tglx, mingo, rostedt, mchehab,
	yazen.ghannam, john.allen

On Wed, Oct 30, 2024 at 11:35:17AM -0500, Naik, Avadhut wrote:
> Will fix this.

You don't have to - I'll fix up while applying.

This was just for your future info.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-30 16:31     ` Yazen Ghannam
@ 2024-10-30 16:49       ` Naik, Avadhut
  2024-10-30 16:50       ` Borislav Petkov
  1 sibling, 0 replies; 25+ messages in thread
From: Naik, Avadhut @ 2024-10-30 16:49 UTC (permalink / raw)
  To: Yazen Ghannam, Borislav Petkov
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-kernel,
	tony.luck, qiuxu.zhuo, tglx, mingo, rostedt, mchehab, john.allen



On 10/30/2024 11:31, Yazen Ghannam wrote:
> On Wed, Oct 30, 2024 at 05:15:50PM +0100, Borislav Petkov wrote:
>> On Tue, Oct 22, 2024 at 07:36:31PM +0000, Avadhut Naik wrote:
>>> From: Yazen Ghannam <yazen.ghannam@amd.com>
>>>
>>> A new "FRU Text in MCA" feature is defined where the Field Replaceable
>>> Unit (FRU) Text for a device is represented by a string in the new
>>> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
>>> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>>>
>>> The FRU Text is populated dynamically for each individual error state
>>> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
>>> covers multiple devices, for example, a Unified Memory Controller (UMC)
>>> bank that manages two DIMMs.
>>>
>>> Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
>>> exported through the mce_record tracepoint so that userspace tools like
>>> the rasdaemon can determine if FRU Text has been reported through the
>>> MCA_SYND1 and MCA_SYND2 registers and output it.
>>
>> IOW:
>>
>> Author: Yazen Ghannam <yazen.ghannam@amd.com>
>> Date:   Tue Oct 22 19:36:31 2024 +0000
>>
>>     EDAC/mce_amd: Add support for FRU text in MCA
>>     
>>     A new "FRU Text in MCA" feature is defined where the Field Replaceable
>>     Unit (FRU) Text for a device is represented by a string in the new
>>     MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
>>     bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>>     
>>     The FRU Text is populated dynamically for each individual error state
>>     (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
>>     covers multiple devices, for example, a Unified Memory Controller (UMC)
>>     bank that manages two DIMMs.
>>     
>>     If SYND1 and SYND2 are !NULL, then userspace can assume that they
>>     contain FRU text information. If they will report other information in
>>     the future, then a way of communicating the info type contained must be
>>     devised.
>>     
>>       [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
>>       [ bp: Do not expose MCA_CONFIG to userspace yet. ]
> 
> The entire struct mce_hw_err gets exposed through the mce tracepoint in
> patch 3 of this set.
> 
>>     
>>     Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>>     Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
>>     Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>>     Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
>>     Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index 4d936ee20e24..649a901ad563 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -61,6 +61,7 @@
>>   *  - TCC bit is present in MCx_STATUS.
>>   */
>>  #define MCI_CONFIG_MCAX		0x1
>> +#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
>>  #define MCI_IPID_MCATYPE	0xFFFF0000
>>  #define MCI_IPID_HWID		0xFFF
>>  
>> @@ -212,6 +213,7 @@ struct mce_hw_err {
>>  		struct {
>>  			u64 synd1;		/* MCA_SYND1 MSR */
>>  			u64 synd2;		/* MCA_SYND2 MSR */
>> +			u64 config;		/* MCA_CONFIG MSR */
> 
> Anything that is added here will automatically show up in the
> tracepoint, since it's a dynamic array. That was one of the reasons to
> do the wrapper struct idea, right?
> 
>>  		} amd;
>>  	} vendor;
>>  };
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 6ca80fff1fea..65ace034af08 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -796,6 +796,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
>>  
>>  	if (mce_flags.smca) {
>>  		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
>> +		rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vendor.amd.config);
>>  
>>  		if (m->status & MCI_STATUS_SYNDV) {
>>  			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
>> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
>> index 0a89947e47bc..19a1c72fc2bf 100644
>> --- a/arch/x86/kernel/cpu/mce/apei.c
>> +++ b/arch/x86/kernel/cpu/mce/apei.c
>> @@ -155,6 +155,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>>  		fallthrough;
>>  	/* MCA_CONFIG */
>>  	case 4:
>> +		err.vendor.amd.config = *(i_mce + 3);
>> +		fallthrough;
>>  	/* MCA_MISC0 */
>>  	case 3:
>>  		m->misc = *(i_mce + 2);
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index fca23fe16abe..edc2c8033de8 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -681,6 +681,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
>>  
>>  	if (mce_flags.smca) {
>>  		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
>> +		err->vendor.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
>>  
>>  		if (m->status & MCI_STATUS_SYNDV) {
>>  			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
>> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
>> index 194d9fd47d20..62fcd92bf9d2 100644
>> --- a/drivers/edac/mce_amd.c
>> +++ b/drivers/edac/mce_amd.c
>> @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  	struct mce *m = (struct mce *)data;
>>  	struct mce_hw_err *err = to_mce_hw_err(m);
>>  	unsigned int fam = x86_family(m->cpuid);
>> +	u64 mca_config = err->vendor.amd.config;
>>  	int ecc;
>>  
>>  	if (m->kflags & MCE_HANDLED_CEC)
>> @@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
>>  
>>  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
>> -		u32 low, high;
>> -		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
>> -
>> -		if (!rdmsr_safe(addr, &low, &high) &&
>> -		    (low & MCI_CONFIG_MCAX))
>> +		if (mca_config & MCI_CONFIG_MCAX)
>>  			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
>>  
>>  		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
>> @@ -853,8 +850,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>>  
>>  		if (m->status & MCI_STATUS_SYNDV) {
>>  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
>> -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
>> -				 err->vendor.amd.synd1, err->vendor.amd.synd2);
>> +			if (mca_config & MCI_CONFIG_FRUTEXT) {
>> +				char frutext[17];
>> +
>> +				frutext[16] = '\0';
>> +				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
>> +				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
>> +
>> +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
>> +			}
>>  		}
>>  
>>  		pr_cont("\n");
>>
>>
> 
> The only changes I see are dropping a couple of kernel prints. I think
> that's probably okay. But I don't think that's what you intend by not
> exposing MCA_CONFIG to user space.
> 

Like Yazen mentioned, we don't explicitly need to add MCA_CONFIG to the
tracepoint. Just adding it to struct mce_hw_err should be enough, as it would
then be exported as vendor-specific data.

Additionally, userspace tools like rasdaemon would need MCA_CONFIG to undertake
FRU Text decoding from SYND1/2 registers.
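
To illustrate the flow described above, here is a minimal userspace sketch in
C. It assumes the raw MCA_CONFIG value has already been extracted from the
vendor-specific data carried by the tracepoint. The macro and function names
are hypothetical and are not taken from rasdaemon; only the bit position
(MCA_CONFIG[9]) comes from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the kernel's MCI_CONFIG_FRUTEXT (bit 9). */
#define MCA_CONFIG_FRUTEXT (1ULL << 9)

/* Compile-time check that the mask really selects bit 9. */
static_assert(MCA_CONFIG_FRUTEXT == 0x200ULL, "FRU text bit must be bit 9");

/*
 * Return 1 if the bank advertised "FRU Text in MCA" for this error record,
 * 0 otherwise. mca_config is the raw 64-bit MCA_CONFIG value.
 */
static int fru_text_advertised(uint64_t mca_config)
{
	return (mca_config & MCA_CONFIG_FRUTEXT) != 0;
}
```

A consumer would call fru_text_advertised() first and only then try to read
SYND1/2 as text.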

> Thanks,
> Yazen

-- 
Thanks,
Avadhut Naik

* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-30 16:31     ` Yazen Ghannam
  2024-10-30 16:49       ` Naik, Avadhut
@ 2024-10-30 16:50       ` Borislav Petkov
  2024-10-30 18:01         ` Borislav Petkov
  1 sibling, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2024-10-30 16:50 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-kernel,
	tony.luck, qiuxu.zhuo, tglx, mingo, rostedt, mchehab, john.allen

On Wed, Oct 30, 2024 at 12:31:47PM -0400, Yazen Ghannam wrote:
> The entire struct mce_hw_err gets exposed through the mce tracepoint in
> patch 3 of this set.

Bah, crap. Lemme go back and take a second stab at this.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info
  2024-10-30 16:48       ` Borislav Petkov
@ 2024-10-30 16:50         ` Naik, Avadhut
  0 siblings, 0 replies; 25+ messages in thread
From: Naik, Avadhut @ 2024-10-30 16:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-kernel,
	tony.luck, qiuxu.zhuo, tglx, mingo, rostedt, mchehab,
	yazen.ghannam, john.allen



On 10/30/2024 11:48, Borislav Petkov wrote:
> On Wed, Oct 30, 2024 at 11:35:17AM -0500, Naik, Avadhut wrote:
>> Will fix this.
> 
> You don't have to - I'll fix up while applying.
> 
> This was just for your future info.
> 
Okay! Thank you!
> Thx.
> 

-- 
Thanks,
Avadhut Naik

* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-30 16:50       ` Borislav Petkov
@ 2024-10-30 18:01         ` Borislav Petkov
  2024-10-30 19:57           ` Naik, Avadhut
  0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2024-10-30 18:01 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-kernel,
	tony.luck, qiuxu.zhuo, tglx, mingo, rostedt, mchehab, john.allen

On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote:
> Bah, crap. Lemme go back and take a second stab at this.

Second try.

The reason why I don't want to expose MCA_CONFIG to userspace is, well,
userspace doesn't need to know any "management" information the hw gives. It
either gets FRU text in that tracepoint or it doesn't. But it doesn't need to
know what MCA_CONFIG said or didn't say.

Ok?

Author: Yazen Ghannam <yazen.ghannam@amd.com>
Date:   Tue Oct 22 19:36:31 2024 +0000

    EDAC/mce_amd: Add support for FRU text in MCA
    
    A new "FRU Text in MCA" feature is defined where the Field Replaceable
    Unit (FRU) Text for a device is represented by a string in the new
    MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
    bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
    
    The FRU Text is populated dynamically for each individual error state
    (MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
    multiple devices, for example, a Unified Memory Controller (UMC) bank
    that manages two DIMMs.
    
      [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
      [ bp: Do not expose MCA_CONFIG to userspace yet. ]
    
    Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
    Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
    Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4d936ee20e24..4543cf2eb5e8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
  *  - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
+#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 194d9fd47d20..50d74d3bf0f5 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce *m = (struct mce *)data;
 	struct mce_hw_err *err = to_mce_hw_err(m);
 	unsigned int fam = x86_family(m->cpuid);
+	u32 mca_config_lo = 0, dummy;
 	int ecc;
 
 	if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+		rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, &dummy);
 
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config_lo & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
 
 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+
+				frutext[16] = '\0';
+				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			}
 		}
 
 		pr_cont("\n");

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-30 18:01         ` Borislav Petkov
@ 2024-10-30 19:57           ` Naik, Avadhut
  2024-10-30 20:46             ` Yazen Ghannam
  2024-10-30 21:23             ` Borislav Petkov
  0 siblings, 2 replies; 25+ messages in thread
From: Naik, Avadhut @ 2024-10-30 19:57 UTC (permalink / raw)
  To: Borislav Petkov, Yazen Ghannam
  Cc: Avadhut Naik, x86, linux-edac, linux-trace-kernel, linux-kernel,
	tony.luck, qiuxu.zhuo, tglx, mingo, rostedt, mchehab, john.allen



On 10/30/2024 13:01, Borislav Petkov wrote:
> On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote:
>> Bah, crap. Lemme go back and take a second stab at this.
> 
> Second try.
> 
> The reason why I don't want to expose MCA_CONFIG to userspace is, well,
> userspace doesn't need to know any "management" information the hw gives. It
> either gets FRU text in that tracepoint or it doesn't. But it doesn't need to
> know what MCA_CONFIG said or didn't say.
> 
> Ok?
> 
So, for now, in the kernel, we log SYND1/2 registers only when they contain
FRUText.
In userspace, since MCA_CONFIG is not in the picture, we always
interpret SYND1/2 data as FRUText.
Rasdaemon might need to be tweaked accordingly. Will take care of it.
Overall, sounds good.
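
As a rough sketch of what that userspace handling could look like, the C
snippet below copies SYND1 and SYND2 into a 16-character string the same way
the kernel patch does with memcpy(). The helper name and the FRU_TEXT_LEN
macro are hypothetical. Treating an all-zero SYND1/SYND2 pair as "no FRU
text" is an assumption for illustration, and the byte order matches the
kernel code only on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRU_TEXT_LEN 16	/* 8 bytes from SYND1 + 8 bytes from SYND2 */

/*
 * Hypothetical helper: interpret SYND1/SYND2 as FRU text.
 * Returns 1 and fills 'out' (NUL-terminated) if either register is
 * non-zero; returns 0 and leaves an empty string otherwise.
 */
static int decode_fru_text(uint64_t synd1, uint64_t synd2,
			   char out[FRU_TEXT_LEN + 1])
{
	assert(out != NULL);

	if (!synd1 && !synd2) {
		out[0] = '\0';
		return 0;
	}

	/* Same layout as the kernel patch: SYND1 first, then SYND2. */
	memcpy(&out[0], &synd1, 8);
	memcpy(&out[8], &synd2, 8);
	out[FRU_TEXT_LEN] = '\0';

	return 1;
}
```

A caller would print the buffer with "%s" only when the helper returns
non-zero.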

Do you want me to send out a revised version with these changes?

> Author: Yazen Ghannam <yazen.ghannam@amd.com>
> Date:   Tue Oct 22 19:36:31 2024 +0000
> 
>     EDAC/mce_amd: Add support for FRU text in MCA
>     
>     A new "FRU Text in MCA" feature is defined where the Field Replaceable
>     Unit (FRU) Text for a device is represented by a string in the new
>     MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
>     bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>     
>     The FRU Text is populated dynamically for each individual error state
>     (MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
>     multiple devices, for example, a Unified Memory Controller (UMC) bank
>     that manages two DIMMs.
>     
>       [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
>       [ bp: Do not expose MCA_CONFIG to userspace yet. ]
>     
>     Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>     Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
>     Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>     Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
>     Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 4d936ee20e24..4543cf2eb5e8 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -61,6 +61,7 @@
>   *  - TCC bit is present in MCx_STATUS.
>   */
>  #define MCI_CONFIG_MCAX		0x1
> +#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
>  #define MCI_IPID_MCATYPE	0xFFFF0000
>  #define MCI_IPID_HWID		0xFFF
>  
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 194d9fd47d20..50d74d3bf0f5 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  	struct mce *m = (struct mce *)data;
>  	struct mce_hw_err *err = to_mce_hw_err(m);
>  	unsigned int fam = x86_family(m->cpuid);
> +	u32 mca_config_lo = 0, dummy;
>  	int ecc;
>  
>  	if (m->kflags & MCE_HANDLED_CEC)
> @@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
>  
>  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> -		u32 low, high;
> -		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> +		rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, &dummy);
>  
> -		if (!rdmsr_safe(addr, &low, &high) &&
> -		    (low & MCI_CONFIG_MCAX))
> +		if (mca_config_lo & MCI_CONFIG_MCAX)
>  			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
>  
>  		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
> @@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>  
>  		if (m->status & MCI_STATUS_SYNDV) {
>  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> -				 err->vendor.amd.synd1, err->vendor.amd.synd2);
> +			if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
> +				char frutext[17];
> +
> +				frutext[16] = '\0';
> +				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
> +				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
> +
> +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
> +			}
>  		}
>  
>  		pr_cont("\n");
> 

-- 
Thanks,
Avadhut Naik

* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-30 19:57           ` Naik, Avadhut
@ 2024-10-30 20:46             ` Yazen Ghannam
  2024-10-30 21:23             ` Borislav Petkov
  1 sibling, 0 replies; 25+ messages in thread
From: Yazen Ghannam @ 2024-10-30 20:46 UTC (permalink / raw)
  To: Naik, Avadhut, bp
  Cc: Borislav Petkov, Avadhut Naik, x86, linux-edac,
	linux-trace-kernel, linux-kernel, tony.luck, qiuxu.zhuo, tglx,
	mingo, rostedt, mchehab, john.allen

On Wed, Oct 30, 2024 at 02:57:33PM -0500, Naik, Avadhut wrote:
> 
> 
> On 10/30/2024 13:01, Borislav Petkov wrote:
> > On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote:
> >> Bah, crap. Lemme go back and take a second stab at this.
> > 
> > Second try.
> > 
> > The reason why I don't want to expose MCA_CONFIG to userspace is, well,
> > userspace doesn't need to know any "management" information the hw gives. It
> > either gets FRU text in that tracepoint or it doesn't. But it doesn't need to
> > know what MCA_CONFIG said or didn't say.
> > 
> > Ok?
> > 
> So, for now, in the kernel, we log SYND1/2 registers only when they contain
> FRUText.
> While in the userspace, since MCA_CONFIG is not in the picture, we always
> interpret SYND1/2 data as FRUText.
> Rasdaemon might need to be tweaked accordingly. Will take care of it.
> Overall, sounds good.
>

Sounds good to me too.

Thanks,
Yazen

> Do you want me send out a revised version with these changes?
> 
> > Author: Yazen Ghannam <yazen.ghannam@amd.com>
> > Date:   Tue Oct 22 19:36:31 2024 +0000
> > 
> >     EDAC/mce_amd: Add support for FRU text in MCA
> >     
> >     A new "FRU Text in MCA" feature is defined where the Field Replaceable
> >     Unit (FRU) Text for a device is represented by a string in the new
> >     MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> >     bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> >     
> >     The FRU Text is populated dynamically for each individual error state
> >     (MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
> >     multiple devices, for example, a Unified Memory Controller (UMC) bank
> >     that manages two DIMMs.
> >     
> >       [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
> >       [ bp: Do not expose MCA_CONFIG to userspace yet. ]
> >     
> >     Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> >     Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
> >     Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> >     Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> >     Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
> > 
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 4d936ee20e24..4543cf2eb5e8 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -61,6 +61,7 @@
> >   *  - TCC bit is present in MCx_STATUS.
> >   */
> >  #define MCI_CONFIG_MCAX		0x1
> > +#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
> >  #define MCI_IPID_MCATYPE	0xFFFF0000
> >  #define MCI_IPID_HWID		0xFFF
> >  
> > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> > index 194d9fd47d20..50d74d3bf0f5 100644
> > --- a/drivers/edac/mce_amd.c
> > +++ b/drivers/edac/mce_amd.c
> > @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> >  	struct mce *m = (struct mce *)data;
> >  	struct mce_hw_err *err = to_mce_hw_err(m);
> >  	unsigned int fam = x86_family(m->cpuid);
> > +	u32 mca_config_lo = 0, dummy;
> >  	int ecc;
> >  
> >  	if (m->kflags & MCE_HANDLED_CEC)
> > @@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> >  		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
> >  
> >  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> > -		u32 low, high;
> > -		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> > +		rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, &dummy);
> >  
> > -		if (!rdmsr_safe(addr, &low, &high) &&
> > -		    (low & MCI_CONFIG_MCAX))
> > +		if (mca_config_lo & MCI_CONFIG_MCAX)
> >  			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
> >  
> >  		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
> > @@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> >  
> >  		if (m->status & MCI_STATUS_SYNDV) {
> >  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> > -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> > -				 err->vendor.amd.synd1, err->vendor.amd.synd2);
> > +			if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
> > +				char frutext[17];
> > +
> > +				frutext[16] = '\0';
> > +				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
> > +				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
> > +
> > +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
> > +			}
> >  		}
> >  
> >  		pr_cont("\n");
> > 
> 
> -- 
> Thanks,
> Avadhut Naik
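
For readers following the last hunk quoted above, here is a minimal userspace sketch of the same FRU Text assembly. It is an illustration only, not kernel or rasdaemon code: decode_frutext(), MCA_CONFIG_FRUTEXT and FRUTEXT_LEN are hypothetical names made up for this sketch. It assumes, as the patch does, that bit 9 of MCA_CONFIG advertises the feature and that the 16 text bytes are stored in memory order across SYND1 followed by SYND2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption taken from the patch: MCA_CONFIG[9] advertises FRU Text. */
#define MCA_CONFIG_FRUTEXT	(1ULL << 9)

/* Two 64-bit syndrome registers hold 16 bytes of text. */
#define FRUTEXT_LEN		16

/*
 * Hypothetical helper (not an existing kernel or rasdaemon function).
 *
 * Copies the raw bytes of synd1, then synd2, into out and NUL-terminates
 * the result. Returns 1 if FRU text was decoded, or 0 (with out set to
 * the empty string) if the bank does not advertise the feature.
 */
static int decode_frutext(uint64_t mca_config, uint64_t synd1, uint64_t synd2,
			  char out[FRUTEXT_LEN + 1])
{
	assert(out != NULL);

	out[0] = '\0';

	if (!(mca_config & MCA_CONFIG_FRUTEXT))
		return 0;

	/* Same layout as the hunk: SYND1 bytes first, then SYND2 bytes. */
	memcpy(&out[0], &synd1, 8);
	memcpy(&out[8], &synd2, 8);
	out[FRUTEXT_LEN] = '\0';

	return 1;
}
```

A caller passes the raw MCA_CONFIG, SYND1 and SYND2 values together with a 17-byte buffer. A consumer that never sees MCA_CONFIG, which is the userspace situation described later in this thread, would have to skip the bit check and treat SYND1/2 as text unconditionally.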

* Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
  2024-10-30 19:57           ` Naik, Avadhut
  2024-10-30 20:46             ` Yazen Ghannam
@ 2024-10-30 21:23             ` Borislav Petkov
  1 sibling, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2024-10-30 21:23 UTC (permalink / raw)
  To: Naik, Avadhut
  Cc: Yazen Ghannam, Avadhut Naik, x86, linux-edac, linux-trace-kernel,
	linux-kernel, tony.luck, qiuxu.zhuo, tglx, mingo, rostedt,
	mchehab, john.allen

On Wed, Oct 30, 2024 at 02:57:33PM -0500, Naik, Avadhut wrote:
> So, for now, in the kernel, we log SYND1/2 registers only when they contain
> FRUText.  While in the userspace, since MCA_CONFIG is not in the picture, we
> always interpret SYND1/2 data as FRUText.  Rasdaemon might need to be
> tweaked accordingly. Will take care of it.  Overall, sounds good.

Thanks.

> Do you want me to send out a revised version with these changes?

Nah, I will queue them soon as I have all the bits here already.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Thread overview: 25+ messages
2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
2024-10-24  2:21   ` Zhuo, Qiuxu
2024-10-30 13:32   ` Borislav Petkov
2024-10-30 16:35     ` Naik, Avadhut
2024-10-30 16:48       ` Borislav Petkov
2024-10-30 16:50         ` Naik, Avadhut
2024-10-22 19:36 ` [PATCH v7 2/5] tracing: Add __print_dynamic_array() helper Avadhut Naik
2024-10-22 19:36 ` [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
2024-10-24  2:25   ` Zhuo, Qiuxu
2024-10-22 19:36 ` [PATCH v7 4/5] x86/mce/apei: Handle variable register array size Avadhut Naik
2024-10-24  5:25   ` Zhuo, Qiuxu
2024-10-22 19:36 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
2024-10-24  5:49   ` Zhuo, Qiuxu
2024-10-30 16:05   ` Borislav Petkov
2024-10-30 16:15   ` Borislav Petkov
2024-10-30 16:31     ` Yazen Ghannam
2024-10-30 16:49       ` Naik, Avadhut
2024-10-30 16:50       ` Borislav Petkov
2024-10-30 18:01         ` Borislav Petkov
2024-10-30 19:57           ` Naik, Avadhut
2024-10-30 20:46             ` Yazen Ghannam
2024-10-30 21:23             ` Borislav Petkov
2024-10-29 18:14 ` [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Naik, Avadhut
2024-10-29 18:27   ` Borislav Petkov
