From: Avadhut Naik <avadhut.naik@amd.com>
To: <x86@kernel.org>, <linux-edac@vger.kernel.org>,
<linux-trace-kernel@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <bp@alien8.de>,
<tony.luck@intel.com>, <qiuxu.zhuo@intel.com>,
<tglx@linutronix.de>, <mingo@redhat.com>, <rostedt@goodmis.org>,
<mchehab@kernel.org>, <yazen.ghannam@amd.com>,
<john.allen@amd.com>, <avadhut.naik@amd.com>
Subject: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
Date: Tue, 22 Oct 2024 19:36:31 +0000 [thread overview]
Message-ID: <20241022194158.110073-6-avadhut.naik@amd.com> (raw)
In-Reply-To: <20241022194158.110073-1-avadhut.naik@amd.com>
From: Yazen Ghannam <yazen.ghannam@amd.com>
A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
covers multiple devices, for example, a Unified Memory Controller (UMC)
bank that manages two DIMMs.
Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
exported through the mce_record tracepoint so that userspace tools like
the rasdaemon can determine if FRU Text has been reported through the
MCA_SYND1 and MCA_SYND2 registers and output it.
[Yazen: Add Avadhut as co-developer for wrapper changes.]
Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/
1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.
Changes in v3:
1. Modify commit message per feedback provided.
2. Remove call to memset() for the string frutext. Instead, just ensure
that it is NULL terminated.
2. Fix SoB chain to properly reflect the patch path.
Changes in v4:
1. Rebase on top of tip/master to avoid merge conflicts.
Changes in v5:
1. No changes except rebasing on top of tip/master.
Changes in v6:
1. No changes except rebasing on top of tip/master.
Changes in v7:
1. No changes except rebasing on top of tip/master.
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 1 +
arch/x86/kernel/cpu/mce/apei.c | 2 ++
arch/x86/kernel/cpu/mce/core.c | 3 +++
drivers/edac/mce_amd.c | 21 ++++++++++++++-------
5 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index c2466b20fe79..72a69ad7d692 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
* - TCC bit is present in MCx_STATUS.
*/
#define MCI_CONFIG_MCAX 0x1
+#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
@@ -213,6 +214,7 @@ struct mce_hw_err {
struct {
u64 synd1; /* MCA_SYND1 MSR */
u64 synd2; /* MCA_SYND2 MSR */
+ u64 config; /* MCA_CONFIG MSR */
} amd;
} vendor;
};
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6ca80fff1fea..65ace034af08 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -796,6 +796,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
if (mce_flags.smca) {
rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
+ rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vendor.amd.config);
if (m->status & MCI_STATUS_SYNDV) {
rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 0a89947e47bc..19a1c72fc2bf 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -155,6 +155,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
fallthrough;
/* MCA_CONFIG */
case 4:
+ err.vendor.amd.config = *(i_mce + 3);
+ fallthrough;
/* MCA_MISC0 */
case 3:
m->misc = *(i_mce + 2);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index fca23fe16abe..bc5e67306f77 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -208,6 +208,8 @@ static void __print_mce(struct mce_hw_err *err)
pr_cont("SYND2 %llx ", err->vendor.amd.synd2);
if (m->ipid)
pr_cont("IPID %llx ", m->ipid);
+ if (err->vendor.amd.config)
+ pr_cont("CONFIG %llx ", err->vendor.amd.config);
}
pr_cont("\n");
@@ -681,6 +683,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
if (mce_flags.smca) {
m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
+ err->vendor.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
if (m->status & MCI_STATUS_SYNDV) {
m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 194d9fd47d20..d69a1466f0bc 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
struct mce *m = (struct mce *)data;
struct mce_hw_err *err = to_mce_hw_err(m);
unsigned int fam = x86_family(m->cpuid);
+ u64 mca_config = err->vendor.amd.config;
int ecc;
if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));
if (boot_cpu_has(X86_FEATURE_SMCA)) {
- u32 low, high;
- u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
-
- if (!rdmsr_safe(addr, &low, &high) &&
- (low & MCI_CONFIG_MCAX))
+ if (mca_config & MCI_CONFIG_MCAX)
pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +850,18 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
if (m->status & MCI_STATUS_SYNDV) {
pr_cont(", Syndrome: 0x%016llx\n", m->synd);
- pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
- err->vendor.amd.synd1, err->vendor.amd.synd2);
+ if (mca_config & MCI_CONFIG_FRUTEXT) {
+ char frutext[17];
+
+ frutext[16] = '\0';
+ memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+ memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+ pr_emerg(HW_ERR "FRU Text: %s", frutext);
+ } else {
+ pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+ err->vendor.amd.synd1, err->vendor.amd.synd2);
+ }
}
pr_cont("\n");
--
2.43.0
next prev parent reply other threads:[~2024-10-22 19:43 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
2024-10-24 2:21 ` Zhuo, Qiuxu
2024-10-30 13:32 ` Borislav Petkov
2024-10-30 16:35 ` Naik, Avadhut
2024-10-30 16:48 ` Borislav Petkov
2024-10-30 16:50 ` Naik, Avadhut
2024-10-22 19:36 ` [PATCH v7 2/5] tracing: Add __print_dynamic_array() helper Avadhut Naik
2024-10-22 19:36 ` [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
2024-10-24 2:25 ` Zhuo, Qiuxu
2024-10-22 19:36 ` [PATCH v7 4/5] x86/mce/apei: Handle variable register array size Avadhut Naik
2024-10-24 5:25 ` Zhuo, Qiuxu
2024-10-22 19:36 ` Avadhut Naik [this message]
2024-10-24 5:49 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Zhuo, Qiuxu
2024-10-30 16:05 ` Borislav Petkov
2024-10-30 16:15 ` Borislav Petkov
2024-10-30 16:31 ` Yazen Ghannam
2024-10-30 16:49 ` Naik, Avadhut
2024-10-30 16:50 ` Borislav Petkov
2024-10-30 18:01 ` Borislav Petkov
2024-10-30 19:57 ` Naik, Avadhut
2024-10-30 20:46 ` Yazen Ghannam
2024-10-30 21:23 ` Borislav Petkov
2024-10-29 18:14 ` [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Naik, Avadhut
2024-10-29 18:27 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241022194158.110073-6-avadhut.naik@amd.com \
--to=avadhut.naik@amd.com \
--cc=bp@alien8.de \
--cc=john.allen@amd.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mchehab@kernel.org \
--cc=mingo@redhat.com \
--cc=qiuxu.zhuo@intel.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
--cc=yazen.ghannam@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox