From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Avadhut Naik <avadhut.naik@amd.com>,
x86@kernel.org, linux-edac@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org,
tony.luck@intel.com, qiuxu.zhuo@intel.com, tglx@linutronix.de,
mingo@redhat.com, rostedt@goodmis.org, mchehab@kernel.org,
john.allen@amd.com
Subject: Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
Date: Wed, 30 Oct 2024 12:31:47 -0400 [thread overview]
Message-ID: <20241030163147.GA1379143@yaz-khff2.amd.com> (raw)
In-Reply-To: <20241030161550.GFZyJbthMO_2Wxe3bV@fat_crate.local>
On Wed, Oct 30, 2024 at 05:15:50PM +0100, Borislav Petkov wrote:
> On Tue, Oct 22, 2024 at 07:36:31PM +0000, Avadhut Naik wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > A new "FRU Text in MCA" feature is defined where the Field Replaceable
> > Unit (FRU) Text for a device is represented by a string in the new
> > MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> > bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> >
> > The FRU Text is populated dynamically for each individual error state
> > (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> > covers multiple devices, for example, a Unified Memory Controller (UMC)
> > bank that manages two DIMMs.
> >
> > Since MCA_CONFIG[9] is instrumental in decoding FRU Text, it has to be
> > exported through the mce_record tracepoint so that userspace tools like
> > the rasdaemon can determine if FRU Text has been reported through the
> > MCA_SYND1 and MCA_SYND2 registers and output it.
>
> IOW:
>
> Author: Yazen Ghannam <yazen.ghannam@amd.com>
> Date: Tue Oct 22 19:36:31 2024 +0000
>
> EDAC/mce_amd: Add support for FRU text in MCA
>
> A new "FRU Text in MCA" feature is defined where the Field Replaceable
> Unit (FRU) Text for a device is represented by a string in the new
> MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
>
> The FRU Text is populated dynamically for each individual error state
> (MCA_STATUS, MCA_ADDR, et al.). This handles the case where an MCA bank
> covers multiple devices, for example, a Unified Memory Controller (UMC)
> bank that manages two DIMMs.
>
> If SYND1 and SYND2 are !NULL, then userspace can assume that they
> contain FRU text information. If they will report other information in
> the future, then a way of communicating the info type contained must be
> devised.
>
> [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
> [ bp: Do not expose MCA_CONFIG to userspace yet. ]
The entire struct mce_hw_err gets exposed through the mce tracepoint in
patch 3 of this set.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 4d936ee20e24..649a901ad563 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -61,6 +61,7 @@
> * - TCC bit is present in MCx_STATUS.
> */
> #define MCI_CONFIG_MCAX 0x1
> +#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
> #define MCI_IPID_MCATYPE 0xFFFF0000
> #define MCI_IPID_HWID 0xFFF
>
> @@ -212,6 +213,7 @@ struct mce_hw_err {
> struct {
> u64 synd1; /* MCA_SYND1 MSR */
> u64 synd2; /* MCA_SYND2 MSR */
> + u64 config; /* MCA_CONFIG MSR */
Anything that is added here will automatically show up in the
tracepoint, since it's a dynamic array. That was one of the reasons to
do the wrapper struct idea, right?
> } amd;
> } vendor;
> };
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 6ca80fff1fea..65ace034af08 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -796,6 +796,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
>
> if (mce_flags.smca) {
> rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(bank), err.vendor.amd.config);
>
> if (m->status & MCI_STATUS_SYNDV) {
> rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
> diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
> index 0a89947e47bc..19a1c72fc2bf 100644
> --- a/arch/x86/kernel/cpu/mce/apei.c
> +++ b/arch/x86/kernel/cpu/mce/apei.c
> @@ -155,6 +155,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
> fallthrough;
> /* MCA_CONFIG */
> case 4:
> + err.vendor.amd.config = *(i_mce + 3);
> + fallthrough;
> /* MCA_MISC0 */
> case 3:
> m->misc = *(i_mce + 2);
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index fca23fe16abe..edc2c8033de8 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -681,6 +681,7 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
>
> if (mce_flags.smca) {
> m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
> + err->vendor.amd.config = mce_rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(i));
>
> if (m->status & MCI_STATUS_SYNDV) {
> m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
> diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> index 194d9fd47d20..62fcd92bf9d2 100644
> --- a/drivers/edac/mce_amd.c
> +++ b/drivers/edac/mce_amd.c
> @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> struct mce *m = (struct mce *)data;
> struct mce_hw_err *err = to_mce_hw_err(m);
> unsigned int fam = x86_family(m->cpuid);
> + u64 mca_config = err->vendor.amd.config;
> int ecc;
>
> if (m->kflags & MCE_HANDLED_CEC)
> @@ -814,11 +815,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> ((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));
>
> if (boot_cpu_has(X86_FEATURE_SMCA)) {
> - u32 low, high;
> - u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> -
> - if (!rdmsr_safe(addr, &low, &high) &&
> - (low & MCI_CONFIG_MCAX))
> + if (mca_config & MCI_CONFIG_MCAX)
> pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
>
> pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
> @@ -853,8 +850,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
>
> if (m->status & MCI_STATUS_SYNDV) {
> pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> - pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> - err->vendor.amd.synd1, err->vendor.amd.synd2);
> + if (mca_config & MCI_CONFIG_FRUTEXT) {
> + char frutext[17];
> +
> + frutext[16] = '\0';
> + memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
> + memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
> +
> + pr_emerg(HW_ERR "FRU Text: %s", frutext);
> + }
> }
>
> pr_cont("\n");
>
>
The only changes I see are dropping a couple of kernel prints. I think
that's probably okay. But I don't think that's what you intend by not
exposing MCA_CONFIG to user space.
Thanks,
Yazen
next prev parent reply other threads:[~2024-10-30 16:31 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-22 19:36 [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Avadhut Naik
2024-10-22 19:36 ` [PATCH v7 1/5] x86/mce: Add wrapper for struct mce to export vendor specific info Avadhut Naik
2024-10-24 2:21 ` Zhuo, Qiuxu
2024-10-30 13:32 ` Borislav Petkov
2024-10-30 16:35 ` Naik, Avadhut
2024-10-30 16:48 ` Borislav Petkov
2024-10-30 16:50 ` Naik, Avadhut
2024-10-22 19:36 ` [PATCH v7 2/5] tracing: Add __print_dynamic_array() helper Avadhut Naik
2024-10-22 19:36 ` [PATCH v7 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers Avadhut Naik
2024-10-24 2:25 ` Zhuo, Qiuxu
2024-10-22 19:36 ` [PATCH v7 4/5] x86/mce/apei: Handle variable register array size Avadhut Naik
2024-10-24 5:25 ` Zhuo, Qiuxu
2024-10-22 19:36 ` [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA Avadhut Naik
2024-10-24 5:49 ` Zhuo, Qiuxu
2024-10-30 16:05 ` Borislav Petkov
2024-10-30 16:15 ` Borislav Petkov
2024-10-30 16:31 ` Yazen Ghannam [this message]
2024-10-30 16:49 ` Naik, Avadhut
2024-10-30 16:50 ` Borislav Petkov
2024-10-30 18:01 ` Borislav Petkov
2024-10-30 19:57 ` Naik, Avadhut
2024-10-30 20:46 ` Yazen Ghannam
2024-10-30 21:23 ` Borislav Petkov
2024-10-29 18:14 ` [PATCH v7 0/5] MCE wrapper and support for new SMCA syndrome MSRs Naik, Avadhut
2024-10-29 18:27 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241030163147.GA1379143@yaz-khff2.amd.com \
--to=yazen.ghannam@amd.com \
--cc=avadhut.naik@amd.com \
--cc=bp@alien8.de \
--cc=john.allen@amd.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mchehab@kernel.org \
--cc=mingo@redhat.com \
--cc=qiuxu.zhuo@intel.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox