From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Avadhut Naik <avadhut.naik@amd.com>
Cc: x86@kernel.org, linux-edac@vger.kernel.org, bp@alien8.de,
tony.luck@intel.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
Date: Wed, 6 Aug 2025 08:52:20 -0400 [thread overview]
Message-ID: <20250806125220.GA628281@yaz-khff2.amd.com> (raw)
In-Reply-To: <20250729204801.1044100-1-avadhut.naik@amd.com>
On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote:
> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
> valid System Physical Address (SPA) is present in MCA_ADDR.
>
> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
> if it indicates validity of SPA in MCA_ADDR.
>
> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
> SPA or if it is implementation specific.
The commit message is missing an imperative statement. You can describe
how and why these bits are helpful.
For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a
usable address." or something like this.
>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
> arch/x86/include/asm/mce.h | 2 ++
> arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
> 2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 6c77c03139f7..387cf250525f 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -48,6 +48,7 @@
>
> /* AMD-specific bits */
> #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
> +#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
> #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
> #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
> #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
> @@ -62,6 +63,7 @@
> */
> #define MCI_CONFIG_MCAX 0x1
> #define MCI_CONFIG_FRUTEXT BIT_ULL(9)
> +#define MCI_CONFIG_PAVALID BIT_ULL(11)
> #define MCI_IPID_MCATYPE 0xFFFF0000
> #define MCI_IPID_HWID 0xFFF
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 5c4eb28c3ac9..6ac222aec28d 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
> }
>
> /*
> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
> - * a system physical address. Therefore, individual cases need to be detected.
> - * Future cases and checks will be added as needed.
> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
> + * system physical address. Individual cases though, need to be detected for
> + * other systems. Future cases will be added as needed.
> *
> * 1) General case
> * a) Assume address is not usable.
> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
> * a) Reported in legacy bank 4 with extended error code (XEC) 8.
> * b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
> * this bit should not be checked.
> + * 4) MCI_STATUS_PADDRVAL is set
> + * a)Will provide a valid system physical address.
Space after "a)".
> *
> * NOTE: SMCA UMC memory errors fall into case #1.
> */
> bool amd_mce_usable_address(struct mce *m)
> {
> + u64 smca_config;
> +
> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
Newline here.
> + if (smca_config & MCI_CONFIG_PAVALID) {
> + if(m->status & MCI_STATUS_PADDRVAL)
> + return true;
> + return false;
> + }
Newline here.
Also, the entire hunk above should go after the !SMCA case below and
before the other SMCA cases.
Furthermore, the hunk can be simplified to this:
if (smca_config & MCI_CONFIG_PAVALID)
return m->status & MCI_STATUS_PADDRVAL;
Also also, the bit names are uncannily similar. I think they should be
the same (except for the prefix) or clearly different.
MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV
MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL
> /* Check special northbridge case 3) first. */
> if (!mce_flags.smca) {
> if (legacy_mce_is_memory_error(m))
>
> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
> --
We should revisit saving MCA_CONFIG during logging. We end up reading it
multiple times during error processing.
I realize this came up before in a previous patch set. IIRC, the point
of contention was exposing MCA_CONFIG to userspace. We don't necessarily
have to do that. We can just use it in the kernel data structures.
Thanks,
Yazen
next prev parent reply other threads:[~2025-08-06 12:52 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-29 20:46 [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
2025-08-06 12:52 ` Yazen Ghannam [this message]
2025-08-06 21:23 ` Naik, Avadhut
-- strict thread matches above, loose matches on Subject: below --
2025-11-13 16:52 [PATCH v2] " Borislav Petkov
2025-11-14 20:20 ` [PATCH] " Avadhut Naik
2025-11-16 18:05 ` Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250806125220.GA628281@yaz-khff2.amd.com \
--to=yazen.ghannam@amd.com \
--cc=avadhut.naik@amd.com \
--cc=bp@alien8.de \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox