* [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
@ 2025-07-29 20:46 Avadhut Naik
2025-08-06 12:52 ` Yazen Ghannam
0 siblings, 1 reply; 5+ messages in thread
From: Avadhut Naik @ 2025-07-29 20:46 UTC (permalink / raw)
To: x86, linux-edac; +Cc: bp, tony.luck, yazen.ghannam, linux-kernel, avadhut.naik
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.
PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.
PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c77c03139f7..387cf250525f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PAVALID BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5c4eb28c3ac9..6ac222aec28d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ * a)Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
bool amd_mce_usable_address(struct mce *m)
{
+ u64 smca_config;
+
+ rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
+ if (smca_config & MCI_CONFIG_PAVALID) {
+ if(m->status & MCI_STATUS_PADDRVAL)
+ return true;
+ return false;
+ }
/* Check special northbridge case 3) first. */
if (!mce_flags.smca) {
if (legacy_mce_is_memory_error(m))
base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-07-29 20:46 [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
@ 2025-08-06 12:52 ` Yazen Ghannam
2025-08-06 21:23 ` Naik, Avadhut
0 siblings, 1 reply; 5+ messages in thread
From: Yazen Ghannam @ 2025-08-06 12:52 UTC (permalink / raw)
To: Avadhut Naik; +Cc: x86, linux-edac, bp, tony.luck, linux-kernel
On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote:
> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
> valid System Physical Address (SPA) is present in MCA_ADDR.
>
> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
> if it indicates validity of SPA in MCA_ADDR.
>
> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
> SPA or if it is implementation specific.
The commit message is missing an imperative statement. You can describe
how and why these bits are helpful.
For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a
usable address." or something like this.
>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
> arch/x86/include/asm/mce.h | 2 ++
> arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
> 2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 6c77c03139f7..387cf250525f 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -48,6 +48,7 @@
>
> /* AMD-specific bits */
> #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
> +#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
> #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
> #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
> #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
> @@ -62,6 +63,7 @@
> */
> #define MCI_CONFIG_MCAX 0x1
> #define MCI_CONFIG_FRUTEXT BIT_ULL(9)
> +#define MCI_CONFIG_PAVALID BIT_ULL(11)
> #define MCI_IPID_MCATYPE 0xFFFF0000
> #define MCI_IPID_HWID 0xFFF
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 5c4eb28c3ac9..6ac222aec28d 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
> }
>
> /*
> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
> - * a system physical address. Therefore, individual cases need to be detected.
> - * Future cases and checks will be added as needed.
> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
> + * system physical address. Individual cases though, need to be detected for
> + * other systems. Future cases will be added as needed.
> *
> * 1) General case
> * a) Assume address is not usable.
> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
> * a) Reported in legacy bank 4 with extended error code (XEC) 8.
> * b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
> * this bit should not be checked.
> + * 4) MCI_STATUS_PADDRVAL is set
> + * a)Will provide a valid system physical address.
Space after "a)".
> *
> * NOTE: SMCA UMC memory errors fall into case #1.
> */
> bool amd_mce_usable_address(struct mce *m)
> {
> + u64 smca_config;
> +
> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
Newline here.
> + if (smca_config & MCI_CONFIG_PAVALID) {
> + if(m->status & MCI_STATUS_PADDRVAL)
> + return true;
> + return false;
> + }
Newline here.
Also, the entire hunk above should go after the !SMCA case below and
before the other SMCA cases.
Furthermore, the hunk can be simplified to this:
if (smca_config & MCI_CONFIG_PAVALID)
return m->status & MCI_STATUS_PADDRVAL;
Also also, the bit names are uncannily similar. I think they should be
the same (except for the prefix) or clearly different.
MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV
MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL
> /* Check special northbridge case 3) first. */
> if (!mce_flags.smca) {
> if (legacy_mce_is_memory_error(m))
>
> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
> --
We should revisit saving MCA_CONFIG during logging. We end up reading it
multiple times during error processing.
I realize this came up before in a previous patch set. IIRC, the point
of contention was exposing MCA_CONFIG to userspace. We don't necessarily
have to do that. We can just use it in the kernel data structures.
Thanks,
Yazen
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-08-06 12:52 ` Yazen Ghannam
@ 2025-08-06 21:23 ` Naik, Avadhut
0 siblings, 0 replies; 5+ messages in thread
From: Naik, Avadhut @ 2025-08-06 21:23 UTC (permalink / raw)
To: Yazen Ghannam, Avadhut Naik; +Cc: x86, linux-edac, bp, tony.luck, linux-kernel
Hi,
On 8/6/2025 07:52, Yazen Ghannam wrote:
> On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote:
>> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
>> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
>> valid System Physical Address (SPA) is present in MCA_ADDR.
>>
>> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
>> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
>> if it indicates validity of SPA in MCA_ADDR.
>>
>> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
>> SPA or if it is implementation specific.
>
> The commit message is missing an imperative statement. You can describe
> how and why these bits are helpful.
>
> For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a
> usable address." or something like this.
>
Will add it.
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>> arch/x86/include/asm/mce.h | 2 ++
>> arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
>> 2 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index 6c77c03139f7..387cf250525f 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -48,6 +48,7 @@
>>
>> /* AMD-specific bits */
>> #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
>> +#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
>> #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
>> #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
>> #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
>> @@ -62,6 +63,7 @@
>> */
>> #define MCI_CONFIG_MCAX 0x1
>> #define MCI_CONFIG_FRUTEXT BIT_ULL(9)
>> +#define MCI_CONFIG_PAVALID BIT_ULL(11)
>> #define MCI_IPID_MCATYPE 0xFFFF0000
>> #define MCI_IPID_HWID 0xFFF
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 5c4eb28c3ac9..6ac222aec28d 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
>> }
>>
>> /*
>> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
>> - * a system physical address. Therefore, individual cases need to be detected.
>> - * Future cases and checks will be added as needed.
>> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
>> + * system physical address. Individual cases though, need to be detected for
>> + * other systems. Future cases will be added as needed.
>> *
>> * 1) General case
>> * a) Assume address is not usable.
>> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
>> * a) Reported in legacy bank 4 with extended error code (XEC) 8.
>> * b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
>> * this bit should not be checked.
>> + * 4) MCI_STATUS_PADDRVAL is set
>> + * a)Will provide a valid system physical address.
>
> Space after "a)".
>
Noted.
>> *
>> * NOTE: SMCA UMC memory errors fall into case #1.
>> */
>> bool amd_mce_usable_address(struct mce *m)
>> {
>> + u64 smca_config;
>> +
>> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
>
> Newline here.
>
Noted.
>> + if (smca_config & MCI_CONFIG_PAVALID) {
>> + if(m->status & MCI_STATUS_PADDRVAL)
>> + return true;
>> + return false;
>> + }
>
> Newline here.
>
Noted.
> Also, the entire hunk above should go after the !SMCA case below and
> before the other SMCA cases.
>
> Furthermore, the hunk can be simplified to this:
>
> if (smca_config & MCI_CONFIG_PAVALID)
> return m->status & MCI_STATUS_PADDRVAL;
>
Ack for both.
> Also also, the bit names are uncannily similar. I think they should be
> the same (except for the prefix) or clearly different.
>
> MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV
>
> MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL
>
Will change the name per your suggestions.
>> /* Check special northbridge case 3) first. */
>> if (!mce_flags.smca) {
>> if (legacy_mce_is_memory_error(m))
>>
>> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
>> --
>
> We should revisit saving MCA_CONFIG during logging. We end up reading it
> multiple times during error processing.
>
> I realize this came up before in a previous patch set. IIRC, the point
> of contention was exposing MCA_CONFIG to userspace. We don't necessarily
> have to do that. We can just use it in the kernel data structures.
>
I agree. If there is consensus, I can work on getting a patch out.
Maybe make this a set with the first patch caching MCA_CONFIG and
this being the second patch.
> Thanks,
> Yazen
--
Thanks,
Avadhut Naik
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
@ 2025-11-13 16:52 Borislav Petkov
2025-11-14 20:20 ` [PATCH] " Avadhut Naik
0 siblings, 1 reply; 5+ messages in thread
From: Borislav Petkov @ 2025-11-13 16:52 UTC (permalink / raw)
To: Avadhut Naik; +Cc: x86, linux-edac, tony.luck, yazen.ghannam, linux-kernel
On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
> @@ -777,6 +781,11 @@ bool amd_mce_usable_address(struct mce *m)
> return false;
> }
>
> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
We have struct smca_bank and per-CPU smca_banks array.
MCI_CONFIG_PADDRV looks like a property of the bank which is static and
doesn't change willy-nilly. So instead of doing the silly MSR read on every
error, you can cache the fact that the bank supports MCI_STATUS_PADDRV and
query that and save us 100+ unnecessary cycles every time...
No?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-11-13 16:52 [PATCH v2] " Borislav Petkov
@ 2025-11-14 20:20 ` Avadhut Naik
2025-11-16 18:05 ` Borislav Petkov
0 siblings, 1 reply; 5+ messages in thread
From: Avadhut Naik @ 2025-11-14 20:20 UTC (permalink / raw)
To: bp; +Cc: x86, linux-edac, tony.luck, yazen.ghannam, linux-kernel,
avadhut.naik
Do you mean something like this?
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1cfbfff0be3f..2d98886de09a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRV BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PADDRV BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 53385e6aa230..c6be2f520476 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -87,6 +87,7 @@ struct smca_bank {
const struct smca_hwid *hwid;
u32 id; /* Value of MCA_IPID[InstanceId]. */
u8 sysfs_id; /* Value used for sysfs name. */
+ bool paddrv_support; /* Physical Address Valid bit in MCA_CONFIG */
};
static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
@@ -327,6 +328,9 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
+ if (low & MCI_CONFIG_PADDRV)
+ this_cpu_ptr(smca_banks)[bank].paddrv_support = true;
+
wrmsr(smca_config, low, high);
}
@@ -790,9 +794,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -806,6 +810,8 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ * a) Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
@@ -819,6 +825,9 @@ bool amd_mce_usable_address(struct mce *m)
return false;
}
+ if (this_cpu_ptr(smca_banks)[m->bank].paddrv_support)
+ return m->status & MCI_STATUS_PADDRV;
+
/* Check poison bit for all other bank types. */
if (m->status & MCI_STATUS_POISON)
return true;
base-commit: 438be5bb46f4be6e78cef7c3400f20d77f03c734
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-11-14 20:20 ` [PATCH] " Avadhut Naik
@ 2025-11-16 18:05 ` Borislav Petkov
0 siblings, 0 replies; 5+ messages in thread
From: Borislav Petkov @ 2025-11-16 18:05 UTC (permalink / raw)
To: Avadhut Naik; +Cc: x86, linux-edac, tony.luck, yazen.ghannam, linux-kernel
On Fri, Nov 14, 2025 at 08:20:20PM +0000, Avadhut Naik wrote:
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 53385e6aa230..c6be2f520476 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -87,6 +87,7 @@ struct smca_bank {
> const struct smca_hwid *hwid;
> u32 id; /* Value of MCA_IPID[InstanceId]. */
> u8 sysfs_id; /* Value used for sysfs name. */
> + bool paddrv_support; /* Physical Address Valid bit in MCA_CONFIG */
u64 paddrv : 1,
__reserved : 63;
Otherwise, yes.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-11-16 18:05 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-29 20:46 [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
2025-08-06 12:52 ` Yazen Ghannam
2025-08-06 21:23 ` Naik, Avadhut
-- strict thread matches above, loose matches on Subject: below --
2025-11-13 16:52 [PATCH v2] " Borislav Petkov
2025-11-14 20:20 ` [PATCH] " Avadhut Naik
2025-11-16 18:05 ` Borislav Petkov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox