* [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
@ 2025-10-15 17:22 Avadhut Naik
2025-10-17 14:31 ` Yazen Ghannam
2025-11-13 16:52 ` Borislav Petkov
0 siblings, 2 replies; 9+ messages in thread
From: Avadhut Naik @ 2025-10-15 17:22 UTC (permalink / raw)
To: x86, linux-edac; +Cc: bp, tony.luck, yazen.ghannam, linux-kernel, avadhut.naik
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.
PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.
PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.
Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
1. Modified commit message per feedback received.
2. Refactored and simplified per feedback received.
Links:
v1: https://lore.kernel.org/all/20250729204801.1044100-1-avadhut.naik@amd.com/
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 31e3cb550fb3..09d04cedf175 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRV BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PADDRV BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d6906442f49b..e92829a423b1 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -764,11 +764,15 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ * a) Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
bool amd_mce_usable_address(struct mce *m)
{
+ u64 smca_config;
+
/* Check special northbridge case 3) first. */
if (!mce_flags.smca) {
if (legacy_mce_is_memory_error(m))
@@ -777,6 +781,11 @@ bool amd_mce_usable_address(struct mce *m)
return false;
}
+ rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
+
+ if (smca_config & MCI_CONFIG_PADDRV)
+ return m->status & MCI_STATUS_PADDRV;
+
/* Check poison bit for all other bank types. */
if (m->status & MCI_STATUS_POISON)
return true;
base-commit: 0292ef418ce08aad597fc0bba65b6dbb841808ba
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-10-15 17:22 [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
@ 2025-10-17 14:31 ` Yazen Ghannam
2025-10-17 16:44 ` Naik, Avadhut
2025-11-13 16:52 ` Borislav Petkov
1 sibling, 1 reply; 9+ messages in thread
From: Yazen Ghannam @ 2025-10-17 14:31 UTC (permalink / raw)
To: Avadhut Naik; +Cc: x86, linux-edac, bp, tony.luck, linux-kernel
On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
> valid System Physical Address (SPA) is present in MCA_ADDR.
>
> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
> if it indicates validity of SPA in MCA_ADDR.
>
> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
> SPA or if it is implementation specific.
>
> Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
Minor nit:
The $SUBJECT could be simpler, like "Use new physical address valid
bit". And leave the details and proper field names for the commit
message.
But in any case, looks good to me.
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Thanks,
Yazen
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-10-17 14:31 ` Yazen Ghannam
@ 2025-10-17 16:44 ` Naik, Avadhut
0 siblings, 0 replies; 9+ messages in thread
From: Naik, Avadhut @ 2025-10-17 16:44 UTC (permalink / raw)
To: Yazen Ghannam, bp; +Cc: x86, linux-edac, tony.luck, linux-kernel, Avadhut Naik
On 10/17/2025 09:31, Yazen Ghannam wrote:
> On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
>> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
>> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
>> valid System Physical Address (SPA) is present in MCA_ADDR.
>>
>> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
>> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
>> if it indicates validity of SPA in MCA_ADDR.
>>
>> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
>> SPA or if it is implementation specific.
>>
>> Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>
> Minor nit:
> The $SUBJECT could be simpler, like "Use new physical address valid
> bit". And leave the details and proper field names for the commit
> message.
>
> But in any case, looks good to me.
>
> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>
Thanks Yazen!
Hi Boris,
If you don't have any further feedback on this, do you want me to send
another version with the subject changed per Yazen's recommendation?
Or would you take care of it at your end?
> Thanks,
> Yazen
--
Thanks,
Avadhut Naik
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-10-15 17:22 [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
2025-10-17 14:31 ` Yazen Ghannam
@ 2025-11-13 16:52 ` Borislav Petkov
2025-11-14 20:20 ` [PATCH] " Avadhut Naik
1 sibling, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2025-11-13 16:52 UTC (permalink / raw)
To: Avadhut Naik; +Cc: x86, linux-edac, tony.luck, yazen.ghannam, linux-kernel
On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
> @@ -777,6 +781,11 @@ bool amd_mce_usable_address(struct mce *m)
> return false;
> }
>
> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
We have struct smca_bank and per-CPU smca_banks array.
MCI_CONFIG_PADDRV looks like a property of the bank which is static and
doesn't change willy-nilly. So instead of doing the silly MSR read on every
error, you can cache the fact that the bank supports MCI_STATUS_PADDRV and
query that and save us 100+ unnecessary cycles every time...
No?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-11-13 16:52 ` Borislav Petkov
@ 2025-11-14 20:20 ` Avadhut Naik
2025-11-16 18:05 ` Borislav Petkov
0 siblings, 1 reply; 9+ messages in thread
From: Avadhut Naik @ 2025-11-14 20:20 UTC (permalink / raw)
To: bp; +Cc: x86, linux-edac, tony.luck, yazen.ghannam, linux-kernel,
avadhut.naik
Do you mean something like this?
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1cfbfff0be3f..2d98886de09a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRV BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PADDRV BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 53385e6aa230..c6be2f520476 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -87,6 +87,7 @@ struct smca_bank {
const struct smca_hwid *hwid;
u32 id; /* Value of MCA_IPID[InstanceId]. */
u8 sysfs_id; /* Value used for sysfs name. */
+ bool paddrv_support; /* Physical Address Valid bit in MCA_CONFIG */
};
static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
@@ -327,6 +328,9 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
+ if (low & MCI_CONFIG_PADDRV)
+ this_cpu_ptr(smca_banks)[bank].paddrv_support = true;
+
wrmsr(smca_config, low, high);
}
@@ -790,9 +794,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -806,6 +810,8 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ * a) Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
@@ -819,6 +825,9 @@ bool amd_mce_usable_address(struct mce *m)
return false;
}
+ if (this_cpu_ptr(smca_banks)[m->bank].paddrv_support)
+ return m->status & MCI_STATUS_PADDRV;
+
/* Check poison bit for all other bank types. */
if (m->status & MCI_STATUS_POISON)
return true;
base-commit: 438be5bb46f4be6e78cef7c3400f20d77f03c734
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-11-14 20:20 ` [PATCH] " Avadhut Naik
@ 2025-11-16 18:05 ` Borislav Petkov
0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2025-11-16 18:05 UTC (permalink / raw)
To: Avadhut Naik; +Cc: x86, linux-edac, tony.luck, yazen.ghannam, linux-kernel
On Fri, Nov 14, 2025 at 08:20:20PM +0000, Avadhut Naik wrote:
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 53385e6aa230..c6be2f520476 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -87,6 +87,7 @@ struct smca_bank {
> const struct smca_hwid *hwid;
> u32 id; /* Value of MCA_IPID[InstanceId]. */
> u8 sysfs_id; /* Value used for sysfs name. */
> + bool paddrv_support; /* Physical Address Valid bit in MCA_CONFIG */
u64 paddrv : 1,
__reserved : 63;
Otherwise, yes.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
@ 2025-07-29 20:46 Avadhut Naik
2025-08-06 12:52 ` Yazen Ghannam
0 siblings, 1 reply; 9+ messages in thread
From: Avadhut Naik @ 2025-07-29 20:46 UTC (permalink / raw)
To: x86, linux-edac; +Cc: bp, tony.luck, yazen.ghannam, linux-kernel, avadhut.naik
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.
PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.
PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c77c03139f7..387cf250525f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PAVALID BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5c4eb28c3ac9..6ac222aec28d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ * a)Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
bool amd_mce_usable_address(struct mce *m)
{
+ u64 smca_config;
+
+ rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
+ if (smca_config & MCI_CONFIG_PAVALID) {
+ if(m->status & MCI_STATUS_PADDRVAL)
+ return true;
+ return false;
+ }
/* Check special northbridge case 3) first. */
if (!mce_flags.smca) {
if (legacy_mce_is_memory_error(m))
base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-07-29 20:46 Avadhut Naik
@ 2025-08-06 12:52 ` Yazen Ghannam
2025-08-06 21:23 ` Naik, Avadhut
0 siblings, 1 reply; 9+ messages in thread
From: Yazen Ghannam @ 2025-08-06 12:52 UTC (permalink / raw)
To: Avadhut Naik; +Cc: x86, linux-edac, bp, tony.luck, linux-kernel
On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote:
> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
> valid System Physical Address (SPA) is present in MCA_ADDR.
>
> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
> if it indicates validity of SPA in MCA_ADDR.
>
> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
> SPA or if it is implementation specific.
The commit message is missing an imperative statement. You can describe
how and why these bits are helpful.
For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a
usable address." or something like this.
>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
> arch/x86/include/asm/mce.h | 2 ++
> arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
> 2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 6c77c03139f7..387cf250525f 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -48,6 +48,7 @@
>
> /* AMD-specific bits */
> #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
> +#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
> #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
> #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
> #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
> @@ -62,6 +63,7 @@
> */
> #define MCI_CONFIG_MCAX 0x1
> #define MCI_CONFIG_FRUTEXT BIT_ULL(9)
> +#define MCI_CONFIG_PAVALID BIT_ULL(11)
> #define MCI_IPID_MCATYPE 0xFFFF0000
> #define MCI_IPID_HWID 0xFFF
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 5c4eb28c3ac9..6ac222aec28d 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
> }
>
> /*
> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
> - * a system physical address. Therefore, individual cases need to be detected.
> - * Future cases and checks will be added as needed.
> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
> + * system physical address. Individual cases though, need to be detected for
> + * other systems. Future cases will be added as needed.
> *
> * 1) General case
> * a) Assume address is not usable.
> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
> * a) Reported in legacy bank 4 with extended error code (XEC) 8.
> * b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
> * this bit should not be checked.
> + * 4) MCI_STATUS_PADDRVAL is set
> + * a)Will provide a valid system physical address.
Space after "a)".
> *
> * NOTE: SMCA UMC memory errors fall into case #1.
> */
> bool amd_mce_usable_address(struct mce *m)
> {
> + u64 smca_config;
> +
> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
Newline here.
> + if (smca_config & MCI_CONFIG_PAVALID) {
> + if(m->status & MCI_STATUS_PADDRVAL)
> + return true;
> + return false;
> + }
Newline here.
Also, the entire hunk above should go after the !SMCA case below and
before the other SMCA cases.
Furthermore, the hunk can be simplified to this:
if (smca_config & MCI_CONFIG_PAVALID)
return m->status & MCI_STATUS_PADDRVAL;
Also also, the bit names are uncannily similar. I think they should be
the same (except for the prefix) or clearly different.
MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV
MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL
> /* Check special northbridge case 3) first. */
> if (!mce_flags.smca) {
> if (legacy_mce_is_memory_error(m))
>
> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
> --
We should revisit saving MCA_CONFIG during logging. We end up reading it
multiple times during error processing.
I realize this came up before in a previous patch set. IIRC, the point
of contention was exposing MCA_CONFIG to userspace. We don't necessarily
have to do that. We can just use it in the kernel data structures.
Thanks,
Yazen
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
2025-08-06 12:52 ` Yazen Ghannam
@ 2025-08-06 21:23 ` Naik, Avadhut
0 siblings, 0 replies; 9+ messages in thread
From: Naik, Avadhut @ 2025-08-06 21:23 UTC (permalink / raw)
To: Yazen Ghannam, Avadhut Naik; +Cc: x86, linux-edac, bp, tony.luck, linux-kernel
Hi,
On 8/6/2025 07:52, Yazen Ghannam wrote:
> On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote:
>> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
>> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
>> valid System Physical Address (SPA) is present in MCA_ADDR.
>>
>> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
>> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
>> if it indicates validity of SPA in MCA_ADDR.
>>
>> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
>> SPA or if it is implementation specific.
>
> The commit message is missing an imperative statement. You can describe
> how and why these bits are helpful.
>
> For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a
> usable address." or something like this.
>
Will add it.
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>> arch/x86/include/asm/mce.h | 2 ++
>> arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
>> 2 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index 6c77c03139f7..387cf250525f 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -48,6 +48,7 @@
>>
>> /* AMD-specific bits */
>> #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
>> +#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
>> #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
>> #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
>> #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
>> @@ -62,6 +63,7 @@
>> */
>> #define MCI_CONFIG_MCAX 0x1
>> #define MCI_CONFIG_FRUTEXT BIT_ULL(9)
>> +#define MCI_CONFIG_PAVALID BIT_ULL(11)
>> #define MCI_IPID_MCATYPE 0xFFFF0000
>> #define MCI_IPID_HWID 0xFFF
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 5c4eb28c3ac9..6ac222aec28d 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
>> }
>>
>> /*
>> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
>> - * a system physical address. Therefore, individual cases need to be detected.
>> - * Future cases and checks will be added as needed.
>> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
>> + * system physical address. Individual cases though, need to be detected for
>> + * other systems. Future cases will be added as needed.
>> *
>> * 1) General case
>> * a) Assume address is not usable.
>> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
>> * a) Reported in legacy bank 4 with extended error code (XEC) 8.
>> * b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
>> * this bit should not be checked.
>> + * 4) MCI_STATUS_PADDRVAL is set
>> + * a)Will provide a valid system physical address.
>
> Space after "a)".
>
Noted.
>> *
>> * NOTE: SMCA UMC memory errors fall into case #1.
>> */
>> bool amd_mce_usable_address(struct mce *m)
>> {
>> + u64 smca_config;
>> +
>> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
>
> Newline here.
>
Noted.
>> + if (smca_config & MCI_CONFIG_PAVALID) {
>> + if(m->status & MCI_STATUS_PADDRVAL)
>> + return true;
>> + return false;
>> + }
>
> Newline here.
>
Noted.
> Also, the entire hunk above should go after the !SMCA case below and
> before the other SMCA cases.
>
> Furthermore, the hunk can be simplified to this:
>
> if (smca_config & MCI_CONFIG_PAVALID)
> return m->status & MCI_STATUS_PADDRVAL;
>
Ack for both.
> Also also, the bit names are uncannily similar. I think they should be
> the same (except for the prefix) or clearly different.
>
> MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV
>
> MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL
>
Will change the name per your suggestions.
>> /* Check special northbridge case 3) first. */
>> if (!mce_flags.smca) {
>> if (legacy_mce_is_memory_error(m))
>>
>> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
>> --
>
> We should revisit saving MCA_CONFIG during logging. We end up reading it
> multiple times during error processing.
>
> I realize this came up before in a previous patch set. IIRC, the point
> of contention was exposing MCA_CONFIG to userspace. We don't necessarily
> have to do that. We can just use it in the kernel data structures.
>
I agree. If there is consensus, I can work on getting a patch out.
Maybe make this a set with the first patch caching MCA_CONFIG and
this being the second patch.
> Thanks,
> Yazen
--
Thanks,
Avadhut Naik
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-11-16 18:05 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-15 17:22 [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
2025-10-17 14:31 ` Yazen Ghannam
2025-10-17 16:44 ` Naik, Avadhut
2025-11-13 16:52 ` Borislav Petkov
2025-11-14 20:20 ` [PATCH] " Avadhut Naik
2025-11-16 18:05 ` Borislav Petkov
-- strict thread matches above, loose matches on Subject: below --
2025-07-29 20:46 Avadhut Naik
2025-08-06 12:52 ` Yazen Ghannam
2025-08-06 21:23 ` Naik, Avadhut
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox