linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
@ 2025-07-29 20:46 Avadhut Naik
  2025-08-06 12:52 ` Yazen Ghannam
  0 siblings, 1 reply; 3+ messages in thread
From: Avadhut Naik @ 2025-07-29 20:46 UTC (permalink / raw)
  To: x86, linux-edac; +Cc: bp, tony.luck, yazen.ghannam, linux-kernel, avadhut.naik

Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.

PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.

PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
 arch/x86/include/asm/mce.h    |  2 ++
 arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c77c03139f7..387cf250525f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
 
 /* AMD-specific bits */
 #define MCI_STATUS_TCC		BIT_ULL(55)  /* Task context corrupt */
+#define MCI_STATUS_PADDRVAL	BIT_ULL(54)  /* Valid System Physical Address */
 #define MCI_STATUS_SYNDV	BIT_ULL(53)  /* synd reg. valid */
 #define MCI_STATUS_DEFERRED	BIT_ULL(44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON	BIT_ULL(43)  /* access poisonous data */
@@ -62,6 +63,7 @@
  */
 #define MCI_CONFIG_MCAX		0x1
 #define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
+#define MCI_CONFIG_PAVALID	BIT_ULL(11)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5c4eb28c3ac9..6ac222aec28d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
 }
 
 /*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
  *
  * 1) General case
  *	a) Assume address is not usable.
@@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
  *	a) Reported in legacy bank 4 with extended error code (XEC) 8.
  *	b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
  *	   this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ *	a)Will provide a valid system physical address.
  *
  * NOTE: SMCA UMC memory errors fall into case #1.
  */
 bool amd_mce_usable_address(struct mce *m)
 {
+	u64 smca_config;
+
+	rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
+	if (smca_config & MCI_CONFIG_PAVALID) {
+		if(m->status & MCI_STATUS_PADDRVAL)
+			return true;
+		return false;
+	}
 	/* Check special northbridge case 3) first. */
 	if (!mce_flags.smca) {
 		if (legacy_mce_is_memory_error(m))

base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
  2025-07-29 20:46 [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
@ 2025-08-06 12:52 ` Yazen Ghannam
  2025-08-06 21:23   ` Naik, Avadhut
  0 siblings, 1 reply; 3+ messages in thread
From: Yazen Ghannam @ 2025-08-06 12:52 UTC (permalink / raw)
  To: Avadhut Naik; +Cc: x86, linux-edac, bp, tony.luck, linux-kernel

On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote:
> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
> valid System Physical Address (SPA) is present in MCA_ADDR.
> 
> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
> if it indicates validity of SPA in MCA_ADDR.
> 
> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
> SPA or if it is implementation specific.

The commit message is missing an imperative statement. You can describe
how and why these bits are helpful.

For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a
usable address." or something like this.

> 
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---
>  arch/x86/include/asm/mce.h    |  2 ++
>  arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 6c77c03139f7..387cf250525f 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -48,6 +48,7 @@
>  
>  /* AMD-specific bits */
>  #define MCI_STATUS_TCC		BIT_ULL(55)  /* Task context corrupt */
> +#define MCI_STATUS_PADDRVAL	BIT_ULL(54)  /* Valid System Physical Address */
>  #define MCI_STATUS_SYNDV	BIT_ULL(53)  /* synd reg. valid */
>  #define MCI_STATUS_DEFERRED	BIT_ULL(44)  /* uncorrected error, deferred exception */
>  #define MCI_STATUS_POISON	BIT_ULL(43)  /* access poisonous data */
> @@ -62,6 +63,7 @@
>   */
>  #define MCI_CONFIG_MCAX		0x1
>  #define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
> +#define MCI_CONFIG_PAVALID	BIT_ULL(11)
>  #define MCI_IPID_MCATYPE	0xFFFF0000
>  #define MCI_IPID_HWID		0xFFF
>  
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 5c4eb28c3ac9..6ac222aec28d 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
>  }
>  
>  /*
> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
> - * a system physical address. Therefore, individual cases need to be detected.
> - * Future cases and checks will be added as needed.
> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
> + * system physical address. Individual cases though, need to be detected for
> + * other systems. Future cases will be added as needed.
>   *
>   * 1) General case
>   *	a) Assume address is not usable.
> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
>   *	a) Reported in legacy bank 4 with extended error code (XEC) 8.
>   *	b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
>   *	   this bit should not be checked.
> + * 4) MCI_STATUS_PADDRVAL is set
> + *	a)Will provide a valid system physical address.

Space after "a)".

>   *
>   * NOTE: SMCA UMC memory errors fall into case #1.
>   */
>  bool amd_mce_usable_address(struct mce *m)
>  {
> +	u64 smca_config;
> +
> +	rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);

Newline here.

> +	if (smca_config & MCI_CONFIG_PAVALID) {
> +		if(m->status & MCI_STATUS_PADDRVAL)
> +			return true;
> +		return false;
> +	}

Newline here.

Also, the entire hunk above should go after the !SMCA case below and
before the other SMCA cases.

Furthermore, the hunk can be simplified to this:

	if (smca_config & MCI_CONFIG_PAVALID)
		return m->status & MCI_STATUS_PADDRVAL;

Also also, the bit names are uncannily similar. I think they should be
the same (except for the prefix) or clearly different.

MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV

MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL

>  	/* Check special northbridge case 3) first. */
>  	if (!mce_flags.smca) {
>  		if (legacy_mce_is_memory_error(m))
> 
> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
> -- 

We should revisit saving MCA_CONFIG during logging. We end up reading it
multiple times during error processing.

I realize this came up before in a previous patch set. IIRC, the point
of contention was exposing MCA_CONFIG to userspace. We don't necessarily
have to do that. We can just use it in the kernel data structures.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
  2025-08-06 12:52 ` Yazen Ghannam
@ 2025-08-06 21:23   ` Naik, Avadhut
  0 siblings, 0 replies; 3+ messages in thread
From: Naik, Avadhut @ 2025-08-06 21:23 UTC (permalink / raw)
  To: Yazen Ghannam, Avadhut Naik; +Cc: x86, linux-edac, bp, tony.luck, linux-kernel

Hi,

On 8/6/2025 07:52, Yazen Ghannam wrote:
> On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote:
>> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
>> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
>> valid System Physical Address (SPA) is present in MCA_ADDR.
>>
>> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
>> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
>> if it indicates validity of SPA in MCA_ADDR.
>>
>> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
>> SPA or if it is implementation specific.
> 
> The commit message is missing an imperative statement. You can describe
> how and why these bits are helpful.
> 
> For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a
> usable address." or something like this.
> 
Will add it.

>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>>  arch/x86/include/asm/mce.h    |  2 ++
>>  arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
>>  2 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
>> index 6c77c03139f7..387cf250525f 100644
>> --- a/arch/x86/include/asm/mce.h
>> +++ b/arch/x86/include/asm/mce.h
>> @@ -48,6 +48,7 @@
>>  
>>  /* AMD-specific bits */
>>  #define MCI_STATUS_TCC		BIT_ULL(55)  /* Task context corrupt */
>> +#define MCI_STATUS_PADDRVAL	BIT_ULL(54)  /* Valid System Physical Address */
>>  #define MCI_STATUS_SYNDV	BIT_ULL(53)  /* synd reg. valid */
>>  #define MCI_STATUS_DEFERRED	BIT_ULL(44)  /* uncorrected error, deferred exception */
>>  #define MCI_STATUS_POISON	BIT_ULL(43)  /* access poisonous data */
>> @@ -62,6 +63,7 @@
>>   */
>>  #define MCI_CONFIG_MCAX		0x1
>>  #define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
>> +#define MCI_CONFIG_PAVALID	BIT_ULL(11)
>>  #define MCI_IPID_MCATYPE	0xFFFF0000
>>  #define MCI_IPID_HWID		0xFFF
>>  
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 5c4eb28c3ac9..6ac222aec28d 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
>>  }
>>  
>>  /*
>> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
>> - * a system physical address. Therefore, individual cases need to be detected.
>> - * Future cases and checks will be added as needed.
>> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
>> + * system physical address. Individual cases though, need to be detected for
>> + * other systems. Future cases will be added as needed.
>>   *
>>   * 1) General case
>>   *	a) Assume address is not usable.
>> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
>>   *	a) Reported in legacy bank 4 with extended error code (XEC) 8.
>>   *	b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
>>   *	   this bit should not be checked.
>> + * 4) MCI_STATUS_PADDRVAL is set
>> + *	a)Will provide a valid system physical address.
> 
> Space after "a)".
>
Noted.
 
>>   *
>>   * NOTE: SMCA UMC memory errors fall into case #1.
>>   */
>>  bool amd_mce_usable_address(struct mce *m)
>>  {
>> +	u64 smca_config;
>> +
>> +	rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
> 
> Newline here.
> 
Noted.

>> +	if (smca_config & MCI_CONFIG_PAVALID) {
>> +		if(m->status & MCI_STATUS_PADDRVAL)
>> +			return true;
>> +		return false;
>> +	}
> 
> Newline here.
>
Noted.
 
> Also, the entire hunk above should go after the !SMCA case below and
> before the other SMCA cases.
> 
> Furthermore, the hunk can be simplified to this:
> 
> 	if (smca_config & MCI_CONFIG_PAVALID)
> 		return m->status & MCI_STATUS_PADDRVAL;
> 
Ack for both.

> Also also, the bit names are uncannily similar. I think they should be
> the same (except for the prefix) or clearly different.
> 
> MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV
> 
> MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL
> 

Will change the name per your suggestions.

>>  	/* Check special northbridge case 3) first. */
>>  	if (!mce_flags.smca) {
>>  		if (legacy_mce_is_memory_error(m))
>>
>> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
>> -- 
> 
> We should revisit saving MCA_CONFIG during logging. We end up reading it
> multiple times during error processing.
> 
> I realize this came up before in a previous patch set. IIRC, the point
> of contention was exposing MCA_CONFIG to userspace. We don't necessarily
> have to do that. We can just use it in the kernel data structures.
> 
I agree. If there is consensus, I can work on getting a patch out.
Maybe make this a set with the first patch caching MCA_CONFIG and
this being the second patch.

> Thanks,
> Yazen

-- 
Thanks,
Avadhut Naik


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-08-06 21:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-29 20:46 [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits Avadhut Naik
2025-08-06 12:52 ` Yazen Ghannam
2025-08-06 21:23   ` Naik, Avadhut

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).