[PATCH v3 0/1] AMD VM crashing on deferred memory error injection

* [PATCH v3 0/1] AMD VM crashing on deferred memory error injection
@ 2026-03-17 10:38 William Roche
  2026-03-17 10:38 ` [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 10:38 UTC (permalink / raw)
  To: bp, yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

v3 changes:
- Commit title and message changed to put the emphasis on SMCA access
  correctness - Borislav Petkov feedback.

v2 changes:
- Commit title changed to:
  x86/mce/amd: Fix VM crash during deferred error handling
- Commit message with capitalized QEMU and KVM as well as the imperative
  statement suggested by Yazen
- "CC stable" tag placed after "Signed-off-by"
  (The documentation asks for "the sign-off area" without more details)
- blank line added to separate SMCA code block and the update of
  MCA_STATUS.

 --

Since the following commit was integrated:
	7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling

AMD QEMU VMs crash when a deferred memory error is injected, with a
stack trace like:

mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)

  amd_clear_bank+0x6e/0x70
  machine_check_poll+0x228/0x2e0
  ? __pfx_mce_timer_fn+0x10/0x10
  mce_timer_fn+0xb1/0x130
  ? __pfx_mce_timer_fn+0x10/0x10
  call_timer_fn+0x26/0x120
  __run_timers+0x202/0x290
  run_timer_softirq+0x49/0x100
  handle_softirqs+0xeb/0x2c0
  __irq_exit_rcu+0xda/0x100
  sysvec_apic_timer_interrupt+0x71/0x90
[...]
 Kernel panic - not syncing: MCA architectural violation!

See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/

We traced the crash to accesses to SMCA-specific registers on non-SMCA
platforms such as a QEMU/KVM machine.
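
For reference, the MSR address in the trace above can be decoded by hand.
The sketch below assumes the SMCA MSR layout from
arch/x86/include/asm/msr-index.h (per-bank blocks of 0x10 MSRs starting at
0xc0002000, with MCA_DESTAT at offset 0x8). The helper names are
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed SMCA MSR layout (believed to mirror msr-index.h): each bank
 * owns a 0x10-wide MSR block starting at 0xc0002000, and MCA_DESTAT
 * sits at offset 0x8 inside that block.
 */
#define SMCA_MSR_BASE      0xc0002000u
#define SMCA_BANK_STRIDE   0x10u
#define SMCA_DESTAT_OFFSET 0x8u

/* Address of MCA_DESTAT for a given bank, per the layout assumed above. */
static uint32_t smca_destat_msr(unsigned int bank)
{
	return SMCA_MSR_BASE + SMCA_BANK_STRIDE * bank + SMCA_DESTAT_OFFSET;
}

/*
 * Hypothetical helper: if msr looks like an MCA_DESTAT address, store
 * its bank number in *bank and return true; otherwise return false.
 * No upper bound on the bank number is enforced in this sketch.
 */
static bool smca_destat_bank(uint32_t msr, unsigned int *bank)
{
	if (msr < SMCA_MSR_BASE)
		return false;
	if ((msr - SMCA_MSR_BASE) % SMCA_BANK_STRIDE != SMCA_DESTAT_OFFSET)
		return false;
	*bank = (msr - SMCA_MSR_BASE) / SMCA_BANK_STRIDE;
	return true;
}
```

Under that assumed layout, smca_destat_bank(0xc0002098, &bank) succeeds
with bank set to 9, so the rejected WRMSR targets MCA_DESTAT of bank 9,
a register that only exists on SMCA machines.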

This patch is checkpatch.pl clean.
The memory error injection unit test passes with it.

William Roche (1):
  x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines

 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

-- 
2.47.3

* [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 10:38 [PATCH v3 0/1] AMD VM crashing on deferred memory error injection William Roche
@ 2026-03-17 10:38 ` William Roche
  2026-03-17 13:32   ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 10:38 UTC (permalink / raw)
  To: bp, yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

Access to SMCA specific registers like MCA_DESTAT should only be done
after having checked the smca bit. Avoiding a non-SMCA machine (like
AMD QEMU/KVM VMs) crash during deferred error handling.

Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
Signed-off-by: William Roche <william.roche@oracle.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index da13c1e37f87..a030ee4cecc2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -875,13 +875,18 @@ void amd_clear_bank(struct mce *m)
 {
 	amd_reset_thr_limit(m->bank);
 
-	/* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
-	if (m->status & MCI_STATUS_DEFERRED)
-		mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
+	if (mce_flags.smca) {
+		/*
+		 * Clear MCA_DESTAT for all deferred errors even those
+		 * logged in MCA_STATUS.
+		 */
+		if (m->status & MCI_STATUS_DEFERRED)
+			mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
 
-	/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
-	if (m->kflags & MCE_CHECK_DFR_REGS)
-		return;
+		/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
+		if (m->kflags & MCE_CHECK_DFR_REGS)
+			return;
+	}
 
 	mce_wrmsrq(mca_msr_reg(m->bank, MCA_STATUS), 0);
 }
-- 
2.47.3


* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 10:38 ` [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines William Roche
@ 2026-03-17 13:32   ` Borislav Petkov
  2026-03-17 13:38     ` William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2026-03-17 13:32 UTC (permalink / raw)
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 10:38:10AM +0000, William Roche wrote:
> From: William Roche <william.roche@oracle.com>
> 
> Access to SMCA specific registers like MCA_DESTAT should only be done
> after having checked the smca bit. Avoiding a non-SMCA machine (like
> AMD QEMU/KVM VMs) crash during deferred error handling.

Not good enough. I rewrote it to:

Author: William Roche <william.roche@oracle.com>
Date:   Tue Mar 17 10:38:10 2026 +0000
 
    x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
    
    People do effort to inject MCEs into guests in order to simulate/test handling
    of real hardware errors. These efforts are of a questionable nature because,
    for one, a guest cannot really make any assumptions about the underlying
    machine and especially which MSR accesses the hypervisor filters and
    which it doesn't. See Link tag for the whole background.
    
    However, regardless of virtualization or not, access to SMCA-specific
    registers like MCA_DESTAT should only be done after having checked the smca
    feature bit. And there are AMD machines like Bulldozer (the one before Zen1)
    which do support deferred errors but are not SMCA machines.
    
    Therefore, properly check the feature bit before accessing related MSRs.
    
      [ bp: Rewrite commit message. ]
    
    Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
    Signed-off-by: William Roche <william.roche@oracle.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 13:32   ` Borislav Petkov
@ 2026-03-17 13:38     ` William Roche
  2026-03-17 18:17       ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 13:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On 3/17/26 14:32, Borislav Petkov wrote:
> On Tue, Mar 17, 2026 at 10:38:10AM +0000, William Roche wrote:
>> From: William Roche <william.roche@oracle.com>
>>
>> Access to SMCA specific registers like MCA_DESTAT should only be done
>> after having checked the smca bit. Avoiding a non-SMCA machine (like
>> AMD QEMU/KVM VMs) crash during deferred error handling.
> 
> Not good enough. I rewrote it to:
> 
> Author: William Roche <william.roche@oracle.com>
> Date:   Tue Mar 17 10:38:10 2026 +0000
>   
>      x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
>      
>      People do effort to inject MCEs into guests in order to simulate/test handling
>      of real hardware errors. These efforts are of a questionable nature because,
>      for one, a guest cannot really make any assumptions about the underlying
>      machine and especially which MSR accesses the hypervisor filters and
>      which it doesn't. See Link tag for the whole background.
>      
>      However, regardless of virtualization or not, access to SMCA-specific
>      registers like MCA_DESTAT should only be done after having checked the smca
>      feature bit. And there are AMD machines like Bulldozer (the one before Zen1)
>      which do support deferred errors but are not SMCA machines.
>      
>      Therefore, properly check the feature bit before accessing related MSRs.
>      
>        [ bp: Rewrite commit message. ]
>      
>      Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
>      Signed-off-by: William Roche <william.roche@oracle.com>
>      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
>      Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>      Cc: stable@vger.kernel.org
>      Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com

Thank you.

William.



* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 13:38     ` William Roche
@ 2026-03-17 18:17       ` Borislav Petkov
  2026-03-17 20:06         ` William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2026-03-17 18:17 UTC (permalink / raw)
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 02:38:58PM +0100, William Roche wrote:
> On 3/17/26 14:32, Borislav Petkov wrote:
> > On Tue, Mar 17, 2026 at 10:38:10AM +0000, William Roche wrote:
> > > From: William Roche <william.roche@oracle.com>
> > > 
> > > Access to SMCA specific registers like MCA_DESTAT should only be done
> > > after having checked the smca bit. Avoiding a non-SMCA machine (like
> > > AMD QEMU/KVM VMs) crash during deferred error handling.
> > 
> > Not good enough. I rewrote it to:
> > 
> > Author: William Roche <william.roche@oracle.com>
> > Date:   Tue Mar 17 10:38:10 2026 +0000
> >      x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
> >      People do effort to inject MCEs into guests in order to simulate/test handling
> >      of real hardware errors. These efforts are of a questionable nature because,
> >      for one, a guest cannot really make any assumptions about the underlying
> >      machine and especially which MSR accesses the hypervisor filters and
> >      which it doesn't. See Link tag for the whole background.
> >      However, regardless of virtualization or not, access to SMCA-specific
> >      registers like MCA_DESTAT should only be done after having checked the smca
> >      feature bit. And there are AMD machines like Bulldozer (the one before Zen1)
> >      which do support deferred errors but are not SMCA machines.
> >      Therefore, properly check the feature bit before accessing related MSRs.
> >        [ bp: Rewrite commit message. ]
> >      Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
> >      Signed-off-by: William Roche <william.roche@oracle.com>
> >      Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> >      Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
> >      Cc: stable@vger.kernel.org
> >      Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com
> 
> Thank you.

Rewrote it again after talking to Yazen. A patch needs to have the proper
justification why it exists!

x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs

People do effort to inject MCEs into guests in order to simulate/test
handling of hardware errors. The real use case behind it is testing the
handling of SIGBUS which the memory failure code sends to the process.
 
If that process is QEMU, instead of killing the whole guest, the MCE can
be injected into the guest kernel so that latter can attempt proper
handling and kill the user *process*  in the guest, instead, which 
caused the MCE. The assumption being here that the whole injection flow
can supply enough information that the guest kernel can poinpoint the
right process. But that's a different topic...
 
Regardless of virtualization or not, access to SMCA-specific registers
like MCA_DESTAT should only be done after having checked the smca
feature bit. And there are AMD machines like Bulldozer (the one before
Zen1) which do support deferred errors but are not SMCA machines.
 
Therefore, properly check the feature bit before accessing related MSRs.
 
  [ bp: Rewrite commit message. ]
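
The rule stated above can be modeled outside the kernel. What follows is
a minimal user-space sketch of the guarded control flow, not the kernel
code: struct mce_model, struct clear_log and clear_bank_model() are
hypothetical names, the smca flag is passed in as a parameter instead of
being read from mce_flags, and the MCE_CHECK_DFR_REGS value is
illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_DEFERRED (1ULL << 44) /* deferred-error bit in MCA_STATUS */
#define MCE_CHECK_DFR_REGS  (1u << 0)    /* illustrative value, not the kernel's */

/* Hypothetical stand-in for the fields of struct mce used here. */
struct mce_model {
	unsigned int bank;
	uint64_t status;
	unsigned int kflags;
};

/* Records which registers a call would have cleared. */
struct clear_log {
	bool destat; /* MCA_DESTAT (exists on SMCA machines only) */
	bool status; /* MCA_STATUS */
};

static struct clear_log clear_bank_model(bool smca, const struct mce_model *m)
{
	struct clear_log log = { false, false };

	if (smca) {
		/* Only SMCA machines have MCA_DESTAT to clear. */
		if (m->status & MCI_STATUS_DEFERRED)
			log.destat = true;

		/* MCA_DESTAT was used exclusively: leave MCA_STATUS alone. */
		if (m->kflags & MCE_CHECK_DFR_REGS)
			return log;
	}

	log.status = true;
	return log;
}
```

With smca set to false, a record that has MCI_STATUS_DEFERRED set clears
only MCA_STATUS, which is the behavior a Bulldozer-class machine or a
non-SMCA guest needs.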


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 18:17       ` Borislav Petkov
@ 2026-03-17 20:06         ` William Roche
  2026-03-17 20:24           ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 20:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

I just wanted to add a small clarification about the VM error relay mechanism:

On 3/17/26 19:17, Borislav Petkov wrote:
> Rewrote it again after talking to Yazen. A patch needs to have the proper
> justification why it exists!
> 
> x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
> 
> People do effort to inject MCEs into guests in order to simulate/test
> handling of hardware errors. The real use case behind it is testing the
> handling of SIGBUS which the memory failure code sends to the process.
>   
> If that process is QEMU, instead of killing the whole guest, the MCE can
> be injected into the guest kernel so that latter can attempt proper
> handling and kill the user *process*  in the guest, instead, which
> caused the MCE. The assumption being here that the whole injection flow
> can supply enough information that the guest kernel can poinpoint the
> right process. But that's a different topic...


Relaying the error to the guest doesn't only have a value to target a VM 
process but also deal with free memory or clean file cache memory 
impacted etc... Cases where a memory error may not crash the kernel can 
benefit to the VM too (Kernel RAS features that are, as you said, a 
different topic :) )

There is also a small typo in "pinpoint"

>   
> Regardless of virtualization or not, access to SMCA-specific registers
> like MCA_DESTAT should only be done after having checked the smca
> feature bit. And there are AMD machines like Bulldozer (the one before
> Zen1) which do support deferred errors but are not SMCA machines.
>   
> Therefore, properly check the feature bit before accessing related MSRs.
>   
>    [ bp: Rewrite commit message. ]


Thank you very much for your feedback!
William.

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 20:06         ` William Roche
@ 2026-03-17 20:24           ` Borislav Petkov
  2026-03-17 21:52             ` William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2026-03-17 20:24 UTC (permalink / raw)
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 09:06:54PM +0100, William Roche wrote:
> Relaying the error to the guest doesn't only have a value to target a VM
> process but also deal with free memory or clean file cache memory impacted
> etc... Cases where a memory error may not crash the kernel can benefit to
> the VM too

I don't understand - what do you mean with "free memory or clean file cache
memory"?

> There is also a small typo in "pinpoint"

Ack, fixed.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 20:24           ` Borislav Petkov
@ 2026-03-17 21:52             ` William Roche
  2026-03-18 20:24               ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 21:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On 3/17/26 21:24, Borislav Petkov wrote:
> On Tue, Mar 17, 2026 at 09:06:54PM +0100, William Roche wrote:
>> Relaying the error to the guest doesn't only have a value to target a VM
>> process but also deal with free memory or clean file cache memory impacted
>> etc... Cases where a memory error may not crash the kernel can benefit to
>> the VM too
> 
> I don't understand - what do you mean with "free memory or clean file cache
> memory"?

The physical address of an uncorrected memory error (if/when it can be 
identified) can give a chance to a kernel reaction depending on the 
state (and type) of the impacted memory -- as implemented in 
mm/memory-failure.c with error_states[], me_pagecache_clean() or 
try_memory_failure()...

The Kernel can try to "deal" with the error. The process case (with its 
SIGBUS) is probably the most common one, but a few kernel memory pages 
impacted by a memory error could be isolated (poisoned) without 
requiring a kernel crash. Free memory pages or clean page cache pages 
could be an example of that, they are poisoned and should not be used by 
the system after that. The kernel can also return EIO error on poisoned 
page cache failed access attempt, etc...

These mechanisms are implemented for the bare-metal running kernel, but 
what is really interesting when relaying the error to a VM is that its 
kernel can, in some cases, also benefit from these mechanisms. And 
having a chance (even small) to avoid a VM crash is a significant gain 
for virtualized workload.
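
To make the idea concrete, here is a deliberately simplified user-space
sketch of a table-driven dispatch on the state of the impacted page. It
is only loosely inspired by the error_states[] approach and is not the
mm/memory-failure.c implementation: the enums, the table contents and
model_memory_failure() are all hypothetical, and treating every other
page type as unrecoverable is an assumption of this model.

```c
#include <stddef.h>

/* Simplified page classes named in the discussion above (hypothetical). */
enum page_kind {
	PAGE_FREE,
	PAGE_CACHE_CLEAN,
	PAGE_USER_MAPPED,
	PAGE_KERNEL_CORE,
};

enum outcome {
	OUTCOME_ISOLATE,       /* poison the page; nothing else is disturbed */
	OUTCOME_SIGBUS,        /* poison the page and signal the owning process */
	OUTCOME_UNRECOVERABLE, /* no safe recovery path in this model */
};

struct state_entry {
	enum page_kind kind;
	enum outcome outcome;
};

/* Page kinds this model knows how to recover from. */
static const struct state_entry model_states[] = {
	{ PAGE_FREE,        OUTCOME_ISOLATE },
	{ PAGE_CACHE_CLEAN, OUTCOME_ISOLATE },
	{ PAGE_USER_MAPPED, OUTCOME_SIGBUS },
};

/* Look up the outcome for a page kind; anything unlisted is fatal. */
static enum outcome model_memory_failure(enum page_kind kind)
{
	size_t n = sizeof(model_states) / sizeof(model_states[0]);

	for (size_t i = 0; i < n; i++)
		if (model_states[i].kind == kind)
			return model_states[i].outcome;

	return OUTCOME_UNRECOVERABLE;
}
```

In this model only the PAGE_KERNEL_CORE case ends badly. The other three
leave the system running, which is the gain a guest kernel gets when the
error is relayed to it.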

Just giving my point of view on why we care about VM relayed memory 
errors :)

William.

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 21:52             ` William Roche
@ 2026-03-18 20:24               ` Borislav Petkov
  0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2026-03-18 20:24 UTC (permalink / raw)
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 10:52:50PM +0100, William Roche wrote:
> The physical address of an uncorrected memory error (if/when it can be
> identified) can give a chance to a kernel reaction depending on the state
> (and type) of the impacted memory -- as implemented in mm/memory-failure.c
> with error_states[], me_pagecache_clean() or try_memory_failure()...
> 
> The Kernel can try to "deal" with the error. The process case (with its
> SIGBUS) is probably the most common one, but a few kernel memory pages
> impacted by a memory error could be isolated (poisoned) without requiring a
> kernel crash. Free memory pages or clean page cache pages could be an
> example of that, they are poisoned and should not be used by the system
> after that. The kernel can also return EIO error on poisoned page cache
> failed access attempt, etc...
> 
> These mechanisms are implemented for the bare-metal running kernel, but what
> is really interesting when relaying the error to a VM is that its kernel
> can, in some cases, also benefit from these mechanisms. And having a chance
> (even small) to avoid a VM crash is a significant gain for virtualized
> workload.
> 
> Just giving my point of view on why we care about VM relayed memory errors
> :)

Ah, you want to be able to handle an error belonging to a guest, regardless of
which part it hits. As in, the guest memory hit could be pagecache, free
memory, etc... anything that would prevent the guest from dying an
unnecessary death.

Ack, makes sense.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
