[PATCH v2 0/1] AMD VM crashing on deferred memory error injection

public inbox for linux-edac@vger.kernel.org

* [PATCH v2 0/1] AMD VM crashing on deferred memory error injection
@ 2026-02-18 16:30 William Roche
  2026-02-18 16:30 ` [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling William Roche
  2026-03-12 14:23 ` [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche
  0 siblings, 2 replies; 12+ messages in thread
From: William Roche @ 2026-02-18 16:30 UTC (permalink / raw)
  To: yazen.ghannam, tony.luck, bp, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

Thank you very much Yazen for your review and all the suggestions!

v2 changes:
- Commit title changed to:
  x86/mce/amd: Fix VM crash during deferred error handling
- Commit message with capitalized QEMU and KVM as well as the imperative
  statement suggested by Yazen
- "CC stable" tag placed after "Signed-off-by"
  (The documentation asks for "the sign-off area" without more details)
- blank line added to separate SMCA code block and the update of
  MCA_STATUS.

 --

After the integration of the following commit:
	7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling

AMD QEMU VMs started to crash when handling deferred memory error
injection, with a stack trace like:

mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)

  amd_clear_bank+0x6e/0x70
  machine_check_poll+0x228/0x2e0
  ? __pfx_mce_timer_fn+0x10/0x10
  mce_timer_fn+0xb1/0x130
  ? __pfx_mce_timer_fn+0x10/0x10
  call_timer_fn+0x26/0x120
  __run_timers+0x202/0x290
  run_timer_softirq+0x49/0x100
  handle_softirqs+0xeb/0x2c0
  __irq_exit_rcu+0xda/0x100
  sysvec_apic_timer_interrupt+0x71/0x90
[...]
 Kernel panic - not syncing: MCA architectural violation!

See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/

We identified a problem with accesses to SMCA-specific registers on
non-SMCA platforms such as a QEMU/KVM machine.

This patch is checkpatch.pl clean.
Unit test of memory error injection works fine with it.

William Roche (1):
  x86/mce/amd: Fix VM crash during deferred error handling

 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

-- 
2.47.3

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-02-18 16:30 [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche
@ 2026-02-18 16:30 ` William Roche
  2026-03-12 14:42   ` Borislav Petkov
  2026-03-12 14:23 ` [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche
  1 sibling, 1 reply; 12+ messages in thread
From: William Roche @ 2026-02-18 16:30 UTC (permalink / raw)
  To: yazen.ghannam, tony.luck, bp, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

A non Scalable MCA system may prevent access to SMCA specific registers
like MCA_DESTAT. This is the case of QEMU/KVM VMs, where the kernel
has to check for the SMCA feature before accessing MCA_DESTAT.

Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
Signed-off-by: William Roche <william.roche@oracle.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 3f1dda355307..7b9932f13bca 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -875,13 +875,18 @@ void amd_clear_bank(struct mce *m)
 {
 	amd_reset_thr_limit(m->bank);
 
-	/* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
-	if (m->status & MCI_STATUS_DEFERRED)
-		mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
+	if (mce_flags.smca) {
+		/*
+		 * Clear MCA_DESTAT for all deferred errors even those
+		 * logged in MCA_STATUS.
+		 */
+		if (m->status & MCI_STATUS_DEFERRED)
+			mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
 
-	/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
-	if (m->kflags & MCE_CHECK_DFR_REGS)
-		return;
+		/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
+		if (m->kflags & MCE_CHECK_DFR_REGS)
+			return;
+	}
 
 	mce_wrmsrq(mca_msr_reg(m->bank, MCA_STATUS), 0);
 }
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-02-18 16:30 ` [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling William Roche
@ 2026-03-12 14:42   ` Borislav Petkov
  2026-03-12 15:11     ` William Roche
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2026-03-12 14:42 UTC (permalink / raw)
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Wed, Feb 18, 2026 at 04:30:25PM +0000, William Roche wrote:
> From: William Roche <william.roche@oracle.com>
> 
> A non Scalable MCA system may prevent access to SMCA specific registers

"may prevent"?

Please explain in the commit message the whole scenario how you're triggering
this in detail.

> like MCA_DESTAT. This is the case of QEMU/KVM VMs, where the kernel
> has to check for the SMCA feature before accessing MCA_DESTAT.
> 
> Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
> Signed-off-by: William Roche <william.roche@oracle.com>
> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: stable@vger.kernel.org

AFAIR, you're injecting errors. This is not really a critical fix that
warrants this going to stable.

> ---
>  arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 3f1dda355307..7b9932f13bca 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -875,13 +875,18 @@ void amd_clear_bank(struct mce *m)
>  {
>  	amd_reset_thr_limit(m->bank);
>  
> -	/* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
> -	if (m->status & MCI_STATUS_DEFERRED)
> -		mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
> +	if (mce_flags.smca) {

All this code should not run in a VM. So why does it?

What is the use case we're supposed to support here?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-12 14:42   ` Borislav Petkov
@ 2026-03-12 15:11     ` William Roche
  2026-03-12 16:04       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: William Roche @ 2026-03-12 15:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

Thank you for your reply,

On 3/12/26 15:42, Borislav Petkov wrote:
> On Wed, Feb 18, 2026 at 04:30:25PM +0000, William Roche wrote:
>> From: William Roche <william.roche@oracle.com>
>>
>> A non Scalable MCA system may prevent access to SMCA specific registers
> 
> "may prevent"?
> 
> Please explain in the commit message the whole scenario how you're triggering
> this in detail.
> 

From the kernel's point of view (regardless of whether it is running on
bare metal or in a VM), access to these registers is provided by
the platform: either the hardware or the emulation framework.

Yazen indicated on Feb 12 that "AMD systems generally have a 
Read-as-Zero/Writes-Ignored behavior when accessing unimplemented MCA 
registers", but you rightly indicated on Feb 9 that "KVM works as 
advertized" and so prevents access to unimplemented SMCA specific 
registers. That's the reason why I had to say "may".

This access crashes on AMD VMs and "may" work on AMD hardware according 
to Yazen.

>> like MCA_DESTAT. This is the case of QEMU/KVM VMs, where the kernel
>> has to check for the SMCA feature before accessing MCA_DESTAT.
>>
>> Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
>> Signed-off-by: William Roche <william.roche@oracle.com>
>> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> Cc: stable@vger.kernel.org
> 
> AFAIR, you're injecting errors. This is not really a critical fix that
> warrants this going to stable.

Errors are injected into VMs by the hypervisor when real memory hardware 
errors occur on the system that impact the VM address space.
This is not only a test, this is a real-life mechanism. With the fix
7cb735d7c0cb that has been integrated, VM kernels running on AMD now
crash on Deferred errors, where they used to be able to deal with them
before this commit.
That's the reason why we need this additional fix.

> 
>> ---
>>   arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
>>   1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 3f1dda355307..7b9932f13bca 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -875,13 +875,18 @@ void amd_clear_bank(struct mce *m)
>>   {
>>   	amd_reset_thr_limit(m->bank);
>>   
>> -	/* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
>> -	if (m->status & MCI_STATUS_DEFERRED)
>> -		mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
>> +	if (mce_flags.smca) {
> 
> All this code should not run in a VM. So why does it?

Why do you say that this code should not run in a VM?
The error injection mechanism has been in use for several years with QEMU/KVM.
I must be missing something here. Please let me know.

> 
> What is the use case we're supposed to support here?
> 

Dealing with real-life deferred memory errors impacting a VM's address space.

I hope this clarifies the need for this new kernel fix.

Thanks again,
William.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-12 15:11     ` William Roche
@ 2026-03-12 16:04       ` Borislav Petkov
  2026-03-12 22:44         ` William Roche
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2026-03-12 16:04 UTC (permalink / raw)
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Thu, Mar 12, 2026 at 04:11:10PM +0100, William Roche wrote:
> From the kernel's point of view (regardless of whether it is running on
> bare metal or in a VM), access to these registers is provided by
> the platform: either the hardware or the emulation framework.

Except the emulation doesn't emulate the platform properly. We test on real
hw. If your hypervisor doesn't do that properly then that's not really
upstream kernel's problem.

> Errors are injected into VMs by the hypervisor when real memory hardware
> errors occur on the system that impact the VM address space.

And?

Why?

What's the recovery action scenario for having errors injected into guests?
Where is that documented? Why does the upstream kernel need to care?

Basically I'm asking you for the use case in order to determine whether that
use case is valid for the *upstream* kernel to support.

> This is not only a test, this is a real-life mechanism. With the fix
> 7cb735d7c0cb that has been integrated, VM kernels running on AMD now crash
> on Deferred errors, where they used to be able to deal with them before this
> commit.

Because we don't know of your use case. So when we do upstream development how
can we test your case?

Before that, is that case even worth testing?

I hope I'm making sense here. The MCA and other low-level hw code works on
baremetal as that's its main target. If it is supposed to work in VMs, then
there better be a proper use case which we are willing to support and we can
*actually* *test*.

If not, you can keep this "fix" in your guest kernels and everyone's happy.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-12 16:04       ` Borislav Petkov
@ 2026-03-12 22:44         ` William Roche
  2026-03-13 20:10           ` Borislav Petkov
  2026-03-13 20:26           ` Yazen Ghannam
  0 siblings, 2 replies; 12+ messages in thread
From: William Roche @ 2026-03-12 22:44 UTC (permalink / raw)
  To: Borislav Petkov, yazen.ghannam
  Cc: tony.luck, tglx, mingo, dave.hansen, x86, hpa, linux-edac,
	linux-kernel, John.Allen, jane.chu

[-- Attachment #1: Type: text/plain, Size: 7495 bytes --]

Thank you for taking the time to explain your worries about the context 
of this fix integration, and I do hope my feedback can help to convince you.

On 3/12/26 17:04, Borislav Petkov wrote:
> On Thu, Mar 12, 2026 at 04:11:10PM +0100, William Roche wrote:
>> From the kernel's point of view (regardless of whether it is running on
>> bare metal or in a VM), access to these registers is provided by
>> the platform: either the hardware or the emulation framework.
> 
> Except the emulation doesn't emulate the platform properly. We test on real
> hw. If your hypervisor doesn't do that properly then that's not really
> upstream kernel's problem.

There are several aspects that are worth considering here:
First, I totally agree that the emulation has to emulate properly ! :)

The problem we are facing is how a non-SMCA platform reacts to an
update of an SMCA-specific register.
And is the QEMU/KVM VM's reaction, as a non-SMCA machine, a valid case?

In this VM case, the MSR handling emulation is done by KVM which doesn't 
implement a "permissive" access to unimplemented registers. I also 
agreed with you when you said that it is working as advertised.
Now if emulating an AMD platform required providing "permissive"
access to a specific set of registers, the fix would not be strictly
necessary. But I may have missed a specification about that. And if such
a thing exists, it would also be the responsibility of all kernels
(including upstream) to take that into account.

Yazen may help us on this aspect: Could you please let us know if there 
is an AMD specification for accessing SMCA registers on non SMCA machines ?

Now if we had a valid case of an existing non-SMCA AMD hardware that 
could crash on updating an SMCA register, the fix would be needed not 
only for the VM case.

Yazen, could you also please tell us if an existing non-SMCA AMD 
hardware could crash on updating an SMCA register ?

Commit 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA
Polling"), written by Yazen, introduced an upstream kernel problem on
non-SMCA platforms that was revealed by the emulation framework on AMD.
That's the reason why I think it should be fixed upstream too. And Yazen
himself agrees with that.

> 
>> Errors are injected into VMs by the hypervisor when real memory hardware
>> errors occur on the system that impact the VM address space.
> 
> And?

The injected error is received by the VM kernel, which then deals with it.

> Why?

The VM kernel executes the same mechanisms used on bare metal in that case.
As Tony said on Feb 9: The guest may be able to just kill a process and 
keep running.

> 
> What's the recovery action scenario for having errors injected into guests?

Just the same as running on real HW.

> Where is that documented? Why does the upstream kernel need to care?

Sorry, I don't have a pointer to kernel documentation about that, but the
MCE relay mechanism is certainly a hypervisor functionality.

> 
> Basically I'm asking you for the use case in order to determine whether that
> use case is valid for the *upstream* kernel to support.

Yes, of course, see below.

> 
>> This is not only a test, this is a real-life mechanism. With the fix
>> 7cb735d7c0cb that has been integrated, VM kernels running on AMD now crash
>> on Deferred errors, where they used to be able to deal with them before this
>> commit.
> 
> Because we don't know of your use case. So when we do upstream development how
> can we test your case?
> 

I have a procedure to verify the behavior: It consists of running the 
upstream kernel in a VM (on an AMD platform) and injecting a memory 
error from the hardware platform to this VM to mimic a real hardware 
error being reported to the platform Kernel.

To do so:
Run QEMU as root (to help with the address translation).
The VM runs the upstream kernel.
Run the small attached program in the VM as root, so that it gives the
guest physical address of one of its mapped memory pages.

[root@VM]# ./mce_process_react_x86
Setting Early kill... Ok

Data pages at 0xXXXXXXX  physically 0xYYYYY000

-> DON'T press Enter!   (just leave the process waiting here)

Ask the emulator (QEMU in this case) to give the host physical address 
of the guest physical page:
  (qemu) gpa2hpa 0xYYYYY000
  Host physical address for 0xYYYYY000 (pc.ram) is 0xPFN000

From the host physical address, get the PFN to poison (by removing the
last 3 zeros of the hexadecimal address).

On the host, load the hwpoison_inject kernel module:
[root@host]# modprobe hwpoison_inject

and inject an error to the targeted pfn:
[root@host]# echo 0xPFN > /sys/kernel/debug/hwpoison/corrupt-pfn

Then wait until the generated asynchronous error reaches the VM (it can
take up to 5 minutes on AMD virtualization) to see the VM kernel deal
with it.

Without this suggested fix, the VM kernel panics, with the stack trace I 
gave:

mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)

    amd_clear_bank+0x6e/0x70
    machine_check_poll+0x228/0x2e0
    ? __pfx_mce_timer_fn+0x10/0x10
    mce_timer_fn+0xb1/0x130
    ? __pfx_mce_timer_fn+0x10/0x10
    call_timer_fn+0x26/0x120
    __run_timers+0x202/0x290
    run_timer_softirq+0x49/0x100
    handle_softirqs+0xeb/0x2c0
    __irq_exit_rcu+0xda/0x100
    sysvec_apic_timer_interrupt+0x71/0x90
[...]
   Kernel panic - not syncing: MCA architectural violation!

With the fix the VM Kernel deals with the error:

[root@VM]# ./mce_process_react_x86
Setting Early kill... Ok
Data pages at 0x7fa0f9b25000 physically 0x172929000

(qemu) gpa2hpa 0x172929000
Host physical address for 0x172929000 (pc.ram) is 0x237129000

-> Injecting the error with:
[root@host]# echo 0x237129 >  /sys/kernel/debug/hwpoison/corrupt-pfn

-> The VM monitor indicates:
qemu-kvm: warning: Guest MCE Memory Error at QEMU addr 0x7f3ae2729000 and GUEST addr 0x172929000 of type BUS_MCEERR_AO injected

-> A few minutes later, the VM console shows:
localhost login: [  332.973864] mce: [Hardware Error]: Machine check events logged
[  332.976795] Memory failure: 0x172929: Sending SIGBUS to mce_process_rea:5607 due to hardware memory corruption
[  332.977832] Memory failure: 0x172929: recovery action for dirty LRU page: Recovered
[  355.056785] MCE: Killing mce_process_rea:5607 due to hardware memory corruption fault at 0x7fa0f9b25000

-> The process shows:
Signal 7 received: BUS_MCEERR_AO on vaddr: 0x7fa0f9b25000
Signal 7 received: BUS_MCEERR_AR on vaddr: 0x7fa0f9b25000
Exit from the signal handler on BUS_MCEERR_AR

-> Works as expected: the AO error is relayed by the VM kernel to the
running application.

> Before that, is that case even worth testing?

If we accept that relayed MCEs are supported by the upstream kernel
running in the VM, then yes.

> 
> I hope I'm making sense here. The MCA and other low-level hw code works on
> baremetal as that's its main target. If it is supposed to work in VMs, then
> there better be a proper use case which we are willing to support and we can
> *actually* *test*.

The procedure detailed above may help with this aspect, even if it
is virtualization oriented, as I do hope that the upstream kernel
supports memory error handling in a VM.

But Yazen's answers about non-SMCA hardware can also help to decide what 
to do with this fix.

> 
> If not, you can keep this "fix" in your guest kernels and everyone's happy.
> 
> Thx.

I hope my explanations helped to better understand the context.

Thanks,
William.

[-- Attachment #2: mce_process_react.c --]
[-- Type: text/x-csrc, Size: 4517 bytes --]

#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <signal.h>
#include <string.h>

#define PAGEMAP_ENTRY 8
/* Fully parenthesized so the macros are safe inside larger expressions. */
#define GET_BIT(X,Y) (((X) & ((uint64_t)1 << (Y))) >> (Y))
#define GET_PFN(X) ((X) & 0x7FFFFFFFFFFFFFULL)

const int __endian_bit = 1;
#define is_bigendian() ( (*(char*)&__endian_bit) == 0 )
static long pgsz;

/*
 * Set the early kill mode reaction state to MCE error.
 */
static void early_reaction() {
   printf("Setting Early kill... ");
   if (prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) == 0)
      printf("Ok\n");
   else
      printf("Failure !\n");
}

/*
 * Return the physical address associated to a given local virtual address,
 * or -1 in case of an error.
 */
static uint64_t physical_address(uint64_t virt_addr) {
   char path_buf [0x100];
   FILE * f;
   uint64_t read_val, file_offset, pfn = 0;
   unsigned char c_buf[PAGEMAP_ENTRY];
   pid_t my_pid = getpid();
   int status, i;

   sprintf(path_buf, "/proc/%u/pagemap", my_pid);

   f = fopen(path_buf, "rb");
   if(!f){
      printf("Error! Cannot open %s\n", path_buf);
      return (uint64_t)-1;
   }

   file_offset = virt_addr / (uint64_t)pgsz * PAGEMAP_ENTRY;
   status = fseek(f, (long)file_offset, SEEK_SET);
   if(status){
      perror("Failed to do fseek!");
      fclose(f);
      return (uint64_t)-1;
   }

   for(i=0; i < PAGEMAP_ENTRY; i++){
      int c = getc(f);
      if(c==EOF){
         fclose(f);
         return (uint64_t)-1;
      }
      if(is_bigendian())
           c_buf[i] = (unsigned char)c;
      else
           c_buf[PAGEMAP_ENTRY - i - 1] = (unsigned char)c;
   }
   fclose(f);

   read_val = 0;
   for(i=0; i < PAGEMAP_ENTRY; i++){
      read_val = (read_val << 8) + c_buf[i];
   }

   if(GET_BIT(read_val, 63)) {
      pfn = GET_PFN(read_val);
   } else {
      printf("Page not present !\n");
   }
   if(GET_BIT(read_val, 62))
      printf("Page swapped\n");

   if (pfn == 0)
      return (uint64_t)-1;

   return pfn * (uint64_t)pgsz;
}

/*
 * SIGBUS handler to display the given information.
 */
static void sigbus_action(int signum, siginfo_t *siginfo, void *ctx) {
   printf("Signal %d received: ", signum);
   printf("%s on vaddr: %p\n",
      (siginfo->si_code == 4? "BUS_MCEERR_AR":"BUS_MCEERR_AO"),
      siginfo->si_addr);

   if (siginfo->si_code == 4) { /* BUS_MCEERR_AR */
      fprintf(stderr, "Exit from the signal handler on BUS_MCEERR_AR\n");
      _exit(1);
   }
}

int main(int argc, char ** argv) {
   struct sigaction my_sigaction;
   uint64_t virt_addr = 0, phys_addr;
   void *local_pnt;

   // Need to have the CAP_SYS_ADMIN capability to get PFNs values in pagemap.
   if (getuid() != 0) {
      fprintf(stderr, "Usage: %s needs to run as root\n", argv[0]);
      exit(EXIT_FAILURE);
   }

   // attach our SIGBUS handler.
   memset(&my_sigaction, 0, sizeof(my_sigaction));
   my_sigaction.sa_sigaction = sigbus_action;
   my_sigaction.sa_flags = SA_SIGINFO | SA_NODEFER;
   sigemptyset(&my_sigaction.sa_mask);
   if (sigaction(SIGBUS, &my_sigaction, NULL) == -1) {
      perror("Signal handler attach failed");
      exit(EXIT_FAILURE);
   }

   pgsz = sysconf(_SC_PAGESIZE);
   if (pgsz == -1) {
      perror("sysconf(_SC_PAGESIZE)");
      exit(EXIT_FAILURE);
   }
   early_reaction();

   // Allocate a private page.
   local_pnt = mmap(NULL, pgsz, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
   if (local_pnt == MAP_FAILED) {
      fprintf(stderr, "Memory Allocation failed !\n");
      exit(EXIT_FAILURE);
   }
   virt_addr = (uint64_t)local_pnt;

   // Dirty / map the page.
   sprintf((char *)local_pnt, "My page\n");

   phys_addr = physical_address(virt_addr);
   if (phys_addr == -1) {
      fprintf(stderr, "Virtual address translation 0x%llx failed\n", 
         (unsigned long long)virt_addr);
      exit(EXIT_FAILURE);
   }
   printf("\nData pages at 0x%llx  physically 0x%llx\n",
      (unsigned long long)virt_addr, (unsigned long long)phys_addr);
   fflush(stdout);

   printf("\nPress ENTER to continue\n");
   fgetc(stdin);

   // read the string at the beginning of page.
   printf("%s", (char *)local_pnt);

   phys_addr = physical_address(virt_addr);
   if (phys_addr == -1) {
      fprintf(stderr, "Virtual address translation 0x%llx failed\n", 
         (unsigned long long)virt_addr);
   } else {
      printf("\nData pages at 0x%llx  physically 0x%llx\n",
         (unsigned long long)virt_addr, (unsigned long long)phys_addr);
   }

   return 0;
}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-12 22:44         ` William Roche
@ 2026-03-13 20:10           ` Borislav Petkov
  2026-03-16 15:27             ` William Roche
  2026-03-13 20:26           ` Yazen Ghannam
  1 sibling, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2026-03-13 20:10 UTC (permalink / raw)
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Thu, Mar 12, 2026 at 11:44:04PM +0100, William Roche wrote:
> Yazen, could you also please tell us if an existing non-SMCA AMD hardware
> could crash on updating an SMCA register ?

So, the situation is this: if software needs to access a MCA_DESTATUS MSR
- which is part of AMD's MCA extensions - then software needs to check the
 smca bit.

So your patch is correct. The justification about it is not.

It should talk about how software should touch that MSR *only* *after* having
checked mce_flags.smca.

Because, it doesn't matter what KVM does or whoever - we all adhere to the hw
spec.

Because technically speaking, this code should blow up on non-SMCA machines
too because they do support deferred errors (Bulldozer for example) but they
will #GP on access to the MCA_DESTATUS MSRs as those are reserved there.

So please rewrite your commit message to state that. And then you can talk
about what the real-life situation is which caught this.

As to your use case - thanks for explaining it. If this is something which
people run, then it would be wonderful if we had a simple test script in the
kernel which verifies new changes don't break it and so that we can run it
periodically as part of testing.

HTH.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-13 20:10           ` Borislav Petkov
@ 2026-03-16 15:27             ` William Roche
  0 siblings, 0 replies; 12+ messages in thread
From: William Roche @ 2026-03-16 15:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On 3/13/26 21:10, Borislav Petkov wrote:
> On Thu, Mar 12, 2026 at 11:44:04PM +0100, William Roche wrote:
>> Yazen, could you also please tell us if an existing non-SMCA AMD hardware
>> could crash on updating an SMCA register ?
> 
> So, the situation is this: if software needs to access a MCA_DESTATUS MSR
> - which is part of AMD's MCA extensions - then software needs to check the
>   smca bit.
> 
> So your patch is correct. The justification about it is not.
> 
> It should talk about how software should touch that MSR *only* *after* having
> checked mce_flags.smca.
> 

Ok, I understand your point.

> Because, it doesn't matter what KVM does or whoever - we all adhere to the hw
> spec.
> 
> Because technically speaking, this code should blow up on non-SMCA machines
> too because they do support deferred errors (Bulldozer for example) but they
> will #GP on access to the MCA_DESTATUS MSRs as those are reserved there.

This is a little more complicated, as Yazen pointed out in his
answer. But I agree that SMCA-specific registers are reserved and should
not be accessed without first checking that access is allowed.

> 
> So please rewrite your commit message to state that. And then you can talk
> about what the real-life situation is which caught this.
> 

Sure, I'm going to submit a new version of this patch using this new 
commit message:

  x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines

  Access to SMCA-specific registers like MCA_DESTAT should only be done
  after having checked the smca bit. This avoids crashing non-SMCA
  machines (like AMD QEMU/KVM VMs) during deferred error handling.

  Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
  Signed-off-by: William Roche <william.roche@oracle.com>
  Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
  Cc: stable@vger.kernel.org


> As to your use case - thanks for explaining it. If this is something which
> people run, then it would be wonderful if we had a simple test script in the
> kernel which verifies new changes don't break it and so that we can run it
> periodically as part of testing.

That would be great !
If there is a framework to create a simple test script that runs the built
kernel in a VM, I'd be happy to know about it and create the test we
are talking about -- as a separate fix proposal.

Thanks again for your feedback,
William.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-12 22:44         ` William Roche
  2026-03-13 20:10           ` Borislav Petkov
@ 2026-03-13 20:26           ` Yazen Ghannam
  2026-03-16 15:26             ` William Roche
  1 sibling, 1 reply; 12+ messages in thread
From: Yazen Ghannam @ 2026-03-13 20:26 UTC (permalink / raw)
  To: William Roche
  Cc: Borislav Petkov, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Thu, Mar 12, 2026 at 11:44:04PM +0100, William Roche wrote:

[...]

> 
> Yazen may help us on this aspect: Could you please let us know if there is
> an AMD specification for accessing SMCA registers on non SMCA machines ?
> 
> 
> Now if we had a valid case of an existing non-SMCA AMD hardware that could
> crash on updating an SMCA register, the fix would be needed not only for the
> VM case.
> 
> Yazen, could you also please tell us if an existing non-SMCA AMD hardware
> could crash on updating an SMCA register ?
> 

All the systems I have access to are Zen systems, and all Zen systems
are SMCA systems. I'll try to find an older system to test (Bulldozer,
etc.).

[...]

> 
> I have a procedure to verify the behavior: It consists of running the
> upstream kernel in a VM (on an AMD platform) and injecting a memory error
> from the hardware platform to this VM to mimic a real hardware error being
> reported to the platform Kernel.
> 
> To do so:
> Run QEMU as root (to help with the address translation).
> The VM runs the upstream kernel.
> Run the small attached program in the VM as root, so that it gives the guest
> physical address of one of its mapped memory pages.
> 
> [root@VM]# ./mce_process_react_x86
> Setting Early kill... Ok
> 
> Data pages at 0xXXXXXXX  physically 0xYYYYY000
> 
> -> DON'T Press enter !   (just leave the process wait here)
> 
> Ask the emulator (QEMU in this case) to give the host physical address of
> the guest physical page:
>  (qemu) gpa2hpa 0xYYYYY000
>  Host physical address for 0xYYYYY000 (pc.ram) is 0xPFN000
> 
> From the host physical address get the pfn value (removing the last 3 zeros
> of the address) to poison.
> 
> On the host, use hwpoison kernel module:
> [root@host]# modprobe hwpoison_inject
> 
> and inject an error to the targeted pfn:
> [root@host]# echo 0xPFN > /sys/kernel/debug/hwpoison/corrupt-pfn
> 
> Then wait until the generated asynchronous error reaches the VM (it can take
> up to 5 minutes on AMD virtualization) to see the VM kernel deal with it.

...hint for below question.

> 
> Without this suggested fix, the VM kernel panics, with the stack trace I
> gave:
> 
> mce: MSR access error: WRMSR to 0xc0002098 (tried to write
> 0x0000000000000000)
> at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)
> 
>    amd_clear_bank+0x6e/0x70
>    machine_check_poll+0x228/0x2e0
>    ? __pfx_mce_timer_fn+0x10/0x10
>    mce_timer_fn+0xb1/0x130
>    ? __pfx_mce_timer_fn+0x10/0x10
>    call_timer_fn+0x26/0x120
>    __run_timers+0x202/0x290
>    run_timer_softirq+0x49/0x100
>    handle_softirqs+0xeb/0x2c0
>    __irq_exit_rcu+0xda/0x100
>    sysvec_apic_timer_interrupt+0x71/0x90
> [...]
>   Kernel panic - not syncing: MCA architectural violation!

The code flow indicates that a Deferred error was found by MCA polling.

I thought QEMU injects a #MC into the guest?

William, do you encounter the issue if you disable MCA polling in the
guest?

To my knowledge, Deferred errors are reported starting with Zen/SMCA
systems, even though the concept is found in older documentation. This
is another reason for the implicit handling.

I see in QEMU we set the DEFERRED status bit for BUS_MCEERR_AO errors. I
don't recall why we did that. I'll need to review the old threads.

I feel like the intent was to select bits to produce the desired outcome
rather than faithfully replicate hardware behavior. Specifically, the
DEFERRED status bit would prevent the CE filtering condition in
do_machine_check(). And it would trigger the AO flow in the guest rather
than the AR flow if we set the UC status bit.

Another example is that we use the POISON status bit so the address is
marked as "usable". A real DEFERRED error would never have the POISON
status bit; they are mutually exclusive by definition.
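
To make the bit discussion above concrete, here is a minimal user-space
sketch, not kernel or QEMU code. The bit positions are the ones I believe
mainline uses for the MCA_STATUS flags (VAL = 63, DEFERRED = 44,
POISON = 43); please double check them against asm/mce.h. The helper
name status_is_synthetic_ao() is hypothetical and exists only for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCA_STATUS flag positions, assumed to match arch/x86/include/asm/mce.h. */
#define MCI_STATUS_VAL      (1ULL << 63)  /* register contents are valid */
#define MCI_STATUS_DEFERRED (1ULL << 44)  /* AMD: deferred error */
#define MCI_STATUS_POISON   (1ULL << 43)  /* AMD: poisoned data consumed */

/*
 * Hypothetical helper: report whether a status value carries both
 * DEFERRED and POISON. Per the discussion above, real hardware would
 * not set both, so a valid status with both bits set points at an
 * emulator-built AO error.
 */
static inline bool status_is_synthetic_ao(uint64_t status)
{
	const uint64_t both = MCI_STATUS_DEFERRED | MCI_STATUS_POISON;

	return (status & MCI_STATUS_VAL) && (status & both) == both;
}
```

A status with only DEFERRED set (the hardware-like case) returns false
here; only the combined DEFERRED + POISON value returns true.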

But there may be another hidden issue: handling the error through
polling rather than #MC. I'm thinking this isn't intentional, and the
recent Linux changes exposed this behavior.

Thanks,
Yazen


* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-13 20:26           ` Yazen Ghannam
@ 2026-03-16 15:26             ` William Roche
  2026-03-19 14:25               ` Yazen Ghannam
  0 siblings, 1 reply; 12+ messages in thread
From: William Roche @ 2026-03-16 15:26 UTC (permalink / raw)
  To: Yazen Ghannam
  Cc: Borislav Petkov, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On 3/13/26 21:26, Yazen Ghannam wrote:
> On Thu, Mar 12, 2026 at 11:44:04PM +0100, William Roche wrote:
> 
> [...]
> 
>>
>> Yazen may help us on this aspect: Could you please let us know if there is
>> an AMD specification for accessing SMCA registers on non-SMCA machines?
>>
>>
>> Now if we had a valid case of existing non-SMCA AMD hardware that could
>> crash on updating an SMCA register, the fix would be needed not only for the
>> VM case.
>>
>> Yazen, could you also please tell us if existing non-SMCA AMD hardware
>> could crash on updating an SMCA register?
>>
> 
> All the systems I have access to are Zen systems, and all Zen systems
> are SMCA systems. I'll try to find an older system to test (Bulldozer,
> etc.).

I don't think that is needed anymore, if bare metal never has AO errors
handled this way (as discussed below).
It looks to me like the QEMU/KVM VM case is a specific one, exposed by
your new change.

> 
> [...]
> 
>>
>> I have a procedure to verify the behavior: It consists of running the
>> upstream kernel in a VM (on an AMD platform) and injecting a memory error
>> from the hardware platform to this VM to mimic a real hardware error being
>> reported to the platform Kernel.
>>
>> To do so:
>> Run QEMU as root (to help with the address translation).
>> The VM runs the upstream kernel.
>> Run the small attached program in the VM as root, so that it gives a guest
>> physical address of one of its mapped memory pages.
>>
>> [root@VM]# ./mce_process_react_x86
>> Setting Early kill... Ok
>>
>> Data pages at 0xXXXXXXX  physically 0xYYYYY000
>>
>> -> DON'T Press enter !   (just leave the process wait here)
>>
>> Ask the emulator (QEMU in this case) to give the host physical address of
>> the guest physical page:
>>   (qemu) gpa2hpa 0xYYYYY000
>>   Host physical address for 0xYYYYY000 (pc.ram) is 0xPFN000
>>
>>  From the host physical address get the pfn value (removing the last 3 zeros
>> of the address) to poison.
>>
>> On the host, use hwpoison kernel module:
>> [root@host]# modprobe hwpoison_inject
>>
>> and inject an error to the targeted pfn:
>> [root@host]# echo 0xPFN > /sys/kernel/debug/hwpoison/corrupt-pfn
>>
>> Then wait until the generated asynchronous error reaches the VM (it can take
>> up to 5 minutes on AMD virtualization) to see the VM kernel deal with it.
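
As a side note on the pfn step in the procedure above: "removing the last
3 zeros" of the hexadecimal host physical address is the same as shifting
it right by 12 bits. A minimal sketch, assuming 4 KiB pages (hpa_to_pfn()
is a hypothetical helper name, not something from the test program):

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages, i.e. a 12-bit page offset. */
#define PAGE_SHIFT_4K 12

/* Drop the page offset from a host physical address to get its pfn. */
static inline uint64_t hpa_to_pfn(uint64_t hpa)
{
	return hpa >> PAGE_SHIFT_4K;
}
```

So a gpa2hpa answer ending in 000 maps to the value written to
corrupt-pfn with those three hex digits dropped.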
> 
> ...hint for below question.
> 
>>
>> Without this suggested fix, the VM kernel panics, with the stack trace I
>> gave:
>>
>> mce: MSR access error: WRMSR to 0xc0002098 (tried to write
>> 0x0000000000000000)
>> at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)
>>
>>     amd_clear_bank+0x6e/0x70
>>     machine_check_poll+0x228/0x2e0
>>     ? __pfx_mce_timer_fn+0x10/0x10
>>     mce_timer_fn+0xb1/0x130
>>     ? __pfx_mce_timer_fn+0x10/0x10
>>     call_timer_fn+0x26/0x120
>>     __run_timers+0x202/0x290
>>     run_timer_softirq+0x49/0x100
>>     handle_softirqs+0xeb/0x2c0
>>     __irq_exit_rcu+0xda/0x100
>>     sysvec_apic_timer_interrupt+0x71/0x90
>> [...]
>>    Kernel panic - not syncing: MCA architectural violation!
> 
> The code flow indicates that a Deferred error was found by MCA polling.

This is right.
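
For readers following the trace, the faulting MSR address can be decoded
with a small sketch like the one below. It assumes the SMCA MSR layout I
believe msr-index.h describes: per-bank blocks of 0x10 MSRs starting at
0xc0002000, with DESTAT at offset 0x8 inside each block. The helper names
are hypothetical. Under that assumption 0xc0002098 is the DESTAT register
of bank 9.

```c
#include <stdint.h>

/* Assumed SMCA MSR layout: 0x10 MSRs per bank, starting at 0xc0002000. */
#define SMCA_MSR_BASE      0xc0002000u
#define SMCA_BANK_STRIDE   0x10u
#define SMCA_DESTAT_OFFSET 0x8u   /* MCA_DESTAT within a bank block */

/* Bank number that an SMCA MSR address belongs to. */
static inline unsigned int smca_msr_bank(uint32_t msr)
{
	return (msr - SMCA_MSR_BASE) / SMCA_BANK_STRIDE;
}

/* Register offset of an SMCA MSR address inside its bank block. */
static inline unsigned int smca_msr_offset(uint32_t msr)
{
	return (msr - SMCA_MSR_BASE) % SMCA_BANK_STRIDE;
}
```

Both helpers only make sense for addresses at or above SMCA_MSR_BASE;
no range checking is done in this sketch.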

> 
> I thought QEMU injects a #MC into the guest?

The way AO error handling was integrated into QEMU/KVM for the AMD VM
case relies on machine_check_poll().

> 
> William, do you encounter the issue if you disable MCA polling in the
> guest?

If I disable machine check polling (with the mce=ignore_ce kernel option,
for example), the AO error is not seen in the VM anymore, and of course we
don't crash because of it.

> 
> To my knowledge, Deferred errors are reported starting with Zen/SMCA
> systems, even though the concept is found in older documentation. This
> is another reason for the implicit handling.
> 
> I see in QEMU we set the DEFERRED status bit for BUS_MCEERR_AO errors. I
> don't recall why we did that. I'll need to review the old threads.
> 
> I feel like the intent was to select bits to produce the desired outcome
> rather than faithfully replicate hardware behavior. Specifically, the
> DEFERRED status bit would prevent the CE filtering condition in
> do_machine_check(). And it would trigger the AO flow in the guest rather
> than the AR flow if we set the UC status bit.
> 
> Another example is that we use the POISON status bit so the address is
> marked as "usable". A real DEFERRED error would never have the POISON
> status bit; they are mutually exclusive by definition.

That's the QEMU/KVM choice that was made about 2 years ago, and 
explained in the following comment of the *QEMU* fix:
     4b77512b2782 i386: Fix MCE support for AMD hosts
target/i386/kvm/kvm.c  function kvm_mce_inject():

      /* Setting the POISON bit for deferred errors indicates to the
       * guest kernel that the address provided by the MCE is valid
       * and usable which will ensure that the guest kernel will send
       * a SIGBUS_AO signal to the guest process. This allows for
       * more desirable behavior in the case that the guest process
       * with poisoned memory has set the MCE_KILL_EARLY prctl flag
       * which indicates that the process would prefer to handle or
       * shutdown due to the poisoned memory condition before the
       * memory has been accessed.
       *
       * While the POISON bit would not be set in a deferred error
       * sent from hardware, the bit is not meaningful for deferred
       * errors and can be reused in this scenario.
       */
       status |= MCI_STATUS_DEFERRED | MCI_STATUS_POISON;

> 
> But there may be another hidden issue: handling the error through
> polling rather than #MC. I'm thinking this isn't intentional, and the
> recent Linux changes exposed this behavior.

You are right about "recent Linux changes exposed this behavior", but 
handling AO this way was intentional.

With the suggested fix, we should cover this newly exposed failure case.

Now if we have a better way to deal with AO error handling on AMD VMs,
it could be the subject of a separate thread (probably a QEMU thread).
Our current suggested kernel fix would still be valid, even if the
code may not be exercised in the bare-metal case.

> 
> Thanks,
> Yazen


Thank you very much Yazen for your help!

Cheers,
William.



* Re: [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling
  2026-03-16 15:26             ` William Roche
@ 2026-03-19 14:25               ` Yazen Ghannam
  0 siblings, 0 replies; 12+ messages in thread
From: Yazen Ghannam @ 2026-03-19 14:25 UTC (permalink / raw)
  To: William Roche
  Cc: Borislav Petkov, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel, John.Allen, jane.chu

On Mon, Mar 16, 2026 at 04:26:11PM +0100, William Roche wrote:
[...]
> 
> With the suggested fix, we should cover this newly exposed failure case.
> 
> Now if we have a better way to deal with AO error handling on AMD VMs, it
> could be the subject of a separate thread (probably a QEMU thread).
> Our current suggested kernel fix would still be valid, even if the code
> may not be exercised in the bare-metal case.
> 

Yes, that's right. Enhancing the AO handling flow is a separate
discussion.

Thanks,
Yazen


* Re: [PATCH v2 0/1] AMD VM crashing on deferred memory error injection
  2026-02-18 16:30 [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche
  2026-02-18 16:30 ` [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling William Roche
@ 2026-03-12 14:23 ` William Roche
  1 sibling, 0 replies; 12+ messages in thread
From: William Roche @ 2026-03-12 14:23 UTC (permalink / raw)
  To: yazen.ghannam, tony.luck, bp, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel
  Cc: John.Allen, jane.chu

On 2/18/26 17:30, William Roche wrote:
> From: William Roche <william.roche@oracle.com>
> 
> Thank you very much Yazen for your review and all the suggestions!
> 
> v2 changes:
> - Commit title changed to:
>    x86/mce/amd: Fix VM crash during deferred error handling
> - Commit message with capitalized QEMU and KVM as well as the imperative
>    statement suggested by Yazen
> - "CC stable" tag placed after "Signed-off-by"
>    (The documentation asks for "the sign-off area" without more details)
> - blank line added to separate SMCA code block and the update of
>    MCA_STATUS.
> 
>   --
> 
> After the integration of the following commit:
> 	7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling
> 
> AMD QEMU VMs started to crash when dealing with deferred memory error
> injection with a stack trace like:
> 
> mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
> at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)
> 
>    amd_clear_bank+0x6e/0x70
>    machine_check_poll+0x228/0x2e0
>    ? __pfx_mce_timer_fn+0x10/0x10
>    mce_timer_fn+0xb1/0x130
>    ? __pfx_mce_timer_fn+0x10/0x10
>    call_timer_fn+0x26/0x120
>    __run_timers+0x202/0x290
>    run_timer_softirq+0x49/0x100
>    handle_softirqs+0xeb/0x2c0
>    __irq_exit_rcu+0xda/0x100
>    sysvec_apic_timer_interrupt+0x71/0x90
> [...]
>   Kernel panic - not syncing: MCA architectural violation!
> 
> See the discussion at:
> https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/
> 
> We identified a problem with access to SMCA-specific registers from
> non-SMCA platforms such as a QEMU/KVM machine.
> 
> This patch is checkpatch.pl clean.
> Unit test of memory error injection works fine with it.
> 
> 
> William Roche (1):
>    x86/mce/amd: Fix VM crash during deferred error handling
> 
>   arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 

Hello,

This fix has been reviewed by Yazen Ghannam. The code was tested with
QEMU/KVM virtual machines on AMD platforms. The commit that is fixed
here (7cb735d7c0cb) is present in the stable branch linux-6.19.y.

Could you please let me know if anything is missing to integrate this fix?

Thanks in advance for your feedback,
William.



Thread overview: 12+ messages
2026-02-18 16:30 [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche
2026-02-18 16:30 ` [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling William Roche
2026-03-12 14:42   ` Borislav Petkov
2026-03-12 15:11     ` William Roche
2026-03-12 16:04       ` Borislav Petkov
2026-03-12 22:44         ` William Roche
2026-03-13 20:10           ` Borislav Petkov
2026-03-16 15:27             ` William Roche
2026-03-13 20:26           ` Yazen Ghannam
2026-03-16 15:26             ` William Roche
2026-03-19 14:25               ` Yazen Ghannam
2026-03-12 14:23 ` [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche