* [PATCH v1 0/1] AMD VM crashing on deferred memory error injection
From: William Roche @ 2026-02-13 18:26 UTC
To: tony.luck, bp, tglx, mingo, dave.hansen, x86, hpa, linux-edac,
linux-kernel
Cc: yazen.ghannam, John.Allen, jane.chu, william.roche
From: William Roche <william.roche@oracle.com>
After the integration of the following commit:
7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling
AMD QEMU VMs started to crash when handling an injected deferred memory
error, with a stack trace like:
mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)
amd_clear_bank+0x6e/0x70
machine_check_poll+0x228/0x2e0
? __pfx_mce_timer_fn+0x10/0x10
mce_timer_fn+0xb1/0x130
? __pfx_mce_timer_fn+0x10/0x10
call_timer_fn+0x26/0x120
__run_timers+0x202/0x290
run_timer_softirq+0x49/0x100
handle_softirqs+0xeb/0x2c0
__irq_exit_rcu+0xda/0x100
sysvec_apic_timer_interrupt+0x71/0x90
[...]
Kernel panic - not syncing: MCA architectural violation!
See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/
We identified a problem with accesses to SMCA-specific registers on
non-SMCA platforms such as a QEMU/KVM machine.
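As an illustration only (a userspace sketch, not the patch itself), the guard can be modelled by recording which MSRs a clear-bank routine would write. Every model_* name and the MODEL_CHECK_DFR_REGS bit value are hypothetical. The MSR numbers and the bit position of the deferred flag are assumptions based on my reading of arch/x86/include/asm/msr-index.h and the MCA status layout. Under those assumptions, the 0xc0002098 address in the trace above would correspond to MCA_DESTAT of bank 9.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR layout (from my reading of msr-index.h, not from this thread). */
#define SMCA_MC0_STATUS		0xc0002001u
#define SMCA_MC0_DESTAT		0xc0002008u
#define LEGACY_MC0_STATUS	0x401u
#define SMCA_MCx_DESTAT(bank)	(SMCA_MC0_DESTAT + 0x10u * (bank))

/* Assumed: bit 44 of MCA_STATUS marks a deferred error. */
#define STATUS_DEFERRED		(1ULL << 44)
/* Hypothetical stand-in for MCE_CHECK_DFR_REGS; the real bit value may differ. */
#define MODEL_CHECK_DFR_REGS	(1ULL << 0)

struct model_mce {
	unsigned int bank;
	uint64_t status;
	uint64_t kflags;
};

/* Records the MSRs that would have been written, instead of touching hardware. */
struct wrmsr_log {
	uint32_t msr[4];
	int count;
};

static void model_wrmsr(struct wrmsr_log *log, uint32_t msr)
{
	if (log->count < 4)
		log->msr[log->count++] = msr;
}

/* Picks the SMCA or legacy MCA_STATUS register for a bank. */
static uint32_t model_status_msr(bool smca, unsigned int bank)
{
	return smca ? SMCA_MC0_STATUS + 0x10u * bank
		    : LEGACY_MC0_STATUS + 4u * bank;
}

/* Same control flow as the patched amd_clear_bank(), minus the threshold reset. */
static void model_clear_bank(bool smca, const struct model_mce *m,
			     struct wrmsr_log *log)
{
	if (smca) {
		/* MCA_DESTAT exists only on SMCA systems. */
		if (m->status & STATUS_DEFERRED)
			model_wrmsr(log, SMCA_MCx_DESTAT(m->bank));

		/* MCA_DESTAT was used exclusively: leave MCA_STATUS alone. */
		if (m->kflags & MODEL_CHECK_DFR_REGS)
			return;
	}

	model_wrmsr(log, model_status_msr(smca, m->bank));
}
```

Calling model_clear_bank() with smca set to false for a deferred error on bank 9 records a single write to the legacy MCA_STATUS register and none to MCA_DESTAT, which is the behaviour the fix is after on a non-SMCA guest.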
This patch is checkpatch.pl clean.
Unit test of memory error injection works fine with it.
The commit that introduced this error has also been integrated into the
stable tree, which is why I added the Cc: stable... entry.
Thanks in advance for your feedback.
William Roche (1):
x86/mce: AMD deferred error handling crashes Qemu VMs
arch/x86/kernel/cpu/mce/amd.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
--
2.47.3
* [PATCH v1 1/1] x86/mce: AMD deferred error handling crashes Qemu VMs
From: William Roche @ 2026-02-13 18:26 UTC
To: tony.luck, bp, tglx, mingo, dave.hansen, x86, hpa, linux-edac,
linux-kernel
Cc: yazen.ghannam, John.Allen, jane.chu, william.roche
From: William Roche <william.roche@oracle.com>
A non Scalable MCA system may prevent access to SMCA specific registers
like MCA_DESTAT. This is the case of Qemu/kvm VMs, and the VM kernel
needs to avoid accessing SMCA registers on non-SMCA platforms.
Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
Cc: stable@vger.kernel.org
Signed-off-by: William Roche <william.roche@oracle.com>
---
arch/x86/kernel/cpu/mce/amd.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 3f1dda355307..53c4b032ad35 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -875,14 +875,18 @@ void amd_clear_bank(struct mce *m)
 {
 	amd_reset_thr_limit(m->bank);

-	/* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
-	if (m->status & MCI_STATUS_DEFERRED)
-		mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
-
-	/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
-	if (m->kflags & MCE_CHECK_DFR_REGS)
-		return;
+	if (mce_flags.smca) {
+		/*
+		 * Clear MCA_DESTAT for all deferred errors even those
+		 * logged in MCA_STATUS.
+		 */
+		if (m->status & MCI_STATUS_DEFERRED)
+			mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);

+		/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
+		if (m->kflags & MCE_CHECK_DFR_REGS)
+			return;
+	}
 	mce_wrmsrq(mca_msr_reg(m->bank, MCA_STATUS), 0);
 }
--
2.47.3
* Re: [PATCH v1 1/1] x86/mce: AMD deferred error handling crashes Qemu VMs
From: Yazen Ghannam @ 2026-02-17 15:24 UTC
To: William Roche
Cc: tony.luck, bp, tglx, mingo, dave.hansen, x86, hpa, linux-edac,
linux-kernel, John.Allen, jane.chu
On Fri, Feb 13, 2026 at 06:26:30PM +0000, William Roche wrote:
> From: William Roche <william.roche@oracle.com>
>
Hi William, I agree with the fix, and feedback is mostly around
style/convention.
Use "x86/mce/amd" as the $SUBJECT prefix.
Subject should be imperative.
Ex. "x86/mce/amd: Fix VM crash during deferred error handling"
> A non Scalable MCA system may prevent access to SMCA specific registers
> like MCA_DESTAT. This is the case of Qemu/kvm VMs, and the VM kernel
I expect that QEMU and KVM should be capitalized.
> needs to avoid accessing SMCA registers on non-SMCA platforms.
>
The commit message should include an imperative statement.
Ex. "Check for the SMCA feature before accessing MCA_DESTAT."
> Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
> Cc: stable@vger.kernel.org
> Signed-off-by: William Roche <william.roche@oracle.com>
"CC stable" tag should go after Signed-off-by.
> ---
> arch/x86/kernel/cpu/mce/amd.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 3f1dda355307..53c4b032ad35 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -875,14 +875,18 @@ void amd_clear_bank(struct mce *m)
>  {
>  	amd_reset_thr_limit(m->bank);
>
> -	/* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
> -	if (m->status & MCI_STATUS_DEFERRED)
> -		mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
> -
> -	/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
> -	if (m->kflags & MCE_CHECK_DFR_REGS)
> -		return;
> +	if (mce_flags.smca) {
> +		/*
> +		 * Clear MCA_DESTAT for all deferred errors even those
> +		 * logged in MCA_STATUS.
> +		 */
> +		if (m->status & MCI_STATUS_DEFERRED)
> +			mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
>
> +		/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
> +		if (m->kflags & MCE_CHECK_DFR_REGS)
> +			return;
> +	}
Please include a newline here.
>  	mce_wrmsrq(mca_msr_reg(m->bank, MCA_STATUS), 0);
>  }
>
Otherwise, looks good to me.
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Thanks,
Yazen