* [PATCH v3 0/1] AMD VM crashing on deferred memory error injection
@ 2026-03-17 10:38 William Roche
  2026-03-17 10:38 ` [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 10:38 UTC
  To: bp, yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

v3 changes:
  - Commit title and message changed to put the emphasis on SMCA access
    correctness - Borislav Petkov feedback.

v2 changes:
  - Commit title changed to:
      x86/mce/amd: Fix VM crash during deferred error handling
  - Commit message with capitalized QEMU and KVM as well as the imperative
    statement suggested by Yazen
  - "CC stable" tag placed after "Signed-off-by" (The documentation asks for
    "the sign-off area" without more details)
  - blank line added to separate SMCA code block and the update of
    MCA_STATUS.

--
After the integration of the following commit:
7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling

A problem was found with AMD Qemu VM - it started to crash when dealing
with deferred memory error injection with a stack trace like:

mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)
 amd_clear_bank+0x6e/0x70
 machine_check_poll+0x228/0x2e0
 ? __pfx_mce_timer_fn+0x10/0x10
 mce_timer_fn+0xb1/0x130
 ? __pfx_mce_timer_fn+0x10/0x10
 call_timer_fn+0x26/0x120
 __run_timers+0x202/0x290
 run_timer_softirq+0x49/0x100
 handle_softirqs+0xeb/0x2c0
 __irq_exit_rcu+0xda/0x100
 sysvec_apic_timer_interrupt+0x71/0x90
[...]
Kernel panic - not syncing: MCA architectural violation!

See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/

We identified a problem with SMCA specific registers access from
non-SMCA platforms like a QEMU/KVM machine.

This patch is checkpatch.pl clean.
Unit test of memory error injection works fine with it.

William Roche (1):
  x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines

 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

--
2.47.3

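The faulting MSR in the trace above can be tied back to MCA_DESTAT with a little arithmetic. What follows is a minimal user-space sketch, assuming the SMCA register layout of the kernel's msr-index.h as recalled here (bank 0 MCA_DESTAT at 0xc0002008, one bank every 0x10 MSRs). The helper names smca_destat_msr() and smca_destat_bank() are hypothetical and are not kernel functions.

```c
#include <stdint.h>

/* Assumed layout: MCA_DESTAT of bank 0, and the per-bank MSR stride. */
#define SMCA_MC0_DESTAT   0xc0002008u
#define SMCA_BANK_STRIDE  0x10u

/* Hypothetical helper: MSR address of MCA_DESTAT for a given bank. */
static uint32_t smca_destat_msr(unsigned int bank)
{
	return SMCA_MC0_DESTAT + SMCA_BANK_STRIDE * bank;
}

/*
 * Hypothetical helper: bank number if msr is an MCA_DESTAT address
 * under the assumed layout, -1 otherwise. No upper bound on the bank
 * number is checked here.
 */
static int smca_destat_bank(uint32_t msr)
{
	uint32_t off;

	if (msr < SMCA_MC0_DESTAT)
		return -1;

	off = msr - SMCA_MC0_DESTAT;
	if (off % SMCA_BANK_STRIDE != 0)
		return -1;

	return (int)(off / SMCA_BANK_STRIDE);
}
```

Under those assumptions, smca_destat_bank(0xc0002098) yields 9, so the rejected write in the trace would be to the MCA_DESTAT register of bank 9, consistent with amd_clear_bank() being the caller.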
* [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 10:38 [PATCH v3 0/1] AMD VM crashing on deferred memory error injection William Roche
@ 2026-03-17 10:38 ` William Roche
  2026-03-17 13:32   ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 10:38 UTC
  To: bp, yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

Access to SMCA specific registers like MCA_DESTAT should only be done
after having checked the smca bit. Avoiding a non-SMCA machine (like
AMD QEMU/KVM VMs) crash during deferred error handling.

Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
Signed-off-by: William Roche <william.roche@oracle.com>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index da13c1e37f87..a030ee4cecc2 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -875,13 +875,18 @@ void amd_clear_bank(struct mce *m)
 {
 	amd_reset_thr_limit(m->bank);
 
-	/* Clear MCA_DESTAT for all deferred errors even those logged in MCA_STATUS. */
-	if (m->status & MCI_STATUS_DEFERRED)
-		mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
+	if (mce_flags.smca) {
+		/*
+		 * Clear MCA_DESTAT for all deferred errors even those
+		 * logged in MCA_STATUS.
+		 */
+		if (m->status & MCI_STATUS_DEFERRED)
+			mce_wrmsrq(MSR_AMD64_SMCA_MCx_DESTAT(m->bank), 0);
 
-	/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
-	if (m->kflags & MCE_CHECK_DFR_REGS)
-		return;
+		/* Don't clear MCA_STATUS if MCA_DESTAT was used exclusively. */
+		if (m->kflags & MCE_CHECK_DFR_REGS)
+			return;
+	}
 
 	mce_wrmsrq(mca_msr_reg(m->bank, MCA_STATUS), 0);
 }
--
2.47.3

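The control flow after this patch can be exercised outside the kernel. Below is a small user-space model, not kernel code: the sim_* names, the flag values and the write log are illustrative assumptions, per-bank register indexing is left out, and only the branch structure mirrors the patched amd_clear_bank().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for MCI_STATUS_DEFERRED and MCE_CHECK_DFR_REGS. */
#define SIM_STATUS_DEFERRED       (1ULL << 44)
#define SIM_KFLAG_CHECK_DFR_REGS  (1ULL << 0)

#define SIM_MAX_WRITES 4

/* Which register a simulated "clear" write targets. */
enum sim_reg { SIM_REG_STATUS, SIM_REG_DESTAT };

/* Reduced model of struct mce: only the fields the branches look at. */
struct sim_mce {
	uint64_t status;
	uint64_t kflags;
};

/* Records simulated MSR writes in order, instead of touching hardware. */
struct sim_log {
	enum sim_reg writes[SIM_MAX_WRITES];
	size_t count;
};

static void sim_wrmsr(struct sim_log *log, enum sim_reg reg)
{
	assert(log->count < SIM_MAX_WRITES);
	log->writes[log->count++] = reg;
}

/* Same branch structure as the patched function; smca models mce_flags.smca. */
static void sim_clear_bank(bool smca, const struct sim_mce *m,
			   struct sim_log *log)
{
	if (smca) {
		/* DESTAT is only touched when the SMCA feature is present. */
		if (m->status & SIM_STATUS_DEFERRED)
			sim_wrmsr(log, SIM_REG_DESTAT);

		/* DESTAT was used exclusively: leave STATUS alone. */
		if (m->kflags & SIM_KFLAG_CHECK_DFR_REGS)
			return;
	}

	sim_wrmsr(log, SIM_REG_STATUS);
}
```

With smca set to false, a deferred error results in a single write to the status register and none to DESTAT, which is the behaviour the patch aims for on non-SMCA machines such as the QEMU/KVM guest from the report.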
* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 10:38 ` [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines William Roche
@ 2026-03-17 13:32   ` Borislav Petkov
  2026-03-17 13:38     ` William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2026-03-17 13:32 UTC
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 10:38:10AM +0000, William Roche wrote:
> From: William Roche <william.roche@oracle.com>
>
> Access to SMCA specific registers like MCA_DESTAT should only be done
> after having checked the smca bit. Avoiding a non-SMCA machine (like
> AMD QEMU/KVM VMs) crash during deferred error handling.

Not good enough. I rewrote it to:

Author: William Roche <william.roche@oracle.com>
Date:   Tue Mar 17 10:38:10 2026 +0000

    x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs

    People do effort to inject MCEs into guests in order to simulate/test handling
    of real hardware errors. These efforts are of a questionable nature because,
    for one, a guest cannot really make any assumptions about the underlying
    machine and especially which MSR accesses the hypervisor filters and
    which it doesn't. See Link tag for the whole background.

    However, regardless of virtualization or not, access to SMCA-specific
    registers like MCA_DESTAT should only be done after having checked the smca
    feature bit. And there are AMD machines like Bulldozer (the one before Zen1)
    which do support deferred errors but are not SMCA machines.

    Therefore, properly check the feature bit before accessing related MSRs.

      [ bp: Rewrite commit message. ]

    Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
    Signed-off-by: William Roche <william.roche@oracle.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 13:32   ` Borislav Petkov
@ 2026-03-17 13:38     ` William Roche
  2026-03-17 18:17       ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 13:38 UTC
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel, John.Allen, jane.chu

On 3/17/26 14:32, Borislav Petkov wrote:
> On Tue, Mar 17, 2026 at 10:38:10AM +0000, William Roche wrote:
>> From: William Roche <william.roche@oracle.com>
>>
>> Access to SMCA specific registers like MCA_DESTAT should only be done
>> after having checked the smca bit. Avoiding a non-SMCA machine (like
>> AMD QEMU/KVM VMs) crash during deferred error handling.
>
> Not good enough. I rewrote it to:
>
> Author: William Roche <william.roche@oracle.com>
> Date:   Tue Mar 17 10:38:10 2026 +0000
>
>     x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
>
>     People do effort to inject MCEs into guests in order to simulate/test handling
>     of real hardware errors. These efforts are of a questionable nature because,
>     for one, a guest cannot really make any assumptions about the underlying
>     machine and especially which MSR accesses the hypervisor filters and
>     which it doesn't. See Link tag for the whole background.
>
>     However, regardless of virtualization or not, access to SMCA-specific
>     registers like MCA_DESTAT should only be done after having checked the smca
>     feature bit. And there are AMD machines like Bulldozer (the one before Zen1)
>     which do support deferred errors but are not SMCA machines.
>
>     Therefore, properly check the feature bit before accessing related MSRs.
>
>       [ bp: Rewrite commit message. ]
>
>     Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
>     Signed-off-by: William Roche <william.roche@oracle.com>
>     Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
>     Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>     Cc: stable@vger.kernel.org
>     Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com

Thank you.
William.

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 13:38     ` William Roche
@ 2026-03-17 18:17       ` Borislav Petkov
  2026-03-17 20:06         ` William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2026-03-17 18:17 UTC
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 02:38:58PM +0100, William Roche wrote:
> On 3/17/26 14:32, Borislav Petkov wrote:
> > On Tue, Mar 17, 2026 at 10:38:10AM +0000, William Roche wrote:
> > > From: William Roche <william.roche@oracle.com>
> > >
> > > Access to SMCA specific registers like MCA_DESTAT should only be done
> > > after having checked the smca bit. Avoiding a non-SMCA machine (like
> > > AMD QEMU/KVM VMs) crash during deferred error handling.
> >
> > Not good enough. I rewrote it to:
> >
> > Author: William Roche <william.roche@oracle.com>
> > Date:   Tue Mar 17 10:38:10 2026 +0000
> >
> >     x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
> >
> >     People do effort to inject MCEs into guests in order to simulate/test handling
> >     of real hardware errors. These efforts are of a questionable nature because,
> >     for one, a guest cannot really make any assumptions about the underlying
> >     machine and especially which MSR accesses the hypervisor filters and
> >     which it doesn't. See Link tag for the whole background.
> >
> >     However, regardless of virtualization or not, access to SMCA-specific
> >     registers like MCA_DESTAT should only be done after having checked the smca
> >     feature bit. And there are AMD machines like Bulldozer (the one before Zen1)
> >     which do support deferred errors but are not SMCA machines.
> >
> >     Therefore, properly check the feature bit before accessing related MSRs.
> >
> >       [ bp: Rewrite commit message. ]
> >
> >     Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
> >     Signed-off-by: William Roche <william.roche@oracle.com>
> >     Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> >     Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
> >     Cc: stable@vger.kernel.org
> >     Link: https://lore.kernel.org/r/20260218163025.1316501-1-william.roche@oracle.com
>
> Thank you.

Rewrote it again after talking to Yazen. A patch needs to have the proper
justification why it exists!

    x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs

    People do effort to inject MCEs into guests in order to simulate/test
    handling of hardware errors. The real use case behind it is testing the
    handling of SIGBUS which the memory failure code sends to the process.

    If that process is QEMU, instead of killing the whole guest, the MCE can
    be injected into the guest kernel so that latter can attempt proper
    handling and kill the user *process* in the guest, instead, which
    caused the MCE. The assumption being here that the whole injection flow
    can supply enough information that the guest kernel can poinpoint the
    right process. But that's a different topic...

    Regardless of virtualization or not, access to SMCA-specific registers
    like MCA_DESTAT should only be done after having checked the smca
    feature bit. And there are AMD machines like Bulldozer (the one before
    Zen1) which do support deferred errors but are not SMCA machines.

    Therefore, properly check the feature bit before accessing related MSRs.

      [ bp: Rewrite commit message. ]

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

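For context on the SIGBUS flow described above: on Linux, the memory failure code reports hardware memory errors to user space through SIGBUS with si_code set to BUS_MCEERR_AR (action required) or BUS_MCEERR_AO (action optional). The sketch below shows how a receiving process might tell these apart. It is an illustration only and not QEMU's code; the enum and the function classify_sigbus() are hypothetical names, and the fallback numeric values are the Linux ones, provided for C libraries or feature-macro settings that do not expose the constants.

```c
#include <signal.h>

/* Fallbacks with the Linux values, in case <signal.h> hides the constants. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification of a SIGBUS si_code value. */
enum memerr_kind {
	MEMERR_NONE,             /* SIGBUS not caused by a hardware memory error */
	MEMERR_ACTION_REQUIRED,  /* poisoned memory was consumed by this process */
	MEMERR_ACTION_OPTIONAL   /* poison detected, not yet consumed */
};

/* Map the si_code of a SIGBUS to one of the kinds above. */
static enum memerr_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return MEMERR_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return MEMERR_ACTION_OPTIONAL;
	default:
		return MEMERR_NONE;
	}
}
```

In a handler installed with sigaction() and SA_SIGINFO, the argument would come from info->si_code, and info->si_addr would carry the affected address. A VMM could use that address to decide whether, and how, to relay the error to its guest.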
* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 18:17       ` Borislav Petkov
@ 2026-03-17 20:06         ` William Roche
  2026-03-17 20:24           ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 20:06 UTC
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel, John.Allen, jane.chu

I just wanted to give a small precision about the VM error relay mechanism:

On 3/17/26 19:17, Borislav Petkov wrote:
> Rewrote it again after talking to Yazen. A patch needs to have the proper
> justification why it exists!
>
>     x86/mce/amd: Check SMCA feature bit before accessing SMCA MSRs
>
>     People do effort to inject MCEs into guests in order to simulate/test
>     handling of hardware errors. The real use case behind it is testing the
>     handling of SIGBUS which the memory failure code sends to the process.
>
>     If that process is QEMU, instead of killing the whole guest, the MCE can
>     be injected into the guest kernel so that latter can attempt proper
>     handling and kill the user *process* in the guest, instead, which
>     caused the MCE. The assumption being here that the whole injection flow
>     can supply enough information that the guest kernel can poinpoint the
>     right process. But that's a different topic...

Relaying the error to the guest doesn't only have a value to target a VM
process but also deal with free memory or clean file cache memory impacted
etc... Cases where a memory error may not crash the kernel can benefit to
the VM too
(Kernel RAS features that are, as you said, a different topic :) )

There is also a small typo in "pinpoint"

>
>     Regardless of virtualization or not, access to SMCA-specific registers
>     like MCA_DESTAT should only be done after having checked the smca
>     feature bit. And there are AMD machines like Bulldozer (the one before
>     Zen1) which do support deferred errors but are not SMCA machines.
>
>     Therefore, properly check the feature bit before accessing related MSRs.
>
>       [ bp: Rewrite commit message. ]

Thank you very much for your feedback !
William.

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 20:06         ` William Roche
@ 2026-03-17 20:24           ` Borislav Petkov
  2026-03-17 21:52             ` William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2026-03-17 20:24 UTC
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 09:06:54PM +0100, William Roche wrote:
> Relaying the error to the guest doesn't only have a value to target a VM
> process but also deal with free memory or clean file cache memory impacted
> etc... Cases where a memory error may not crash the kernel can benefit to
> the VM too

I don't understand - what do you mean with "free memory or clean file cache
memory"?

> There is also a small typo in "pinpoint"

Ack, fixed.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 20:24           ` Borislav Petkov
@ 2026-03-17 21:52             ` William Roche
  2026-03-18 20:24               ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 21:52 UTC
  To: Borislav Petkov
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel, John.Allen, jane.chu

On 3/17/26 21:24, Borislav Petkov wrote:
> On Tue, Mar 17, 2026 at 09:06:54PM +0100, William Roche wrote:
>> Relaying the error to the guest doesn't only have a value to target a VM
>> process but also deal with free memory or clean file cache memory impacted
>> etc... Cases where a memory error may not crash the kernel can benefit to
>> the VM too
>
> I don't understand - what do you mean with "free memory or clean file cache
> memory"?

The physical address of an uncorrected memory error (if/when it can be
identified) can give a chance to a kernel reaction depending on the state
(and type) of the impacted memory -- as implemented in mm/memory-failure.c
with error_states[], me_pagecache_clean() or try_memory_failure()...

The Kernel can try to "deal" with the error. The process case (with its
SIGBUS) is probably the most common one, but a few kernel memory pages
impacted by a memory error could be isolated (poisoned) without requiring a
kernel crash. Free memory pages or clean page cache pages could be an
example of that, they are poisoned and should not be used by the system
after that. The kernel can also return EIO error on poisoned page cache
failed access attempt, etc...

These mechanisms are implemented for the bare-metal running kernel, but what
is really interesting when relaying the error to a VM is that its kernel
can, in some cases, also benefit from these mechanisms. And having a chance
(even small) to avoid a VM crash is a significant gain for virtualized
workload.

Just giving my point of view on why we care about VM relayed memory errors
:)

William.

* Re: [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines
  2026-03-17 21:52             ` William Roche
@ 2026-03-18 20:24               ` Borislav Petkov
  0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2026-03-18 20:24 UTC
  To: William Roche
  Cc: yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
      linux-edac, linux-kernel, John.Allen, jane.chu

On Tue, Mar 17, 2026 at 10:52:50PM +0100, William Roche wrote:
> The physical address of an uncorrected memory error (if/when it can be
> identified) can give a chance to a kernel reaction depending on the state
> (and type) of the impacted memory -- as implemented in mm/memory-failure.c
> with error_states[], me_pagecache_clean() or try_memory_failure()...
>
> The Kernel can try to "deal" with the error. The process case (with its
> SIGBUS) is probably the most common one, but a few kernel memory pages
> impacted by a memory error could be isolated (poisoned) without requiring a
> kernel crash. Free memory pages or clean page cache pages could be an
> example of that, they are poisoned and should not be used by the system
> after that. The kernel can also return EIO error on poisoned page cache
> failed access attempt, etc...
>
> These mechanisms are implemented for the bare-metal running kernel, but what
> is really interesting when relaying the error to a VM is that its kernel
> can, in some cases, also benefit from these mechanisms. And having a chance
> (even small) to avoid a VM crash is a significant gain for virtualized
> workload.
>
> Just giving my point of view on why we care about VM relayed memory errors
> :)

Ah, you want to be able to handle an error belonging to a guest, regardless of
which part it hits. As in, the guest memory hit could be pagecache, free
memory, etc... anything that would prevent the guest from dying unnecessary
death.

Ack, makes sense.

Thx.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
