public inbox for linux-edac@vger.kernel.org
* [PATCH v2 0/1] AMD VM crashing on deferred memory error injection
@ 2026-02-18 16:30 William Roche
  2026-02-18 16:30 ` [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling William Roche
  2026-03-12 14:23 ` [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche
  0 siblings, 2 replies; 12+ messages in thread
From: William Roche @ 2026-02-18 16:30 UTC
  To: yazen.ghannam, tony.luck, bp, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

Thank you very much Yazen for your review and all the suggestions!

v2 changes:
- Commit title changed to:
  x86/mce/amd: Fix VM crash during deferred error handling
- Commit message updated with capitalized QEMU and KVM, and with the
  imperative statement suggested by Yazen
- "CC stable" tag placed after "Signed-off-by"
  (The documentation asks for "the sign-off area" without more details)
- Blank line added to separate the SMCA code block from the update of
  MCA_STATUS.

 --

After the integration of the following commit:
	7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling

AMD QEMU VMs started crashing on deferred memory error injection, with a
stack trace like:

mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)

  amd_clear_bank+0x6e/0x70
  machine_check_poll+0x228/0x2e0
  ? __pfx_mce_timer_fn+0x10/0x10
  mce_timer_fn+0xb1/0x130
  ? __pfx_mce_timer_fn+0x10/0x10
  call_timer_fn+0x26/0x120
  __run_timers+0x202/0x290
  run_timer_softirq+0x49/0x100
  handle_softirqs+0xeb/0x2c0
  __irq_exit_rcu+0xda/0x100
  sysvec_apic_timer_interrupt+0x71/0x90
[...]
 Kernel panic - not syncing: MCA architectural violation!
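For readers decoding the trace: assuming the SMCA per-bank MSR layout as we understand it from the kernel's msr-index.h (base 0xc0002000, 0x10 MSRs per bank, MCA_DESTAT at offset 0x8 within a bank), the faulting address 0xc0002098 decodes to MCA_DESTAT of bank 9, a register that exists only on SMCA systems. The helper below is a hypothetical illustration of that arithmetic, not kernel code; the 64-bank upper bound is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMCA layout: each bank owns a block of 0x10 MSRs starting at
 * 0xc0002000; offset 0x1 is MCA_STATUS and offset 0x8 is MCA_DESTAT. */
#define SMCA_MSR_BASE    0xc0002000u
#define SMCA_BANK_STRIDE 0x10u
#define SMCA_MAX_BANKS   64u /* assumed upper bound on bank count */

/* Hypothetical helper: split an SMCA MSR address into bank number and
 * per-bank register offset. Returns false for addresses outside the
 * assumed SMCA bank range (e.g. legacy MCi_STATUS MSRs). */
static bool smca_decode_msr(uint32_t msr, unsigned int *bank, unsigned int *reg)
{
	if (msr < SMCA_MSR_BASE ||
	    msr >= SMCA_MSR_BASE + SMCA_BANK_STRIDE * SMCA_MAX_BANKS)
		return false;

	*bank = (msr - SMCA_MSR_BASE) / SMCA_BANK_STRIDE;
	*reg  = (msr - SMCA_MSR_BASE) % SMCA_BANK_STRIDE;
	return true;
}
```

For example, smca_decode_msr(0xc0002098, &bank, &reg) yields bank 9 and register offset 0x8 (0x98 = 9 * 0x10 + 0x8), while a legacy address such as 0x401 is rejected.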

See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/

We identified the problem as accesses to SMCA-specific registers from
non-SMCA platforms, such as a QEMU/KVM guest.
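A minimal user-space model of the problem, under stated assumptions: the MSR macros mirror our understanding of msr-index.h, while struct fake_cpu, fake_wrmsr() and both clear_bank_*() helpers are hypothetical stand-ins, not the code in arch/x86/kernel/cpu/mce/amd.c and not the actual patch. It contrasts writing an SMCA-only register unconditionally with guarding that write on SMCA support and then clearing MCA_STATUS through the address that is valid for the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR addresses as we understand them from the kernel's msr-index.h. */
#define MSR_IA32_MCx_STATUS(x)       (0x401u + 4u * (x))         /* legacy MCA */
#define MSR_AMD64_SMCA_MCx_STATUS(x) (0xc0002001u + 0x10u * (x)) /* SMCA only */
#define MSR_AMD64_SMCA_MCx_DESTAT(x) (0xc0002008u + 0x10u * (x)) /* SMCA only */

/* Hypothetical model of a CPU or VM that may or may not advertise SMCA. */
struct fake_cpu {
	bool smca;           /* SMCA feature advertised? */
	unsigned int faults; /* simulated failed WRMSRs */
	uint32_t last_msr;   /* last MSR address written */
};

/* Hypothetical WRMSR: writing an SMCA MSR (assumed 64-bank range) on a
 * CPU without SMCA counts as a fault, like the failed write in the trace. */
static void fake_wrmsr(struct fake_cpu *cpu, uint32_t msr, uint64_t val)
{
	(void)val; /* the value does not affect whether the access faults */
	cpu->last_msr = msr;
	if (!cpu->smca && msr >= 0xc0002000u && msr < 0xc0002000u + 0x10u * 64u)
		cpu->faults++;
}

/* Pick the MCA_STATUS address valid for this CPU. */
static uint32_t status_msr(const struct fake_cpu *cpu, unsigned int bank)
{
	return cpu->smca ? MSR_AMD64_SMCA_MCx_STATUS(bank)
			 : MSR_IA32_MCx_STATUS(bank);
}

/* Failing pattern: the SMCA-only DESTAT register is written unconditionally. */
static void clear_bank_unguarded(struct fake_cpu *cpu, unsigned int bank)
{
	fake_wrmsr(cpu, MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
	fake_wrmsr(cpu, status_msr(cpu, bank), 0);
}

/* Guarded pattern: SMCA-only registers are touched only when SMCA is
 * present, then MCA_STATUS is cleared. */
static void clear_bank_guarded(struct fake_cpu *cpu, unsigned int bank)
{
	if (cpu->smca)
		fake_wrmsr(cpu, MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);

	fake_wrmsr(cpu, status_msr(cpu, bank), 0);
}
```

With smca = false, clear_bank_unguarded() on bank 9 records one fault (its DESTAT write lands at 0xc0002098, the address in the trace), while clear_bank_guarded() records none and clears the legacy MC9_STATUS at 0x425.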

This patch is checkpatch.pl clean.
Memory error injection unit tests pass with it applied.


William Roche (1):
  x86/mce/amd: Fix VM crash during deferred error handling

 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

-- 
2.47.3



Thread overview: 12+ messages
2026-02-18 16:30 [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche
2026-02-18 16:30 ` [PATCH v2 1/1] x86/mce/amd: Fix VM crash during deferred error handling William Roche
2026-03-12 14:42   ` Borislav Petkov
2026-03-12 15:11     ` William Roche
2026-03-12 16:04       ` Borislav Petkov
2026-03-12 22:44         ` William Roche
2026-03-13 20:10           ` Borislav Petkov
2026-03-16 15:27             ` William Roche
2026-03-13 20:26           ` Yazen Ghannam
2026-03-16 15:26             ` William Roche
2026-03-19 14:25               ` Yazen Ghannam
2026-03-12 14:23 ` [PATCH v2 0/1] AMD VM crashing on deferred memory error injection William Roche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox