public inbox for linux-edac@vger.kernel.org
* [PATCH v3 0/1] AMD VM crashing on deferred memory error injection
@ 2026-03-17 10:38 William Roche
  2026-03-17 10:38 ` [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines William Roche
  0 siblings, 1 reply; 9+ messages in thread
From: William Roche @ 2026-03-17 10:38 UTC
  To: bp, yazen.ghannam, tony.luck, tglx, mingo, dave.hansen, x86, hpa,
	linux-edac, linux-kernel
  Cc: John.Allen, jane.chu, william.roche

From: William Roche <william.roche@oracle.com>

v3 changes:
- Commit title and message reworded to emphasize the correctness of SMCA
  register access, per Borislav Petkov's feedback.

v2 changes:
- Commit title changed to:
  x86/mce/amd: Fix VM crash during deferred error handling
- Commit message: QEMU and KVM capitalized, and the imperative statement
  suggested by Yazen used
- "Cc: stable" tag placed after "Signed-off-by"
  (the documentation only says "the sign-off area", without further detail)
- Blank line added to separate the SMCA code block from the update of
  MCA_STATUS.

 --

After the integration of the following commit:
	7cb735d7c0cb x86/mce: Unify AMD DFR handler with MCA Polling

A problem was found with AMD QEMU VMs: they started to crash when handling
deferred memory error injection, with a stack trace like:

mce: MSR access error: WRMSR to 0xc0002098 (tried to write 0x0000000000000000)
at rIP: 0xffffffff8229894d (mce_wrmsrq+0x1d/0x60)

  amd_clear_bank+0x6e/0x70
  machine_check_poll+0x228/0x2e0
  ? __pfx_mce_timer_fn+0x10/0x10
  mce_timer_fn+0xb1/0x130
  ? __pfx_mce_timer_fn+0x10/0x10
  call_timer_fn+0x26/0x120
  __run_timers+0x202/0x290
  run_timer_softirq+0x49/0x100
  handle_softirqs+0xeb/0x2c0
  __irq_exit_rcu+0xda/0x100
  sysvec_apic_timer_interrupt+0x71/0x90
[...]
 Kernel panic - not syncing: MCA architectural violation!

See the discussion at:
https://lore.kernel.org/all/48d8e1c8-1eb9-49cc-8de8-78077f29c203@oracle.com/

We traced the problem to an access to an SMCA-specific register (MCA_DESTAT)
from non-SMCA platforms, such as a QEMU/KVM guest.

This patch is checkpatch.pl clean.
The memory error injection unit test passes with it applied.


William Roche (1):
  x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines

 arch/x86/kernel/cpu/mce/amd.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

-- 
2.47.3

Thread overview: 9+ messages
2026-03-17 10:38 [PATCH v3 0/1] AMD VM crashing on deferred memory error injection William Roche
2026-03-17 10:38 ` [PATCH v3 1/1] x86/mce/amd: Guard SMCA DESTAT access on non-SMCA machines William Roche
2026-03-17 13:32   ` Borislav Petkov
2026-03-17 13:38     ` William Roche
2026-03-17 18:17       ` Borislav Petkov
2026-03-17 20:06         ` William Roche
2026-03-17 20:24           ` Borislav Petkov
2026-03-17 21:52             ` William Roche
2026-03-18 20:24               ` Borislav Petkov
