qemu-devel.nongnu.org archive mirror
* [PATCH 0/4] target/arm: Improvement on memory error handling
@ 2025-02-14  4:16 Gavin Shan
  2025-02-14  4:16 ` [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static Gavin Shan
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-14  4:16 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, mst, imammedo, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin

Currently, there is only one CPER buffer (entry), meaning only one
memory error can be reported at a time. In the extreme case, multiple
memory errors can be raised on different vCPUs. For example, a single
memory error on a 64KB page of the host can result in 16 memory errors
on 4KB pages of the guest. Unfortunately, the virtual machine is simply
aborted by multiple concurrent memory errors, as the following call
trace shows. A SEA exception is injected to the guest so that the CPER
buffer can be claimed if the error is successfully pushed by
acpi_ghes_memory_errors(). Otherwise, abort() is triggered to crash the
virtual machine.

  kvm_vcpu_thread_fn
    kvm_cpu_exec
      kvm_arch_on_sigbus_vcpu
        kvm_cpu_synchronize_state
        acpi_ghes_memory_errors         (a)
        kvm_inject_arm_sea | abort
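
To make the failure mode concrete, here is a minimal, self-contained
model of a single-entry error buffer. It is only a sketch and not the
QEMU code: struct cper_slot, slot_push() and count_rejected() are
hypothetical names, and the model assumes the guest never acknowledges
the buffer between pushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the single CPER buffer: one entry, busy until acked. */
struct cper_slot {
    bool busy;
    uint64_t paddr;
};

/* Returns true if the error was recorded, false if the slot is still busy. */
static bool slot_push(struct cper_slot *s, uint64_t paddr)
{
    if (s->busy) {
        return false;
    }
    s->busy = true;
    s->paddr = paddr;
    return true;
}

/*
 * Push one error per guest page covered by a failed host page, with no
 * acknowledgement in between, and count how many pushes are rejected.
 */
static size_t count_rejected(uint64_t host_page_size, uint64_t guest_page_size)
{
    struct cper_slot slot = { .busy = false, .paddr = 0 };
    size_t rejected = 0;

    assert(guest_page_size != 0 && host_page_size % guest_page_size == 0);

    for (uint64_t off = 0; off < host_page_size; off += guest_page_size) {
        if (!slot_push(&slot, off)) {
            rejected++;
        }
    }
    return rejected;
}
```

With a 64KB host page and 4KB guest pages, count_rejected() sees 16
errors and rejects 15 of them. In the trace above, each such rejection
at (a) corresponds to the abort() branch.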

It's arguable whether the virtual machine should be crashed in this
case. The better behaviour would be to retry pushing the memory errors
and keep the virtual machine alive, so that the administrator has a
chance to step in, for example to dump the important data with luck.
This series adds one more parameter to acpi_ghes_memory_errors() so
that pushing the memory error is retried until it succeeds.
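
The retry idea can be sketched as follows. This is an illustration
under stated assumptions, not the code from patches 3/4 and 4/4:
slot_try_push() and push_with_retry() are hypothetical names, the guest
acknowledgement is simulated by a countdown (polls_until_ack), and the
retry is bounded by max_attempts here only so that the sketch always
terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-entry buffer; the guest "acks" after some failed polls. */
struct cper_slot {
    bool busy;
    unsigned polls_until_ack;  /* 0 means the guest never acks */
    uint64_t paddr;
};

/* One push attempt: fails while the previous record is still unclaimed. */
static bool slot_try_push(struct cper_slot *s, uint64_t paddr)
{
    if (s->busy) {
        /* Simulate the guest eventually consuming the old record. */
        if (s->polls_until_ack > 0 && --s->polls_until_ack == 0) {
            s->busy = false;
        }
        return false;
    }
    s->busy = true;
    s->paddr = paddr;
    return true;
}

/*
 * Retry the push instead of giving up on the first failure.
 * Returns the number of attempts used, or -1 if all attempts failed.
 */
static int push_with_retry(struct cper_slot *s, uint64_t paddr, int max_attempts)
{
    assert(max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (slot_try_push(s, paddr)) {
            return attempt;
        }
    }
    return -1;
}
```

In this model, a caller would only fall back to a fatal path when
push_with_retry() returns -1, rather than on the first busy buffer.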

Gavin Shan (4):
  acpi/ghes: Make ghes_record_cper_errors() static
  acpi/ghes: Use error_report() in ghes_record_cper_errors()
  acpi/ghes: Allow retry to write CPER errors
  target/arm: Retry pushing CPER error if necessary

 hw/acpi/ghes-stub.c    |  3 ++-
 hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
 include/hw/acpi/ghes.h |  5 ++---
 target/arm/kvm.c       | 31 +++++++++++++++++++++++------
 4 files changed, 51 insertions(+), 33 deletions(-)

-- 
2.48.1




Thread overview: 19+ messages
2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
2025-02-14  4:16 ` [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static Gavin Shan
2025-02-21 10:44   ` Philippe Mathieu-Daudé
2025-02-14  4:16 ` [PATCH 2/4] acpi/ghes: Use error_report() in ghes_record_cper_errors() Gavin Shan
2025-02-14  4:16 ` [PATCH 3/4] acpi/ghes: Allow retry to write CPER errors Gavin Shan
2025-02-14  4:16 ` [PATCH 4/4] target/arm: Retry pushing CPER error if necessary Gavin Shan
2025-02-19 17:55   ` Igor Mammedov
2025-02-21  5:27     ` Gavin Shan
2025-02-21 11:04       ` Jonathan Cameron via
2025-02-25 11:19         ` Igor Mammedov
2025-02-26  4:58           ` Gavin Shan
2025-02-28  1:55             ` Jonathan Cameron via
2025-02-26  6:56         ` Gavin Shan
2025-02-14  9:53 ` [PATCH 0/4] target/arm: Improvement on memory error handling Jonathan Cameron via
2025-02-17  0:29   ` Gavin Shan
2025-02-14 10:12 ` Jonathan Cameron via
2025-02-17  3:49   ` Gavin Shan
2025-02-14 12:59 ` Mauro Carvalho Chehab
2025-02-17  3:58   ` Gavin Shan
