qemu-arm.nongnu.org archive mirror
* [PATCH v3 0/8] target/arm/kvm: Improve memory error handling
@ 2025-11-05 11:44 Gavin Shan
From: Gavin Shan @ 2025-11-05 11:44 UTC
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

With a 64KiB host page size and a 4KiB guest page size, one faulty host
page backs 16 guest pages. Those 16 guest pages are most likely owned by
separate threads that access them in parallel, so up to 16 memory errors
can be raised at once. The current design cannot handle this, because the
only error source has a single read acknowledgement register: when QEMU
attempts to deliver another error before the guest has acknowledged the
previous one, it aborts in the following path:

  kvm_vcpu_thread_fn
    kvm_cpu_exec
      kvm_arch_on_sigbus_vcpu
        kvm_cpu_synchronize_state
        acpi_ghes_memory_errors
        abort
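The failure can be illustrated with a small, hypothetical model in C.
Everything in it is an assumption made for illustration: the names
`struct ghes_source_model`, `poisoned_guest_pages()`, `post_error_block()`,
`guest_ack()` and `deliver_one_by_one()` are not QEMU functions, and the
faulty 64KiB host page is assumed to back a contiguous, 64KiB-aligned
guest-physical range. The read acknowledgement register is modelled as a
flag that must be set by the guest before a new block may be posted, and
that posting a block clears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Page sizes from the cover letter: 64KiB host, 4KiB guest. */
#define HOST_PAGE_SIZE            (64u * 1024u)
#define GUEST_PAGE_SIZE           (4u * 1024u)
#define GUEST_PAGES_PER_HOST_PAGE (HOST_PAGE_SIZE / GUEST_PAGE_SIZE) /* 16 */

/*
 * Hypothetical model of one GHES error source: a single error block and
 * a single read acknowledgement register. Not QEMU's real data type.
 */
struct ghes_source_model {
    bool read_ack;      /* true once the guest has consumed the last block */
    size_t nr_records;  /* CPER records in the currently posted block */
};

/*
 * Fill 'out' with the guest-physical addresses of the 4KiB guest pages
 * backed by one faulty 64KiB host page. Assumes (hypothetically) that the
 * host page maps to a 64KiB-aligned guest-physical range at gpa_base.
 */
static size_t poisoned_guest_pages(uint64_t gpa_base, uint64_t *out, size_t max)
{
    size_t n = 0;

    assert(gpa_base % HOST_PAGE_SIZE == 0);
    for (uint64_t off = 0; off < HOST_PAGE_SIZE && n < max; off += GUEST_PAGE_SIZE) {
        out[n++] = gpa_base + off;
    }
    return n;
}

/*
 * Post one error block holding nr_records CPERs. Returns false when the
 * guest has not acknowledged the previous block; in the path above,
 * this is the condition that ends in abort().
 */
static bool post_error_block(struct ghes_source_model *src, size_t nr_records)
{
    if (!src->read_ack) {
        return false;
    }
    src->read_ack = false;  /* the guest must ack before the next block */
    src->nr_records = nr_records;
    return true;
}

/* The guest acknowledges the block it has consumed. */
static void guest_ack(struct ghes_source_model *src)
{
    src->read_ack = true;
}

/* One block per error, with no guest ack in between. */
static size_t deliver_one_by_one(struct ghes_source_model *src, size_t nr_errors)
{
    size_t delivered = 0;

    for (size_t i = 0; i < nr_errors; i++) {
        if (!post_error_block(src, 1)) {
            break;  /* QEMU would abort() at this point */
        }
        delivered++;
    }
    return delivered;
}
```

In this model, delivering the 16 errors one block at a time stops after the
first, while a single `post_error_block(&src, GUEST_PAGES_PER_HOST_PAGE)`
call needs only one acknowledgement from the guest.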

This series fixes the issue by sending 16 consecutive CPER records
contained in a single GHES error block.
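As a rough sketch of what 16 CPER records in one error block means for the
layout, the C below computes record offsets and lengths. The sizes are
assumptions taken from the ACPI and UEFI specifications rather than from
this cover letter: a 20-byte Generic Error Status Block header, followed by
records that each consist of a 72-byte Generic Error Data Entry (revision
0x300) and an 80-byte memory error section. The helper names are
hypothetical.

```c
#include <stddef.h>

/* Assumed sizes (ACPI Generic Error Status Block, UEFI memory CPER). */
#define GHES_STATUS_BLOCK_LEN 20u  /* status block header */
#define GHES_DATA_ENTRY_LEN   72u  /* Generic Error Data Entry, rev 0x300 */
#define GHES_MEM_SECTION_LEN  80u  /* UEFI memory error section */
#define GHES_MEM_RECORD_LEN   (GHES_DATA_ENTRY_LEN + GHES_MEM_SECTION_LEN) /* 152 */

/* Byte offset of record i (0-based) from the start of the error block. */
static size_t ghes_record_offset(size_t i)
{
    return GHES_STATUS_BLOCK_LEN + i * GHES_MEM_RECORD_LEN;
}

/* Bytes covered by all data entries (the header's Data Length field). */
static size_t ghes_data_length(size_t nr_records)
{
    return nr_records * GHES_MEM_RECORD_LEN;
}

/* Total bytes used by the block: header plus all records. */
static size_t ghes_block_length(size_t nr_records)
{
    return GHES_STATUS_BLOCK_LEN + ghes_data_length(nr_records);
}
```

Under these assumptions, the 16 records start at offsets 20, 172, ..., 2300,
and the whole block occupies 2452 bytes.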

PATCH[1-3] Increases the maximum GHES raw data length from 1KiB to 4KiB
PATCH[4]   Supports multiple error records in a single error block
PATCH[5-6] Improves the error handling in the error delivery path
PATCH[7]   Introduces the helper push_ghes_memory_errors()
PATCH[8]   Delivers 16 consecutive CPERs in a single error block
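The need for PATCH[1-3] can be checked with simple arithmetic. Assuming a
20-byte status block header and 152 bytes per memory CPER record (a 72-byte
Generic Error Data Entry plus an 80-byte UEFI memory error section), and
assuming the raw data limit covers the header plus all records, the
hypothetical helper below computes how many records fit in a block.

```c
#include <stddef.h>

#define KiB 1024u
/* Assumed sizes, not taken from this series. */
#define GHES_STATUS_BLOCK_LEN 20u   /* status block header */
#define GHES_MEM_RECORD_LEN   152u  /* 72-byte data entry + 80-byte section */

/* Number of whole memory CPER records that fit in max_len bytes. */
static size_t ghes_max_mem_records(size_t max_len)
{
    if (max_len < GHES_STATUS_BLOCK_LEN) {
        return 0;
    }
    return (max_len - GHES_STATUS_BLOCK_LEN) / GHES_MEM_RECORD_LEN;
}
```

This gives 6 records for a 1KiB block and 26 for a 4KiB block, so 16
records only fit once the limit is raised to 4KiB.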

Changelog
=========
v3:
  * v2: https://lists.nongnu.org/archive/html/qemu-arm/2025-10/msg00372.html
  * Code and changelog improvements                            (Jonathan)
  * Fixed GHES error block status field and improved error
    handling in the error delivery path                        (Igor)
  * Fixed the ACPI HEST table and documentation                (Mauro)
v2:
  * v1: https://lists.nongnu.org/archive/html/qemu-arm/2025-02/msg00897.html
  * Send 16 memory errors for the 64KiB/4KiB case              (Jonathan)

Gavin Shan (8):
  tests/qtest/bios-tables-test: Prepare for changes in the HEST table
  acpi/ghes: Increase GHES raw data maximal length to 4KiB
  tests/qtest/bios-tables-test: Update HEST table
  acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs
  acpi/ghes: Bail early on error from get_ghes_source_offsets()
  acpi/ghes: Use error_abort in acpi_ghes_memory_errors()
  kvm/arm/kvm: Introduce helper push_ghes_memory_errors()
  target/arm/kvm: Support multiple memory CPERs injection

 docs/specs/acpi_hest_ghes.rst     |   2 +-
 hw/acpi/ghes-stub.c               |   6 +--
 hw/acpi/ghes.c                    |  78 +++++++++++++++---------------
 include/hw/acpi/ghes.h            |   5 +-
 target/arm/kvm.c                  |  69 +++++++++++++++++++++++---
 tests/data/acpi/aarch64/virt/HEST | Bin 224 -> 224 bytes
 6 files changed, 108 insertions(+), 52 deletions(-)

-- 
2.51.0





Thread overview: 51+ messages
2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
2025-11-05 14:16   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
2025-11-05 14:16   ` Jonathan Cameron via
2025-11-10 14:11   ` Igor Mammedov
2025-11-11  4:05     ` Gavin Shan
2025-11-12 12:32       ` Igor Mammedov
2025-11-05 11:44 ` [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
2025-11-05 14:17   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
2025-11-05 14:14   ` Jonathan Cameron via
2025-11-06  3:15     ` Gavin Shan
2025-11-10 14:49       ` Igor Mammedov
2025-11-11  4:08         ` Gavin Shan
2025-11-11 10:07           ` Jonathan Cameron via
2025-11-11 10:55             ` Gavin Shan
2025-11-11 11:55               ` Jonathan Cameron via
2025-11-11 12:19                 ` Gavin Shan
2025-11-11 13:12                   ` Jonathan Cameron via
2025-11-10 14:38   ` Igor Mammedov
2025-11-11  4:40     ` Gavin Shan
2025-11-12 13:12       ` Igor Mammedov
2025-11-10 14:43   ` Philippe Mathieu-Daudé
2025-11-10 23:38     ` Gavin Shan
2025-11-11  3:40       ` Gavin Shan
2025-11-10 14:48   ` Philippe Mathieu-Daudé
2025-11-11  3:44     ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-05 14:17   ` Jonathan Cameron via
2025-11-10 14:50   ` Philippe Mathieu-Daudé
2025-11-11  3:48     ` Gavin Shan
2025-11-10 14:51   ` Igor Mammedov
2025-11-05 11:44 ` [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors() Gavin Shan
2025-11-05 14:18   ` Jonathan Cameron via
2025-11-10 14:53   ` Igor Mammedov
2025-11-10 14:54   ` Philippe Mathieu-Daudé
2025-11-11  3:58     ` Gavin Shan
2025-11-12 12:49       ` Igor Mammedov
2025-11-11  5:08     ` Markus Armbruster
2025-11-11  5:25   ` Markus Armbruster
2025-11-11  6:02     ` Gavin Shan
2025-11-11  7:31       ` Markus Armbruster
2025-11-05 11:44 ` [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
2025-11-05 14:19   ` Jonathan Cameron via
2025-11-10 14:56   ` Igor Mammedov
2025-11-11  4:09     ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-11-05 14:37   ` Jonathan Cameron via
2025-11-06  3:26     ` Gavin Shan
2025-11-11 10:12       ` Jonathan Cameron via
