[PATCH v4 0/8] target/arm/kvm: Improve memory error handling

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 0/8] target/arm/kvm: Improve memory error handling
@ 2025-11-12 17:25 Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 1/8] acpi/ghes: Make GHES max raw data length dynamic Gavin Shan
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

In the combination of 64KiB host and 4KiB guest, a problematic host
page affects 16x guest pages. Those 16x guest pages are most likely
owned by separate threads and accessed by the threads in parallel.
It means 16x memory errors can be raised at once. However, we're
unable to handle this situation because the only error source has
one read acknowledgement register in current design. QEMU has to
crash in the following path due to the previously delivered error
isn't acknowledged by the guest on attempt to deliver another error.

  kvm_vcpu_thread_fn
    kvm_cpu_exec
      kvm_arch_on_sigbus_vcpu
        kvm_cpu_synchronize_state
        acpi_ghes_memory_errors
        abort

This series fixes the issue by sending 16x consective CPER errors
which are contained in a single GHES error block.

PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
PATCH[5]   Supports multiple error records in a single error block
PATCH[6-7] Improves the error handling in the error delivery path
PATCH[8]   Sends 16x consective CPERs in a single block if needed

Changelog
=========
v4:
  * v3: https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00264.html
  * Pick r-b tags from Jonathan
  * Add compat property 'x-error-block-size' for migration     (Igor)
  * Code and commit log improvements                           (Igor/Jonathan)
  * Use error_fatal in the memory error delivery path          (Markus)
  * Use APIs from registerfields.h                             (Philippe)
v3:
  * v2: https://lists.nongnu.org/archive/html/qemu-arm/2025-10/msg00372.html
  * Code and changelog improvements                            (Jonathan)
  * Fixed GHES error block status field and improved error
    handling in the error delivery path                        (Igor)
  * Fixed ACPI HEST table and document                         (Mauro)
v2:
  * v1: https://lists.nongnu.org/archive/html/qemu-arm/2025-02/msg00897.html
  * Send 16x memory errors for the specific case               (Jonathan)

Gavin Shan (8):
  acpi/ghes: Make GHES max raw data length dynamic
  tests/qtest/bios-tables-test: Prepare for changes in the HEST table
  acpi/ghes: Increase GHES raw data maximal length to 4KiB
  tests/qtest/bios-tables-test: Update HEST table
  acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs
  acpi/ghes: Bail early on error from get_ghes_source_offsets()
  acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
  target/arm/kvm: Support multiple memory CPERs injection

 docs/specs/acpi_hest_ghes.rst     |   6 +-
 hw/acpi/generic_event_device.c    |   2 +
 hw/acpi/ghes-stub.c               |   7 +-
 hw/acpi/ghes.c                    | 127 +++++++++++++++++-------------
 hw/core/machine.c                 |   1 +
 include/hw/acpi/ghes.h            |   8 +-
 target/arm/kvm.c                  |  72 +++++++++++++++--
 tests/data/acpi/aarch64/virt/HEST | Bin 224 -> 224 bytes
 8 files changed, 153 insertions(+), 70 deletions(-)

-- 
2.51.1



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 1/8] acpi/ghes: Make GHES max raw data length dynamic
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 2/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

Make GHES max raw data length dynamic, preparing for the increased
length from 1KiB to 4KiB because we need to send 16 consective
errors in case a problematic 64KiB host page affects 16 4KiB guest
pages.

GHES max raw data length remains 1KiB as before until a compat
property is added to customize it in the subsequent patch.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes.c         | 30 +++++++++++++++++++-----------
 include/hw/acpi/ghes.h |  1 +
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 06555905ce..06f75df43d 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -32,9 +32,6 @@
 #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
 #define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
 
-/* The max size in bytes for one error block */
-#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
-
 /* Generic Hardware Error Source version 2 */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
 
@@ -232,6 +229,15 @@ ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
         ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
 }
 
+static inline uint32_t ghes_max_raw_data_length(AcpiGhesState *ags)
+{
+    if (ags->error_block_size == 0) {
+        return 1 * KiB;
+    } else {
+        return ags->error_block_size;
+    }
+}
+
 /*
  * Build table for the hardware error fw_cfg blob.
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
@@ -263,7 +269,7 @@ static void build_ghes_error_table(AcpiGhesState *ags, GArray *hardware_errors,
 
     /* Reserve space for Error Status Data Block */
     acpi_data_push(hardware_errors,
-        ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
+        ghes_max_raw_data_length(ags) * num_sources);
 
     /* Tell guest firmware to place hardware_errors blob into RAM */
     bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
@@ -280,7 +286,7 @@ static void build_ghes_error_table(AcpiGhesState *ags, GArray *hardware_errors,
                                        sizeof(uint64_t),
                                        ACPI_HW_ERROR_FW_CFG_FILE,
                                        error_status_block_offset +
-                                       i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
+                                       i * ghes_max_raw_data_length(ags));
     }
 
     if (!ags->use_hest_addr) {
@@ -295,7 +301,8 @@ static void build_ghes_error_table(AcpiGhesState *ags, GArray *hardware_errors,
 }
 
 /* Build Generic Hardware Error Source version 2 (GHESv2) */
-static void build_ghes_v2_entry(GArray *table_data,
+static void build_ghes_v2_entry(AcpiGhesState *ags,
+                                GArray *table_data,
                                 BIOSLinker *linker,
                                 const AcpiNotificationSourceId *notif_src,
                                 uint16_t index, int num_sources)
@@ -323,7 +330,7 @@ static void build_ghes_v2_entry(GArray *table_data,
     /* Max Sections Per Record */
     build_append_int_noprefix(table_data, 1, 4);
     /* Max Raw Data Length */
-    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+    build_append_int_noprefix(table_data, ghes_max_raw_data_length(ags), 4);
 
     address_offset = table_data->len;
     /* Error Status Address */
@@ -339,7 +346,7 @@ static void build_ghes_v2_entry(GArray *table_data,
     build_ghes_hw_error_notification(table_data, notify);
 
     /* Error Status Block Length */
-    build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
+    build_append_int_noprefix(table_data, ghes_max_raw_data_length(ags), 4);
 
     /*
      * Read Ack Register
@@ -387,7 +394,8 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
     /* Error Source Count */
     build_append_int_noprefix(table_data, num_sources, 4);
     for (i = 0; i < num_sources; i++) {
-        build_ghes_v2_entry(table_data, linker, &notif_source[i], i, num_sources);
+        build_ghes_v2_entry(ags, table_data, linker,
+                            &notif_source[i], i, num_sources);
     }
 
     acpi_table_end(linker, &table);
@@ -518,7 +526,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
 {
     uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
 
-    if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+    if (len > ghes_max_raw_data_length(ags)) {
         error_setg(errp, "GHES CPER record is too big: %zd", len);
         return;
     }
@@ -575,7 +583,7 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
      * error data entry
      */
     assert((data_length + ACPI_GHES_GESB_SIZE) <=
-            ACPI_GHES_MAX_RAW_DATA_LENGTH);
+           ghes_max_raw_data_length(ags));
 
     ghes_gen_err_data_uncorrectable_recoverable(block, guid, data_length);
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index df2ecbf6e4..c98bd6d1e2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -88,6 +88,7 @@ typedef struct AcpiGhesState {
     uint64_t hest_addr_le;
     uint64_t hw_error_le;
     bool use_hest_addr; /* True if HEST address is present */
+    uint32_t error_block_size;
 } AcpiGhesState;
 
 void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 2/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 1/8] acpi/ghes: Make GHES max raw data length dynamic Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 3/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

Adds HEST blob to the whilelist, preparing for the increased GHES
raw data maximal length from 1KiB to 4KiB, because we need to send
16 consective errors in case a problematic 64KiB host page affects
16 4KiB guest pages.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..39901c58d6 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/aarch64/virt/HEST",
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 3/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 1/8] acpi/ghes: Make GHES max raw data length dynamic Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 2/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 4/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

The current GHES raw data maximal length isn't enough for 16 consecutive
CPER errors, which will be sent to a guest with 4KiB page size on a
erroneous 64KiB host page. Note those 16 CPER errors will be contained
in one single error block, meaning all CPER errors should be identical
in terms of type and severity and all of them should be delivered in
one shot.

Increase GHES raw data maximal length from 1KiB to 4KiB so that the
error block has enough storage space for 16 consecutive CPER errors.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 docs/specs/acpi_hest_ghes.rst  | 6 +++---
 hw/acpi/generic_event_device.c | 2 ++
 hw/core/machine.c              | 1 +
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index aaf7b1ad11..b3ecb5856c 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -68,9 +68,9 @@ Design Details
     and N Read Ack Register entries. The size for each entry is 8-byte.
     The Error Status Data Block table contains N Error Status Data Block
     entries. The size for each entry is defined at the source code as
-    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
-    for the "etc/hardware_errors" fw_cfg blob is
-    (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
+    GHES_MAX_RAW_DATA_LENGTH, which is 4096 bytes in QEMU 10.2 or onwards,
+    but 1024 bytes in other cases. The total size of "etc/hardware_errors"
+    fw_cfg blob is (N * 8 * 2 + N * GHES_MAX_RAW_DATA_LENGTH) bytes.
     N is the number of the kinds of hardware error sources.
 
 (4) QEMU generates the ACPI linker/loader script for the firmware. The
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index e7b773d84d..30a5078878 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -364,6 +364,8 @@ static const Property acpi_ged_properties[] = {
                      TYPE_PCI_BUS, PCIBus *),
     DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState,
                      ghes_state.use_hest_addr, true),
+    DEFINE_PROP_UINT32("x-error-block-size", AcpiGedState,
+                       ghes_state.error_block_size, 4096),
 };
 
 static const VMStateDescription vmstate_memhp_state = {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 06e0c9a179..539793b11b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,6 +40,7 @@
 
 GlobalProperty hw_compat_10_1[] = {
     { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
+    { TYPE_ACPI_GED, "x-error-block-size", "1024" },
     { TYPE_VIRTIO_NET, "host_tunnel", "off" },
     { TYPE_VIRTIO_NET, "host_tunnel_csum", "off" },
     { TYPE_VIRTIO_NET, "guest_tunnel", "off" },
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 4/8] tests/qtest/bios-tables-test: Update HEST table
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (2 preceding siblings ...)
  2025-11-12 17:25 ` [PATCH v4 3/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 5/8] acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs Gavin Shan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

Update HEST table since GHES raw data maximal length has been increased
from 1KiB to 4KiB.

 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20240322 (64-bit version)
  * Copyright (c) 2000 - 2023 Intel Corporation
  *
- * Disassembly of tests/data/acpi/aarch64/virt/HEST
+ * Disassembly of /tmp/aml-28KMF3
  *
  * ACPI Data Table [HEST]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue (in hex)
  */

 [000h 0000 004h]                   Signature : "HEST"    [Hardware Error Source Table]
 [004h 0004 004h]                Table Length : 000000E0
 [008h 0008 001h]                    Revision : 01
-[009h 0009 001h]                    Checksum : 6C
+[009h 0009 001h]                    Checksum : 3C
 [00Ah 0010 006h]                      Oem ID : "BOCHS "
 [010h 0016 008h]                Oem Table ID : "BXPC    "
 [018h 0024 004h]                Oem Revision : 00000001
 [01Ch 0028 004h]             Asl Compiler ID : "BXPC"
 [020h 0032 004h]       Asl Compiler Revision : 00000001

 [024h 0036 004h]          Error Source Count : 00000002

 [028h 0040 002h]               Subtable Type : 000A [Generic Hardware Error Source V2]
 [02Ah 0042 002h]                   Source Id : 0000
 [02Ch 0044 002h]           Related Source Id : FFFF
 [02Eh 0046 001h]                    Reserved : 00
 [02Fh 0047 001h]                     Enabled : 01
 [030h 0048 004h]      Records To Preallocate : 00000001
 [034h 0052 004h]     Max Sections Per Record : 00000001
-[038h 0056 004h]         Max Raw Data Length : 00000400
+[038h 0056 004h]         Max Raw Data Length : 00001000

 [03Ch 0060 00Ch]        Error Status Address : [Generic Address Structure]
 [03Ch 0060 001h]                    Space ID : 00 [SystemMemory]
 [03Dh 0061 001h]                   Bit Width : 40
 [03Eh 0062 001h]                  Bit Offset : 00
 [03Fh 0063 001h]        Encoded Access Width : 04 [QWord Access:64]
 [040h 0064 008h]                     Address : 0000000043DA0000

 [048h 0072 01Ch]                      Notify : [Hardware Error Notification Structure]
 [048h 0072 001h]                 Notify Type : 08 [SEA]
 [049h 0073 001h]               Notify Length : 1C
 [04Ah 0074 002h]  Configuration Write Enable : 0000
 [04Ch 0076 004h]                PollInterval : 00000000
 [050h 0080 004h]                      Vector : 00000000
 [054h 0084 004h]     Polling Threshold Value : 00000000
 [058h 0088 004h]    Polling Threshold Window : 00000000
 [05Ch 0092 004h]       Error Threshold Value : 00000000
 [060h 0096 004h]      Error Threshold Window : 00000000

-[064h 0100 004h]   Error Status Block Length : 00000400
+[064h 0100 004h]   Error Status Block Length : 00001000
 [068h 0104 00Ch]           Read Ack Register : [Generic Address Structure]
 [068h 0104 001h]                    Space ID : 00 [SystemMemory]
 [069h 0105 001h]                   Bit Width : 40
 [06Ah 0106 001h]                  Bit Offset : 00
 [06Bh 0107 001h]        Encoded Access Width : 04 [QWord Access:64]
 [06Ch 0108 008h]                     Address : 0000000043DA0010

 [074h 0116 008h]           Read Ack Preserve : FFFFFFFFFFFFFFFE
 [07Ch 0124 008h]              Read Ack Write : 0000000000000001

 [084h 0132 002h]               Subtable Type : 000A [Generic Hardware Error Source V2]
 [086h 0134 002h]                   Source Id : 0001
 [088h 0136 002h]           Related Source Id : FFFF
 [08Ah 0138 001h]                    Reserved : 00
 [08Bh 0139 001h]                     Enabled : 01
 [08Ch 0140 004h]      Records To Preallocate : 00000001
 [090h 0144 004h]     Max Sections Per Record : 00000001
-[094h 0148 004h]         Max Raw Data Length : 00000400
+[094h 0148 004h]         Max Raw Data Length : 00001000

 [098h 0152 00Ch]        Error Status Address : [Generic Address Structure]
 [098h 0152 001h]                    Space ID : 00 [SystemMemory]
 [099h 0153 001h]                   Bit Width : 40
 [09Ah 0154 001h]                  Bit Offset : 00
 [09Bh 0155 001h]        Encoded Access Width : 04 [QWord Access:64]
 [09Ch 0156 008h]                     Address : 0000000043DA0008

 [0A4h 0164 01Ch]                      Notify : [Hardware Error Notification Structure]
 [0A4h 0164 001h]                 Notify Type : 07 [GPIO]
 [0A5h 0165 001h]               Notify Length : 1C
 [0A6h 0166 002h]  Configuration Write Enable : 0000
 [0A8h 0168 004h]                PollInterval : 00000000
 [0ACh 0172 004h]                      Vector : 00000000
 [0B0h 0176 004h]     Polling Threshold Value : 00000000
 [0B4h 0180 004h]    Polling Threshold Window : 00000000
 [0B8h 0184 004h]       Error Threshold Value : 00000000
 [0BCh 0188 004h]      Error Threshold Window : 00000000

-[0C0h 0192 004h]   Error Status Block Length : 00000400
+[0C0h 0192 004h]   Error Status Block Length : 00001000
 [0C4h 0196 00Ch]           Read Ack Register : [Generic Address Structure]
 [0C4h 0196 001h]                    Space ID : 00 [SystemMemory]
 [0C5h 0197 001h]                   Bit Width : 40
 [0C6h 0198 001h]                  Bit Offset : 00
 [0C7h 0199 001h]        Encoded Access Width : 04 [QWord Access:64]
 [0C8h 0200 008h]                     Address : 0000000043DA0018

 [0D0h 0208 008h]           Read Ack Preserve : FFFFFFFFFFFFFFFE
 [0D8h 0216 008h]              Read Ack Write : 0000000000000001

 Raw Table Data: Length 224 (0xE0)

-    0000: 48 45 53 54 E0 00 00 00 01 6C 42 4F 43 48 53 20  // HEST.....lBOCHS
+    0000: 48 45 53 54 E0 00 00 00 01 3C 42 4F 43 48 53 20  // HEST.....<BOCHS
     0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
     0020: 01 00 00 00 02 00 00 00 0A 00 00 00 FF FF 00 01  // ................
-    0030: 01 00 00 00 01 00 00 00 00 04 00 00 00 40 00 04  // .............@..
+    0030: 01 00 00 00 01 00 00 00 00 10 00 00 00 40 00 04  // .............@..
     0040: 00 00 DA 43 00 00 00 00 08 1C 00 00 00 00 00 00  // ...C............
     0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
-    0060: 00 00 00 00 00 04 00 00 00 40 00 04 10 00 DA 43  // .........@.....C
+    0060: 00 00 00 00 00 10 00 00 00 40 00 04 10 00 DA 43  // .........@.....C
     0070: 00 00 00 00 FE FF FF FF FF FF FF FF 01 00 00 00  // ................
     0080: 00 00 00 00 0A 00 01 00 FF FF 00 01 01 00 00 00  // ................
-    0090: 01 00 00 00 00 04 00 00 00 40 00 04 08 00 DA 43  // .........@.....C
+    0090: 01 00 00 00 00 10 00 00 00 40 00 04 08 00 DA 43  // .........@.....C
     00A0: 00 00 00 00 07 1C 00 00 00 00 00 00 00 00 00 00  // ................
     00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
-    00C0: 00 04 00 00 00 40 00 04 18 00 DA 43 00 00 00 00  // .....@.....C....
+    00C0: 00 10 00 00 00 40 00 04 18 00 DA 43 00 00 00 00  // .....@.....C....
     00D0: FE FF FF FF FF FF FF FF 01 00 00 00 00 00 00 00  // ................

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 tests/data/acpi/aarch64/virt/HEST           | Bin 224 -> 224 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 2 files changed, 1 deletion(-)

diff --git a/tests/data/acpi/aarch64/virt/HEST b/tests/data/acpi/aarch64/virt/HEST
index 674272922db7d48f7821aa7c83ec76bb3b556d2a..247345901b9be41a86c51bcced59287f74f9e918 100644
GIT binary patch
delta 45
ycmaFB_<)hi!!<bM0RsaAqs>GveMW(aR@#gL6VpJfY1$kD3=9ko3@j28bHxD$GYS#_

delta 45
ycmaFB_<)hi!!<bM0RsaAW6ne_eMXjvR@#g#6VpJfY1$kt3=9ko3@j28bHxD&iwY6|

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 39901c58d6..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,2 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/aarch64/virt/HEST",
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 5/8] acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (3 preceding siblings ...)
  2025-11-12 17:25 ` [PATCH v4 4/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 6/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

In the situation where host and guest has 64KiB and 4KiB page sizes,
one problematic host page affects 16 guest pages. we need to send 16
consective errors in this specific case.

Extend acpi_ghes_memory_errors() to support multiple CPERs after the
hunk of code to generate the GHES error status is pulled out from
ghes_gen_err_data_uncorrectable_recoverable(). The status field of
generic error status block is also updated accordingly if multiple
error data entries are contained in the generic error status block.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes-stub.c    |  2 +-
 hw/acpi/ghes.c         | 65 ++++++++++++++++++++++++------------------
 include/hw/acpi/ghes.h |  2 +-
 target/arm/kvm.c       |  4 ++-
 4 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 40f660c246..4faf573aeb 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -12,7 +12,7 @@
 #include "hw/acpi/ghes.h"
 
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t physical_address)
+                            uint64_t *addresses, uint32_t num_of_addresses)
 {
     return -1;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 06f75df43d..160eedcf09 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -21,6 +21,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
+#include "hw/registerfields.h"
 #include "hw/acpi/ghes.h"
 #include "hw/acpi/aml-build.h"
 #include "qemu/error-report.h"
@@ -54,8 +55,12 @@
 /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
 #define ACPI_GHES_MEM_CPER_LENGTH           80
 
-/* Masks for block_status flags */
-#define ACPI_GEBS_UNCORRECTABLE         1
+/* Bits for block_status flags */
+FIELD(ACPI_GEBS, UNCORRECTABLE, 0, 1)
+FIELD(ACPI_GEBS, CORRECTABLE, 1, 1)
+FIELD(ACPI_GEBS, MULTIPLE_UNCORRECTABLE, 2, 1)
+FIELD(ACPI_GEBS, MULTIPLE_CORRECTABLE, 3, 1)
+FIELD(ACPI_GEBS, ERROR_DATA_ENTRIES, 4, 10)
 
 /*
  * Total size for Generic Error Status Block except Generic Error Data Entries
@@ -209,26 +214,6 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
     build_append_int_noprefix(table, 0, 7);
 }
 
-static void
-ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
-                                            const uint8_t *section_type,
-                                            int data_length)
-{
-    /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
-     * Table 17-13 Generic Error Data Entry
-     */
-    QemuUUID fru_id = {};
-
-    /* Build the new generic error status block header */
-    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
-        0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
-
-    /* Build this new generic error data entry header */
-    acpi_ghes_generic_error_data(block, section_type,
-        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
-        ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
-}
-
 static inline uint32_t ghes_max_raw_data_length(AcpiGhesState *ags)
 {
     if (ags->error_block_size == 0) {
@@ -565,30 +550,56 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
 }
 
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t physical_address)
+                            uint64_t *addresses, uint32_t num_of_addresses)
 {
     /* Memory Error Section Type */
     const uint8_t guid[] =
           UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
                   0xED, 0x7C, 0x83, 0xB1);
+    /*
+     * invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
+     * Table 17-13 Generic Error Data Entry
+     */
+    QemuUUID fru_id = {};
     Error *errp = NULL;
     int data_length;
     GArray *block;
+    uint32_t block_status = 0, i;
 
     block = g_array_new(false, true /* clear */, 1);
 
-    data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
+    data_length = num_of_addresses *
+                  (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
     /*
      * It should not run out of the preallocated memory if adding a new generic
      * error data entry
      */
     assert((data_length + ACPI_GHES_GESB_SIZE) <=
            ghes_max_raw_data_length(ags));
+    assert(num_of_addresses <=
+           FIELD_EX32(0xffffffff, ACPI_GEBS, ERROR_DATA_ENTRIES));
 
-    ghes_gen_err_data_uncorrectable_recoverable(block, guid, data_length);
+    /* Build the new generic error status block header */
+    block_status = FIELD_DP32(block_status, ACPI_GEBS, UNCORRECTABLE, 1);
+    block_status = FIELD_DP32(block_status, ACPI_GEBS, ERROR_DATA_ENTRIES,
+                              num_of_addresses);
+    if (num_of_addresses > 1) {
+        block_status = FIELD_DP32(block_status, ACPI_GEBS,
+                                  MULTIPLE_UNCORRECTABLE, 1);
+    }
+
+    acpi_ghes_generic_error_status(block, block_status, 0, 0,
+                                   data_length, ACPI_CPER_SEV_RECOVERABLE);
 
-    /* Build the memory section CPER for above new generic error data entry */
-    acpi_ghes_build_append_mem_cper(block, physical_address);
+    for (i = 0; i < num_of_addresses; i++) {
+        /* Build generic error data entries */
+        acpi_ghes_generic_error_data(block, guid,
+                                     ACPI_CPER_SEV_RECOVERABLE, 0, 0,
+                                     ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
+
+        /* Memory section CPER on top of the generic error data entry */
+        acpi_ghes_build_append_mem_cper(block, addresses[i]);
+    }
 
     /* Report the error */
     ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index c98bd6d1e2..f7b084c039 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -100,7 +100,7 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t error_physical_addr);
+                            uint64_t *addresses, uint32_t num_of_addresses);
 void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
                              uint16_t source_id, Error **errp);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 0d57081e69..459ca4a9b0 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2434,6 +2434,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
     ram_addr_t ram_addr;
     hwaddr paddr;
     AcpiGhesState *ags;
+    uint64_t addresses[16];
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
@@ -2454,10 +2455,11 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
              * later from the main thread, so doing the injection of
              * the error would be more complicated.
              */
+            addresses[0] = paddr;
             if (code == BUS_MCEERR_AR) {
                 kvm_cpu_synchronize_state(c);
                 if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
-                                             paddr)) {
+                                             addresses, 1)) {
                     kvm_inject_arm_sea(c);
                 } else {
                     error_report("failed to record the error");
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 6/8] acpi/ghes: Bail early on error from get_ghes_source_offsets()
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (4 preceding siblings ...)
  2025-11-12 17:25 ` [PATCH v4 5/8] acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-12 17:25 ` [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

For one particular error (Error), we can't call error_setg() for twice.
Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in
error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets()
can return a error initialized by error_setg(). Without bailing on
this error, it can call into the second error_setg() due to the
unexpected value from the read acknowledgement register.

Bail early in ghes_record_cper_errors() when error is received from
get_ghes_source_offsets() to avoid the exception.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 hw/acpi/ghes.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 160eedcf09..d3d6c11197 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -436,7 +436,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
     *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
-static void get_ghes_source_offsets(uint16_t source_id,
+static bool get_ghes_source_offsets(uint16_t source_id,
                                     uint64_t hest_addr,
                                     uint64_t *cper_addr,
                                     uint64_t *read_ack_start_addr,
@@ -467,7 +467,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
         /* For now, we only know the size of GHESv2 table */
         if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
             error_setg(errp, "HEST: type %d not supported.", type);
-            return;
+            return false;
         }
 
         /* Compare CPER source ID at the GHESv2 structure */
@@ -481,7 +481,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
     }
     if (i == num_sources) {
         error_setg(errp, "HEST: Source %d not found.", source_id);
-        return;
+        return false;
     }
 
     /* Navigate through table address pointers */
@@ -501,6 +501,8 @@ static void get_ghes_source_offsets(uint16_t source_id,
     cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
                              sizeof(*read_ack_start_addr));
     *read_ack_start_addr = le64_to_cpu(*read_ack_start_addr);
+
+    return true;
 }
 
 NotifierList acpi_generic_error_notifiers =
@@ -519,9 +521,10 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
     if (!ags->use_hest_addr) {
         get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
                              &cper_addr, &read_ack_register_addr);
-    } else {
-        get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
-                                &cper_addr, &read_ack_register_addr, errp);
+    } else if (!get_ghes_source_offsets(source_id,
+                    le64_to_cpu(ags->hest_addr_le), &cper_addr,
+                    &read_ack_register_addr, errp)) {
+            return;
     }
 
     cpu_physical_memory_read(read_ack_register_addr,
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (5 preceding siblings ...)
  2025-11-12 17:25 ` [PATCH v4 6/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-13  7:41   ` Markus Armbruster
  2025-11-12 17:25 ` [PATCH v4 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
  2025-11-18 10:47 ` [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Jonathan Cameron via
  8 siblings, 1 reply; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

Use error_fatal in acpi_ghes_memory_errors() so that the caller
needn't explicitly terminate on errors. With error_fatal, a qemu
core dump won't be provided as it doesn't provide anything needed
by debugging.

There is no way to call ghes-stu.c::acpi_ghes_memory_errors(), an
abort() is put there as explicit marker. Besides, the return value
of acpi_ghes_memory_errors() is changed from 'int' to 'bool' as
the error indicator. ghes_record_cper_errors() also return a 'bool'
value for that, to be compatible to what is documented in error.h.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 hw/acpi/ghes-stub.c    |  7 ++++---
 hw/acpi/ghes.c         | 27 +++++++++++++--------------
 include/hw/acpi/ghes.h |  7 ++++---
 target/arm/kvm.c       | 10 +++-------
 4 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 4faf573aeb..fc7374b0a6 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,10 +11,11 @@
 #include "qemu/osdep.h"
 #include "hw/acpi/ghes.h"
 
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t *addresses, uint32_t num_of_addresses)
+bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+                             uint64_t *addresses, uint32_t num_of_addresses,
+                             Error **errp)
 {
-    return -1;
+    abort();
 }
 
 AcpiGhesState *acpi_ghes_get_state(void)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index d3d6c11197..7160cf37d0 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -508,14 +508,14 @@ static bool get_ghes_source_offsets(uint16_t source_id,
 NotifierList acpi_generic_error_notifiers =
     NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers);
 
-void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
+bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
                              uint16_t source_id, Error **errp)
 {
     uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
 
     if (len > ghes_max_raw_data_length(ags)) {
         error_setg(errp, "GHES CPER record is too big: %zd", len);
-        return;
+        return false;
     }
 
     if (!ags->use_hest_addr) {
@@ -524,7 +524,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
     } else if (!get_ghes_source_offsets(source_id,
                     le64_to_cpu(ags->hest_addr_le), &cper_addr,
                     &read_ack_register_addr, errp)) {
-            return;
+            return false;
     }
 
     cpu_physical_memory_read(read_ack_register_addr,
@@ -535,7 +535,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
         error_setg(errp,
                    "OSPM does not acknowledge previous error,"
                    " so can not record CPER for current error anymore");
-        return;
+        return false;
     }
 
     read_ack_register = cpu_to_le64(0);
@@ -550,10 +550,13 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
     cpu_physical_memory_write(cper_addr, cper, len);
 
     notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
+
+    return true;
 }
 
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t *addresses, uint32_t num_of_addresses)
+bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+                             uint64_t *addresses, uint32_t num_of_addresses,
+                             Error **errp)
 {
     /* Memory Error Section Type */
     const uint8_t guid[] =
@@ -564,10 +567,10 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
      * Table 17-13 Generic Error Data Entry
      */
     QemuUUID fru_id = {};
-    Error *errp = NULL;
     int data_length;
     GArray *block;
     uint32_t block_status = 0, i;
+    bool ret;
 
     block = g_array_new(false, true /* clear */, 1);
 
@@ -605,16 +608,12 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
     }
 
     /* Report the error */
-    ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
+    ret = ghes_record_cper_errors(ags, block->data, block->len,
+                                  source_id, errp);
 
     g_array_free(block, true);
 
-    if (errp) {
-        error_report_err(errp);
-        return -1;
-    }
-
-    return 0;
+    return ret;
 }
 
 AcpiGhesState *acpi_ghes_get_state(void)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index f7b084c039..c1f01ac25c 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -99,9 +99,10 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
                      const char *oem_id, const char *oem_table_id);
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t *addresses, uint32_t num_of_addresses);
-void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
+bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+                             uint64_t *addresses, uint32_t num_of_addresses,
+                             Error **errp);
+bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
                              uint16_t source_id, Error **errp);
 
 /**
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 459ca4a9b0..b8c3ad2ad9 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2458,13 +2458,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
             addresses[0] = paddr;
             if (code == BUS_MCEERR_AR) {
                 kvm_cpu_synchronize_state(c);
-                if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
-                                             addresses, 1)) {
-                    kvm_inject_arm_sea(c);
-                } else {
-                    error_report("failed to record the error");
-                    abort();
-                }
+                acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
+                                        addresses, 1, &error_fatal);
+                kvm_inject_arm_sea(c);
             }
             return;
         }
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
  2025-11-12 17:25 ` [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-13  7:41   ` Markus Armbruster
  2025-11-14  9:46     ` Gavin Shan
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2025-11-13  7:41 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, jonathan.cameron, mchehab+huawei,
	gengdongjiu1, mst, imammedo, anisinha, eduardo, marcel.apfelbaum,
	philmd, wangyanan55, zhao1.liu, peter.maydell, pbonzini,
	shan.gavin

Gavin Shan <gshan@redhat.com> writes:

> Use error_fatal in acpi_ghes_memory_errors() so that the caller
> needn't explicitly terminate on errors. With error_fatal, a qemu
> core dump won't be provided as it doesn't provide anything needed
> by debugging.
>
> There is no way to call ghes-stu.c::acpi_ghes_memory_errors(), an
> abort() is put there as explicit marker. Besides, the return value
> of acpi_ghes_memory_errors() is changed from 'int' to 'bool' as
> the error indicator. ghes_record_cper_errors() also return a 'bool'
> value for that, to be compatible to what is documented in error.h.
>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

This commit does a number of things:

1. Change abort() to exit() on error, and drop the extra error message
"failed to record the error".

2. Use &error_fatal to separate concerns and simplify the caller.  This
item covers both the new Error ** parameter and the returning bool
instead of void.

3. Make the unreachable stub abort().

The commit message could use polish to make the three distinct things
more clear.

In general, patches that do just one thing are easier to understand and
describe (in the commit message) than patches that do multiple things.
That said, a single commit feels okay in this case.  Up to you.

> ---
>  hw/acpi/ghes-stub.c    |  7 ++++---
>  hw/acpi/ghes.c         | 27 +++++++++++++--------------
>  include/hw/acpi/ghes.h |  7 ++++---
>  target/arm/kvm.c       | 10 +++-------
>  4 files changed, 24 insertions(+), 27 deletions(-)
>
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index 4faf573aeb..fc7374b0a6 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -11,10 +11,11 @@
>  #include "qemu/osdep.h"
>  #include "hw/acpi/ghes.h"
>  
> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> -                            uint64_t *addresses, uint32_t num_of_addresses)
> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> +                             uint64_t *addresses, uint32_t num_of_addresses,
> +                             Error **errp)
>  {
> -    return -1;
> +    abort();
>  }
>  
>  AcpiGhesState *acpi_ghes_get_state(void)
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index d3d6c11197..7160cf37d0 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -508,14 +508,14 @@ static bool get_ghes_source_offsets(uint16_t source_id,
>  NotifierList acpi_generic_error_notifiers =
>      NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers);
>  
> -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>                               uint16_t source_id, Error **errp)
>  {
>      uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
>  
>      if (len > ghes_max_raw_data_length(ags)) {
>          error_setg(errp, "GHES CPER record is too big: %zd", len);
> -        return;
> +        return false;
>      }
>  
>      if (!ags->use_hest_addr) {
> @@ -524,7 +524,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>      } else if (!get_ghes_source_offsets(source_id,
>                      le64_to_cpu(ags->hest_addr_le), &cper_addr,
>                      &read_ack_register_addr, errp)) {
> -            return;
> +            return false;
>      }
>  
>      cpu_physical_memory_read(read_ack_register_addr,
> @@ -535,7 +535,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>          error_setg(errp,
>                     "OSPM does not acknowledge previous error,"
>                     " so can not record CPER for current error anymore");
> -        return;
> +        return false;
>      }
>  
>      read_ack_register = cpu_to_le64(0);
> @@ -550,10 +550,13 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>      cpu_physical_memory_write(cper_addr, cper, len);
>  
>      notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
> +
> +    return true;
>  }
>  
> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> -                            uint64_t *addresses, uint32_t num_of_addresses)
> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> +                             uint64_t *addresses, uint32_t num_of_addresses,
> +                             Error **errp)
>  {
>      /* Memory Error Section Type */
>      const uint8_t guid[] =
> @@ -564,10 +567,10 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>       * Table 17-13 Generic Error Data Entry
>       */
>      QemuUUID fru_id = {};
> -    Error *errp = NULL;
>      int data_length;
>      GArray *block;
>      uint32_t block_status = 0, i;
> +    bool ret;
>  
>      block = g_array_new(false, true /* clear */, 1);
>  
> @@ -605,16 +608,12 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>      }
>  
>      /* Report the error */

This comment is now stale.  I'd simply drop it.

> -    ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
> +    ret = ghes_record_cper_errors(ags, block->data, block->len,
> +                                  source_id, errp);
>  
>      g_array_free(block, true);
>  
> -    if (errp) {
> -        error_report_err(errp);
> -        return -1;
> -    }
> -
> -    return 0;
> +    return ret;

I figure you could use g_autoptr() to simplify this further.  Something
along the lines of

       g_autoptr(GArray) block = g_array_new(false, true, 1);

       [...]

       return ghes_record_cper_errors(ags, block->data, block->len,
                                      source_id, errp);

>  }
>  
>  AcpiGhesState *acpi_ghes_get_state(void)
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index f7b084c039..c1f01ac25c 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -99,9 +99,10 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
>                       const char *oem_id, const char *oem_table_id);
>  void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>                            GArray *hardware_errors);
> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> -                            uint64_t *addresses, uint32_t num_of_addresses);
> -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> +                             uint64_t *addresses, uint32_t num_of_addresses,
> +                             Error **errp);
> +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>                               uint16_t source_id, Error **errp);
>  
>  /**
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 459ca4a9b0..b8c3ad2ad9 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2458,13 +2458,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>              addresses[0] = paddr;
>              if (code == BUS_MCEERR_AR) {
>                  kvm_cpu_synchronize_state(c);
> -                if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
> -                                             addresses, 1)) {
> -                    kvm_inject_arm_sea(c);
> -                } else {
> -                    error_report("failed to record the error");
> -                    abort();
> -                }
> +                acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
> +                                        addresses, 1, &error_fatal);
> +                kvm_inject_arm_sea(c);
>              }
>              return;
>          }

Readability improves nicely here.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
  2025-11-13  7:41   ` Markus Armbruster
@ 2025-11-14  9:46     ` Gavin Shan
  0 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-14  9:46 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-arm, qemu-devel, jonathan.cameron, mchehab+huawei,
	gengdongjiu1, mst, imammedo, anisinha, eduardo, marcel.apfelbaum,
	philmd, wangyanan55, zhao1.liu, peter.maydell, pbonzini,
	shan.gavin

Hi Markus,

On 11/13/25 5:41 PM, Markus Armbruster wrote:
> Gavin Shan <gshan@redhat.com> writes:
> 
>> Use error_fatal in acpi_ghes_memory_errors() so that the caller
>> needn't explicitly terminate on errors. With error_fatal, a qemu
>> core dump won't be provided as it doesn't provide anything needed
>> by debugging.
>>
>> There is no way to call ghes-stu.c::acpi_ghes_memory_errors(), an
>> abort() is put there as explicit marker. Besides, the return value
>> of acpi_ghes_memory_errors() is changed from 'int' to 'bool' as
>> the error indicator. ghes_record_cper_errors() also return a 'bool'
>> value for that, to be compatible to what is documented in error.h.
>>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Suggested-by: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> 
> This commit does a number of things:
> 
> 1. Change abort() to exit() on error, and drop the extra error message
> "failed to record the error".
> 
> 2. Use &error_fatal to separate concerns and simplify the caller.  This
> item covers both the new Error ** parameter and the returning bool
> instead of void.
> 
> 3. Make the unreachable stub abort().
> 
> The commit message could use polish to make the three distinct things
> more clear.
> 
> In general, patches that do just one thing are easier to understand and
> describe (in the commit message) than patches that do multiple things.
> That said, a single commit feels okay in this case.  Up to you.
> 

Yes, it would be nice to do one thing per patch. I will split this into
3 patches as you suggested in next revision. I would wait a while to see
if Igor has more comments prior to next revision.

>> ---
>>   hw/acpi/ghes-stub.c    |  7 ++++---
>>   hw/acpi/ghes.c         | 27 +++++++++++++--------------
>>   include/hw/acpi/ghes.h |  7 ++++---
>>   target/arm/kvm.c       | 10 +++-------
>>   4 files changed, 24 insertions(+), 27 deletions(-)
>>
>> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
>> index 4faf573aeb..fc7374b0a6 100644
>> --- a/hw/acpi/ghes-stub.c
>> +++ b/hw/acpi/ghes-stub.c
>> @@ -11,10 +11,11 @@
>>   #include "qemu/osdep.h"
>>   #include "hw/acpi/ghes.h"
>>   
>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>> -                            uint64_t *addresses, uint32_t num_of_addresses)
>> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>> +                             uint64_t *addresses, uint32_t num_of_addresses,
>> +                             Error **errp)
>>   {
>> -    return -1;
>> +    abort();
>>   }
>>   
>>   AcpiGhesState *acpi_ghes_get_state(void)
>> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
>> index d3d6c11197..7160cf37d0 100644
>> --- a/hw/acpi/ghes.c
>> +++ b/hw/acpi/ghes.c
>> @@ -508,14 +508,14 @@ static bool get_ghes_source_offsets(uint16_t source_id,
>>   NotifierList acpi_generic_error_notifiers =
>>       NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers);
>>   
>> -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>> +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>                                uint16_t source_id, Error **errp)
>>   {
>>       uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
>>   
>>       if (len > ghes_max_raw_data_length(ags)) {
>>           error_setg(errp, "GHES CPER record is too big: %zd", len);
>> -        return;
>> +        return false;
>>       }
>>   
>>       if (!ags->use_hest_addr) {
>> @@ -524,7 +524,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>       } else if (!get_ghes_source_offsets(source_id,
>>                       le64_to_cpu(ags->hest_addr_le), &cper_addr,
>>                       &read_ack_register_addr, errp)) {
>> -            return;
>> +            return false;
>>       }
>>   
>>       cpu_physical_memory_read(read_ack_register_addr,
>> @@ -535,7 +535,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>           error_setg(errp,
>>                      "OSPM does not acknowledge previous error,"
>>                      " so can not record CPER for current error anymore");
>> -        return;
>> +        return false;
>>       }
>>   
>>       read_ack_register = cpu_to_le64(0);
>> @@ -550,10 +550,13 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>       cpu_physical_memory_write(cper_addr, cper, len);
>>   
>>       notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
>> +
>> +    return true;
>>   }
>>   
>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>> -                            uint64_t *addresses, uint32_t num_of_addresses)
>> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>> +                             uint64_t *addresses, uint32_t num_of_addresses,
>> +                             Error **errp)
>>   {
>>       /* Memory Error Section Type */
>>       const uint8_t guid[] =
>> @@ -564,10 +567,10 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>        * Table 17-13 Generic Error Data Entry
>>        */
>>       QemuUUID fru_id = {};
>> -    Error *errp = NULL;
>>       int data_length;
>>       GArray *block;
>>       uint32_t block_status = 0, i;
>> +    bool ret;
>>   
>>       block = g_array_new(false, true /* clear */, 1);
>>   
>> @@ -605,16 +608,12 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>       }
>>   
>>       /* Report the error */
> 
> This comment is now stale.  I'd simply drop it.
> 

Indeed, thanks.

>> -    ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
>> +    ret = ghes_record_cper_errors(ags, block->data, block->len,
>> +                                  source_id, errp);
>>   
>>       g_array_free(block, true);
>>   
>> -    if (errp) {
>> -        error_report_err(errp);
>> -        return -1;
>> -    }
>> -
>> -    return 0;
>> +    return ret;
> 
> I figure you could use g_autoptr() to simplify this further.  Something
> along the lines of
> 
>         g_autoptr(GArray) block = g_array_new(false, true, 1);
> 
>         [...]
> 
>         return ghes_record_cper_errors(ags, block->data, block->len,
>                                        source_id, errp);
> 

Yes. It seems the pattern g_autoptr(GArray) isn't widely used in QEMU yet.
I will have one separate patch for this before the patches improving the
error (Error instead of memory error) handling in next revision.


>>   }
>>   
>>   AcpiGhesState *acpi_ghes_get_state(void)
>> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
>> index f7b084c039..c1f01ac25c 100644
>> --- a/include/hw/acpi/ghes.h
>> +++ b/include/hw/acpi/ghes.h
>> @@ -99,9 +99,10 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
>>                        const char *oem_id, const char *oem_table_id);
>>   void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>>                             GArray *hardware_errors);
>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>> -                            uint64_t *addresses, uint32_t num_of_addresses);
>> -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>> +                             uint64_t *addresses, uint32_t num_of_addresses,
>> +                             Error **errp);
>> +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>                                uint16_t source_id, Error **errp);
>>   
>>   /**
>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>> index 459ca4a9b0..b8c3ad2ad9 100644
>> --- a/target/arm/kvm.c
>> +++ b/target/arm/kvm.c
>> @@ -2458,13 +2458,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>               addresses[0] = paddr;
>>               if (code == BUS_MCEERR_AR) {
>>                   kvm_cpu_synchronize_state(c);
>> -                if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>> -                                             addresses, 1)) {
>> -                    kvm_inject_arm_sea(c);
>> -                } else {
>> -                    error_report("failed to record the error");
>> -                    abort();
>> -                }
>> +                acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>> +                                        addresses, 1, &error_fatal);
>> +                kvm_inject_arm_sea(c);
>>               }
>>               return;
>>           }
> 
> Readability improves nicely here.
> 

Yes :-)

Thanks,
Gavin




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4 8/8] target/arm/kvm: Support multiple memory CPERs injection
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (6 preceding siblings ...)
  2025-11-12 17:25 ` [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-12 17:25 ` Gavin Shan
  2025-11-18 10:47 ` [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Jonathan Cameron via
  8 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-12 17:25 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, armbru, anisinha, eduardo, marcel.apfelbaum, philmd,
	wangyanan55, zhao1.liu, peter.maydell, pbonzini, shan.gavin

In the combination of 64KiB host and 4KiB guest, a problematic host
page affects 16x guest pages that can be owned by different threads.
It means 16x memory errors can be raised at once due to the parallel
accesses to those 16x guest pages on the guest. Unfortunately, QEMU
can't deliver them one by one because we just have one GHES error block,
corresponding to one read acknowledgement register. It can eventually
cause QEMU crash dump due to the contention on that register, meaning
the current memory error can't be delivered before the previous error
isn't acknowledged.

Move the logics of sending ACPI GHES memory errors and injects SEA
exception to a newly added helper push_ghes_memory_errors(). Improve
push_ghes_memory_errors() to push 16x consecutive memory errors if
needed to avoid the contention on the read acknowledgement register,
thus the exceptional termination on QEMU.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 target/arm/kvm.c | 70 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 64 insertions(+), 6 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index b8c3ad2ad9..3da97664eb 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include <sys/ioctl.h>
 
 #include <linux/kvm.h>
@@ -2429,12 +2430,73 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
     return ret;
 }
 
+static bool push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
+                                    uint64_t paddr, Error **errp)
+{
+    uint64_t val, start, end, guest_pgsz, host_pgsz;
+    uint64_t addresses[16];
+    uint32_t num_of_addresses;
+    int err;
+    bool ret;
+
+    /*
+     * Sort out the guest page size from TCR_EL1, which can be modified
+     * by the guest from time to time. So we have to sort it out dynamically.
+     */
+    err = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
+    if (err) {
+        error_setg(errp, "Error %" PRId32 " to read TCR_EL1 register", err);
+        return false;
+    }
+
+    switch (extract64(val, 14, 2)) {
+    case 0:
+        guest_pgsz = 4 * KiB;
+        break;
+    case 1:
+        guest_pgsz = 64 * KiB;
+        break;
+    case 2:
+        guest_pgsz = 16 * KiB;
+        break;
+    default:
+        error_setg(errp, "Unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
+        return false;
+    }
+
+    host_pgsz = qemu_real_host_page_size();
+    start = paddr & ~(host_pgsz - 1);
+    end = start + host_pgsz;
+    num_of_addresses = 0;
+
+    while (start < end) {
+        /*
+         * The precise physical address is provided for the affected
+         * guest page that contains @paddr. Otherwise, the starting
+         * address of the guest page is provided.
+         */
+        if (paddr >= start && paddr < (start + guest_pgsz)) {
+            addresses[num_of_addresses++] = paddr;
+        } else {
+            addresses[num_of_addresses++] = start;
+        }
+
+        start += guest_pgsz;
+    }
+
+    kvm_cpu_synchronize_state(c);
+    ret = acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
+                                  addresses, num_of_addresses, errp);
+    kvm_inject_arm_sea(c);
+
+    return ret;
+}
+
 void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
 {
     ram_addr_t ram_addr;
     hwaddr paddr;
     AcpiGhesState *ags;
-    uint64_t addresses[16];
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
@@ -2455,12 +2517,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
              * later from the main thread, so doing the injection of
              * the error would be more complicated.
              */
-            addresses[0] = paddr;
             if (code == BUS_MCEERR_AR) {
-                kvm_cpu_synchronize_state(c);
-                acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
-                                        addresses, 1, &error_fatal);
-                kvm_inject_arm_sea(c);
+                push_ghes_memory_errors(c, ags, paddr, &error_fatal);
             }
             return;
         }
-- 
2.51.1



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/8] target/arm/kvm: Improve memory error handling
  2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (7 preceding siblings ...)
  2025-11-12 17:25 ` [PATCH v4 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
@ 2025-11-18 10:47 ` Jonathan Cameron via
  2025-11-18 10:54   ` Mauro Carvalho Chehab
  2025-11-21  6:51   ` Gavin Shan
  8 siblings, 2 replies; 15+ messages in thread
From: Jonathan Cameron via @ 2025-11-18 10:47 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	armbru, anisinha, eduardo, marcel.apfelbaum, philmd, wangyanan55,
	zhao1.liu, peter.maydell, pbonzini, shan.gavin, zhangliang5

On Thu, 13 Nov 2025 03:25:27 +1000
Gavin Shan <gshan@redhat.com> wrote:

> In the combination of 64KiB host and 4KiB guest, a problematic host
> page affects 16x guest pages. Those 16x guest pages are most likely
> owned by separate threads and accessed by the threads in parallel.
> It means 16x memory errors can be raised at once. However, we're
> unable to handle this situation because the only error source has
> one read acknowledgement register in current design. QEMU has to
> crash in the following path due to the previously delivered error
> isn't acknowledged by the guest on attempt to deliver another error.
> 
>   kvm_vcpu_thread_fn
>     kvm_cpu_exec
>       kvm_arch_on_sigbus_vcpu
>         kvm_cpu_synchronize_state
>         acpi_ghes_memory_errors
>         abort
> 
> This series fixes the issue by sending 16x consective CPER errors
> which are contained in a single GHES error block.
> 
> PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
> PATCH[5]   Supports multiple error records in a single error block
> PATCH[6-7] Improves the error handling in the error delivery path
> PATCH[8]   Sends 16x consective CPERs in a single block if needed
> 

Hi Gavin,

Just a quick head's up to say we've had some internal discussions around the
kernel handling of broader address masks in CPER and think it is probably
broken. Rectifying that may at least simplify what is needed on the QEMU side
of things and maybe even handle much larger blocks (2M and larger).

Will keep everyone informed of how we get on with resolving that.

Jonathan



> Changelog
> =========
> v4:
>   * v3: https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00264.html
>   * Pick r-b tags from Jonathan
>   * Add compat property 'x-error-block-size' for migration     (Igor)
>   * Code and commit log improvements                           (Igor/Jonathan)
>   * Use error_fatal in the memory error delivery path          (Markus)
>   * Use APIs from registerfields.h                             (Philippe)
> v3:
>   * v2: https://lists.nongnu.org/archive/html/qemu-arm/2025-10/msg00372.html
>   * Code and changelog improvements                            (Jonathan)
>   * Fixed GHES error block status field and improved error
>     handling in the error delivery path                        (Igor)
>   * Fixed ACPI HEST table and document                         (Mauro)
> v2:
>   * v1: https://lists.nongnu.org/archive/html/qemu-arm/2025-02/msg00897.html
>   * Send 16x memory errors for the specific case               (Jonathan)
> 
> Gavin Shan (8):
>   acpi/ghes: Make GHES max raw data length dynamic
>   tests/qtest/bios-tables-test: Prepare for changes in the HEST table
>   acpi/ghes: Increase GHES raw data maximal length to 4KiB
>   tests/qtest/bios-tables-test: Update HEST table
>   acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs
>   acpi/ghes: Bail early on error from get_ghes_source_offsets()
>   acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>   target/arm/kvm: Support multiple memory CPERs injection
> 
>  docs/specs/acpi_hest_ghes.rst     |   6 +-
>  hw/acpi/generic_event_device.c    |   2 +
>  hw/acpi/ghes-stub.c               |   7 +-
>  hw/acpi/ghes.c                    | 127 +++++++++++++++++-------------
>  hw/core/machine.c                 |   1 +
>  include/hw/acpi/ghes.h            |   8 +-
>  target/arm/kvm.c                  |  72 +++++++++++++++--
>  tests/data/acpi/aarch64/virt/HEST | Bin 224 -> 224 bytes
>  8 files changed, 153 insertions(+), 70 deletions(-)
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/8] target/arm/kvm: Improve memory error handling
  2025-11-18 10:47 ` [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Jonathan Cameron via
@ 2025-11-18 10:54   ` Mauro Carvalho Chehab
  2025-11-21  6:54     ` Gavin Shan
  2025-11-21  6:51   ` Gavin Shan
  1 sibling, 1 reply; 15+ messages in thread
From: Mauro Carvalho Chehab @ 2025-11-18 10:54 UTC (permalink / raw)
  To: Jonathan Cameron, Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	armbru, anisinha, eduardo, marcel.apfelbaum, philmd, wangyanan55,
	zhao1.liu, peter.maydell, pbonzini, shan.gavin, zhangliang5

On Tue, Nov 18, 2025 at 10:47:55AM +0000, Jonathan Cameron wrote:
> On Thu, 13 Nov 2025 03:25:27 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
> > In the combination of 64KiB host and 4KiB guest, a problematic host
> > page affects 16x guest pages. Those 16x guest pages are most likely
> > owned by separate threads and accessed by the threads in parallel.
> > It means 16x memory errors can be raised at once. However, we're
> > unable to handle this situation because the only error source has
> > one read acknowledgement register in current design. QEMU has to
> > crash in the following path due to the previously delivered error
> > isn't acknowledged by the guest on attempt to deliver another error.
> > 
> >   kvm_vcpu_thread_fn
> >     kvm_cpu_exec
> >       kvm_arch_on_sigbus_vcpu
> >         kvm_cpu_synchronize_state
> >         acpi_ghes_memory_errors
> >         abort
> > 
> > This series fixes the issue by sending 16x consective CPER errors
> > which are contained in a single GHES error block.
> > 
> > PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
> > PATCH[5]   Supports multiple error records in a single error block
> > PATCH[6-7] Improves the error handling in the error delivery path
> > PATCH[8]   Sends 16x consective CPERs in a single block if needed
> > 
> 
> Hi Gavin,
> 
> Just a quick head's up to say we've had some internal discussions around the
> kernel handling of broader address masks in CPER and think it is probably
> broken. Rectifying that may at least simplify what is needed on the QEMU side
> of things and maybe even handle much larger blocks (2M and larger).

Btw, I just added a logic at rasdaemon to catch SIGBUS errors:
https://github.com/mchehab/rasdaemon/pull/199

But so far, I didn't find a proper way to check such code.

Jonathan/Gavin,

Do you know a good way for us to check how the mm SEA notification
is handled with QEMU?

Regards,
Mauro


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/8] target/arm/kvm: Improve memory error handling
  2025-11-18 10:54   ` Mauro Carvalho Chehab
@ 2025-11-21  6:54     ` Gavin Shan
  0 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-21  6:54 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jonathan Cameron
  Cc: qemu-arm, qemu-devel, gengdongjiu1, mst, imammedo, armbru,
	anisinha, eduardo, marcel.apfelbaum, philmd, wangyanan55,
	zhao1.liu, peter.maydell, pbonzini, shan.gavin, zhangliang5

Hi Mauro,

On 11/18/25 8:54 PM, Mauro Carvalho Chehab wrote:
> On Tue, Nov 18, 2025 at 10:47:55AM +0000, Jonathan Cameron wrote:
>> On Thu, 13 Nov 2025 03:25:27 +1000
>> Gavin Shan <gshan@redhat.com> wrote:
>>
>>> In the combination of 64KiB host and 4KiB guest, a problematic host
>>> page affects 16x guest pages. Those 16x guest pages are most likely
>>> owned by separate threads and accessed by the threads in parallel.
>>> It means 16x memory errors can be raised at once. However, we're
>>> unable to handle this situation because the only error source has
>>> one read acknowledgement register in current design. QEMU has to
>>> crash in the following path due to the previously delivered error
>>> isn't acknowledged by the guest on attempt to deliver another error.
>>>
>>>    kvm_vcpu_thread_fn
>>>      kvm_cpu_exec
>>>        kvm_arch_on_sigbus_vcpu
>>>          kvm_cpu_synchronize_state
>>>          acpi_ghes_memory_errors
>>>          abort
>>>
>>> This series fixes the issue by sending 16x consective CPER errors
>>> which are contained in a single GHES error block.
>>>
>>> PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
>>> PATCH[5]   Supports multiple error records in a single error block
>>> PATCH[6-7] Improves the error handling in the error delivery path
>>> PATCH[8]   Sends 16x consective CPERs in a single block if needed
>>>
>>
>> Hi Gavin,
>>
>> Just a quick head's up to say we've had some internal discussions around the
>> kernel handling of broader address masks in CPER and think it is probably
>> broken. Rectifying that may at least simplify what is needed on the QEMU side
>> of things and maybe even handle much larger blocks (2M and larger).
> 
> Btw, I just added a logic at rasdaemon to catch SIGBUS errors:
> https://github.com/mchehab/rasdaemon/pull/199
> 
> But so far, I didn't find a proper way to check such code.
> 
> Jonathan/Gavin,
> 
> Do you know a good way for us to check how the mm SEA notification
> is handled with QEMU?
> 

Sorry that I'm not familiar with rasdaemon. Could you please provide more
contexts about your question?

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4 0/8] target/arm/kvm: Improve memory error handling
  2025-11-18 10:47 ` [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Jonathan Cameron via
  2025-11-18 10:54   ` Mauro Carvalho Chehab
@ 2025-11-21  6:51   ` Gavin Shan
  1 sibling, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2025-11-21  6:51 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	armbru, anisinha, eduardo, marcel.apfelbaum, philmd, wangyanan55,
	zhao1.liu, peter.maydell, pbonzini, shan.gavin, zhangliang5

Hi Jonathan,

On 11/18/25 8:47 PM, Jonathan Cameron wrote:
> On Thu, 13 Nov 2025 03:25:27 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> In the combination of 64KiB host and 4KiB guest, a problematic host
>> page affects 16x guest pages. Those 16x guest pages are most likely
>> owned by separate threads and accessed by the threads in parallel.
>> It means 16x memory errors can be raised at once. However, we're
>> unable to handle this situation because the only error source has
>> one read acknowledgement register in current design. QEMU has to
>> crash in the following path due to the previously delivered error
>> isn't acknowledged by the guest on attempt to deliver another error.
>>
>>    kvm_vcpu_thread_fn
>>      kvm_cpu_exec
>>        kvm_arch_on_sigbus_vcpu
>>          kvm_cpu_synchronize_state
>>          acpi_ghes_memory_errors
>>          abort
>>
>> This series fixes the issue by sending 16x consective CPER errors
>> which are contained in a single GHES error block.
>>
>> PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
>> PATCH[5]   Supports multiple error records in a single error block
>> PATCH[6-7] Improves the error handling in the error delivery path
>> PATCH[8]   Sends 16x consective CPERs in a single block if needed
>>
> 
> Hi Gavin,
> 
> Just a quick head's up to say we've had some internal discussions around the
> kernel handling of broader address masks in CPER and think it is probably
> broken. Rectifying that may at least simplify what is needed on the QEMU side
> of things and maybe even handle much larger blocks (2M and larger).
> 
> Will keep everyone informed of how we get on with resolving that.
> 

Thanks, Jonathan. If the broader address mask in CPER can be used to isolate
the specified memory range instead of a page, QEMU needn't the improvement
done in this series. Please copy me if the linux patches are going to be
sent for review if possible, I will try to review.

I will pull those patches improving error handling and post them separately
so that they can be merged. Those patches aren't really relevant to error handling.

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-11-21  7:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
2025-11-12 17:25 ` [PATCH v4 1/8] acpi/ghes: Make GHES max raw data length dynamic Gavin Shan
2025-11-12 17:25 ` [PATCH v4 2/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
2025-11-12 17:25 ` [PATCH v4 3/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
2025-11-12 17:25 ` [PATCH v4 4/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
2025-11-12 17:25 ` [PATCH v4 5/8] acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs Gavin Shan
2025-11-12 17:25 ` [PATCH v4 6/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-12 17:25 ` [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
2025-11-13  7:41   ` Markus Armbruster
2025-11-14  9:46     ` Gavin Shan
2025-11-12 17:25 ` [PATCH v4 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-11-18 10:47 ` [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Jonathan Cameron via
2025-11-18 10:54   ` Mauro Carvalho Chehab
2025-11-21  6:54     ` Gavin Shan
2025-11-21  6:51   ` Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).