qemu-arm.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/8] target/arm/kvm: Improve memory error handling
@ 2025-11-05 11:44 Gavin Shan
  2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

In the combination of 64KiB host and 4KiB guest, a problematic host
page affects 16x guest pages. Those 16x guest pages are most likely
owned by separate threads and accessed by the threads in parallel.
It means 16x memory errors can be raised at once. However, we're
unable to handle this situation because the only error source has
one read acknowledgement register in current design. QEMU has to
crash in the following path due to the previously delivered error
isn't acknowledged by the guest on attempt to deliver another error.

  kvm_vcpu_thread_fn
    kvm_cpu_exec
      kvm_arch_on_sigbus_vcpu
        kvm_cpu_synchronize_state
        acpi_ghes_memory_errors
        abort

This series fixes the issue by sending 16x consective CPER errors
which are contained in a single GHES error block.

PATCH[1-3] Increases GHES raw data maximal length from 1KiB to 4KiB
PATCH[4]   Supports multiple error records in a single error block
PATCH[5-6] Improves the error handling in the error delivery path
PATCH[7]   Introduces helper push_ghes_memory_errors()
PATCH[8]   Delivers 16x consective CPERs in a single error block

Changelog
=========
v3:
  * v2: https://lists.nongnu.org/archive/html/qemu-arm/2025-10/msg00372.html
  * Code and changelog improvements                            (Jonathan)
  * Fixed GHES error block status field and improved error
    handling in the error delivery path                        (Igor)
  * Fixed ACPI HEST table and document                         (Mauro)
v2:
  * v1: https://lists.nongnu.org/archive/html/qemu-arm/2025-02/msg00897.html
  * Send 16x memory errors for the specific case               (Jonathan)

Gavin Shan (8):
  tests/qtest/bios-tables-test: Prepare for changes in the HEST table
  acpi/ghes: Increase GHES raw data maximal length to 4KiB
  tests/qtest/bios-tables-test: Update HEST table
  acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs
  acpi/ghes: Bail early on error from get_ghes_source_offsets()
  acpi/ghes: Use error_abort in acpi_ghes_memory_errors()
  kvm/arm/kvm: Introduce helper push_ghes_memory_errors()
  target/arm/kvm: Support multiple memory CPERs injection

 docs/specs/acpi_hest_ghes.rst     |   2 +-
 hw/acpi/ghes-stub.c               |   6 +--
 hw/acpi/ghes.c                    |  78 +++++++++++++++---------------
 include/hw/acpi/ghes.h            |   5 +-
 target/arm/kvm.c                  |  69 +++++++++++++++++++++++---
 tests/data/acpi/aarch64/virt/HEST | Bin 224 -> 224 bytes
 6 files changed, 108 insertions(+), 52 deletions(-)

-- 
2.51.0



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:16   ` Jonathan Cameron via
  2025-11-05 11:44 ` [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

Adds HEST blob to the whilelist, preparing for the increased GHES raw
data maximal length (ACPI_GHES_MAX_RAW_DATA_LENGTH) from 1KiB to 4KiB,
because we need to send 16 consective errors in case a problematic 64KiB
host page affects 16 4KiB guest pages.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..39901c58d6 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/aarch64/virt/HEST",
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
  2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:16   ` Jonathan Cameron via
  2025-11-05 11:44 ` [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

The current GHES raw data maximal length isn't enough for 16 consecutive
CPER errors, which will be sent to a guest with 4KiB page size on a
erroneous 64KiB host page. Note those 16 CPER errors will be contained
in one single error block, meaning all CPER errors should be identical
in terms of type and severity and all of them should be delivered in
one shot.

Increase GHES raw data maximal length from 1KiB to 4KiB so that the
error block has enough storage space for 16 consecutive CPER errors.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 docs/specs/acpi_hest_ghes.rst | 2 +-
 hw/acpi/ghes.c                | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index aaf7b1ad11..acf31d6eeb 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -68,7 +68,7 @@ Design Details
     and N Read Ack Register entries. The size for each entry is 8-byte.
     The Error Status Data Block table contains N Error Status Data Block
     entries. The size for each entry is defined at the source code as
-    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
+    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 4096 bytes). The total size
     for the "etc/hardware_errors" fw_cfg blob is
     (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
     N is the number of the kinds of hardware error sources.
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 06555905ce..a9c08e73c0 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -33,7 +33,7 @@
 #define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
 
 /* The max size in bytes for one error block */
-#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
+#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (4 * KiB)
 
 /* Generic Hardware Error Source version 2 */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
  2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
  2025-11-05 11:44 ` [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:17   ` Jonathan Cameron via
  2025-11-05 11:44 ` [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

Update HEST table since GHES raw data maximal length has been increased
from 1KiB to 4KiB.

 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20240322 (64-bit version)
  * Copyright (c) 2000 - 2023 Intel Corporation
  *
- * Disassembly of tests/data/acpi/aarch64/virt/HEST
+ * Disassembly of /tmp/aml-28KMF3
  *
  * ACPI Data Table [HEST]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue (in hex)
  */

 [000h 0000 004h]                   Signature : "HEST"    [Hardware Error Source Table]
 [004h 0004 004h]                Table Length : 000000E0
 [008h 0008 001h]                    Revision : 01
-[009h 0009 001h]                    Checksum : 6C
+[009h 0009 001h]                    Checksum : 3C
 [00Ah 0010 006h]                      Oem ID : "BOCHS "
 [010h 0016 008h]                Oem Table ID : "BXPC    "
 [018h 0024 004h]                Oem Revision : 00000001
 [01Ch 0028 004h]             Asl Compiler ID : "BXPC"
 [020h 0032 004h]       Asl Compiler Revision : 00000001

 [024h 0036 004h]          Error Source Count : 00000002

 [028h 0040 002h]               Subtable Type : 000A [Generic Hardware Error Source V2]
 [02Ah 0042 002h]                   Source Id : 0000
 [02Ch 0044 002h]           Related Source Id : FFFF
 [02Eh 0046 001h]                    Reserved : 00
 [02Fh 0047 001h]                     Enabled : 01
 [030h 0048 004h]      Records To Preallocate : 00000001
 [034h 0052 004h]     Max Sections Per Record : 00000001
-[038h 0056 004h]         Max Raw Data Length : 00000400
+[038h 0056 004h]         Max Raw Data Length : 00001000

 [03Ch 0060 00Ch]        Error Status Address : [Generic Address Structure]
 [03Ch 0060 001h]                    Space ID : 00 [SystemMemory]
 [03Dh 0061 001h]                   Bit Width : 40
 [03Eh 0062 001h]                  Bit Offset : 00
 [03Fh 0063 001h]        Encoded Access Width : 04 [QWord Access:64]
 [040h 0064 008h]                     Address : 0000000043DA0000

 [048h 0072 01Ch]                      Notify : [Hardware Error Notification Structure]
 [048h 0072 001h]                 Notify Type : 08 [SEA]
 [049h 0073 001h]               Notify Length : 1C
 [04Ah 0074 002h]  Configuration Write Enable : 0000
 [04Ch 0076 004h]                PollInterval : 00000000
 [050h 0080 004h]                      Vector : 00000000
 [054h 0084 004h]     Polling Threshold Value : 00000000
 [058h 0088 004h]    Polling Threshold Window : 00000000
 [05Ch 0092 004h]       Error Threshold Value : 00000000
 [060h 0096 004h]      Error Threshold Window : 00000000

-[064h 0100 004h]   Error Status Block Length : 00000400
+[064h 0100 004h]   Error Status Block Length : 00001000
 [068h 0104 00Ch]           Read Ack Register : [Generic Address Structure]
 [068h 0104 001h]                    Space ID : 00 [SystemMemory]
 [069h 0105 001h]                   Bit Width : 40
 [06Ah 0106 001h]                  Bit Offset : 00
 [06Bh 0107 001h]        Encoded Access Width : 04 [QWord Access:64]
 [06Ch 0108 008h]                     Address : 0000000043DA0010

 [074h 0116 008h]           Read Ack Preserve : FFFFFFFFFFFFFFFE
 [07Ch 0124 008h]              Read Ack Write : 0000000000000001

 [084h 0132 002h]               Subtable Type : 000A [Generic Hardware Error Source V2]
 [086h 0134 002h]                   Source Id : 0001
 [088h 0136 002h]           Related Source Id : FFFF
 [08Ah 0138 001h]                    Reserved : 00
 [08Bh 0139 001h]                     Enabled : 01
 [08Ch 0140 004h]      Records To Preallocate : 00000001
 [090h 0144 004h]     Max Sections Per Record : 00000001
-[094h 0148 004h]         Max Raw Data Length : 00000400
+[094h 0148 004h]         Max Raw Data Length : 00001000

 [098h 0152 00Ch]        Error Status Address : [Generic Address Structure]
 [098h 0152 001h]                    Space ID : 00 [SystemMemory]
 [099h 0153 001h]                   Bit Width : 40
 [09Ah 0154 001h]                  Bit Offset : 00
 [09Bh 0155 001h]        Encoded Access Width : 04 [QWord Access:64]
 [09Ch 0156 008h]                     Address : 0000000043DA0008

 [0A4h 0164 01Ch]                      Notify : [Hardware Error Notification Structure]
 [0A4h 0164 001h]                 Notify Type : 07 [GPIO]
 [0A5h 0165 001h]               Notify Length : 1C
 [0A6h 0166 002h]  Configuration Write Enable : 0000
 [0A8h 0168 004h]                PollInterval : 00000000
 [0ACh 0172 004h]                      Vector : 00000000
 [0B0h 0176 004h]     Polling Threshold Value : 00000000
 [0B4h 0180 004h]    Polling Threshold Window : 00000000
 [0B8h 0184 004h]       Error Threshold Value : 00000000
 [0BCh 0188 004h]      Error Threshold Window : 00000000

-[0C0h 0192 004h]   Error Status Block Length : 00000400
+[0C0h 0192 004h]   Error Status Block Length : 00001000
 [0C4h 0196 00Ch]           Read Ack Register : [Generic Address Structure]
 [0C4h 0196 001h]                    Space ID : 00 [SystemMemory]
 [0C5h 0197 001h]                   Bit Width : 40
 [0C6h 0198 001h]                  Bit Offset : 00
 [0C7h 0199 001h]        Encoded Access Width : 04 [QWord Access:64]
 [0C8h 0200 008h]                     Address : 0000000043DA0018

 [0D0h 0208 008h]           Read Ack Preserve : FFFFFFFFFFFFFFFE
 [0D8h 0216 008h]              Read Ack Write : 0000000000000001

 Raw Table Data: Length 224 (0xE0)

-    0000: 48 45 53 54 E0 00 00 00 01 6C 42 4F 43 48 53 20  // HEST.....lBOCHS
+    0000: 48 45 53 54 E0 00 00 00 01 3C 42 4F 43 48 53 20  // HEST.....<BOCHS
     0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
     0020: 01 00 00 00 02 00 00 00 0A 00 00 00 FF FF 00 01  // ................
-    0030: 01 00 00 00 01 00 00 00 00 04 00 00 00 40 00 04  // .............@..
+    0030: 01 00 00 00 01 00 00 00 00 10 00 00 00 40 00 04  // .............@..
     0040: 00 00 DA 43 00 00 00 00 08 1C 00 00 00 00 00 00  // ...C............
     0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
-    0060: 00 00 00 00 00 04 00 00 00 40 00 04 10 00 DA 43  // .........@.....C
+    0060: 00 00 00 00 00 10 00 00 00 40 00 04 10 00 DA 43  // .........@.....C
     0070: 00 00 00 00 FE FF FF FF FF FF FF FF 01 00 00 00  // ................
     0080: 00 00 00 00 0A 00 01 00 FF FF 00 01 01 00 00 00  // ................
-    0090: 01 00 00 00 00 04 00 00 00 40 00 04 08 00 DA 43  // .........@.....C
+    0090: 01 00 00 00 00 10 00 00 00 40 00 04 08 00 DA 43  // .........@.....C
     00A0: 00 00 00 00 07 1C 00 00 00 00 00 00 00 00 00 00  // ................
     00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // ................
-    00C0: 00 04 00 00 00 40 00 04 18 00 DA 43 00 00 00 00  // .....@.....C....
+    00C0: 00 10 00 00 00 40 00 04 18 00 DA 43 00 00 00 00  // .....@.....C....
     00D0: FE FF FF FF FF FF FF FF 01 00 00 00 00 00 00 00  // ................

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tests/data/acpi/aarch64/virt/HEST           | Bin 224 -> 224 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 2 files changed, 1 deletion(-)

diff --git a/tests/data/acpi/aarch64/virt/HEST b/tests/data/acpi/aarch64/virt/HEST
index 674272922db7d48f7821aa7c83ec76bb3b556d2a..247345901b9be41a86c51bcced59287f74f9e918 100644
GIT binary patch
delta 45
ycmaFB_<)hi!!<bM0RsaAqs>GveMW(aR@#gL6VpJfY1$kD3=9ko3@j28bHxD$GYS#_

delta 45
ycmaFB_<)hi!!<bM0RsaAW6ne_eMXjvR@#g#6VpJfY1$kt3=9ko3@j28bHxD&iwY6|

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index 39901c58d6..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,2 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/aarch64/virt/HEST",
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (2 preceding siblings ...)
  2025-11-05 11:44 ` [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:14   ` Jonathan Cameron via
  2025-11-05 11:44 ` [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

In the situation where host and guest has 64KiB and 4KiB page sizes,
one problematic host page affects 16 guest pages. we need to send 16
consective errors in this specific case.

Extend acpi_ghes_memory_errors() to support multiple CPERs after the
hunk of code to generate the GHES error status is pulled out from
ghes_gen_err_data_uncorrectable_recoverable(). The status field of
generic error status block is also updated accordingly if multiple
error data entries are contained in the generic error status block.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes-stub.c    |  2 +-
 hw/acpi/ghes.c         | 60 +++++++++++++++++++++++-------------------
 include/hw/acpi/ghes.h |  2 +-
 target/arm/kvm.c       |  4 ++-
 4 files changed, 38 insertions(+), 30 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 40f660c246..4faf573aeb 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -12,7 +12,7 @@
 #include "hw/acpi/ghes.h"
 
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t physical_address)
+                            uint64_t *addresses, uint32_t num_of_addresses)
 {
     return -1;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index a9c08e73c0..527b85c8d8 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -57,8 +57,12 @@
 /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
 #define ACPI_GHES_MEM_CPER_LENGTH           80
 
-/* Masks for block_status flags */
-#define ACPI_GEBS_UNCORRECTABLE         1
+/* Bits for block_status flags */
+#define ACPI_GEBS_UNCORRECTABLE           0
+#define ACPI_GEBS_CORRECTABLE             1
+#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE  2
+#define ACPI_GEBS_MULTIPLE_CORRECTABLE    3
+#define ACPI_GEBS_ERROR_DATA_ENTRIES      4
 
 /*
  * Total size for Generic Error Status Block except Generic Error Data Entries
@@ -212,26 +216,6 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
     build_append_int_noprefix(table, 0, 7);
 }
 
-static void
-ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
-                                            const uint8_t *section_type,
-                                            int data_length)
-{
-    /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
-     * Table 17-13 Generic Error Data Entry
-     */
-    QemuUUID fru_id = {};
-
-    /* Build the new generic error status block header */
-    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
-        0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
-
-    /* Build this new generic error data entry header */
-    acpi_ghes_generic_error_data(block, section_type,
-        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
-        ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
-}
-
 /*
  * Build table for the hardware error fw_cfg blob.
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
@@ -557,19 +541,26 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
 }
 
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t physical_address)
+                            uint64_t *addresses, uint32_t num_of_addresses)
 {
     /* Memory Error Section Type */
     const uint8_t guid[] =
           UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
                   0xED, 0x7C, 0x83, 0xB1);
+    /*
+     * invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
+     * Table 17-13 Generic Error Data Entry
+     */
+    QemuUUID fru_id = {};
     Error *errp = NULL;
     int data_length;
     GArray *block;
+    uint32_t block_status, i;
 
     block = g_array_new(false, true /* clear */, 1);
 
-    data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
+    data_length = num_of_addresses *
+                  (ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH);
     /*
      * It should not run out of the preallocated memory if adding a new generic
      * error data entry
@@ -577,10 +568,25 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
     assert((data_length + ACPI_GHES_GESB_SIZE) <=
             ACPI_GHES_MAX_RAW_DATA_LENGTH);
 
-    ghes_gen_err_data_uncorrectable_recoverable(block, guid, data_length);
+    /* Build the new generic error status block header */
+    block_status = (1 << ACPI_GEBS_UNCORRECTABLE) |
+                   (num_of_addresses << ACPI_GEBS_ERROR_DATA_ENTRIES);
+    if (num_of_addresses > 1) {
+        block_status |= ACPI_GEBS_MULTIPLE_UNCORRECTABLE;
+    }
+
+    acpi_ghes_generic_error_status(block, block_status, 0, 0,
+                                   data_length, ACPI_CPER_SEV_RECOVERABLE);
 
-    /* Build the memory section CPER for above new generic error data entry */
-    acpi_ghes_build_append_mem_cper(block, physical_address);
+    for (i = 0; i < num_of_addresses; i++) {
+        /* Build generic error data entries */
+        acpi_ghes_generic_error_data(block, guid,
+                                     ACPI_CPER_SEV_RECOVERABLE, 0, 0,
+                                     ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
+
+        /* Memory section CPER on top of the generic error data entry */
+        acpi_ghes_build_append_mem_cper(block, addresses[i]);
+    }
 
     /* Report the error */
     ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index df2ecbf6e4..f73908985d 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -99,7 +99,7 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t error_physical_addr);
+                            uint64_t *addresses, uint32_t num_of_addresses);
 void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
                              uint16_t source_id, Error **errp);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 0d57081e69..459ca4a9b0 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2434,6 +2434,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
     ram_addr_t ram_addr;
     hwaddr paddr;
     AcpiGhesState *ags;
+    uint64_t addresses[16];
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
@@ -2454,10 +2455,11 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
              * later from the main thread, so doing the injection of
              * the error would be more complicated.
              */
+            addresses[0] = paddr;
             if (code == BUS_MCEERR_AR) {
                 kvm_cpu_synchronize_state(c);
                 if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
-                                             paddr)) {
+                                             addresses, 1)) {
                     kvm_inject_arm_sea(c);
                 } else {
                     error_report("failed to record the error");
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets()
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (3 preceding siblings ...)
  2025-11-05 11:44 ` [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:17   ` Jonathan Cameron via
  2025-11-05 11:44 ` [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors() Gavin Shan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

For one particular error (Error), we can't call error_setg() for twice.
Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in
error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets()
can return a error initialized by error_setg(). Without bailing on
this error, it can call into the second error_setg() due to the
unexpected value from the read acknowledgement register.

Bail early in ghes_record_cper_errors() when error is received from
get_ghes_source_offsets() to avoid the exception.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 527b85c8d8..055e5d719a 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -513,6 +513,9 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
     } else {
         get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
                                 &cper_addr, &read_ack_register_addr, errp);
+        if (*errp) {
+            return;
+        }
     }
 
     cpu_physical_memory_read(read_ack_register_addr,
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors()
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (4 preceding siblings ...)
  2025-11-05 11:44 ` [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:18   ` Jonathan Cameron via
  2025-11-05 11:44 ` [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
  2025-11-05 11:44 ` [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

Use error_abort in acpi_ghes_memory_errors() so that the caller needn't
explicitly call abort() on errors. With this change, its return value
isn't needed any more.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes-stub.c    |  6 +++---
 hw/acpi/ghes.c         | 15 ++++-----------
 include/hw/acpi/ghes.h |  5 +++--
 target/arm/kvm.c       | 10 +++-------
 4 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 4faf573aeb..4ef914ffc5 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,10 +11,10 @@
 #include "qemu/osdep.h"
 #include "hw/acpi/ghes.h"
 
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t *addresses, uint32_t num_of_addresses)
+void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+                             uint64_t *addresses, uint32_t num_of_addresses,
+                             Error **errp)
 {
-    return -1;
 }
 
 AcpiGhesState *acpi_ghes_get_state(void)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 055e5d719a..aa469c03f2 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -543,8 +543,9 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
     notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
 }
 
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t *addresses, uint32_t num_of_addresses)
+void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+                             uint64_t *addresses, uint32_t num_of_addresses,
+                             Error **errp)
 {
     /* Memory Error Section Type */
     const uint8_t guid[] =
@@ -555,7 +556,6 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
      * Table 17-13 Generic Error Data Entry
      */
     QemuUUID fru_id = {};
-    Error *errp = NULL;
     int data_length;
     GArray *block;
     uint32_t block_status, i;
@@ -592,16 +592,9 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
     }
 
     /* Report the error */
-    ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
+    ghes_record_cper_errors(ags, block->data, block->len, source_id, errp);
 
     g_array_free(block, true);
-
-    if (errp) {
-        error_report_err(errp);
-        return -1;
-    }
-
-    return 0;
 }
 
 AcpiGhesState *acpi_ghes_get_state(void)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index f73908985d..35c7bbbb01 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -98,8 +98,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
                      const char *oem_id, const char *oem_table_id);
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
-                            uint64_t *addresses, uint32_t num_of_addresses);
+void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+                             uint64_t *addresses, uint32_t num_of_addresses,
+                             Error **errp);
 void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
                              uint16_t source_id, Error **errp);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 459ca4a9b0..a889315606 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2458,13 +2458,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
             addresses[0] = paddr;
             if (code == BUS_MCEERR_AR) {
                 kvm_cpu_synchronize_state(c);
-                if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
-                                             addresses, 1)) {
-                    kvm_inject_arm_sea(c);
-                } else {
-                    error_report("failed to record the error");
-                    abort();
-                }
+                acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
+                                        addresses, 1, &error_abort);
+                kvm_inject_arm_sea(c);
             }
             return;
         }
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors()
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (5 preceding siblings ...)
  2025-11-05 11:44 ` [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:19   ` Jonathan Cameron via
  2025-11-05 11:44 ` [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

Introduce helper push_ghes_memory_errors(), which sends ACPI GHES memory
errors and injects SEA exception. With this, we can add more logics to
the function to support multiple ACPI GHES memory errors in the next
path.

No functional changes intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 target/arm/kvm.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index a889315606..5b151eda3c 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2429,12 +2429,23 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
     return ret;
 }
 
+static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
+                                    uint64_t paddr, Error **errp)
+{
+    uint64_t addresses[16];
+
+    addresses[0] = paddr;
+
+    kvm_cpu_synchronize_state(c);
+    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses, 1, errp);
+    kvm_inject_arm_sea(c);
+}
+
 void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
 {
     ram_addr_t ram_addr;
     hwaddr paddr;
     AcpiGhesState *ags;
-    uint64_t addresses[16];
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
@@ -2455,12 +2466,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
              * later from the main thread, so doing the injection of
              * the error would be more complicated.
              */
-            addresses[0] = paddr;
             if (code == BUS_MCEERR_AR) {
-                kvm_cpu_synchronize_state(c);
-                acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
-                                        addresses, 1, &error_abort);
-                kvm_inject_arm_sea(c);
+                push_ghes_memory_errors(c, ags, paddr, &error_abort);
             }
             return;
         }
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection
  2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
                   ` (6 preceding siblings ...)
  2025-11-05 11:44 ` [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
@ 2025-11-05 11:44 ` Gavin Shan
  2025-11-05 14:37   ` Jonathan Cameron via
  7 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-11-05 11:44 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, jonathan.cameron, mchehab+huawei, gengdongjiu1, mst,
	imammedo, anisinha, peter.maydell, pbonzini, shan.gavin

In the combination of 64KiB host and 4KiB guest, a problematic host
page affects 16x guest pages that can be owned by different threads.
It means 16x memory errors can be raised at once due to the parallel
accesses to those 16x guest pages on the guest. Unfortunately, QEMU
can't deliver them one by one because we just one GHES error block,
corresponding one read acknowledgement register. It can eventually
cause QEMU crash dump due to the contention on that register, meaning
the current memory error can't be delivered before the previous error
isn't acknowledged.

Imporve push_ghes_memory_errors() to push 16x consecutive memory errors
under this situation to avoid the contention on the read acknowledgement
register.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 target/arm/kvm.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 5b151eda3c..d7de8262da 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include <sys/ioctl.h>
 
 #include <linux/kvm.h>
@@ -2432,12 +2433,59 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
 static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
                                     uint64_t paddr, Error **errp)
 {
+    uint64_t val, start, end, guest_pgsz, host_pgsz;
     uint64_t addresses[16];
+    uint32_t num_of_addresses;
+    int ret;
+
+    /*
+     * Sort out the guest page size from TCR_EL1, which can be modified
+     * by the guest from time to time. So we have to sort it out dynamically.
+     */
+    ret = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
+    if (ret) {
+        error_setg(errp, "Error %" PRId32 " to read TCR_EL1 register", ret);
+        return;
+    }
+
+    switch (extract64(val, 14, 2)) {
+    case 0:
+        guest_pgsz = 4 * KiB;
+        break;
+    case 1:
+        guest_pgsz = 64 * KiB;
+        break;
+    case 2:
+        guest_pgsz = 16 * KiB;
+        break;
+    default:
+        error_setg(errp, "Unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
+        return;
+    }
+
+    host_pgsz = qemu_real_host_page_size();
+    start = paddr & ~(host_pgsz - 1);
+    end = start + host_pgsz;
+    num_of_addresses = 0;
 
-    addresses[0] = paddr;
+    while (start < end) {
+        /*
+         * The precise physical address is provided for the affected
+         * guest page that contains @paddr. Otherwise, the starting
+         * address of the guest page is provided.
+         */
+        if (paddr >= start && paddr < (start + guest_pgsz)) {
+            addresses[num_of_addresses++] = paddr;
+        } else {
+            addresses[num_of_addresses++] = start;
+        }
+
+        start += guest_pgsz;
+    }
 
     kvm_cpu_synchronize_state(c);
-    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses, 1, errp);
+    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
+                            addresses, num_of_addresses, errp);
     kvm_inject_arm_sea(c);
 }
 
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs
  2025-11-05 11:44 ` [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
@ 2025-11-05 14:14   ` Jonathan Cameron via
  2025-11-06  3:15     ` Gavin Shan
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:14 UTC (permalink / raw)
  To: Gavin Shan, shan.gavin
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini

On Wed,  5 Nov 2025 21:44:49 +1000
Gavin Shan <gshan@redhat.com> wrote:

> In the situation where host and guest has 64KiB and 4KiB page sizes,
> one problematic host page affects 16 guest pages. we need to send 16
> consective errors in this specific case.
> 
> Extend acpi_ghes_memory_errors() to support multiple CPERs after the
> hunk of code to generate the GHES error status is pulled out from
> ghes_gen_err_data_uncorrectable_recoverable(). The status field of
> generic error status block is also updated accordingly if multiple
> error data entries are contained in the generic error status block.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Hi Gavin,

Mostly fine, but a few comments on the defines added and a
question on what the multiple things are meant to mean?

> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index a9c08e73c0..527b85c8d8 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -57,8 +57,12 @@
>  /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
>  #define ACPI_GHES_MEM_CPER_LENGTH           80
>  
> -/* Masks for block_status flags */
> -#define ACPI_GEBS_UNCORRECTABLE         1
> +/* Bits for block_status flags */
> +#define ACPI_GEBS_UNCORRECTABLE           0
> +#define ACPI_GEBS_CORRECTABLE             1
> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE  2
> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE    3

So this maps to the bits in block status. 

I'm not actually sure what these multiple variants are meant to tell us.
The multiple error blocks example referred to by the spec is a way to represent
the same error applying to multiple places.  So that's one error, many blocks.
I have no idea if we set these bits in that case.

Based on a quick look I don't think linux even takes any notice.  THere
are defines in actbl1.h but I'm not seeing any use made of them.

> +#define ACPI_GEBS_ERROR_DATA_ENTRIES      4

This is bits 4-13 and the define isn't used. I'd drop it.




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table
  2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
@ 2025-11-05 14:16   ` Jonathan Cameron via
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:16 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

On Wed,  5 Nov 2025 21:44:46 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Adds HEST blob to the whilelist, preparing for the increased GHES raw
> data maximal length (ACPI_GHES_MAX_RAW_DATA_LENGTH) from 1KiB to 4KiB,
> because we need to send 16 consective errors in case a problematic 64KiB
> host page affects 16 4KiB guest pages.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
FWIW
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> ---
>  tests/qtest/bios-tables-test-allowed-diff.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
> index dfb8523c8b..39901c58d6 100644
> --- a/tests/qtest/bios-tables-test-allowed-diff.h
> +++ b/tests/qtest/bios-tables-test-allowed-diff.h
> @@ -1 +1,2 @@
>  /* List of comma-separated changed AML files to ignore */
> +"tests/data/acpi/aarch64/virt/HEST",



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB
  2025-11-05 11:44 ` [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
@ 2025-11-05 14:16   ` Jonathan Cameron via
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:16 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

On Wed,  5 Nov 2025 21:44:47 +1000
Gavin Shan <gshan@redhat.com> wrote:

> The current GHES raw data maximal length isn't enough for 16 consecutive
> CPER errors, which will be sent to a guest with 4KiB page size on a
> erroneous 64KiB host page. Note those 16 CPER errors will be contained
> in one single error block, meaning all CPER errors should be identical
> in terms of type and severity and all of them should be delivered in
> one shot.
> 
> Increase GHES raw data maximal length from 1KiB to 4KiB so that the
> error block has enough storage space for 16 consecutive CPER errors.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table
  2025-11-05 11:44 ` [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
@ 2025-11-05 14:17   ` Jonathan Cameron via
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:17 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

On Wed,  5 Nov 2025 21:44:48 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Update HEST table since GHES raw data maximal length has been increased
...

> -    00C0: 00 04 00 00 00 40 00 04 18 00 DA 43 00 00 00 00  // .....@.....C....
> +    00C0: 00 10 00 00 00 40 00 04 18 00 DA 43 00 00 00 00  // .....@.....C....
>      00D0: FE FF FF FF FF FF FF FF 01 00 00 00 00 00 00 00  // ................
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets()
  2025-11-05 11:44 ` [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
@ 2025-11-05 14:17   ` Jonathan Cameron via
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:17 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

On Wed,  5 Nov 2025 21:44:50 +1000
Gavin Shan <gshan@redhat.com> wrote:

> For one particular error (Error), we can't call error_setg() for twice.
> Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in
> error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets()
> can return a error initialized by error_setg(). Without bailing on
> this error, it can call into the second error_setg() due to the
> unexpected value from the read acknowledgement register.
> 
> Bail early in ghes_record_cper_errors() when error is received from
> get_ghes_source_offsets() to avoid the exception.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Seems reasonable to me.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors()
  2025-11-05 11:44 ` [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-05 14:18   ` Jonathan Cameron via
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:18 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

On Wed,  5 Nov 2025 21:44:51 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Use error_abort in acpi_ghes_memory_errors() so that the caller needn't
> explicitly call abort() on errors. With this change, its return value
> isn't needed any more.
> 
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors()
  2025-11-05 11:44 ` [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
@ 2025-11-05 14:19   ` Jonathan Cameron via
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:19 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

On Wed,  5 Nov 2025 21:44:52 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Introduce helper push_ghes_memory_errors(), which sends ACPI GHES memory
> errors and injects SEA exception. With this, we can add more logics to
> the function to support multiple ACPI GHES memory errors in the next
> path.
> 
> No functional changes intended.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection
  2025-11-05 11:44 ` [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
@ 2025-11-05 14:37   ` Jonathan Cameron via
  2025-11-06  3:26     ` Gavin Shan
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Cameron via @ 2025-11-05 14:37 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

On Wed,  5 Nov 2025 21:44:53 +1000
Gavin Shan <gshan@redhat.com> wrote:

> In the combination of 64KiB host and 4KiB guest, a problematic host
> page affects 16x guest pages that can be owned by different threads.
> It means 16x memory errors can be raised at once due to the parallel
> accesses to those 16x guest pages on the guest. Unfortunately, QEMU
> can't deliver them one by one because we just one GHES error block,

we have just one

> corresponding one read acknowledgement register. It can eventually
> cause QEMU crash dump due to the contention on that register, meaning
> the current memory error can't be delivered before the previous error
> isn't acknowledged.
> 
> Imporve push_ghes_memory_errors() to push 16x consecutive memory errors
Improve

> under this situation to avoid the contention on the read acknowledgement
> register.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Hi Gavin

Silly question that never occurred to me before:
What happens if we just report a single larger error?

The CPER record has a Physical Address Mask that I think lets us say we
are only reporting at a 64KiB granularity.

In linux drivers/edac/ghes_edac.c seems to handle this via e->grain.
https://elixir.bootlin.com/linux/v6.18-rc4/source/drivers/edac/ghes_edac.c#L346

I haven't chased the whole path through to whether this does appropriate poisoning
on the guest though.

> ---
>  target/arm/kvm.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 5b151eda3c..d7de8262da 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -11,6 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include <sys/ioctl.h>
>  
>  #include <linux/kvm.h>
> @@ -2432,12 +2433,59 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
>  static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
>                                      uint64_t paddr, Error **errp)
>  {
> +    uint64_t val, start, end, guest_pgsz, host_pgsz;
>      uint64_t addresses[16];
> +    uint32_t num_of_addresses;
> +    int ret;
> +
> +    /*
> +     * Sort out the guest page size from TCR_EL1, which can be modified
> +     * by the guest from time to time. So we have to sort it out dynamically.
> +     */
> +    ret = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
> +    if (ret) {
> +        error_setg(errp, "Error %" PRId32 " to read TCR_EL1 register", ret);
> +        return;
> +    }
> +
> +    switch (extract64(val, 14, 2)) {
> +    case 0:
> +        guest_pgsz = 4 * KiB;
> +        break;
> +    case 1:
> +        guest_pgsz = 64 * KiB;
> +        break;
> +    case 2:
> +        guest_pgsz = 16 * KiB;
> +        break;
> +    default:
> +        error_setg(errp, "Unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
> +        return;
> +    }
> +
> +    host_pgsz = qemu_real_host_page_size();
> +    start = paddr & ~(host_pgsz - 1);
> +    end = start + host_pgsz;
> +    num_of_addresses = 0;
>  
> -    addresses[0] = paddr;
> +    while (start < end) {
> +        /*
> +         * The precise physical address is provided for the affected
> +         * guest page that contains @paddr. Otherwise, the starting
> +         * address of the guest page is provided.
> +         */
> +        if (paddr >= start && paddr < (start + guest_pgsz)) {
> +            addresses[num_of_addresses++] = paddr;
> +        } else {
> +            addresses[num_of_addresses++] = start;
> +        }
> +
> +        start += guest_pgsz;
> +    }
>  
>      kvm_cpu_synchronize_state(c);
> -    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses, 1, errp);
> +    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
> +                            addresses, num_of_addresses, errp);
>      kvm_inject_arm_sea(c);
>  }
>  



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs
  2025-11-05 14:14   ` Jonathan Cameron via
@ 2025-11-06  3:15     ` Gavin Shan
  0 siblings, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-11-06  3:15 UTC (permalink / raw)
  To: Jonathan Cameron, shan.gavin
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini

Hi Jonathan and Igor,

On 11/6/25 12:14 AM, Jonathan Cameron wrote:
> On Wed,  5 Nov 2025 21:44:49 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> In the situation where host and guest has 64KiB and 4KiB page sizes,
>> one problematic host page affects 16 guest pages. we need to send 16
>> consective errors in this specific case.
>>
>> Extend acpi_ghes_memory_errors() to support multiple CPERs after the
>> hunk of code to generate the GHES error status is pulled out from
>> ghes_gen_err_data_uncorrectable_recoverable(). The status field of
>> generic error status block is also updated accordingly if multiple
>> error data entries are contained in the generic error status block.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Hi Gavin,
> 
> Mostly fine, but a few comments on the defines added and a
> question on what the multiple things are meant to mean?
> 

Thanks for your review and comments, replies as below.

>> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
>> index a9c08e73c0..527b85c8d8 100644
>> --- a/hw/acpi/ghes.c
>> +++ b/hw/acpi/ghes.c
>> @@ -57,8 +57,12 @@
>>   /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
>>   #define ACPI_GHES_MEM_CPER_LENGTH           80
>>   
>> -/* Masks for block_status flags */
>> -#define ACPI_GEBS_UNCORRECTABLE         1
>> +/* Bits for block_status flags */
>> +#define ACPI_GEBS_UNCORRECTABLE           0
>> +#define ACPI_GEBS_CORRECTABLE             1
>> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE  2
>> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE    3
> 
> So this maps to the bits in block status.
> 
> I'm not actually sure what these multiple variants are meant to tell us.
> The multiple error blocks example referred to by the spec is a way to represent
> the same error applying to multiple places.  So that's one error, many blocks.
> I have no idea if we set these bits in that case.
> 
> Based on a quick look I don't think linux even takes any notice.  THere
> are defines in actbl1.h but I'm not seeing any use made of them.
> 

I hope Igor can confirm since it was suggested by him.

It's hard to understand how exactly these multiple variants are used from the
spec. In ACPI 6.5 Table 18.11, it's explained as below.

Bit [2] - Multiple Uncorrectable Errors: If set to one, indicates that more
than one uncorrectable errors have been detected.

I don't see those multiple variants have been used by Linux. So I think it's
safe to drop them.

>> +#define ACPI_GEBS_ERROR_DATA_ENTRIES      4
> 
> This is bits 4-13 and the define isn't used. I'd drop it.
> 

The definition is used in acpi_ghes_memory_errors() of this patch. However,
I don't see it has been used by Linux. This field isn't used by Linux to determine
the total number of error entries. So I think I can drop it either if Igor is ok.

Thanks,
Gavin





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection
  2025-11-05 14:37   ` Jonathan Cameron via
@ 2025-11-06  3:26     ` Gavin Shan
  0 siblings, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-11-06  3:26 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-arm, qemu-devel, mchehab+huawei, gengdongjiu1, mst, imammedo,
	anisinha, peter.maydell, pbonzini, shan.gavin

Hi Jonathan,

On 11/6/25 12:37 AM, Jonathan Cameron wrote:
> On Wed,  5 Nov 2025 21:44:53 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> In the combination of 64KiB host and 4KiB guest, a problematic host
>> page affects 16x guest pages that can be owned by different threads.
>> It means 16x memory errors can be raised at once due to the parallel
>> accesses to those 16x guest pages on the guest. Unfortunately, QEMU
>> can't deliver them one by one because we just one GHES error block,
> 
> we have just one
> 

Thanks, fixed locally.

>> corresponding one read acknowledgement register. It can eventually
>> cause QEMU crash dump due to the contention on that register, meaning
>> the current memory error can't be delivered before the previous error
>> isn't acknowledged.
>>
>> Imporve push_ghes_memory_errors() to push 16x consecutive memory errors
> Improve
> 

Thanks, fixed locally.

>> under this situation to avoid the contention on the read acknowledgement
>> register.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Hi Gavin
> 
> Silly question that never occurred to me before:
> What happens if we just report a single larger error?
> 
> The CPER record has a Physical Address Mask that I think lets us say we
> are only reporting at a 64KiB granularity.
> 
> In linux drivers/edac/ghes_edac.c seems to handle this via e->grain.
> https://elixir.bootlin.com/linux/v6.18-rc4/source/drivers/edac/ghes_edac.c#L346
> 
> I haven't chased the whole path through to whether this does appropriate poisoning
> on the guest though.
> 

We have the following call trace to handle CPER error record. The e->grain
originated from the Physical Address Mask is used to come up with the size
for memory scrubbing at (a). The page isolation happens at (b) bases on the
reported Physical Address. So a larger Physical Address Mask won't help to
isolate more pages per my understanding.

do_sea
   apei_claim_sea
     ghes_notify_sea
       ghes_in_nmi_spool_from_list
         ghes_in_nmi_queue_one_entry
         irq_work_queue                          // ghes_proc_irq_work
           ghes_proc_in_irq
             ghes_do_proc
               atomic_notifier_call_chain        // (a) ghes_report_chain
                 ghes_edac_report_mem_error
                   edac_raw_mc_handle_error
                ghes_handle_memory_failure
                  ghes_do_memory_failure
                    memory_failure_cb
                      memory_failure             // (b) Isolate the page

Thanks,
Gavin

>> ---
>>   target/arm/kvm.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 50 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>> index 5b151eda3c..d7de8262da 100644
>> --- a/target/arm/kvm.c
>> +++ b/target/arm/kvm.c
>> @@ -11,6 +11,7 @@
>>    */
>>   
>>   #include "qemu/osdep.h"
>> +#include "qemu/units.h"
>>   #include <sys/ioctl.h>
>>   
>>   #include <linux/kvm.h>
>> @@ -2432,12 +2433,59 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
>>   static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
>>                                       uint64_t paddr, Error **errp)
>>   {
>> +    uint64_t val, start, end, guest_pgsz, host_pgsz;
>>       uint64_t addresses[16];
>> +    uint32_t num_of_addresses;
>> +    int ret;
>> +
>> +    /*
>> +     * Sort out the guest page size from TCR_EL1, which can be modified
>> +     * by the guest from time to time. So we have to sort it out dynamically.
>> +     */
>> +    ret = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
>> +    if (ret) {
>> +        error_setg(errp, "Error %" PRId32 " to read TCR_EL1 register", ret);
>> +        return;
>> +    }
>> +
>> +    switch (extract64(val, 14, 2)) {
>> +    case 0:
>> +        guest_pgsz = 4 * KiB;
>> +        break;
>> +    case 1:
>> +        guest_pgsz = 64 * KiB;
>> +        break;
>> +    case 2:
>> +        guest_pgsz = 16 * KiB;
>> +        break;
>> +    default:
>> +        error_setg(errp, "Unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
>> +        return;
>> +    }
>> +
>> +    host_pgsz = qemu_real_host_page_size();
>> +    start = paddr & ~(host_pgsz - 1);
>> +    end = start + host_pgsz;
>> +    num_of_addresses = 0;
>>   
>> -    addresses[0] = paddr;
>> +    while (start < end) {
>> +        /*
>> +         * The precise physical address is provided for the affected
>> +         * guest page that contains @paddr. Otherwise, the starting
>> +         * address of the guest page is provided.
>> +         */
>> +        if (paddr >= start && paddr < (start + guest_pgsz)) {
>> +            addresses[num_of_addresses++] = paddr;
>> +        } else {
>> +            addresses[num_of_addresses++] = start;
>> +        }
>> +
>> +        start += guest_pgsz;
>> +    }
>>   
>>       kvm_cpu_synchronize_state(c);
>> -    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses, 1, errp);
>> +    acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>> +                            addresses, num_of_addresses, errp);
>>       kvm_inject_arm_sea(c);
>>   }
>>   
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-11-06  3:26 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
2025-11-05 14:16   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
2025-11-05 14:16   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
2025-11-05 14:17   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
2025-11-05 14:14   ` Jonathan Cameron via
2025-11-06  3:15     ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-05 14:17   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors() Gavin Shan
2025-11-05 14:18   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
2025-11-05 14:19   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-11-05 14:37   ` Jonathan Cameron via
2025-11-06  3:26     ` Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).