public inbox for linux-kernel@vger.kernel.org
* [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject
@ 2025-01-22 15:46 Mauro Carvalho Chehab
  2025-01-22 15:46 ` [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Philippe Mathieu-Daudé, Ani Sinha,
	Cleber Rosa, Dongjiu Geng, Eduardo Habkost, Eric Blake, John Snow,
	Marcel Apfelbaum, Markus Armbruster, Michael Roth, Paolo Bonzini,
	Peter Maydell, Shannon Zhao, Yanan Wang, Zhao Liu, kvm,
	linux-kernel

Now that the ghes preparation patches were merged, let's add support
for error injection.

I'm opting to fold two patch series into one here:

1. https://lore.kernel.org/qemu-devel/20250113130854.848688-1-mchehab+huawei@kernel.org/

These are the first 5 patches, which change the math used to calculate offsets in the
HEST table and the hardware_errors firmware file, together with the corresponding
migration code. Migration was tested with both the latest QEMU release and upstream,
in both directions.

There are no changes to this series since the last submission, except for a conflict
resolution in the migration table, due to upstream changes.

For more details, see the post of my previous submission.

2. These are followed by 6 patches from:
	https://lore.kernel.org/qemu-devel/cover.1726293808.git.mchehab+huawei@kernel.org/
    containing the error injection code and script.

   They add a new QAPI to allow injecting GHESv2 errors, and a script using such QAPI
   to inject ARM Processor Error records.

PS: If I'm counting correctly, this is the 18th rebased version of this series.

Mauro Carvalho Chehab (11):
  acpi/ghes: Prepare to support multiple sources on ghes
  acpi/ghes: add a firmware file with HEST address
  acpi/ghes: Use HEST table offsets when preparing GHES records
  acpi/generic_event_device: Update GHES migration to cover hest addr
  acpi/generic_event_device: add logic to detect if HEST addr is
    available
  acpi/ghes: add a notifier to notify when error data is ready
  acpi/ghes: Cleanup the code which gets ghes ged state
  acpi/generic_event_device: add an APEI error device
  arm/virt: Wire up a GED error device for ACPI / GHES
  qapi/acpi-hest: add an interface to do generic CPER error injection
  scripts/ghes_inject: add a script to generate GHES error inject

 MAINTAINERS                            |  10 +
 hw/acpi/Kconfig                        |   5 +
 hw/acpi/aml-build.c                    |  10 +
 hw/acpi/generic_event_device.c         |  38 ++
 hw/acpi/ghes-stub.c                    |   4 +-
 hw/acpi/ghes.c                         | 184 +++++--
 hw/acpi/ghes_cper.c                    |  32 ++
 hw/acpi/ghes_cper_stub.c               |  19 +
 hw/acpi/meson.build                    |   2 +
 hw/arm/virt-acpi-build.c               |  35 +-
 hw/arm/virt.c                          |  19 +-
 hw/core/machine.c                      |   2 +
 include/hw/acpi/acpi_dev_interface.h   |   1 +
 include/hw/acpi/aml-build.h            |   2 +
 include/hw/acpi/generic_event_device.h |   1 +
 include/hw/acpi/ghes.h                 |  36 +-
 include/hw/arm/virt.h                  |   2 +
 qapi/acpi-hest.json                    |  35 ++
 qapi/meson.build                       |   1 +
 qapi/qapi-schema.json                  |   1 +
 scripts/arm_processor_error.py         | 377 +++++++++++++
 scripts/ghes_inject.py                 |  51 ++
 scripts/qmp_helper.py                  | 702 +++++++++++++++++++++++++
 target/arm/kvm.c                       |   2 +-
 24 files changed, 1517 insertions(+), 54 deletions(-)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/acpi-hest.json
 create mode 100644 scripts/arm_processor_error.py
 create mode 100755 scripts/ghes_inject.py
 create mode 100644 scripts/qmp_helper.py

-- 
2.48.1




* [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23  9:56   ` Jonathan Cameron
  2025-01-23 16:48   ` Igor Mammedov
  2025-01-22 15:46 ` [PATCH 02/11] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, Dongjiu Geng, Peter Maydell,
	Shannon Zhao, linux-kernel

The current code is actually dependent on having just one error
structure with a single source.

The number of sources should be arch-dependent, since it depends on
which synchronous/asynchronous notifications exist, so change the
logic to build the table dynamically.

Proper support, however, requires reading the number of sources from
the HEST table, and the BIOS currently doesn't store a pointer to it.

For now just change the logic at table build time, while enforcing that
it will behave like before with a single source ID.

A future patch will add a HEST table bios pointer and change the logic
at acpi_ghes_record_errors() to dynamically use the new size.
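
As a rough illustration (not part of the patch), the layout that
build_ghes_error_table() produces for a given num_sources can be sketched
as plain offset arithmetic. The SourceOffsets type and both helper names
are hypothetical, and the sketch assumes the Error Status Data Blocks
start immediately after the two 64-bit arrays, as the order of the build
steps suggests:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Space reserved per Error Status Data Block (1 KiB in hw/acpi/ghes.c). */
#define MAX_RAW_DATA_LENGTH ((size_t)1024)

/* Hypothetical helper type: offsets from the start of "etc/hardware_errors". */
typedef struct {
    size_t error_block_address; /* 64-bit pointer slot for this source */
    size_t read_ack_register;   /* 64-bit read-ack slot for this source */
    size_t error_status_block;  /* start of this source's data block */
} SourceOffsets;

/* Compute where the three per-source items live inside the blob. */
static SourceOffsets source_offsets(size_t index, size_t num_sources)
{
    SourceOffsets o;

    assert(num_sources > 0 && index < num_sources);

    /* First array: one error_block_address entry per source. */
    o.error_block_address = index * sizeof(uint64_t);
    /* Second array: one read_ack_register entry per source. */
    o.read_ack_register = (num_sources + index) * sizeof(uint64_t);
    /* Assumption: data blocks follow directly after both arrays. */
    o.error_status_block = 2 * num_sources * sizeof(uint64_t) +
                           index * MAX_RAW_DATA_LENGTH;
    return o;
}

/* Total size of the blob for a given number of sources. */
static size_t hardware_errors_blob_size(size_t num_sources)
{
    return num_sources * (2 * sizeof(uint64_t) + MAX_RAW_DATA_LENGTH);
}
```

With a single source this reduces to the previous fixed layout: the
address slot at offset 0 and the read-ack slot at offset 8.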

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/acpi/ghes.c           | 43 ++++++++++++++++++++++++++--------------
 hw/arm/virt-acpi-build.c |  5 +++++
 include/hw/acpi/ghes.h   | 21 +++++++++++++-------
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b709c177cdea..3f519ccab90d 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -206,17 +206,26 @@ ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
  * See docs/specs/acpi_hest_ghes.rst for blobs format.
  */
-static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
+                                   int num_sources)
 {
     int i, error_status_block_offset;
 
+    /*
+     * TODO: Current version supports only one source.
+     * A further patch will drop this check, after adding a proper migration
+     * code, as, for the code to work, we need to store a bios pointer to the
+     * HEST table.
+     */
+    assert(num_sources == 1);
+
     /* Build error_block_address */
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < num_sources; i++) {
         build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
     }
 
     /* Build read_ack_register */
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < num_sources; i++) {
         /*
          * Initialize the value of read_ack_register to 1, so GHES can be
          * writable after (re)boot.
@@ -231,13 +240,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
 
     /* Reserve space for Error Status Data Block */
     acpi_data_push(hardware_errors,
-        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+        ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
 
     /* Tell guest firmware to place hardware_errors blob into RAM */
     bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
                              hardware_errors, sizeof(uint64_t), false);
 
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < num_sources; i++) {
         /*
          * Tell firmware to patch error_block_address entries to point to
          * corresponding "Generic Error Status Block"
@@ -263,10 +272,12 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
 /* Build Generic Hardware Error Source version 2 (GHESv2) */
 static void build_ghes_v2(GArray *table_data,
                           BIOSLinker *linker,
-                          enum AcpiGhesNotifyType notify,
-                          uint16_t source_id)
+                          const AcpiNotificationSourceId *notif_src,
+                          uint16_t index, int num_sources)
 {
     uint64_t address_offset;
+    const uint16_t notify = notif_src->notify;
+    const uint16_t source_id = notif_src->source_id;
 
     /*
      * Type:
@@ -297,7 +308,7 @@ static void build_ghes_v2(GArray *table_data,
                                    address_offset + GAS_ADDR_OFFSET,
                                    sizeof(uint64_t),
                                    ACPI_HW_ERROR_FW_CFG_FILE,
-                                   source_id * sizeof(uint64_t));
+                                   index * sizeof(uint64_t));
 
     /* Notification Structure */
     build_ghes_hw_error_notification(table_data, notify);
@@ -317,8 +328,7 @@ static void build_ghes_v2(GArray *table_data,
                                    address_offset + GAS_ADDR_OFFSET,
                                    sizeof(uint64_t),
                                    ACPI_HW_ERROR_FW_CFG_FILE,
-                                   (ACPI_GHES_ERROR_SOURCE_COUNT + source_id)
-                                   * sizeof(uint64_t));
+                                   (num_sources + index) * sizeof(uint64_t));
 
     /*
      * Read Ack Preserve field
@@ -333,19 +343,23 @@ static void build_ghes_v2(GArray *table_data,
 /* Build Hardware Error Source Table */
 void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
                      BIOSLinker *linker,
+                     const AcpiNotificationSourceId * const notif_source,
+                     int num_sources,
                      const char *oem_id, const char *oem_table_id)
 {
     AcpiTable table = { .sig = "HEST", .rev = 1,
                         .oem_id = oem_id, .oem_table_id = oem_table_id };
+    int i;
 
-    build_ghes_error_table(hardware_errors, linker);
+    build_ghes_error_table(hardware_errors, linker, num_sources);
 
     acpi_table_begin(&table, table_data);
 
     /* Error Source Count */
-    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
-    build_ghes_v2(table_data, linker,
-                  ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
+    build_append_int_noprefix(table_data, num_sources, 4);
+    for (i = 0; i < num_sources; i++) {
+        build_ghes_v2(table_data, linker, &notif_source[i], i, num_sources);
+    }
 
     acpi_table_end(linker, &table);
 }
@@ -410,7 +424,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     }
     ags = &acpi_ged_state->ghes_state;
 
-    assert(ACPI_GHES_ERROR_SOURCE_COUNT == 1);
     get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
                          &cper_addr, &read_ack_register_addr);
 
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 3ac8f8e17861..3d411787fc37 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -893,6 +893,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
     g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
 }
 
+static const AcpiNotificationSourceId hest_ghes_notify[] = {
+    { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
+};
+
 static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
@@ -948,6 +952,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     if (vms->ras) {
         acpi_add_table(table_offsets, tables_blob);
         acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
                         vms->oem_id, vms->oem_table_id);
     }
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 39619a2457cb..9f0120d0d596 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -57,20 +57,27 @@ enum AcpiGhesNotifyType {
     ACPI_GHES_NOTIFY_RESERVED = 12
 };
 
-enum {
-    ACPI_HEST_SRC_ID_SEA = 0,
-    /* future ids go here */
-
-    ACPI_GHES_ERROR_SOURCE_COUNT
-};
-
 typedef struct AcpiGhesState {
     uint64_t hw_error_le;
     bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
+/*
+ * ID numbers used to fill HEST source ID field
+ */
+enum AcpiGhesSourceID {
+    ACPI_HEST_SRC_ID_SYNC,
+};
+
+typedef struct AcpiNotificationSourceId {
+    enum AcpiGhesSourceID source_id;
+    enum AcpiGhesNotifyType notify;
+} AcpiNotificationSourceId;
+
 void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
                      BIOSLinker *linker,
+                     const AcpiNotificationSourceId * const notif_source,
+                     int num_sources,
                      const char *oem_id, const char *oem_table_id);
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
-- 
2.48.1



* [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
  2025-01-22 15:46 ` [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 10:02   ` Jonathan Cameron
  2025-01-29 13:33   ` Igor Mammedov
  2025-01-22 15:46 ` [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, Dongjiu Geng, linux-kernel

Store the HEST table address (a GPA) in a new firmware file, keeping
its content in the hest_addr_le variable.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

---

Change from v8:
- hest_addr_le now points to the error source count and data.

---
 hw/acpi/ghes.c         | 17 ++++++++++++++++-
 include/hw/acpi/ghes.h |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 3f519ccab90d..34e3364d3fd8 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -30,6 +30,7 @@
 
 #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
 #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
+#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
 
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
@@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
     }
 
     /*
-     * tell firmware to write hardware_errors GPA into
+     * Tell firmware to write hardware_errors GPA into
      * hardware_errors_addr fw_cfg, once the former has been initialized.
      */
     bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
@@ -355,6 +356,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
 
     acpi_table_begin(&table, table_data);
 
+    int hest_offset = table_data->len;
+
     /* Error Source Count */
     build_append_int_noprefix(table_data, num_sources, 4);
     for (i = 0; i < num_sources; i++) {
@@ -362,6 +365,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
     }
 
     acpi_table_end(linker, &table);
+
+    /*
+     * tell firmware to write into GPA the address of HEST via fw_cfg,
+     * once initialized.
+     */
+    bios_linker_loader_write_pointer(linker,
+                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+                                     sizeof(uint64_t),
+                                     ACPI_BUILD_TABLE_FILE, hest_offset);
 }
 
 void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -375,6 +387,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
 
+    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+
     ags->present = true;
 }
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 9f0120d0d596..237721fec0a2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
 };
 
 typedef struct AcpiGhesState {
+    uint64_t hest_addr_le;
     uint64_t hw_error_le;
     bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
-- 
2.48.1



* [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
  2025-01-22 15:46 ` [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
  2025-01-22 15:46 ` [PATCH 02/11] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 10:29   ` Jonathan Cameron
  2025-01-22 15:46 ` [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, Dongjiu Geng, linux-kernel

There are two pointers that are needed during error injection:

1. The start address of the CPER block to be stored;
2. The address of the read ack register, which must be reset before the
   next error.

It is preferable to calculate them from the HEST table.  This allows
checking the source ID, the size of the table and the type of the
HEST error block structures.

Still, keep the old code, as it is needed for migration.
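
For illustration only, the HEST walk can be sketched over a plain byte
buffer instead of guest memory. This is a simplified stand-in, not the
code in the patch: find_ghes_v2_source() and the rd_le*() helpers are
hypothetical names, the buffer is assumed to start at the Error Source
Count field, and the source ID is converted from little-endian here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GHES_V2_TYPE        10 /* HEST type value for GHESv2 */
#define GHES_V2_SIZE        92 /* size of one GHESv2 structure */
#define GAS_ADDR_OFFSET     4  /* address field inside a GAS */
#define ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)
#define ACK_OFFSET          (64 + GAS_ADDR_OFFSET)

/* Read little-endian values regardless of host byte order. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Find the GHESv2 entry whose source ID matches and report, as offsets
 * into the buffer, where its Error Status Address and Read Ack Register
 * address fields are. Returns 0 on success, -1 if the source is missing,
 * an unsupported structure type is met, or the buffer is too short.
 */
static int find_ghes_v2_source(const uint8_t *hest, size_t len,
                               uint16_t source_id,
                               size_t *err_st_addr_off, size_t *ack_off)
{
    uint32_t num_sources, i;
    size_t pos = sizeof(uint32_t);

    assert(hest && err_st_addr_off && ack_off);

    if (len < pos) {
        return -1;
    }
    num_sources = rd_le32(hest);

    for (i = 0; i < num_sources; i++) {
        if (len - pos < GHES_V2_SIZE) {
            return -1; /* truncated table */
        }
        if (rd_le16(hest + pos) != GHES_V2_TYPE) {
            return -1; /* only the GHESv2 structure size is known */
        }
        if (rd_le16(hest + pos + 2) == source_id) {
            *err_st_addr_off = pos + ERR_ST_ADDR_OFFSET;
            *ack_off = pos + ACK_OFFSET;
            return 0;
        }
        pos += GHES_V2_SIZE;
    }
    return -1;
}
```

The two offsets returned correspond to the fields from which the CPER
block address and the read ack address are then read.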

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 88 insertions(+), 10 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 34e3364d3fd8..b46b563bcaf8 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -61,6 +61,23 @@
  */
 #define ACPI_GHES_GESB_SIZE                 20
 
+/*
+ * Offsets with regards to the start of the HEST table stored at
+ * ags->hest_addr_le, according with the memory layout map at
+ * docs/specs/acpi_hest_ghes.rst.
+ */
+
+/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
+ * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
+ */
+#define HEST_GHES_V2_TABLE_SIZE  92
+#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)
+
+/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
+ * Table 18-380: 'Error Status Address' field
+ */
+#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)
+
 /*
  * Values for error_severity field
  */
@@ -212,14 +229,6 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
 {
     int i, error_status_block_offset;
 
-    /*
-     * TODO: Current version supports only one source.
-     * A further patch will drop this check, after adding a proper migration
-     * code, as, for the code to work, we need to store a bios pointer to the
-     * HEST table.
-     */
-    assert(num_sources == 1);
-
     /* Build error_block_address */
     for (i = 0; i < num_sources; i++) {
         build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
@@ -419,6 +428,70 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
     *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
+static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
+                                    uint64_t *cper_addr,
+                                    uint64_t *read_ack_start_addr,
+                                    Error **errp)
+{
+    uint64_t hest_err_block_addr, hest_read_ack_addr;
+    uint64_t err_source_struct, error_block_addr;
+    uint32_t num_sources, i;
+
+    if (!hest_addr) {
+        return;
+    }
+
+    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
+    num_sources = le32_to_cpu(num_sources);
+
+    err_source_struct = hest_addr + sizeof(num_sources);
+
+    /*
+     * Currently, HEST Error source navigates only for GHESv2 tables
+     */
+
+    for (i = 0; i < num_sources; i++) {
+        uint64_t addr = err_source_struct;
+        uint16_t type, src_id;
+
+        cpu_physical_memory_read(addr, &type, sizeof(type));
+        type = le16_to_cpu(type);
+
+        /* For now, we only know the size of GHESv2 table */
+        if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
+            error_setg(errp, "HEST: type %d not supported.", type);
+            return;
+        }
+
+        /* Compare CPER source address at the GHESv2 structure */
+        addr += sizeof(type);
+        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
+
+        if (src_id == source_id) {
+            break;
+        }
+
+        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
+    }
+    if (i == num_sources) {
+        error_setg(errp, "HEST: Source %d not found.", source_id);
+        return;
+    }
+
+    /* Navigate though table address pointers */
+    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
+    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;
+
+    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
+                             sizeof(error_block_addr));
+
+    cpu_physical_memory_read(error_block_addr, cper_addr,
+                             sizeof(*cper_addr));
+
+    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
+                             sizeof(*read_ack_start_addr));
+}
+
 void ghes_record_cper_errors(const void *cper, size_t len,
                              uint16_t source_id, Error **errp)
 {
@@ -439,8 +512,13 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     }
     ags = &acpi_ged_state->ghes_state;
 
-    get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
-                         &cper_addr, &read_ack_register_addr);
+    if (!ags->hest_addr_le) {
+        get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
+                             &cper_addr, &read_ack_register_addr);
+    } else {
+        get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
+                                &cper_addr, &read_ack_register_addr, errp);
+    }
 
     if (!cper_addr) {
         error_setg(errp, "can not find Generic Error Status Block");
-- 
2.48.1



* [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 10:31   ` Jonathan Cameron
  2025-01-24 10:08   ` Igor Mammedov
  2025-01-22 15:46 ` [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, linux-kernel

The GHES migration logic at GED should now support HEST table
location too.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/generic_event_device.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index c85d97ca3776..5346cae573b7 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -386,6 +386,34 @@ static const VMStateDescription vmstate_ghes_state = {
     }
 };
 
+static const VMStateDescription vmstate_hest = {
+    .name = "acpi-hest",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool hest_needed(void *opaque)
+{
+    AcpiGedState *s = opaque;
+    return s->ghes_state.hest_addr_le;
+}
+
+static const VMStateDescription vmstate_hest_state = {
+    .name = "acpi-ged/hest",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = hest_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
+                       vmstate_hest, AcpiGhesState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_acpi_ged = {
     .name = "acpi-ged",
     .version_id = 1,
@@ -398,6 +426,7 @@ static const VMStateDescription vmstate_acpi_ged = {
         &vmstate_memhp_state,
         &vmstate_cpuhp_state,
         &vmstate_ghes_state,
+        &vmstate_hest_state,
         NULL
     }
 };
-- 
2.48.1



* [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 10:52   ` Jonathan Cameron
  2025-01-24 10:23   ` Igor Mammedov
  2025-01-22 15:46 ` [PATCH 06/11] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Philippe Mathieu-Daudé, Ani Sinha,
	Dongjiu Geng, Eduardo Habkost, Marcel Apfelbaum, Peter Maydell,
	Shannon Zhao, Yanan Wang, Zhao Liu, linux-kernel

Create a new property (x-has-hest-addr) and use it to detect whether
the GHES table offsets can be calculated from the HEST address
(machine types newer than 9.2) or must be obtained the legacy way,
from an offset in the hardware_errors firmware file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/generic_event_device.c |  1 +
 hw/acpi/ghes.c                 | 28 +++++++++++++++++++++-------
 hw/arm/virt-acpi-build.c       | 30 ++++++++++++++++++++++++++----
 hw/core/machine.c              |  2 ++
 include/hw/acpi/ghes.h         |  1 +
 5 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 5346cae573b7..fe537ed05c66 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
 
 static const Property acpi_ged_properties[] = {
     DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
+    DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),
 };
 
 static const VMStateDescription vmstate_memhp_state = {
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b46b563bcaf8..86c97f60d6a0 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -359,6 +359,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
 {
     AcpiTable table = { .sig = "HEST", .rev = 1,
                         .oem_id = oem_id, .oem_table_id = oem_table_id };
+    AcpiGedState *acpi_ged_state;
+    AcpiGhesState *ags = NULL;
     int i;
 
     build_ghes_error_table(hardware_errors, linker, num_sources);
@@ -379,10 +381,20 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
      * tell firmware to write into GPA the address of HEST via fw_cfg,
      * once initialized.
      */
-    bios_linker_loader_write_pointer(linker,
-                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
-                                     sizeof(uint64_t),
-                                     ACPI_BUILD_TABLE_FILE, hest_offset);
+
+    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+                                                       NULL));
+    if (!acpi_ged_state) {
+        return;
+    }
+
+    ags = &acpi_ged_state->ghes_state;
+    if (ags->hest_lookup) {
+        bios_linker_loader_write_pointer(linker,
+                                         ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+                                         sizeof(uint64_t),
+                                         ACPI_BUILD_TABLE_FILE, hest_offset);
+    }
 }
 
 void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -396,8 +408,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
 
-    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
-        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+    if (ags && ags->hest_lookup) {
+        fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+            NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+    }
 
     ags->present = true;
 }
@@ -512,7 +526,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     }
     ags = &acpi_ged_state->ghes_state;
 
-    if (!ags->hest_addr_le) {
+    if (!ags->hest_lookup) {
         get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
                              &cper_addr, &read_ack_register_addr);
     } else {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 3d411787fc37..ada5d08cfbe7 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -897,6 +897,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = {
     { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
 };
 
+static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
+    { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
+};
+
 static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
@@ -950,10 +954,28 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     build_dbg2(tables_blob, tables->linker, vms);
 
     if (vms->ras) {
-        acpi_add_table(table_offsets, tables_blob);
-        acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
-                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
-                        vms->oem_id, vms->oem_table_id);
+        AcpiGhesState *ags;
+        AcpiGedState *acpi_ged_state;
+
+        acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+                                                       NULL));
+        if (acpi_ged_state) {
+            ags = &acpi_ged_state->ghes_state;
+
+            acpi_add_table(table_offsets, tables_blob);
+
+            if (!ags->hest_lookup) {
+                acpi_build_hest(tables_blob, tables->hardware_errors,
+                                tables->linker, hest_ghes_notify_9_2,
+                                ARRAY_SIZE(hest_ghes_notify_9_2),
+                                vms->oem_id, vms->oem_table_id);
+            } else {
+                acpi_build_hest(tables_blob, tables->hardware_errors,
+                                tables->linker, hest_ghes_notify,
+                                ARRAY_SIZE(hest_ghes_notify),
+                                vms->oem_id, vms->oem_table_id);
+            }
+        }
     }
 
     if (ms->numa_state->num_nodes > 0) {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c23b39949649..0d0cde481954 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -34,10 +34,12 @@
 #include "hw/virtio/virtio-pci.h"
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/acpi/generic_event_device.h"
 #include "audio/audio.h"
 
 GlobalProperty hw_compat_9_2[] = {
     {"arm-cpu", "backcompat-pauth-default-use-qarma5", "true"},
+    { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
 };
 const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 237721fec0a2..164ed8b0f9a3 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -61,6 +61,7 @@ typedef struct AcpiGhesState {
     uint64_t hest_addr_le;
     uint64_t hw_error_le;
     bool present; /* True if GHES is present at all on this board */
+    bool hest_lookup; /* True if HEST address is present */
 } AcpiGhesState;
 
 /*
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 06/11] acpi/ghes: add a notifier to notify when error data is ready
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 10:52   ` Jonathan Cameron
  2025-01-22 15:46 ` [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, Dongjiu Geng, linux-kernel

Some error injection notify methods are async, like GPIO
notify. Add a notifier to be used when the error record is
ready to be sent to the guest OS.
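
As a rough illustration of the ordering this patch relies on (write the
record first, then notify listeners), here is a minimal Python model. It
is not QEMU code: NotifierList, record_cper_errors() and the 16-byte
buffer are hypothetical stand-ins for the C notifier list and for guest
memory.

```python
from typing import Any, Callable, List

Notify = Callable[[Any], None]


class NotifierList:
    """Toy model of a notifier list: callbacks run when data is ready."""

    def __init__(self) -> None:
        self._notifiers: List[Notify] = []

    def add(self, notify: Notify) -> None:
        self._notifiers.append(notify)

    def notify(self, data: Any) -> None:
        # Iterate over a copy so a callback may register or drop entries.
        for fn in list(self._notifiers):
            fn(data)


generic_error_notifiers = NotifierList()
events: List[str] = []


def record_cper_errors(cper: bytes, guest_memory: bytearray) -> None:
    """Write the record first, then tell listeners that it is ready."""
    guest_memory[:len(cper)] = cper
    generic_error_notifiers.notify(None)


# A board would register its own handler; here we only log the event.
generic_error_notifiers.add(lambda _data: events.append("GED error event"))

memory = bytearray(16)
record_cper_errors(b"\x01\x02", memory)
```

An async notification method (GPIO, in this series) can then be
triggered from the registered callback instead of from the code that
writes the record.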

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c         | 5 ++++-
 include/hw/acpi/ghes.h | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 86c97f60d6a0..961fc38ea8f5 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -506,6 +506,9 @@ static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
                              sizeof(*read_ack_start_addr));
 }
 
+NotifierList acpi_generic_error_notifiers =
+    NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
+
 void ghes_record_cper_errors(const void *cper, size_t len,
                              uint16_t source_id, Error **errp)
 {
@@ -561,7 +564,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     /* Write the generic error data entry into guest memory */
     cpu_physical_memory_write(cper_addr, cper, len);
 
-    return;
+    notifier_list_notify(&acpi_generic_error_notifiers, NULL);
 }
 
 int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 164ed8b0f9a3..2e8405edfe27 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -24,6 +24,9 @@
 
 #include "hw/acpi/bios-linker-loader.h"
 #include "qapi/error.h"
+#include "qemu/notify.h"
+
+extern NotifierList acpi_generic_error_notifiers;
 
 /*
  * Values for Hardware Error Notification Type field
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 06/11] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 10:54   ` Jonathan Cameron
  2025-01-24 12:25   ` Igor Mammedov
  2025-01-22 15:46 ` [PATCH 08/11] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
	Peter Maydell, kvm, linux-kernel

Move the check logic into a common function and simplify the
code which checks if GHES is enabled and was properly set up.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes-stub.c    |  4 ++--
 hw/acpi/ghes.c         | 33 +++++++++++----------------------
 include/hw/acpi/ghes.h |  9 +++++----
 target/arm/kvm.c       |  2 +-
 4 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 7cec1812dad9..fbabf955155a 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -16,7 +16,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
     return -1;
 }
 
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
 {
-    return false;
+    return NULL;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 961fc38ea8f5..5d29db3918dd 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -420,10 +420,6 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
                                  uint64_t *cper_addr,
                                  uint64_t *read_ack_register_addr)
 {
-    if (!ghes_addr) {
-        return;
-    }
-
     /*
      * non-HEST version supports only one source, so no need to change
      * the start offset based on the source ID. Also, we can't validate
@@ -451,10 +447,6 @@ static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
     uint64_t err_source_struct, error_block_addr;
     uint32_t num_sources, i;
 
-    if (!hest_addr) {
-        return;
-    }
-
     cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
     num_sources = le32_to_cpu(num_sources);
 
@@ -513,7 +505,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
                              uint16_t source_id, Error **errp)
 {
     uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
-    AcpiGedState *acpi_ged_state;
     AcpiGhesState *ags;
 
     if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
@@ -521,13 +512,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
         return;
     }
 
-    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
-                                                       NULL));
-    if (!acpi_ged_state) {
-        error_setg(errp, "Can't find ACPI_GED object");
+    ags = acpi_ghes_get_state();
+    if (!ags) {
         return;
     }
-    ags = &acpi_ged_state->ghes_state;
 
     if (!ags->hest_lookup) {
         get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
@@ -537,11 +525,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
                                 &cper_addr, &read_ack_register_addr, errp);
     }
 
-    if (!cper_addr) {
-        error_setg(errp, "can not find Generic Error Status Block");
-        return;
-    }
-
     cpu_physical_memory_read(read_ack_register_addr,
                              &read_ack_register, sizeof(read_ack_register));
 
@@ -605,7 +588,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
     return 0;
 }
 
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
 {
     AcpiGedState *acpi_ged_state;
     AcpiGhesState *ags;
@@ -614,8 +597,14 @@ bool acpi_ghes_present(void)
                                                        NULL));
 
     if (!acpi_ged_state) {
-        return false;
+        return NULL;
     }
     ags = &acpi_ged_state->ghes_state;
-    return ags->present;
+    if (!ags->present) {
+        return NULL;
+    }
+    if (!ags->hw_error_le && !ags->hest_addr_le) {
+        return NULL;
+    }
+    return ags;
 }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 2e8405edfe27..64fe2b5bea65 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -91,10 +91,11 @@ void ghes_record_cper_errors(const void *cper, size_t len,
                              uint16_t source_id, Error **errp);
 
 /**
- * acpi_ghes_present: Report whether ACPI GHES table is present
+ * acpi_ghes_get_state: Get a pointer for ACPI ghes state
  *
- * Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_memory_errors() to record a memory error.
+ * Returns: a pointer to ghes state if the system has an ACPI GHES table,
+ * it is enabled and it is safe to call acpi_ghes_memory_errors() to record
+ * a memory error. Returns NULL otherwise.
  */
-bool acpi_ghes_present(void);
+AcpiGhesState *acpi_ghes_get_state(void);
 #endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb2349..0283089713b9 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2369,7 +2369,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
 
     assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
 
-    if (acpi_ghes_present() && addr) {
+    if (acpi_ghes_get_state() && addr) {
         ram_addr = qemu_ram_addr_from_host(addr);
         if (ram_addr != RAM_ADDR_INVALID &&
             kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 08/11] acpi/generic_event_device: add an APEI error device
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (6 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-24 12:30   ` Igor Mammedov
  2025-01-22 15:46 ` [PATCH 09/11] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, linux-kernel

Add a generic error device to handle generic hardware error
events, as specified in section 18.3.2.7.2 of the ACPI 6.5 specification:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
The device uses HID PNP0C33.

The PNP0C33 device is used to report hardware errors to
the guest via ACPI APEI Generic Hardware Error Source (GHES).
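
To make the event routing easier to follow, the sketch below models in
Python how a status bit is turned into a GED selector. The numeric
values are the ones visible in this patch; the ged_selector() helper and
the table name are hypothetical, and only a subset of the events is
modelled.

```python
from typing import List, Optional, Tuple

# AcpiEventStatusBits values shown in acpi_dev_interface.h
ACPI_NVDIMM_HOTPLUG_STATUS = 16
ACPI_POWER_DOWN_STATUS = 64
ACPI_GENERIC_ERROR = 128

# GED selector values shown in generic_event_device.h
ACPI_GED_PWR_DOWN_EVT = 0x2
ACPI_GED_NVDIMM_HOTPLUG_EVT = 0x4
ACPI_GED_ERROR_EVT = 0x10

# Checked in order, like the if/else-if chain in acpi_ged_send_event().
SELECTORS: List[Tuple[int, int]] = [
    (ACPI_POWER_DOWN_STATUS, ACPI_GED_PWR_DOWN_EVT),
    (ACPI_GENERIC_ERROR, ACPI_GED_ERROR_EVT),
    (ACPI_NVDIMM_HOTPLUG_STATUS, ACPI_GED_NVDIMM_HOTPLUG_EVT),
]


def ged_selector(ev: int) -> Optional[int]:
    """Return the selector for the first matching status bit, if any."""
    for status_bit, selector in SELECTORS:
        if ev & status_bit:
            return selector
    return None
```

Because the chain is ordered, only the first matching status bit picks
the selector when several bits are set in one call.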

Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/acpi/aml-build.c                    | 10 ++++++++++
 hw/acpi/generic_event_device.c         |  8 ++++++++
 include/hw/acpi/acpi_dev_interface.h   |  1 +
 include/hw/acpi/aml-build.h            |  2 ++
 include/hw/acpi/generic_event_device.h |  1 +
 5 files changed, 22 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index f8f93a9f66c8..e4bd7b611372 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2614,3 +2614,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source)
 
     return var;
 }
+
+/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
+Aml *aml_error_device(void)
+{
+    Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
+    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+    return dev;
+}
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index fe537ed05c66..ce00c80054f4 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
     ACPI_GED_PWR_DOWN_EVT,
     ACPI_GED_NVDIMM_HOTPLUG_EVT,
     ACPI_GED_CPU_HOTPLUG_EVT,
+    ACPI_GED_ERROR_EVT,
 };
 
 /*
@@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
                            aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
                                       aml_int(0x80)));
                 break;
+            case ACPI_GED_ERROR_EVT:
+                aml_append(if_ctx,
+                           aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
+                                      aml_int(0x80)));
+                break;
             case ACPI_GED_NVDIMM_HOTPLUG_EVT:
                 aml_append(if_ctx,
                            aml_notify(aml_name("\\_SB.NVDR"),
@@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
         sel = ACPI_GED_MEM_HOTPLUG_EVT;
     } else if (ev & ACPI_POWER_DOWN_STATUS) {
         sel = ACPI_GED_PWR_DOWN_EVT;
+    } else if (ev & ACPI_GENERIC_ERROR) {
+        sel = ACPI_GED_ERROR_EVT;
     } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
         sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
     } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50aa..8294f8f0ccca 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
     ACPI_NVDIMM_HOTPLUG_STATUS = 16,
     ACPI_VMGENID_CHANGE_STATUS = 32,
     ACPI_POWER_DOWN_STATUS = 64,
+    ACPI_GENERIC_ERROR = 128,
 } AcpiEventStatusBits;
 
 #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index c18f68134246..f38e12971932 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -252,6 +252,7 @@ struct CrsRangeSet {
 /* Consumer/Producer */
 #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY        (1 << 1)
 
+#define ACPI_APEI_ERROR_DEVICE   "GEDD"
 /**
  * init_aml_allocator:
  *
@@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz,
              uint8_t channel);
 Aml *aml_sleep(uint64_t msec);
 Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
+Aml *aml_error_device(void);
 
 /* Block AML object primitives */
 Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
index d2dac87b4a9f..1c18ac296fcb 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -101,6 +101,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
 #define ACPI_GED_PWR_DOWN_EVT      0x2
 #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
 #define ACPI_GED_CPU_HOTPLUG_EVT    0x8
+#define ACPI_GED_ERROR_EVT          0x10
 
 typedef struct GEDState {
     MemoryRegion evt;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 09/11] arm/virt: Wire up a GED error device for ACPI / GHES
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (7 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 08/11] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 10:56   ` Jonathan Cameron
  2025-01-22 15:46 ` [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, Peter Maydell, Shannon Zhao,
	linux-kernel

Add support to the ARM virt machine for handling the generic
error ACPI event via the GED and the error source device.

It is aligned with Linux Kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/

Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>

---

Changes from v8:

- Added a call to the function that produces GHES generic
  records, as this is now added earlier in this series.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/arm/virt-acpi-build.c |  1 +
 hw/arm/virt.c            | 12 +++++++++++-
 include/hw/arm/virt.h    |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ada5d08cfbe7..ae60268bdcc2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -861,6 +861,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     }
 
     acpi_dsdt_add_power_button(scope);
+    aml_append(scope, aml_error_device());
 #ifdef CONFIG_TPM
     acpi_dsdt_add_tpm(scope, vms);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 99e0a68b6c55..e272b35ea114 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -678,7 +678,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
     DeviceState *dev;
     MachineState *ms = MACHINE(vms);
     int irq = vms->irqmap[VIRT_ACPI_GED];
-    uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+    uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
 
     if (ms->ram_slots) {
         event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -1010,6 +1010,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
     }
 }
 
+static void virt_generic_error_req(Notifier *n, void *opaque)
+{
+    VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
+
+    acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
+}
+
 static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
                              uint32_t phandle)
 {
@@ -2404,6 +2411,9 @@ static void machvirt_init(MachineState *machine)
 
     if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
         vms->acpi_dev = create_acpi_ged(vms);
+        vms->generic_error_notifier.notify = virt_generic_error_req;
+        notifier_list_add(&acpi_generic_error_notifiers,
+                          &vms->generic_error_notifier);
     } else {
         create_gpio_devices(vms, VIRT_GPIO, sysmem);
     }
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index c8e94e6aedc9..f3cf28436770 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -176,6 +176,7 @@ struct VirtMachineState {
     DeviceState *gic;
     DeviceState *acpi_dev;
     Notifier powerdown_notifier;
+    Notifier generic_error_notifier;
     PCIBus *bus;
     char *oem_id;
     char *oem_table_id;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (8 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 09/11] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 11:00   ` Jonathan Cameron
  2025-01-24 12:38   ` Igor Mammedov
  2025-01-22 15:46 ` [PATCH 11/11] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
  2025-01-24 12:47 ` [PATCH 00/11] Change ghes to use HEST-based offsets and add support for " Igor Mammedov
  11 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Ani Sinha, Dongjiu Geng, Eric Blake,
	Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
	Shannon Zhao, linux-kernel

Create a QMP command to be used for generic ACPI APEI hardware error
injection (HEST) via GHESv2, and add support for it on ARM guests.

Error injection uses the ACPI_HEST_SRC_ID_QMP source ID to be platform
independent. The source ID is mapped in the arch-specific virt bindings,
depending on the notification types supported by QEMU and by the BIOS.
On ARM, it is supported via the ACPI_GHES_NOTIFY_GPIO notification type.
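
For reference, a QMP message for the new command could be built as in
the Python sketch below. The command name and the "cper" argument come
from qapi/acpi-hest.json in this patch; build_inject_command() and the
placeholder bytes are hypothetical and do not form a valid CPER record.

```python
import base64
import json


def build_inject_command(cper: bytes) -> str:
    """Return the JSON text of an inject-ghes-error QMP command."""
    # The schema expects the raw CPER record as a base64 encoded string.
    payload = base64.b64encode(cper).decode("ascii")
    command = {
        "execute": "inject-ghes-error",
        "arguments": {"cper": payload},
    }
    return json.dumps(command)


# Placeholder bytes only; a real record must follow the UEFI CPER layout.
raw_record = bytes(range(8))
message = build_inject_command(raw_record)
```

The resulting string is what a QMP client would send over the monitor
socket after capabilities negotiation.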

This patch is co-authored:
    - original ghes logic to inject a simple ARM record by Shiju Jose;
    - generic logic to handle block addresses by Jonathan Cameron;
    - generic GHESv2 error inject by Mauro Carvalho Chehab;

Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

---

Changes since v9:
- ARM source IDs renamed to reflect SYNC/ASYNC;
- command name changed to better reflect what it does;
- some improvements at JSON documentation;
- add a check for QMP source at the notification logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 MAINTAINERS              |  7 +++++++
 hw/acpi/Kconfig          |  5 +++++
 hw/acpi/ghes.c           |  2 +-
 hw/acpi/ghes_cper.c      | 32 ++++++++++++++++++++++++++++++++
 hw/acpi/ghes_cper_stub.c | 19 +++++++++++++++++++
 hw/acpi/meson.build      |  2 ++
 hw/arm/virt-acpi-build.c |  1 +
 hw/arm/virt.c            |  7 +++++++
 include/hw/acpi/ghes.h   |  1 +
 include/hw/arm/virt.h    |  1 +
 qapi/acpi-hest.json      | 35 +++++++++++++++++++++++++++++++++++
 qapi/meson.build         |  1 +
 qapi/qapi-schema.json    |  1 +
 13 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/acpi-hest.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 846b81e3ec03..8e1f662fa0e0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
 F: include/hw/acpi/ghes.h
 F: docs/specs/acpi_hest_ghes.rst
 
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+S: Maintained
+F: hw/acpi/ghes_cper.c
+F: hw/acpi/ghes_cper_stub.c
+F: qapi/acpi-hest.json
+
 ppc4xx
 L: qemu-ppc@nongnu.org
 S: Orphan
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 1d4e9f0845c0..daabbe6cd11e 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,11 @@ config ACPI_APEI
     bool
     depends on ACPI
 
+config GHES_CPER
+    bool
+    depends on ACPI_APEI
+    default y
+
 config ACPI_PCI
     bool
     depends on ACPI && PCI
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 5d29db3918dd..cf83c959b5ef 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -547,7 +547,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     /* Write the generic error data entry into guest memory */
     cpu_physical_memory_write(cper_addr, cper, len);
 
-    notifier_list_notify(&acpi_generic_error_notifiers, NULL);
+    notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
 }
 
 int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
new file mode 100644
index 000000000000..02c47b41b990
--- /dev/null
+++ b/hw/acpi/ghes_cper.c
@@ -0,0 +1,32 @@
+/*
+ * CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/base64.h"
+#include "qemu/error-report.h"
+#include "qemu/uuid.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_error(const char *qmp_cper, Error **errp)
+{
+    g_autofree uint8_t *cper = NULL;
+    size_t len;
+
+    /* qbase64_decode() sets errp on failure; don't set it a second time */
+    cper = qbase64_decode(qmp_cper, -1, &len, errp);
+    if (!cper) {
+        return;
+    }
+
+    /* The decoded buffer is released by g_autofree when we return */
+    ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
+}
diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
new file mode 100644
index 000000000000..8782e2c02fa8
--- /dev/null
+++ b/hw/acpi/ghes_cper_stub.c
@@ -0,0 +1,19 @@
+/*
+ * Stub interface for CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_error(const char *cper, Error **errp)
+{
+    error_setg(errp, "GHES QMP error inject is not compiled in");
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index 73f02b96912b..56b5d1ec9691 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -34,4 +34,6 @@ endif
 system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
 system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
 system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
+system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
+system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
 system_ss.add(files('acpi-qmp-cmds.c'))
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index ae60268bdcc2..d094212ce584 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -896,6 +896,7 @@ static void acpi_align_size(GArray *blob, unsigned align)
 
 static const AcpiNotificationSourceId hest_ghes_notify[] = {
     { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
+    { ACPI_HEST_SRC_ID_QMP, ACPI_GHES_NOTIFY_GPIO },
 };
 
 static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e272b35ea114..9074a540197d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1012,6 +1012,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
 
 static void virt_generic_error_req(Notifier *n, void *opaque)
 {
+    uint16_t *source_id = opaque;
+
+    /* Currently, only QMP source ID is async */
+    if (*source_id != ACPI_HEST_SRC_ID_QMP) {
+        return;
+    }
+
     VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
 
     acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 64fe2b5bea65..078d78666f91 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -72,6 +72,7 @@ typedef struct AcpiGhesState {
  */
 enum AcpiGhesSourceID {
     ACPI_HEST_SRC_ID_SYNC,
+    ACPI_HEST_SRC_ID_QMP,       /* Use it only for QMP injected errors */
 };
 
 typedef struct AcpiNotificationSourceId {
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index f3cf28436770..56f270f61cf5 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -33,6 +33,7 @@
 #include "exec/hwaddr.h"
 #include "qemu/notify.h"
 #include "hw/boards.h"
+#include "hw/acpi/ghes.h"
 #include "hw/arm/boot.h"
 #include "hw/arm/bsa.h"
 #include "hw/block/flash.h"
diff --git a/qapi/acpi-hest.json b/qapi/acpi-hest.json
new file mode 100644
index 000000000000..d58fba485180
--- /dev/null
+++ b/qapi/acpi-hest.json
@@ -0,0 +1,35 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# == GHESv2 CPER Error Injection
+#
+# Defined since ACPI Specification 6.1,
+# section 18.3.2.8 Generic Hardware Error Source version 2. See:
+#
+# https://uefi.org/sites/default/files/resources/ACPI_6_1.pdf
+##
+
+
+##
+# @inject-ghes-error:
+#
+# Inject an error with additional ACPI 6.1 GHESv2 error information
+#
+# @cper: contains a base64 encoded string with raw data for a single
+#     CPER record with Generic Error Status Block, Generic Error Data
+#     Entry and generic error data payload, as described at
+#     https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#format
+#
+# Features:
+#
+# @unstable: This command is experimental.
+#
+# Since: 10.0
+##
+{ 'command': 'inject-ghes-error',
+  'data': {
+    'cper': 'str'
+  },
+  'features': [ 'unstable' ]
+}
diff --git a/qapi/meson.build b/qapi/meson.build
index e7bc54e5d047..35cea6147262 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -59,6 +59,7 @@ qapi_all_modules = [
 if have_system
   qapi_all_modules += [
     'acpi',
+    'acpi-hest',
     'audio',
     'cryptodev',
     'qdev',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index b1581988e4eb..baf19ab73afe 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -75,6 +75,7 @@
 { 'include': 'misc-target.json' }
 { 'include': 'audio.json' }
 { 'include': 'acpi.json' }
+{ 'include': 'acpi-hest.json' }
 { 'include': 'pci.json' }
 { 'include': 'stats.json' }
 { 'include': 'virtio.json' }
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 11/11] scripts/ghes_inject: add a script to generate GHES error inject
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (9 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
@ 2025-01-22 15:46 ` Mauro Carvalho Chehab
  2025-01-23 12:10   ` Jonathan Cameron
  2025-01-24 12:47 ` [PATCH 00/11] Change ghes to use HEST-based offsets and add support for " Igor Mammedov
  11 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-22 15:46 UTC (permalink / raw)
  To: Igor Mammedov, Michael S . Tsirkin
  Cc: Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab, Cleber Rosa, John Snow, linux-kernel

Using the QMP GHESv2 API requires preparing a raw data array
containing a CPER record.

Add a helper script with subcommands to prepare such data.

Currently, only the ARM Processor Error CPER record is supported.
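
The script describes CPER fields as tables of named bits. A minimal
sketch of how such names can be folded into a bitmask follows. The flag
names and bit positions are the pei_flags ones from
arm_processor_error.py; bit() is assumed to behave like util.bit()
(1 << n), and flags_to_mask() is a hypothetical helper that is not part
of the patch.

```python
from typing import Dict, Iterable


def bit(n: int) -> int:
    """Assumed equivalent of util.bit(): a value with only bit n set."""
    return 1 << n


PEI_FLAGS: Dict[str, int] = {
    "first": bit(0),
    "last": bit(1),
    "propagated": bit(2),
    "overflow": bit(3),
}


def flags_to_mask(names: Iterable[str], table: Dict[str, int]) -> int:
    """OR together the bits of every named flag, rejecting unknown names."""
    mask = 0
    for name in names:
        if name not in table:
            raise ValueError(f"unknown flag: {name}")
        mask |= table[name]
    return mask
```

For example, flags_to_mask(["first", "last"], PEI_FLAGS) yields 3.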

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 MAINTAINERS                    |   3 +
 scripts/arm_processor_error.py | 377 ++++++++++++++++++
 scripts/ghes_inject.py         |  51 +++
 scripts/qmp_helper.py          | 702 +++++++++++++++++++++++++++++++++
 4 files changed, 1133 insertions(+)
 create mode 100644 scripts/arm_processor_error.py
 create mode 100755 scripts/ghes_inject.py
 create mode 100644 scripts/qmp_helper.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e1f662fa0e0..99a9ba5c2ace 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2081,6 +2081,9 @@ S: Maintained
 F: hw/arm/ghes_cper.c
 F: hw/acpi/ghes_cper_stub.c
 F: qapi/acpi-hest.json
+F: scripts/ghes_inject.py
+F: scripts/arm_processor_error.py
+F: scripts/qmp_helper.py
 
 ppc4xx
 L: qemu-ppc@nongnu.org
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
new file mode 100644
index 000000000000..62e0c5662232
--- /dev/null
+++ b/scripts/arm_processor_error.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+# TODO: current implementation has dummy defaults.
+#
+# For a better implementation, a QMP addition/call is needed to
+# retrieve some data for ARM Processor Error injection:
+#
+#   - ARM registers: power_state, mpidr.
+
+import argparse
+import re
+
+from qmp_helper import qmp, util, cper_guid
+
+class ArmProcessorEinj:
+    """
+    Implements ARM Processor Error injection via GHES
+    """
+
+    DESC = """
+    Generates an ARM processor error CPER, compatible with
+    UEFI 2.9A Errata.
+    """
+
+    ACPI_GHES_ARM_CPER_LENGTH = 40
+    ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
+
+    # Context types
+    CONTEXT_AARCH32_EL1 = 1
+    CONTEXT_AARCH64_EL1 = 5
+    CONTEXT_MISC_REG = 8
+
+    def __init__(self, subparsers):
+        """Initialize the error injection class and add subparser"""
+
+        # Valid choice values
+        self.arm_valid_bits = {
+            "mpidr":    util.bit(0),
+            "affinity": util.bit(1),
+            "running":  util.bit(2),
+            "vendor":   util.bit(3),
+        }
+
+        self.pei_flags = {
+            "first":        util.bit(0),
+            "last":         util.bit(1),
+            "propagated":   util.bit(2),
+            "overflow":     util.bit(3),
+        }
+
+        self.pei_error_types = {
+            "cache":        util.bit(1),
+            "tlb":          util.bit(2),
+            "bus":          util.bit(3),
+            "micro-arch":   util.bit(4),
+        }
+
+        self.pei_valid_bits = {
+            "multiple-error":   util.bit(0),
+            "flags":            util.bit(1),
+            "error-info":       util.bit(2),
+            "virt-addr":        util.bit(3),
+            "phy-addr":         util.bit(4),
+        }
+
+        self.data = bytearray()
+
+        parser = subparsers.add_parser("arm", description=self.DESC)
+
+        arm_valid_bits = ",".join(self.arm_valid_bits.keys())
+        flags = ",".join(self.pei_flags.keys())
+        error_types = ",".join(self.pei_error_types.keys())
+        pei_valid_bits = ",".join(self.pei_valid_bits.keys())
+
+        # UEFI N.16 ARM Validation bits
+        g_arm = parser.add_argument_group("ARM processor")
+        g_arm.add_argument("--arm", "--arm-valid",
+                           help=f"ARM valid bits: {arm_valid_bits}")
+        g_arm.add_argument("-a", "--affinity",  "--level", "--affinity-level",
+                           type=lambda x: int(x, 0),
+                           help="Affinity level (when multiple levels apply)")
+        g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
+                           help="Multiprocessor Affinity Register")
+        g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
+                           help="Main ID Register")
+        g_arm.add_argument("-r", "--running",
+                           action=argparse.BooleanOptionalAction,
+                           default=None,
+                           help="Indicates if the processor is running or not")
+        g_arm.add_argument("--psci", "--psci-state",
+                           type=lambda x: int(x, 0),
+                           help="Power State Coordination Interface - PSCI state")
+
+        # TODO: Add vendor-specific support
+
+        # UEFI N.17 bitmaps (type and flags)
+        g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
+        g_pei.add_argument("-t", "--type", nargs="+",
+                        help=f"one or more error types: {error_types}")
+        g_pei.add_argument("-f", "--flags", nargs="*",
+                        help=f"zero or more error flags: {flags}")
+        g_pei.add_argument("-V", "--pei-valid", "--error-valid", nargs="*",
+                        help=f"zero or more PEI valid bits: {pei_valid_bits}")
+
+        # UEFI N.17 Integer values
+        g_pei.add_argument("-m", "--multiple-error", nargs="+",
+                        help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")
+        g_pei.add_argument("-e", "--error-info", nargs="+",
+                        help="Error information (UEFI 2.10 tables N.18 to N.20)")
+        g_pei.add_argument("-p", "--physical-address",  nargs="+",
+                        help="Physical address")
+        g_pei.add_argument("-v", "--virtual-address",  nargs="+",
+                        help="Virtual address")
+
+        # UEFI N.21 Context
+        g_ctx = parser.add_argument_group("Processor Context")
+        g_ctx.add_argument("--ctx-type", "--context-type", nargs="*",
+                        help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")
+        g_ctx.add_argument("--ctx-size", "--context-size", nargs="*",
+                        help="Minimal size of the context")
+        g_ctx.add_argument("--ctx-array", "--context-array", nargs="*",
+                        help="Comma-separated arrays for each context")
+
+        # Vendor-specific data
+        g_vendor = parser.add_argument_group("Vendor-specific data")
+        g_vendor.add_argument("--vendor", "--vendor-specific", nargs="+",
+                        help="Vendor-specific byte arrays of data")
+
+        # Add arguments for Generic Error Data
+        qmp.argparse(parser)
+
+        parser.set_defaults(func=self.send_cper)
+
+    def send_cper(self, args):
+        """Parse subcommand arguments and send a CPER via QMP"""
+
+        qmp_cmd = qmp(args.host, args.port, args.debug)
+
+        # Handle Generic Error Data arguments if any
+        qmp_cmd.set_args(args)
+
+        is_cpu_type = re.compile(r"^([\w+]+\-)?arm\-cpu$")
+        cpus = qmp_cmd.search_qom("/machine/unattached/device",
+                                  "type", is_cpu_type)
+
+        cper = {}
+        pei = {}
+        ctx = {}
+        vendor = {}
+
+        arg = vars(args)
+
+        # Handle global parameters
+        if args.arm:
+            arm_valid_init = False
+            cper["valid"] = util.get_choice(name="valid",
+                                       value=args.arm,
+                                       choices=self.arm_valid_bits,
+                                       suffixes=["-error", "-err"])
+        else:
+            cper["valid"] = 0
+            arm_valid_init = True
+
+        if "running" in arg:
+            if args.running:
+                cper["running-state"] = util.bit(0)
+            else:
+                cper["running-state"] = 0
+        else:
+            cper["running-state"] = 0
+
+        if arm_valid_init:
+            if args.affinity:
+                cper["valid"] |= self.arm_valid_bits["affinity"]
+
+            if args.mpidr:
+                cper["valid"] |= self.arm_valid_bits["mpidr"]
+
+            if "running-state" in cper:
+                cper["valid"] |= self.arm_valid_bits["running"]
+
+            if args.psci:
+                cper["valid"] |= self.arm_valid_bits["running"]
+
+        # Handle PEI
+        if not args.type:
+            args.type = ["cache-error"]
+
+        util.get_mult_choices(
+            pei,
+            name="valid",
+            values=args.pei_valid,
+            choices=self.pei_valid_bits,
+            suffixes=["-valid", "-addr"],
+        )
+        util.get_mult_choices(
+            pei,
+            name="type",
+            values=args.type,
+            choices=self.pei_error_types,
+            suffixes=["-error", "-err"],
+        )
+        util.get_mult_choices(
+            pei,
+            name="flags",
+            values=args.flags,
+            choices=self.pei_flags,
+            suffixes=["-error", "-cap"],
+        )
+        util.get_mult_int(pei, "error-info", args.error_info)
+        util.get_mult_int(pei, "multiple-error", args.multiple_error)
+        util.get_mult_int(pei, "phy-addr", args.physical_address)
+        util.get_mult_int(pei, "virt-addr", args.virtual_address)
+
+        # Handle context
+        util.get_mult_int(ctx, "type", args.ctx_type, allow_zero=True)
+        util.get_mult_int(ctx, "minimal-size", args.ctx_size, allow_zero=True)
+        util.get_mult_array(ctx, "register", args.ctx_array, allow_zero=True)
+
+        util.get_mult_array(vendor, "bytes", args.vendor, max_val=255)
+
+        # Store PEI
+        pei_data = bytearray()
+        default_flags  = self.pei_flags["first"]
+        default_flags |= self.pei_flags["last"]
+
+        error_info_num = 0
+
+        for i, p in pei.items():        # pylint: disable=W0612
+            error_info_num += 1
+
+            # UEFI 2.10 doesn't define how to encode error information
+            # when multiple types are raised. So, provide a default only
+            # if a single type is there
+            if "error-info" not in p:
+                if p["type"] == util.bit(1):
+                    p["error-info"] = 0x0091000F
+                if p["type"] == util.bit(2):
+                    p["error-info"] = 0x0054007F
+                if p["type"] == util.bit(3):
+                    p["error-info"] = 0x80D6460FFF
+                if p["type"] == util.bit(4):
+                    p["error-info"] = 0x78DA03FF
+
+            if "valid" not in p:
+                p["valid"] = 0
+                if "multiple-error" in p:
+                    p["valid"] |= self.pei_valid_bits["multiple-error"]
+
+                if "flags" in p:
+                    p["valid"] |= self.pei_valid_bits["flags"]
+
+                if "error-info" in p:
+                    p["valid"] |= self.pei_valid_bits["error-info"]
+
+                if "phy-addr" in p:
+                    p["valid"] |= self.pei_valid_bits["phy-addr"]
+
+                if "virt-addr" in p:
+                    p["valid"] |= self.pei_valid_bits["virt-addr"]
+
+            # Version
+            util.data_add(pei_data, 0, 1)
+
+            util.data_add(pei_data,
+                         self.ACPI_GHES_ARM_CPER_PEI_LENGTH, 1)
+
+            util.data_add(pei_data, p["valid"], 2)
+            util.data_add(pei_data, p["type"], 1)
+            util.data_add(pei_data, p.get("multiple-error", 1), 2)
+            util.data_add(pei_data, p.get("flags", default_flags), 1)
+            util.data_add(pei_data, p.get("error-info", 0), 8)
+            util.data_add(pei_data, p.get("virt-addr", 0xDEADBEEF), 8)
+            util.data_add(pei_data, p.get("phy-addr", 0xABBA0BAD), 8)
+
+        # Store Context
+        ctx_data = bytearray()
+        context_info_num = 0
+
+        if ctx:
+            ret = qmp_cmd.send_cmd("query-target", may_open=True)
+
+            default_ctx = self.CONTEXT_MISC_REG
+
+            if "arch" in ret:
+                if ret["arch"] == "aarch64":
+                    default_ctx = self.CONTEXT_AARCH64_EL1
+                elif ret["arch"] == "arm":
+                    default_ctx = self.CONTEXT_AARCH32_EL1
+
+            for k in sorted(ctx.keys()):
+                context_info_num += 1
+
+                if "type" not in ctx[k]:
+                    ctx[k]["type"] = default_ctx
+
+                if "register" not in ctx[k]:
+                    ctx[k]["register"] = []
+
+                reg_size = len(ctx[k]["register"])
+                size = 0
+
+                if "minimal-size" in ctx[k]:
+                    size = ctx[k]["minimal-size"]
+
+                size = max(size, reg_size)
+
+                size = (size + 1) % 0xFFFE
+
+                # Version
+                util.data_add(ctx_data, 0, 2)
+
+                util.data_add(ctx_data, ctx[k]["type"], 2)
+
+                util.data_add(ctx_data, 8 * size, 4)
+
+                for r in ctx[k]["register"]:
+                    util.data_add(ctx_data, r, 8)
+
+                for i in range(reg_size, size):   # pylint: disable=W0612
+                    util.data_add(ctx_data, 0, 8)
+
+        # Vendor-specific bytes are not grouped
+        vendor_data = bytearray()
+        if vendor:
+            for k in sorted(vendor.keys()):
+                for b in vendor[k]["bytes"]:
+                    util.data_add(vendor_data, b, 1)
+
+        # Encode ARM Processor Error
+        data = bytearray()
+
+        util.data_add(data, cper["valid"], 4)
+
+        util.data_add(data, error_info_num, 2)
+        util.data_add(data, context_info_num, 2)
+
+        # Calculate the length of the CPER data
+        cper_length = self.ACPI_GHES_ARM_CPER_LENGTH
+        cper_length += len(pei_data)
+        cper_length += len(vendor_data)
+        cper_length += len(ctx_data)
+        util.data_add(data, cper_length, 4)
+
+        util.data_add(data, arg.get("affinity") or 0, 1)
+
+        # Reserved
+        util.data_add(data, 0, 3)
+
+        if arg.get("midr") is None:
+            if cpus:
+                cmd_arg = {
+                    'path': cpus[0],
+                    'property': "midr"
+                }
+                ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True)
+                if isinstance(ret, int):
+                    arg["midr"] = ret
+
+        util.data_add(data, arg.get("mpidr") or 0, 8)
+        util.data_add(data, arg.get("midr") or 0, 8)
+        util.data_add(data, cper["running-state"], 4)
+        util.data_add(data, arg.get("psci") or 0, 4)
+
+        # Add PEI
+        data.extend(pei_data)
+        data.extend(ctx_data)
+        data.extend(vendor_data)
+
+        self.data = data
+
+        qmp_cmd.send_cper(cper_guid.CPER_PROC_ARM, self.data)
diff --git a/scripts/ghes_inject.py b/scripts/ghes_inject.py
new file mode 100755
index 000000000000..67cb6077bec8
--- /dev/null
+++ b/scripts/ghes_inject.py
@@ -0,0 +1,51 @@
+#!/usr/bin/env python3
+#
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+"""
+Handle ACPI GHESv2 error injection logic via the QEMU QMP interface.
+"""
+
+import argparse
+import sys
+
+from arm_processor_error import ArmProcessorEinj
+
+EINJ_DESC = """
+Handle ACPI GHESv2 error injection logic via the QEMU QMP interface.
+
+It allows using UEFI BIOS EINJ features to generate GHES records.
+
+It helps test the CPER and GHES drivers in the guest OS, and how
+userspace applications in the guest handle them.
+"""
+
+def main():
+    """Main program"""
+
+    # Main parser - handle generic args like QEMU QMP TCP socket options
+    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+                                     usage="%(prog)s [options]",
+                                     description=EINJ_DESC)
+
+    g_options = parser.add_argument_group("QEMU QMP socket options")
+    g_options.add_argument("-H", "--host", default="localhost", type=str,
+                           help="host name")
+    g_options.add_argument("-P", "--port", default=4445, type=int,
+                           help="TCP port number")
+    g_options.add_argument('-d', '--debug', action='store_true')
+
+    subparsers = parser.add_subparsers()
+
+    ArmProcessorEinj(subparsers)
+
+    args = parser.parse_args()
+    if "func" in args:
+        args.func(args)
+    else:
+        sys.exit(f"Please specify a valid command for {sys.argv[0]}")
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
new file mode 100644
index 000000000000..357ebc6e8359
--- /dev/null
+++ b/scripts/qmp_helper.py
@@ -0,0 +1,702 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0103,E0213,E1135,E1136,E1137,R0902,R0903,R0912,R0913
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+"""
+Helper classes to be used by ghes_inject command classes.
+"""
+
+import json
+import sys
+
+from datetime import datetime
+from os import path as os_path
+
+try:
+    qemu_dir = os_path.abspath(os_path.dirname(os_path.dirname(__file__)))
+    sys.path.append(os_path.join(qemu_dir, 'python'))
+
+    from qemu.qmp.legacy import QEMUMonitorProtocol
+
+except ModuleNotFoundError as exc:
+    print(f"Module '{exc.name}' not found.")
+    print("Try export PYTHONPATH=top-qemu-dir/python or run from top-qemu-dir")
+    sys.exit(1)
+
+from base64 import b64encode
+
+class util:
+    """
+    Ancillary functions to deal with bitmaps, parse arguments,
+    generate GUID and encode data on a bytearray buffer.
+    """
+
+    #
+    # Helper routines to handle multiple choice arguments
+    #
+    def get_choice(name, value, choices, suffixes=None, bitmask=True):
+        """Produce a bitmask (or a single value) from a multiple choice argument"""
+
+        new_values = 0
+
+        if not value:
+            return new_values
+
+        for val in value.split(","):
+            val = val.lower()
+
+            if suffixes:
+                for suffix in suffixes:
+                    val = val.removesuffix(suffix)
+
+            if val not in choices.keys():
+                if suffixes:
+                    for suffix in suffixes:
+                        if val + suffix in choices.keys():
+                            val += suffix
+                            break
+
+            if val not in choices.keys():
+                sys.exit(f"Error on '{name}': choice '{val}' is invalid.")
+
+            val = choices[val]
+
+            if bitmask:
+                new_values |= val
+            else:
+                if new_values:
+                    sys.exit(f"Error on '{name}': only one value is accepted.")
+
+                new_values = val
+
+        return new_values
+
+    def get_array(name, values, max_val=None):
+        """Parse comma-separated lists of unsigned integers into a flat array"""
+
+        array = []
+
+        for value in values:
+            for val in value.split(","):
+                try:
+                    val = int(val, 0)
+                except ValueError:
+                    sys.exit(f"Error on '{name}': {val} is not an integer")
+
+                if val < 0:
+                    sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+                if max_val and val > max_val:
+                    sys.exit(f"Error on '{name}': {val} is too big")
+
+                array.append(val)
+
+        return array
+
+    def get_mult_array(mult, name, values, allow_zero=False, max_val=None):
+        """Add numbered hashes from integer lists"""
+
+        if not allow_zero:
+            if not values:
+                return
+        else:
+            if values is None:
+                return
+
+            if not values:
+                i = 0
+                if i not in mult:
+                    mult[i] = {}
+
+                mult[i][name] = []
+                return
+
+        i = 0
+        for value in values:
+            for val in value.split(","):
+                try:
+                    val = int(val, 0)
+                except ValueError:
+                    sys.exit(f"Error on '{name}': {val} is not an integer")
+
+                if val < 0:
+                    sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+                if max_val and val > max_val:
+                    sys.exit(f"Error on '{name}': {val} is too big")
+
+                if i not in mult:
+                    mult[i] = {}
+
+                if name not in mult[i]:
+                    mult[i][name] = []
+
+                mult[i][name].append(val)
+
+            i += 1
+
+
+    def get_mult_choices(mult, name, values, choices,
+                        suffixes=None, allow_zero=False):
+        """Add numbered hashes from multiple choice arguments"""
+
+        if not allow_zero:
+            if not values:
+                return
+        else:
+            if values is None:
+                return
+
+        i = 0
+        for val in values:
+            new_values = util.get_choice(name, val, choices, suffixes)
+
+            if i not in mult:
+                mult[i] = {}
+
+            mult[i][name] = new_values
+            i += 1
+
+
+    def get_mult_int(mult, name, values, allow_zero=False):
+        """Add numbered hashes from integer arguments"""
+        if not allow_zero:
+            if not values:
+                return
+        else:
+            if values is None:
+                return
+
+        i = 0
+        for val in values:
+            try:
+                val = int(val, 0)
+            except ValueError:
+                sys.exit(f"Error on '{name}': {val} is not an integer")
+
+            if val < 0:
+                sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+            if i not in mult:
+                mult[i] = {}
+
+            mult[i][name] = val
+            i += 1
+
+
+    #
+    # Data encode helper functions
+    #
+    def bit(b):
+        """Simple macro to define a bit on a bitmask"""
+        return 1 << b
+
+
+    def data_add(data, value, num_bytes):
+        """Appends value to a bytearray as num_bytes little-endian bytes"""
+
+        data.extend(value.to_bytes(num_bytes, byteorder="little"))  # pylint: disable=E1101
+
+    def dump_bytearray(name, data):
+        """Does a hexdump of a byte array, grouping in bytes"""
+
+        print(f"{name} ({len(data)} bytes):")
+
+        for ln_start in range(0, len(data), 16):
+            ln_end = min(ln_start + 16, len(data))
+            print(f"      {ln_start:08x}  ", end="")
+            for i in range(ln_start, ln_end):
+                print(f"{data[i]:02x} ", end="")
+            for i in range(ln_end, ln_start + 16):
+                print("   ", end="")
+            print("  ", end="")
+            for i in range(ln_start, ln_end):
+                if data[i] >= 32 and data[i] < 127:
+                    print(chr(data[i]), end="")
+                else:
+                    print(".", end="")
+
+            print()
+        print()
+
+    def time(string):
+        """Handle BCD timestamps used on Generic Error Data Block"""
+
+        time = None
+
+        # Formats to be used when parsing time stamps
+        formats = [
+            "%Y-%m-%d %H:%M:%S",
+        ]
+
+        if string == "now":
+            time = datetime.now()
+
+        if time is None:
+            for fmt in formats:
+                try:
+                    time = datetime.strptime(string, fmt)
+                    break
+                except ValueError:
+                    pass
+
+            if time is None:
+                raise ValueError("Invalid time format")
+
+        return time
+
+class guid:
+    """
+    Simple class to handle GUID fields.
+    """
+
+    def __init__(self, time_low, time_mid, time_high, nodes):
+        """Initialize a GUID value"""
+
+        assert len(nodes) == 8
+
+        self.time_low = time_low
+        self.time_mid = time_mid
+        self.time_high = time_high
+        self.nodes = nodes
+
+    @classmethod
+    def UUID(cls, guid_str):
+        """Initialize a GUID using a string on its standard format"""
+
+        if len(guid_str) != 36:
+            print("Size not 36")
+            raise ValueError('Invalid GUID size')
+
+        # It is easier to parse without separators. So, drop them
+        guid_str = guid_str.replace('-', '')
+
+        if len(guid_str) != 32:
+            print("Size not 32", guid_str, len(guid_str))
+            raise ValueError('Invalid GUID hex size')
+
+        time_low = 0
+        time_mid = 0
+        time_high = 0
+        nodes = []
+
+        for i in reversed(range(16, 32, 2)):
+            h = guid_str[i:i + 2]
+            value = int(h, 16)
+            nodes.insert(0, value)
+
+        time_high = int(guid_str[12:16], 16)
+        time_mid = int(guid_str[8:12], 16)
+        time_low = int(guid_str[0:8], 16)
+
+        return cls(time_low, time_mid, time_high, nodes)
+
+    def __str__(self):
+        """Output a GUID value on its default string representation"""
+
+        clock = self.nodes[0] << 8 | self.nodes[1]
+
+        node = 0
+        for i in range(2, len(self.nodes)):
+            node = node << 8 | self.nodes[i]
+
+        s = f"{self.time_low:08x}-{self.time_mid:04x}-"
+        s += f"{self.time_high:04x}-{clock:04x}-{node:012x}"
+        return s
+
+    def to_bytes(self):
+        """Output a GUID value in bytes"""
+
+        data = bytearray()
+
+        util.data_add(data, self.time_low, 4)
+        util.data_add(data, self.time_mid, 2)
+        util.data_add(data, self.time_high, 2)
+        data.extend(bytearray(self.nodes))
+
+        return data
+
+class qmp:
+    """
+    Opens a connection and sends/receives QMP commands.
+    """
+
+    def send_cmd(self, command, args=None, may_open=False, return_error=True):
+        """Send a command to QMP, optionally opening a connection"""
+
+        if may_open:
+            self._connect()
+        elif not self.connected:
+            return False
+
+        msg = { 'execute': command }
+        if args:
+            msg['arguments'] = args
+
+        try:
+            obj = self.qmp_monitor.cmd_obj(msg)
+        # Can we use some other exception class here?
+        except Exception as e:                         # pylint: disable=W0718
+            print(f"Command: {command}")
+            print(f"Failed to inject error: {e}.")
+            return None
+
+        if "return" in obj:
+            if isinstance(obj.get("return"), dict):
+                if obj["return"]:
+                    return obj["return"]
+                return "OK"
+
+            return obj["return"]
+
+        if isinstance(obj.get("error"), dict):
+            error = obj["error"]
+            if return_error:
+                print(f"Command: {msg}")
+                print(f'{error["class"]}: {error["desc"]}')
+        else:
+            print(json.dumps(obj))
+
+        return None
+
+    def _close(self):
+        """Shutdown and close the socket, if opened"""
+        if not self.connected:
+            return
+
+        self.qmp_monitor.close()
+        self.connected = False
+
+    def _connect(self):
+        """Connect to a QMP TCP/IP port, if not connected yet"""
+
+        if self.connected:
+            return True
+
+        try:
+            self.qmp_monitor.connect(negotiate=True)
+        except ConnectionError:
+            sys.exit(f"Can't connect to QMP host {self.host}:{self.port}")
+
+        self.connected = True
+
+        return True
+
+    BLOCK_STATUS_BITS = {
+        "uncorrectable":            util.bit(0),
+        "correctable":              util.bit(1),
+        "multi-uncorrectable":      util.bit(2),
+        "multi-correctable":        util.bit(3),
+    }
+
+    ERROR_SEVERITY = {
+        "recoverable":  0,
+        "fatal":        1,
+        "corrected":    2,
+        "none":         3,
+    }
+
+    VALIDATION_BITS = {
+        "fru-id":       util.bit(0),
+        "fru-text":     util.bit(1),
+        "timestamp":    util.bit(2),
+    }
+
+    GEDB_FLAGS_BITS = {
+        "recovered":    util.bit(0),
+        "prev-error":   util.bit(1),
+        "simulated":    util.bit(2),
+    }
+
+    GENERIC_DATA_SIZE = 72
+
+    def argparse(parser):
+        """Prepare a parser group to query generic error data"""
+
+        block_status_bits = ",".join(qmp.BLOCK_STATUS_BITS.keys())
+        error_severity_enum = ",".join(qmp.ERROR_SEVERITY.keys())
+        validation_bits = ",".join(qmp.VALIDATION_BITS.keys())
+        gedb_flags_bits = ",".join(qmp.GEDB_FLAGS_BITS.keys())
+
+        g_gen = parser.add_argument_group("Generic Error Data")  # pylint: disable=E1101
+        g_gen.add_argument("--block-status",
+                           help=f"block status bits: {block_status_bits}")
+        g_gen.add_argument("--raw-data", nargs="+",
+                        help="Raw data inside the Error Status Block")
+        g_gen.add_argument("--error-severity", "--severity",
+                           help=f"error severity: {error_severity_enum}")
+        g_gen.add_argument("--gen-err-valid-bits",
+                           "--generic-error-validation-bits",
+                           help=f"validation bits: {validation_bits}")
+        g_gen.add_argument("--fru-id", type=guid.UUID,
+                           help="GUID representing a physical device")
+        g_gen.add_argument("--fru-text",
+                           help="ASCII string identifying the FRU hardware")
+        g_gen.add_argument("--timestamp", type=util.time,
+                           help="Time when the error info was collected")
+        g_gen.add_argument("--precise", "--precise-timestamp",
+                           action='store_true',
+                           help="Marks the timestamp as precise if --timestamp is used")
+        g_gen.add_argument("--gedb-flags",
+                           help=f"General Error Data Block flags: {gedb_flags_bits}")
+
+    def set_args(self, args):
+        """Set the arguments optionally defined via self.argparse()"""
+
+        if args.block_status:
+            self.block_status = util.get_choice(name="block-status",
+                                                value=args.block_status,
+                                                choices=self.BLOCK_STATUS_BITS,
+                                                bitmask=False)
+        if args.raw_data:
+            self.raw_data = util.get_array("raw-data", args.raw_data,
+                                           max_val=255)
+            print(self.raw_data)
+
+        if args.error_severity:
+            self.error_severity = util.get_choice(name="error-severity",
+                                                  value=args.error_severity,
+                                                  choices=self.ERROR_SEVERITY,
+                                                  bitmask=False)
+
+        if args.fru_id:
+            self.fru_id = args.fru_id.to_bytes()
+            if not args.gen_err_valid_bits:
+                self.validation_bits |= self.VALIDATION_BITS["fru-id"]
+
+        if args.fru_text:
+            text = bytearray(args.fru_text.encode('ascii'))
+            if len(text) > 20:
+                sys.exit("FRU text is too big to fit")
+
+            self.fru_text = text.ljust(20, b'\0')
+            if not args.gen_err_valid_bits:
+                self.validation_bits |= self.VALIDATION_BITS["fru-text"]
+
+        if args.timestamp:
+            time = args.timestamp
+            century = int(time.year / 100)
+
+            bcd = bytearray()
+            util.data_add(bcd, (time.second // 10) << 4 | (time.second % 10), 1)
+            util.data_add(bcd, (time.minute // 10) << 4 | (time.minute % 10), 1)
+            util.data_add(bcd, (time.hour // 10) << 4 | (time.hour % 10), 1)
+
+            if args.precise:
+                util.data_add(bcd, 1, 1)
+            else:
+                util.data_add(bcd, 0, 1)
+
+            util.data_add(bcd, (time.day // 10) << 4 | (time.day % 10), 1)
+            util.data_add(bcd, (time.month // 10) << 4 | (time.month % 10), 1)
+            util.data_add(bcd,
+                          ((time.year % 100) // 10) << 4 | (time.year % 10), 1)
+            util.data_add(bcd, ((century % 100) // 10) << 4 | (century % 10), 1)
+
+            self.timestamp = bcd
+            if not args.gen_err_valid_bits:
+                self.validation_bits |= self.VALIDATION_BITS["timestamp"]
+
+        if args.gen_err_valid_bits:
+            self.validation_bits = util.get_choice(name="validation",
+                                                   value=args.gen_err_valid_bits,
+                                                   choices=self.VALIDATION_BITS)
+
+    def __init__(self, host, port, debug=False):
+        """Initialize variables used by the QMP send logic"""
+
+        self.connected = False
+        self.host = host
+        self.port = port
+        self.debug = debug
+
+        # ACPI 6.1: 18.3.2.7.1 Generic Error Data: Generic Error Status Block
+        self.block_status = self.BLOCK_STATUS_BITS["uncorrectable"]
+        self.raw_data = []
+        self.error_severity = self.ERROR_SEVERITY["recoverable"]
+
+        # ACPI 6.1: 18.3.2.7.1 Generic Error Data: Generic Error Data Entry
+        self.validation_bits = 0
+        self.flags = 0
+        self.fru_id = bytearray(16)
+        self.fru_text = bytearray(20)
+        self.timestamp = bytearray(8)
+
+        self.qmp_monitor = QEMUMonitorProtocol(address=(self.host, self.port))
+
+    #
+    # Socket QMP send command
+    #
+    def send_cper_raw(self, cper_data):
+        """Send raw CPER data to QEMU through the QMP TCP socket"""
+
+        data = b64encode(bytes(cper_data)).decode('ascii')
+
+        cmd_arg = {
+            'cper': data
+        }
+
+        self._connect()
+
+        if self.send_cmd("inject-ghes-error", cmd_arg):
+            print("Error injected.")
+
+    def send_cper(self, notif_type, payload):
+        """Send commands to QEMU through the QMP TCP socket"""
+
+        # Fill CPER record header
+
+        # NOTE: bits 4 to 13 of block status contain the number of
+        # data entries in the data section. This is currently unsupported.
+
+        cper_length = len(payload)
+        data_length = cper_length + len(self.raw_data) + self.GENERIC_DATA_SIZE
+
+        #  Generic Error Data Entry
+        gede = bytearray()
+
+        gede.extend(notif_type.to_bytes())
+        util.data_add(gede, self.error_severity, 4)
+        util.data_add(gede, 0x300, 2)
+        util.data_add(gede, self.validation_bits, 1)
+        util.data_add(gede, self.flags, 1)
+        util.data_add(gede, cper_length, 4)
+        gede.extend(self.fru_id)
+        gede.extend(self.fru_text)
+        gede.extend(self.timestamp)
+
+        # Generic Error Status Block
+        gebs = bytearray()
+
+        if self.raw_data:
+            raw_data_offset = len(gebs)
+        else:
+            raw_data_offset = 0
+
+        util.data_add(gebs, self.block_status, 4)
+        util.data_add(gebs, raw_data_offset, 4)
+        util.data_add(gebs, len(self.raw_data), 4)
+        util.data_add(gebs, data_length, 4)
+        util.data_add(gebs, self.error_severity, 4)
+
+        cper_data = bytearray()
+        cper_data.extend(gebs)
+        cper_data.extend(gede)
+        cper_data.extend(bytearray(self.raw_data))
+        cper_data.extend(bytearray(payload))
+
+        if self.debug:
+            print(f"GUID: {notif_type}")
+
+            util.dump_bytearray("Generic Error Status Block", gebs)
+            util.dump_bytearray("Generic Error Data Entry", gede)
+
+            if self.raw_data:
+                util.dump_bytearray("Raw data", bytearray(self.raw_data))
+
+            util.dump_bytearray("Payload", payload)
+
+        self.send_cper_raw(cper_data)
+
+
+    def search_qom(self, path, prop, regex):
+        """
+        Return a list of devices that match path array like:
+
+            /machine/unattached/device
+            /machine/peripheral-anon/device
+            ...
+        """
+
+        found = []
+
+        i = 0
+        while 1:
+            dev = f"{path}[{i}]"
+            args = {
+                'path': dev,
+                'property': prop
+            }
+            ret = self.send_cmd("qom-get", args, may_open=True, return_error=False)
+            if not ret:
+                break
+
+            if isinstance(ret, str):
+                if regex.search(ret):
+                    found.append(dev)
+
+            i += 1
+            if i > 10000:
+                print("Too many objects returned by qom-get!")
+                break
+
+        return found
+
+class cper_guid:
+    """
+    Contains CPER GUID, as per:
+    https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
+    """
+
+    CPER_PROC_GENERIC =  guid(0x9876CCAD, 0x47B4, 0x4bdb,
+                              [0xB6, 0x5E, 0x16, 0xF1,
+                               0x93, 0xC4, 0xF3, 0xDB])
+
+    CPER_PROC_X86 = guid(0xDC3EA0B0, 0xA144, 0x4797,
+                         [0xB9, 0x5B, 0x53, 0xFA,
+                          0x24, 0x2B, 0x6E, 0x1D])
+
+    CPER_PROC_ITANIUM = guid(0xe429faf1, 0x3cb7, 0x11d4,
+                             [0xbc, 0xa7, 0x00, 0x80,
+                              0xc7, 0x3c, 0x88, 0x81])
+
+    CPER_PROC_ARM = guid(0xE19E3D16, 0xBC11, 0x11E4,
+                         [0x9C, 0xAA, 0xC2, 0x05,
+                          0x1D, 0x5D, 0x46, 0xB0])
+
+    CPER_PLATFORM_MEM = guid(0xA5BC1114, 0x6F64, 0x4EDE,
+                             [0xB8, 0x63, 0x3E, 0x83,
+                              0xED, 0x7C, 0x83, 0xB1])
+
+    CPER_PLATFORM_MEM2 = guid(0x61EC04FC, 0x48E6, 0xD813,
+                              [0x25, 0xC9, 0x8D, 0xAA,
+                               0x44, 0x75, 0x0B, 0x12])
+
+    CPER_PCIE = guid(0xD995E954, 0xBBC1, 0x430F,
+                     [0xAD, 0x91, 0xB4, 0x4D,
+                      0xCB, 0x3C, 0x6F, 0x35])
+
+    CPER_PCI_BUS = guid(0xC5753963, 0x3B84, 0x4095,
+                        [0xBF, 0x78, 0xED, 0xDA,
+                         0xD3, 0xF9, 0xC9, 0xDD])
+
+    CPER_PCI_DEV = guid(0xEB5E4685, 0xCA66, 0x4769,
+                        [0xB6, 0xA2, 0x26, 0x06,
+                         0x8B, 0x00, 0x13, 0x26])
+
+    CPER_FW_ERROR = guid(0x81212A96, 0x09ED, 0x4996,
+                         [0x94, 0x71, 0x8D, 0x72,
+                          0x9C, 0x8E, 0x69, 0xED])
+
+    CPER_DMA_GENERIC = guid(0x5B51FEF7, 0xC79D, 0x4434,
+                            [0x8F, 0x1B, 0xAA, 0x62,
+                             0xDE, 0x3E, 0x2C, 0x64])
+
+    CPER_DMA_VT = guid(0x71761D37, 0x32B2, 0x45cd,
+                       [0xA7, 0xD0, 0xB0, 0xFE,
+                        0xDD, 0x93, 0xE8, 0xCF])
+
+    CPER_DMA_IOMMU = guid(0x036F84E1, 0x7F37, 0x428c,
+                         [0xA7, 0x9E, 0x57, 0x5F,
+                          0xDF, 0xAA, 0x84, 0xEC])
+
+    CPER_CCIX_PER = guid(0x91335EF6, 0xEBFB, 0x4478,
+                         [0xA6, 0xA6, 0x88, 0xB7,
+                          0x28, 0xCF, 0x75, 0xD7])
+
+    CPER_CXL_PROT_ERR = guid(0x80B9EFB4, 0x52B5, 0x4DE3,
+                             [0xA7, 0x77, 0x68, 0x78,
+                              0x4B, 0x77, 0x10, 0x48])
-- 
2.48.1
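
For readers following the timestamp handling in the script above: each field
is packed as two BCD digits per byte, in the order seconds, minutes, hours,
precise flag, day, month, year, century. A minimal standalone sketch of that
packing, using only the standard library (the helper names `to_bcd` and
`cper_timestamp` are illustrative and are not part of the patch):

```python
from datetime import datetime


def to_bcd(value):
    """Pack a number in the range 0..99 into one BCD byte."""
    if not 0 <= value <= 99:
        raise ValueError("a BCD byte holds only two decimal digits")
    return (value // 10) << 4 | (value % 10)


def cper_timestamp(when, precise=False):
    """Return the 8-byte timestamp in the field order used by the script."""
    return bytes([
        to_bcd(when.second),
        to_bcd(when.minute),
        to_bcd(when.hour),
        1 if precise else 0,       # precise flag is a plain byte, not BCD
        to_bcd(when.day),
        to_bcd(when.month),
        to_bcd(when.year % 100),   # two low digits of the year
        to_bcd(when.year // 100),  # century
    ])


stamp = cper_timestamp(datetime(2025, 1, 22, 15, 46, 28))
print(stamp.hex())  # 2846150022012520
```

Keeping the per-byte conversion in one helper makes the range check explicit,
which the open-coded shifts do not do.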



* Re: [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes
  2025-01-22 15:46 ` [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
@ 2025-01-23  9:56   ` Jonathan Cameron
  2025-01-23 16:48   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23  9:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, Peter Maydell, Shannon Zhao,
	linux-kernel

On Wed, 22 Jan 2025 16:46:18 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The current code is actually dependent on having just one error
> structure with a single source.
> 
> As the number of sources should be arch-dependent, as it will depend on
> what kind of synchronous/assynchronous notifications will exist, change

asynchronous.

> the logic to dynamically build the table.
> 
> Yet, for a proper support, we need to get the number of sources by
> reading the number from the HEST table. However, bios currently doesn't
> store a pointer to it.
> 
> For now just change the logic at table build time, while enforcing that
> it will behave like before with a single source ID.
> 
> A future patch will add a HEST table bios pointer and change the logic
> at acpi_ghes_record_errors() to dynamically use the new size.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



* Re: [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-22 15:46 ` [PATCH 02/11] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2025-01-23 10:02   ` Jonathan Cameron
  2025-01-23 11:46     ` Mauro Carvalho Chehab
                       ` (2 more replies)
  2025-01-29 13:33   ` Igor Mammedov
  1 sibling, 3 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 10:02 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Wed, 22 Jan 2025 16:46:19 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Store HEST table address at GPA, placing its content at
> hest_addr_le variable.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
A few trivial things inline.

Jonathan

> ---
> 
> Change from v8:
> - hest_addr_le is now pointing to the error source size and data.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Bonus.  I guess you really like this patch :)
> ---
>  hw/acpi/ghes.c         | 17 ++++++++++++++++-
>  include/hw/acpi/ghes.h |  1 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 3f519ccab90d..34e3364d3fd8 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -30,6 +30,7 @@
>  
>  #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
>  #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> +#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
>  
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> @@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
>      }
>  
>      /*
> -     * tell firmware to write hardware_errors GPA into
> +     * Tell firmware to write hardware_errors GPA into

Sneaky tidy up.  No problem with it in general but adding noise here, so if there
are others in the series maybe gather them up in a cleanup patch.

>       * hardware_errors_addr fw_cfg, once the former has been initialized.
>       */
>      bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
> @@ -355,6 +356,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>  
>      acpi_table_begin(&table, table_data);
>  
> +    int hest_offset = table_data->len;

Local style looks to be traditional C with definitions at top.  Maybe define
hest_offset up a few lines and just set it here?

> +
>      /* Error Source Count */
>      build_append_int_noprefix(table_data, num_sources, 4);
>      for (i = 0; i < num_sources; i++) {
> @@ -362,6 +365,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>      }
>  
>      acpi_table_end(linker, &table);
> +
> +    /*
> +     * tell firmware to write into GPA the address of HEST via fw_cfg,

Given the tidy up above, fix this one to have a capital T, or was this
where you meant to change it?

> +     * once initialized.
> +     */
> +    bios_linker_loader_write_pointer(linker,
> +                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,

Could wrap less and stay under 80 chars, as both lines above add up to 70-something.

> +                                     sizeof(uint64_t),
> +                                     ACPI_BUILD_TABLE_FILE, hest_offset);
>  }



* Re: [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records
  2025-01-22 15:46 ` [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
@ 2025-01-23 10:29   ` Jonathan Cameron
  2025-01-23 18:23     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 10:29 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Wed, 22 Jan 2025 16:46:20 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> There are two pointers that are needed during error injection:
> 
> 1. The start address of the CPER block to be stored;
> 2. The address of the ack, which needs a reset before next error.
> 
> It is preferable to calculate them from the HEST table.  This allows
> checking the source ID, the size of the table and the type of the
> HEST error block structures.
> 
> Yet, keep the old code, as this is needed for migration purposes.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Generally looks good.  A few bits that I think could be made
easier to follow for anyone with the spec open in front of them.

Thanks,

Jonathan

> ---
>  hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 88 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 34e3364d3fd8..b46b563bcaf8 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -61,6 +61,23 @@
>   */
>  #define ACPI_GHES_GESB_SIZE                 20
>  
> +/*
> + * Offsets with regards to the start of the HEST table stored at
> + * ags->hest_addr_le, according with the memory layout map at
> + * docs/specs/acpi_hest_ghes.rst.
> + */
> +
> +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2

Local multiline comment style seems to be always
/*
 * ACPI 6.2:...

So perhaps good to copy that.

> + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> + */
> +#define HEST_GHES_V2_TABLE_SIZE  92
> +#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)

Using a GAS offset here to me obscures what is going on.  I'd
explicitly handle the GAS where you are reading this.
We probably should sanity check the type as there are
some crazy options that might turn up one day.

Maybe worth using spec term of
GHES_READ_ACK_...

Obviously it's a question of who you are for whether it is read or
write, but maybe still worth using that term for easy checking
against the specification.
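
To make the suggestion concrete, here is a sketch of decoding the 12-byte
Generic Address Structure explicitly and rejecting anything that is not
system memory. The field layout (space ID, bit width, bit offset, access
size, 64-bit address) reflects my reading of the ACPI spec; `parse_gas` and
the sample bytes are made up for illustration and are not QEMU code:

```python
import struct

# Generic Address Structure, little-endian, 12 bytes:
# space id (1), register bit width (1), register bit offset (1),
# access size (1), address (8)
GAS_FORMAT = "<BBBBQ"
GAS_SPACE_SYSTEM_MEMORY = 0x00


def parse_gas(raw):
    """Return the address held in a GAS, insisting on system memory."""
    space_id, _bit_width, _bit_offset, _access_size, address = \
        struct.unpack(GAS_FORMAT, raw)
    if space_id != GAS_SPACE_SYSTEM_MEMORY:
        raise ValueError(f"unsupported GAS address space {space_id:#x}")
    return address


# Hypothetical register: system memory, 64 bits wide, at 0x40000000.
sample = struct.pack(GAS_FORMAT, 0x00, 64, 0, 4, 0x4000_0000)
assert parse_gas(sample) == 0x4000_0000
```

With the GAS handled at the point of use, the define only needs to carry the
offset of the structure itself.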

> +
> +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
same on comment format.

> + * Table 18-380: 'Error Status Address' field
> + */
> +#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)
Maybe STS or spell out status? I found ST a bit confusing below.

> +
>  /*
>   * Values for error_severity field
>   */
> @@ -212,14 +229,6 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
>  {
>      int i, error_status_block_offset;
>  
> -    /*
> -     * TODO: Current version supports only one source.
> -     * A further patch will drop this check, after adding a proper migration
> -     * code, as, for the code to work, we need to store a bios pointer to the
> -     * HEST table.
> -     */
> -    assert(num_sources == 1);
> -
>      /* Build error_block_address */
>      for (i = 0; i < num_sources; i++) {
>          build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
> @@ -419,6 +428,70 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
>      *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
>  }
>  
> +static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
> +                                    uint64_t *cper_addr,
> +                                    uint64_t *read_ack_start_addr,
> +                                    Error **errp)
> +{
> +    uint64_t hest_err_block_addr, hest_read_ack_addr;
> +    uint64_t err_source_struct, error_block_addr;
> +    uint32_t num_sources, i;
> +
> +    if (!hest_addr) {
I guess it is a question of matching local code, but I'd be tempted
to name this hest_body_addr as it isn't the start of the table but
rather the bit after the header.

> +        return;
> +    }
> +
> +    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));

The hest_addr naming thing confused me a tiny bit here because obviously num_sources
isn't the first thing in the table in the spec!

> +    num_sources = le32_to_cpu(num_sources);
> +
> +    err_source_struct = hest_addr + sizeof(num_sources);
> +
> +    /*
> +     * Currently, HEST Error source navigates only for GHESv2 tables
> +     */
> +
> +    for (i = 0; i < num_sources; i++) {
> +        uint64_t addr = err_source_struct;
> +        uint16_t type, src_id;
> +
> +        cpu_physical_memory_read(addr, &type, sizeof(type));
> +        type = le16_to_cpu(type);
> +
> +        /* For now, we only know the size of GHESv2 table */
> +        if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
> +            error_setg(errp, "HEST: type %d not supported.", type);
> +            return;

It's a pity we can't just skip them, but absence of a size field
makes that tricky...  Can add that later I guess along with sizes
for each defined type including figuring out the variable length
ones like IA-32 machine check.  I guess this is why the whole ordering
constraint for new types was added. Can't find the old ones if
we don't know the size of the new ones, hence need new definitions
at the end.

Anyhow, I'm fine with this but maybe a little more description in the comment
would avoid someone going down the same rat hole I just did.
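
A sketch of why the walk has to stop at an unknown type: the structures carry
no length field, so the only way to step to the next one is a per-type size
table. The size 92 is the one from the patch; the type value 10 for GHESv2 is
an assumption based on the ACPI HEST source type list, and `find_source` and
`fake_ghes_v2` are illustrative helpers only:

```python
import struct

HEST_TYPE_GHES_V2 = 10                  # assumed type value for GHESv2
SOURCE_SIZES = {HEST_TYPE_GHES_V2: 92}  # only size known so far


def find_source(hest_body, source_id):
    """Return the offset of the entry with source_id in the HEST body.

    hest_body starts at the Error Source Count field, that is, right
    after the ACPI table header.
    """
    (num_sources,) = struct.unpack_from("<I", hest_body, 0)
    offset = 4
    for _ in range(num_sources):
        src_type, src_id = struct.unpack_from("<HH", hest_body, offset)
        size = SOURCE_SIZES.get(src_type)
        if size is None:
            # No length field in the entry, so it cannot be stepped over.
            raise ValueError(f"HEST: type {src_type} not supported")
        if src_id == source_id:
            return offset
        offset += size
    raise LookupError(f"HEST: source {source_id} not found")


def fake_ghes_v2(source_id):
    """Build a zero-filled 92-byte entry with type and source ID set."""
    entry = bytearray(92)
    struct.pack_into("<HH", entry, 0, HEST_TYPE_GHES_V2, source_id)
    return bytes(entry)


body = struct.pack("<I", 2) + fake_ghes_v2(0) + fake_ghes_v2(1)
print(find_source(body, 1))  # 96
```

Supporting more types later would mean extending the size table, with extra
care for the variable-length ones.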


> +        }
> +
> +        /* Compare CPER source address at the GHESv2 structure */
> +        addr += sizeof(type);
> +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> +
> +        if (src_id == source_id) {
> +            break;
> +        }
> +
> +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> +    }
> +    if (i == num_sources) {
> +        error_setg(errp, "HEST: Source %d not found.", source_id);
> +        return;
> +    }
> +
> +    /* Navigate through table address pointers */
> +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;

So this is a bit confusing. I'd pull the GAS offset down here rather
than putting it in the define. That way we can clearly see you
are grabbing the address field.  As above, should we check the type
is 0x00?  There are many fun things it could be but here I think
we just want it to be memory space.

> +    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;

Perhaps move this down to above where it is used?
Same thing about GAS address offset being better found down here
rather than hidden in GHES_ACK_OFFSET.
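
For anyone checking the pointer chain against the spec, a sketch over a
simulated guest memory with the GAS address offset applied at the point of
use. The offsets 20 and 64 are the ones in the patch; the value 4 for
GAS_ADDR_OFFSET is an assumption about the GAS layout, and `read_u64` and
`source_addresses` are illustrative names:

```python
import struct

GAS_ADDR_OFFSET = 4        # assumed offset of the address field in a GAS
ERR_STATUS_ADDR_GAS = 20   # 'Error Status Address' GAS, from the patch
READ_ACK_GAS = 64          # 'Read Ack Register' GAS, from the patch


def read_u64(mem, addr):
    """Read a little-endian 64-bit value from the simulated guest memory."""
    return struct.unpack_from("<Q", mem, addr)[0]


def source_addresses(mem, err_source_struct):
    """Follow the pointers of one GHESv2 entry, as the patch does."""
    error_block_addr = read_u64(
        mem, err_source_struct + ERR_STATUS_ADDR_GAS + GAS_ADDR_OFFSET)
    cper_addr = read_u64(mem, error_block_addr)
    read_ack_start_addr = read_u64(
        mem, err_source_struct + READ_ACK_GAS + GAS_ADDR_OFFSET)
    return cper_addr, read_ack_start_addr


mem = bytearray(0x400)
struct.pack_into("<Q", mem, 0x100 + 20 + 4, 0x200)  # error block address
struct.pack_into("<Q", mem, 0x200, 0x300)           # CPER buffer address
struct.pack_into("<Q", mem, 0x100 + 64 + 4, 0x208)  # read ack register
print([hex(a) for a in source_addresses(mem, 0x100)])  # ['0x300', '0x208']
```

The CPER address needs two reads (HEST entry, then the error block address
slot), while the read ack address needs only one.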

> +
> +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> +                             sizeof(error_block_addr));
> +
> +    cpu_physical_memory_read(error_block_addr, cper_addr,
> +                             sizeof(*cper_addr));
> +
> +    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
> +                             sizeof(*read_ack_start_addr));
> +}
> +
>  void ghes_record_cper_errors(const void *cper, size_t len,
>                               uint16_t source_id, Error **errp)
>  {
> @@ -439,8 +512,13 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>      }
>      ags = &acpi_ged_state->ghes_state;
>  
> -    get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> -                         &cper_addr, &read_ack_register_addr);
> +    if (!ags->hest_addr_le) {
> +        get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> +                             &cper_addr, &read_ack_register_addr);
> +    } else {
> +        get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
> +                                &cper_addr, &read_ack_register_addr, errp);
> +    }
>  
>      if (!cper_addr) {
>          error_setg(errp, "can not find Generic Error Status Block");



* Re: [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr
  2025-01-22 15:46 ` [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
@ 2025-01-23 10:31   ` Jonathan Cameron
  2025-01-24 10:08   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 10:31 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, linux-kernel

On Wed, 22 Jan 2025 16:46:21 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The GHES migration logic at GED should now support HEST table
> location too.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

I'm not an expert on migration logic, but with that in mind, this looks fine to me.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  hw/acpi/generic_event_device.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index c85d97ca3776..5346cae573b7 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -386,6 +386,34 @@ static const VMStateDescription vmstate_ghes_state = {
>      }
>  };
>  
> +static const VMStateDescription vmstate_hest = {
> +    .name = "acpi-hest",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool hest_needed(void *opaque)
> +{
> +    AcpiGedState *s = opaque;
> +    return s->ghes_state.hest_addr_le;
> +}
> +
> +static const VMStateDescription vmstate_hest_state = {
> +    .name = "acpi-ged/hest",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = hest_needed,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> +                       vmstate_hest, AcpiGhesState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_acpi_ged = {
>      .name = "acpi-ged",
>      .version_id = 1,
> @@ -398,6 +426,7 @@ static const VMStateDescription vmstate_acpi_ged = {
>          &vmstate_memhp_state,
>          &vmstate_cpuhp_state,
>          &vmstate_ghes_state,
> +        &vmstate_hest_state,
>          NULL
>      }
>  };



* Re: [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available
  2025-01-22 15:46 ` [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
@ 2025-01-23 10:52   ` Jonathan Cameron
  2025-01-24 10:23   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 10:52 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Philippe Mathieu-Daudé, Ani Sinha, Dongjiu Geng,
	Eduardo Habkost, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, Zhao Liu, linux-kernel

On Wed, 22 Jan 2025 16:46:22 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Create a new property (x-has-hest-addr) and use it to detect if
> the GHES table offsets can be calculated from the HEST address
> (qemu 9.2 and upper) or via the legacy way via an offset obtained
> from the hardware_errors firmware file.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


* Re: [PATCH 06/11] acpi/ghes: add a notifier to notify when error data is ready
  2025-01-22 15:46 ` [PATCH 06/11] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
@ 2025-01-23 10:52   ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 10:52 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Wed, 22 Jan 2025 16:46:23 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Some error injection notify methods are async, like GPIO
> notify. Add a notifier to be used when the error record is
> ready to be sent to the guest OS.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


* Re: [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state
  2025-01-22 15:46 ` [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state Mauro Carvalho Chehab
@ 2025-01-23 10:54   ` Jonathan Cameron
  2025-01-24 12:25   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 10:54 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, Paolo Bonzini, Peter Maydell,
	kvm, linux-kernel

On Wed, 22 Jan 2025 16:46:24 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Move the check logic into a common function and simplify the
> code which checks if GHES is enabled and was properly setup.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Seems a reasonable cleanup to me.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



* Re: [PATCH 09/11] arm/virt: Wire up a GED error device for ACPI / GHES
  2025-01-22 15:46 ` [PATCH 09/11] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
@ 2025-01-23 10:56   ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 10:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Peter Maydell, Shannon Zhao, linux-kernel

On Wed, 22 Jan 2025 16:46:26 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Adds support to ARM virtualization to allow handling
> generic error ACPI Event via GED & error source device.
> 
> It is aligned with Linux Kernel patch:
> https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
> 
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Acked-by: Igor Mammedov <imammedo@redhat.com>
> 
> ---
> 
> Changes from v8:
> 
> - Added a call to the function that produces GHES generic
>   records, as this is now added earlier in this series.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Another bonus.

LGTM otherwise.


* Re: [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection
  2025-01-22 15:46 ` [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
@ 2025-01-23 11:00   ` Jonathan Cameron
  2025-01-24 12:40     ` Igor Mammedov
  2025-01-24 12:38   ` Igor Mammedov
  1 sibling, 1 reply; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 11:00 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, Eric Blake,
	Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
	Shannon Zhao, linux-kernel

On Wed, 22 Jan 2025 16:46:27 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Creates a QMP command to be used for generic ACPI APEI hardware error
> injection (HEST) via GHESv2, and add support for it for ARM guests.
> 
> Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
> independent. This is mapped at arch virt bindings, depending on the
> types supported by QEMU and by the BIOS. So, on ARM, this is supported
> via ACPI_GHES_NOTIFY_GPIO notification type.
> 
> This patch is co-authored:
>     - original ghes logic to inject a simple ARM record by Shiju Jose;
>     - generic logic to handle block addresses by Jonathan Cameron;
>     - generic GHESv2 error inject by Mauro Carvalho Chehab;
> 
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> 
> ---
> 
> Changes since v9:
> - ARM source IDs renamed to reflect SYNC/ASYNC;
> - command name changed to better reflect what it does;
> - some improvements at JSON documentation;
> - add a check for QMP source at the notification logic.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Another bonus.

A few trivial formatting comments, otherwise looks fine to me.

Jonathan

> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 5d29db3918dd..cf83c959b5ef 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -547,7 +547,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>      /* Write the generic error data entry into guest memory */
>      cpu_physical_memory_write(cper_addr, cper, len);
>  
> -    notifier_list_notify(&acpi_generic_error_notifiers, NULL);
> +    notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
>  }
>  
>  int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
> diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
> new file mode 100644
> index 000000000000..02c47b41b990
> --- /dev/null
> +++ b/hw/acpi/ghes_cper.c
> @@ -0,0 +1,32 @@
> +/*
> + * CPER payload parser for error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "qemu/base64.h"
> +#include "qemu/error-report.h"
> +#include "qemu/uuid.h"
> +#include "qapi/qapi-commands-acpi-hest.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_inject_ghes_error(const char *qmp_cper, Error **errp)
> +{
> +
Odd blank line that can go.

> +    uint8_t *cper;
> +    size_t  len;
> +
> +    cper = qbase64_decode(qmp_cper, -1, &len, errp);
> +    if (!cper) {
> +        error_setg(errp, "missing GHES CPER payload");
> +        return;
> +    }
> +
> +    ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
> +}
> diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
> new file mode 100644
> index 000000000000..8782e2c02fa8
> --- /dev/null
> +++ b/hw/acpi/ghes_cper_stub.c
> @@ -0,0 +1,19 @@
> +/*
> + * Stub interface for CPER payload parser for error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
Trivial but I'd drop these trailing blank lines as they don't add
anything other than making people scroll further.

> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-commands-acpi-hest.h"
> +#include "hw/acpi/ghes.h"

Trivial but do we need ghes.h?

> +
> +void qmp_inject_ghes_error(const char *cper, Error **errp)
> +{
> +    error_setg(errp, "GHES QMP error inject is not compiled in");
> +}



* Re: [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-23 10:02   ` Jonathan Cameron
@ 2025-01-23 11:46     ` Mauro Carvalho Chehab
  2025-01-23 17:01     ` Igor Mammedov
  2025-01-28 10:00     ` Mauro Carvalho Chehab
  2 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-23 11:46 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

Em Thu, 23 Jan 2025 10:02:17 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:

> > ---
> > 
> > Change from v8:
> > - hest_addr_le is now pointing to the error source size and data.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> Bonus.  I guess you really like this patch :)

There is something wrong here with git rebase - maybe related to
.git/hooks/pre-commit running checkpatch: every time I do a rebase,
it adds my SOB at the end of the description, if not there already.
That is not a problem for normal patches, but when the patch has a
version history, it ends up adding a duplicated SOB after it. That
happens even using --no-signoff and with:

	[format]
		signOff = false

in the git config. No idea how to fix it.

Thanks,
Mauro

--- 

This is the pre-commit hook:

#!/bin/sh
#
# Run checkpatch on the staged changes; a non-zero exit aborts the commit.
TMP=$(mktemp)

git diff --cached HEAD >"$TMP"

"$PWD/scripts/checkpatch.pl" --no-signoff -q "$TMP" >&2
ERR=$?

rm "$TMP"

exit $ERR





* Re: [PATCH 11/11] scripts/ghes_inject: add a script to generate GHES error inject
  2025-01-22 15:46 ` [PATCH 11/11] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
@ 2025-01-23 12:10   ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-23 12:10 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Cleber Rosa, John Snow, linux-kernel

On Wed, 22 Jan 2025 16:46:28 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Using the QMP GHESv2 API requires preparing a raw data array
> containing a CPER record.
> 
> Add a helper script with subcommands to prepare such data.
> 
> Currently, only ARM Processor error CPER record is supported.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
My Python is poor at best, so take that into account!

Some usage examples might be good to have as well.
Either in explicit docs or the patch description.

Jonathan


> ---
>  MAINTAINERS                    |   3 +
>  scripts/arm_processor_error.py | 377 ++++++++++++++++++
>  scripts/ghes_inject.py         |  51 +++
>  scripts/qmp_helper.py          | 702 +++++++++++++++++++++++++++++++++
>  4 files changed, 1133 insertions(+)
>  create mode 100644 scripts/arm_processor_error.py
>  create mode 100755 scripts/ghes_inject.py
>  create mode 100644 scripts/qmp_helper.py
> 

> diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
> new file mode 100644
> index 000000000000..357ebc6e8359
> --- /dev/null
> +++ b/scripts/qmp_helper.py



> +    def send_cper(self, notif_type, payload):
> +        """Send commands to QEMU through a QMP TCP socket"""
> +
> +        # Fill CPER record header
> +
> +        # NOTE: bits 4 to 13 of block status contain the number of
> +        # data entries in the data section. This is currently unsupported.

Not controllable, so always set to 0 or 1?  Or not set?
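
The field in question can be sketched as below. This is only an
illustration: block_status() and its parameter names are made up here
and are not taken from qmp_helper.py. The bit positions follow the
Generic Error Status Block definition in the ACPI spec: bits 0 to 3
are the uncorrectable/correctable valid and "multiple" flags, and
bits 4 to 13 hold the Error Data Entry Count.

```python
def block_status(ue_valid=False, ce_valid=False,
                 multi_ue=False, multi_ce=False, entry_count=0):
    """Pack the 32-bit Block Status field of a Generic Error Status Block."""
    if not 0 <= entry_count < (1 << 10):
        raise ValueError("entry count must fit in bits 4 to 13")

    value = int(ue_valid)            # bit 0: Uncorrectable Error Valid
    value |= int(ce_valid) << 1      # bit 1: Correctable Error Valid
    value |= int(multi_ue) << 2      # bit 2: Multiple Uncorrectable Errors
    value |= int(multi_ce) << 3      # bit 3: Multiple Correctable Errors
    value |= entry_count << 4        # bits 4-13: Error Data Entry Count
    return value
```

With a helper like this, leaving the count "unsupported" simply means
entry_count stays at 0 while the flag bits are still set.
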

...


> +        self.send_cper_raw(cper_data)
> +

Trivial but maybe can improve consistency on spacing.  1 or 2 lines
between functions. I don't care which.

> +
> +    def search_qom(self, path, prop, regex):

> +class cper_guid:
> +    """
> +    Contains CPER GUID, as per:
> +    https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
> +    """

> +
> +    CPER_PROC_X86 = guid(0xDC3EA0B0, 0xA144, 0x4797,

Maybe follow the spec naming as IA32_X64?

> +                         [0xB9, 0x5B, 0x53, 0xFA,
> +                          0x24, 0x2B, 0x6E, 0x1D])
> +
> +    CPER_PROC_ITANIUM = guid(0xe429faf1, 0x3cb7, 0x11d4,
> +                             [0xbc, 0xa7, 0x00, 0x80,

To stop people falling down a hole, maybe call out that this
is not the format in the spec which is weird for this one case.

> +                              0xc7, 0x3c, 0x88, 0x81])
> +

> +

> +
> +    CPER_PLATFORM_MEM2 = guid(0x61EC04FC, 0x48E6, 0xD813,
> +                              [0x25, 0xC9, 0x8D, 0xAA,
> +                               0x44, 0x75, 0x0B, 0x12])
Huh. They missed this one in the big spec table, but it is
there in N.2.6



> +    CPER_PCI_DEV = guid(0xEB5E4685, 0xCA66, 0x4769,
CPER_PCI_COMPONENT would match N.2.9 naming.
If I recall PCI terminology right, component covers a bunch
of things that Device doesn't.


> +                        [0xB6, 0xA2, 0x26, 0x06,
> +                         0x8B, 0x00, 0x13, 0x26])
> +
> +    CPER_FW_ERROR = guid(0x81212A96, 0x09ED, 0x4996,
Another one oddly missing from the big table but not the broken
out sections.  Not our problem but this chunk of the
spec could do with tidying up!

> +                         [0x94, 0x71, 0x8D, 0x72,
> +                          0x9C, 0x8E, 0x69, 0xED])
> +
> +    CPER_DMA_GENERIC = guid(0x5B51FEF7, 0xC79D, 0x4434,
CPER_DMAR
Nothing to do with DMA in general.  All about x86 IOMMUs I think.

> +                            [0x8F, 0x1B, 0xAA, 0x62,
> +                             0xDE, 0x3E, 0x2C, 0x64])
> +

> +
> +    CPER_CXL_PROT_ERR = guid(0x80B9EFB4, 0x52B5, 0x4DE3,
> +                             [0xA7, 0x77, 0x68, 0x78,
> +                              0x4B, 0x77, 0x10, 0x48])

Maybe add the one for FRU Memory poison from the new 2.11 UEFI spec.
This will constantly need updating with new specs so no problem
if you'd rather stick to 2.10 only for now.
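
For anyone checking these constants against the spec, the in-memory
form of a GUID can be sketched as below. It assumes the script's
guid() arguments are (time_low, time_mid, time_high, node bytes), in
the same order as the constants quoted above; guid_bytes() is a
made-up name, not the class from qmp_helper.py. UEFI stores the first
three fields little-endian and the remaining eight bytes as-is.

```python
import struct
import uuid


def guid_bytes(time_low, time_mid, time_high, nodes):
    """Serialize a GUID in the mixed-endian layout used by UEFI."""
    return struct.pack("<IHH8B", time_low, time_mid, time_high, *nodes)


# Value quoted above as CPER_PROC_X86
raw = guid_bytes(0xDC3EA0B0, 0xA144, 0x4797,
                 [0xB9, 0x5B, 0x53, 0xFA, 0x24, 0x2B, 0x6E, 0x1D])

# uuid.UUID(bytes_le=...) decodes the same layout back to text form
text = str(uuid.UUID(bytes_le=raw))
```

Comparing the text form against the spec table is a quick way to spot
a byte-order slip in any of the entries.
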
 




* Re: [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes
  2025-01-22 15:46 ` [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
  2025-01-23  9:56   ` Jonathan Cameron
@ 2025-01-23 16:48   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-23 16:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, Peter Maydell, Shannon Zhao,
	linux-kernel

On Wed, 22 Jan 2025 16:46:18 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The current code is actually dependent on having just one error
> structure with a single source.
> 
> As the number of sources should be arch-dependent, as it will depend on
> what kind of synchronous/asynchronous notifications will exist, change

I'd drop 'synchronous/asynchronous' and just leave the broader 'notifications'

> the logic to dynamically build the table.
> 
> Yet, for a proper support, we need to get the number of sources by
> reading the number from the HEST table. However, bios currently doesn't
> store a pointer to it.
> 
> For now just change the logic at table build time, while enforcing that
> it will behave like before with a single source ID.
> 
> A future patch will add a HEST table bios pointer and change the logic
> at acpi_ghes_record_errors() to dynamically use the new size.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  hw/acpi/ghes.c           | 43 ++++++++++++++++++++++++++--------------
>  hw/arm/virt-acpi-build.c |  5 +++++
>  include/hw/acpi/ghes.h   | 21 +++++++++++++-------
>  3 files changed, 47 insertions(+), 22 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index b709c177cdea..3f519ccab90d 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -206,17 +206,26 @@ ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
>   * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
>   * See docs/specs/acpi_hest_ghes.rst for blobs format.
>   */
> -static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> +                                   int num_sources)
>  {
>      int i, error_status_block_offset;
>  
> +    /*
> +     * TODO: Current version supports only one source.
> +     * A further patch will drop this check, after adding a proper migration
> +     * code, as, for the code to work, we need to store a bios pointer to the
> +     * HEST table.
> +     */
> +    assert(num_sources == 1);
> +
>      /* Build error_block_address */
> -    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +    for (i = 0; i < num_sources; i++) {
>          build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
>      }
>  
>      /* Build read_ack_register */
> -    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +    for (i = 0; i < num_sources; i++) {
>          /*
>           * Initialize the value of read_ack_register to 1, so GHES can be
>           * writable after (re)boot.
> @@ -231,13 +240,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
>  
>      /* Reserve space for Error Status Data Block */
>      acpi_data_push(hardware_errors,
> -        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
> +        ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
>  
>      /* Tell guest firmware to place hardware_errors blob into RAM */
>      bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
>                               hardware_errors, sizeof(uint64_t), false);
>  
> -    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +    for (i = 0; i < num_sources; i++) {
>          /*
>           * Tell firmware to patch error_block_address entries to point to
>           * corresponding "Generic Error Status Block"
> @@ -263,10 +272,12 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
>  /* Build Generic Hardware Error Source version 2 (GHESv2) */
>  static void build_ghes_v2(GArray *table_data,
               ^^^^^^^^^^^^^

It's a bit unclear what the name implies; maybe s/build_ghes_v2/build_ghes_v2_entry/

>                            BIOSLinker *linker,
> -                          enum AcpiGhesNotifyType notify,
> -                          uint16_t source_id)
> +                          const AcpiNotificationSourceId *notif_src,
> +                          uint16_t index, int num_sources)
>  {
>      uint64_t address_offset;
> +    const uint16_t notify = notif_src->notify;
> +    const uint16_t source_id = notif_src->source_id;
>  
>      /*
>       * Type:
> @@ -297,7 +308,7 @@ static void build_ghes_v2(GArray *table_data,
>                                     address_offset + GAS_ADDR_OFFSET,
>                                     sizeof(uint64_t),
>                                     ACPI_HW_ERROR_FW_CFG_FILE,
> -                                   source_id * sizeof(uint64_t));
> +                                   index * sizeof(uint64_t));
>  
>      /* Notification Structure */
>      build_ghes_hw_error_notification(table_data, notify);
> @@ -317,8 +328,7 @@ static void build_ghes_v2(GArray *table_data,
>                                     address_offset + GAS_ADDR_OFFSET,
>                                     sizeof(uint64_t),
>                                     ACPI_HW_ERROR_FW_CFG_FILE,
> -                                   (ACPI_GHES_ERROR_SOURCE_COUNT + source_id)
> -                                   * sizeof(uint64_t));
> +                                   (num_sources + index) * sizeof(uint64_t));
>  
>      /*
>       * Read Ack Preserve field
> @@ -333,19 +343,23 @@ static void build_ghes_v2(GArray *table_data,
>  /* Build Hardware Error Source Table */
>  void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>                       BIOSLinker *linker,
> +                     const AcpiNotificationSourceId * const notif_source,
> +                     int num_sources,
>                       const char *oem_id, const char *oem_table_id)
>  {
>      AcpiTable table = { .sig = "HEST", .rev = 1,
>                          .oem_id = oem_id, .oem_table_id = oem_table_id };
> +    int i;
>  
> -    build_ghes_error_table(hardware_errors, linker);
> +    build_ghes_error_table(hardware_errors, linker, num_sources);
>  
>      acpi_table_begin(&table, table_data);
>  
>      /* Error Source Count */
> -    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> -    build_ghes_v2(table_data, linker,
> -                  ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
> +    build_append_int_noprefix(table_data, num_sources, 4);
> +    for (i = 0; i < num_sources; i++) {
> +        build_ghes_v2(table_data, linker, &notif_source[i], i, num_sources);
> +    }
>  
>      acpi_table_end(linker, &table);
>  }
> @@ -410,7 +424,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>      }
>      ags = &acpi_ged_state->ghes_state;
>  
> -    assert(ACPI_GHES_ERROR_SOURCE_COUNT == 1);
>      get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
>                           &cper_addr, &read_ack_register_addr);
>  
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 3ac8f8e17861..3d411787fc37 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -893,6 +893,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
>      g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
>  }
>  
> +static const AcpiNotificationSourceId hest_ghes_notify[] = {
> +    { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
> +};
> +
>  static
>  void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>  {
> @@ -948,6 +952,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      if (vms->ras) {
>          acpi_add_table(table_offsets, tables_blob);
>          acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> +                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
>                          vms->oem_id, vms->oem_table_id);
>      }
>  
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 39619a2457cb..9f0120d0d596 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -57,20 +57,27 @@ enum AcpiGhesNotifyType {
>      ACPI_GHES_NOTIFY_RESERVED = 12
>  };
>  
> -enum {
> -    ACPI_HEST_SRC_ID_SEA = 0,
> -    /* future ids go here */
> -
> -    ACPI_GHES_ERROR_SOURCE_COUNT
> -};
> -
>  typedef struct AcpiGhesState {
>      uint64_t hw_error_le;
>      bool present; /* True if GHES is present at all on this board */
>  } AcpiGhesState;
>  
> +/*
> + * ID numbers used to fill HEST source ID field
> + */
> +enum AcpiGhesSourceID {
> +    ACPI_HEST_SRC_ID_SYNC,
> +};
> +
> +typedef struct AcpiNotificationSourceId {
> +    enum AcpiGhesSourceID source_id;
> +    enum AcpiGhesNotifyType notify;
> +} AcpiNotificationSourceId;
> +
>  void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>                       BIOSLinker *linker,
> +                     const AcpiNotificationSourceId * const notif_source,
                                                         ^^^ is this intentional?

> +                     int num_sources,
>                       const char *oem_id, const char *oem_table_id);
>  void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>                            GArray *hardware_errors);



* Re: [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-23 10:02   ` Jonathan Cameron
  2025-01-23 11:46     ` Mauro Carvalho Chehab
@ 2025-01-23 17:01     ` Igor Mammedov
  2025-01-28 10:12       ` Mauro Carvalho Chehab
  2025-01-28 10:00     ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2025-01-23 17:01 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mauro Carvalho Chehab, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Thu, 23 Jan 2025 10:02:17 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 22 Jan 2025 16:46:19 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Store HEST table address at GPA, placing its content at
> > hest_addr_le variable.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >   
> A few trivial things inline.
> 
> Jonathan
> 
> > ---
> > 
> > Change from v8:
> > > - hest_addr_le is now pointing to the error source size and data.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> Bonus.  I guess you really like this patch :)
> > ---
> >  hw/acpi/ghes.c         | 17 ++++++++++++++++-
> >  include/hw/acpi/ghes.h |  1 +
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 3f519ccab90d..34e3364d3fd8 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -30,6 +30,7 @@
> >  
> >  #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
> >  #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> > +#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
> >  
> >  /* The max size in bytes for one error block */
> >  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> > @@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> >      }
> >  
> >      /*
> > -     * tell firmware to write hardware_errors GPA into
> > +     * Tell firmware to write hardware_errors GPA into  
> 
> Sneaky tidy up.  No problem with it in general but adding noise here, so if there
> are others in the series maybe gather them up in a cleanup patch.

+1

> 
> >       * hardware_errors_addr fw_cfg, once the former has been initialized.
> >       */
> >      bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
> > @@ -355,6 +356,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> >  
> >      acpi_table_begin(&table, table_data);
> >  
> > +    int hest_offset = table_data->len;  
This should be unsigned, ideally uint32_t,
but we have a zoo of types all over the place here.

  
> 
> Local style looks to be traditional C with definitions at top.  Maybe define
> hest_offset up a few lines and just set it here?

Yep, it applies to the whole of QEMU (i.e. definitions only at the start of a block)

> 
> > +
> >      /* Error Source Count */
> >      build_append_int_noprefix(table_data, num_sources, 4);
> >      for (i = 0; i < num_sources; i++) {
> > @@ -362,6 +365,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> >      }
> >  
> >      acpi_table_end(linker, &table);
> > +
> > +    /*
> > +     * tell firmware to write into GPA the address of HEST via fw_cfg,  
> 
> Given the tidy up above, fix this one to have a capital T, or was this
> where you meant to change it?
> 
> > +     * once initialized.
> > +     */
> > +    bios_linker_loader_write_pointer(linker,
> > +                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,  
> 
> Could wrap less and stay under 80 chars as both lines above add up to 70 something
> 
> > +                                     sizeof(uint64_t),
> > +                                     ACPI_BUILD_TABLE_FILE, hest_offset);
> >  }  
> 



* Re: [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records
  2025-01-23 10:29   ` Jonathan Cameron
@ 2025-01-23 18:23     ` Mauro Carvalho Chehab
  2025-01-24  9:59       ` Igor Mammedov
  0 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-23 18:23 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

Em Thu, 23 Jan 2025 10:29:19 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:

> On Wed, 22 Jan 2025 16:46:20 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > There are two pointers that are needed during error injection:
> > 
> > 1. The start address of the CPER block to be stored;
> > 2. The address of the ack, which needs a reset before next error.
> > 
> > It is preferable to calculate them from the HEST table.  This allows
> > checking the source ID, the size of the table and the type of the
> > HEST error block structures.
> > 
> > Yet, keep the old code, as this is needed for migration purposes.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> Generally looks good.  A few bits that I think could be made
> easier to follow for anyone with the spec open in front of them.
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 88 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 34e3364d3fd8..b46b563bcaf8 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -61,6 +61,23 @@
> >   */
> >  #define ACPI_GHES_GESB_SIZE                 20
> >  
> > +/*
> > + * Offsets with regards to the start of the HEST table stored at
> > + * ags->hest_addr_le, according with the memory layout map at
> > + * docs/specs/acpi_hest_ghes.rst.
> > + */
> > +
> > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2  
> 
> Local multiline comment style seems to be always
> /*
>  * ACPI 6.2:...
> 
> So perhaps good to copy that.
> 
> > + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> > + */
> > +#define HEST_GHES_V2_TABLE_SIZE  92
> > > +#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)  
> 
> Using a GAS offset here to me obscures what is going on.  I'd
> explicitly handle the GAS where you are reading this.
> We probably should sanity check the type as there are
> some crazy options that might turn up one day.

See below.

> Maybe worth using spec term of
> GHES_READ_ACK_...
> 
> Obviously it's a question of who you are for whether it is read or
> write, but maybe still worth using that term for easy checking
> against the specification.
> 
> > +
> > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source  
> same on comment format.
> 
> > + * Table 18-380: 'Error Status Address' field
> > + */
> > +#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)  
> Maybe STS or spell out status? I found ST a bit confusing below.

Giving names is not an easy task... Removing _ST doesn't sound
right, as everything is GHES_ERR. STS sounds weird to me as well.
Maybe we could name them both as something like:

	GHES_ERR_STATUS_ADDR_OFF
	GHES_READ_ACK_ADDR_OFF
 
> > +
> >  /*
> >   * Values for error_severity field
> >   */
> > @@ -212,14 +229,6 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> >  {
> >      int i, error_status_block_offset;
> >  
> > -    /*
> > -     * TODO: Current version supports only one source.
> > -     * A further patch will drop this check, after adding a proper migration
> > -     * code, as, for the code to work, we need to store a bios pointer to the
> > -     * HEST table.
> > -     */
> > -    assert(num_sources == 1);
> > -
> >      /* Build error_block_address */
> >      for (i = 0; i < num_sources; i++) {
> >          build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
> > @@ -419,6 +428,70 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
> >      *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
> >  }
> >  
> > +static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
> > +                                    uint64_t *cper_addr,
> > +                                    uint64_t *read_ack_start_addr,
> > +                                    Error **errp)
> > +{
> > +    uint64_t hest_err_block_addr, hest_read_ack_addr;
> > +    uint64_t err_source_struct, error_block_addr;
> > +    uint32_t num_sources, i;
> > +
> > +    if (!hest_addr) {  
> I guess it is a question of matching local code, but I'd be tempted
> to name this hest_body_addr as it isn't the start of the table but
> rather the bit after the header.

This is named after hest_addr_le, which in turn was named after ghes_hw_le.

Besides, I guess such name was suggested on a past review. From my side, 
I'm OK with any name you/Igor pick.

> 
> > +        return;
> > +    }
> > +
> > +    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));  
> 
> The hest_addr naming thing confused me a tiny bit here because obviously num_sources
> isn't the first thing in the table in the spec!
> 
> > +    num_sources = le32_to_cpu(num_sources);
> > +
> > +    err_source_struct = hest_addr + sizeof(num_sources);
> > +
> > +    /*
> > +     * Currently, HEST Error source navigates only for GHESv2 tables
> > +     */
> > +
> > +    for (i = 0; i < num_sources; i++) {
> > +        uint64_t addr = err_source_struct;
> > +        uint16_t type, src_id;
> > +
> > +        cpu_physical_memory_read(addr, &type, sizeof(type));
> > +        type = le16_to_cpu(type);
> > +
> > +        /* For now, we only know the size of GHESv2 table */
> > +        if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
> > +            error_setg(errp, "HEST: type %d not supported.", type);
> > +            return;  
> 
> It's a pity we can't just skip them, but absence of a size field
> makes that tricky...  Can add that later I guess along with sizes
> for each defined type including figuring out the variable length
> ones like IA-32 machine check.  I guess this is why the whole ordering
> constraint for new types was added. Can't find the old ones if
> we don't know the size of the new ones, hence need new definitions
> at the end.

Yes. The variable sizes make it harder to parse with the current GHES
types. It sounds like they'll fix it for the next types, as the size of
the record will be stored for types above 11.

So, in the end, we'll need much more complex logic if/when
we add non-GHES records.
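
The walk being discussed can be sketched over a plain byte buffer as
below. It is a simplified model, not the QEMU code: it only returns
the two address fields stored in the matching GHESv2 entry, and does
not follow them further as get_ghes_source_offsets() does. The
constants mirror the values quoted in the patch (type 10, 92-byte
entries, offsets 20 and 64, GAS address at +4); find_source() and
make_entry() are made-up names.

```python
import struct

GHES_V2_TYPE = 10         # ACPI_GHES_SOURCE_GENERIC_ERROR_V2
GHES_V2_SIZE = 92         # HEST_GHES_V2_TABLE_SIZE
GAS_ADDR_OFFSET = 4       # address field inside a Generic Address Structure
ERR_STATUS_ADDR_OFF = 20  # 'Error Status Address' GAS
READ_ACK_ADDR_OFF = 64    # 'Read Ack Register' GAS


def find_source(hest_body, source_id):
    """Return (error status address, read ack address) for source_id."""
    (num_sources,) = struct.unpack_from("<I", hest_body, 0)
    entry = 4
    for _ in range(num_sources):
        src_type, src_id = struct.unpack_from("<HH", hest_body, entry)
        if src_type != GHES_V2_TYPE:
            # Without a size field there is no way to skip this entry.
            raise ValueError(f"HEST: type {src_type} not supported")
        if src_id == source_id:
            (status,) = struct.unpack_from(
                "<Q", hest_body, entry + ERR_STATUS_ADDR_OFF + GAS_ADDR_OFFSET)
            (ack,) = struct.unpack_from(
                "<Q", hest_body, entry + READ_ACK_ADDR_OFF + GAS_ADDR_OFFSET)
            return status, ack
        entry += GHES_V2_SIZE
    raise LookupError(f"HEST: source {source_id} not found")


def make_entry(source_id, status_addr, ack_addr):
    """Build a zero-filled GHESv2 entry with only the fields used above."""
    entry = bytearray(GHES_V2_SIZE)
    struct.pack_into("<HH", entry, 0, GHES_V2_TYPE, source_id)
    struct.pack_into("<Q", entry, ERR_STATUS_ADDR_OFF + GAS_ADDR_OFFSET,
                     status_addr)
    struct.pack_into("<Q", entry, READ_ACK_ADDR_OFF + GAS_ADDR_OFFSET,
                     ack_addr)
    return bytes(entry)
```

The early exit on an unknown type is the limitation mentioned above:
skipping such an entry would need a per-type size table.
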

> 
> Anyhow, I'm fine with this but maybe a little more description in the comment
> would avoid someone going down same rat hole I just did.
> 
> 
> > +        }
> > +
> > +        /* Compare CPER source address at the GHESv2 structure */
> > +        addr += sizeof(type);
> > +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> > +
> > +        if (src_id == source_id) {
> > +            break;
> > +        }
> > +
> > +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > +    }
> > +    if (i == num_sources) {
> > +        error_setg(errp, "HEST: Source %d not found.", source_id);
> > +        return;
> > +    }
> > +
> > > +    /* Navigate through table address pointers */
> > +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;  
> 
> So this is a bit confusing. I'd pull the GAS offset down here rather
> than putting it in the define. That way we can clearly see you
> are grabbing the address field.  As above, should we check the type
> is 0x00?  There are many fun things it could be but here I think
> we just want it to be memory space.

In short:

The type was already ensured when the HEST table was built, so I can't
see any need to add extra checks. If you want this to be better
documented, we could just do:

	hest_err_block_addr = err_source_struct + GHES_ERR_STATUS_ADDR_OFF + GAS_ADDR_OFFSET;

A longer rationale follows:

If I understood well, and after some discussions we had today via chat,
you basically want to add extra check logic during error injection
to verify that the address space type filled in by build_ghes_v2() here:

	/* Build Generic Hardware Error Source version 2 (GHESv2) */
	static void build_ghes_v2(GArray *table_data,
	                          BIOSLinker *linker,
	                          const AcpiNotificationSourceId *notif_src,
	                          uint16_t index, int num_sources)
	{
...
	    /* Error Status Address */
	    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
...
	    /*
	     * Read Ack Register
	     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
	     * version 2 (GHESv2 - Type 10)
	     */
	    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
	                     4 /* QWord access */, 0);
...
	}

was not modified and still remains AML_AS_SYSTEM_MEMORY, as otherwise the
code at get_ghes_source_offsets() won't be able to use 
cpu_physical_memory_read/cpu_physical_memory_write. To make it right,
IMO we would need to add something like this to aml-build.c:

	int verify_gas_addr_space(uint64_t addr, AmlAddressSpace as)
	{
		uint32_t gas_header;

		/* Read the start of the GAS; byte 0 is the Address Space ID */
		cpu_physical_memory_read(addr, &gas_header, sizeof(gas_header));
		gas_header = le32_to_cpu(gas_header);

		if ((gas_header & 0xff) != as)
			return 1;

		return 0;
	}

and at ghes.c do something like:

	// Using current names just to better illustrate the changes
	#define GHES_ACK_OFFSET          64 // don't add GAS_ADDR_OFFSET here
	#define GHES_ERR_ST_ADDR_OFFSET  20 // don't add GAS_ADDR_OFFSET here

	static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
	                                    uint64_t *cper_addr,
	                                    uint64_t *read_ack_start_addr,
	                                    Error **errp)
	{

	    /* Navigate through table address pointers */
	    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;

	    /* EXTRA CHECK LOGIC: Verify if build_ghes_v2() did its job */
	    if (verify_gas_addr_space(hest_err_block_addr, AML_AS_SYSTEM_MEMORY)) {
		error_setg(errp, "HEST firmware is not using system memory!!!");
		return;
	    }
	    hest_err_block_addr += GAS_ADDR_OFFSET;

	    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;

	    /* EXTRA CHECK LOGIC: Verify if build_ghes_v2() did its job */
	    if (verify_gas_addr_space(hest_read_ack_addr, AML_AS_SYSTEM_MEMORY)) {
		error_setg(errp, "HEST firmware is not using system memory!!!");
		return;
	    }
	    hest_read_ack_addr += GAS_ADDR_OFFSET;

	    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
	                             sizeof(error_block_addr));

	    cpu_physical_memory_read(error_block_addr, cper_addr,
	                             sizeof(*cper_addr));

	    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
	                             sizeof(*read_ack_start_addr));
	}
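
For reference, the structure such a check would inspect can be
sketched as below. The 12-byte layout (address space ID, bit width,
bit offset, access size, 64-bit address) is the Generic Address
Structure from the ACPI spec, and the sample values match the
build_append_gas() calls quoted above. parse_gas() is a made-up
helper, and 0x00 for system memory is the spec's Address Space ID
value rather than something taken from this patch.

```python
import struct

AML_AS_SYSTEM_MEMORY = 0x00  # Address Space ID for system memory


def parse_gas(raw):
    """Split a 12-byte Generic Address Structure into its fields."""
    space_id, bit_width, bit_offset, access_size, address = \
        struct.unpack_from("<BBBBQ", raw)
    return {
        "space_id": space_id,
        "bit_width": bit_width,
        "bit_offset": bit_offset,
        "access_size": access_size,
        "address": address,
    }


# Same parameters as build_append_gas(..., AML_AS_SYSTEM_MEMORY, 0x40, 0, 4, ...)
sample = struct.pack("<BBBBQ", AML_AS_SYSTEM_MEMORY, 0x40, 0, 4, 0x1234)
fields = parse_gas(sample)
```

The address field sits 4 bytes into the structure, which is the
GAS_ADDR_OFFSET that the discussion above is about.
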

IMO, this is overkill:
- I can't see how such check will ever be triggered in practice;
- I can't see any reason why the HEST firmware would ever be stored
  on a non-system memory: firmware should always be at 
  AML_AS_SYSTEM_MEMORY;
- As those offsets are related to fw_cfg, any change there would
  require changing the firmware binding logic. Plus, they'll very
  likely also require changes at BIOS code itself, as it would need
  to know how to store firmware files on some other memory type;
- Any change like that will certainly require adding backport support,
  as QEMU would need to check if BIOS would support different types
  of memory to store firmware instead of system memory. Also, QEMU 9.1
  is only compatible with firmware stored on system's memory. 

So, I don't see any benefit of doing that.

> > +    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;  
> 
> Perhaps move this down to above where it is used?
> Same thing about GAS address offset being better found down here
> rather than hidden in GHES_ACK_OFFSET.
> 
> > +
> > +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > +                             sizeof(error_block_addr));
> > +
> > +    cpu_physical_memory_read(error_block_addr, cper_addr,
> > +                             sizeof(*cper_addr));
> > +
> > +    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
> > +                             sizeof(*read_ack_start_addr));
> > +}
> > +
> >  void ghes_record_cper_errors(const void *cper, size_t len,
> >                               uint16_t source_id, Error **errp)
> >  {
> > @@ -439,8 +512,13 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> >      }
> >      ags = &acpi_ged_state->ghes_state;
> >  
> > -    get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> > -                         &cper_addr, &read_ack_register_addr);
> > +    if (!ags->hest_addr_le) {
> > +        get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> > +                             &cper_addr, &read_ack_register_addr);
> > +    } else {
> > +        get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
> > +                                &cper_addr, &read_ack_register_addr, errp);
> > +    }
> >  
> >      if (!cper_addr) {
> >          error_setg(errp, "can not find Generic Error Status Block");  
> 



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records
  2025-01-23 18:23     ` Mauro Carvalho Chehab
@ 2025-01-24  9:59       ` Igor Mammedov
  0 siblings, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24  9:59 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Thu, 23 Jan 2025 19:23:50 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Em Thu, 23 Jan 2025 10:29:19 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:
> 
> > On Wed, 22 Jan 2025 16:46:20 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > There are two pointers that are needed during error injection:
> > > 
> > > 1. The start address of the CPER block to be stored;
> > > 2. The address of the ack, which needs a reset before next error.

Drop the clause after the comma; it's not necessary (and it's a confusing detail).

> > > 
> > > It is preferable to calculate them from the HEST table.  This allows
> > > checking the source ID, the size of the table and the type of the
> > > HEST error block structures.
> > > 
> > > Yet, keep the old code, as this is needed for migration purposes.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>    
> > Generally looks good.  A few bits that I think could be made
> > easier to follow for anyone with the spec open in front of them.
> > 
> > Thanks,
> > 
> > Jonathan
> >   
> > > ---
> > >  hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 88 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index 34e3364d3fd8..b46b563bcaf8 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -61,6 +61,23 @@
> > >   */
> > >  #define ACPI_GHES_GESB_SIZE                 20
> > >  
> > > +/*
> > > + * Offsets with regards to the start of the HEST table stored at
> > > + * ags->hest_addr_le, according with the memory layout map at
> > > + * docs/specs/acpi_hest_ghes.rst.
> > > + */
> > > +
> > > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2    
> > 
> > Local multiline comment style seems to be always
> > /*
> >  * ACPI 6.2:...
> > 
> > So perhaps good to copy that.
> >   
> > > + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> > > + */
> > > +#define HEST_GHES_V2_TABLE_SIZE  92
> > > +#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)    
> > 
> > Using a GAS offset here to me obscures what is going on.  I'd
> > explicitly handle the GAS where you are reading this.
> > We probably should sanity check the type as there are
> > some crazy options that might turn up one day.  
> 
> See below.
> 
> > Maybe worth using spec term of
> > GHES_READ_ACK_...
> > 
> > Obviously it's a question of who you are for whether it is read or
> > write, but maybe still worth using that term for easy checking
> > against the specification.
> >   
> > > +
> > > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source    
> > same on comment format.
> >   
> > > + * Table 18-380: 'Error Status Address' field
> > > + */
> > > +#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)    
> > Maybe STS or spell out status? I found ST a bit confusing below.  
> 
> Giving names is not an easy task... Removing _ST doesn't sound
> right, as everything is GHES_ERR. STS sounds weird to me as well.
> Maybe we could name them both as something like:
> 
> 	GHES_ERR_STATUS_ADDR_OFF
> 	GHES_READ_ACK_ADDR_OFF

lgtm

>  
> > > +
> > >  /*
> > >   * Values for error_severity field
> > >   */
> > > @@ -212,14 +229,6 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> > >  {
> > >      int i, error_status_block_offset;
> > >  
> > > -    /*
> > > -     * TODO: Current version supports only one source.
> > > -     * A further patch will drop this check, after adding a proper migration
> > > -     * code, as, for the code to work, we need to store a bios pointer to the
> > > -     * HEST table.
> > > -     */
> > > -    assert(num_sources == 1);
> > > -
> > >      /* Build error_block_address */
> > >      for (i = 0; i < num_sources; i++) {
> > >          build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
> > > @@ -419,6 +428,70 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
> > >      *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
> > >  }
> > >  
> > > +static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
> > > +                                    uint64_t *cper_addr,
> > > +                                    uint64_t *read_ack_start_addr,
> > > +                                    Error **errp)
> > > +{
> > > +    uint64_t hest_err_block_addr, hest_read_ack_addr;
> > > +    uint64_t err_source_struct, error_block_addr;
> > > +    uint32_t num_sources, i;
> > > +
> > > +    if (!hest_addr) {    
> > I guess it is a question of matching local code, but I'd be tempted
> > to name this hest_body_addr as it isn't the start of the table but
> > rather the bit after the header.  
> 
> This is named after hest_addr_le, which in turn was named after ghes_hw_le.
> 
> Besides, I guess such name was suggested on a past review. From my side, 
> I'm OK with any name you/Igor pick.
> 
> >   
> > > +        return;
> > > +    }
> > > +
> > > +    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources)); 


If it were the HEST table address, then the hest_addr name would be OK.
However, in that case you are reading the value of "Header Signature" into
num_sources, which makes me think there is a bug here: one should read from
hest_addr + num_src_off.
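
To illustrate the offset in question, here is a minimal standalone sketch. It
assumes the address points at the very start of the HEST table, so the
standard 36-byte ACPI table header comes first and the 32-bit Error Source
Count follows it. The names are hypothetical and it parses a byte buffer
rather than guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ACPI_HEADER_SIZE        36  /* standard ACPI table header */
#define DEMO_HEST_SRC_COUNT_OFFSET   DEMO_ACPI_HEADER_SIZE
#define DEMO_HEST_FIRST_SRC_OFFSET   (DEMO_HEST_SRC_COUNT_OFFSET + 4)

/* Decode a little-endian 32-bit value independent of host byte order. */
static uint32_t demo_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: return the Error Source Count of a HEST table that
 * starts at 'hest', and store the offset of the first error source
 * structure in '*first_src_off'. Returns 0 if the buffer is too short.
 */
static uint32_t demo_hest_source_count(const uint8_t *hest, size_t len,
                                       size_t *first_src_off)
{
    assert(hest != NULL && first_src_off != NULL);
    if (len < DEMO_HEST_FIRST_SRC_OFFSET) {
        return 0;
    }
    *first_src_off = DEMO_HEST_FIRST_SRC_OFFSET;
    return demo_read_le32(hest + DEMO_HEST_SRC_COUNT_OFFSET);
}
```

If the stored address already points past the header, the extra offset is of
course not needed, but then the name should say so.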

   
> > 
> > The hest_addr naming thing confused me a tiny bit here because obviously num_sources
> > isn't the first thing in the table in the spec!
> >   
> > > +    num_sources = le32_to_cpu(num_sources);
> > > +
> > > +    err_source_struct = hest_addr + sizeof(num_sources);
the same issue wrt correct offset

> > > +
> > > +    /*
> > > +     * Currently, HEST Error source navigates only for GHESv2 tables
> > > +     */
> > > +
> > > +    for (i = 0; i < num_sources; i++) {
> > > +        uint64_t addr = err_source_struct;
> > > +        uint16_t type, src_id;
> > > +
> > > +        cpu_physical_memory_read(addr, &type, sizeof(type));
> > > +        type = le16_to_cpu(type);
> > > +
> > > +        /* For now, we only know the size of GHESv2 table */
> > > +        if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
> > > +            error_setg(errp, "HEST: type %d not supported.", type);
> > > +            return;    
> > 
> > It's a pity we can't just skip them, but absence of a size field
> > makes that tricky...  Can add that later I guess along with sizes
> > for each defined type including figuring out the variable length
> > ones like IA-32 machine check.  I guess this is why the whole ordering
> > constraint for new types was added. Can't find the old ones if
> > we don't know the size of the new ones, hence need new definitions
> > at the end.

Erroring out here may cause parsing issues in the future when migrating from
a new to an old QEMU (but at least it's not fatal).
So I've given up on pushing for a graceful 'skip' of unhandled entries
(considering backward migration is not a hard requirement for upstream QEMU).

Though, it would be nice to have it in the same merge cycle, as a patch on top,
if possible.

> Yes. The variable sizes makes it harder to parse with current GHES
> types. It sounds they'll fix it for the next types, as the size of
> the record will be stored for types above 11.
> 
> So, at the end, we'll need to add a much more complex logic if/when
> we add non-GHES records.
> 
> > 
> > Anyhow, I'm fine with this but maybe a little more description in the comment
> > would avoid someone going down same rat hole I just did.
> > 
> >   
> > > +        }
> > > +
> > > +        /* Compare CPER source address at the GHESv2 structure */
> > > +        addr += sizeof(type);
> > > +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> > > +
> > > +        if (src_id == source_id) {

missing le16_to_cpu(src_id) here?
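
A minimal standalone sketch of the lookup with both 16-bit fields converted
before use. It assumes a buffer holding only GHESv2 entries (type 10, 92
bytes each, as in this patch); the names are hypothetical and it does not
touch guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GHES_V2_TYPE  10   /* GHESv2 error source type */
#define DEMO_GHES_V2_SIZE  92   /* size of one GHESv2 structure */

/* Decode a little-endian 16-bit value independent of host byte order. */
static uint16_t demo_read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: return the byte offset of the GHESv2 entry whose
 * Source ID matches 'source_id', or -1 if it is not found, if an entry of
 * another type is met, or if the buffer ends early.
 */
static long demo_find_source(const uint8_t *sources, size_t len,
                             uint32_t num_sources, uint16_t source_id)
{
    size_t off = 0;

    assert(sources != NULL);
    for (uint32_t i = 0; i < num_sources; i++) {
        if (len - off < DEMO_GHES_V2_SIZE) {
            return -1;
        }
        /* Type is the first field, Source ID the second; both are LE. */
        if (demo_read_le16(sources + off) != DEMO_GHES_V2_TYPE) {
            return -1;
        }
        if (demo_read_le16(sources + off + 2) == source_id) {
            return (long)off;
        }
        off += DEMO_GHES_V2_SIZE;
    }
    return -1;
}
```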

> > > +            break;
> > > +        }
> > > +
> > > +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > > +    }
> > > +    if (i == num_sources) {
> > > +        error_setg(errp, "HEST: Source %d not found.", source_id);
> > > +        return;
> > > +    }
> > > +
> > > +    /* Navigate though table address pointers */
> > > +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;    
> > 
> > So this is a bit confusing. I'd pull the GAS offset down here rather
> > than putting it in the define. That way we can clearly see you
> > are grabbing the address field.  As above, should we check the type
> > is 0x00?  There are many fun things it could be but here I think
> > we just want it to be memory space.  
> 
> In short:
> 
> The type was already ensured when HEST table is built. I can't see
> any need to add extra checks. If you want this to be better documented,
> we could just do:
> 

1)
> 	hest_err_block_addr = err_source_struct + GHES_ERR_STATUS_ADDR_OFF + GAS_OFFSET;
I'd prefer this; the _ST_ part of GHES_ERR_ST_ADDR_OFFSET also reads a bit confusingly to me.


> It follows a longer rationale:
> 
> If I understood well, and after some discussions we had today via chat,
> you basically want to add an additional check logic during error inject
> to check if the memory type filled at build_ghes_v2() here:
> 
> 	/* Build Generic Hardware Error Source version 2 (GHESv2) */
> 	static void build_ghes_v2(GArray *table_data,
> 	                          BIOSLinker *linker,
>        		                  const AcpiNotificationSourceId *notif_src,
>                 	          uint16_t index, int num_sources)
> 	{
> ...
> 	    /* Error Status Address */
> 	    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> ...
> 	    /*
> 	     * Read Ack Register 
> 	     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> 	     * version 2 (GHESv2 - Type 10)
> 	     */
> 	   build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
>                      4 /* QWord access */, 0);
> ...
> 	}
> 
> was not modified and still remains AML_AS_SYSTEM_MEMORY, as otherwise the
> code at get_ghes_source_offsets() won't be able to use 
> cpu_physical_memory_read/cpu_physical_memory_write. To make it right,
> IMO we would need to add something like this to aml-build.c:
> 
> 	int verify_gas_addr_space(uint64_t addr, AmlAddressSpace as)
> 	{
> 		uint64_t gas_header = 0;
> 
> 		cpu_physical_memory_read(addr, &gas_header, 4);
> 		gas_header = le64_to_cpu(gas_header);
> 
> 		if ((gas_header & 0xff) != as)
> 			return 1;
> 
> 		return 0;
> 	}
> 
> and at ghes.c do something like:
> 
> 	// Using current names just to better illustrate the changes
> 	#define GHES_ACK_OFFSET          64 // don't add GAS_ADDR_OFFSET here
> 	#define GHES_ERR_ST_ADDR_OFFSET  20 // don't add GAS_ADDR_OFFSET here
> 
> 	static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
> 	                                    uint64_t *cper_addr,
> 	       	                            uint64_t *read_ack_start_addr,
> 	                                    Error **errp)
> 	{
>  
> 	    /* Navigate though table address pointers */
> 	    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> 
> 	    /* EXTRA CHECK LOGIC: Verify if build_ghes_v2() did his job */
> 	    if (verify_gas_addr_space(hest_err_block_addr, AML_AS_SYSTEM_MEMORY)) {
> 		error_setg(errp, "HEST firmware is not using system memory!!!");
> 		return;
> 	    }
> 	    hest_err_block_addr += GAS_ADDR_OFFSET;
> 
> 	    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;
> 
> 	    /* EXTRA CHECK LOGIC: Verify if build_ghes_v2() did his job */
> 	    if (verify_gas_addr_space(hest_read_ack_addr, AML_AS_SYSTEM_MEMORY)) {
> 		error_setg(errp, "HEST firmware is not using system memory!!!");
> 		return;
> 	    }
> 	    hest_read_ack_addr += GAS_ADDR_OFFSET;
> 
> 	    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> 	                             sizeof(error_block_addr));
> 
> 	    cpu_physical_memory_read(error_block_addr, cper_addr,
> 	                             sizeof(*cper_addr));
> 
> 	    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
> 	                             sizeof(*read_ack_start_addr));
> 	}
> 
> IMO, this is overkill:
> - I can't see how such check will ever be triggered in practice;
> - I can't see any reason why the HEST firmware would ever be stored
>   on a non-system memory: firmware should always be at 
>   AML_AS_SYSTEM_MEMORY;
> - As those offsets are related to fw_cfg, any change there would
>   require changing the firmware binding logic. Plus, they'll very
>   likely also require changes at BIOS code itself, as it would need
>   to know how to store firmware files on some other memory type;
> - Any change like that will certainly require adding backport support,
>   as QEMU would need to check if BIOS would support different types
>   of memory to store firmware instead of system memory. Also, QEMU 9.1
>   is only compatible with firmware stored on system's memory. 
> 
> So, I don't see any benefit of doing that.

+1

> 
> > > +    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;    
> > 
> > Perhaps move this down to above where it is used?
> > Same thing about GAS address offset being better found down here
> > rather than hidden in GHES_ACK_OFFSET.

I'd treat it the same as [1]

> >   
> > > +
> > > +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > > +                             sizeof(error_block_addr));
> > > +
> > > +    cpu_physical_memory_read(error_block_addr, cper_addr,
> > > +                             sizeof(*cper_addr));
> > > +
> > > +    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
> > > +                             sizeof(*read_ack_start_addr));
> > > +}
> > > +
> > >  void ghes_record_cper_errors(const void *cper, size_t len,
> > >                               uint16_t source_id, Error **errp)
> > >  {
> > > @@ -439,8 +512,13 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > >      }
> > >      ags = &acpi_ged_state->ghes_state;
> > >  
> > > -    get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> > > -                         &cper_addr, &read_ack_register_addr);

> > > +    if (!ags->hest_addr_le) {
> > > +        get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> > > +                             &cper_addr, &read_ack_register_addr);
> > > +    } else {
> > > +        get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
> > > +                                &cper_addr, &read_ack_register_addr, errp);
> > > +    }
> > >  

It looks like the returned addresses are in little-endian byte order, and the
caller then uses them as-is to access memory, whereas it should convert them
to host byte order first (le64_to_cpu()).
I'd add the conversion here so that the caller deals only with host byte order
(i.e. the same way as get_hw_error_offsets()).
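
A minimal standalone sketch of the idea: every pointer read from memory is
decoded from little-endian before it is used as the next address, so the
caller only ever sees host byte order. 'mem' stands in for guest memory and
all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 64-bit value at 'addr' inside a flat buffer. */
static uint64_t demo_read_le64(const uint8_t *mem, size_t len, uint64_t addr)
{
    uint64_t val = 0;

    assert(mem != NULL && addr <= len && len - addr >= 8);
    /* Byte-wise assembly gives the same result on LE and BE hosts. */
    for (int i = 7; i >= 0; i--) {
        val = (val << 8) | mem[addr + i];
    }
    return val;
}

/*
 * Follow the two-level indirection: the Error Status Address field holds
 * the address of the error block pointer, which in turn holds the CPER
 * address. Each hop is converted before being dereferenced.
 */
static uint64_t demo_resolve_cper_addr(const uint8_t *mem, size_t len,
                                       uint64_t err_status_addr_field)
{
    uint64_t error_block_addr = demo_read_le64(mem, len,
                                               err_status_addr_field);

    return demo_read_le64(mem, len, error_block_addr);
}
```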

> > >      if (!cper_addr) {
> > >          error_setg(errp, "can not find Generic Error Status Block");    
> >   
> 
> 
> 
> Thanks,
> Mauro
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr
  2025-01-22 15:46 ` [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
  2025-01-23 10:31   ` Jonathan Cameron
@ 2025-01-24 10:08   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24 10:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, linux-kernel

On Wed, 22 Jan 2025 16:46:21 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The GHES migration logic at GED should now support HEST table
> location too.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  hw/acpi/generic_event_device.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index c85d97ca3776..5346cae573b7 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -386,6 +386,34 @@ static const VMStateDescription vmstate_ghes_state = {
>      }
>  };
>  
> +static const VMStateDescription vmstate_hest = {
> +    .name = "acpi-hest",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool hest_needed(void *opaque)
> +{
> +    AcpiGedState *s = opaque;
> +    return s->ghes_state.hest_addr_le;
> +}
> +
> +static const VMStateDescription vmstate_hest_state = {
> +    .name = "acpi-ged/hest",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = hest_needed,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> +                       vmstate_hest, AcpiGhesState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_acpi_ged = {
>      .name = "acpi-ged",
>      .version_id = 1,
> @@ -398,6 +426,7 @@ static const VMStateDescription vmstate_acpi_ged = {
>          &vmstate_memhp_state,
>          &vmstate_cpuhp_state,
>          &vmstate_ghes_state,
> +        &vmstate_hest_state,
>          NULL
>      }
>  };


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available
  2025-01-22 15:46 ` [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
  2025-01-23 10:52   ` Jonathan Cameron
@ 2025-01-24 10:23   ` Igor Mammedov
  2025-01-28 11:29     ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24 10:23 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Philippe Mathieu-Daudé, Ani Sinha, Dongjiu Geng,
	Eduardo Habkost, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, Zhao Liu, linux-kernel

On Wed, 22 Jan 2025 16:46:22 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Create a new property (x-has-hest-addr) and use it to detect if
> the GHES table offsets can be calculated from the HEST address
> (qemu 9.2 and upper) or via the legacy way via an offset obtained

10.0 by now

> from the hardware_errors firmware file.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/generic_event_device.c |  1 +
>  hw/acpi/ghes.c                 | 28 +++++++++++++++++++++-------
>  hw/arm/virt-acpi-build.c       | 30 ++++++++++++++++++++++++++----
>  hw/core/machine.c              |  2 ++
>  include/hw/acpi/ghes.h         |  1 +
>  5 files changed, 51 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 5346cae573b7..fe537ed05c66 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
>  
>  static const Property acpi_ged_properties[] = {
>      DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
> +    DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),

s/hest_lookup/use_hest_addr/

>  };
>  
>  static const VMStateDescription vmstate_memhp_state = {
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index b46b563bcaf8..86c97f60d6a0 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -359,6 +359,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>  {
>      AcpiTable table = { .sig = "HEST", .rev = 1,
>                          .oem_id = oem_id, .oem_table_id = oem_table_id };
> +    AcpiGedState *acpi_ged_state;
> +    AcpiGhesState *ags = NULL;
>      int i;
>  
>      build_ghes_error_table(hardware_errors, linker, num_sources);
> @@ -379,10 +381,20 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>       * tell firmware to write into GPA the address of HEST via fw_cfg,
>       * once initialized.
>       */
> -    bios_linker_loader_write_pointer(linker,
> -                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> -                                     sizeof(uint64_t),
> -                                     ACPI_BUILD_TABLE_FILE, hest_offset);
> +
> +    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> +                                                       NULL));

The caller already did the lookup;
just pass hest_lookup as an argument and use it here.

> +    if (!acpi_ged_state) {
> +        return;
> +    }
> +
> +    ags = &acpi_ged_state->ghes_state;
> +    if (ags->hest_lookup) {
> +        bios_linker_loader_write_pointer(linker,
> +                                         ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> +                                         sizeof(uint64_t),
> +                                         ACPI_BUILD_TABLE_FILE, hest_offset);
> +    }
>  }
>  
>  void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> @@ -396,8 +408,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
>      fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
>          NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);

btw shouldn't we disable hw_error_le if hest_lookup is active?
>  
> -    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> -        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> +    if (ags && ags->hest_lookup) {

why bother with 'ags &&' if we don't do it for hw_error_le?


> +        fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> +            NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> +    }
>  
>      ags->present = true;
>  }
> @@ -512,7 +526,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>      }
>      ags = &acpi_ged_state->ghes_state;
>  
> -    if (!ags->hest_addr_le) {
> +    if (!ags->hest_lookup) {
why? !ags->hest_addr_le is sufficient

>          get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
>                               &cper_addr, &read_ack_register_addr);
>      } else {
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 3d411787fc37..ada5d08cfbe7 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -897,6 +897,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = {
>      { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
>  };
>  
> +static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
> +    { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
> +};
> +
>  static
>  void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>  {
> @@ -950,10 +954,28 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      build_dbg2(tables_blob, tables->linker, vms);
>  
>      if (vms->ras) {
> -        acpi_add_table(table_offsets, tables_blob);
> -        acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> -                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
> -                        vms->oem_id, vms->oem_table_id);
> +        AcpiGhesState *ags;
> +        AcpiGedState *acpi_ged_state;
> +
> +        acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> +                                                       NULL));
> +        if (acpi_ged_state) {
> +            ags = &acpi_ged_state->ghes_state;
> +
> +            acpi_add_table(table_offsets, tables_blob);
> +
> +            if (!ags->hest_lookup) {
> +                acpi_build_hest(tables_blob, tables->hardware_errors,
> +                                tables->linker, hest_ghes_notify_9_2,
> +                                ARRAY_SIZE(hest_ghes_notify_9_2),
> +                                vms->oem_id, vms->oem_table_id);
> +            } else {
> +                acpi_build_hest(tables_blob, tables->hardware_errors,
> +                                tables->linker, hest_ghes_notify,
> +                                ARRAY_SIZE(hest_ghes_notify),
> +                                vms->oem_id, vms->oem_table_id);
> +            }
> +        }
>      }
>  
>      if (ms->numa_state->num_nodes > 0) {
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index c23b39949649..0d0cde481954 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -34,10 +34,12 @@
>  #include "hw/virtio/virtio-pci.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "hw/virtio/virtio-iommu.h"
> +#include "hw/acpi/generic_event_device.h"
>  #include "audio/audio.h"
>  
>  GlobalProperty hw_compat_9_2[] = {
>      {"arm-cpu", "backcompat-pauth-default-use-qarma5", "true"},
> +    { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
>  };
>  const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
>  
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 237721fec0a2..164ed8b0f9a3 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -61,6 +61,7 @@ typedef struct AcpiGhesState {
>      uint64_t hest_addr_le;
>      uint64_t hw_error_le;
>      bool present; /* True if GHES is present at all on this board */
                        and perhaps reformulate this as well

> +    bool hest_lookup; /* True if HEST address is present */
                                 if device should use HEST addr for error source lookup 

>  } AcpiGhesState;
>  
>  /*


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state
  2025-01-22 15:46 ` [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state Mauro Carvalho Chehab
  2025-01-23 10:54   ` Jonathan Cameron
@ 2025-01-24 12:25   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24 12:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, Paolo Bonzini, Peter Maydell,
	kvm, linux-kernel

On Wed, 22 Jan 2025 16:46:24 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Move the check logic into a common function and simplify the
> code which checks if GHES is enabled and was properly setup.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/ghes-stub.c    |  4 ++--
>  hw/acpi/ghes.c         | 33 +++++++++++----------------------
>  include/hw/acpi/ghes.h |  9 +++++----
>  target/arm/kvm.c       |  2 +-
>  4 files changed, 19 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index 7cec1812dad9..fbabf955155a 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -16,7 +16,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
>      return -1;
>  }
>  
> -bool acpi_ghes_present(void)
> +AcpiGhesState *acpi_ghes_get_state(void)
>  {
> -    return false;
> +    return NULL;
>  }
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 961fc38ea8f5..5d29db3918dd 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -420,10 +420,6 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
>                                   uint64_t *cper_addr,
>                                   uint64_t *read_ack_register_addr)
>  {
> -    if (!ghes_addr) {
> -        return;
> -    }
> -
>      /*
>       * non-HEST version supports only one source, so no need to change
>       * the start offset based on the source ID. Also, we can't validate
> @@ -451,10 +447,6 @@ static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
>      uint64_t err_source_struct, error_block_addr;
>      uint32_t num_sources, i;
>  
> -    if (!hest_addr) {
> -        return;
> -    }
> -
>      cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
>      num_sources = le32_to_cpu(num_sources);
>  
> @@ -513,7 +505,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>                               uint16_t source_id, Error **errp)
>  {
>      uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
> -    AcpiGedState *acpi_ged_state;
>      AcpiGhesState *ags;
>  
>      if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> @@ -521,13 +512,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>          return;
>      }
>  
> -    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> -                                                       NULL));
> -    if (!acpi_ged_state) {
> -        error_setg(errp, "Can't find ACPI_GED object");
> +    ags = acpi_ghes_get_state();

1)

> +    if (!ags) {
>          return;
>      }
> -    ags = &acpi_ged_state->ghes_state;
>  
>      if (!ags->hest_lookup) {
>          get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> @@ -537,11 +525,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>                                  &cper_addr, &read_ack_register_addr, errp);
>      }
>  
> -    if (!cper_addr) {
> -        error_setg(errp, "can not find Generic Error Status Block");
> -        return;
> -    }
> -
>      cpu_physical_memory_read(read_ack_register_addr,
>                               &read_ack_register, sizeof(read_ack_register));
>  
> @@ -605,7 +588,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
>      return 0;
>  }
>  
> -bool acpi_ghes_present(void)
> +AcpiGhesState *acpi_ghes_get_state(void)
>  {
>      AcpiGedState *acpi_ged_state;
>      AcpiGhesState *ags;
> @@ -614,8 +597,14 @@ bool acpi_ghes_present(void)
>                                                         NULL));
>  
>      if (!acpi_ged_state) {
> -        return false;
> +        return NULL;
>      }
>      ags = &acpi_ged_state->ghes_state;
> -    return ags->present;

> +    if (!ags->present) {
> +        return NULL;
> +    }

redundant check; the check below (vvvv) should be sufficient

> +    if (!ags->hw_error_le && !ags->hest_addr_le) {
> +        return NULL;
> +    }
> +    return ags;
>  }
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 2e8405edfe27..64fe2b5bea65 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -91,10 +91,11 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>                               uint16_t source_id, Error **errp);
>  
>  /**
> - * acpi_ghes_present: Report whether ACPI GHES table is present
> + * acpi_ghes_get_state: Get a pointer for ACPI ghes state
>   *
> - * Returns: true if the system has an ACPI GHES table and it is
> - * safe to call acpi_ghes_memory_errors() to record a memory error.
> + * Returns: a pointer to ghes state if the system has an ACPI GHES table,
> + * it is enabled and it is safe to call acpi_ghes_memory_errors() to record
> + * a memory error. Returns false, otherwise.
>   */
> -bool acpi_ghes_present(void);
> +AcpiGhesState *acpi_ghes_get_state(void);
>  #endif
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index da30bdbb2349..0283089713b9 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2369,7 +2369,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>  
>      assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>  
> -    if (acpi_ghes_present() && addr) {
> +    if (acpi_ghes_get_state() && addr) {

double lookup, 1st here and then in [1],
suggest storing the state here and passing it as an argument down the call chain
(i.e. to acpi_ghes_memory_errors() and below)
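
A minimal sketch of the pattern, using hypothetical stand-in names
rather than the real QEMU functions: the caller looks the state up once
and hands it to the callee, so the callee never repeats the lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GHES state. */
typedef struct GhesCtxSketch {
    uint64_t hest_addr_le;
} GhesCtxSketch;

static int lookup_count;
static GhesCtxSketch the_state = { .hest_addr_le = 0x1000 };

/* Stand-in for the state getter; counts how often the lookup runs. */
static GhesCtxSketch *get_state_sketch(void)
{
    lookup_count++;
    return the_state.hest_addr_le ? &the_state : NULL;
}

/* The callee receives the state instead of looking it up again. */
static int record_memory_error_sketch(GhesCtxSketch *ags, uint16_t source_id,
                                      uint64_t paddr)
{
    (void)source_id;
    (void)paddr;
    return ags ? 0 : -1;
}

/* The caller stores the lookup result and passes it down. */
static int on_sigbus_sketch(void *addr, uint64_t paddr)
{
    GhesCtxSketch *ags = get_state_sketch();

    if (ags && addr) {
        return record_memory_error_sketch(ags, 0, paddr);
    }
    return -1;
}
```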

>          ram_addr = qemu_ram_addr_from_host(addr);
>          if (ram_addr != RAM_ADDR_INVALID &&
>              kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {



* Re: [PATCH 08/11] acpi/generic_event_device: add an APEI error device
  2025-01-22 15:46 ` [PATCH 08/11] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
@ 2025-01-24 12:30   ` Igor Mammedov
  2025-01-28 17:42     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24 12:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, linux-kernel

On Wed, 22 Jan 2025 16:46:25 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Adds a generic error device to handle generic hardware error
> events as specified at ACPI 6.5 specification at 18.3.2.7.2:
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> using HID PNP0C33.
> 
> The PNP0C33 device is used to report hardware errors to
> the guest via ACPI APEI Generic Hardware Error Source (GHES).
> 
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/acpi/aml-build.c                    | 10 ++++++++++
>  hw/acpi/generic_event_device.c         |  8 ++++++++
>  include/hw/acpi/acpi_dev_interface.h   |  1 +
>  include/hw/acpi/aml-build.h            |  2 ++
>  include/hw/acpi/generic_event_device.h |  1 +
>  5 files changed, 22 insertions(+)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index f8f93a9f66c8..e4bd7b611372 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -2614,3 +2614,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source)
>  
>      return var;
>  }
> +
> +/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
> +Aml *aml_error_device(void)
> +{
> +    Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> +    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> +    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +
> +    return dev;
> +}
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index fe537ed05c66..ce00c80054f4 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
>      ACPI_GED_PWR_DOWN_EVT,
>      ACPI_GED_NVDIMM_HOTPLUG_EVT,
>      ACPI_GED_CPU_HOTPLUG_EVT,
> +    ACPI_GED_ERROR_EVT,
>  };
>  
>  /*
> @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
>                             aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
>                                        aml_int(0x80)));
>                  break;
> +            case ACPI_GED_ERROR_EVT:
> +                aml_append(if_ctx,
> +                           aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> +                                      aml_int(0x80)));
                                                 ^^^^^
nit: perhaps add a comment with the intent and a ref to the spec wrt the above value
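
One possible shape for such a comment, sketched with a hypothetical
macro name (the patch itself passes the literal to aml_int()). The spec
section is the one cited in the commit message:

```c
#include <stdint.h>

/*
 * ACPI 6.5, 18.3.2.7.2 "Event Notification For Generic Error Sources":
 * a Notify on the PNP0C33 error device with value 0x80 tells OSPM that
 * a generic hardware error source has a record to process.
 */
#define ERROR_DEVICE_NOTIFY_SKETCH 0x80

/* Hypothetical accessor so the value can be checked in isolation. */
static uint8_t error_device_notify_value_sketch(void)
{
    return ERROR_DEVICE_NOTIFY_SKETCH;
}
```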

> +                break;
>              case ACPI_GED_NVDIMM_HOTPLUG_EVT:
>                  aml_append(if_ctx,
>                             aml_notify(aml_name("\\_SB.NVDR"),
> @@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
>          sel = ACPI_GED_MEM_HOTPLUG_EVT;
>      } else if (ev & ACPI_POWER_DOWN_STATUS) {
>          sel = ACPI_GED_PWR_DOWN_EVT;
> +    } else if (ev & ACPI_GENERIC_ERROR) {
> +        sel = ACPI_GED_ERROR_EVT;
>      } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
>          sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
>      } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
> index 68d9d15f50aa..8294f8f0ccca 100644
> --- a/include/hw/acpi/acpi_dev_interface.h
> +++ b/include/hw/acpi/acpi_dev_interface.h
> @@ -13,6 +13,7 @@ typedef enum {
>      ACPI_NVDIMM_HOTPLUG_STATUS = 16,
>      ACPI_VMGENID_CHANGE_STATUS = 32,
>      ACPI_POWER_DOWN_STATUS = 64,
> +    ACPI_GENERIC_ERROR = 128,
>  } AcpiEventStatusBits;
>  
>  #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index c18f68134246..f38e12971932 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -252,6 +252,7 @@ struct CrsRangeSet {
>  /* Consumer/Producer */
>  #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY        (1 << 1)
>  
> +#define ACPI_APEI_ERROR_DEVICE   "GEDD"
>  /**
>   * init_aml_allocator:
>   *
> @@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz,
>               uint8_t channel);
>  Aml *aml_sleep(uint64_t msec);
>  Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
> +Aml *aml_error_device(void);
>  
>  /* Block AML object primitives */
>  Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
> diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
> index d2dac87b4a9f..1c18ac296fcb 100644
> --- a/include/hw/acpi/generic_event_device.h
> +++ b/include/hw/acpi/generic_event_device.h
> @@ -101,6 +101,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
>  #define ACPI_GED_PWR_DOWN_EVT      0x2
>  #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
>  #define ACPI_GED_CPU_HOTPLUG_EVT    0x8
> +#define ACPI_GED_ERROR_EVT          0x10
>  
>  typedef struct GEDState {
>      MemoryRegion evt;



* Re: [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection
  2025-01-22 15:46 ` [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
  2025-01-23 11:00   ` Jonathan Cameron
@ 2025-01-24 12:38   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24 12:38 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, Eric Blake,
	Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
	Shannon Zhao, linux-kernel

On Wed, 22 Jan 2025 16:46:27 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Creates a QMP command to be used for generic ACPI APEI hardware error
> injection (HEST) via GHESv2, and add support for it for ARM guests.
> 
> Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
> independent. This is mapped at arch virt bindings, depending on the
> types supported by QEMU and by the BIOS. So, on ARM, this is supported
> via ACPI_GHES_NOTIFY_GPIO notification type.
> 
> This patch is co-authored:
>     - original ghes logic to inject a simple ARM record by Shiju Jose;
>     - generic logic to handle block addresses by Jonathan Cameron;
>     - generic GHESv2 error inject by Mauro Carvalho Chehab;
> 
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

QMP is not my area, so just a cursory review

with nits below fixed up
  Acked-by: Igor Mammedov <imammedo@redhat.com>

> 
> ---
> 
> Changes since v9:
> - ARM source IDs renamed to reflect SYNC/ASYNC;
> - command name changed to better reflect what it does;
> - some improvements at JSON documentation;
> - add a check for QMP source at the notification logic.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  MAINTAINERS              |  7 +++++++
>  hw/acpi/Kconfig          |  5 +++++
>  hw/acpi/ghes.c           |  2 +-
>  hw/acpi/ghes_cper.c      | 32 ++++++++++++++++++++++++++++++++
>  hw/acpi/ghes_cper_stub.c | 19 +++++++++++++++++++
>  hw/acpi/meson.build      |  2 ++
>  hw/arm/virt-acpi-build.c |  1 +
>  hw/arm/virt.c            |  7 +++++++
>  include/hw/acpi/ghes.h   |  1 +
>  include/hw/arm/virt.h    |  1 +
>  qapi/acpi-hest.json      | 35 +++++++++++++++++++++++++++++++++++
>  qapi/meson.build         |  1 +
>  qapi/qapi-schema.json    |  1 +
>  13 files changed, 113 insertions(+), 1 deletion(-)
>  create mode 100644 hw/acpi/ghes_cper.c
>  create mode 100644 hw/acpi/ghes_cper_stub.c
>  create mode 100644 qapi/acpi-hest.json
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 846b81e3ec03..8e1f662fa0e0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
>  F: include/hw/acpi/ghes.h
>  F: docs/specs/acpi_hest_ghes.rst
>  
> +ACPI/HEST/GHES/ARM processor CPER
> +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +S: Maintained
> +F: hw/arm/ghes_cper.c
> +F: hw/acpi/ghes_cper_stub.c
> +F: qapi/acpi-hest.json
> +
>  ppc4xx
>  L: qemu-ppc@nongnu.org
>  S: Orphan
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index 1d4e9f0845c0..daabbe6cd11e 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -51,6 +51,11 @@ config ACPI_APEI
>      bool
>      depends on ACPI
>  
> +config GHES_CPER
> +    bool
> +    depends on ACPI_APEI
> +    default y
> +
>  config ACPI_PCI
>      bool
>      depends on ACPI && PCI
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 5d29db3918dd..cf83c959b5ef 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -547,7 +547,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>      /* Write the generic error data entry into guest memory */
>      cpu_physical_memory_write(cper_addr, cper, len);
>  
> -    notifier_list_notify(&acpi_generic_error_notifiers, NULL);
> +    notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
>  }
>  
>  int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
> diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
> new file mode 100644
> index 000000000000..02c47b41b990
> --- /dev/null
> +++ b/hw/acpi/ghes_cper.c
> @@ -0,0 +1,32 @@
> +/*
> + * CPER payload parser for error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "qemu/base64.h"
> +#include "qemu/error-report.h"
> +#include "qemu/uuid.h"
> +#include "qapi/qapi-commands-acpi-hest.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_inject_ghes_error(const char *qmp_cper, Error **errp)
> +{
> +
> +    uint8_t *cper;
> +    size_t  len;
> +
> +    cper = qbase64_decode(qmp_cper, -1, &len, errp);
> +    if (!cper) {
> +        error_setg(errp, "missing GHES CPER payload");
> +        return;
> +    }
> +
> +    ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
> +}
> diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
> new file mode 100644
> index 000000000000..8782e2c02fa8
> --- /dev/null
> +++ b/hw/acpi/ghes_cper_stub.c
> @@ -0,0 +1,19 @@
> +/*
> + * Stub interface for CPER payload parser for error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-commands-acpi-hest.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_inject_ghes_error(const char *cper, Error **errp)
> +{
> +    error_setg(errp, "GHES QMP error inject is not compiled in");
> +}
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index 73f02b96912b..56b5d1ec9691 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -34,4 +34,6 @@ endif
>  system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
>  system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
>  system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
> +system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
> +system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
>  system_ss.add(files('acpi-qmp-cmds.c'))
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index ae60268bdcc2..d094212ce584 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -896,6 +896,7 @@ static void acpi_align_size(GArray *blob, unsigned align)
>  
>  static const AcpiNotificationSourceId hest_ghes_notify[] = {
>      { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
> +    { ACPI_HEST_SRC_ID_QMP, ACPI_GHES_NOTIFY_GPIO },
>  };
>  
>  static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index e272b35ea114..9074a540197d 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1012,6 +1012,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
>  
>  static void virt_generic_error_req(Notifier *n, void *opaque)
>  {
> +    uint16_t *source_id = opaque;
> +
> +    /* Currently, only QMP source ID is async */
> +    if (*source_id != ACPI_HEST_SRC_ID_QMP) {
> +        return;
> +    }
> +
>      VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
>  
>      acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 64fe2b5bea65..078d78666f91 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -72,6 +72,7 @@ typedef struct AcpiGhesState {
>   */
>  enum AcpiGhesSourceID {
>      ACPI_HEST_SRC_ID_SYNC,
> +    ACPI_HEST_SRC_ID_QMP,       /* Use it only for QMP injected errors */
>  };
>  
>  typedef struct AcpiNotificationSourceId {
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index f3cf28436770..56f270f61cf5 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -33,6 +33,7 @@
>  #include "exec/hwaddr.h"
>  #include "qemu/notify.h"
>  #include "hw/boards.h"
> +#include "hw/acpi/ghes.h"
>  #include "hw/arm/boot.h"
>  #include "hw/arm/bsa.h"
>  #include "hw/block/flash.h"
> diff --git a/qapi/acpi-hest.json b/qapi/acpi-hest.json
> new file mode 100644
> index 000000000000..d58fba485180
> --- /dev/null
> +++ b/qapi/acpi-hest.json
> @@ -0,0 +1,35 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +
> +##
> +# == GHESv2 CPER Error Injection
> +#
> +# Defined since ACPI Specification 6.1,
> +# section 18.3.2.8 Generic Hardware Error Source version 2. See:
> +#
> +# https://uefi.org/sites/default/files/resources/ACPI_6_1.pdf
> +##
> +
> +
> +##
> +# @inject-ghes-error:
> +#
> +# Inject an error with additional ACPI 6.1 GHESv2 error information
> +#
> +# @cper: contains a base64 encoded string with raw data for a single
> +#     CPER record with Generic Error Status Block, Generic Error Data
> +#     Entry and generic error data payload, as described at
> +#     https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#format
> +#
> +# Features:
> +#
> +# @unstable: This command is experimental.
> +#
> +# Since: 9.2

10.0

> +##
> +{ 'command': 'inject-ghes-error',

s/ghes/ghes-v2/


> +  'data': {
> +    'cper': 'str'
> +  },
> +  'features': [ 'unstable' ]
> +}
> diff --git a/qapi/meson.build b/qapi/meson.build
> index e7bc54e5d047..35cea6147262 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -59,6 +59,7 @@ qapi_all_modules = [
>  if have_system
>    qapi_all_modules += [
>      'acpi',
> +    'acpi-hest',
>      'audio',
>      'cryptodev',
>      'qdev',
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index b1581988e4eb..baf19ab73afe 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -75,6 +75,7 @@
>  { 'include': 'misc-target.json' }
>  { 'include': 'audio.json' }
>  { 'include': 'acpi.json' }
> +{ 'include': 'acpi-hest.json' }
>  { 'include': 'pci.json' }
>  { 'include': 'stats.json' }
>  { 'include': 'virtio.json' }



* Re: [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection
  2025-01-23 11:00   ` Jonathan Cameron
@ 2025-01-24 12:40     ` Igor Mammedov
  0 siblings, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24 12:40 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mauro Carvalho Chehab, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, Eric Blake,
	Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
	Shannon Zhao, linux-kernel

On Thu, 23 Jan 2025 11:00:32 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 22 Jan 2025 16:46:27 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Creates a QMP command to be used for generic ACPI APEI hardware error
> > injection (HEST) via GHESv2, and add support for it for ARM guests.
> > 
> > Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
> > independent. This is mapped at arch virt bindings, depending on the
> > types supported by QEMU and by the BIOS. So, on ARM, this is supported
> > via ACPI_GHES_NOTIFY_GPIO notification type.
> > 
> > This patch is co-authored:
> >     - original ghes logic to inject a simple ARM record by Shiju Jose;
> >     - generic logic to handle block addresses by Jonathan Cameron;
> >     - generic GHESv2 error inject by Mauro Carvalho Chehab;
> > 
> > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > 
> > ---
> > 
> > Changes since v9:
> > - ARM source IDs renamed to reflect SYNC/ASYNC;
> > - command name changed to better reflect what it does;
> > - some improvements at JSON documentation;
> > - add a check for QMP source at the notification logic.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> Another bonus.
> 
> Few trivial formatting comments, otherwise looks fine to me.
> 
> Jonathan
> 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 5d29db3918dd..cf83c959b5ef 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -547,7 +547,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> >      /* Write the generic error data entry into guest memory */
> >      cpu_physical_memory_write(cper_addr, cper, len);
> >  
> > -    notifier_list_notify(&acpi_generic_error_notifiers, NULL);
> > +    notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
> >  }
> >  
> >  int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
> > diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
> > new file mode 100644
> > index 000000000000..02c47b41b990
> > --- /dev/null
> > +++ b/hw/acpi/ghes_cper.c
> > @@ -0,0 +1,32 @@
> > +/*
> > + * CPER payload parser for error injection
> > + *
> > + * Copyright(C) 2024 Huawei LTD.

2025

> > + *
> > + * This code is licensed under the GPL version 2 or later. See the
> > + * COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +
> > +#include "qemu/base64.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/uuid.h"
> > +#include "qapi/qapi-commands-acpi-hest.h"
> > +#include "hw/acpi/ghes.h"
> > +
> > +void qmp_inject_ghes_error(const char *qmp_cper, Error **errp)
> > +{
> > +  
> Odd blank line that can go.
> 
> > +    uint8_t *cper;
> > +    size_t  len;
> > +
> > +    cper = qbase64_decode(qmp_cper, -1, &len, errp);
> > +    if (!cper) {
> > +        error_setg(errp, "missing GHES CPER payload");
> > +        return;
> > +    }
> > +
> > +    ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
> > +}
> > diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
> > new file mode 100644
> > index 000000000000..8782e2c02fa8
> > --- /dev/null
> > +++ b/hw/acpi/ghes_cper_stub.c
> > @@ -0,0 +1,19 @@
> > +/*
> > + * Stub interface for CPER payload parser for error injection
> > + *
> > + * Copyright(C) 2024 Huawei LTD.

2025

> > + *
> > + * This code is licensed under the GPL version 2 or later. See the
> > + * COPYING file in the top-level directory.
> > + *  
> Trivial but I'd drop these trailing blank lines as they don't add
> anything other than making people scroll further.
> 
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qapi/qapi-commands-acpi-hest.h"
> > +#include "hw/acpi/ghes.h"  
> 
> Trivial but doe we need ghes.h?
> 
> > +
> > +void qmp_inject_ghes_error(const char *cper, Error **errp)
> > +{
> > +    error_setg(errp, "GHES QMP error inject is not compiled in");
> > +}  
> 



* Re: [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject
  2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
                   ` (10 preceding siblings ...)
  2025-01-22 15:46 ` [PATCH 11/11] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
@ 2025-01-24 12:47 ` Igor Mammedov
  11 siblings, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-24 12:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Philippe Mathieu-Daudé, Ani Sinha, Cleber Rosa,
	Dongjiu Geng, Eduardo Habkost, Eric Blake, John Snow,
	Marcel Apfelbaum, Markus Armbruster, Michael Roth, Paolo Bonzini,
	Peter Maydell, Shannon Zhao, Yanan Wang, Zhao Liu, kvm,
	linux-kernel

On Wed, 22 Jan 2025 16:46:17 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Now that the ghes preparation patches were merged, let's add support
> for error injection.
> 
> I'm opting to fold two patch series into one here:
> 
> 1. https://lore.kernel.org/qemu-devel/20250113130854.848688-1-mchehab+huawei@kernel.org/
> 
> It is the first 5 patches containing changes to the math used to calculate offsets at HEST
> table and hardware_error firmware file, together with its migration code. Migration tested
> with both latest QEMU released kernel and upstream, on both directions.
> 
> There were no changes on this series since last submission, except for a conflict
> resolution at the migration table, due to upstream changes.
> 
> For more details, se the post of my previous submission.
> 
> 2. It follows 6 patches from:
> 	https://lore.kernel.org/qemu-devel/cover.1726293808.git.mchehab+huawei@kernel.org/
>     containing the error injection code and script.
> 
>    They add a new QAPI to allow injecting GHESv2 errors, and a script using such QAPI
>    to inject ARM Processor Error records.
> 
> PS.: If I'm counting well, this is the 18th version of this series rebase.

the series is more or less in good shape,
it requires a few fixups here and there, so I'd expect it to be ready on
the next respin.

I'm done with this round of review.

PS:
the moment you start changing ACPI tables, you need to 1st whitelist
the affected tables and then update the expected blobs with the new content.
see the comment at the beginning of tests/qtest/bios-tables-test.c

if you haven't done the above, 'make check-qtest' would fail,
and if it didn't, that likely means a missing test case
(in that case please add one)
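
A rough sketch of the whitelisting step. Entries go into
tests/qtest/bios-tables-test-allowed-diff.h as quoted, comma-separated
paths. The DSDT path below is an assumption about which blob changes,
and the array and helper are hypothetical simplifications of how the
test consumes that list:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Entries in the form used by bios-tables-test-allowed-diff.h.
 * The path is an assumption; list whichever blobs actually change.
 */
static const char *const allowed_diff_sketch[] = {
    "tests/data/acpi/aarch64/virt/DSDT",
    NULL
};

/* Hypothetical helper: true if a changed blob is on the allow list. */
static bool diff_is_allowed_sketch(const char *path)
{
    size_t i;

    for (i = 0; allowed_diff_sketch[i]; i++) {
        if (strcmp(allowed_diff_sketch[i], path) == 0) {
            return true;
        }
    }
    return false;
}
```

The remaining steps (regenerating the expected blobs and clearing the
list again) are described in the comment at the top of
bios-tables-test.c.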

> 
> Mauro Carvalho Chehab (11):
>   acpi/ghes: Prepare to support multiple sources on ghes
>   acpi/ghes: add a firmware file with HEST address
>   acpi/ghes: Use HEST table offsets when preparing GHES records
>   acpi/generic_event_device: Update GHES migration to cover hest addr
>   acpi/generic_event_device: add logic to detect if HEST addr is
>     available
>   acpi/ghes: add a notifier to notify when error data is ready
>   acpi/ghes: Cleanup the code which gets ghes ged state
>   acpi/generic_event_device: add an APEI error device
>   arm/virt: Wire up a GED error device for ACPI / GHES
>   qapi/acpi-hest: add an interface to do generic CPER error injection
>   scripts/ghes_inject: add a script to generate GHES error inject
> 
>  MAINTAINERS                            |  10 +
>  hw/acpi/Kconfig                        |   5 +
>  hw/acpi/aml-build.c                    |  10 +
>  hw/acpi/generic_event_device.c         |  38 ++
>  hw/acpi/ghes-stub.c                    |   4 +-
>  hw/acpi/ghes.c                         | 184 +++++--
>  hw/acpi/ghes_cper.c                    |  32 ++
>  hw/acpi/ghes_cper_stub.c               |  19 +
>  hw/acpi/meson.build                    |   2 +
>  hw/arm/virt-acpi-build.c               |  35 +-
>  hw/arm/virt.c                          |  19 +-
>  hw/core/machine.c                      |   2 +
>  include/hw/acpi/acpi_dev_interface.h   |   1 +
>  include/hw/acpi/aml-build.h            |   2 +
>  include/hw/acpi/generic_event_device.h |   1 +
>  include/hw/acpi/ghes.h                 |  36 +-
>  include/hw/arm/virt.h                  |   2 +
>  qapi/acpi-hest.json                    |  35 ++
>  qapi/meson.build                       |   1 +
>  qapi/qapi-schema.json                  |   1 +
>  scripts/arm_processor_error.py         | 377 +++++++++++++
>  scripts/ghes_inject.py                 |  51 ++
>  scripts/qmp_helper.py                  | 702 +++++++++++++++++++++++++
>  target/arm/kvm.c                       |   2 +-
>  24 files changed, 1517 insertions(+), 54 deletions(-)
>  create mode 100644 hw/acpi/ghes_cper.c
>  create mode 100644 hw/acpi/ghes_cper_stub.c
>  create mode 100644 qapi/acpi-hest.json
>  create mode 100644 scripts/arm_processor_error.py
>  create mode 100755 scripts/ghes_inject.py
>  create mode 100644 scripts/qmp_helper.py
> 



* Re: [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-23 10:02   ` Jonathan Cameron
  2025-01-23 11:46     ` Mauro Carvalho Chehab
  2025-01-23 17:01     ` Igor Mammedov
@ 2025-01-28 10:00     ` Mauro Carvalho Chehab
  2025-01-28 14:10       ` Jonathan Cameron
  2 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-28 10:00 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Thu, 23 Jan 2025 10:02:17 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Wed, 22 Jan 2025 16:46:19 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Store HEST table address at GPA, placing its content at
> > hest_addr_le variable.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >   
> A few trivial things inline.
> 
> Jonathan
> 
> > ---
> > 
> > Change from v8:
> > - hest_addr_lr is now pointing to the error source size and data.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> Bonus.  I guess you really like this patch :)
> > ---
> >  hw/acpi/ghes.c         | 17 ++++++++++++++++-
> >  include/hw/acpi/ghes.h |  1 +
> >  2 files changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 3f519ccab90d..34e3364d3fd8 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -30,6 +30,7 @@
> >  
> >  #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
> >  #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> > +#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
> >  
> >  /* The max size in bytes for one error block */
> >  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> > @@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> >      }
> >  
> >      /*
> > -     * tell firmware to write hardware_errors GPA into
> > +     * Tell firmware to write hardware_errors GPA into  
> 
> Sneaky tidy up.  No problem with it in general but adding noise here, so if there
> are others in the series maybe gather them up in a cleanup patch.

There are no other cleanups pending. Besides, as you noticed, this
aligns with the comment below. So, I'm opting to add a note to the
patch description.

> 
> >       * hardware_errors_addr fw_cfg, once the former has been initialized.
> >       */
> >      bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
> > @@ -355,6 +356,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> >  
> >      acpi_table_begin(&table, table_data);
> >  
> > +    int hest_offset = table_data->len;  
> 
> Local style looks to be traditional C with definitions at top.  Maybe define
> hest_offset up a few lines and just set it here?

Ok. I'll follow Igor's suggestion of using uint32_t.
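
A minimal sketch of that change, with hypothetical stand-ins for the
table blob and for acpi_table_begin(). It assumes the begin helper
appends the standard 36-byte ACPI table header, so the recorded offset
points at the Error Source Count field:

```c
#include <stdint.h>

/* Size of the standard ACPI table header. */
#define ACPI_TABLE_HEADER_LEN_SKETCH 36

/* Hypothetical stand-in for the GArray table blob. */
typedef struct BlobSketch {
    uint32_t len;
} BlobSketch;

/* Stand-in for acpi_table_begin(): reserves room for the header. */
static void table_begin_sketch(BlobSketch *b)
{
    b->len += ACPI_TABLE_HEADER_LEN_SKETCH;
}

static uint32_t build_hest_sketch(BlobSketch *table_data)
{
    uint32_t hest_offset; /* declared at the top, set further down */

    table_begin_sketch(table_data);
    hest_offset = table_data->len; /* first byte after the header */

    table_data->len += 4; /* Error Source Count */
    return hest_offset;
}
```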

> > +
> >      /* Error Source Count */
> >      build_append_int_noprefix(table_data, num_sources, 4);
> >      for (i = 0; i < num_sources; i++) {
> > @@ -362,6 +365,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> >      }
> >  
> >      acpi_table_end(linker, &table);
> > +
> > +    /*
> > +     * tell firmware to write into GPA the address of HEST via fw_cfg,  
> 
> Given the tidy up above, fix this one to have a capital T, or was this
> where you meant to change it?

OK.

> > +     * once initialized.
> > +     */
> > +    bios_linker_loader_write_pointer(linker,
> > +                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,  
> 
> Could wrap less and stay under 80 chars as both lines above add up to 70 something

Why? This follows QEMU coding style and the lines aren't longer than 80
columns. Besides, at least to my eyes, and from my experience maintaining
other projects over the years, it is a lot quicker to identify function
parameters when they're properly aligned with the parenthesis.

Thanks,
Mauro


* Re: [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-23 17:01     ` Igor Mammedov
@ 2025-01-28 10:12       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-28 10:12 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Thu, 23 Jan 2025 18:01:35 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Thu, 23 Jan 2025 10:02:17 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Wed, 22 Jan 2025 16:46:19 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > Store HEST table address at GPA, placing its content at
> > > hest_addr_le variable.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > >     
> > A few trivial things inline.
> > 
> > Jonathan
> >   
> > > ---
> > > 
> > > Change from v8:
> > > - hest_addr_le is now pointing to the error source size and data.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>    
> > Bonus.  I guess you really like this patch :)  
> > > ---
> > >  hw/acpi/ghes.c         | 17 ++++++++++++++++-
> > >  include/hw/acpi/ghes.h |  1 +
> > >  2 files changed, 17 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index 3f519ccab90d..34e3364d3fd8 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -30,6 +30,7 @@
> > >  
> > >  #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
> > >  #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> > > +#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
> > >  
> > >  /* The max size in bytes for one error block */
> > >  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> > > @@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> > >      }
> > >  
> > >      /*
> > > -     * tell firmware to write hardware_errors GPA into
> > > +     * Tell firmware to write hardware_errors GPA into    
> > 
> > Sneaky tidy up.  No problem with it in general but adding noise here, so if there
> > are others in the series maybe gather them up in a cleanup patch.  
> 
> +1

If that's OK, I would prefer to keep this here, as there are no other
cleanups left in the series, and writing a patch just for this seems
overkill. Besides, it matches a comment with similar content added by
this patch.

So, instead, if it's OK with you, I would prefer to mention it in the
patch description.

> >   
> > >       * hardware_errors_addr fw_cfg, once the former has been initialized.
> > >       */
> > >      bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
> > > @@ -355,6 +356,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > >  
> > >      acpi_table_begin(&table, table_data);
> > >  
> > > +    int hest_offset = table_data->len;    
> should be unsigned, and better uint32_t
> but we have a zoo wrt type here all over the place.

Changed this one to uint32_t. 

> > Local style looks to be traditional C with definitions at top.  Maybe define
> > hest_offset up a few lines and just set it here?  
> 
> yep, it applies to whole QEMU (i.e. definitions only at the start of the block)

Good to know. That's my personal style too. Still, I think I have seen
other places declaring variables in the middle of the code.
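
For illustration only, a minimal standalone sketch of the agreed shape: the
offset is declared as uint32_t at the top of the block and only assigned
after the table header has been emitted. The table_buf struct and the
append_bytes()/build_table() helpers are stand-ins made up for this example;
they are not the QEMU GArray or acpi_table_begin() APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the growing ACPI tables blob (not GArray). */
struct table_buf {
    size_t len;
};

/* Hypothetical helper: pretend n bytes were appended to the blob. */
static void append_bytes(struct table_buf *buf, size_t n)
{
    buf->len += n;
}

/*
 * Emit a table made of a header followed by a body and return the offset
 * of the first byte after the header, i.e. where the body starts.
 */
static uint32_t build_table(struct table_buf *buf, size_t hdr_len,
                            size_t body_len)
{
    uint32_t hest_offset;               /* declared at the top of the block */

    append_bytes(buf, hdr_len);         /* header goes first */

    hest_offset = (uint32_t)buf->len;   /* only set here, after the header */

    append_bytes(buf, body_len);

    return hest_offset;
}
```

With a blob that already holds 100 bytes, a 36-byte header and an 8-byte
body, the sketch returns 136 and leaves the blob at 144 bytes.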

> > > +
> > >      /* Error Source Count */
> > >      build_append_int_noprefix(table_data, num_sources, 4);
> > >      for (i = 0; i < num_sources; i++) {
> > > @@ -362,6 +365,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > >      }
> > >  
> > >      acpi_table_end(linker, &table);
> > > +
> > > +    /*
> > > +     * tell firmware to write into GPA the address of HEST via fw_cfg,    
> > 
> > Given the tidy up above, fix this one to have a capital T, or was this
> > where you meant to change it?
> >   
> > > +     * once initialized.
> > > +     */
> > > +    bios_linker_loader_write_pointer(linker,
> > > +                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,    
> > 
> > Could wrap less and stay under 80 chars as both lines above add up to 70 something
> >   
> > > +                                     sizeof(uint64_t),
> > > +                                     ACPI_BUILD_TABLE_FILE, hest_offset);
> > >  }    
> >   
> 



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available
  2025-01-24 10:23   ` Igor Mammedov
@ 2025-01-28 11:29     ` Mauro Carvalho Chehab
  2025-01-29  6:26       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-28 11:29 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Philippe Mathieu-Daudé, Ani Sinha, Dongjiu Geng,
	Eduardo Habkost, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, Zhao Liu, linux-kernel

On Fri, 24 Jan 2025 11:23:46 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 22 Jan 2025 16:46:22 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Create a new property (x-has-hest-addr) and use it to detect if
> > the GHES table offsets can be calculated from the HEST address
> > (qemu 9.2 and upper) or via the legacy way via an offset obtained  
> 
> 10.0 by now
> 
> > from the hardware_errors firmware file.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  hw/acpi/generic_event_device.c |  1 +
> >  hw/acpi/ghes.c                 | 28 +++++++++++++++++++++-------
> >  hw/arm/virt-acpi-build.c       | 30 ++++++++++++++++++++++++++----
> >  hw/core/machine.c              |  2 ++
> >  include/hw/acpi/ghes.h         |  1 +
> >  5 files changed, 51 insertions(+), 11 deletions(-)
> > 
> > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > index 5346cae573b7..fe537ed05c66 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> >  
> >  static const Property acpi_ged_properties[] = {
> >      DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
> > +    DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),  
> 
> s/hest_lookup/use_hest_addr/
> 
> >  };
> >  
> >  static const VMStateDescription vmstate_memhp_state = {
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index b46b563bcaf8..86c97f60d6a0 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -359,6 +359,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> >  {
> >      AcpiTable table = { .sig = "HEST", .rev = 1,
> >                          .oem_id = oem_id, .oem_table_id = oem_table_id };
> > +    AcpiGedState *acpi_ged_state;
> > +    AcpiGhesState *ags = NULL;
> >      int i;
> >  
> >      build_ghes_error_table(hardware_errors, linker, num_sources);
> > @@ -379,10 +381,20 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> >       * tell firmware to write into GPA the address of HEST via fw_cfg,
> >       * once initialized.
> >       */
> > -    bios_linker_loader_write_pointer(linker,
> > -                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> > -                                     sizeof(uint64_t),
> > -                                     ACPI_BUILD_TABLE_FILE, hest_offset);
> > +
> > +    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > +                                                       NULL));  
> 
> the caller, already did lookup,
> just pass hest_lookup as an argument and use it here

for all the above: OK.

> > +    if (!acpi_ged_state) {
> > +        return;
> > +    }
> > +
> > +    ags = &acpi_ged_state->ghes_state;
> > +    if (ags->hest_lookup) {
> > +        bios_linker_loader_write_pointer(linker,
> > +                                         ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> > +                                         sizeof(uint64_t),
> > +                                         ACPI_BUILD_TABLE_FILE, hest_offset);
> > +    }
> >  }
> >  
> >  void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > @@ -396,8 +408,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> >      fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
> >          NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);  
> 
> btw shouldn't we disable hw_error_le if hest_lookup is active?

Even though it is not used, we still need the fw_cfg logic to recalculate
the table offsets when resolving the bios_linker entries.

In the tests I did, not having a callback causes some fw_cfg issue when QEMU
tries to load the firmware or tries to update it.

> >  
> > -    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> > -        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> > +    if (ags && ags->hest_lookup) {  
> 
> why bother with 'ags &&' if we don't do it hw_error_le?

Legacy stuff. I'll drop "ags &&".

> 
> 
> > +        fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> > +            NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> > +    }
> >  
> >      ags->present = true;
> >  }
> > @@ -512,7 +526,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> >      }
> >      ags = &acpi_ged_state->ghes_state;
> >  
> > -    if (!ags->hest_addr_le) {
> > +    if (!ags->hest_lookup) {  
> why? !ags->hest_addr_le is sufficient

Checking either "hest_addr_le" or "use_hest_addr" would work equally
well, assuming that address == 0 is invalid. I opted for the latter
because you requested it in a previous review, and also because it
makes it clearer that this comes from the migration logic, which
dictates what kind of lookup should be done.

From my side, either way works fine. What do you prefer?
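
To make the two options concrete, here is a small standalone sketch. The
struct below only mirrors the two fields under discussion and both function
names are made up for the example; it is not the QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the two AcpiGhesState fields being compared. */
struct ghes_state_sketch {
    uint64_t hest_addr_le;   /* 0 is assumed to mean "no address written" */
    bool use_hest_addr;      /* set by the machine-type compat property */
};

/* Option 1: infer the lookup method from the address itself. */
static bool lookup_via_hest_by_addr(const struct ghes_state_sketch *s)
{
    return s->hest_addr_le != 0;
}

/* Option 2: follow the flag that comes from the migration logic. */
static bool lookup_via_hest_by_flag(const struct ghes_state_sketch *s)
{
    return s->use_hest_addr;
}
```

In this sketch the two predicates agree whenever a set flag goes together
with a non-zero address. They only differ for a state where the flag is set
but the address is still zero, which is why the "address == 0 is invalid"
assumption matters.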

> 
> >          get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> >                               &cper_addr, &read_ack_register_addr);
> >      } else {
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index 3d411787fc37..ada5d08cfbe7 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -897,6 +897,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = {
> >      { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
> >  };
> >  
> > +static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
> > +    { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
> > +};
> > +
> >  static
> >  void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> >  {
> > @@ -950,10 +954,28 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> >      build_dbg2(tables_blob, tables->linker, vms);
> >  
> >      if (vms->ras) {
> > -        acpi_add_table(table_offsets, tables_blob);
> > -        acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> > -                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
> > -                        vms->oem_id, vms->oem_table_id);
> > +        AcpiGhesState *ags;
> > +        AcpiGedState *acpi_ged_state;
> > +
> > +        acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > +                                                       NULL));
> > +        if (acpi_ged_state) {
> > +            ags = &acpi_ged_state->ghes_state;
> > +
> > +            acpi_add_table(table_offsets, tables_blob);
> > +
> > +            if (!ags->hest_lookup) {
> > +                acpi_build_hest(tables_blob, tables->hardware_errors,
> > +                                tables->linker, hest_ghes_notify_9_2,
> > +                                ARRAY_SIZE(hest_ghes_notify_9_2),
> > +                                vms->oem_id, vms->oem_table_id);
> > +            } else {
> > +                acpi_build_hest(tables_blob, tables->hardware_errors,
> > +                                tables->linker, hest_ghes_notify,
> > +                                ARRAY_SIZE(hest_ghes_notify),
> > +                                vms->oem_id, vms->oem_table_id);
> > +            }
> > +        }
> >      }
> >  
> >      if (ms->numa_state->num_nodes > 0) {
> > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > index c23b39949649..0d0cde481954 100644
> > --- a/hw/core/machine.c
> > +++ b/hw/core/machine.c
> > @@ -34,10 +34,12 @@
> >  #include "hw/virtio/virtio-pci.h"
> >  #include "hw/virtio/virtio-net.h"
> >  #include "hw/virtio/virtio-iommu.h"
> > +#include "hw/acpi/generic_event_device.h"
> >  #include "audio/audio.h"
> >  
> >  GlobalProperty hw_compat_9_2[] = {
> >      {"arm-cpu", "backcompat-pauth-default-use-qarma5", "true"},
> > +    { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
> >  };
> >  const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
> >  
> > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > index 237721fec0a2..164ed8b0f9a3 100644
> > --- a/include/hw/acpi/ghes.h
> > +++ b/include/hw/acpi/ghes.h
> > @@ -61,6 +61,7 @@ typedef struct AcpiGhesState {
> >      uint64_t hest_addr_le;
> >      uint64_t hw_error_le;
> >      bool present; /* True if GHES is present at all on this board */  
>                         and perhaps reformulate this as well
> 
> > +    bool hest_lookup; /* True if HEST address is present */  
>                                  if device should use HEST addr for error source lookup 
> 
> >  } AcpiGhesState;
> >  
> >  /*  
> 



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-28 10:00     ` Mauro Carvalho Chehab
@ 2025-01-28 14:10       ` Jonathan Cameron
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron @ 2025-01-28 14:10 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Michael S . Tsirkin, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Tue, 28 Jan 2025 11:00:34 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> On Thu, 23 Jan 2025 10:02:17 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> 
> > On Wed, 22 Jan 2025 16:46:19 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > Store HEST table address at GPA, placing its content at
> > > hest_addr_le variable.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > >     
> > A few trivial things inline.
> > 
> > Jonathan
> >   
> > > ---
> > > 
> > > Change from v8:
> > > - hest_addr_le is now pointing to the error source size and data.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>    
> > Bonus.  I guess you really like this patch :)  
> > > ---
> > >  hw/acpi/ghes.c         | 17 ++++++++++++++++-
> > >  include/hw/acpi/ghes.h |  1 +
> > >  2 files changed, 17 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index 3f519ccab90d..34e3364d3fd8 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -30,6 +30,7 @@
> > >  
> > >  #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
> > >  #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> > > +#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
> > >  
> > >  /* The max size in bytes for one error block */
> > >  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> > > @@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> > >      }
> > >  
> > >      /*
> > > -     * tell firmware to write hardware_errors GPA into
> > > +     * Tell firmware to write hardware_errors GPA into    
> > 
> > Sneaky tidy up.  No problem with it in general but adding noise here, so if there
> > are others in the series maybe gather them up in a cleanup patch.  
> 
> There are no other cleanups pending. Besides, as you noticed, this
> aligns with the comment below. So, I'm opting to add a note at the
> patch's description.

ok.


> 
> > > +     * once initialized.
> > > +     */
> > > +    bios_linker_loader_write_pointer(linker,
> > > +                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,    
> > 
> > Could wrap less and stay under 80 chars as both lines above add up to 70 something  
> 
> Why? This follows QEMU coding style and lines aren't longer than 80
> columns. Besides, at least to my eyes, and from experience maintaining
> other projects over the years, it is a lot quicker to identify function
> parameters if they're properly aligned with the parenthesis.

Ah. I didn't state this clearly enough.

    bios_linker_loader_write_pointer(linker, ACPI_HEST_ADDR_FW_CFG_FILE, 0,

is also under 80 chars.


> 
> Thanks,
> Mauro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/11] acpi/generic_event_device: add an APEI error device
  2025-01-24 12:30   ` Igor Mammedov
@ 2025-01-28 17:42     ` Mauro Carvalho Chehab
  2025-01-28 17:45       ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-28 17:42 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, linux-kernel

On Fri, 24 Jan 2025 13:30:54 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 22 Jan 2025 16:46:25 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Adds a generic error device to handle generic hardware error
> > events as specified at ACPI 6.5 specification at 18.3.2.7.2:
> > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> > using HID PNP0C33.
> > 
> > The PNP0C33 device is used to report hardware errors to
> > the guest via ACPI APEI Generic Hardware Error Source (GHES).
> > 
> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> >  hw/acpi/aml-build.c                    | 10 ++++++++++
> >  hw/acpi/generic_event_device.c         |  8 ++++++++
> >  include/hw/acpi/acpi_dev_interface.h   |  1 +
> >  include/hw/acpi/aml-build.h            |  2 ++
> >  include/hw/acpi/generic_event_device.h |  1 +
> >  5 files changed, 22 insertions(+)
> > 
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index f8f93a9f66c8..e4bd7b611372 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -2614,3 +2614,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source)
> >  
> >      return var;
> >  }
> > +
> > +/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
> > +Aml *aml_error_device(void)
> > +{
> > +    Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> > +    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> > +    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > +
> > +    return dev;
> > +}
> > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > index fe537ed05c66..ce00c80054f4 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
> >      ACPI_GED_PWR_DOWN_EVT,
> >      ACPI_GED_NVDIMM_HOTPLUG_EVT,
> >      ACPI_GED_CPU_HOTPLUG_EVT,
> > +    ACPI_GED_ERROR_EVT,
> >  };
> >  
> >  /*
> > @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
> >                             aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> >                                        aml_int(0x80)));
> >                  break;
> > +            case ACPI_GED_ERROR_EVT:
> > +                aml_append(if_ctx,
> > +                           aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> > +                                      aml_int(0x80)));  
>                                                  ^^^^^
> nit: perhaps add a comment with intent and ref to spec wrt above  value

Will add this with a define:

	/*
	 * ACPI 5.0b: 5.6.6 Device Object Notifications
	 * Table 5-135 Error Device Notification Values
	 */
	#define ERROR_DEVICE_NOTIFICATION   0x80

(the spec reference here is the same one we used in this patch for the
aml_error_device() function)

> 
> > +                break;
> >              case ACPI_GED_NVDIMM_HOTPLUG_EVT:
> >                  aml_append(if_ctx,
> >                             aml_notify(aml_name("\\_SB.NVDR"),
> > @@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> >          sel = ACPI_GED_MEM_HOTPLUG_EVT;
> >      } else if (ev & ACPI_POWER_DOWN_STATUS) {
> >          sel = ACPI_GED_PWR_DOWN_EVT;
> > +    } else if (ev & ACPI_GENERIC_ERROR) {
> > +        sel = ACPI_GED_ERROR_EVT;
> >      } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
> >          sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
> >      } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> > diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
> > index 68d9d15f50aa..8294f8f0ccca 100644
> > --- a/include/hw/acpi/acpi_dev_interface.h
> > +++ b/include/hw/acpi/acpi_dev_interface.h
> > @@ -13,6 +13,7 @@ typedef enum {
> >      ACPI_NVDIMM_HOTPLUG_STATUS = 16,
> >      ACPI_VMGENID_CHANGE_STATUS = 32,
> >      ACPI_POWER_DOWN_STATUS = 64,
> > +    ACPI_GENERIC_ERROR = 128,
> >  } AcpiEventStatusBits;
> >  
> >  #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
> > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> > index c18f68134246..f38e12971932 100644
> > --- a/include/hw/acpi/aml-build.h
> > +++ b/include/hw/acpi/aml-build.h
> > @@ -252,6 +252,7 @@ struct CrsRangeSet {
> >  /* Consumer/Producer */
> >  #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY        (1 << 1)
> >  
> > +#define ACPI_APEI_ERROR_DEVICE   "GEDD"
> >  /**
> >   * init_aml_allocator:
> >   *
> > @@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz,
> >               uint8_t channel);
> >  Aml *aml_sleep(uint64_t msec);
> >  Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
> > +Aml *aml_error_device(void);
> >  
> >  /* Block AML object primitives */
> >  Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
> > diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
> > index d2dac87b4a9f..1c18ac296fcb 100644
> > --- a/include/hw/acpi/generic_event_device.h
> > +++ b/include/hw/acpi/generic_event_device.h
> > @@ -101,6 +101,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> >  #define ACPI_GED_PWR_DOWN_EVT      0x2
> >  #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
> >  #define ACPI_GED_CPU_HOTPLUG_EVT    0x8
> > +#define ACPI_GED_ERROR_EVT          0x10
> >  
> >  typedef struct GEDState {
> >      MemoryRegion evt;  
> 



Thanks,
Mauro

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/11] acpi/generic_event_device: add an APEI error device
  2025-01-28 17:42     ` Mauro Carvalho Chehab
@ 2025-01-28 17:45       ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2025-01-28 17:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Jonathan Cameron, Shiju Jose, qemu-arm, qemu-devel,
	Ani Sinha, linux-kernel

On Tue, Jan 28, 2025 at 06:42:02PM +0100, Mauro Carvalho Chehab wrote:
> On Fri, 24 Jan 2025 13:30:54 +0100
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Wed, 22 Jan 2025 16:46:25 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > 
> > > Adds a generic error device to handle generic hardware error
> > > events as specified at ACPI 6.5 specification at 18.3.2.7.2:
> > > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> > > using HID PNP0C33.
> > > 
> > > The PNP0C33 device is used to report hardware errors to
> > > the guest via ACPI APEI Generic Hardware Error Source (GHES).
> > > 
> > > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > >  hw/acpi/aml-build.c                    | 10 ++++++++++
> > >  hw/acpi/generic_event_device.c         |  8 ++++++++
> > >  include/hw/acpi/acpi_dev_interface.h   |  1 +
> > >  include/hw/acpi/aml-build.h            |  2 ++
> > >  include/hw/acpi/generic_event_device.h |  1 +
> > >  5 files changed, 22 insertions(+)
> > > 
> > > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > > index f8f93a9f66c8..e4bd7b611372 100644
> > > --- a/hw/acpi/aml-build.c
> > > +++ b/hw/acpi/aml-build.c
> > > @@ -2614,3 +2614,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source)
> > >  
> > >      return var;
> > >  }
> > > +
> > > +/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
> > > +Aml *aml_error_device(void)
> > > +{
> > > +    Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> > > +    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> > > +    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > > +
> > > +    return dev;
> > > +}
> > > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > > index fe537ed05c66..ce00c80054f4 100644
> > > --- a/hw/acpi/generic_event_device.c
> > > +++ b/hw/acpi/generic_event_device.c
> > > @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
> > >      ACPI_GED_PWR_DOWN_EVT,
> > >      ACPI_GED_NVDIMM_HOTPLUG_EVT,
> > >      ACPI_GED_CPU_HOTPLUG_EVT,
> > > +    ACPI_GED_ERROR_EVT,
> > >  };
> > >  
> > >  /*
> > > @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
> > >                             aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> > >                                        aml_int(0x80)));
> > >                  break;
> > > +            case ACPI_GED_ERROR_EVT:
> > > +                aml_append(if_ctx,
> > > +                           aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> > > +                                      aml_int(0x80)));  
> >                                                  ^^^^^
> > nit: perhaps add a comment with intent and ref to spec wrt above  value
> 
> Will add this with a define:
> 
> 	/*
> 	 * ACPI 5.0b: 5.6.6 Device Object Notifications
> 	 * Table 5-135 Error Device Notification Values
> 	 */
> 	#define ERROR_DEVICE_NOTIFICATION   0x80
> 
> (the spec reference here is the same one we used in this patch for the
> aml_error_device() function)

we do not do a lot of defines, definitely not for constants used only once.

just comment on top of the value.
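
As a rough illustration of that suggestion, a standalone sketch with the
spec reference placed directly above the literal instead of behind a
one-off macro. The function name is made up for the example and is not part
of the patch; the comment text is the one proposed in the quoted reply.

```c
#include <stdint.h>

/* Hypothetical helper, only here to show where the comment would sit. */
static uint32_t error_device_notify_value(void)
{
    /*
     * ACPI 5.0b: 5.6.6 Device Object Notifications
     * Table 5-135 Error Device Notification Values
     */
    return 0x80;
}
```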

> > 
> > > +                break;
> > >              case ACPI_GED_NVDIMM_HOTPLUG_EVT:
> > >                  aml_append(if_ctx,
> > >                             aml_notify(aml_name("\\_SB.NVDR"),
> > > @@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> > >          sel = ACPI_GED_MEM_HOTPLUG_EVT;
> > >      } else if (ev & ACPI_POWER_DOWN_STATUS) {
> > >          sel = ACPI_GED_PWR_DOWN_EVT;
> > > +    } else if (ev & ACPI_GENERIC_ERROR) {
> > > +        sel = ACPI_GED_ERROR_EVT;
> > >      } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
> > >          sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
> > >      } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> > > diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
> > > index 68d9d15f50aa..8294f8f0ccca 100644
> > > --- a/include/hw/acpi/acpi_dev_interface.h
> > > +++ b/include/hw/acpi/acpi_dev_interface.h
> > > @@ -13,6 +13,7 @@ typedef enum {
> > >      ACPI_NVDIMM_HOTPLUG_STATUS = 16,
> > >      ACPI_VMGENID_CHANGE_STATUS = 32,
> > >      ACPI_POWER_DOWN_STATUS = 64,
> > > +    ACPI_GENERIC_ERROR = 128,
> > >  } AcpiEventStatusBits;
> > >  
> > >  #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
> > > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> > > index c18f68134246..f38e12971932 100644
> > > --- a/include/hw/acpi/aml-build.h
> > > +++ b/include/hw/acpi/aml-build.h
> > > @@ -252,6 +252,7 @@ struct CrsRangeSet {
> > >  /* Consumer/Producer */
> > >  #define AML_SERIAL_BUS_FLAG_CONSUME_ONLY        (1 << 1)
> > >  
> > > +#define ACPI_APEI_ERROR_DEVICE   "GEDD"
> > >  /**
> > >   * init_aml_allocator:
> > >   *
> > > @@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz,
> > >               uint8_t channel);
> > >  Aml *aml_sleep(uint64_t msec);
> > >  Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
> > > +Aml *aml_error_device(void);
> > >  
> > >  /* Block AML object primitives */
> > >  Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
> > > diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
> > > index d2dac87b4a9f..1c18ac296fcb 100644
> > > --- a/include/hw/acpi/generic_event_device.h
> > > +++ b/include/hw/acpi/generic_event_device.h
> > > @@ -101,6 +101,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> > >  #define ACPI_GED_PWR_DOWN_EVT      0x2
> > >  #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
> > >  #define ACPI_GED_CPU_HOTPLUG_EVT    0x8
> > > +#define ACPI_GED_ERROR_EVT          0x10
> > >  
> > >  typedef struct GEDState {
> > >      MemoryRegion evt;  
> > 
> 
> 
> 
> Thanks,
> Mauro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available
  2025-01-28 11:29     ` Mauro Carvalho Chehab
@ 2025-01-29  6:26       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2025-01-29  6:26 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Philippe Mathieu-Daudé, Ani Sinha, Dongjiu Geng,
	Eduardo Habkost, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, Zhao Liu, linux-kernel

On Tue, 28 Jan 2025 12:29:51 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> On Fri, 24 Jan 2025 11:23:46 +0100
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Wed, 22 Jan 2025 16:46:22 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > Create a new property (x-has-hest-addr) and use it to detect if
> > > the GHES table offsets can be calculated from the HEST address
> > > (qemu 9.2 and upper) or via the legacy way via an offset obtained    
> > 
> > 10.0 by now
> >   
> > > from the hardware_errors firmware file.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > ---
> > >  hw/acpi/generic_event_device.c |  1 +
> > >  hw/acpi/ghes.c                 | 28 +++++++++++++++++++++-------
> > >  hw/arm/virt-acpi-build.c       | 30 ++++++++++++++++++++++++++----
> > >  hw/core/machine.c              |  2 ++
> > >  include/hw/acpi/ghes.h         |  1 +
> > >  5 files changed, 51 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > > index 5346cae573b7..fe537ed05c66 100644
> > > --- a/hw/acpi/generic_event_device.c
> > > +++ b/hw/acpi/generic_event_device.c
> > > @@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> > >  
> > >  static const Property acpi_ged_properties[] = {
> > >      DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
> > > +    DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),    
> > 
> > s/hest_lookup/use_hest_addr/
> >   
> > >  };
> > >  
> > >  static const VMStateDescription vmstate_memhp_state = {
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index b46b563bcaf8..86c97f60d6a0 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -359,6 +359,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > >  {
> > >      AcpiTable table = { .sig = "HEST", .rev = 1,
> > >                          .oem_id = oem_id, .oem_table_id = oem_table_id };
> > > +    AcpiGedState *acpi_ged_state;
> > > +    AcpiGhesState *ags = NULL;
> > >      int i;
> > >  
> > >      build_ghes_error_table(hardware_errors, linker, num_sources);
> > > @@ -379,10 +381,20 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > >       * tell firmware to write into GPA the address of HEST via fw_cfg,
> > >       * once initialized.
> > >       */
> > > -    bios_linker_loader_write_pointer(linker,
> > > -                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> > > -                                     sizeof(uint64_t),
> > > -                                     ACPI_BUILD_TABLE_FILE, hest_offset);
> > > +
> > > +    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > > +                                                       NULL));    
> > 
> > the caller already did the lookup;
> > just pass hest_lookup as an argument and use it here
> 
> for all the above: OK.
> 
> > > +    if (!acpi_ged_state) {
> > > +        return;
> > > +    }
> > > +
> > > +    ags = &acpi_ged_state->ghes_state;
> > > +    if (ags->hest_lookup) {
> > > +        bios_linker_loader_write_pointer(linker,
> > > +                                         ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> > > +                                         sizeof(uint64_t),
> > > +                                         ACPI_BUILD_TABLE_FILE, hest_offset);
> > > +    }
> > >  }
> > >  
> > >  void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > @@ -396,8 +408,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > >      fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
> > >          NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);    
> > 
> > btw shouldn't we disable hw_error_le if hest_lookup is active?  
> 
> Although it is not used in that case, we still need the fw_cfg logic to
> recalculate the table offsets and resolve the bios_linker entries.
> 
> In the tests I did, not having a callback caused fw_cfg issues when QEMU
> tried to load the firmware or to update it.
> 
> > >  
> > > -    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> > > -        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> > > +    if (ags && ags->hest_lookup) {    
> > 
> > why bother with 'ags &&' if we don't do it for hw_error_le?
> 
> Legacy stuff. I'll drop "ags &&".
> 
> > 
> >   
> > > +        fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> > > +            NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> > > +    }
> > >  
> > >      ags->present = true;
> > >  }
> > > @@ -512,7 +526,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > >      }
> > >      ags = &acpi_ged_state->ghes_state;
> > >  
> > > -    if (!ags->hest_addr_le) {
> > > +    if (!ags->hest_lookup) {    
> > why? !ags->hest_addr_le is sufficient  
> 
> Checking for either "hest_addr_le" or "use_hest_addr" would work
> equally well, assuming that address == 0 is invalid. I opted for the
> latter because you requested it in a previous review, and also
> because it makes it clearer that this comes from the migration logic,
> which dictates what kind of lookup should be done.
> 
> From my side, either way works fine. What do you prefer?

After sleeping on it, IMO checking for !ags->hest_addr_le is better here and will
align with the new code from this patch:

	acpi/ghes: Cleanup the code which gets ghes ged state

that will remove ags->present in favor of checking for both hw_error_le and
hest_addr_le:

	AcpiGhesState *acpi_ghes_get_state(void)
	{
	    AcpiGedState *acpi_ged_state;
	    AcpiGhesState *ags;

	    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
	                                                       NULL));
	    if (!acpi_ged_state) {
	        return NULL;
	    }
	    ags = &acpi_ged_state->ghes_state;

	    if (!ags->hw_error_le && !ags->hest_addr_le) {
	        return NULL;
	    }
	    return ags;
	}

So, I'll drop this hunk.
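
To make the resulting behaviour concrete, here is a minimal standalone sketch
of how the two address fields could select the lookup path once the hunk is
dropped. It is an illustration only, not QEMU code: SketchGhesState,
SketchLookup, sketch_pick_lookup() and sketch_lookup_base() are hypothetical
names, the struct keeps only the two fields discussed above, and the
le64_to_cpu() conversion done by the real code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for AcpiGhesState: only the two address fields. */
typedef struct SketchGhesState {
    uint64_t hest_addr_le;  /* 0 until firmware writes the HEST address */
    uint64_t hw_error_le;   /* 0 until firmware writes hardware_errors GPA */
} SketchGhesState;

typedef enum SketchLookup {
    SKETCH_LOOKUP_NONE,     /* GHES not usable: no address was written */
    SKETCH_LOOKUP_LEGACY,   /* offsets derived from hardware_errors */
    SKETCH_LOOKUP_HEST,     /* offsets derived from the HEST table */
} SketchLookup;

/* Mirrors the two checks: "both zero" means no GHES, else prefer HEST. */
static inline SketchLookup sketch_pick_lookup(const SketchGhesState *ags)
{
    if (!ags || (!ags->hw_error_le && !ags->hest_addr_le)) {
        return SKETCH_LOOKUP_NONE;
    }
    if (!ags->hest_addr_le) {
        return SKETCH_LOOKUP_LEGACY;
    }
    return SKETCH_LOOKUP_HEST;
}

/* Returns the raw stored address for the chosen path (no byte swapping). */
static inline uint64_t sketch_lookup_base(const SketchGhesState *ags)
{
    SketchLookup how = sketch_pick_lookup(ags);

    assert(how != SKETCH_LOOKUP_NONE);  /* caller must rule this out first */
    return how == SKETCH_LOOKUP_HEST ? ags->hest_addr_le : ags->hw_error_le;
}
```

With this shape, a machine type that never registers the HEST address file
leaves hest_addr_le at zero and so falls back to the legacy path, which
relies on address 0 being invalid, as noted above.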

> 
> >   
> > >          get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> > >                               &cper_addr, &read_ack_register_addr);
> > >      } else {
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index 3d411787fc37..ada5d08cfbe7 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -897,6 +897,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = {
> > >      { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
> > >  };
> > >  
> > > +static const AcpiNotificationSourceId hest_ghes_notify_9_2[] = {
> > > +    { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
> > > +};
> > > +
> > >  static
> > >  void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> > >  {
> > > @@ -950,10 +954,28 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> > >      build_dbg2(tables_blob, tables->linker, vms);
> > >  
> > >      if (vms->ras) {
> > > -        acpi_add_table(table_offsets, tables_blob);
> > > -        acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> > > -                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
> > > -                        vms->oem_id, vms->oem_table_id);
> > > +        AcpiGhesState *ags;
> > > +        AcpiGedState *acpi_ged_state;
> > > +
> > > +        acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > > +                                                       NULL));
> > > +        if (acpi_ged_state) {
> > > +            ags = &acpi_ged_state->ghes_state;
> > > +
> > > +            acpi_add_table(table_offsets, tables_blob);
> > > +
> > > +            if (!ags->hest_lookup) {
> > > +                acpi_build_hest(tables_blob, tables->hardware_errors,
> > > +                                tables->linker, hest_ghes_notify_9_2,
> > > +                                ARRAY_SIZE(hest_ghes_notify_9_2),
> > > +                                vms->oem_id, vms->oem_table_id);
> > > +            } else {
> > > +                acpi_build_hest(tables_blob, tables->hardware_errors,
> > > +                                tables->linker, hest_ghes_notify,
> > > +                                ARRAY_SIZE(hest_ghes_notify),
> > > +                                vms->oem_id, vms->oem_table_id);
> > > +            }
> > > +        }
> > >      }
> > >  
> > >      if (ms->numa_state->num_nodes > 0) {
> > > diff --git a/hw/core/machine.c b/hw/core/machine.c
> > > index c23b39949649..0d0cde481954 100644
> > > --- a/hw/core/machine.c
> > > +++ b/hw/core/machine.c
> > > @@ -34,10 +34,12 @@
> > >  #include "hw/virtio/virtio-pci.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "hw/virtio/virtio-iommu.h"
> > > +#include "hw/acpi/generic_event_device.h"
> > >  #include "audio/audio.h"
> > >  
> > >  GlobalProperty hw_compat_9_2[] = {
> > >      {"arm-cpu", "backcompat-pauth-default-use-qarma5", "true"},
> > > +    { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
> > >  };
> > >  const size_t hw_compat_9_2_len = G_N_ELEMENTS(hw_compat_9_2);
> > >  
> > > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > > index 237721fec0a2..164ed8b0f9a3 100644
> > > --- a/include/hw/acpi/ghes.h
> > > +++ b/include/hw/acpi/ghes.h
> > > @@ -61,6 +61,7 @@ typedef struct AcpiGhesState {
> > >      uint64_t hest_addr_le;
> > >      uint64_t hw_error_le;
> > >      bool present; /* True if GHES is present at all on this board */    
> >                         and perhaps reformulate this as well
> >   
> > > +    bool hest_lookup; /* True if HEST address is present */    
> >                                  if device should use HEST addr for error source lookup 
> >   
> > >  } AcpiGhesState;
> > >  
> > >  /*    
> >   
> 
> 
> 
> Thanks,
> Mauro



Thanks,
Mauro


* Re: [PATCH 02/11] acpi/ghes: add a firmware file with HEST address
  2025-01-22 15:46 ` [PATCH 02/11] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
  2025-01-23 10:02   ` Jonathan Cameron
@ 2025-01-29 13:33   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Igor Mammedov @ 2025-01-29 13:33 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Michael S . Tsirkin, Jonathan Cameron, Shiju Jose, qemu-arm,
	qemu-devel, Ani Sinha, Dongjiu Geng, linux-kernel

On Wed, 22 Jan 2025 16:46:19 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Store HEST table address at GPA, placing its content at
> hest_addr_le variable.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> ---
> 
> Change from v8:
> - hest_addr_le is now pointing to the error source size and data.

That's confusing: the variable name says it's the HEST table address,
while in practice it's (that + offset).

I'd very much prefer it to be the table start, and then you'd add the
offset later on where it's going to be used.
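
As a rough illustration of that suggestion (a sketch under assumptions, not
the actual QEMU change): if the stored value were the table start, the use
site could add the header length itself. The names below are hypothetical.
The 36-byte figure is the size of the standard ACPI table header, and the
sketch assumes the Error Source Count field follows it directly, as in the
acpi_build_hest() hunk quoted below.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of the standard ACPI table header preceding the HEST body. */
#define SKETCH_ACPI_TABLE_HEADER_LEN 36u

/*
 * Hypothetical helper: given the guest address of the start of the HEST
 * table, return the address of its Error Source Count field.
 */
static inline uint64_t sketch_hest_source_count_addr(uint64_t hest_table_start)
{
    /* Guard against wrap-around before adding the header length. */
    assert(hest_table_start <= UINT64_MAX - SKETCH_ACPI_TABLE_HEADER_LEN);
    return hest_table_start + SKETCH_ACPI_TABLE_HEADER_LEN;
}
```

That keeps the variable name truthful and puts the offset arithmetic next to
the code that walks the error sources.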

> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/ghes.c         | 17 ++++++++++++++++-
>  include/hw/acpi/ghes.h |  1 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 3f519ccab90d..34e3364d3fd8 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -30,6 +30,7 @@
>  
>  #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
>  #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> +#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
>  
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> @@ -261,7 +262,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
>      }
>  
>      /*
> -     * tell firmware to write hardware_errors GPA into
> +     * Tell firmware to write hardware_errors GPA into

drop this hunk as it's not related to the patch

>       * hardware_errors_addr fw_cfg, once the former has been initialized.
>       */
>      bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
> @@ -355,6 +356,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>  
>      acpi_table_begin(&table, table_data);
>  
> +    int hest_offset = table_data->len;
> +
>      /* Error Source Count */
>      build_append_int_noprefix(table_data, num_sources, 4);
>      for (i = 0; i < num_sources; i++) {
> @@ -362,6 +365,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>      }
>  
>      acpi_table_end(linker, &table);
> +
> +    /*
> +     * tell firmware to write into GPA the address of HEST via fw_cfg,
> +     * once initialized.
> +     */
> +    bios_linker_loader_write_pointer(linker,
> +                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> +                                     sizeof(uint64_t),
> +                                     ACPI_BUILD_TABLE_FILE, hest_offset);
>  }
>  
>  void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> @@ -375,6 +387,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
>      fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
>          NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
>  
> +    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> +        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> +
>      ags->present = true;
>  }
>  
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 9f0120d0d596..237721fec0a2 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
>  };
>  
>  typedef struct AcpiGhesState {
> +    uint64_t hest_addr_le;
>      uint64_t hw_error_le;
>      bool present; /* True if GHES is present at all on this board */
>  } AcpiGhesState;


Thread overview: 42+ messages
2025-01-22 15:46 [PATCH 00/11] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
2025-01-22 15:46 ` [PATCH 01/11] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
2025-01-23  9:56   ` Jonathan Cameron
2025-01-23 16:48   ` Igor Mammedov
2025-01-22 15:46 ` [PATCH 02/11] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2025-01-23 10:02   ` Jonathan Cameron
2025-01-23 11:46     ` Mauro Carvalho Chehab
2025-01-23 17:01     ` Igor Mammedov
2025-01-28 10:12       ` Mauro Carvalho Chehab
2025-01-28 10:00     ` Mauro Carvalho Chehab
2025-01-28 14:10       ` Jonathan Cameron
2025-01-29 13:33   ` Igor Mammedov
2025-01-22 15:46 ` [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
2025-01-23 10:29   ` Jonathan Cameron
2025-01-23 18:23     ` Mauro Carvalho Chehab
2025-01-24  9:59       ` Igor Mammedov
2025-01-22 15:46 ` [PATCH 04/11] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
2025-01-23 10:31   ` Jonathan Cameron
2025-01-24 10:08   ` Igor Mammedov
2025-01-22 15:46 ` [PATCH 05/11] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
2025-01-23 10:52   ` Jonathan Cameron
2025-01-24 10:23   ` Igor Mammedov
2025-01-28 11:29     ` Mauro Carvalho Chehab
2025-01-29  6:26       ` Mauro Carvalho Chehab
2025-01-22 15:46 ` [PATCH 06/11] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
2025-01-23 10:52   ` Jonathan Cameron
2025-01-22 15:46 ` [PATCH 07/11] acpi/ghes: Cleanup the code which gets ghes ged state Mauro Carvalho Chehab
2025-01-23 10:54   ` Jonathan Cameron
2025-01-24 12:25   ` Igor Mammedov
2025-01-22 15:46 ` [PATCH 08/11] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2025-01-24 12:30   ` Igor Mammedov
2025-01-28 17:42     ` Mauro Carvalho Chehab
2025-01-28 17:45       ` Michael S. Tsirkin
2025-01-22 15:46 ` [PATCH 09/11] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
2025-01-23 10:56   ` Jonathan Cameron
2025-01-22 15:46 ` [PATCH 10/11] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2025-01-23 11:00   ` Jonathan Cameron
2025-01-24 12:40     ` Igor Mammedov
2025-01-24 12:38   ` Igor Mammedov
2025-01-22 15:46 ` [PATCH 11/11] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2025-01-23 12:10   ` Jonathan Cameron
2025-01-24 12:47 ` [PATCH 00/11] Change ghes to use HEST-based offsets and add support for " Igor Mammedov
