qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Change ghes driver to use HEST-based offsets
@ 2024-11-13  8:36 Mauro Carvalho Chehab
  2024-11-13  8:36 ` [PATCH 1/6] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-13  8:36 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Philippe Mathieu-Daudé, Ani Sinha, Dongjiu Geng,
	Peter Maydell, Shannon Zhao, Yanan Wang, Zhao Liu, qemu-arm,
	qemu-devel

This  series was part of the previous PR to add generic error injection
support on GHES. It depends on a cleanup patch series sent yesterday:

https://lore.kernel.org/qemu-devel/c3e608a16a795b2d2be476eddc3707febcdb1ca3.1731406254.git.mchehab+huawei@kernel.org/T/#m60839c88a009f04bc73f75832ccb6a41523259ef

It contains the changes of the math used to calculate offsets at HEST table 
and hardware_error firmware file. It prepares for the addition of GHES
error injection.

The first patch was previously at the cleanup series. It prepares
the logic to support multiple sources.

The second patch adds a new firmware file to store HEST address.

The third patch use the new firmware to calculate offsets using
HEST table.

Patches 4 and 5 add migration support. They assume that this
series will be merged for qemu 9.2 (maybe it is too late for that,
as QEMU is now on soft freeze). 

I tested migration using both virt-9.1 and virt-9.2 machines
on qemu 9.2.

I also tested migration with:

	qemu-9.1 -M virt-9.1 -cpu cortex-a57 => qemu-9.2 -M virt-9.1 -cpu cortex-a57
	qemu-9.2 -M virt-9.1 -cpu cortex-a57 => qemu-9.1 -M virt-9.1 -cpu cortex-a57

Mauro Carvalho Chehab (6):
  acpi/ghes: Prepare to support multiple sources on ghes
  acpi/ghes: add a firmware file with HEST address
  acpi/ghes: rename the function which gets hw error offsets
  acpi/ghes: Use HEST table offsets when preparing GHES records
  acpi/generic_event_device: Update GHES migration to cover hest addr
  acpi/generic_event_device: add logic to detect if HEST addr is
    available

 hw/acpi/generic_event_device.c |  30 +++++++
 hw/acpi/ghes.c                 | 158 +++++++++++++++++++++++++++++----
 hw/arm/virt-acpi-build.c       |  33 ++++++-
 hw/core/machine.c              |   2 +
 include/hw/acpi/ghes.h         |  23 +++--
 5 files changed, 217 insertions(+), 29 deletions(-)

-- 
2.47.0




^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/6] acpi/ghes: Prepare to support multiple sources on ghes
  2024-11-13  8:36 [PATCH 0/6] Change ghes driver to use HEST-based offsets Mauro Carvalho Chehab
@ 2024-11-13  8:36 ` Mauro Carvalho Chehab
  2024-11-20 14:29   ` Jonathan Cameron via
  2024-11-13  8:36 ` [PATCH 2/6] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-13  8:36 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Peter Maydell,
	Shannon Zhao, linux-kernel, qemu-arm, qemu-devel

The current code is actually dependent on having just one
error structure with a single source.

As the number of sources should be arch-dependent, as it
will depend on what kind of synchronous/assynchronous
notifications will exist, change the logic to dynamically
build the table.

Yet, for a proper support, we need to get the number of
sources by reading the number from the HEST table. However,
bios currently doesn't store a pointer to it.

For now just change the logic at table build time, while
enforcing that it will behave like before with a single
source ID.

A future patch will add a HEST table bios pointer and
change the logic at acpi_ghes_record_errors() to
dynamically use the new size.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/acpi/ghes.c           | 43 ++++++++++++++++++++++++++--------------
 hw/arm/virt-acpi-build.c |  5 +++++
 include/hw/acpi/ghes.h   | 21 +++++++++++++-------
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e7de3b302602..a590b0f6f85f 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -212,17 +212,26 @@ ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
  * See docs/specs/acpi_hest_ghes.rst for blobs format.
  */
-static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
+                                   int num_sources)
 {
     int i, error_status_block_offset;
 
+    /*
+     * TODO: Current version supports only one source.
+     * A further patch will drop this check, after adding a proper migration
+     * code, as, for the code to work, we need to store a bios pointer to the
+     * HEST table.
+     */
+    assert(num_sources == 1);
+
     /* Build error_block_address */
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < num_sources; i++) {
         build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
     }
 
     /* Build read_ack_register */
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < num_sources; i++) {
         /*
          * Initialize the value of read_ack_register to 1, so GHES can be
          * writable after (re)boot.
@@ -237,13 +246,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
 
     /* Reserve space for Error Status Data Block */
     acpi_data_push(hardware_errors,
-        ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+        ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
 
     /* Tell guest firmware to place hardware_errors blob into RAM */
     bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
                              hardware_errors, sizeof(uint64_t), false);
 
-    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+    for (i = 0; i < num_sources; i++) {
         /*
          * Tell firmware to patch error_block_address entries to point to
          * corresponding "Generic Error Status Block"
@@ -269,10 +278,12 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
 /* Build Generic Hardware Error Source version 2 (GHESv2) */
 static void build_ghes_v2(GArray *table_data,
                           BIOSLinker *linker,
-                          enum AcpiGhesNotifyType notify,
-                          uint16_t source_id)
+                          const AcpiNotificationSourceId *notif_src,
+                          uint16_t index, int num_sources)
 {
     uint64_t address_offset;
+    const uint16_t notify = notif_src->notify;
+    const uint16_t source_id = notif_src->source_id;
 
     /*
      * Type:
@@ -303,7 +314,7 @@ static void build_ghes_v2(GArray *table_data,
                                    address_offset + GAS_ADDR_OFFSET,
                                    sizeof(uint64_t),
                                    ACPI_HW_ERROR_FW_CFG_FILE,
-                                   source_id * sizeof(uint64_t));
+                                   index * sizeof(uint64_t));
 
     /* Notification Structure */
     build_ghes_hw_error_notification(table_data, notify);
@@ -323,8 +334,7 @@ static void build_ghes_v2(GArray *table_data,
                                    address_offset + GAS_ADDR_OFFSET,
                                    sizeof(uint64_t),
                                    ACPI_HW_ERROR_FW_CFG_FILE,
-                                   (ACPI_GHES_ERROR_SOURCE_COUNT + source_id)
-                                   * sizeof(uint64_t));
+                                   (num_sources + index) * sizeof(uint64_t));
 
     /*
      * Read Ack Preserve field
@@ -339,19 +349,23 @@ static void build_ghes_v2(GArray *table_data,
 /* Build Hardware Error Source Table */
 void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
                      BIOSLinker *linker,
+                     const AcpiNotificationSourceId * const notif_source,
+                     int num_sources,
                      const char *oem_id, const char *oem_table_id)
 {
     AcpiTable table = { .sig = "HEST", .rev = 1,
                         .oem_id = oem_id, .oem_table_id = oem_table_id };
+    int i;
 
-    build_ghes_error_table(hardware_errors, linker);
+    build_ghes_error_table(hardware_errors, linker, num_sources);
 
     acpi_table_begin(&table, table_data);
 
     /* Error Source Count */
-    build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
-    build_ghes_v2(table_data, linker,
-                  ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
+    build_append_int_noprefix(table_data, num_sources, 4);
+    for (i = 0; i < num_sources; i++) {
+        build_ghes_v2(table_data, linker, &notif_source[i], i, num_sources);
+    }
 
     acpi_table_end(linker, &table);
 }
@@ -416,7 +430,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     }
     ags = &acpi_ged_state->ghes_state;
 
-    assert(ACPI_GHES_ERROR_SOURCE_COUNT == 1);
     get_ghes_offsets(le64_to_cpu(ags->hw_error_le), &cper_addr, &read_ack_register_addr);
 
     cper_addr = le64_to_cpu(cper_addr);
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e059317b002e..40f66792570c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -889,6 +889,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
     g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
 }
 
+static const AcpiNotificationSourceId hest_ghes_notify[] = {
+    {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
+};
+
 static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
@@ -944,6 +948,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     if (vms->ras) {
         acpi_add_table(table_offsets, tables_blob);
         acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
                         vms->oem_id, vms->oem_table_id);
     }
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 9a765edb27ef..e47ffacbb5c9 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -57,20 +57,27 @@ enum AcpiGhesNotifyType {
     ACPI_GHES_NOTIFY_RESERVED = 12
 };
 
-enum {
-    ACPI_HEST_SRC_ID_SEA = 0,
-    /* future ids go here */
-
-    ACPI_GHES_ERROR_SOURCE_COUNT
-};
-
 typedef struct AcpiGhesState {
     uint64_t hw_error_le;
     bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
+/*
+ * ID numbers used to fill HEST source ID field
+ */
+enum AcpiGhesSourceID {
+    ACPI_HEST_SRC_ID_SYNC,
+};
+
+typedef struct AcpiNotificationSourceId {
+    enum AcpiGhesSourceID source_id;
+    enum AcpiGhesNotifyType notify;
+} AcpiNotificationSourceId;
+
 void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
                      BIOSLinker *linker,
+                     const AcpiNotificationSourceId * const notif_source,
+                     int num_sources,
                      const char *oem_id, const char *oem_table_id);
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/6] acpi/ghes: add a firmware file with HEST address
  2024-11-13  8:36 [PATCH 0/6] Change ghes driver to use HEST-based offsets Mauro Carvalho Chehab
  2024-11-13  8:36 ` [PATCH 1/6] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
@ 2024-11-13  8:36 ` Mauro Carvalho Chehab
  2024-11-20 14:32   ` Jonathan Cameron via
  2024-11-13  8:37 ` [PATCH 3/6] acpi/ghes: rename the function which gets hw error offsets Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-13  8:36 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
	qemu-arm, qemu-devel

Store HEST table address at GPA, placing its content at
hest_addr_le variable.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

---

Change from v8:
- hest_addr_lr is now pointing to the error source size and data.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c         | 15 +++++++++++++++
 include/hw/acpi/ghes.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index a590b0f6f85f..4cd79d42cd04 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -30,6 +30,7 @@
 
 #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
 #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
+#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
 
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
@@ -361,6 +362,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
 
     acpi_table_begin(&table, table_data);
 
+    int hest_offset = table_data->len;
+
     /* Error Source Count */
     build_append_int_noprefix(table_data, num_sources, 4);
     for (i = 0; i < num_sources; i++) {
@@ -368,6 +371,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
     }
 
     acpi_table_end(linker, &table);
+
+    /*
+     * tell firmware to write into GPA the address of HEST via fw_cfg,
+     * once initialized.
+     */
+    bios_linker_loader_write_pointer(linker,
+                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+                                     sizeof(uint64_t),
+                                     ACPI_BUILD_TABLE_FILE, hest_offset);
 }
 
 void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -381,6 +393,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
 
+    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+
     ags->present = true;
 }
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index e47ffacbb5c9..a07c30ef13b7 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
 };
 
 typedef struct AcpiGhesState {
+    uint64_t hest_addr_le;
     uint64_t hw_error_le;
     bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/6] acpi/ghes: rename the function which gets hw error offsets
  2024-11-13  8:36 [PATCH 0/6] Change ghes driver to use HEST-based offsets Mauro Carvalho Chehab
  2024-11-13  8:36 ` [PATCH 1/6] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
  2024-11-13  8:36 ` [PATCH 2/6] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2024-11-13  8:37 ` Mauro Carvalho Chehab
  2024-11-20 14:33   ` Jonathan Cameron via
  2024-11-13  8:37 ` [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-13  8:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
	qemu-arm, qemu-devel

Rename: get_ghes_offsets->get_hw_error_offsets
to make clear that this function return offsets based on the
hardware error firmware.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 4cd79d42cd04..c93bbaf1994a 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -399,9 +399,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     ags->present = true;
 }
 
-static void get_ghes_offsets(uint64_t ghes_addr,
-                             uint64_t *cper_addr,
-                             uint64_t *read_ack_register_addr)
+static void get_hw_error_offsets(uint64_t ghes_addr,
+                                 uint64_t *cper_addr,
+                                 uint64_t *read_ack_register_addr)
 {
     if (!ghes_addr) {
         return;
@@ -445,7 +445,8 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     }
     ags = &acpi_ged_state->ghes_state;
 
-    get_ghes_offsets(le64_to_cpu(ags->hw_error_le), &cper_addr, &read_ack_register_addr);
+    get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
+                         &cper_addr, &read_ack_register_addr);
 
     cper_addr = le64_to_cpu(cper_addr);
     if (!cper_addr) {
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records
  2024-11-13  8:36 [PATCH 0/6] Change ghes driver to use HEST-based offsets Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2024-11-13  8:37 ` [PATCH 3/6] acpi/ghes: rename the function which gets hw error offsets Mauro Carvalho Chehab
@ 2024-11-13  8:37 ` Mauro Carvalho Chehab
  2024-11-20 14:59   ` Jonathan Cameron via
  2024-11-13  8:37 ` [PATCH 5/6] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
  2024-11-13  8:37 ` [PATCH 6/6] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
  5 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-13  8:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
	qemu-arm, qemu-devel

There are two pointers that are needed during error injection:

1. The start address of the CPER block to be stored;
2. The address of the ack, which needs a reset before next error.

Calculate them preferrable from the HEST table, as this allows
checking the source ID, the size of the table and the type of
HEST error block structures.

Yet, keep the old code, as this is needed for migration purposes.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 88 insertions(+), 10 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index c93bbaf1994a..9ee25efe8abf 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -61,6 +61,23 @@
  */
 #define ACPI_GHES_GESB_SIZE                 20
 
+/*
+ * Offsets with regards to the start of the HEST table stored at
+ * ags->hest_addr_le, according with the memory layout map at
+ * docs/specs/acpi_hest_ghes.rst.
+ */
+
+/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
+ * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
+ */
+#define HEST_GHES_V2_TABLE_SIZE  92
+#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)
+
+/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
+ * Table 18-380: 'Error Status Address' field
+ */
+#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)
+
 /*
  * Values for error_severity field
  */
@@ -218,14 +235,6 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
 {
     int i, error_status_block_offset;
 
-    /*
-     * TODO: Current version supports only one source.
-     * A further patch will drop this check, after adding a proper migration
-     * code, as, for the code to work, we need to store a bios pointer to the
-     * HEST table.
-     */
-    assert(num_sources == 1);
-
     /* Build error_block_address */
     for (i = 0; i < num_sources; i++) {
         build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
@@ -425,6 +434,70 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
     *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
+static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
+                                    uint64_t *cper_addr,
+                                    uint64_t *read_ack_start_addr,
+                                    Error **errp)
+{
+    uint64_t hest_err_block_addr, hest_read_ack_addr;
+    uint64_t err_source_struct, error_block_addr;
+    uint32_t num_sources, i;
+
+    if (!hest_addr) {
+        return;
+    }
+
+    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
+    num_sources = le32_to_cpu(num_sources);
+
+    err_source_struct = hest_addr + sizeof(num_sources);
+
+    /*
+     * Currently, HEST Error source navigates only for GHESv2 tables
+     */
+
+    for (i = 0; i < num_sources; i++) {
+        uint64_t addr = err_source_struct;
+        uint16_t type, src_id;
+
+        cpu_physical_memory_read(addr, &type, sizeof(type));
+        type = le16_to_cpu(type);
+
+        /* For now, we only know the size of GHESv2 table */
+        if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
+            error_setg(errp, "HEST: type %d not supported.", type);
+            return;
+        }
+
+        /* It is GHES. Compare CPER source address */
+        addr += sizeof(type);
+        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
+
+        if (src_id == source_id) {
+            break;
+        }
+
+        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
+    }
+    if (i == num_sources) {
+        error_setg(errp, "HEST: Source %d not found.", source_id);
+        return;
+    }
+
+    /* Navigate though table address pointers */
+    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
+    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;
+
+    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
+                             sizeof(error_block_addr));
+
+    cpu_physical_memory_read(error_block_addr, cper_addr,
+                             sizeof(*cper_addr));
+
+    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
+                             sizeof(*read_ack_start_addr));
+}
+
 void ghes_record_cper_errors(const void *cper, size_t len,
                              uint16_t source_id, Error **errp)
 {
@@ -445,8 +518,13 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     }
     ags = &acpi_ged_state->ghes_state;
 
-    get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
-                         &cper_addr, &read_ack_register_addr);
+    if (!ags->hest_addr_le) {
+        get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
+                             &cper_addr, &read_ack_register_addr);
+    } else {
+        get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
+                                &cper_addr, &read_ack_register_addr, errp);
+    }
 
     cper_addr = le64_to_cpu(cper_addr);
     if (!cper_addr) {
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/6] acpi/generic_event_device: Update GHES migration to cover hest addr
  2024-11-13  8:36 [PATCH 0/6] Change ghes driver to use HEST-based offsets Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2024-11-13  8:37 ` [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
@ 2024-11-13  8:37 ` Mauro Carvalho Chehab
  2024-11-20 15:01   ` Jonathan Cameron via
  2024-11-13  8:37 ` [PATCH 6/6] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
  5 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-13  8:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, linux-kernel, qemu-devel

The GHES migration logic at GED should now support HEST table
location too.

Increase migration version and change needed to check for both
ghes_addr_le and hest_addr_le.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/generic_event_device.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 17baf36132a8..c1116dd8d7ae 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -387,6 +387,34 @@ static const VMStateDescription vmstate_ghes_state = {
     }
 };
 
+static const VMStateDescription vmstate_hest = {
+    .name = "acpi-hest",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (const VMStateField[]) {
+        VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool hest_needed(void *opaque)
+{
+    AcpiGedState *s = opaque;
+    return s->ghes_state.hest_addr_le;
+}
+
+static const VMStateDescription vmstate_hest_state = {
+    .name = "acpi-ged/hest",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = hest_needed,
+    .fields = (const VMStateField[]) {
+        VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
+                       vmstate_hest, AcpiGhesState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_acpi_ged = {
     .name = "acpi-ged",
     .version_id = 1,
@@ -399,6 +427,7 @@ static const VMStateDescription vmstate_acpi_ged = {
         &vmstate_memhp_state,
         &vmstate_cpuhp_state,
         &vmstate_ghes_state,
+        &vmstate_hest_state,
         NULL
     }
 };
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 6/6] acpi/generic_event_device: add logic to detect if HEST addr is available
  2024-11-13  8:36 [PATCH 0/6] Change ghes driver to use HEST-based offsets Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2024-11-13  8:37 ` [PATCH 5/6] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
@ 2024-11-13  8:37 ` Mauro Carvalho Chehab
  2024-11-20 15:09   ` Jonathan Cameron via
  5 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-13  8:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Philippe Mathieu-Daudé, Ani Sinha,
	Dongjiu Geng, Eduardo Habkost, Marcel Apfelbaum, Peter Maydell,
	Shannon Zhao, Yanan Wang, Zhao Liu, linux-kernel, qemu-arm,
	qemu-devel

Create a new property (x-has-hest-addr) and use it to detect if
the GHES table offsets can be calculated from the HEST address
(qemu 9.2 and upper) or via the legacy way via an offset obtained
from the hardware_errors firmware file.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/generic_event_device.c |  1 +
 hw/acpi/ghes.c                 | 27 ++++++++++++++++++++-------
 hw/arm/virt-acpi-build.c       | 30 ++++++++++++++++++++++++++----
 hw/core/machine.c              |  2 ++
 include/hw/acpi/ghes.h         |  1 +
 5 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index c1116dd8d7ae..df6b4fab2d30 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
 
 static Property acpi_ged_properties[] = {
     DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
+    DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 9ee25efe8abf..2d34a6ddf133 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -365,6 +365,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
 {
     AcpiTable table = { .sig = "HEST", .rev = 1,
                         .oem_id = oem_id, .oem_table_id = oem_table_id };
+    AcpiGedState *acpi_ged_state;
+    AcpiGhesState *ags = NULL;
     int i;
 
     build_ghes_error_table(hardware_errors, linker, num_sources);
@@ -385,10 +387,19 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
      * tell firmware to write into GPA the address of HEST via fw_cfg,
      * once initialized.
      */
-    bios_linker_loader_write_pointer(linker,
-                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
-                                     sizeof(uint64_t),
-                                     ACPI_BUILD_TABLE_FILE, hest_offset);
+
+    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+                                                       NULL));
+    if (acpi_ged_state) {
+        ags = &acpi_ged_state->ghes_state;
+    }
+
+    if (ags->hest_lookup) {
+        bios_linker_loader_write_pointer(linker,
+                                         ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+                                         sizeof(uint64_t),
+                                         ACPI_BUILD_TABLE_FILE, hest_offset);
+    }
 }
 
 void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -402,8 +413,10 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
         NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
 
-    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
-        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+    if (ags && ags->hest_lookup) {
+        fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+            NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+    }
 
     ags->present = true;
 }
@@ -518,7 +531,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     }
     ags = &acpi_ged_state->ghes_state;
 
-    if (!ags->hest_addr_le) {
+    if (!ags->hest_lookup) {
         get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
                              &cper_addr, &read_ack_register_addr);
     } else {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 40f66792570c..930ba9e0a14c 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -893,6 +893,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = {
     {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
 };
 
+static const AcpiNotificationSourceId hest_ghes_notify_9_1[] = {
+    {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
+};
+
 static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
@@ -946,10 +950,28 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     build_dbg2(tables_blob, tables->linker, vms);
 
     if (vms->ras) {
-        acpi_add_table(table_offsets, tables_blob);
-        acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
-                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
-                        vms->oem_id, vms->oem_table_id);
+        AcpiGhesState *ags;
+        AcpiGedState *acpi_ged_state;
+
+        acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
+                                                       NULL));
+        if (acpi_ged_state) {
+            ags = &acpi_ged_state->ghes_state;
+
+            acpi_add_table(table_offsets, tables_blob);
+
+            if (!ags->hest_lookup) {
+                acpi_build_hest(tables_blob, tables->hardware_errors,
+                                tables->linker, hest_ghes_notify_9_1,
+                                ARRAY_SIZE(hest_ghes_notify_9_1),
+                                vms->oem_id, vms->oem_table_id);
+            } else {
+                acpi_build_hest(tables_blob, tables->hardware_errors,
+                                tables->linker, hest_ghes_notify,
+                                ARRAY_SIZE(hest_ghes_notify),
+                                vms->oem_id, vms->oem_table_id);
+            }
+        }
     }
 
     if (ms->numa_state->num_nodes > 0) {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index a35c4a8faecb..00521a1963ba 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -34,10 +34,12 @@
 #include "hw/virtio/virtio-pci.h"
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/acpi/generic_event_device.h"
 #include "audio/audio.h"
 
 GlobalProperty hw_compat_9_1[] = {
     { TYPE_PCI_DEVICE, "x-pcie-ext-tag", "false" },
+    { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
 };
 const size_t hw_compat_9_1_len = G_N_ELEMENTS(hw_compat_9_1);
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index a07c30ef13b7..040d6ee366b2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -61,6 +61,7 @@ typedef struct AcpiGhesState {
     uint64_t hest_addr_le;
     uint64_t hw_error_le;
     bool present; /* True if GHES is present at all on this board */
+    bool hest_lookup; /* True if HEST address is present */
 } AcpiGhesState;
 
 /*
-- 
2.47.0



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6] acpi/ghes: Prepare to support multiple sources on ghes
  2024-11-13  8:36 ` [PATCH 1/6] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
@ 2024-11-20 14:29   ` Jonathan Cameron via
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron via @ 2024-11-20 14:29 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm,
	qemu-devel

On Wed, 13 Nov 2024 09:36:58 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The current code is actually dependent on having just one
> error structure with a single source.

Trivial: Very short line wrap of the description. Maybe more
than 60 chars?

> 
> As the number of sources should be arch-dependent, as it
> will depend on what kind of synchronous/assynchronous
> notifications will exist, change the logic to dynamically
> build the table.
> 
> Yet, for a proper support, we need to get the number of
> sources by reading the number from the HEST table. However,
> bios currently doesn't store a pointer to it.
> 
> For now just change the logic at table build time, while
> enforcing that it will behave like before with a single
> source ID.
> 
> A future patch will add a HEST table bios pointer and
> change the logic at acpi_ghes_record_errors() to
> dynamically use the new size.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Took a fresh look. One trivial comment.


> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index e059317b002e..40f66792570c 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -889,6 +889,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
>      g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
>  }
>  
> +static const AcpiNotificationSourceId hest_ghes_notify[] = {
> +    {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},

    { ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA },
seems to be local style.

> +};
> +
>  static
>  void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>  {
> @@ -944,6 +948,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
>      if (vms->ras) {
>          acpi_add_table(table_offsets, tables_blob);
>          acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> +                        hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
>                          vms->oem_id, vms->oem_table_id);
>      }



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6] acpi/ghes: add a firmware file with HEST address
  2024-11-13  8:36 ` [PATCH 2/6] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2024-11-20 14:32   ` Jonathan Cameron via
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron via @ 2024-11-20 14:32 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Wed, 13 Nov 2024 09:36:59 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Store HEST table address at GPA, placing its content at
> hest_addr_le variable.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> 
> ---
> 
> Change from v8:
> - hest_addr_lr is now pointing to the error source size and data.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Drop the extra SoB. Doesn't matter really but looks strange in the
patch.

One trivial inline.

This stuff always gives me a headache, so with that in mind, looks
fine to me.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---
>  hw/acpi/ghes.c         | 15 +++++++++++++++
>  include/hw/acpi/ghes.h |  1 +
>  2 files changed, 16 insertions(+)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index a590b0f6f85f..4cd79d42cd04 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -30,6 +30,7 @@
>  
>  #define ACPI_HW_ERROR_FW_CFG_FILE           "etc/hardware_errors"
>  #define ACPI_HW_ERROR_ADDR_FW_CFG_FILE      "etc/hardware_errors_addr"
> +#define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
>  
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> @@ -361,6 +362,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>  
>      acpi_table_begin(&table, table_data);
>  
> +    int hest_offset = table_data->len;
> +
>      /* Error Source Count */
>      build_append_int_noprefix(table_data, num_sources, 4);
>      for (i = 0; i < num_sources; i++) {
> @@ -368,6 +371,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>      }
>  
>      acpi_table_end(linker, &table);
> +
> +    /*
> +     * tell firmware to write into GPA the address of HEST via fw_cfg,
Tell

File is inconsistent but mostly uses a capital to start.
> +     * once initialized.
> +     */
> +    bios_linker_loader_write_pointer(linker,
> +                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> +                                     sizeof(uint64_t),
> +                                     ACPI_BUILD_TABLE_FILE, hest_offset);
>  }
esState;



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] acpi/ghes: rename the function which gets hw error offsets
  2024-11-13  8:37 ` [PATCH 3/6] acpi/ghes: rename the function which gets hw error offsets Mauro Carvalho Chehab
@ 2024-11-20 14:33   ` Jonathan Cameron via
  2024-11-22  9:32     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron via @ 2024-11-20 14:33 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Wed, 13 Nov 2024 09:37:00 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Rename: get_ghes_offsets->get_hw_error_offsets
> to make clear that this function return offsets based on the
> hardware error firmware.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records
  2024-11-13  8:37 ` [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
@ 2024-11-20 14:59   ` Jonathan Cameron via
  2024-11-22 10:37     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron via @ 2024-11-20 14:59 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Wed, 13 Nov 2024 09:37:01 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> There are two pointers that are needed during error injection:
> 
> 1. The start address of the CPER block to be stored;
> 2. The address of the ack, which needs a reset before next error.
> 
> Calculate them preferrable from the HEST table, as this allows
> checking the source ID, the size of the table and the type of
> HEST error block structures.

It is preferable to calculate them from the HEST table.  This allows
checking the source ID, the size of the table and the type of the
HEST error block structures.

A few comments inline.

Jonathan


> 
> Yet, keep the old code, as this is needed for migration purposes.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 88 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index c93bbaf1994a..9ee25efe8abf 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -61,6 +61,23 @@
>   */
>  #define ACPI_GHES_GESB_SIZE                 20
>  
> +/*
> + * Offsets with regards to the start of the HEST table stored at
> + * ags->hest_addr_le, according with the memory layout map at
> + * docs/specs/acpi_hest_ghes.rst.
> + */
> +
/*
 * ACPI 6.2:

to be consistent with local style.

> +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> + */
> +#define HEST_GHES_V2_TABLE_SIZE  92
> +#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)
> +
/*
 * ACPI 6.2: 
> +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
> + * Table 18-380: 'Error Status Address' field
> + */

> +static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
> +                                    uint64_t *cper_addr,
> +                                    uint64_t *read_ack_start_addr,
> +                                    Error **errp)
> +{
> +    uint64_t hest_err_block_addr, hest_read_ack_addr;
> +    uint64_t err_source_struct, error_block_addr;
> +    uint32_t num_sources, i;
> +
> +    if (!hest_addr) {

Trivial but I wonder if this should be named to indicate that it sin't the start
of HEST but the first bit of the header.
hest_body_address or something like that maybe?  I don't care that much
though if you prefer to keep as is.


> +        return;
> +    }
> +
> +    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
> +    num_sources = le32_to_cpu(num_sources);
> +
> +    err_source_struct = hest_addr + sizeof(num_sources);
> +
> +    /*
> +     * Currently, HEST Error source navigates only for GHESv2 tables
> +     */

Feels like duplication of the comment below where the type check is.
Maybe drop this one?

> +
> +    for (i = 0; i < num_sources; i++) {
> +        uint64_t addr = err_source_struct;
> +        uint16_t type, src_id;
> +
> +        cpu_physical_memory_read(addr, &type, sizeof(type));
> +        type = le16_to_cpu(type);
> +
> +        /* For now, we only know the size of GHESv2 table */
> +        if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
> +            error_setg(errp, "HEST: type %d not supported.", type);
> +            return;
> +        }
> +
> +        /* It is GHES. Compare CPER source address */

It's GHESv2 (of course this bit is the same, but none the less comment
is misleading). I'd just go with
        /* Compare CPER source address */

> +        addr += sizeof(type);
> +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> +
> +        if (src_id == source_id) {
> +            break;
> +        }
> +
> +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> +    }
> +    if (i == num_sources) {
> +        error_setg(errp, "HEST: Source %d not found.", source_id);
> +        return;
> +    }
> +
> +    /* Navigate though table address pointers */
> +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> +    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;

> +
> +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> +                             sizeof(error_block_addr));
So this points to a registers
> +
> +    cpu_physical_memory_read(error_block_addr, cper_addr,
> +                             sizeof(*cper_addr));
and this reads the register. I'm not sure the spec defines the
contents of that register to be constant.  Maybe we should avoid
reading the register here and do it instead at read of the record?
I 'think' you could in theory use different storage for the CPER
depending on other unhandled errors or whatever else meant you didn't
want a fixed location.

Or maybe just add a comment to say that the location of CPER storage
is fixed.

> +
> +    cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
> +                             sizeof(*read_ack_start_addr));
> +}



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] acpi/generic_event_device: Update GHES migration to cover hest addr
  2024-11-13  8:37 ` [PATCH 5/6] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
@ 2024-11-20 15:01   ` Jonathan Cameron via
  2024-11-22 10:40     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron via @ 2024-11-20 15:01 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	linux-kernel, qemu-devel

On Wed, 13 Nov 2024 09:37:02 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The GHES migration logic at GED should now support HEST table
> location too.
> 
> Increase migration version and change needed to check for both
> ghes_addr_le and hest_addr_le.

Where is the migration version increased?  Maybe I'm misunderstanding
the comment.

> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/generic_event_device.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 17baf36132a8..c1116dd8d7ae 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -387,6 +387,34 @@ static const VMStateDescription vmstate_ghes_state = {
>      }
>  };
>  
> +static const VMStateDescription vmstate_hest = {
> +    .name = "acpi-hest",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool hest_needed(void *opaque)
> +{
> +    AcpiGedState *s = opaque;
> +    return s->ghes_state.hest_addr_le;
> +}
> +
> +static const VMStateDescription vmstate_hest_state = {
> +    .name = "acpi-ged/hest",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = hest_needed,
> +    .fields = (const VMStateField[]) {
> +        VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> +                       vmstate_hest, AcpiGhesState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_acpi_ged = {
>      .name = "acpi-ged",
>      .version_id = 1,
> @@ -399,6 +427,7 @@ static const VMStateDescription vmstate_acpi_ged = {
>          &vmstate_memhp_state,
>          &vmstate_cpuhp_state,
>          &vmstate_ghes_state,
> +        &vmstate_hest_state,
>          NULL
>      }
>  };



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6] acpi/generic_event_device: add logic to detect if HEST addr is available
  2024-11-13  8:37 ` [PATCH 6/6] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
@ 2024-11-20 15:09   ` Jonathan Cameron via
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron via @ 2024-11-20 15:09 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Dongjiu Geng,
	Eduardo Habkost, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, Zhao Liu, linux-kernel, qemu-arm, qemu-devel

On Wed, 13 Nov 2024 09:37:03 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Create a new property (x-has-hest-addr) and use it to detect if
> the GHES table offsets can be calculated from the HEST address
> (qemu 9.2 and upper) or via the legacy way via an offset obtained
> from the hardware_errors firmware file.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Comments inline.

Nice work hammering all this into shape.

Thanks,

Jonathan

> ---
>  hw/acpi/generic_event_device.c |  1 +
>  hw/acpi/ghes.c                 | 27 ++++++++++++++++++++-------
>  hw/arm/virt-acpi-build.c       | 30 ++++++++++++++++++++++++++----
>  hw/core/machine.c              |  2 ++
>  include/hw/acpi/ghes.h         |  1 +
>  5 files changed, 50 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index c1116dd8d7ae..df6b4fab2d30 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -318,6 +318,7 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
>  
>  static Property acpi_ged_properties[] = {
>      DEFINE_PROP_UINT32("ged-event", AcpiGedState, ged_event_bitmap, 0),
> +    DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 9ee25efe8abf..2d34a6ddf133 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -365,6 +365,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>  {
>      AcpiTable table = { .sig = "HEST", .rev = 1,
>                          .oem_id = oem_id, .oem_table_id = oem_table_id };
> +    AcpiGedState *acpi_ged_state;
> +    AcpiGhesState *ags = NULL;
>      int i;
>  
>      build_ghes_error_table(hardware_errors, linker, num_sources);
> @@ -385,10 +387,19 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>       * tell firmware to write into GPA the address of HEST via fw_cfg,
>       * once initialized.
>       */
> -    bios_linker_loader_write_pointer(linker,
> -                                     ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> -                                     sizeof(uint64_t),
> -                                     ACPI_BUILD_TABLE_FILE, hest_offset);
> +
> +    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> +                                                       NULL));
> +    if (acpi_ged_state) {

Won't fail, but if it did (And given you have a check you are assuming
it might).

> +        ags = &acpi_ged_state->ghes_state;
> +    }
> +
Then ags is NULL and boom.

> +    if (ags->hest_lookup) {
> +        bios_linker_loader_write_pointer(linker,
> +                                         ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> +                                         sizeof(uint64_t),
> +                                         ACPI_BUILD_TABLE_FILE, hest_offset);
> +    }
>  }

> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 40f66792570c..930ba9e0a14c 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -893,6 +893,10 @@ static const AcpiNotificationSourceId hest_ghes_notify[] = {
>      {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
>  };
>  
> +static const AcpiNotificationSourceId hest_ghes_notify_9_1[] = {
> +    {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
{ ACPI...
> +};



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/6] acpi/ghes: rename the function which gets hw error offsets
  2024-11-20 14:33   ` Jonathan Cameron via
@ 2024-11-22  9:32     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22  9:32 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

Em Wed, 20 Nov 2024 14:33:08 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:

> On Wed, 13 Nov 2024 09:37:00 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Rename: get_ghes_offsets->get_hw_error_offsets
> > to make clear that this function return offsets based on the
> > hardware error firmware.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

As there were changes at the cleanup series, I ended merging this one
there on patch 13/15:

	https://lore.kernel.org/qemu-devel/e5661a6383449675b28e15c8479ebca42c939368.1732266152.git.mchehab+huawei@kernel.org/T/#u
	
Thanks,
Mauro


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records
  2024-11-20 14:59   ` Jonathan Cameron via
@ 2024-11-22 10:37     ` Mauro Carvalho Chehab
  2024-11-25 11:31       ` Jonathan Cameron via
  0 siblings, 1 reply; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 10:37 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

Em Wed, 20 Nov 2024 14:59:30 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:

> On Wed, 13 Nov 2024 09:37:01 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > There are two pointers that are needed during error injection:
> > 
> > 1. The start address of the CPER block to be stored;
> > 2. The address of the ack, which needs a reset before next error.
> > 
> > Calculate them preferrable from the HEST table, as this allows
> > checking the source ID, the size of the table and the type of
> > HEST error block structures.  
> 
> It is preferable to calculate them from the HEST table.  This allows
> checking the source ID, the size of the table and the type of the
> HEST error block structures.
> 
> A few comments inline.
> 
> Jonathan
> 
> 
> > 
> > Yet, keep the old code, as this is needed for migration purposes.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 88 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index c93bbaf1994a..9ee25efe8abf 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -61,6 +61,23 @@
> >   */
> >  #define ACPI_GHES_GESB_SIZE                 20
> >  
> > +/*
> > + * Offsets with regards to the start of the HEST table stored at
> > + * ags->hest_addr_le, according with the memory layout map at
> > + * docs/specs/acpi_hest_ghes.rst.
> > + */
> > +  
> /*
>  * ACPI 6.2:
> 
> to be consistent with local style.

Actually, the local style inside this file was preserved. See, before
this series we have:

$ git grep "ACPI " hw/acpi/ghes.c
hw/acpi/ghes.c: * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
hw/acpi/ghes.c: * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
hw/acpi/ghes.c: * ACPI 4.0: 17.3.2.7 Hardware Error Notification
hw/acpi/ghes.c: * ACPI 6.1: 18.3.2.7.1 Generic Error Data
hw/acpi/ghes.c: * ACPI 6.1: 18.3.2.7.1 Generic Error Data
hw/acpi/ghes.c:    /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
hw/acpi/ghes.c:         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
hw/acpi/ghes.c:     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source

> 
> > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> > + */
> > +#define HEST_GHES_V2_TABLE_SIZE  92
> > +#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)
> > +  
> /*
>  * ACPI 6.2: 
> > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
> > + * Table 18-380: 'Error Status Address' field
> > + */  
> 
> > +static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
> > +                                    uint64_t *cper_addr,
> > +                                    uint64_t *read_ack_start_addr,
> > +                                    Error **errp)
> > +{
> > +    uint64_t hest_err_block_addr, hest_read_ack_addr;
> > +    uint64_t err_source_struct, error_block_addr;
> > +    uint32_t num_sources, i;
> > +
> > +    if (!hest_addr) {  
> 
> Trivial but I wonder if this should be named to indicate that it sin't the start
> of HEST but the first bit of the header.
> hest_body_address or something like that maybe?  I don't care that much
> though if you prefer to keep as is.

I prefer to keep a simple name here, so let's keep as is.

> 
> 
> > +        return;
> > +    }
> > +
> > +    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
> > +    num_sources = le32_to_cpu(num_sources);
> > +
> > +    err_source_struct = hest_addr + sizeof(num_sources);
> > +
> > +    /*
> > +     * Currently, HEST Error source navigates only for GHESv2 tables
> > +     */  
> 
> Feels like duplication of the comment below where the type check is.
> Maybe drop this one?

If I recall correctly [1], Igor asked to place such comment, on one of
the past versions of this code.

IMO, placing it clearly there helps to identify what needs to change when
support for non-GHES tables is needed.

[1] this is the 12th version of this code - my memory is starting to fail
    to remind were exactly each change was requested.
> 
> > +
> > +    for (i = 0; i < num_sources; i++) {
> > +        uint64_t addr = err_source_struct;
> > +        uint16_t type, src_id;
> > +
> > +        cpu_physical_memory_read(addr, &type, sizeof(type));
> > +        type = le16_to_cpu(type);
> > +
> > +        /* For now, we only know the size of GHESv2 table */
> > +        if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
> > +            error_setg(errp, "HEST: type %d not supported.", type);
> > +            return;
> > +        }
> > +
> > +        /* It is GHES. Compare CPER source address */  
> 
> It's GHESv2 (of course this bit is the same, but none the less comment
> is misleading). I'd just go with
>         /* Compare CPER source address */

I changed it to:

        /* Compare CPER source address at the GHESv2 structure */

to better state what is there. IMO, it is important to let it
clear, as, with:
ACPI 6.5: 18.3.2.11. Error Source Structure Header (Type 12 Onward)

what happens if that, if type <= 11, the struct is:

	offset 0:	type
	offset 2:	source ID

When type >= 12, the structure changes to:

	offset 0:	type
	offset 2:	Error Source Structure Length

As ACPI 6.5 doesn't define type 12, we don't know yet where source ID
will be placed.

> 
> > +        addr += sizeof(type);
> > +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> > +
> > +        if (src_id == source_id) {
> > +            break;
> > +        }
> > +
> > +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > +    }
> > +    if (i == num_sources) {
> > +        error_setg(errp, "HEST: Source %d not found.", source_id);
> > +        return;
> > +    }
> > +
> > +    /* Navigate though table address pointers */
> > +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> > +    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;  
> 
> > +
> > +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > +                             sizeof(error_block_addr));  
> So this points to a registers
> > +
> > +    cpu_physical_memory_read(error_block_addr, cper_addr,
> > +                             sizeof(*cper_addr));  
> and this reads the register. I'm not sure the spec defines the
> contents of that register to be constant.  Maybe we should avoid
> reading the register here and do it instead at read of the record?
> I 'think' you could in theory use different storage for the CPER
> depending on other unhandled errors or whatever else meant you didn't
> want a fixed location.
> 
> Or maybe just add a comment to say that the location of CPER storage
> is fixed.

Sorry, but I missed your point. The actual offset of the error block 
address is defined when fw_cfg callback is called. When this happens,
this function is called:

	void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
	                          GArray *hardware_error)
	{
	    /* Create a read-only fw_cfg file for GHES */
	    fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
        	            hardware_error->len);

	    /* Create a read-write fw_cfg file for Address */
	    fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
	        NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);

	    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
	        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);

	    ags->present = true;
	}

The other calls for fw_cfg functions ensure the offset of the memory read 
inside the hardware_error firmware and this never changes, as such offsets
are defined when the hardware_firmware is built at build_ghes_error_table()
function. This will only change if such function is called again, which
would, in turn, make acpi_ghes_add_fw_cfg() be called again.

In any case, no matter if build_ghes_error_table()/acpi_ghes_add_fw_cfg()
is called only once or multiple times [1], at the time 
get_ghes_source_offsets() is called, such offsets are stable.

[1] On some tests I did adding printks, the GHES build logic and the callbacks
    are called twice - both before loading OSPM.

    If migration is used, I suspect that it will be called again during
    migration but before starting OSPM. Again, when get_ghes_source_offsets()
    is called, the offsets are fixed.

Thanks,
Mauro


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6] acpi/generic_event_device: Update GHES migration to cover hest addr
  2024-11-20 15:01   ` Jonathan Cameron via
@ 2024-11-22 10:40     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 17+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 10:40 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	linux-kernel, qemu-devel

Em Wed, 20 Nov 2024 15:01:19 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:

> On Wed, 13 Nov 2024 09:37:02 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > The GHES migration logic at GED should now support HEST table
> > location too.
> > 
> > Increase migration version and change needed to check for both
> > ghes_addr_le and hest_addr_le.  
> 
> Where is the migration version increased?  Maybe I'm misunderstanding
> the comment.

Legacy comment. We dropped migration version increase, as this is not
needed anymore.

Comment dropped.

> 
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  hw/acpi/generic_event_device.c | 29 +++++++++++++++++++++++++++++
> >  1 file changed, 29 insertions(+)
> > 
> > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > index 17baf36132a8..c1116dd8d7ae 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -387,6 +387,34 @@ static const VMStateDescription vmstate_ghes_state = {
> >      }
> >  };
> >  
> > +static const VMStateDescription vmstate_hest = {
> > +    .name = "acpi-hest",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .fields = (const VMStateField[]) {
> > +        VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> > +static bool hest_needed(void *opaque)
> > +{
> > +    AcpiGedState *s = opaque;
> > +    return s->ghes_state.hest_addr_le;
> > +}
> > +
> > +static const VMStateDescription vmstate_hest_state = {
> > +    .name = "acpi-ged/hest",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .needed = hest_needed,
> > +    .fields = (const VMStateField[]) {
> > +        VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> > +                       vmstate_hest, AcpiGhesState),
> > +        VMSTATE_END_OF_LIST()
> > +    }
> > +};
> > +
> >  static const VMStateDescription vmstate_acpi_ged = {
> >      .name = "acpi-ged",
> >      .version_id = 1,
> > @@ -399,6 +427,7 @@ static const VMStateDescription vmstate_acpi_ged = {
> >          &vmstate_memhp_state,
> >          &vmstate_cpuhp_state,
> >          &vmstate_ghes_state,
> > +        &vmstate_hest_state,
> >          NULL
> >      }
> >  };  
> 



Thanks,
Mauro


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records
  2024-11-22 10:37     ` Mauro Carvalho Chehab
@ 2024-11-25 11:31       ` Jonathan Cameron via
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron via @ 2024-11-25 11:31 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel


> >   
> > > 
> > > Yet, keep the old code, as this is needed for migration purposes.
> > > 
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > ---
> > >  hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
> > >  1 file changed, 88 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index c93bbaf1994a..9ee25efe8abf 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -61,6 +61,23 @@
> > >   */
> > >  #define ACPI_GHES_GESB_SIZE                 20
> > >  
> > > +/*
> > > + * Offsets with regards to the start of the HEST table stored at
> > > + * ags->hest_addr_le, according with the memory layout map at
> > > + * docs/specs/acpi_hest_ghes.rst.
> > > + */
> > > +    
> > /*
> >  * ACPI 6.2:
> > 
> > to be consistent with local style.  
> 
> Actually, the local style inside this file was preserved. See, before
> this series we have:
Ah. That's me being lazy and unclear :(
What I meant was

/*
 * ACPI 6.2:  18.3.2.8 Generic Hardware Error Source version 2

So open the comment with a blank line.
 
> 
> $ git grep "ACPI " hw/acpi/ghes.c
> hw/acpi/ghes.c: * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> hw/acpi/ghes.c: * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> hw/acpi/ghes.c: * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> hw/acpi/ghes.c: * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> hw/acpi/ghes.c: * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> hw/acpi/ghes.c:    /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> hw/acpi/ghes.c:         * ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> hw/acpi/ghes.c:     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> 
> >   

> > 
> >   
> > > +        return;
> > > +    }
> > > +
> > > +    cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
> > > +    num_sources = le32_to_cpu(num_sources);
> > > +
> > > +    err_source_struct = hest_addr + sizeof(num_sources);
> > > +
> > > +    /*
> > > +     * Currently, HEST Error source navigates only for GHESv2 tables
> > > +     */    
> > 
> > Feels like duplication of the comment below where the type check is.
> > Maybe drop this one?  
> 
> If I recall correctly [1], Igor asked to place such comment, on one of
> the past versions of this code.
> 
> IMO, placing it clearly there helps to identify what needs to change when
> support for non-GHES tables is needed.
> 
> [1] this is the 12th version of this code - my memory is starting to fail
>     to remind were exactly each change was requested.
Me too! :)


> >   
> > > +        addr += sizeof(type);
> > > +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> > > +
> > > +        if (src_id == source_id) {
> > > +            break;
> > > +        }
> > > +
> > > +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > > +    }
> > > +    if (i == num_sources) {
> > > +        error_setg(errp, "HEST: Source %d not found.", source_id);
> > > +        return;
> > > +    }
> > > +
> > > +    /* Navigate though table address pointers */
> > > +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> > > +    hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;    
> >   
> > > +
> > > +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > > +                             sizeof(error_block_addr));    
> > So this points to a registers  
> > > +
> > > +    cpu_physical_memory_read(error_block_addr, cper_addr,
> > > +                             sizeof(*cper_addr));    
> > and this reads the register. I'm not sure the spec defines the
> > contents of that register to be constant.  Maybe we should avoid
> > reading the register here and do it instead at read of the record?
> > I 'think' you could in theory use different storage for the CPER
> > depending on other unhandled errors or whatever else meant you didn't
> > want a fixed location.
> > 
> > Or maybe just add a comment to say that the location of CPER storage
> > is fixed.  
> 
> Sorry, but I missed your point. The actual offset of the error block 
> address is defined when fw_cfg callback is called. When this happens,
> this function is called:
> 
> 	void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> 	                          GArray *hardware_error)
> 	{
> 	    /* Create a read-only fw_cfg file for GHES */
> 	    fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
>         	            hardware_error->len);
> 
> 	    /* Create a read-write fw_cfg file for Address */
> 	    fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
> 	        NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
> 
> 	    fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> 	        NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> 
> 	    ags->present = true;
> 	}

My question was intended to be a little different but was based on a misread on my part.
I'd failed to figure out this function is called on each addition of an error
not just once.

Code is fine as is.

Jonathan


> 
> The other calls for fw_cfg functions ensure the offset of the memory read 
> inside the hardware_error firmware and this never changes, as such offsets
> are defined when the hardware_firmware is built at build_ghes_error_table()
> function. This will only change if such function is called again, which
> would, in turn, make acpi_ghes_add_fw_cfg() be called again.
> 
> In any case, no matter if build_ghes_error_table()/acpi_ghes_add_fw_cfg()
> is called only once or multiple times [1], at the time 
> get_ghes_source_offsets() is called, such offsets are stable.
> 
> [1] On some tests I did adding printks, the GHES build logic and the callbacks
>     are called twice - both before loading OSPM.
> 
>     If migration is used, I suspect that it will be called again during
>     migration but before starting OSPM. Again, when get_ghes_source_offsets()
>     is called, the offsets are fixed.
> 
> Thanks,
> Mauro



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-11-25 11:32 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-13  8:36 [PATCH 0/6] Change ghes driver to use HEST-based offsets Mauro Carvalho Chehab
2024-11-13  8:36 ` [PATCH 1/6] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
2024-11-20 14:29   ` Jonathan Cameron via
2024-11-13  8:36 ` [PATCH 2/6] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-11-20 14:32   ` Jonathan Cameron via
2024-11-13  8:37 ` [PATCH 3/6] acpi/ghes: rename the function which gets hw error offsets Mauro Carvalho Chehab
2024-11-20 14:33   ` Jonathan Cameron via
2024-11-22  9:32     ` Mauro Carvalho Chehab
2024-11-13  8:37 ` [PATCH 4/6] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
2024-11-20 14:59   ` Jonathan Cameron via
2024-11-22 10:37     ` Mauro Carvalho Chehab
2024-11-25 11:31       ` Jonathan Cameron via
2024-11-13  8:37 ` [PATCH 5/6] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
2024-11-20 15:01   ` Jonathan Cameron via
2024-11-22 10:40     ` Mauro Carvalho Chehab
2024-11-13  8:37 ` [PATCH 6/6] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
2024-11-20 15:09   ` Jonathan Cameron via

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).