* [PATCH v4 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
` (13 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
This is just duplicating ACPI_GHES_ERROR_SOURCE_COUNT, which
has a better name. So, drop the duplication.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/acpi/ghes.c | 7 ++-----
include/hw/acpi/ghes.h | 3 ++-
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..dc217694deb9 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -34,9 +34,6 @@
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT 1
-
/* Generic Hardware Error Source version 2 */
#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
@@ -396,7 +393,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
- assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
+ assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
@@ -407,7 +404,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
if (physical_address) {
- if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
+ if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
start_addr += source_id * sizeof(uint64_t);
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..59e3b8fb24b9 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -59,7 +59,8 @@ enum AcpiGhesNotifyType {
enum {
ACPI_HEST_SRC_ID_SEA = 0,
/* future ids go here */
- ACPI_HEST_SRC_ID_RESERVED,
+
+ ACPI_GHES_ERROR_SOURCE_COUNT
};
typedef struct AcpiGhesState {
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 03/15] acpi/ghes: simplify the per-arch caller to build HEST table Mauro Carvalho Chehab
` (12 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Reduce the ident of the function and prepares it for
the next changes.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/acpi/ghes.c | 56 ++++++++++++++++++++++++++------------------------
1 file changed, 29 insertions(+), 27 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index dc217694deb9..e66f3be1502b 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -402,40 +402,42 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
start_addr = le64_to_cpu(ags->ghes_addr_le);
- if (physical_address) {
+ if (!physical_address) {
+ return -1;
+ }
- if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
- start_addr += source_id * sizeof(uint64_t);
- }
+ if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
+ start_addr += source_id * sizeof(uint64_t);
+ }
- cpu_physical_memory_read(start_addr, &error_block_addr,
- sizeof(error_block_addr));
+ cpu_physical_memory_read(start_addr, &error_block_addr,
+ sizeof(error_block_addr));
- error_block_addr = le64_to_cpu(error_block_addr);
+ error_block_addr = le64_to_cpu(error_block_addr);
- read_ack_register_addr = start_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
+ read_ack_register_addr = start_addr +
+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
- cpu_physical_memory_read(read_ack_register_addr,
- &read_ack_register, sizeof(read_ack_register));
+ cpu_physical_memory_read(read_ack_register_addr,
+ &read_ack_register, sizeof(read_ack_register));
- /* zero means OSPM does not acknowledge the error */
- if (!read_ack_register) {
- error_report("OSPM does not acknowledge previous error,"
- " so can not record CPER for current error anymore");
- } else if (error_block_addr) {
- read_ack_register = cpu_to_le64(0);
- /*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
- */
- cpu_physical_memory_write(read_ack_register_addr,
- &read_ack_register, sizeof(uint64_t));
+ /* zero means OSPM does not acknowledge the error */
+ if (!read_ack_register) {
+ error_report("OSPM does not acknowledge previous error,"
+ " so can not record CPER for current error anymore");
+ } else if (error_block_addr) {
+ read_ack_register = cpu_to_le64(0);
+ /*
+ * Clear the Read Ack Register, OSPM will write it to 1 when
+ * it acknowledges this error.
+ */
+ cpu_physical_memory_write(read_ack_register_addr,
+ &read_ack_register, sizeof(uint64_t));
- ret = acpi_ghes_record_mem_error(error_block_addr,
- physical_address);
- } else
- error_report("can not find Generic Error Status Block");
+ ret = acpi_ghes_record_mem_error(error_block_addr,
+ physical_address);
+ } else {
+ error_report("can not find Generic Error Status Block");
}
return ret;
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 03/15] acpi/ghes: simplify the per-arch caller to build HEST table
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 04/15] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
` (11 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
The GHES driver requires not only a HEST table, but also a
separate firmware file to store Error Structure records.
It can't do one without the other.
Simplify the caller logic for it to require one function.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
Changes from v10:
- Removed the logic which associates notification and source
ID. This will be placed on a separate patch.
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 7 +++++--
hw/arm/virt-acpi-build.c | 5 ++---
include/hw/acpi/ghes.h | 4 ++--
3 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e66f3be1502b..4a6c45bcb4be 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -233,7 +233,7 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
* Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
* See docs/specs/acpi_hest_ghes.rst for blobs format.
*/
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
{
int i, error_status_block_offset;
@@ -356,12 +356,15 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
}
/* Build Hardware Error Source Table */
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
const char *oem_id, const char *oem_table_id)
{
AcpiTable table = { .sig = "HEST", .rev = 1,
.oem_id = oem_id, .oem_table_id = oem_table_id };
+ build_ghes_error_table(hardware_errors, linker);
+
acpi_table_begin(&table, table_data);
/* Error Source Count */
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 620992c92c12..e059317b002e 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -942,10 +942,9 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
build_dbg2(tables_blob, tables->linker, vms);
if (vms->ras) {
- build_ghes_error_table(tables->hardware_errors, tables->linker);
acpi_add_table(table_offsets, tables_blob);
- acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
- vms->oem_table_id);
+ acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+ vms->oem_id, vms->oem_table_id);
}
if (ms->numa_state->num_nodes > 0) {
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 59e3b8fb24b9..20016c226d1f 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -68,8 +68,8 @@ typedef struct AcpiGhesState {
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 04/15] acpi/ghes: better handle source_id and notification
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (2 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 03/15] acpi/ghes: simplify the per-arch caller to build HEST table Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 16:20 ` Igor Mammedov
2024-11-22 9:11 ` [PATCH v4 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument Mauro Carvalho Chehab
` (10 subsequent siblings)
14 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
GHES has two fields that are stored on HEST error source
blocks associated with notifications:
- notification type, which is a number defined at the ACPI spec
containing several arch-specific synchronous and assynchronous
types;
- source id, which is a HW/FW defined number, used to distinguish
between different implemented sources.
There could be several sources with the same notification type,
which is dependent of the way each architecture maps notifications.
Right now, build_ghes_v2() hardcodes a 1:1 mapping between such
fields. Move it to two independent parameters, allowing the
caller function to fill both.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
Chenges from v10:
- Some changes got moved to the previous patch.
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 4a6c45bcb4be..29cd7e4d8171 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -284,9 +284,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
-static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
+static void build_ghes_v2(GArray *table_data,
+ BIOSLinker *linker,
+ enum AcpiGhesNotifyType notify,
+ uint16_t source_id)
{
uint64_t address_offset;
+
/*
* Type:
* Generic Hardware Error Source version 2(GHESv2 - Type 10)
@@ -316,18 +320,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
- switch (source_id) {
- case ACPI_HEST_SRC_ID_SEA:
- /*
- * Notification Structure
- * Now only enable ARMv8 SEA notification type
- */
- build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
- break;
- default:
- error_report("Not support this error source");
- abort();
- }
+ /* Notification Structure */
+ build_ghes_hw_error_notification(table_data, notify);
/* Error Status Block Length */
build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
@@ -369,7 +363,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
/* Error Source Count */
build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
- build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+ build_ghes_v2(table_data, linker,
+ ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
acpi_table_end(linker, &table);
}
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 04/15] acpi/ghes: better handle source_id and notification
2024-11-22 9:11 ` [PATCH v4 04/15] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
@ 2024-11-22 16:20 ` Igor Mammedov
0 siblings, 0 replies; 33+ messages in thread
From: Igor Mammedov @ 2024-11-22 16:20 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:21 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> GHES has two fields that are stored on HEST error source
> blocks associated with notifications:
>
> - notification type, which is a number defined at the ACPI spec
> containing several arch-specific synchronous and assynchronous
> types;
> - source id, which is a HW/FW defined number, used to distinguish
> between different implemented sources.
>
> There could be several sources with the same notification type,
> which is dependent of the way each architecture maps notifications.
>
> Right now, build_ghes_v2() hardcodes a 1:1 mapping between such
> fields. Move it to two independent parameters, allowing the
> caller function to fill both.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
>
> ---
>
> Chenges from v10:
>
> - Some changes got moved to the previous patch.
>
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
> That should make easier to add support for 86.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 4a6c45bcb4be..29cd7e4d8171 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -284,9 +284,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> }
>
> /* Build Generic Hardware Error Source version 2 (GHESv2) */
> -static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> +static void build_ghes_v2(GArray *table_data,
> + BIOSLinker *linker,
> + enum AcpiGhesNotifyType notify,
> + uint16_t source_id)
> {
> uint64_t address_offset;
> +
> /*
> * Type:
> * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> @@ -316,18 +320,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
> ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
>
> - switch (source_id) {
> - case ACPI_HEST_SRC_ID_SEA:
> - /*
> - * Notification Structure
> - * Now only enable ARMv8 SEA notification type
> - */
> - build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
> - break;
> - default:
> - error_report("Not support this error source");
> - abort();
> - }
> + /* Notification Structure */
> + build_ghes_hw_error_notification(table_data, notify);
>
> /* Error Status Block Length */
> build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> @@ -369,7 +363,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>
> /* Error Source Count */
> build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> - build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
> + build_ghes_v2(table_data, linker,
> + ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
>
> acpi_table_end(linker, &table);
> }
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (3 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 04/15] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 06/15] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
` (9 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Align the header file with the actual implementation of
this function, as the first argument is source ID and not
notification type.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
include/hw/acpi/ghes.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 20016c226d1f..50e3a25ea384 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -73,7 +73,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 06/15] acpi/ghes: Remove a duplicated out of bounds check
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (4 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 07/15] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
` (8 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
acpi_ghes_record_errors() has an assert() at the beginning
to ensure that source_id will be lower than
ACPI_GHES_ERROR_SOURCE_COUNT. Remove a duplicated check.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/acpi/ghes.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 29cd7e4d8171..5f67322bf0f2 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -404,9 +404,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
return -1;
}
- if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
- start_addr += source_id * sizeof(uint64_t);
- }
+ start_addr += source_id * sizeof(uint64_t);
cpu_physical_memory_read(start_addr, &error_block_addr,
sizeof(error_block_addr));
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 07/15] acpi/ghes: Change the type for source_id
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (5 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 06/15] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 15:41 ` Igor Mammedov
2024-11-22 9:11 ` [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
` (7 subsequent siblings)
14 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
As described at: ACPI 6.5 spec at:
18.3.2. ACPI Error Source
In particular at GHES/GHESv2 table:
Table 18.10 Generic Hardware Error Source Structure
HEST source ID is actually a 16-bit value.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 2 +-
include/hw/acpi/ghes.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index c315de1802d6..2b64cbd2819a 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
{
return -1;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 5f67322bf0f2..edc74c38bf8a 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -383,7 +383,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
{
uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
uint64_t start_addr;
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 50e3a25ea384..9295e46be25e 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -73,7 +73,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 07/15] acpi/ghes: Change the type for source_id
2024-11-22 9:11 ` [PATCH v4 07/15] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
@ 2024-11-22 15:41 ` Igor Mammedov
0 siblings, 0 replies; 33+ messages in thread
From: Igor Mammedov @ 2024-11-22 15:41 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:24 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> As described at: ACPI 6.5 spec at:
> 18.3.2. ACPI Error Source
>
> In particular at GHES/GHESv2 table:
> Table 18.10 Generic Hardware Error Source Structure
>
> HEST source ID is actually a 16-bit value.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes-stub.c | 2 +-
> hw/acpi/ghes.c | 2 +-
> include/hw/acpi/ghes.h | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index c315de1802d6..2b64cbd2819a 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -11,7 +11,7 @@
> #include "qemu/osdep.h"
> #include "hw/acpi/ghes.h"
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> {
> return -1;
> }
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 5f67322bf0f2..edc74c38bf8a 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -383,7 +383,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> {
> uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> uint64_t start_addr;
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 50e3a25ea384..9295e46be25e 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -73,7 +73,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
>
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (6 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 07/15] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 16:19 ` Igor Mammedov
` (2 more replies)
2024-11-22 9:11 ` [PATCH v4 09/15] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
` (6 subsequent siblings)
14 siblings, 3 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Split the code into separate functions to allow using the
common CPER filling code by different error sources.
The generic code was moved to ghes_record_cper_errors(),
and ghes_gen_err_data_uncorrectable_recoverable() now contains
only a logic to fill the Generic Error Data part of the record,
as described at:
ACPI 6.2: 18.3.2.7.1 Generic Error Data
The remaining code to generate a memory error now belongs to
acpi_ghes_record_errors() function.
A further patch will give it a better name.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 131 ++++++++++++++++++++++++-----------------
include/hw/acpi/ghes.h | 3 +
2 files changed, 79 insertions(+), 55 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index edc74c38bf8a..a96856d5110b 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -181,51 +181,24 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
build_append_int_noprefix(table, 0, 7);
}
-static int acpi_ghes_record_mem_error(uint64_t error_block_address,
- uint64_t error_physical_addr)
+static void
+ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
+ const uint8_t *section_type,
+ int data_length)
{
- GArray *block;
-
- /* Memory Error Section Type */
- const uint8_t uefi_cper_mem_sec[] =
- UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
- 0xED, 0x7C, 0x83, 0xB1);
-
/* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
* Table 17-13 Generic Error Data Entry
*/
QemuUUID fru_id = {};
- uint32_t data_length;
-
- block = g_array_new(false, true /* clear */, 1);
-
- /* This is the length if adding a new generic error data entry*/
- data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
- /*
- * It should not run out of the preallocated memory if adding a new generic
- * error data entry
- */
- assert((data_length + ACPI_GHES_GESB_SIZE) <=
- ACPI_GHES_MAX_RAW_DATA_LENGTH);
/* Build the new generic error status block header */
acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
/* Build this new generic error data entry header */
- acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
+ acpi_ghes_generic_error_data(block, section_type,
ACPI_CPER_SEV_RECOVERABLE, 0, 0,
ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
-
- /* Build the memory section CPER for above new generic error data entry */
- acpi_ghes_build_append_mem_cper(block, error_physical_addr);
-
- /* Write the generic error data entry into guest memory */
- cpu_physical_memory_write(error_block_address, block->data, block->len);
-
- g_array_free(block, true);
-
- return 0;
}
/*
@@ -383,15 +356,18 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp)
{
uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
uint64_t start_addr;
- bool ret = -1;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
- assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
+ if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+ error_setg(errp, "GHES CPER record is too big: %ld", len);
+ return;
+ }
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
@@ -400,16 +376,16 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
start_addr = le64_to_cpu(ags->ghes_addr_le);
- if (!physical_address) {
- return -1;
- }
-
start_addr += source_id * sizeof(uint64_t);
cpu_physical_memory_read(start_addr, &error_block_addr,
sizeof(error_block_addr));
error_block_addr = le64_to_cpu(error_block_addr);
+ if (!error_block_addr) {
+ error_setg(errp, "can not find Generic Error Status Block");
+ return;
+ }
read_ack_register_addr = start_addr +
ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
@@ -419,24 +395,69 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
/* zero means OSPM does not acknowledge the error */
if (!read_ack_register) {
- error_report("OSPM does not acknowledge previous error,"
- " so can not record CPER for current error anymore");
- } else if (error_block_addr) {
- read_ack_register = cpu_to_le64(0);
- /*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
- */
- cpu_physical_memory_write(read_ack_register_addr,
- &read_ack_register, sizeof(uint64_t));
-
- ret = acpi_ghes_record_mem_error(error_block_addr,
- physical_address);
- } else {
- error_report("can not find Generic Error Status Block");
+ error_setg(errp,
+ "OSPM does not acknowledge previous error,"
+ " so can not record CPER for current error anymore");
+ return;
}
- return ret;
+ read_ack_register = cpu_to_le64(0);
+ /*
+ * Clear the Read Ack Register, OSPM will write it to 1 when
+ * it acknowledges this error.
+ */
+ cpu_physical_memory_write(read_ack_register_addr,
+ &read_ack_register, sizeof(uint64_t));
+
+ /* Write the generic error data entry into guest memory */
+ cpu_physical_memory_write(error_block_addr, cper, len);
+
+ return;
+}
+
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+{
+ /* Memory Error Section Type */
+ const uint8_t guid[] =
+ UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
+ 0xED, 0x7C, 0x83, 0xB1);
+ Error *errp = NULL;
+ int data_length;
+ GArray *block;
+
+ if (!physical_address) {
+ error_report("can not find Generic Error Status Block for source id %d",
+ source_id);
+ return -1;
+ }
+
+ block = g_array_new(false, true /* clear */, 1);
+
+ data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
+ /*
+ * It should not run out of the preallocated memory if adding a new generic
+ * error data entry
+ */
+ assert((data_length + ACPI_GHES_GESB_SIZE) <=
+ ACPI_GHES_MAX_RAW_DATA_LENGTH);
+
+ ghes_gen_err_data_uncorrectable_recoverable(block, guid,
+ data_length);
+
+ /* Build the memory section CPER for above new generic error data entry */
+ acpi_ghes_build_append_mem_cper(block, physical_address);
+
+ /* Report the error */
+ ghes_record_cper_errors(block->data, block->len, source_id, &errp);
+
+ g_array_free(block, true);
+
+ if (errp) {
+ error_report_err(errp);
+ return -1;
+ }
+
+ return 0;
}
bool acpi_ghes_present(void)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 9295e46be25e..8859346af51a 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
#define ACPI_GHES_H
#include "hw/acpi/bios-linker-loader.h"
+#include "qapi/error.h"
/*
* Values for Hardware Error Notification Type field
@@ -73,6 +74,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp);
int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
/**
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic
2024-11-22 9:11 ` [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
@ 2024-11-22 16:19 ` Igor Mammedov
2024-11-25 11:06 ` Mauro Carvalho Chehab
2024-11-25 11:56 ` Jonathan Cameron via
2024-12-03 11:42 ` Igor Mammedov
2 siblings, 1 reply; 33+ messages in thread
From: Igor Mammedov @ 2024-11-22 16:19 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:25 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Split the code into separate functions to allow using the
> common CPER filling code by different error sources.
>
> The generic code was moved to ghes_record_cper_errors(),
> and ghes_gen_err_data_uncorrectable_recoverable() now contains
> only a logic to fill the Generic Error Data part of the record,
> as described at:
>
> ACPI 6.2: 18.3.2.7.1 Generic Error Data
>
> The remaining code to generate a memory error now belongs to
> acpi_ghes_record_errors() function.
>
> A further patch will give it a better name.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 131 ++++++++++++++++++++++++-----------------
> include/hw/acpi/ghes.h | 3 +
> 2 files changed, 79 insertions(+), 55 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index edc74c38bf8a..a96856d5110b 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -181,51 +181,24 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
> build_append_int_noprefix(table, 0, 7);
> }
>
> -static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> - uint64_t error_physical_addr)
> +static void
> +ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
> + const uint8_t *section_type,
> + int data_length)
> {
> - GArray *block;
> -
> - /* Memory Error Section Type */
> - const uint8_t uefi_cper_mem_sec[] =
> - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> - 0xED, 0x7C, 0x83, 0xB1);
> -
> /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> * Table 17-13 Generic Error Data Entry
> */
> QemuUUID fru_id = {};
> - uint32_t data_length;
> -
> - block = g_array_new(false, true /* clear */, 1);
> -
> - /* This is the length if adding a new generic error data entry*/
> - data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> - /*
> - * It should not run out of the preallocated memory if adding a new generic
> - * error data entry
> - */
> - assert((data_length + ACPI_GHES_GESB_SIZE) <=
> - ACPI_GHES_MAX_RAW_DATA_LENGTH);
>
> /* Build the new generic error status block header */
> acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
>
> /* Build this new generic error data entry header */
> - acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
> + acpi_ghes_generic_error_data(block, section_type,
> ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
> -
> - /* Build the memory section CPER for above new generic error data entry */
> - acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> -
> - /* Write the generic error data entry into guest memory */
> - cpu_physical_memory_write(error_block_address, block->data, block->len);
> -
> - g_array_free(block, true);
> -
> - return 0;
> }
>
> /*
> @@ -383,15 +356,18 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> -int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> +void ghes_record_cper_errors(const void *cper, size_t len,
> + uint16_t source_id, Error **errp)
> {
> uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> uint64_t start_addr;
> - bool ret = -1;
> AcpiGedState *acpi_ged_state;
> AcpiGhesState *ags;
>
> - assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
> + if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> + error_setg(errp, "GHES CPER record is too big: %ld", len);
> + return;
> + }
>
> acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> NULL));
> @@ -400,16 +376,16 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
>
> start_addr = le64_to_cpu(ags->ghes_addr_le);
>
> - if (!physical_address) {
> - return -1;
> - }
> -
> start_addr += source_id * sizeof(uint64_t);
>
> cpu_physical_memory_read(start_addr, &error_block_addr,
> sizeof(error_block_addr));
>
> error_block_addr = le64_to_cpu(error_block_addr);
> + if (!error_block_addr) {
> + error_setg(errp, "can not find Generic Error Status Block");
> + return;
> + }
>
> read_ack_register_addr = start_addr +
> ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> @@ -419,24 +395,69 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
>
> /* zero means OSPM does not acknowledge the error */
> if (!read_ack_register) {
> - error_report("OSPM does not acknowledge previous error,"
> - " so can not record CPER for current error anymore");
> - } else if (error_block_addr) {
> - read_ack_register = cpu_to_le64(0);
> - /*
> - * Clear the Read Ack Register, OSPM will write it to 1 when
> - * it acknowledges this error.
> - */
> - cpu_physical_memory_write(read_ack_register_addr,
> - &read_ack_register, sizeof(uint64_t));
> -
> - ret = acpi_ghes_record_mem_error(error_block_addr,
> - physical_address);
> - } else {
> - error_report("can not find Generic Error Status Block");
> + error_setg(errp,
> + "OSPM does not acknowledge previous error,"
> + " so can not record CPER for current error anymore");
> + return;
> }
>
> - return ret;
> + read_ack_register = cpu_to_le64(0);
> + /*
> + * Clear the Read Ack Register, OSPM will write it to 1 when
> + * it acknowledges this error.
> + */
> + cpu_physical_memory_write(read_ack_register_addr,
> + &read_ack_register, sizeof(uint64_t));
> +
> + /* Write the generic error data entry into guest memory */
> + cpu_physical_memory_write(error_block_addr, cper, len);
> +
> + return;
> +}
> +
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> +{
> + /* Memory Error Section Type */
> + const uint8_t guid[] =
> + UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> + 0xED, 0x7C, 0x83, 0xB1);
> + Error *errp = NULL;
> + int data_length;
> + GArray *block;
> +
> + if (!physical_address) {
> + error_report("can not find Generic Error Status Block for source id %d",
> + source_id);
isn't it a copy paste error from somewhere, perhaps it's mixed up with something else?
Granted, the check was there before patches but error message seems bogus.
Also 'physical_address' is a faulty page here, and 0 is as valid as any other value.
I'd say we should drop the check, and if from ARM/KVM pov 0 addr shouldn't exist
then it should be up to caller to filter it out and not call acpi_ghes_record_errors()
in the 1st place.
> + return -1;
> + }
> +
> + block = g_array_new(false, true /* clear */, 1);
> +
> + data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> + /*
> + * It should not run out of the preallocated memory if adding a new generic
> + * error data entry
> + */
> + assert((data_length + ACPI_GHES_GESB_SIZE) <=
> + ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +
> + ghes_gen_err_data_uncorrectable_recoverable(block, guid,
> + data_length);
> +
> + /* Build the memory section CPER for above new generic error data entry */
> + acpi_ghes_build_append_mem_cper(block, physical_address);
> +
> + /* Report the error */
> + ghes_record_cper_errors(block->data, block->len, source_id, &errp);
> +
> + g_array_free(block, true);
> +
> + if (errp) {
> + error_report_err(errp);
> + return -1;
> + }
> +
> + return 0;
> }
>
> bool acpi_ghes_present(void)
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 9295e46be25e..8859346af51a 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -23,6 +23,7 @@
> #define ACPI_GHES_H
>
> #include "hw/acpi/bios-linker-loader.h"
> +#include "qapi/error.h"
>
> /*
> * Values for Hardware Error Notification Type field
> @@ -73,6 +74,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> +void ghes_record_cper_errors(const void *cper, size_t len,
> + uint16_t source_id, Error **errp);
> int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
>
> /**
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic
2024-11-22 16:19 ` Igor Mammedov
@ 2024-11-25 11:06 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-25 11:06 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Fri, 22 Nov 2024 17:19:44 +0100
Igor Mammedov <imammedo@redhat.com> escreveu:
> > +
> > + if (!physical_address) {
> > + error_report("can not find Generic Error Status Block for source id %d",
> > + source_id);
>
> isn't it a copy paste error from somewhere, perhaps it's mixed up with something else?
> Granted, the check was there before patches but error message seems bogus.
>
> Also 'physical_address' is a faulty page here, and 0 is as valid as any other value.
> I'd say we should drop the check, and if from ARM/KVM pov 0 addr shouldn't exist
> then it should be up to caller to filter it out and not call acpi_ghes_record_errors()
> in the 1st place.
I'll add a patch just before this one removing the physical_address check,
and remove it from the moved function on this patch.
I'll submit a v5 once you review patches 13-15 from this series.
Regards,
Mauro
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic
2024-11-22 9:11 ` [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
2024-11-22 16:19 ` Igor Mammedov
@ 2024-11-25 11:56 ` Jonathan Cameron via
2024-12-04 7:52 ` Mauro Carvalho Chehab
2024-12-03 11:42 ` Igor Mammedov
2 siblings, 1 reply; 33+ messages in thread
From: Jonathan Cameron via @ 2024-11-25 11:56 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:25 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Split the code into separate functions to allow using the
> common CPER filling code by different error sources.
>
> The generic code was moved to ghes_record_cper_errors(),
> and ghes_gen_err_data_uncorrectable_recoverable() now contains
> only a logic to fill the Generic Error Data part of the record,
> as described at:
>
> ACPI 6.2: 18.3.2.7.1 Generic Error Data
>
> The remaining code to generate a memory error now belongs to
> acpi_ghes_record_errors() function.
>
> A further patch will give it a better name.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
One trivial follow up that is enabled by the change you are discussing with Igor.
Up to you that one.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> +
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> +{
> + /* Memory Error Section Type */
> + const uint8_t guid[] =
> + UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> + 0xED, 0x7C, 0x83, 0xB1);
> + Error *errp = NULL;
> + int data_length;
> + GArray *block;
> +
> + if (!physical_address) {
> + error_report("can not find Generic Error Status Block for source id %d",
> + source_id);
> + return -1;
> + }
With this error check gone (as per discussion with Igor) you could use
g_autofree to deal with freeing block.
That would bring the errp check right next to the call that would result
in errp potentially being set and slightly improve readability.
Mind you there are no uses of this in hw/acpi currently so maybe this
isn't a good time to start :)
> +
> + block = g_array_new(false, true /* clear */, 1);
> +
> + data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> + /*
> + * It should not run out of the preallocated memory if adding a new generic
> + * error data entry
> + */
> + assert((data_length + ACPI_GHES_GESB_SIZE) <=
> + ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +
> + ghes_gen_err_data_uncorrectable_recoverable(block, guid,
> + data_length);
> +
> + /* Build the memory section CPER for above new generic error data entry */
> + acpi_ghes_build_append_mem_cper(block, physical_address);
> +
> + /* Report the error */
> + ghes_record_cper_errors(block->data, block->len, source_id, &errp);
> +
> + g_array_free(block, true);
> +
> + if (errp) {
> + error_report_err(errp);
> + return -1;
> + }
> +
> + return 0;
> }
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic
2024-11-25 11:56 ` Jonathan Cameron via
@ 2024-12-04 7:52 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-12-04 7:52 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Mon, 25 Nov 2024 11:56:43 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> escreveu:
> On Fri, 22 Nov 2024 10:11:25 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Split the code into separate functions to allow using the
> > common CPER filling code by different error sources.
> >
> > The generic code was moved to ghes_record_cper_errors(),
> > and ghes_gen_err_data_uncorrectable_recoverable() now contains
> > only a logic to fill the Generic Error Data part of the record,
> > as described at:
> >
> > ACPI 6.2: 18.3.2.7.1 Generic Error Data
> >
> > The remaining code to generate a memory error now belongs to
> > acpi_ghes_record_errors() function.
> >
> > A further patch will give it a better name.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> One trivial follow up that is enabled by the change you are discussing with Igor.
> Up to you that one.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > +
> > +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> > +{
> > + /* Memory Error Section Type */
> > + const uint8_t guid[] =
> > + UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > + 0xED, 0x7C, 0x83, 0xB1);
> > + Error *errp = NULL;
> > + int data_length;
> > + GArray *block;
> > +
> > + if (!physical_address) {
> > + error_report("can not find Generic Error Status Block for source id %d",
> > + source_id);
> > + return -1;
> > + }
>
> With this error check gone (as per discussion with Igor) you could use
> g_autofree to deal with freeing block.
>
> That would bring the errp check right next to the call that would result
> in errp potentially being set and slightly improve readability.
>
> Mind you there are no uses of this in hw/acpi currently so maybe this
> isn't a good time to start :)
Yeah, I prefer to not do such cleanup now. As you said, this isn't used
right now at ghes, and there are still two series on the top of it.
IMO, such kind of change should happen afterwards, and checking on
other places were memory is allocated in the driver.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic
2024-11-22 9:11 ` [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
2024-11-22 16:19 ` Igor Mammedov
2024-11-25 11:56 ` Jonathan Cameron via
@ 2024-12-03 11:42 ` Igor Mammedov
2024-12-03 13:38 ` Mauro Carvalho Chehab
2 siblings, 1 reply; 33+ messages in thread
From: Igor Mammedov @ 2024-12-03 11:42 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:25 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Split the code into separate functions to allow using the
> common CPER filling code by different error sources.
>
> The generic code was moved to ghes_record_cper_errors(),
> and ghes_gen_err_data_uncorrectable_recoverable() now contains
> only a logic to fill the Generic Error Data part of the record,
> as described at:
>
> ACPI 6.2: 18.3.2.7.1 Generic Error Data
>
> The remaining code to generate a memory error now belongs to
> acpi_ghes_record_errors() function.
>
> A further patch will give it a better name.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
[...]
> - return ret;
> + read_ack_register = cpu_to_le64(0);
> + /*
> + * Clear the Read Ack Register, OSPM will write it to 1 when
^^^^^^^^^^^^^ typo?
> + * it acknowledges this error.
> + */
> + cpu_physical_memory_write(read_ack_register_addr,
> + &read_ack_register, sizeof(uint64_t));
> +
> + /* Write the generic error data entry into guest memory */
> + cpu_physical_memory_write(error_block_addr, cper, len);
> +
> + return;
> +}
[...]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic
2024-12-03 11:42 ` Igor Mammedov
@ 2024-12-03 13:38 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-12-03 13:38 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Tue, 3 Dec 2024 12:42:32 +0100
Igor Mammedov <imammedo@redhat.com> escreveu:
> > + /*
> > + * Clear the Read Ack Register, OSPM will write it to 1 when
> ^^^^^^^^^^^^^ typo?
> > + * it acknowledges this error.
> > + */
Yes. I'll add this hunk there:
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 68fb30c2d5c1..4b5332f8c667 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -403,8 +403,8 @@ void ghes_record_cper_errors(const void *cper, size_t len,
read_ack_register = cpu_to_le64(0);
/*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
+ * Clear the Read Ack Register, OSPM will write 1 to this register when
+ * it acknowledges the error.
*/
cpu_physical_memory_write(read_ack_register_addr,
&read_ack_register, sizeof(uint64_t));
Thanks,
Mauro
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 09/15] acpi/ghes: better name GHES memory error function
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (7 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 08/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 10/15] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
` (5 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
Peter Maydell, kvm, linux-kernel, qemu-arm, qemu-devel
The current function used to generate GHES data is specific for
memory errors. Give a better name for it, as we now have a generic
function as well.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 2 +-
include/hw/acpi/ghes.h | 4 ++--
target/arm/kvm.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 2b64cbd2819a..7cec1812dad9 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
{
return -1;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index a96856d5110b..ad7d895def2a 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -415,7 +415,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
return;
}
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
{
/* Memory Error Section Type */
const uint8_t guid[] =
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 8859346af51a..21666a4bcc8b 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -74,15 +74,15 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t error_physical_addr);
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp);
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
*
* Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_record_errors() to record a memory error.
+ * safe to call acpi_ghes_memory_errors() to record a memory error.
*/
bool acpi_ghes_present(void);
#endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 7b6812c0de2e..b4260467f8b9 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2387,7 +2387,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
*/
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
- if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+ if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
kvm_inject_arm_sea(c);
} else {
error_report("failed to record the error");
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 10/15] acpi/ghes: don't crash QEMU if ghes GED is not found
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (8 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 09/15] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 16:21 ` Igor Mammedov
2024-11-22 9:11 ` [PATCH v4 11/15] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
` (4 subsequent siblings)
14 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Make error handling within ghes_record_cper_errors() consistent,
i.e. instead abort just print a error in case ghes GED is not found.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index ad7d895def2a..25587f5fc9ab 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -371,7 +371,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
- g_assert(acpi_ged_state);
+ if (!acpi_ged_state) {
+ error_setg(errp, "Can't find ACPI_GED object");
+ return;
+ }
ags = &acpi_ged_state->ghes_state;
start_addr = le64_to_cpu(ags->ghes_addr_le);
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 10/15] acpi/ghes: don't crash QEMU if ghes GED is not found
2024-11-22 9:11 ` [PATCH v4 10/15] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
@ 2024-11-22 16:21 ` Igor Mammedov
0 siblings, 0 replies; 33+ messages in thread
From: Igor Mammedov @ 2024-11-22 16:21 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:27 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Make error handling within ghes_record_cper_errors() consistent,
> i.e. instead abort just print a error in case ghes GED is not found.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index ad7d895def2a..25587f5fc9ab 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -371,7 +371,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>
> acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> NULL));
> - g_assert(acpi_ged_state);
> + if (!acpi_ged_state) {
> + error_setg(errp, "Can't find ACPI_GED object");
> + return;
> + }
> ags = &acpi_ged_state->ghes_state;
>
> start_addr = le64_to_cpu(ags->ghes_addr_le);
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 11/15] acpi/ghes: rename etc/hardware_error file macros
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (9 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 10/15] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 12/15] acpi/ghes: better name the offset of the hardware error firmware Mauro Carvalho Chehab
` (3 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Now that we have also have a file to store HEST data location,
which is part of GHES, better name the file where CPER records
are stored.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/acpi/ghes.c | 38 +++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 25587f5fc9ab..292c77e78a5a 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -28,8 +28,8 @@
#include "hw/nvram/fw_cfg.h"
#include "qemu/uuid.h"
-#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
-#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
+#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors"
+#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
@@ -234,7 +234,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
/* Tell guest firmware to place hardware_errors blob into RAM */
- bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
+ bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
hardware_errors, sizeof(uint64_t), false);
for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
@@ -243,17 +243,21 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
* corresponding "Generic Error Status Block"
*/
bios_linker_loader_add_pointer(linker,
- ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
- error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ sizeof(uint64_t) * i,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ error_status_block_offset +
+ i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
}
/*
* tell firmware to write hardware_errors GPA into
* hardware_errors_addr fw_cfg, once the former has been initialized.
*/
- bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
- 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
+ bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE, 0);
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
@@ -290,8 +294,10 @@ static void build_ghes_v2(GArray *table_data,
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ source_id * sizeof(uint64_t));
/* Notification Structure */
build_ghes_hw_error_notification(table_data, notify);
@@ -308,9 +314,11 @@ static void build_ghes_v2(GArray *table_data,
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
- (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ (ACPI_GHES_ERROR_SOURCE_COUNT + source_id)
+ * sizeof(uint64_t));
/*
* Read Ack Preserve field
@@ -346,11 +354,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
GArray *hardware_error)
{
/* Create a read-only fw_cfg file for GHES */
- fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+ fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
hardware_error->len);
/* Create a read-write fw_cfg file for Address */
- fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+ fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
ags->present = true;
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 12/15] acpi/ghes: better name the offset of the hardware error firmware
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (10 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 11/15] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function Mauro Carvalho Chehab
` (2 subsequent siblings)
14 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
The hardware error firmware is where HEST error structures are
stored. Those can be GHESv2, but they can also be other types.
Better name the location of the hardware error.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
hw/acpi/generic_event_device.c | 4 ++--
hw/acpi/ghes.c | 4 ++--
include/hw/acpi/ghes.h | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 663d9cb09380..17baf36132a8 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -364,7 +364,7 @@ static const VMStateDescription vmstate_ghes = {
.version_id = 1,
.minimum_version_id = 1,
.fields = (const VMStateField[]) {
- VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
+ VMSTATE_UINT64(hw_error_le, AcpiGhesState),
VMSTATE_END_OF_LIST()
},
};
@@ -372,7 +372,7 @@ static const VMStateDescription vmstate_ghes = {
static bool ghes_needed(void *opaque)
{
AcpiGedState *s = opaque;
- return s->ghes_state.ghes_addr_le;
+ return s->ghes_state.hw_error_le;
}
static const VMStateDescription vmstate_ghes_state = {
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 292c77e78a5a..87fd3feedd2a 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -359,7 +359,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
/* Create a read-write fw_cfg file for Address */
fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
- NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+ NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
ags->present = true;
}
@@ -385,7 +385,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
}
ags = &acpi_ged_state->ghes_state;
- start_addr = le64_to_cpu(ags->ghes_addr_le);
+ start_addr = le64_to_cpu(ags->hw_error_le);
start_addr += source_id * sizeof(uint64_t);
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 21666a4bcc8b..39619a2457cb 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -65,7 +65,7 @@ enum {
};
typedef struct AcpiGhesState {
- uint64_t ghes_addr_le;
+ uint64_t hw_error_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (11 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 12/15] acpi/ghes: better name the offset of the hardware error firmware Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-12-03 11:51 ` Igor Mammedov
2024-11-22 9:11 ` [PATCH v4 14/15] acpi/ghes: Change ghes fill logic to work with only one source Mauro Carvalho Chehab
2024-11-22 9:11 ` [PATCH v4 15/15] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
14 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Currently, CPER address location is calculated as an offset of
the hardware_errors table. It is also badly named, as the
offset actually used is the address where the CPER data starts,
and not the beginning of the error source.
Move the logic which calculates such offset to a separate
function, in preparation for a patch that will be changing the
logic to calculate it from the HEST table.
While here, properly name the variable which stores the cper
address.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 9 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 87fd3feedd2a..d99697b20164 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
+static void get_hw_error_offsets(uint64_t ghes_addr,
+ uint64_t *cper_addr,
+ uint64_t *read_ack_register_addr)
+{
+ if (!ghes_addr) {
+ return;
+ }
+
+ /*
+ * non-HEST version supports only one source, so no need to change
+ * the start offset based on the source ID. Also, we can't validate
+ * the source ID, as it is stored inside the HEST table.
+ */
+
+ cpu_physical_memory_read(ghes_addr, cper_addr,
+ sizeof(*cper_addr));
+
+ *cper_addr = le64_to_cpu(*cper_addr);
+
+ /*
+ * As the current version supports only one source, the ack offset is
+ * just sizeof(uint64_t).
+ */
+ *read_ack_register_addr = ghes_addr +
+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
+}
+
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp)
{
- uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
+ uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
uint64_t start_addr;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
@@ -389,18 +416,14 @@ void ghes_record_cper_errors(const void *cper, size_t len,
start_addr += source_id * sizeof(uint64_t);
- cpu_physical_memory_read(start_addr, &error_block_addr,
- sizeof(error_block_addr));
+ get_hw_error_offsets(start_addr, &cper_addr, &read_ack_register_addr);
- error_block_addr = le64_to_cpu(error_block_addr);
- if (!error_block_addr) {
+ cper_addr = le64_to_cpu(cper_addr);
+ if (!cper_addr) {
error_setg(errp, "can not find Generic Error Status Block");
return;
}
- read_ack_register_addr = start_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
-
cpu_physical_memory_read(read_ack_register_addr,
&read_ack_register, sizeof(read_ack_register));
@@ -421,7 +444,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
&read_ack_register, sizeof(uint64_t));
/* Write the generic error data entry into guest memory */
- cpu_physical_memory_write(error_block_addr, cper, len);
+ cpu_physical_memory_write(cper_addr, cper, len);
return;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
2024-11-22 9:11 ` [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function Mauro Carvalho Chehab
@ 2024-12-03 11:51 ` Igor Mammedov
2024-12-03 13:47 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 33+ messages in thread
From: Igor Mammedov @ 2024-12-03 11:51 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:30 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Currently, CPER address location is calculated as an offset of
> the hardware_errors table. It is also badly named, as the
> offset actually used is the address where the CPER data starts,
> and not the beginning of the error source.
>
> Move the logic which calculates such offset to a separate
> function, in preparation for a patch that will be changing the
> logic to calculate it from the HEST table.
>
> While here, properly name the variable which stores the cper
> address.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
> 1 file changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 87fd3feedd2a..d99697b20164 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> +static void get_hw_error_offsets(uint64_t ghes_addr,
> + uint64_t *cper_addr,
> + uint64_t *read_ack_register_addr)
> +{
> + if (!ghes_addr) {
> + return;
> + }
why do we need this check?
> +
> + /*
> + * non-HEST version supports only one source, so no need to change
> + * the start offset based on the source ID. Also, we can't validate
> + * the source ID, as it is stored inside the HEST table.
> + */
> +
> + cpu_physical_memory_read(ghes_addr, cper_addr,
> + sizeof(*cper_addr));
> +
> + *cper_addr = le64_to_cpu(*cper_addr);
1st bits flip, and then see later
> +
> + /*
> + * As the current version supports only one source, the ack offset is
> + * just sizeof(uint64_t).
> + */
> + *read_ack_register_addr = ghes_addr +
> + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> +}
> +
> void ghes_record_cper_errors(const void *cper, size_t len,
> uint16_t source_id, Error **errp)
> {
> - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> + uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
if get_hw_error_offsets() isn't supposed to fail, then we do not need to initialize
above. So this hunk doesn't belong to this patch.
> uint64_t start_addr;
> AcpiGedState *acpi_ged_state;
> AcpiGhesState *ags;
> @@ -389,18 +416,14 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>
> start_addr += source_id * sizeof(uint64_t);
>
> - cpu_physical_memory_read(start_addr, &error_block_addr,
> - sizeof(error_block_addr));
> + get_hw_error_offsets(start_addr, &cper_addr, &read_ack_register_addr);
>
> - error_block_addr = le64_to_cpu(error_block_addr);
> - if (!error_block_addr) {
> + cper_addr = le64_to_cpu(cper_addr);
^^^^ 2nd bits flip turning it back to guest byte order again
suggest to keep only one of them in get_hw_error_offsets()
> + if (!cper_addr) {
> error_setg(errp, "can not find Generic Error Status Block");
> return;
> }
>
> - read_ack_register_addr = start_addr +
> - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> -
> cpu_physical_memory_read(read_ack_register_addr,
> &read_ack_register, sizeof(read_ack_register));
>
> @@ -421,7 +444,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> &read_ack_register, sizeof(uint64_t));
>
> /* Write the generic error data entry into guest memory */
> - cpu_physical_memory_write(error_block_addr, cper, len);
> + cpu_physical_memory_write(cper_addr, cper, len);
>
> return;
> }
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
2024-12-03 11:51 ` Igor Mammedov
@ 2024-12-03 13:47 ` Mauro Carvalho Chehab
2024-12-04 7:54 ` Igor Mammedov
0 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-12-03 13:47 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Tue, 3 Dec 2024 12:51:43 +0100
Igor Mammedov <imammedo@redhat.com> escreveu:
> On Fri, 22 Nov 2024 10:11:30 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Currently, CPER address location is calculated as an offset of
> > the hardware_errors table. It is also badly named, as the
> > offset actually used is the address where the CPER data starts,
> > and not the beginning of the error source.
> >
> > Move the logic which calculates such offset to a separate
> > function, in preparation for a patch that will be changing the
> > logic to calculate it from the HEST table.
> >
> > While here, properly name the variable which stores the cper
> > address.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> > hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
> > 1 file changed, 32 insertions(+), 9 deletions(-)
> >
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 87fd3feedd2a..d99697b20164 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > ags->present = true;
> > }
> >
> > +static void get_hw_error_offsets(uint64_t ghes_addr,
> > + uint64_t *cper_addr,
> > + uint64_t *read_ack_register_addr)
> > +{
>
>
> > + if (!ghes_addr) {
> > + return;
> > + }
>
> why do we need this check?
It is a safeguard measure to avoid crashes and OOM access. If fw_cfg
callback doesn't fill it properly, this will be zero.
> > +
> > + /*
> > + * non-HEST version supports only one source, so no need to change
> > + * the start offset based on the source ID. Also, we can't validate
> > + * the source ID, as it is stored inside the HEST table.
> > + */
> > +
> > + cpu_physical_memory_read(ghes_addr, cper_addr,
> > + sizeof(*cper_addr));
> > +
> > + *cper_addr = le64_to_cpu(*cper_addr);
> 1st bits flip, and then see later
>
> > +
> > + /*
> > + * As the current version supports only one source, the ack offset is
> > + * just sizeof(uint64_t).
> > + */
> > + *read_ack_register_addr = ghes_addr +
> > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > +}
> > +
> > void ghes_record_cper_errors(const void *cper, size_t len,
> > uint16_t source_id, Error **errp)
> > {
> > - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > + uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
>
> if get_hw_error_offsets() isn't supposed to fail, then we do not need to initialize
> above. So this hunk doesn't belong to this patch.
It may fail due to:
if (!ghes_addr) {
return;
}
>
> > uint64_t start_addr;
> > AcpiGedState *acpi_ged_state;
> > AcpiGhesState *ags;
> > @@ -389,18 +416,14 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> >
> > start_addr += source_id * sizeof(uint64_t);
> >
> > - cpu_physical_memory_read(start_addr, &error_block_addr,
> > - sizeof(error_block_addr));
> > + get_hw_error_offsets(start_addr, &cper_addr, &read_ack_register_addr);
> >
> > - error_block_addr = le64_to_cpu(error_block_addr);
> > - if (!error_block_addr) {
> > + cper_addr = le64_to_cpu(cper_addr);
> ^^^^ 2nd bits flip turning it back to guest byte order again
>
> suggest to keep only one of them in get_hw_error_offsets()
Ok, I'll drop this one.
> > + if (!cper_addr) {
> > error_setg(errp, "can not find Generic Error Status Block");
> > return;
> > }
> >
> > - read_ack_register_addr = start_addr +
> > - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > -
> > cpu_physical_memory_read(read_ack_register_addr,
> > &read_ack_register, sizeof(read_ack_register));
> >
> > @@ -421,7 +444,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > &read_ack_register, sizeof(uint64_t));
> >
> > /* Write the generic error data entry into guest memory */
> > - cpu_physical_memory_write(error_block_addr, cper, len);
> > + cpu_physical_memory_write(cper_addr, cper, len);
> >
> > return;
> > }
>
Thanks,
Mauro
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
2024-12-03 13:47 ` Mauro Carvalho Chehab
@ 2024-12-04 7:54 ` Igor Mammedov
2024-12-04 8:56 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 33+ messages in thread
From: Igor Mammedov @ 2024-12-04 7:54 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Tue, 3 Dec 2024 14:47:30 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Tue, 3 Dec 2024 12:51:43 +0100
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > On Fri, 22 Nov 2024 10:11:30 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > > Currently, CPER address location is calculated as an offset of
> > > the hardware_errors table. It is also badly named, as the
> > > offset actually used is the address where the CPER data starts,
> > > and not the beginning of the error source.
> > >
> > > Move the logic which calculates such offset to a separate
> > > function, in preparation for a patch that will be changing the
> > > logic to calculate it from the HEST table.
> > >
> > > While here, properly name the variable which stores the cper
> > > address.
> > >
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > ---
> > > hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
> > > 1 file changed, 32 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index 87fd3feedd2a..d99697b20164 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > ags->present = true;
> > > }
> > >
> > > +static void get_hw_error_offsets(uint64_t ghes_addr,
> > > + uint64_t *cper_addr,
> > > + uint64_t *read_ack_register_addr)
> > > +{
> >
> >
> > > + if (!ghes_addr) {
> > > + return;
> > > + }
> >
> > why do we need this check?
>
> It is a safeguard measure to avoid crashes and OOM access. If fw_cfg
> callback doesn't fill it properly, this will be zero.
shouldn't happen, but yeah it firmware job to write back addr
which might happen for whatever reason (a bug for example).
Perhaps push this up to the stack, so we don't have to deal
with scattered checks in ghes code.
kvm_arch_on_sigbus_vcpu() looks like a goo candidate for check
and warn_once if that ever happens.
It already calls acpi_ghes_present() which resolves GED device
and then later we duplicate this job in ghes_record_cper_errors()
so maybe rename acpi_ghes_present to something like AcpiGhesState* acpi_ghes_get_state()
and call it instead. And then move ghes_addr check/warn_once there.
This way the rest of ghes code won't have to deal handling practically
impossible error conditions that cause reader to wonder why it might happen.
> > > +
> > > + /*
> > > + * non-HEST version supports only one source, so no need to change
> > > + * the start offset based on the source ID. Also, we can't validate
> > > + * the source ID, as it is stored inside the HEST table.
> > > + */
> > > +
> > > + cpu_physical_memory_read(ghes_addr, cper_addr,
> > > + sizeof(*cper_addr));
> > > +
> > > + *cper_addr = le64_to_cpu(*cper_addr);
> > 1st bits flip, and then see later
> >
> > > +
> > > + /*
> > > + * As the current version supports only one source, the ack offset is
> > > + * just sizeof(uint64_t).
> > > + */
> > > + *read_ack_register_addr = ghes_addr +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > > +}
> > > +
> > > void ghes_record_cper_errors(const void *cper, size_t len,
> > > uint16_t source_id, Error **errp)
> > > {
> > > - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > > + uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
> >
> > if get_hw_error_offsets() isn't supposed to fail, then we do not need to initialize
> > above. So this hunk doesn't belong to this patch.
>
> It may fail due to:
>
> if (!ghes_addr) {
> return;
> }
>
> >
> > > uint64_t start_addr;
> > > AcpiGedState *acpi_ged_state;
> > > AcpiGhesState *ags;
> > > @@ -389,18 +416,14 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > >
> > > start_addr += source_id * sizeof(uint64_t);
> > >
> > > - cpu_physical_memory_read(start_addr, &error_block_addr,
> > > - sizeof(error_block_addr));
> > > + get_hw_error_offsets(start_addr, &cper_addr, &read_ack_register_addr);
> > >
> > > - error_block_addr = le64_to_cpu(error_block_addr);
> > > - if (!error_block_addr) {
> > > + cper_addr = le64_to_cpu(cper_addr);
> > ^^^^ 2nd bits flip turning it back to guest byte order again
> >
> > suggest to keep only one of them in get_hw_error_offsets()
>
> Ok, I'll drop this one.
>
> > > + if (!cper_addr) {
> > > error_setg(errp, "can not find Generic Error Status Block");
> > > return;
> > > }
> > >
> > > - read_ack_register_addr = start_addr +
> > > - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > > -
> > > cpu_physical_memory_read(read_ack_register_addr,
> > > &read_ack_register, sizeof(read_ack_register));
> > >
> > > @@ -421,7 +444,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > > &read_ack_register, sizeof(uint64_t));
> > >
> > > /* Write the generic error data entry into guest memory */
> > > - cpu_physical_memory_write(error_block_addr, cper, len);
> > > + cpu_physical_memory_write(cper_addr, cper, len);
> > >
> > > return;
> > > }
> >
>
> Thanks,
> Mauro
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
2024-12-04 7:54 ` Igor Mammedov
@ 2024-12-04 8:56 ` Mauro Carvalho Chehab
2024-12-04 9:24 ` Igor Mammedov
0 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-12-04 8:56 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Wed, 4 Dec 2024 08:54:40 +0100
Igor Mammedov <imammedo@redhat.com> escreveu:
> On Tue, 3 Dec 2024 14:47:30 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Em Tue, 3 Dec 2024 12:51:43 +0100
> > Igor Mammedov <imammedo@redhat.com> escreveu:
> >
> > > On Fri, 22 Nov 2024 10:11:30 +0100
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > >
> > > > Currently, CPER address location is calculated as an offset of
> > > > the hardware_errors table. It is also badly named, as the
> > > > offset actually used is the address where the CPER data starts,
> > > > and not the beginning of the error source.
> > > >
> > > > Move the logic which calculates such offset to a separate
> > > > function, in preparation for a patch that will be changing the
> > > > logic to calculate it from the HEST table.
> > > >
> > > > While here, properly name the variable which stores the cper
> > > > address.
> > > >
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > ---
> > > > hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
> > > > 1 file changed, 32 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > index 87fd3feedd2a..d99697b20164 100644
> > > > --- a/hw/acpi/ghes.c
> > > > +++ b/hw/acpi/ghes.c
> > > > @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > > ags->present = true;
> > > > }
> > > >
> > > > +static void get_hw_error_offsets(uint64_t ghes_addr,
> > > > + uint64_t *cper_addr,
> > > > + uint64_t *read_ack_register_addr)
> > > > +{
> > >
> > >
> > > > + if (!ghes_addr) {
> > > > + return;
> > > > + }
> > >
> > > why do we need this check?
> >
> > It is a safeguard measure to avoid crashes and OOM access. If fw_cfg
> > callback doesn't fill it properly, this will be zero.
>
> shouldn't happen, but yeah it firmware job to write back addr
> which might happen for whatever reason (a bug for example).
>
The main reason I added it is that, after the second series, it could
also happen if there's something wrong with the backward compat logic.
So, both here and after switching to HEST-based offsets, I opted
to explicitly test.
> Perhaps push this up to the stack, so we don't have to deal
> with scattered checks in ghes code.
>
> kvm_arch_on_sigbus_vcpu() looks like a goo candidate for check
> and warn_once if that ever happens.
> It already calls acpi_ghes_present() which resolves GED device
> and then later we duplicate this job in ghes_record_cper_errors()
>
> so maybe rename acpi_ghes_present to something like AcpiGhesState* acpi_ghes_get_state()
> and call it instead. And then move ghes_addr check/warn_once there.
> This way the rest of ghes code won't have to deal handling practically
> impossible error conditions that cause reader to wonder why it might happen.
I'll look on it. Yet, if ok for you, I would prefer dealing with this
once we have a bigger picture, e.g. once we merge those tree series:
- cleanup series (this one);
- HEST offset (I'll be sending a new version today);
- error_inject.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
2024-12-04 8:56 ` Mauro Carvalho Chehab
@ 2024-12-04 9:24 ` Igor Mammedov
2024-12-09 9:27 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 33+ messages in thread
From: Igor Mammedov @ 2024-12-04 9:24 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 4 Dec 2024 09:56:35 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Wed, 4 Dec 2024 08:54:40 +0100
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > On Tue, 3 Dec 2024 14:47:30 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > > Em Tue, 3 Dec 2024 12:51:43 +0100
> > > Igor Mammedov <imammedo@redhat.com> escreveu:
> > >
> > > > On Fri, 22 Nov 2024 10:11:30 +0100
> > > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > >
> > > > > Currently, CPER address location is calculated as an offset of
> > > > > the hardware_errors table. It is also badly named, as the
> > > > > offset actually used is the address where the CPER data starts,
> > > > > and not the beginning of the error source.
> > > > >
> > > > > Move the logic which calculates such offset to a separate
> > > > > function, in preparation for a patch that will be changing the
> > > > > logic to calculate it from the HEST table.
> > > > >
> > > > > While here, properly name the variable which stores the cper
> > > > > address.
> > > > >
> > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > ---
> > > > > hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++---------
> > > > > 1 file changed, 32 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > > index 87fd3feedd2a..d99697b20164 100644
> > > > > --- a/hw/acpi/ghes.c
> > > > > +++ b/hw/acpi/ghes.c
> > > > > @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > > > ags->present = true;
> > > > > }
> > > > >
> > > > > +static void get_hw_error_offsets(uint64_t ghes_addr,
> > > > > + uint64_t *cper_addr,
> > > > > + uint64_t *read_ack_register_addr)
> > > > > +{
> > > >
> > > >
> > > > > + if (!ghes_addr) {
> > > > > + return;
> > > > > + }
> > > >
> > > > why do we need this check?
> > >
> > > It is a safeguard measure to avoid crashes and OOM access. If fw_cfg
> > > callback doesn't fill it properly, this will be zero.
> >
> > shouldn't happen, but yeah it firmware job to write back addr
> > which might happen for whatever reason (a bug for example).
> >
>
> The main reason I added it is that, after the second series, it could
> also happen if there's something wrong with the backward compat logic.
>
> So, both here and after switching to HEST-based offsets, I opted
> to explicitly test.
>
> > Perhaps push this up to the stack, so we don't have to deal
> > with scattered checks in ghes code.
> >
> > kvm_arch_on_sigbus_vcpu() looks like a goo candidate for check
> > and warn_once if that ever happens.
> > It already calls acpi_ghes_present() which resolves GED device
> > and then later we duplicate this job in ghes_record_cper_errors()
> >
> > so maybe rename acpi_ghes_present to something like AcpiGhesState* acpi_ghes_get_state()
> > and call it instead. And then move ghes_addr check/warn_once there.
> > This way the rest of ghes code won't have to deal handling practically
> > impossible error conditions that cause reader to wonder why it might happen.
>
> I'll look on it. Yet, if ok for you, I would prefer dealing with this
> once we have a bigger picture, e.g. once we merge those tree series:
>
> - cleanup series (this one);
> - HEST offset (I'll be sending a new version today);
ok, lets revisit this point after this series.
Since at this point we should have a clean picture of how new code
works.
> - error_inject.
>
> Thanks,
> Mauro
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function
2024-12-04 9:24 ` Igor Mammedov
@ 2024-12-09 9:27 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-12-09 9:27 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Wed, 4 Dec 2024 10:24:13 +0100
Igor Mammedov <imammedo@redhat.com> escreveu:
> On Wed, 4 Dec 2024 09:56:35 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Em Wed, 4 Dec 2024 08:54:40 +0100
> > Igor Mammedov <imammedo@redhat.com> escreveu:
> >
> > > On Tue, 3 Dec 2024 14:47:30 +0100
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > >
> > > > Em Tue, 3 Dec 2024 12:51:43 +0100
> > > > Igor Mammedov <imammedo@redhat.com> escreveu:
> > > >
> > > > > On Fri, 22 Nov 2024 10:11:30 +0100
> > > > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > > >
...
> > > > > > +static void get_hw_error_offsets(uint64_t ghes_addr,
> > > > > > + uint64_t *cper_addr,
> > > > > > + uint64_t *read_ack_register_addr)
> > > > > > +{
> > > > >
> > > > >
> > > > > > + if (!ghes_addr) {
> > > > > > + return;
> > > > > > + }
> > > > >
> > > > > why do we need this check?
> > > >
> > > > It is a safeguard measure to avoid crashes and OOM access. If fw_cfg
> > > > callback doesn't fill it properly, this will be zero.
> > >
> > > shouldn't happen, but yeah it firmware job to write back addr
> > > which might happen for whatever reason (a bug for example).
> > >
> >
> > The main reason I added it is that, after the second series, it could
> > also happen if there's something wrong with the backward compat logic.
> >
> > So, both here and after switching to HEST-based offsets, I opted
> > to explicitly test.
> >
> > > Perhaps push this up to the stack, so we don't have to deal
> > > with scattered checks in ghes code.
> > >
> > > kvm_arch_on_sigbus_vcpu() looks like a goo candidate for check
> > > and warn_once if that ever happens.
> > > It already calls acpi_ghes_present() which resolves GED device
> > > and then later we duplicate this job in ghes_record_cper_errors()
> > >
> > > so maybe rename acpi_ghes_present to something like AcpiGhesState* acpi_ghes_get_state()
> > > and call it instead. And then move ghes_addr check/warn_once there.
> > > This way the rest of ghes code won't have to deal handling practically
> > > impossible error conditions that cause reader to wonder why it might happen.
> >
> > I'll look on it.
Wrote the cleanup patch. See enclosed. I'll place it at the end of the
second series.
> > Yet, if ok for you, I would prefer dealing with this
> > once we have a bigger picture, e.g. once we merge those tree series:
> >
> > - cleanup series (this one);
> > - HEST offset (I'll be sending a new version today);
> ok, lets revisit this point after this series.
> Since at this point we should have a clean picture of how new code
> works.
Thanks,
Mauro
[PATCH] acpi/ghes: Cleanup the code which gets ghes ged state
Move the check logic into a common function and simplify the
code which checks if GHES is enabled and was properly setup.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 7cec1812dad9..fbabf955155a 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -16,7 +16,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
return -1;
}
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
{
- return false;
+ return NULL;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index a9c5315c1936..17aada9ee352 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -420,10 +420,6 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
uint64_t *cper_addr,
uint64_t *read_ack_register_addr)
{
- if (!ghes_addr) {
- return;
- }
-
/*
* non-HEST version supports only one source, so no need to change
* the start offset based on the source ID. Also, we can't validate
@@ -451,10 +447,6 @@ static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
uint64_t err_source_struct, error_block_addr;
uint32_t num_sources, i;
- if (!hest_addr) {
- return;
- }
-
cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
num_sources = le32_to_cpu(num_sources);
@@ -513,7 +505,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp)
{
uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
- AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
@@ -521,13 +512,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
return;
}
- acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
- NULL));
- if (!acpi_ged_state) {
- error_setg(errp, "Can't find ACPI_GED object");
+ ags = acpi_ghes_get_state();
+ if (!ags) {
return;
}
- ags = &acpi_ged_state->ghes_state;
if (!ags->hest_lookup) {
get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
@@ -537,11 +525,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
&cper_addr, &read_ack_register_addr, errp);
}
- if (!cper_addr) {
- error_setg(errp, "can not find Generic Error Status Block");
- return;
- }
-
cpu_physical_memory_read(read_ack_register_addr,
&read_ack_register, sizeof(read_ack_register));
@@ -606,7 +589,7 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
return 0;
}
-bool acpi_ghes_present(void)
+AcpiGhesState *acpi_ghes_get_state(void)
{
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
@@ -615,8 +598,14 @@ bool acpi_ghes_present(void)
NULL));
if (!acpi_ged_state) {
- return false;
+ return NULL;
}
ags = &acpi_ged_state->ghes_state;
- return ags->present;
+ if (!ags->present) {
+ return NULL;
+ }
+ if (!ags->hw_error_le && !ags->hest_addr_le) {
+ return NULL;
+ }
+ return ags;
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 2e8405edfe27..64fe2b5bea65 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -91,10 +91,11 @@ void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp);
/**
- * acpi_ghes_present: Report whether ACPI GHES table is present
+ * acpi_ghes_get_state: Get a pointer for ACPI ghes state
*
- * Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_memory_errors() to record a memory error.
+ * Returns: a pointer to ghes state if the system has an ACPI GHES table,
+ * it is enabled and it is safe to call acpi_ghes_memory_errors() to record
+ * a memory error. Returns false, otherwise.
*/
-bool acpi_ghes_present(void);
+AcpiGhesState *acpi_ghes_get_state(void);
#endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index b4260467f8b9..7802c32fb7e0 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2369,7 +2369,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
- if (acpi_ghes_present() && addr) {
+ if (acpi_ghes_get_state() && addr) {
ram_addr = qemu_ram_addr_from_host(addr);
if (ram_addr != RAM_ADDR_INVALID &&
kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH v4 14/15] acpi/ghes: Change ghes fill logic to work with only one source
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (12 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 13/15] acpi/ghes: move offset calculus to a separate function Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-12-03 11:52 ` Igor Mammedov
2024-11-22 9:11 ` [PATCH v4 15/15] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
14 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Extending to multiple sources require a BIOS pointer to the
beginning of the HEST table, which in turn requires a backward-compatible
code.
So, the current code supports only one source. Ensure that and simplify
the code.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/acpi/ghes.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index d99697b20164..b0b1865dc8d3 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -387,15 +387,13 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
* As the current version supports only one source, the ack offset is
* just sizeof(uint64_t).
*/
- *read_ack_register_addr = ghes_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
+ *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
}
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp)
{
uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
- uint64_t start_addr;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
@@ -412,11 +410,9 @@ void ghes_record_cper_errors(const void *cper, size_t len,
}
ags = &acpi_ged_state->ghes_state;
- start_addr = le64_to_cpu(ags->hw_error_le);
-
- start_addr += source_id * sizeof(uint64_t);
-
- get_hw_error_offsets(start_addr, &cper_addr, &read_ack_register_addr);
+ assert(ACPI_GHES_ERROR_SOURCE_COUNT == 1);
+ get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
+ &cper_addr, &read_ack_register_addr);
cper_addr = le64_to_cpu(cper_addr);
if (!cper_addr) {
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 14/15] acpi/ghes: Change ghes fill logic to work with only one source
2024-11-22 9:11 ` [PATCH v4 14/15] acpi/ghes: Change ghes fill logic to work with only one source Mauro Carvalho Chehab
@ 2024-12-03 11:52 ` Igor Mammedov
0 siblings, 0 replies; 33+ messages in thread
From: Igor Mammedov @ 2024-12-03 11:52 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:31 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Extending to multiple sources require a BIOS pointer to the
> beginning of the HEST table, which in turn requires a backward-compatible
> code.
>
> So, the current code supports only one source. Ensure that and simplify
> the code.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 12 ++++--------
> 1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index d99697b20164..b0b1865dc8d3 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -387,15 +387,13 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
> * As the current version supports only one source, the ack offset is
> * just sizeof(uint64_t).
> */
> - *read_ack_register_addr = ghes_addr +
> - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> + *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
> }
>
> void ghes_record_cper_errors(const void *cper, size_t len,
> uint16_t source_id, Error **errp)
> {
> uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
> - uint64_t start_addr;
> AcpiGedState *acpi_ged_state;
> AcpiGhesState *ags;
>
> @@ -412,11 +410,9 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> }
> ags = &acpi_ged_state->ghes_state;
>
> - start_addr = le64_to_cpu(ags->hw_error_le);
> -
> - start_addr += source_id * sizeof(uint64_t);
> -
> - get_hw_error_offsets(start_addr, &cper_addr, &read_ack_register_addr);
> + assert(ACPI_GHES_ERROR_SOURCE_COUNT == 1);
> + get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> + &cper_addr, &read_ack_register_addr);
>
> cper_addr = le64_to_cpu(cper_addr);
> if (!cper_addr) {
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v4 15/15] docs: acpi_hest_ghes: fix documentation for CPER size
2024-11-22 9:11 [PATCH v4 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (13 preceding siblings ...)
2024-11-22 9:11 ` [PATCH v4 14/15] acpi/ghes: Change ghes fill logic to work with only one source Mauro Carvalho Chehab
@ 2024-11-22 9:11 ` Mauro Carvalho Chehab
2024-12-03 11:56 ` Igor Mammedov
14 siblings, 1 reply; 33+ messages in thread
From: Mauro Carvalho Chehab @ 2024-11-22 9:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Dongjiu Geng,
linux-kernel, qemu-arm, qemu-devel
While the spec defines a CPER size of 4KiB for each record,
currently it is set to 1KiB. Fix the documentation and add
a pointer to the macro name there, as this may help to keep
it updated.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
docs/specs/acpi_hest_ghes.rst | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index 68f1fbe0a4af..c3e9f8d9a702 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -67,8 +67,10 @@ Design Details
(3) The address registers table contains N Error Block Address entries
and N Read Ack Register entries. The size for each entry is 8-byte.
The Error Status Data Block table contains N Error Status Data Block
- entries. The size for each entry is 4096(0x1000) bytes. The total size
- for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+ entries. The size for each entry is defined at the source code as
+ ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
+ for the "etc/hardware_errors" fw_cfg blob is
+ (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
N is the number of the kinds of hardware error sources.
(4) QEMU generates the ACPI linker/loader script for the firmware. The
--
2.47.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v4 15/15] docs: acpi_hest_ghes: fix documentation for CPER size
2024-11-22 9:11 ` [PATCH v4 15/15] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
@ 2024-12-03 11:56 ` Igor Mammedov
0 siblings, 0 replies; 33+ messages in thread
From: Igor Mammedov @ 2024-12-03 11:56 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
On Fri, 22 Nov 2024 10:11:32 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> While the spec defines a CPER size of 4KiB for each record,
> currently it is set to 1KiB. Fix the documentation and add
> a pointer to the macro name there, as this may help to keep
> it updated.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Acked-by: Igor Mammedov <imammedo@redhat.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> docs/specs/acpi_hest_ghes.rst | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
> index 68f1fbe0a4af..c3e9f8d9a702 100644
> --- a/docs/specs/acpi_hest_ghes.rst
> +++ b/docs/specs/acpi_hest_ghes.rst
> @@ -67,8 +67,10 @@ Design Details
> (3) The address registers table contains N Error Block Address entries
> and N Read Ack Register entries. The size for each entry is 8-byte.
> The Error Status Data Block table contains N Error Status Data Block
> - entries. The size for each entry is 4096(0x1000) bytes. The total size
> - for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
> + entries. The size for each entry is defined at the source code as
> + ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
> + for the "etc/hardware_errors" fw_cfg blob is
> + (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
> N is the number of the kinds of hardware error sources.
>
> (4) QEMU generates the ACPI linker/loader script for the firmware. The
^ permalink raw reply [flat|nested] 33+ messages in thread