* [PATCH 00/15] Prepare GHES driver to support error injection
@ 2024-09-25 4:04 Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
` (14 more replies)
0 siblings, 15 replies; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Ani Sinha,
Dongjiu Geng, Peter Maydell, Shannon Zhao, kvm, qemu-arm,
qemu-devel
During the development of a patch series meant to allow GHESv2 error injections,
it was requested a change on how CPER offsets are calculated, by adding a new
BIOS pointer and reworking the GHES logic. See:
https://lore.kernel.org/qemu-devel/cover.1726293808.git.mchehab+huawei@kernel.org/
Such change ended being a big patch, so several intermediate steps are needed,
together with several cleanups and renames.
As agreed duing v10 review, I'll be splitting the big patch series into separate pull
requests, starting with the cleanup series. This is the first patch set, containing
only such preparation patches.
The next series will contain the shift to use offsets from the location of the
HEST table, together with a migration logic to make it compatible with 9.1.
Mauro Carvalho Chehab (15):
acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
acpi/ghes: simplify acpi_ghes_record_errors() code
acpi/ghes: simplify the per-arch caller to build HEST table
acpi/ghes: better handle source_id and notification
acpi/ghes: Fix acpi_ghes_record_errors() argument
acpi/ghes: Remove a duplicated out of bounds check
acpi/ghes: Change the type for source_id
acpi/ghes: Prepare to support multiple sources on ghes
acpi/ghes: make the GHES record generation more generic
acpi/ghes: move offset calculus to a separate function
acpi/ghes: better name GHES memory error function
acpi/ghes: don't crash QEMU if ghes GED is not found
acpi/ghes: rename etc/hardware_error file macros
better name the offset of the hardware error firmware
docs: acpi_hest_ghes: fix documentation for CPER size
docs/specs/acpi_hest_ghes.rst | 6 +-
hw/acpi/generic_event_device.c | 4 +-
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 279 ++++++++++++++++++++-------------
hw/arm/virt-acpi-build.c | 10 +-
include/hw/acpi/ghes.h | 34 ++--
target/arm/kvm.c | 3 +-
7 files changed, 206 insertions(+), 132 deletions(-)
--
2.46.1
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:02 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
` (13 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
This is just duplicating ACPI_GHES_ERROR_SOURCE_COUNT, which
has a better name. So, drop the duplication.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
hw/acpi/ghes.c | 7 ++-----
include/hw/acpi/ghes.h | 3 ++-
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..dc217694deb9 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -34,9 +34,6 @@
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT 1
-
/* Generic Hardware Error Source version 2 */
#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
@@ -396,7 +393,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
- assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
+ assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
@@ -407,7 +404,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
if (physical_address) {
- if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
+ if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
start_addr += source_id * sizeof(uint64_t);
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..59e3b8fb24b9 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -59,7 +59,8 @@ enum AcpiGhesNotifyType {
enum {
ACPI_HEST_SRC_ID_SEA = 0,
/* future ids go here */
- ACPI_HEST_SRC_ID_RESERVED,
+
+ ACPI_GHES_ERROR_SOURCE_COUNT
};
typedef struct AcpiGhesState {
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:09 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 03/15] acpi/ghes: simplify the per-arch caller to build HEST table Mauro Carvalho Chehab
` (12 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Reduce the ident of the function and prepares it for
the next changes.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
hw/acpi/ghes.c | 56 ++++++++++++++++++++++++++------------------------
1 file changed, 29 insertions(+), 27 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index dc217694deb9..dacbd4d0c093 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -402,40 +402,42 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
start_addr = le64_to_cpu(ags->ghes_addr_le);
- if (physical_address) {
+ if (!physical_address) {
+ return -1;
+ }
- if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
- start_addr += source_id * sizeof(uint64_t);
- }
+ if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
+ start_addr += source_id * sizeof(uint64_t);
+ }
- cpu_physical_memory_read(start_addr, &error_block_addr,
- sizeof(error_block_addr));
+ cpu_physical_memory_read(start_addr, &error_block_addr,
+ sizeof(error_block_addr));
- error_block_addr = le64_to_cpu(error_block_addr);
+ error_block_addr = le64_to_cpu(error_block_addr);
- read_ack_register_addr = start_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
+ read_ack_register_addr = start_addr +
+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
- cpu_physical_memory_read(read_ack_register_addr,
- &read_ack_register, sizeof(read_ack_register));
+ cpu_physical_memory_read(read_ack_register_addr,
+ &read_ack_register, sizeof(read_ack_register));
- /* zero means OSPM does not acknowledge the error */
- if (!read_ack_register) {
- error_report("OSPM does not acknowledge previous error,"
- " so can not record CPER for current error anymore");
- } else if (error_block_addr) {
- read_ack_register = cpu_to_le64(0);
- /*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
- */
- cpu_physical_memory_write(read_ack_register_addr,
- &read_ack_register, sizeof(uint64_t));
+ /* zero means OSPM does not acknowledge the error */
+ if (!read_ack_register) {
+ error_report("OSPM does not acknowledge previous error,"
+ " so can not record CPER for current error anymore");
+ } else if (error_block_addr) {
+ read_ack_register = cpu_to_le64(0);
+ /*
+ * Clear the Read Ack Register, OSPM will write it to 1 when
+ * it acknowledges this error.
+ */
+ cpu_physical_memory_write(read_ack_register_addr,
+ &read_ack_register, sizeof(uint64_t));
- ret = acpi_ghes_record_mem_error(error_block_addr,
- physical_address);
- } else
- error_report("can not find Generic Error Status Block");
+ ret = acpi_ghes_record_mem_error(error_block_addr,
+ physical_address);
+ } else {
+ error_report("can not find Generic Error Status Block");
}
return ret;
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 03/15] acpi/ghes: simplify the per-arch caller to build HEST table
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:11 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 04/15] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
` (11 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
The GHES driver requires not only a HEST table, but also a
separate firmware file to store Error Structure records.
It can't do one without the other.
Simplify the caller logic for it to require one function.
This prepares for further changes where the HEST table
generation will become more generic.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Changes from v10:
- Removed the logic which associates notification and source
ID. This will be placed on a separate patch.
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 7 +++++--
hw/arm/virt-acpi-build.c | 5 ++---
include/hw/acpi/ghes.h | 4 ++--
3 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index dacbd4d0c093..7b42ed59cd15 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -233,7 +233,7 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
* Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
* See docs/specs/acpi_hest_ghes.rst for blobs format.
*/
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
{
int i, error_status_block_offset;
@@ -356,12 +356,15 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
}
/* Build Hardware Error Source Table */
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
const char *oem_id, const char *oem_table_id)
{
AcpiTable table = { .sig = "HEST", .rev = 1,
.oem_id = oem_id, .oem_table_id = oem_table_id };
+ build_ghes_error_table(hardware_errors, linker);
+
acpi_table_begin(&table, table_data);
/* Error Source Count */
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f76fb117adff..bafd9a56c217 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -943,10 +943,9 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
build_dbg2(tables_blob, tables->linker, vms);
if (vms->ras) {
- build_ghes_error_table(tables->hardware_errors, tables->linker);
acpi_add_table(table_offsets, tables_blob);
- acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
- vms->oem_table_id);
+ acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+ vms->oem_id, vms->oem_table_id);
}
if (ms->numa_state->num_nodes > 0) {
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 59e3b8fb24b9..20016c226d1f 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -68,8 +68,8 @@ typedef struct AcpiGhesState {
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 04/15] acpi/ghes: better handle source_id and notification
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (2 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 03/15] acpi/ghes: simplify the per-arch caller to build HEST table Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:13 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument Mauro Carvalho Chehab
` (10 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
GHES has two fields that are stored on HEST error source
blocks:
- notification type, which is a number defined at the ACPI spec
containing several arch-specific synchronous and assynchronous
types;
- source id, which is a HW/FW defined number, used to distinguish
between different implemented hardware report mechanisms.
Cleanup the logic to fill those, as they should be handled
independently.
This is a preparation for a future patch that will shift
those fields to the HEST init function call.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Chenges from v10:
- Some changes got moved to the previous patch.
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 7b42ed59cd15..7460cd1a8d56 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -284,9 +284,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
-static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
+static void build_ghes_v2(GArray *table_data,
+ BIOSLinker *linker,
+ enum AcpiGhesNotifyType notify,
+ uint16_t source_id)
{
uint64_t address_offset;
+
/*
* Type:
* Generic Hardware Error Source version 2(GHESv2 - Type 10)
@@ -316,18 +320,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
- switch (source_id) {
- case ACPI_HEST_SRC_ID_SEA:
- /*
- * Notification Structure
- * Now only enable ARMv8 SEA notification type
- */
- build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
- break;
- default:
- error_report("Not support this error source");
- abort();
- }
+ /* Notification Structure */
+ build_ghes_hw_error_notification(table_data, notify);
/* Error Status Block Length */
build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
@@ -369,7 +363,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
/* Error Source Count */
build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
- build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+ build_ghes_v2(table_data, linker,
+ ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
acpi_table_end(linker, &table);
}
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (3 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 04/15] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:15 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 06/15] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
` (9 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
The first argument is source ID and not notification type.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
include/hw/acpi/ghes.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 20016c226d1f..50e3a25ea384 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -73,7 +73,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 06/15] acpi/ghes: Remove a duplicated out of bounds check
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (4 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:15 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 07/15] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
` (8 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
acpi_ghes_record_errors() has an assert() at the beginning
to ensure that source_id will be lower than
ACPI_GHES_ERROR_SOURCE_COUNT. Remove a duplicated check.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
hw/acpi/ghes.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 7460cd1a8d56..b932b6fe2c2e 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -404,9 +404,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
return -1;
}
- if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
- start_addr += source_id * sizeof(uint64_t);
- }
+ start_addr += source_id * sizeof(uint64_t);
cpu_physical_memory_read(start_addr, &error_block_addr,
sizeof(error_block_addr));
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 07/15] acpi/ghes: Change the type for source_id
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (5 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 06/15] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:16 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 08/15] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
` (7 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
HEST source ID is actually a 16-bit value
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 2 +-
include/hw/acpi/ghes.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index c315de1802d6..2b64cbd2819a 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
{
return -1;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b932b6fe2c2e..29240aa139d5 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -383,7 +383,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
{
uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
uint64_t start_addr;
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 50e3a25ea384..9295e46be25e 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -73,7 +73,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 08/15] acpi/ghes: Prepare to support multiple sources on ghes
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (6 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 07/15] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-25 14:23 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 09/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
` (6 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
The current code is actually dependent on having just one
error structure with a single source.
As the number of sources should be arch-dependent, as it
will depend on what kind of synchronous/assynchronous
notifications will exist, change the logic to dynamically
build the table.
Yet, for a proper support, we need to get the number of
sources by reading the number from the HEST table. However,
bios currently doesn't store a pointer to it.
For now just change the logic at table build time, while
enforcing that it will behave like before with a single
source ID.
A future patch will add a HEST table bios pointer and
change the logic at acpi_ghes_record_errors() to
dynamically use the new size.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 65 ++++++++++++++++++++++++----------------
hw/arm/virt-acpi-build.c | 5 ++++
include/hw/acpi/ghes.h | 21 ++++++++-----
3 files changed, 59 insertions(+), 32 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 29240aa139d5..340a0263faf8 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -233,17 +233,26 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
* Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
* See docs/specs/acpi_hest_ghes.rst for blobs format.
*/
-static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
+ int num_sources)
{
int i, error_status_block_offset;
+ /*
+ * TODO: Current version supports only one source.
+ * A further patch will drop this check, after adding a proper migration
+ * code, as, for the code to work, we need to store a bios pointer to the
+ * HEST table.
+ */
+ assert(num_sources == 1);
+
/* Build error_block_address */
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
}
/* Build read_ack_register */
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
/*
* Initialize the value of read_ack_register to 1, so GHES can be
* writable after (re)boot.
@@ -258,13 +267,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
/* Reserve space for Error Status Data Block */
acpi_data_push(hardware_errors,
- ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+ ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
/* Tell guest firmware to place hardware_errors blob into RAM */
bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
hardware_errors, sizeof(uint64_t), false);
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
/*
* Tell firmware to patch error_block_address entries to point to
* corresponding "Generic Error Status Block"
@@ -286,10 +295,12 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
/* Build Generic Hardware Error Source version 2 (GHESv2) */
static void build_ghes_v2(GArray *table_data,
BIOSLinker *linker,
- enum AcpiGhesNotifyType notify,
- uint16_t source_id)
+ const AcpiNotificationSourceId *notif_src,
+ uint16_t index, int num_sources)
{
uint64_t address_offset;
+ const uint16_t notify = notif_src->notify;
+ const uint16_t source_id = notif_src->source_id;
/*
* Type:
@@ -318,7 +329,7 @@ static void build_ghes_v2(GArray *table_data,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
+ ACPI_GHES_ERRORS_FW_CFG_FILE, index * sizeof(uint64_t));
/* Notification Structure */
build_ghes_hw_error_notification(table_data, notify);
@@ -335,9 +346,10 @@ static void build_ghes_v2(GArray *table_data,
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
- (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_GHES_ERRORS_FW_CFG_FILE,
+ (num_sources + index) * sizeof(uint64_t));
/*
* Read Ack Preserve field
@@ -352,19 +364,23 @@ static void build_ghes_v2(GArray *table_data,
/* Build Hardware Error Source Table */
void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
BIOSLinker *linker,
+ const AcpiNotificationSourceId * const notif_source,
+ int num_sources,
const char *oem_id, const char *oem_table_id)
{
AcpiTable table = { .sig = "HEST", .rev = 1,
.oem_id = oem_id, .oem_table_id = oem_table_id };
+ int i;
- build_ghes_error_table(hardware_errors, linker);
+ build_ghes_error_table(hardware_errors, linker, num_sources);
acpi_table_begin(&table, table_data);
/* Error Source Count */
- build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
- build_ghes_v2(table_data, linker,
- ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
+ build_append_int_noprefix(table_data, num_sources, 4);
+ for (i = 0; i < num_sources; i++) {
+ build_ghes_v2(table_data, linker, ¬if_source[i], i, num_sources);
+ }
acpi_table_end(linker, &table);
}
@@ -391,28 +407,27 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
- assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
-
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
g_assert(acpi_ged_state);
ags = &acpi_ged_state->ghes_state;
+ /*
+ * Current version supports only one source, as assured during table build,
+ * so no need to change the start offset based on the source ID.
+ */
start_addr = le64_to_cpu(ags->ghes_addr_le);
- if (!physical_address) {
- return -1;
- }
-
- start_addr += source_id * sizeof(uint64_t);
-
cpu_physical_memory_read(start_addr, &error_block_addr,
sizeof(error_block_addr));
error_block_addr = le64_to_cpu(error_block_addr);
- read_ack_register_addr = start_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
+ /*
+ * As the current version supports only one source, the ack offset is
+ * just sizeof(uint64_t).
+ */
+ read_ack_register_addr = start_addr + sizeof(uint64_t);
cpu_physical_memory_read(read_ack_register_addr,
&read_ack_register, sizeof(read_ack_register));
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index bafd9a56c217..476c365851c4 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -890,6 +890,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
}
+static const AcpiNotificationSourceId hest_ghes_notify[] = {
+ {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
+};
+
static
void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
{
@@ -945,6 +949,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
if (vms->ras) {
acpi_add_table(table_offsets, tables_blob);
acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+ hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
vms->oem_id, vms->oem_table_id);
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 9295e46be25e..d6e2801d9cd9 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -56,20 +56,27 @@ enum AcpiGhesNotifyType {
ACPI_GHES_NOTIFY_RESERVED = 12
};
-enum {
- ACPI_HEST_SRC_ID_SEA = 0,
- /* future ids go here */
-
- ACPI_GHES_ERROR_SOURCE_COUNT
-};
-
typedef struct AcpiGhesState {
uint64_t ghes_addr_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
+/*
+ * ID numbers used to fill HEST source ID field
+ */
+enum AcpiGhesSourceID {
+ ACPI_HEST_SRC_ID_SYNC,
+};
+
+typedef struct AcpiNotificationSourceId {
+ enum AcpiGhesSourceID source_id;
+ enum AcpiGhesNotifyType notify;
+} AcpiNotificationSourceId;
+
void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
BIOSLinker *linker,
+ const AcpiNotificationSourceId * const notif_source,
+ int num_sources,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 09/15] acpi/ghes: make the GHES record generation more generic
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (7 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 08/15] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-26 12:00 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 10/15] acpi/ghes: move offset calculus to a separate function Mauro Carvalho Chehab
` (5 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Split the code into separate functions to allow using the
common CPER filling code by different error sources.
The generic code was moved to ghes_record_cper_errors(),
and ghes_gen_err_data_uncorrectable_recoverable() now contains
only a logic to fill GEGB part of the record.
The remaining code to generate a memory error now belongs to
acpi_ghes_record_errors() function.
A further patch will give it a better name.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 118 +++++++++++++++++++++++++----------------
include/hw/acpi/ghes.h | 3 ++
2 files changed, 74 insertions(+), 47 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 340a0263faf8..307b5a41d539 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -181,51 +181,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
build_append_int_noprefix(table, 0, 7);
}
-static int acpi_ghes_record_mem_error(uint64_t error_block_address,
- uint64_t error_physical_addr)
+static void
+ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
+ const uint8_t *section_type,
+ int data_length)
{
- GArray *block;
-
- /* Memory Error Section Type */
- const uint8_t uefi_cper_mem_sec[] =
- UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
- 0xED, 0x7C, 0x83, 0xB1);
-
/* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
* Table 17-13 Generic Error Data Entry
*/
QemuUUID fru_id = {};
- uint32_t data_length;
- block = g_array_new(false, true /* clear */, 1);
-
- /* This is the length if adding a new generic error data entry*/
- data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
/*
- * It should not run out of the preallocated memory if adding a new generic
- * error data entry
+ * Calculate the size with this block. No need to check for
+ * too big CPER, as CPER size is checked at ghes_record_cper_errors()
*/
- assert((data_length + ACPI_GHES_GESB_SIZE) <=
- ACPI_GHES_MAX_RAW_DATA_LENGTH);
+ data_length += ACPI_GHES_GESB_SIZE;
/* Build the new generic error status block header */
acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
/* Build this new generic error data entry header */
- acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
+ acpi_ghes_generic_error_data(block, section_type,
ACPI_CPER_SEV_RECOVERABLE, 0, 0,
ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
-
- /* Build the memory section CPER for above new generic error data entry */
- acpi_ghes_build_append_mem_cper(block, error_physical_addr);
-
- /* Write the generic error data entry into guest memory */
- cpu_physical_memory_write(error_block_address, block->data, block->len);
-
- g_array_free(block, true);
-
- return 0;
}
/*
@@ -399,14 +378,19 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp)
{
uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
uint64_t start_addr;
- bool ret = -1;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
+ if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+ error_setg(errp, "GHES CPER record is too big: %ld", len);
+ return;
+ }
+
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
g_assert(acpi_ged_state);
@@ -422,6 +406,10 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
sizeof(error_block_addr));
error_block_addr = le64_to_cpu(error_block_addr);
+ if (!error_block_addr) {
+ error_setg(errp, "can not find Generic Error Status Block");
+ return;
+ }
/*
* As the current version supports only one source, the ack offset is
@@ -434,24 +422,60 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
/* zero means OSPM does not acknowledge the error */
if (!read_ack_register) {
- error_report("OSPM does not acknowledge previous error,"
- " so can not record CPER for current error anymore");
- } else if (error_block_addr) {
- read_ack_register = cpu_to_le64(0);
- /*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
- */
- cpu_physical_memory_write(read_ack_register_addr,
- &read_ack_register, sizeof(uint64_t));
-
- ret = acpi_ghes_record_mem_error(error_block_addr,
- physical_address);
- } else {
- error_report("can not find Generic Error Status Block");
+ error_setg(errp,
+ "OSPM does not acknowledge previous error,"
+ " so can not record CPER for current error anymore");
+ return;
}
- return ret;
+ read_ack_register = cpu_to_le64(0);
+ /*
+ * Clear the Read Ack Register, OSPM will write it to 1 when
+ * it acknowledges this error.
+ */
+ cpu_physical_memory_write(read_ack_register_addr,
+ &read_ack_register, sizeof(uint64_t));
+
+ /* Write the generic error data entry into guest memory */
+ cpu_physical_memory_write(error_block_addr, cper, len);
+
+ return;
+}
+
+int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+{
+ /* Memory Error Section Type */
+ const uint8_t guid[] =
+ UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
+ 0xED, 0x7C, 0x83, 0xB1);
+ Error *errp = NULL;
+ GArray *block;
+
+ if (!physical_address) {
+ error_report("can not find Generic Error Status Block for source id %d",
+ source_id);
+ return -1;
+ }
+
+ block = g_array_new(false, true /* clear */, 1);
+
+ ghes_gen_err_data_uncorrectable_recoverable(block, guid,
+ ACPI_GHES_MAX_RAW_DATA_LENGTH);
+
+ /* Build the memory section CPER for above new generic error data entry */
+ acpi_ghes_build_append_mem_cper(block, physical_address);
+
+ /* Report the error */
+ ghes_record_cper_errors(block->data, block->len, source_id, &errp);
+
+ g_array_free(block, true);
+
+ if (errp) {
+ error_report_err(errp);
+ return -1;
+ }
+
+ return 0;
}
bool acpi_ghes_present(void)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index d6e2801d9cd9..1b988ac1e2f2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
#define ACPI_GHES_H
#include "hw/acpi/bios-linker-loader.h"
+#include "qapi/error.h"
/*
* Values for Hardware Error Notification Type field
@@ -80,6 +81,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp);
int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
/**
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 10/15] acpi/ghes: move offset calculus to a separate function
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (8 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 09/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-26 12:03 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 11/15] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
` (4 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Currently, CPER address location is calculated as an offset of
the hardware_errors table. It is also badly named, as the
offset actually used is the address where the CPER data starts,
and not the beginning of the error source.
Move the logic which calculates such offset to a separate
function, in preparation for a patch that will be changing the
logic to calculate it from the HEST table.
While here, properly name the variable which stores the cper
address.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 52 +++++++++++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 20 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 307b5a41d539..900f1571bc97 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -378,11 +378,36 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
+static void get_ghes_offsets(uint64_t ghes_addr,
+ uint64_t *cper_addr,
+ uint64_t *read_ack_register_addr)
+{
+ if (!ghes_addr) {
+ return;
+ }
+
+ /*
+ * non-HEST version supports only one source, so no need to change
+ * the start offset based on the source ID. Also, we can't validate
+ * the source ID, as it is stored inside the HEST table.
+ */
+
+ cpu_physical_memory_read(ghes_addr, cper_addr,
+ sizeof(*cper_addr));
+
+ *cper_addr = le64_to_cpu(*cper_addr);
+
+ /*
+ * As the current version supports only one source, the ack offset is
+ * just sizeof(uint64_t).
+ */
+ *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
+}
+
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp)
{
- uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
- uint64_t start_addr;
+ uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
@@ -396,29 +421,16 @@ void ghes_record_cper_errors(const void *cper, size_t len,
g_assert(acpi_ged_state);
ags = &acpi_ged_state->ghes_state;
- /*
- * Current version supports only one source, as assured during table build,
- * so no need to change the start offset based on the source ID.
- */
- start_addr = le64_to_cpu(ags->ghes_addr_le);
+ get_ghes_offsets(le64_to_cpu(ags->ghes_addr_le),
+ &cper_addr, &read_ack_register_addr);
- cpu_physical_memory_read(start_addr, &error_block_addr,
- sizeof(error_block_addr));
-
- error_block_addr = le64_to_cpu(error_block_addr);
- if (!error_block_addr) {
+ if (!cper_addr) {
error_setg(errp, "can not find Generic Error Status Block");
return;
}
- /*
- * As the current version supports only one source, the ack offset is
- * just sizeof(uint64_t).
- */
- read_ack_register_addr = start_addr + sizeof(uint64_t);
-
cpu_physical_memory_read(read_ack_register_addr,
- &read_ack_register, sizeof(read_ack_register));
+ &read_ack_register, sizeof(read_ack_register));
/* zero means OSPM does not acknowledge the error */
if (!read_ack_register) {
@@ -437,7 +449,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
&read_ack_register, sizeof(uint64_t));
/* Write the generic error data entry into guest memory */
- cpu_physical_memory_write(error_block_addr, cper, len);
+ cpu_physical_memory_write(cper_addr, cper, len);
return;
}
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 11/15] acpi/ghes: better name GHES memory error function
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (9 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 10/15] acpi/ghes: move offset calculus to a separate function Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-26 12:07 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 12/15] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
` (3 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
Peter Maydell, kvm, linux-kernel, qemu-arm, qemu-devel
The current function used to generate GHES data is specific for
memory errors. Give a better name for it, as we now have a generic
function as well.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 2 +-
include/hw/acpi/ghes.h | 5 +++--
target/arm/kvm.c | 3 ++-
4 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 2b64cbd2819a..7cec1812dad9 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
{
return -1;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 900f1571bc97..3af1cd16d4d7 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -454,7 +454,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
return;
}
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
{
/* Memory Error Section Type */
const uint8_t guid[] =
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 1b988ac1e2f2..051a9322141f 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -81,15 +81,16 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
+int acpi_ghes_memory_errors(uint16_t source_id,
+ uint64_t error_physical_addr);
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp);
-int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
*
* Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_record_errors() to record a memory error.
+ * safe to call acpi_ghes_memory_errors() to record a memory error.
*/
bool acpi_ghes_present(void);
#endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 849e2e21b304..57192285fb96 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2373,7 +2373,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
*/
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
- if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+ if (!acpi_ghes_memory_errors(ARM_ACPI_HEST_SRC_ID_SYNC,
+ paddr)) {
kvm_inject_arm_sea(c);
} else {
error_report("failed to record the error");
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 12/15] acpi/ghes: don't crash QEMU if ghes GED is not found
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (10 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 11/15] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-26 12:09 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 13/15] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
` (2 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Instead, produce an error and continue working
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 3af1cd16d4d7..209095f67e9a 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -418,7 +418,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
- g_assert(acpi_ged_state);
+ if (!acpi_ged_state) {
+ error_setg(errp, "Can't find ACPI_GED object");
+ return;
+ }
ags = &acpi_ged_state->ghes_state;
get_ghes_offsets(le64_to_cpu(ags->ghes_addr_le),
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 13/15] acpi/ghes: rename etc/hardware_error file macros
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (11 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 12/15] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-26 12:11 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 14/15] better name the offset of the hardware error firmware Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 15/15] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Now that we have also have a file to store HEST data location,
which is part of GHES, better name the file where CPER records
are stored.
No functional changes.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 209095f67e9a..3d03506fdaf8 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -28,8 +28,8 @@
#include "hw/nvram/fw_cfg.h"
#include "qemu/uuid.h"
-#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
-#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
+#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors"
+#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
@@ -249,7 +249,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
/* Tell guest firmware to place hardware_errors blob into RAM */
- bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
+ bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
hardware_errors, sizeof(uint64_t), false);
for (i = 0; i < num_sources; i++) {
@@ -258,17 +258,21 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
* corresponding "Generic Error Status Block"
*/
bios_linker_loader_add_pointer(linker,
- ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
- error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ sizeof(uint64_t) * i,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ error_status_block_offset +
+ i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
}
/*
* tell firmware to write hardware_errors GPA into
* hardware_errors_addr fw_cfg, once the former has been initialized.
*/
- bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
- 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
+ bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE, 0);
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
@@ -307,8 +311,10 @@ static void build_ghes_v2(GArray *table_data,
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE, index * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ index * sizeof(uint64_t));
/* Notification Structure */
build_ghes_hw_error_notification(table_data, notify);
@@ -327,7 +333,7 @@ static void build_ghes_v2(GArray *table_data,
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE,
+ ACPI_HW_ERROR_FW_CFG_FILE,
(num_sources + index) * sizeof(uint64_t));
/*
@@ -368,11 +374,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
GArray *hardware_error)
{
/* Create a read-only fw_cfg file for GHES */
- fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+ fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
hardware_error->len);
/* Create a read-write fw_cfg file for Address */
- fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+ fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
ags->present = true;
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 14/15] better name the offset of the hardware error firmware
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (12 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 13/15] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-26 12:12 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 15/15] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
The hardware error firmware is where HEST error structures are
stored. Those can be GHESv2, but they can also be other types.
Better name the location of the hardware error.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/generic_event_device.c | 4 ++--
hw/acpi/ghes.c | 4 ++--
include/hw/acpi/ghes.h | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 15b4c3ebbf24..d4dbfb45e181 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -346,7 +346,7 @@ static const VMStateDescription vmstate_ghes = {
.version_id = 1,
.minimum_version_id = 1,
.fields = (const VMStateField[]) {
- VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
+ VMSTATE_UINT64(hw_error_le, AcpiGhesState),
VMSTATE_END_OF_LIST()
},
};
@@ -354,7 +354,7 @@ static const VMStateDescription vmstate_ghes = {
static bool ghes_needed(void *opaque)
{
AcpiGedState *s = opaque;
- return s->ghes_state.ghes_addr_le;
+ return s->ghes_state.hw_error_le;
}
static const VMStateDescription vmstate_ghes_state = {
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 3d03506fdaf8..8b3292be07e7 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -379,7 +379,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
/* Create a read-write fw_cfg file for Address */
fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
- NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+ NULL, &(ags->hw_error_le), sizeof(ags->hw_error_le), false);
ags->present = true;
}
@@ -430,7 +430,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
}
ags = &acpi_ged_state->ghes_state;
- get_ghes_offsets(le64_to_cpu(ags->ghes_addr_le),
+ get_ghes_offsets(le64_to_cpu(ags->hw_error_le),
&cper_addr, &read_ack_register_addr);
if (!cper_addr) {
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 051a9322141f..e47ffacbb5c9 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -58,7 +58,7 @@ enum AcpiGhesNotifyType {
};
typedef struct AcpiGhesState {
- uint64_t ghes_addr_le;
+ uint64_t hw_error_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 15/15] docs: acpi_hest_ghes: fix documentation for CPER size
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
` (13 preceding siblings ...)
2024-09-25 4:04 ` [PATCH 14/15] better name the offset of the hardware error firmware Mauro Carvalho Chehab
@ 2024-09-25 4:04 ` Mauro Carvalho Chehab
2024-09-26 12:13 ` Jonathan Cameron via
14 siblings, 1 reply; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:04 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Dongjiu Geng,
linux-kernel, qemu-arm, qemu-devel
While the spec defines a CPER size of 4KiB for each record,
currently it is set to 1KiB. Fix the documentation and add
a pointer to the macro name there, as this may help to keep
it updated.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
---
docs/specs/acpi_hest_ghes.rst | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index 68f1fbe0a4af..c3e9f8d9a702 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -67,8 +67,10 @@ Design Details
(3) The address registers table contains N Error Block Address entries
and N Read Ack Register entries. The size for each entry is 8-byte.
The Error Status Data Block table contains N Error Status Data Block
- entries. The size for each entry is 4096(0x1000) bytes. The total size
- for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+ entries. The size for each entry is defined at the source code as
+ ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
+ for the "etc/hardware_errors" fw_cfg blob is
+ (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
N is the number of the kinds of hardware error sources.
(4) QEMU generates the ACPI linker/loader script for the firmware. The
--
2.46.1
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
2024-09-25 4:04 ` [PATCH 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
@ 2024-09-25 14:02 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:02 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:06 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> This is just duplicating ACPI_GHES_ERROR_SOURCE_COUNT, which
> has a better name. So, drop the duplication.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code
2024-09-25 4:04 ` [PATCH 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
@ 2024-09-25 14:09 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:09 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:07 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Reduce the ident of the function and prepares it for
> the next changes.
>
> No functional changes.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Some of the alignment doesn't seem to match local style which
is either align after ( or align 4 spaces in from line above for
multiple line argument lists.
Code is fine as I guess the later structure is to prepare
for additions that prevent early returns being sensible.
With that in mind.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> hw/acpi/ghes.c | 56 ++++++++++++++++++++++++++------------------------
> 1 file changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index dc217694deb9..dacbd4d0c093 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -402,40 +402,42 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
>
> start_addr = le64_to_cpu(ags->ghes_addr_le);
>
> - if (physical_address) {
> + if (!physical_address) {
> + return -1;
> + }
>
> - if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
> - start_addr += source_id * sizeof(uint64_t);
> - }
> + if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
> + start_addr += source_id * sizeof(uint64_t);
> + }
>
> - cpu_physical_memory_read(start_addr, &error_block_addr,
> - sizeof(error_block_addr));
> + cpu_physical_memory_read(start_addr, &error_block_addr,
> + sizeof(error_block_addr));
>
> - error_block_addr = le64_to_cpu(error_block_addr);
> + error_block_addr = le64_to_cpu(error_block_addr);
>
> - read_ack_register_addr = start_addr +
> - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> + read_ack_register_addr = start_addr +
> + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
>
> - cpu_physical_memory_read(read_ack_register_addr,
> - &read_ack_register, sizeof(read_ack_register));
> + cpu_physical_memory_read(read_ack_register_addr,
> + &read_ack_register, sizeof(read_ack_register));
Alignment looks fishy plus maybe move &read_ack_register up a line.
>
> - /* zero means OSPM does not acknowledge the error */
> - if (!read_ack_register) {
> - error_report("OSPM does not acknowledge previous error,"
> - " so can not record CPER for current error anymore");
> - } else if (error_block_addr) {
> - read_ack_register = cpu_to_le64(0);
> - /*
> - * Clear the Read Ack Register, OSPM will write it to 1 when
> - * it acknowledges this error.
> - */
> - cpu_physical_memory_write(read_ack_register_addr,
> - &read_ack_register, sizeof(uint64_t));
> + /* zero means OSPM does not acknowledge the error */
> + if (!read_ack_register) {
> + error_report("OSPM does not acknowledge previous error,"
> + " so can not record CPER for current error anymore");
I guess later changes make it unwise to just return -1 here.
> + } else if (error_block_addr) {
> + read_ack_register = cpu_to_le64(0);
> + /*
> + * Clear the Read Ack Register, OSPM will write it to 1 when
> + * it acknowledges this error.
> + */
> + cpu_physical_memory_write(read_ack_register_addr,
> + &read_ack_register, sizeof(uint64_t));
>
> - ret = acpi_ghes_record_mem_error(error_block_addr,
> - physical_address);
> - } else
> - error_report("can not find Generic Error Status Block");
> + ret = acpi_ghes_record_mem_error(error_block_addr,
> + physical_address);
or return here.
> + } else {
> + error_report("can not find Generic Error Status Block");
> }
>
> return ret;
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 03/15] acpi/ghes: simplify the per-arch caller to build HEST table
2024-09-25 4:04 ` [PATCH 03/15] acpi/ghes: simplify the per-arch caller to build HEST table Mauro Carvalho Chehab
@ 2024-09-25 14:11 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:11 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm,
qemu-devel
On Wed, 25 Sep 2024 06:04:08 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The GHES driver requires not only a HEST table, but also a
> separate firmware file to store Error Structure records.
> It can't do one without the other.
>
> Simplify the caller logic for it to require one function.
>
> This prepares for further changes where the HEST table
> generation will become more generic.
>
> No functional changes.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> ---
>
> Changes from v10:
> - Removed the logic which associates notification and source
> ID. This will be placed on a separate patch.
>
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
> That should make easier to add support for 86.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Odd sign off here ;)
Otherwise seems reasonable to me.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 04/15] acpi/ghes: better handle source_id and notification
2024-09-25 4:04 ` [PATCH 04/15] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
@ 2024-09-25 14:13 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:13 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:09 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> GHES has two fields that are stored on HEST error source
> blocks:
>
> - notification type, which is a number defined at the ACPI spec
> containing several arch-specific synchronous and assynchronous
> types;
> - source id, which is a HW/FW defined number, used to distinguish
> between different implemented hardware report mechanisms.
>
> Cleanup the logic to fill those, as they should be handled
> independently.
>
> This is a preparation for a future patch that will shift
> those fields to the HEST init function call.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Trivial comment inline.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> ---
>
> Chenges from v10:
Changes
>
> - Some changes got moved to the previous patch.
>
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
> That should make easier to add support for 86.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 7b42ed59cd15..7460cd1a8d56 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -284,9 +284,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> }
>
> /* Build Generic Hardware Error Source version 2 (GHESv2) */
> -static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> +static void build_ghes_v2(GArray *table_data,
> + BIOSLinker *linker,
> + enum AcpiGhesNotifyType notify,
> + uint16_t source_id)
> {
> uint64_t address_offset;
> +
Technically a stray change but meh there should have always been a blank
line here.
> /*
> * Type:
> * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> @@ -316,18 +320,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument
2024-09-25 4:04 ` [PATCH 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument Mauro Carvalho Chehab
@ 2024-09-25 14:15 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:15 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:10 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The first argument is source ID and not notification type.
Maybe call out that the header already differs from the two
implementations in naming this parameter. This just corrects
that.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> ---
>
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
> That should make easier to add support for 86.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> include/hw/acpi/ghes.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 20016c226d1f..50e3a25ea384 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -73,7 +73,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
> +int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
>
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 06/15] acpi/ghes: Remove a duplicated out of bounds check
2024-09-25 4:04 ` [PATCH 06/15] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
@ 2024-09-25 14:15 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:15 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:11 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> acpi_ghes_record_errors() has an assert() at the beginning
> to ensure that source_id will be lower than
> ACPI_GHES_ERROR_SOURCE_COUNT. Remove a duplicated check.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 07/15] acpi/ghes: Change the type for source_id
2024-09-25 4:04 ` [PATCH 07/15] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
@ 2024-09-25 14:16 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:16 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:12 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> HEST source ID is actually a 16-bit value
Indeed.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes-stub.c | 2 +-
> hw/acpi/ghes.c | 2 +-
> include/hw/acpi/ghes.h | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index c315de1802d6..2b64cbd2819a 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -11,7 +11,7 @@
> #include "qemu/osdep.h"
> #include "hw/acpi/ghes.h"
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> {
> return -1;
> }
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index b932b6fe2c2e..29240aa139d5 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -383,7 +383,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> {
> uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> uint64_t start_addr;
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 50e3a25ea384..9295e46be25e 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -73,7 +73,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
> +int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
>
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 08/15] acpi/ghes: Prepare to support multiple sources on ghes
2024-09-25 4:04 ` [PATCH 08/15] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
@ 2024-09-25 14:23 ` Jonathan Cameron via
2024-10-01 5:29 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-25 14:23 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm,
qemu-devel
On Wed, 25 Sep 2024 06:04:13 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The current code is actually dependent on having just one
> error structure with a single source.
>
> As the number of sources should be arch-dependent, as it
> will depend on what kind of synchronous/assynchronous
> notifications will exist, change the logic to dynamically
> build the table.
Not really arch dependent. Depends on both arch and some
firmware implementation choices, but I guess that detail
doesn't matter here.
>
> Yet, for a proper support, we need to get the number of
> sources by reading the number from the HEST table. However,
> bios currently doesn't store a pointer to it.
>
> For now just change the logic at table build time, while
> enforcing that it will behave like before with a single
> source ID.
>
> A future patch will add a HEST table bios pointer and
> change the logic at acpi_ghes_record_errors() to
> dynamically use the new size.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Trivial comment inline
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> @@ -335,9 +346,10 @@ static void build_ghes_v2(GArray *table_data,
> build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> 4 /* QWord access */, 0);
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> - address_offset + GAS_ADDR_OFFSET,
> - sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
> - (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t));
> + address_offset + GAS_ADDR_OFFSET,
I'd prefer if we avoided realigning unless absolutely necessary or
that it is split into a separate patch.
Makes things a tiny bit harder to review.
> + sizeof(uint64_t),
> + ACPI_GHES_ERRORS_FW_CFG_FILE,
> + (num_sources + index) * sizeof(uint64_t));
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 09/15] acpi/ghes: make the GHES record generation more generic
2024-09-25 4:04 ` [PATCH 09/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
@ 2024-09-26 12:00 ` Jonathan Cameron via
2024-09-26 14:19 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-26 12:00 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:14 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Split the code into separate functions to allow using the
> common CPER filling code by different error sources.
>
> The generic code was moved to ghes_record_cper_errors(),
> and ghes_gen_err_data_uncorrectable_recoverable() now contains
> only a logic to fill GEGB part of the record.
>
> The remaining code to generate a memory error now belongs to
> acpi_ghes_record_errors() function.
>
> A further patch will give it a better name.
That bit is fine, I'm less sure about
ghes_gen_err_data_uncorrectable_recoverable()
Maybe you refactor that later, but I'd suggest doing so in this
patch to make it
ghes_gen_data() with the uncorrectable and recoverable bits
passed in as parameters.
Jonathan
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 118 +++++++++++++++++++++++++----------------
> include/hw/acpi/ghes.h | 3 ++
> 2 files changed, 74 insertions(+), 47 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 340a0263faf8..307b5a41d539 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -181,51 +181,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
> build_append_int_noprefix(table, 0, 7);
> }
>
> -static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> - uint64_t error_physical_addr)
> +static void
> +ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
> + const uint8_t *section_type,
> + int data_length)
> {
That's an ugly name . Suggestion below on instead passing parameters
for the uncorrectable and recoverable parts and amking this
ghes_gen_err_data()
> - GArray *block;
> -
> - /* Memory Error Section Type */
> - const uint8_t uefi_cper_mem_sec[] =
> - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> - 0xED, 0x7C, 0x83, 0xB1);
> -
> /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> * Table 17-13 Generic Error Data Entry
> */
> QemuUUID fru_id = {};
> - uint32_t data_length;
>
> - block = g_array_new(false, true /* clear */, 1);
> -
> - /* This is the length if adding a new generic error data entry*/
> - data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> /*
> - * It should not run out of the preallocated memory if adding a new generic
> - * error data entry
> + * Calculate the size with this block. No need to check for
> + * too big CPER, as CPER size is checked at ghes_record_cper_errors()
> */
> - assert((data_length + ACPI_GHES_GESB_SIZE) <=
> - ACPI_GHES_MAX_RAW_DATA_LENGTH);
> + data_length += ACPI_GHES_GESB_SIZE;
>
> /* Build the new generic error status block header */
> acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
>
> /* Build this new generic error data entry header */
> - acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
> + acpi_ghes_generic_error_data(block, section_type,
> ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
Maybe should just pass in ACPI_CPER_SEV_RECOVERABLE
and ACPI_GEBS_UNCORRECTABLE in parameters.
Main advantage being that should allow reuse for other combinations
and it gets rid of the nasty function name!
> -
> - /* Build the memory section CPER for above new generic error data entry */
> - acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> -
> - /* Write the generic error data entry into guest memory */
> - cpu_physical_memory_write(error_block_address, block->data, block->len);
> -
> - g_array_free(block, true);
> -
> - return 0;
> }
>
> /*
> @@ -399,14 +378,19 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> -int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> +void ghes_record_cper_errors(const void *cper, size_t len,
> + uint16_t source_id, Error **errp)
> {
> - return ret;
> + read_ack_register = cpu_to_le64(0);
> + /*
> + * Clear the Read Ack Register, OSPM will write it to 1 when
> + * it acknowledges this error.
> + */
Indent issue.
> + cpu_physical_memory_write(read_ack_register_addr,
> + &read_ack_register, sizeof(uint64_t));
> +
> + /* Write the generic error data entry into guest memory */
> + cpu_physical_memory_write(error_block_addr, cper, len);
> +
> + return;
> +}
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 10/15] acpi/ghes: move offset calculus to a separate function
2024-09-25 4:04 ` [PATCH 10/15] acpi/ghes: move offset calculus to a separate function Mauro Carvalho Chehab
@ 2024-09-26 12:03 ` Jonathan Cameron via
2024-10-01 5:38 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-26 12:03 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:15 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Currently, CPER address location is calculated as an offset of
> the hardware_errors table. It is also badly named, as the
> offset actually used is the address where the CPER data starts,
> and not the beginning of the error source.
>
> Move the logic which calculates such offset to a separate
> function, in preparation for a patch that will be changing the
> logic to calculate it from the HEST table.
>
> While here, properly name the variable which stores the cper
> address.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Trivial comment inline.
Given this is a placeholder for more radical refactor I'll not comment on
the maths etc being less flexible than it will hopefully end up!
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> void ghes_record_cper_errors(const void *cper, size_t len,
> uint16_t source_id, Error **errp)
> {
> - /*
> - * As the current version supports only one source, the ack offset is
> - * just sizeof(uint64_t).
> - */
> - read_ack_register_addr = start_addr + sizeof(uint64_t);
> -
> cpu_physical_memory_read(read_ack_register_addr,
> - &read_ack_register, sizeof(read_ack_register));
> + &read_ack_register, sizeof(read_ack_register));
>
Wrong patch for this alignment tidy up?
Or are my eyes deceiving me and there is more going on here...
J
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 11/15] acpi/ghes: better name GHES memory error function
2024-09-25 4:04 ` [PATCH 11/15] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
@ 2024-09-26 12:07 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-26 12:07 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Paolo Bonzini, Peter Maydell, kvm, linux-kernel,
qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:16 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The current function used to generate GHES data is specific for
> memory errors. Give a better name for it, as we now have a generic
> function as well.
>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
In general fine, but question below on what looks to be an unrelated change.
Jonathan
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 849e2e21b304..57192285fb96 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2373,7 +2373,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> */
> if (code == BUS_MCEERR_AR) {
> kvm_cpu_synchronize_state(c);
> - if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
> + if (!acpi_ghes_memory_errors(ARM_ACPI_HEST_SRC_ID_SYNC,
The parameter changes seems unconnected to rest of the patch... Maybe at least
mention it in the patch description.
I can't find the definition of ARM_ACPI_HEST_SRC_ID_SYNC either so where
did that come from?
> + paddr)) {
> kvm_inject_arm_sea(c);
> } else {
> error_report("failed to record the error");
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 12/15] acpi/ghes: don't crash QEMU if ghes GED is not found
2024-09-25 4:04 ` [PATCH 12/15] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
@ 2024-09-26 12:09 ` Jonathan Cameron via
2024-09-26 14:24 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-26 12:09 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:17 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Instead, produce an error and continue working
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Make sense as defense in depth. Can we actually hit this for existing
systems, or is the injection stuff disabled if the ged isn't configured?
Jonathan
> ---
> hw/acpi/ghes.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 3af1cd16d4d7..209095f67e9a 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -418,7 +418,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
>
> acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> NULL));
> - g_assert(acpi_ged_state);
> + if (!acpi_ged_state) {
> + error_setg(errp, "Can't find ACPI_GED object");
> + return;
> + }
> ags = &acpi_ged_state->ghes_state;
>
> get_ghes_offsets(le64_to_cpu(ags->ghes_addr_le),
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 13/15] acpi/ghes: rename etc/hardware_error file macros
2024-09-25 4:04 ` [PATCH 13/15] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
@ 2024-09-26 12:11 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-26 12:11 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:18 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Now that we have also have a file to store HEST data location,
> which is part of GHES, better name the file where CPER records
> are stored.
>
> No functional changes.
>
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
In general fine, but I'd have preferred if you'd avoided
code realignment where possible so the changes are more minimal
and easier to spot.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> hw/acpi/ghes.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 209095f67e9a..3d03506fdaf8 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -28,8 +28,8 @@
> #include "hw/nvram/fw_cfg.h"
> #include "qemu/uuid.h"
>
> -#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
> -#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
> +#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors"
> +#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
>
> /* The max size in bytes for one error block */
> #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
> @@ -249,7 +249,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
>
> /* Tell guest firmware to place hardware_errors blob into RAM */
> - bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> + bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
> hardware_errors, sizeof(uint64_t), false);
>
> for (i = 0; i < num_sources; i++) {
> @@ -258,17 +258,21 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> * corresponding "Generic Error Status Block"
> */
> bios_linker_loader_add_pointer(linker,
> - ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i,
> - sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
> - error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> + ACPI_HW_ERROR_FW_CFG_FILE,
> + sizeof(uint64_t) * i,
> + sizeof(uint64_t),
> + ACPI_HW_ERROR_FW_CFG_FILE,
> + error_status_block_offset +
> + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
I'd rather it kept closer to original formatting so the changes are more obvious.
> }
>
> /*
> * tell firmware to write hardware_errors GPA into
> * hardware_errors_addr fw_cfg, once the former has been initialized.
> */
> - bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> - 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> + bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
> + sizeof(uint64_t),
> + ACPI_HW_ERROR_FW_CFG_FILE, 0);
> }
>
> /* Build Generic Hardware Error Source version 2 (GHESv2) */
> @@ -307,8 +311,10 @@ static void build_ghes_v2(GArray *table_data,
> build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> 4 /* QWord access */, 0);
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> - address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
> - ACPI_GHES_ERRORS_FW_CFG_FILE, index * sizeof(uint64_t));
> + address_offset + GAS_ADDR_OFFSET,
> + sizeof(uint64_t),
> + ACPI_HW_ERROR_FW_CFG_FILE,
> + index * sizeof(uint64_t));
As above. Closer to original code alignment and this would be easier to
review.
>
> /* Notification Structure */
> build_ghes_hw_error_notification(table_data, notify);
> @@ -327,7 +333,7 @@ static void build_ghes_v2(GArray *table_data,
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> address_offset + GAS_ADDR_OFFSET,
> sizeof(uint64_t),
> - ACPI_GHES_ERRORS_FW_CFG_FILE,
> + ACPI_HW_ERROR_FW_CFG_FILE,
> (num_sources + index) * sizeof(uint64_t));
>
> /*
> @@ -368,11 +374,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> GArray *hardware_error)
> {
> /* Create a read-only fw_cfg file for GHES */
> - fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> + fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
> hardware_error->len);
>
> /* Create a read-write fw_cfg file for Address */
> - fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> + fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
>
> ags->present = true;
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 14/15] better name the offset of the hardware error firmware
2024-09-25 4:04 ` [PATCH 14/15] better name the offset of the hardware error firmware Mauro Carvalho Chehab
@ 2024-09-26 12:12 ` Jonathan Cameron via
2024-10-01 6:02 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-26 12:12 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 25 Sep 2024 06:04:19 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The hardware error firmware is where HEST error structures are
> stored. Those can be GHESv2, but they can also be other types.
>
> Better name the location of the hardware error.
>
> No functional changes.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This feels a little theoretical as for now they are always GHESv2 I think?
I guess it does no harm and may make sense after follow up series.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 15/15] docs: acpi_hest_ghes: fix documentation for CPER size
2024-09-25 4:04 ` [PATCH 15/15] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
@ 2024-09-26 12:13 ` Jonathan Cameron via
0 siblings, 0 replies; 36+ messages in thread
From: Jonathan Cameron via @ 2024-09-26 12:13 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Dongjiu Geng, linux-kernel, qemu-arm,
qemu-devel
On Wed, 25 Sep 2024 06:04:20 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> While the spec defines a CPER size of 4KiB for each record,
> currently it is set to 1KiB. Fix the documentation and add
> a pointer to the macro name there, as this may help to keep
> it updated.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Thanks,
J
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 09/15] acpi/ghes: make the GHES record generation more generic
2024-09-26 12:00 ` Jonathan Cameron via
@ 2024-09-26 14:19 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-26 14:19 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Thu, 26 Sep 2024 13:00:56 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Wed, 25 Sep 2024 06:04:14 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Split the code into separate functions to allow using the
> > common CPER filling code by different error sources.
> >
> > The generic code was moved to ghes_record_cper_errors(),
> > and ghes_gen_err_data_uncorrectable_recoverable() now contains
> > only a logic to fill GEGB part of the record.
> >
> > The remaining code to generate a memory error now belongs to
> > acpi_ghes_record_errors() function.
> >
> > A further patch will give it a better name.
>
> That bit is fine, I'm less sure about
> ghes_gen_err_data_uncorrectable_recoverable()
> Maybe you refactor that later, but I'd suggest doing so in this
> patch to make it
> ghes_gen_data() with the uncorrectable and recoverable bits
> passed in as parameters.
For now, no need. When using the error injection script, such
function is not used. The script can already play with what's
defined there.
Besides that, I tried to generalize it, but it is not trivial,
as passing some things as parameter is really hard. So, instead,
I opted to keep the code as-is. It can be changed later once we
add internal events that require a different setting than what
we have with ARMv8 memory errors via SEA.
>
> Jonathan
>
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> > hw/acpi/ghes.c | 118 +++++++++++++++++++++++++----------------
> > include/hw/acpi/ghes.h | 3 ++
> > 2 files changed, 74 insertions(+), 47 deletions(-)
> >
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 340a0263faf8..307b5a41d539 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -181,51 +181,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
> > build_append_int_noprefix(table, 0, 7);
> > }
> >
> > -static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> > - uint64_t error_physical_addr)
> > +static void
> > +ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
> > + const uint8_t *section_type,
> > + int data_length)
> > {
>
> That's an ugly name . Suggestion below on instead passing parameters
> for the uncorrectable and recoverable parts and amking this
> ghes_gen_err_data()
>
> > - GArray *block;
> > -
> > - /* Memory Error Section Type */
> > - const uint8_t uefi_cper_mem_sec[] =
> > - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > - 0xED, 0x7C, 0x83, 0xB1);
> > -
> > /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> > * Table 17-13 Generic Error Data Entry
> > */
> > QemuUUID fru_id = {};
> > - uint32_t data_length;
> >
> > - block = g_array_new(false, true /* clear */, 1);
> > -
> > - /* This is the length if adding a new generic error data entry*/
> > - data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> > /*
> > - * It should not run out of the preallocated memory if adding a new generic
> > - * error data entry
> > + * Calculate the size with this block. No need to check for
> > + * too big CPER, as CPER size is checked at ghes_record_cper_errors()
> > */
> > - assert((data_length + ACPI_GHES_GESB_SIZE) <=
> > - ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > + data_length += ACPI_GHES_GESB_SIZE;
> >
> > /* Build the new generic error status block header */
> > acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> > 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
> >
> > /* Build this new generic error data entry header */
> > - acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
> > + acpi_ghes_generic_error_data(block, section_type,
> > ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> > ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
> Maybe should just pass in ACPI_CPER_SEV_RECOVERABLE
> and ACPI_GEBS_UNCORRECTABLE in parameters.
>
> Main advantage being that should allow reuse for other combinations
> and it gets rid of the nasty function name!
>
> > -
> > - /* Build the memory section CPER for above new generic error data entry */
> > - acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> > -
> > - /* Write the generic error data entry into guest memory */
> > - cpu_physical_memory_write(error_block_address, block->data, block->len);
> > -
> > - g_array_free(block, true);
> > -
> > - return 0;
> > }
> >
> > /*
> > @@ -399,14 +378,19 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > ags->present = true;
> > }
> >
> > -int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> > +void ghes_record_cper_errors(const void *cper, size_t len,
> > + uint16_t source_id, Error **errp)
> > {
>
> > - return ret;
> > + read_ack_register = cpu_to_le64(0);
> > + /*
> > + * Clear the Read Ack Register, OSPM will write it to 1 when
> > + * it acknowledges this error.
> > + */
>
> Indent issue.
>
> > + cpu_physical_memory_write(read_ack_register_addr,
> > + &read_ack_register, sizeof(uint64_t));
> > +
> > + /* Write the generic error data entry into guest memory */
> > + cpu_physical_memory_write(error_block_addr, cper, len);
> > +
> > + return;
> > +}
>
Thanks,
Mauro
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 12/15] acpi/ghes: don't crash QEMU if ghes GED is not found
2024-09-26 12:09 ` Jonathan Cameron via
@ 2024-09-26 14:24 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-26 14:24 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Thu, 26 Sep 2024 13:09:09 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Wed, 25 Sep 2024 06:04:17 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Instead, produce an error and continue working
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Make sense as defense in depth. Can we actually hit this for existing
> systems, or is the injection stuff disabled if the ged isn't configured?
What happens is that:
- with memory errors, this logic at acpi_ghes_memory_errors() will
report the error, just like error_report():
if (errp) {
error_report_err(errp);
return -1;
}
so, no practical changes.
- for injections via script, this will return the error via QMP
interface, preventing the guest crash.
The script can then handle it the way it wants (right now, it just
prints the error).
Thanks,
Mauro
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 08/15] acpi/ghes: Prepare to support multiple sources on ghes
2024-09-25 14:23 ` Jonathan Cameron via
@ 2024-10-01 5:29 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-10-01 5:29 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm,
qemu-devel
Em Wed, 25 Sep 2024 15:23:33 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Wed, 25 Sep 2024 06:04:13 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > The current code is actually dependent on having just one
> > error structure with a single source.
> >
> > As the number of sources should be arch-dependent, as it
> > will depend on what kind of synchronous/assynchronous
> > notifications will exist, change the logic to dynamically
> > build the table.
> Not really arch dependent. Depends on both arch and some
> firmware implementation choices, but I guess that detail
> doesn't matter here.
>
> >
> > Yet, for a proper support, we need to get the number of
> > sources by reading the number from the HEST table. However,
> > bios currently doesn't store a pointer to it.
> >
> > For now just change the logic at table build time, while
> > enforcing that it will behave like before with a single
> > source ID.
> >
> > A future patch will add a HEST table bios pointer and
> > change the logic at acpi_ghes_record_errors() to
> > dynamically use the new size.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Trivial comment inline
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> > @@ -335,9 +346,10 @@ static void build_ghes_v2(GArray *table_data,
> > build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> > 4 /* QWord access */, 0);
> > bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> > - address_offset + GAS_ADDR_OFFSET,
> > - sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
> > - (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t));
> > + address_offset + GAS_ADDR_OFFSET,
>
> I'd prefer if we avoided realigning unless absolutely necessary or
> that it is split into a separate patch.
> Makes things a tiny bit harder to review.
Heh, Igor nacked a patch doing the alignment change on a separate patch,
so let's do it at the patches that are actually changing the code.
At least for me, it is a low easier to review patches that are properly
aligned with parenthesis. So, yeah it may be a little more painful to
review a patch changing alignments, but IMO it pays off on future
revisions, specially if we place one argument per line, like in this
function.
>
> > + sizeof(uint64_t),
> > + ACPI_GHES_ERRORS_FW_CFG_FILE,
> > + (num_sources + index) * sizeof(uint64_t));
> >
>
Thanks,
Mauro
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 10/15] acpi/ghes: move offset calculus to a separate function
2024-09-26 12:03 ` Jonathan Cameron via
@ 2024-10-01 5:38 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-10-01 5:38 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Thu, 26 Sep 2024 13:03:48 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Wed, 25 Sep 2024 06:04:15 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Currently, CPER address location is calculated as an offset of
> > the hardware_errors table. It is also badly named, as the
> > offset actually used is the address where the CPER data starts,
> > and not the beginning of the error source.
> >
> > Move the logic which calculates such offset to a separate
> > function, in preparation for a patch that will be changing the
> > logic to calculate it from the HEST table.
> >
> > While here, properly name the variable which stores the cper
> > address.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Trivial comment inline.
>
> Given this is a placeholder for more radical refactor I'll not comment on
> the maths etc being less flexible than it will hopefully end up!
Actually there will be two versions of the math calculus after the
next patch series:
1. one compatible with versions up to 9.1 that work with a single
source ID, using offsets calculated from the hardware_errors
table, which doesn't contain the number of sources. Such code
will be used only for migration. This is the one on this series;
2. one that will get the number of source IDs from the HEST table.
Such math will be added at the next patch series.
This requires a migration-incompatible change to store a
pointer to HEST table. The math there is flexible and should
work with all future changes, as it uses all offsets from the
HEST table, using the links there to the harware_errors firmware
file.
So, basically, the migration logic will check if a HEST pointer
is stored. If so, it will use (2). If not, it is because the VM
that was running on QEMU 9.1 had its state stored, and then
was recovered on QEMU 9.2. Such machine will then use the math
from (1), which supports a single source ID.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 14/15] better name the offset of the hardware error firmware
2024-09-26 12:12 ` Jonathan Cameron via
@ 2024-10-01 6:02 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 36+ messages in thread
From: Mauro Carvalho Chehab @ 2024-10-01 6:02 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Thu, 26 Sep 2024 13:12:25 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Wed, 25 Sep 2024 06:04:19 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > The hardware error firmware is where HEST error structures are
> > stored. Those can be GHESv2, but they can also be other types.
> >
> > Better name the location of the hardware error.
> >
> > No functional changes.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> This feels a little theoretical as for now they are always GHESv2 I think?
It is, but the new code that will be added on the next patch series
will allow future addition of other types as well, as it seeks for
the source ID inside the error block structures.
Yet, if we end adding other types, it will probably make sense to
rename ghes.c to hest.c.
> I guess it does no harm and may make sense after follow up series.
>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Thanks,
Mauro
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2024-10-01 14:58 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-25 4:04 [PATCH 00/15] Prepare GHES driver to support error injection Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 01/15] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
2024-09-25 14:02 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 02/15] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
2024-09-25 14:09 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 03/15] acpi/ghes: simplify the per-arch caller to build HEST table Mauro Carvalho Chehab
2024-09-25 14:11 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 04/15] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
2024-09-25 14:13 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 05/15] acpi/ghes: Fix acpi_ghes_record_errors() argument Mauro Carvalho Chehab
2024-09-25 14:15 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 06/15] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
2024-09-25 14:15 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 07/15] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
2024-09-25 14:16 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 08/15] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
2024-09-25 14:23 ` Jonathan Cameron via
2024-10-01 5:29 ` Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 09/15] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
2024-09-26 12:00 ` Jonathan Cameron via
2024-09-26 14:19 ` Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 10/15] acpi/ghes: move offset calculus to a separate function Mauro Carvalho Chehab
2024-09-26 12:03 ` Jonathan Cameron via
2024-10-01 5:38 ` Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 11/15] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-26 12:07 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 12/15] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
2024-09-26 12:09 ` Jonathan Cameron via
2024-09-26 14:24 ` Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 13/15] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
2024-09-26 12:11 ` Jonathan Cameron via
2024-09-25 4:04 ` [PATCH 14/15] better name the offset of the hardware error firmware Mauro Carvalho Chehab
2024-09-26 12:12 ` Jonathan Cameron via
2024-10-01 6:02 ` Mauro Carvalho Chehab
2024-09-25 4:04 ` [PATCH 15/15] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
2024-09-26 12:13 ` Jonathan Cameron via
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).