* [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation
@ 2024-08-25 3:45 Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
` (11 more replies)
0 siblings, 12 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:45 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha, Cleber Rosa,
Dongjiu Geng, Eric Blake, Igor Mammedov, John Snow,
Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
Shannon Zhao, kvm, linux-kernel, qemu-arm, qemu-devel
This series add support for injecting generic CPER records. Such records
are generated outside QEMU via a provided script.
On this version, patches were reorganized to follow this pattern:
1. Addition of hest_add_le, mapping to the beginning of the HEST
error source structure size. This is the part of the HEST table that
contains the error source IDs to be notified;
2. GHESv2 code rework. It is basically a merge of all changes from v8,
except for GPIO notify, QEMU notifier and renames;
3. cleanups and renames for hw/acpi/ghes.c;
4. GPIO GED device creation logic;
5. Logic to inject errors via QMP;
6. Error injection script
On this version, the logic now better navigates the source IDs,
double-checking them at error injecting time. It is now easier to
add other HEST types if ever needed.
Also, the source ID is now guest-OS specific: ARM will use SEA and GPIO,
but x86 will likely use other notifiers and source IDs. Same will apply
to other processor types.
---
v9:
- Patches reorganized to make easier for reviewers;
- source ID is now guest-OS specific;
- Some patches got a revision history since v8;
- Several minor cleanups.
v8:
- Fix one of the BIOS links that were incorrect;
- Changed mem error internal injection to use a common code;
- No more hardcoded values for CPER: instead of using just the
payload at the QAPI, it now has the full raw CPER there;
- Error injection script now supports changing fields at the
Generic Error Data section of the CPER;
- Several minor cleanups.
v7:
- Change the way offsets are calculated and used on HEST table.
Now, it is compatible with migrations as all offsets are relative
to the HEST table;
- GHES interface is now more generic: the entire CPER is sent via
QMP, instead of just the payload;
- Some code cleanups to make the code more robust;
- The python script now uses QEMUMonitorProtocol class.
v6:
- PNP0C33 device creation moved to aml-build.c;
- acpi_ghes record functions now use ACPI notify parameter,
instead of source ID;
- the number of source IDs is now automatically calculated;
- some code cleanups and function/var renames;
- some fixes and cleanups at the error injection script;
- ghes cper stub now produces an error if cper JSON is not compiled;
- Offset calculation logic for GHES was refactored;
- Updated documentation to reflect the GHES allocated size;
- Added a x-mpidr object for QOM usage;
- Added a patch making usage of x-mpidr field at ARM injection
script;
v5:
- CPER guid is now passing as string;
- raw-data is now passed with base64 encode;
- Removed several GPIO left-overs from arm/virt.c changes;
- Lots of cleanups and improvements at the error injection script.
It now better handles QMP dialog and doesn't print debug messages.
Also, code was split on two modules, to make easier to add more
error injection commands.
v4:
- CPER generation moved to happen outside QEMU;
- One patch adding support for mpidr query was removed.
v3:
- patch 1 cleanups with some comment changes and adding another place where
the poweroff GPIO define should be used. No changes on other patches (except
due to conflict resolution).
v2:
- added a new patch using a define for GPIO power pin;
- patch 2 changed to also use a define for generic error GPIO pin;
- a couple cleanups at patch 2 removing uneeded else clauses.
Example of generating a CPER record:
$ scripts/ghes_inject.py -d arm -p 0xdeadbeef
GUID: e19e3d16-bc11-11e4-9caa-c2051d5d46b0
Generic Error Status Block (20 bytes):
00000000 01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00 ................
00000010 00 00 00 00 ....
Generic Error Data Entry (72 bytes):
00000000 16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0 .=...........]F.
00000010 00 00 00 00 00 03 00 00 48 00 00 00 00 00 00 00 ........H.......
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000040 00 00 00 00 00 00 00 00 ........
Payload (72 bytes):
00000000 05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00 ........H.......
00000010 00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00 ................
00000020 00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03 ......... ......
00000030 0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00 ................
00000040 ef be ad de 00 00 00 00 ........
Error injected.
[ 9.358364] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 9.359027] {1}[Hardware Error]: event severity: recoverable
[ 9.359586] {1}[Hardware Error]: Error 0, type: recoverable
[ 9.360124] {1}[Hardware Error]: section_type: ARM processor error
[ 9.360561] {1}[Hardware Error]: MIDR: 0x00000000000f0510
[ 9.361160] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[ 9.361643] {1}[Hardware Error]: running state: 0x0
[ 9.362142] {1}[Hardware Error]: Power State Coordination Interface state: 0
[ 9.362682] {1}[Hardware Error]: Error info structure 0:
[ 9.363030] {1}[Hardware Error]: num errors: 2
[ 9.363656] {1}[Hardware Error]: error_type: 0x02: cache error
[ 9.364163] {1}[Hardware Error]: error_info: 0x000000000091000f
[ 9.364834] {1}[Hardware Error]: transaction type: Data Access
[ 9.365599] {1}[Hardware Error]: cache error, operation type: Data write
[ 9.366441] {1}[Hardware Error]: cache level: 2
[ 9.367005] {1}[Hardware Error]: processor context not corrupted
[ 9.367753] {1}[Hardware Error]: physical fault address: 0x00000000deadbeef
[ 9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered
Such script currently supports arm processor error CPER, but can easily be
extended to other GHES notification types.
Mauro Carvalho Chehab (12):
acpi/ghes: add a firmware file with HEST address
acpi/ghes: rework the logic to handle HEST source ID
acpi/ghes: rename etc/hardware_error file macros
acpi/ghes: better name GHES memory error function
acpi/ghes: add a notifier to notify when error data is ready
acpi/generic_event_device: add an APEI error device
arm/virt: Wire up a GED error device for ACPI / GHES
qapi/acpi-hest: add an interface to do generic CPER error injection
docs: acpi_hest_ghes: fix documentation for CPER size
scripts/ghes_inject: add a script to generate GHES error inject
target/arm: add an experimental mpidr arm cpu property object
scripts/arm_processor_error.py: retrieve mpidr if not filled
MAINTAINERS | 10 +
docs/specs/acpi_hest_ghes.rst | 6 +-
hw/acpi/Kconfig | 5 +
hw/acpi/aml-build.c | 10 +
hw/acpi/generic_event_device.c | 8 +
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 309 +++++++----
hw/acpi/ghes_cper.c | 32 ++
hw/acpi/ghes_cper_stub.c | 19 +
hw/acpi/meson.build | 2 +
hw/arm/Kconfig | 5 +
hw/arm/virt-acpi-build.c | 12 +-
hw/arm/virt.c | 12 +-
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/aml-build.h | 2 +
include/hw/acpi/generic_event_device.h | 1 +
include/hw/acpi/ghes.h | 28 +-
include/hw/arm/virt.h | 10 +
qapi/acpi-hest.json | 36 ++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
scripts/arm_processor_error.py | 388 ++++++++++++++
scripts/ghes_inject.py | 51 ++
scripts/qmp_helper.py | 702 +++++++++++++++++++++++++
target/arm/cpu.c | 1 +
target/arm/cpu.h | 1 +
target/arm/helper.c | 10 +-
target/arm/kvm.c | 3 +-
28 files changed, 1539 insertions(+), 129 deletions(-)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 hw/acpi/ghes_cper_stub.c
create mode 100644 qapi/acpi-hest.json
create mode 100644 scripts/arm_processor_error.py
create mode 100755 scripts/ghes_inject.py
create mode 100644 scripts/qmp_helper.py
--
2.46.0
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
@ 2024-08-25 3:45 ` Mauro Carvalho Chehab
2024-09-11 13:51 ` Igor Mammedov
2024-08-25 3:45 ` [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
` (10 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:45 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Igor Mammedov, linux-kernel, qemu-arm, qemu-devel
Store HEST table address at GPA, placing its content at
hest_addr_le variable.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Change from v8:
- hest_addr_lr is now pointing to the error source size and data.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 15 +++++++++++++++
include/hw/acpi/ghes.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..529c14e3289f 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -30,6 +30,7 @@
#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
+#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
@@ -367,11 +368,22 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
acpi_table_begin(&table, table_data);
+ int hest_offset = table_data->len;
+
/* Error Source Count */
build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
acpi_table_end(linker, &table);
+
+ /*
+ * tell firmware to write into GPA the address of HEST via fw_cfg,
+ * once initialized.
+ */
+ bios_linker_loader_write_pointer(linker,
+ ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_BUILD_TABLE_FILE, hest_offset);
}
void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -385,6 +397,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+ fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+ NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+
ags->present = true;
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..28b956acb19a 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -63,6 +63,7 @@ enum {
};
typedef struct AcpiGhesState {
+ uint64_t hest_addr_le;
uint64_t ghes_addr_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2024-08-25 3:45 ` Mauro Carvalho Chehab
2024-09-11 15:01 ` Igor Mammedov
2024-08-25 3:45 ` [PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
` (9 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:45 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Igor Mammedov, Paolo Bonzini, Peter Maydell,
Shannon Zhao, kvm, linux-kernel, qemu-arm, qemu-devel
The current logic is based on a lot of duct tape, with
offsets calculated based on one define with the number of
source IDs and an enum.
Rewrite the logic in a way that it would be more resilient
of code changes, by moving the source ID count to an enum
and make the offset calculus more explicit.
Such change was inspired on a patch from Jonathan Cameron
splitting the logic to get the CPER address on a separate
function, as this will be needed to support generic error
injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 275 ++++++++++++++++++++++++---------------
hw/arm/virt-acpi-build.c | 10 +-
include/hw/acpi/ghes.h | 18 +--
include/hw/arm/virt.h | 7 +
target/arm/kvm.c | 3 +-
5 files changed, 198 insertions(+), 115 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 529c14e3289f..965fb1b36587 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -35,9 +35,6 @@
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT 1
-
/* Generic Hardware Error Source version 2 */
#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
@@ -64,6 +61,19 @@
*/
#define ACPI_GHES_GESB_SIZE 20
+/*
+ * Offsets with regards to the start of the HEST table stored at
+ * ags->hest_addr_le, according with the memory layout map at
+ * docs/specs/acpi_hest_ghes.rst.
+ */
+
+/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 */
+#define HEST_GHES_V2_TABLE_SIZE 92
+#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET)
+
+/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source */
+#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET)
+
/*
* Values for error_severity field
*/
@@ -185,51 +195,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
build_append_int_noprefix(table, 0, 7);
}
-static int acpi_ghes_record_mem_error(uint64_t error_block_address,
- uint64_t error_physical_addr)
+static void
+ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
+ const uint8_t *section_type,
+ int data_length)
{
- GArray *block;
-
- /* Memory Error Section Type */
- const uint8_t uefi_cper_mem_sec[] =
- UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
- 0xED, 0x7C, 0x83, 0xB1);
-
/* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
* Table 17-13 Generic Error Data Entry
*/
QemuUUID fru_id = {};
- uint32_t data_length;
- block = g_array_new(false, true /* clear */, 1);
-
- /* This is the length if adding a new generic error data entry*/
- data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
/*
- * It should not run out of the preallocated memory if adding a new generic
- * error data entry
+ * Calculate the size with this block. No need to check for
+ * too big CPER, as CPER size is checked at ghes_record_cper_errors()
*/
- assert((data_length + ACPI_GHES_GESB_SIZE) <=
- ACPI_GHES_MAX_RAW_DATA_LENGTH);
+ data_length += ACPI_GHES_GESB_SIZE;
/* Build the new generic error status block header */
acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
/* Build this new generic error data entry header */
- acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
+ acpi_ghes_generic_error_data(block, section_type,
ACPI_CPER_SEV_RECOVERABLE, 0, 0,
ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
-
- /* Build the memory section CPER for above new generic error data entry */
- acpi_ghes_build_append_mem_cper(block, error_physical_addr);
-
- /* Write the generic error data entry into guest memory */
- cpu_physical_memory_write(error_block_address, block->data, block->len);
-
- g_array_free(block, true);
-
- return 0;
}
/*
@@ -237,17 +226,18 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
* Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
* See docs/specs/acpi_hest_ghes.rst for blobs format.
*/
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
+ int num_sources)
{
int i, error_status_block_offset;
/* Build error_block_address */
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
}
/* Build read_ack_register */
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
/*
* Initialize the value of read_ack_register to 1, so GHES can be
* writable after (re)boot.
@@ -262,13 +252,13 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
/* Reserve space for Error Status Data Block */
acpi_data_push(hardware_errors,
- ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+ ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
/* Tell guest firmware to place hardware_errors blob into RAM */
bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
hardware_errors, sizeof(uint64_t), false);
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
/*
* Tell firmware to patch error_block_address entries to point to
* corresponding "Generic Error Status Block"
@@ -283,14 +273,20 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
* tell firmware to write hardware_errors GPA into
* hardware_errors_addr fw_cfg, once the former has been initialized.
*/
- bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
- 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
+ bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
-static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
+static void build_ghes_v2(GArray *table_data,
+ BIOSLinker *linker,
+ enum AcpiGhesNotifyType notify,
+ uint16_t source_id,
+ int num_sources)
{
uint64_t address_offset;
+
/*
* Type:
* Generic Hardware Error Source version 2(GHESv2 - Type 10)
@@ -317,21 +313,13 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_GHES_ERRORS_FW_CFG_FILE,
+ source_id * sizeof(uint64_t));
- switch (source_id) {
- case ACPI_HEST_SRC_ID_SEA:
- /*
- * Notification Structure
- * Now only enable ARMv8 SEA notification type
- */
- build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
- break;
- default:
- error_report("Not support this error source");
- abort();
- }
+ /* Notification Structure */
+ build_ghes_hw_error_notification(table_data, notify);
/* Error Status Block Length */
build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
@@ -345,9 +333,11 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
- (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_GHES_ERRORS_FW_CFG_FILE,
+ (num_sources + source_id) *
+ sizeof(uint64_t));
/*
* Read Ack Preserve field
@@ -360,19 +350,28 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
}
/* Build Hardware Error Source Table */
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
+ const uint16_t * const notify,
+ int num_sources,
const char *oem_id, const char *oem_table_id)
{
AcpiTable table = { .sig = "HEST", .rev = 1,
.oem_id = oem_id, .oem_table_id = oem_table_id };
+ int i;
+
+ build_ghes_error_table(hardware_errors, linker, num_sources);
acpi_table_begin(&table, table_data);
+ /* Beginning at the HEST Error Source struct count and data */
int hest_offset = table_data->len;
/* Error Source Count */
- build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
- build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+ build_append_int_noprefix(table_data, num_sources, 4);
+ for (i = 0; i < num_sources; i++) {
+ build_ghes_v2(table_data, linker, notify[i], i, num_sources);
+ }
acpi_table_end(linker, &table);
@@ -403,60 +402,132 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp)
{
- uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
- uint64_t start_addr;
- bool ret = -1;
+ uint64_t hest_read_ack_start_addr, read_ack_start_addr;
+ uint64_t hest_addr, cper_addr, err_source_struct;
+ uint64_t hest_err_block_addr, error_block_addr;
+ uint32_t num_sources, i;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
+ uint64_t read_ack;
- assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
+ if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+ error_setg(errp, "GHES CPER record is too big: %ld", len);
+ }
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
g_assert(acpi_ged_state);
ags = &acpi_ged_state->ghes_state;
- start_addr = le64_to_cpu(ags->ghes_addr_le);
-
- if (physical_address) {
-
- if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
- start_addr += source_id * sizeof(uint64_t);
- }
-
- cpu_physical_memory_read(start_addr, &error_block_addr,
- sizeof(error_block_addr));
-
- error_block_addr = le64_to_cpu(error_block_addr);
-
- read_ack_register_addr = start_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
-
- cpu_physical_memory_read(read_ack_register_addr,
- &read_ack_register, sizeof(read_ack_register));
-
- /* zero means OSPM does not acknowledge the error */
- if (!read_ack_register) {
- error_report("OSPM does not acknowledge previous error,"
- " so can not record CPER for current error anymore");
- } else if (error_block_addr) {
- read_ack_register = cpu_to_le64(0);
- /*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
- */
- cpu_physical_memory_write(read_ack_register_addr,
- &read_ack_register, sizeof(uint64_t));
-
- ret = acpi_ghes_record_mem_error(error_block_addr,
- physical_address);
- } else
- error_report("can not find Generic Error Status Block");
+ hest_addr = le64_to_cpu(ags->hest_addr_le);
+
+ cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
+
+ if (source_id >= num_sources) {
+ error_setg(errp,
+ "GHES: Source %d not found. Only %d sources are defined",
+ source_id, num_sources);
+ return;
+ }
+ err_source_struct = hest_addr + sizeof(num_sources);
+
+ for (i = 0; i < num_sources; i++) {
+ uint64_t addr = err_source_struct;
+ uint16_t type, src_id;
+
+ cpu_physical_memory_read(addr, &type, sizeof(type));
+
+ /* For now, we only know the size of GHESv2 table */
+ assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
+
+ /* It is GHES. Compare CPER source address */
+ addr += sizeof(type);
+ cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
+
+ if (src_id == source_id)
+ break;
+
+ err_source_struct += HEST_GHES_V2_TABLE_SIZE;
+ }
+ if (i == num_sources) {
+ error_setg(errp, "HEST: Source %d not found.", source_id);
+ return;
+ }
+
+ /* Check if BIOS addr pointers were properly generated */
+
+ hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
+ hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
+
+ cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
+ sizeof(error_block_addr));
+
+ cpu_physical_memory_read(error_block_addr, &cper_addr,
+ sizeof(error_block_addr));
+
+ cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
+ sizeof(read_ack_start_addr));
+
+ /* Update ACK offset to notify about a new error */
+
+ cpu_physical_memory_read(read_ack_start_addr,
+ &read_ack, sizeof(read_ack));
+
+ /* zero means OSPM does not acknowledge the error */
+ if (!read_ack) {
+ error_setg(errp,
+ "Last CPER record was not acknowledged yet");
+ read_ack = 1;
+ cpu_physical_memory_write(read_ack_start_addr,
+ &read_ack, sizeof(read_ack));
+ return;
+ }
+
+ read_ack = cpu_to_le64(0);
+ cpu_physical_memory_write(read_ack_start_addr,
+ &read_ack, sizeof(read_ack));
+
+ /* Write the generic error data entry into guest memory */
+ cpu_physical_memory_write(cper_addr, cper, len);
+}
+
+int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+{
+ /* Memory Error Section Type */
+ const uint8_t guid[] =
+ UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
+ 0xED, 0x7C, 0x83, 0xB1);
+ Error *errp = NULL;
+ GArray *block;
+
+ if (!physical_address) {
+ error_report("can not find Generic Error Status Block for source id %d",
+ source_id);
+ return -1;
+ }
+
+ block = g_array_new(false, true /* clear */, 1);
+
+ ghes_gen_err_data_uncorrectable_recoverable(block, guid,
+ ACPI_GHES_MAX_RAW_DATA_LENGTH);
+
+ /* Build the memory section CPER for above new generic error data entry */
+ acpi_ghes_build_append_mem_cper(block, physical_address);
+
+ /* Report the error */
+ ghes_record_cper_errors(block->data, block->len, source_id, &errp);
+
+ g_array_free(block, true);
+
+ if (errp) {
+ error_report_err(errp);
+ return -1;
}
- return ret;
+ return 0;
}
bool acpi_ghes_present(void)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f76fb117adff..39100c2822c2 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -890,6 +890,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
}
+static const uint16_t hest_ghes_notify[] = {
+ [ARM_ACPI_HEST_SRC_ID_SEA] = ACPI_GHES_NOTIFY_SEA,
+};
+
static
void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
{
@@ -943,10 +947,10 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
build_dbg2(tables_blob, tables->linker, vms);
if (vms->ras) {
- build_ghes_error_table(tables->hardware_errors, tables->linker);
acpi_add_table(table_offsets, tables_blob);
- acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
- vms->oem_table_id);
+ acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+ hest_ghes_notify, sizeof(hest_ghes_notify),
+ vms->oem_id, vms->oem_table_id);
}
if (ms->numa_state->num_nodes > 0) {
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 28b956acb19a..4b5af86ec077 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
#define ACPI_GHES_H
#include "hw/acpi/bios-linker-loader.h"
+#include "qapi/error.h"
/*
* Values for Hardware Error Notification Type field
@@ -56,24 +57,23 @@ enum AcpiGhesNotifyType {
ACPI_GHES_NOTIFY_RESERVED = 12
};
-enum {
- ACPI_HEST_SRC_ID_SEA = 0,
- /* future ids go here */
- ACPI_HEST_SRC_ID_RESERVED,
-};
-
typedef struct AcpiGhesState {
uint64_t hest_addr_le;
uint64_t ghes_addr_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
+ const uint16_t * const notify,
+ int num_sources,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+int acpi_ghes_record_errors(int source_id,
+ uint64_t error_physical_addr);
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a4d937ed45ac..d62d8d9db5ae 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -189,6 +189,13 @@ OBJECT_DECLARE_TYPE(VirtMachineState, VirtMachineClass, VIRT_MACHINE)
void virt_acpi_setup(VirtMachineState *vms);
bool virt_is_acpi_enabled(VirtMachineState *vms);
+/*
+ * ID numbers used to fill HEST source ID field
+ */
+enum {
+ ARM_ACPI_HEST_SRC_ID_SEA,
+};
+
/* Return number of redistributors that fit in the specified region */
static uint32_t virt_redist_capacity(VirtMachineState *vms, int region)
{
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 849e2e21b304..8c4c8263b85a 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2373,7 +2373,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
*/
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
- if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+ if (!acpi_ghes_record_errors(ARM_ACPI_HEST_SRC_ID_SEA,
+ paddr)) {
kvm_inject_arm_sea(c);
} else {
error_report("failed to record the error");
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
@ 2024-08-25 3:45 ` Mauro Carvalho Chehab
2024-09-13 13:27 ` Igor Mammedov
2024-08-25 3:45 ` [PATCH v9 04/12] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
` (8 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:45 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Igor Mammedov, linux-kernel, qemu-arm, qemu-devel
Now that we have also have a file to store HEST data location,
which is part of GHES, better name the file where CPER records
are stored.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 965fb1b36587..3190eb954de4 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -28,8 +28,8 @@
#include "hw/nvram/fw_cfg.h"
#include "qemu/uuid.h"
-#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
-#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
+#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors"
+#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
/* The max size in bytes for one error block */
@@ -255,7 +255,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
/* Tell guest firmware to place hardware_errors blob into RAM */
- bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
+ bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
hardware_errors, sizeof(uint64_t), false);
for (i = 0; i < num_sources; i++) {
@@ -264,8 +264,8 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
* corresponding "Generic Error Status Block"
*/
bios_linker_loader_add_pointer(linker,
- ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
+ ACPI_HW_ERROR_FW_CFG_FILE, sizeof(uint64_t) * i,
+ sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE,
error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
}
@@ -273,9 +273,9 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
* tell firmware to write hardware_errors GPA into
* hardware_errors_addr fw_cfg, once the former has been initialized.
*/
- bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, 0,
+ bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
+ ACPI_HW_ERROR_FW_CFG_FILE, 0);
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
@@ -315,7 +315,7 @@ static void build_ghes_v2(GArray *table_data,
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE,
+ ACPI_HW_ERROR_FW_CFG_FILE,
source_id * sizeof(uint64_t));
/* Notification Structure */
@@ -335,7 +335,7 @@ static void build_ghes_v2(GArray *table_data,
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE,
+ ACPI_HW_ERROR_FW_CFG_FILE,
(num_sources + source_id) *
sizeof(uint64_t));
@@ -389,11 +389,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
GArray *hardware_error)
{
/* Create a read-only fw_cfg file for GHES */
- fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+ fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
hardware_error->len);
/* Create a read-write fw_cfg file for Address */
- fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+ fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 04/12] acpi/ghes: better name GHES memory error function
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (2 preceding siblings ...)
2024-08-25 3:45 ` [PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
@ 2024-08-25 3:45 ` Mauro Carvalho Chehab
2024-09-13 13:31 ` Igor Mammedov
2024-08-25 3:46 ` [PATCH v9 05/12] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
` (7 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:45 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Igor Mammedov, Paolo Bonzini, Peter Maydell, kvm,
linux-kernel, qemu-arm, qemu-devel
The current function used to generate GHES data is specific for
memory errors. Give a better name for it, as we now have a generic
function as well.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 2 +-
include/hw/acpi/ghes.h | 4 ++--
target/arm/kvm.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index c315de1802d6..dd41b3fd91df 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(uint8_t source_id, uint64_t physical_address)
{
return -1;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 3190eb954de4..10ed9c0614ff 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -494,7 +494,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
cpu_physical_memory_write(cper_addr, cper, len);
}
-int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
{
/* Memory Error Section Type */
const uint8_t guid[] =
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 4b5af86ec077..be53b7c53c91 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -70,7 +70,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(int source_id,
+int acpi_ghes_memory_errors(int source_id,
uint64_t error_physical_addr);
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp);
@@ -79,7 +79,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
* acpi_ghes_present: Report whether ACPI GHES table is present
*
* Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_record_errors() to record a memory error.
+ * safe to call acpi_ghes_memory_errors() to record a memory error.
*/
bool acpi_ghes_present(void);
#endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 8c4c8263b85a..8e63e9a59a5e 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2373,7 +2373,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
*/
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
- if (!acpi_ghes_record_errors(ARM_ACPI_HEST_SRC_ID_SEA,
+ if (!acpi_ghes_memory_errors(ARM_ACPI_HEST_SRC_ID_SEA,
paddr)) {
kvm_inject_arm_sea(c);
} else {
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 05/12] acpi/ghes: add a notifier to notify when error data is ready
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (3 preceding siblings ...)
2024-08-25 3:45 ` [PATCH v9 04/12] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 06/12] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
` (6 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Igor Mammedov, linux-kernel, qemu-arm, qemu-devel
Some error injection notify methods are async, like GPIO
notify. Add a notifier to be used when the error record is
ready to be sent to the guest OS.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 5 +++++
include/hw/acpi/ghes.h | 3 +++
2 files changed, 8 insertions(+)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 10ed9c0614ff..2b7103a678a1 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -402,6 +402,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
+NotifierList acpi_generic_error_notifiers =
+ NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
+
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp)
{
@@ -492,6 +495,8 @@ void ghes_record_cper_errors(const void *cper, size_t len,
/* Write the generic error data entry into guest memory */
cpu_physical_memory_write(cper_addr, cper, len);
+
+ notifier_list_notify(&acpi_generic_error_notifiers, NULL);
}
int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index be53b7c53c91..b1ec9795270f 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -24,6 +24,9 @@
#include "hw/acpi/bios-linker-loader.h"
#include "qapi/error.h"
+#include "qemu/notify.h"
+
+extern NotifierList acpi_generic_error_notifiers;
/*
* Values for Hardware Error Notification Type field
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 06/12] acpi/generic_event_device: add an APEI error device
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (4 preceding siblings ...)
2024-08-25 3:46 ` [PATCH v9 05/12] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 07/12] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
` (5 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Igor Mammedov, linux-kernel, qemu-devel, Jonathan Cameron
Adds a generic error device to handle generic hardware error
events as specified at ACPI 6.5 specification at 18.3.2.7.2:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
using HID PNP0C33.
The PNP0C33 device is used to report hardware errors to
the guest via ACPI APEI Generic Hardware Error Source (GHES).
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
hw/acpi/aml-build.c | 10 ++++++++++
hw/acpi/generic_event_device.c | 8 ++++++++
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/aml-build.h | 2 ++
include/hw/acpi/generic_event_device.h | 1 +
5 files changed, 22 insertions(+)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 6d4517cfbe3d..222a933a8760 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source)
return var;
}
+
+/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
+Aml *aml_error_device(void)
+{
+ Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
+ aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+ aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+ return dev;
+}
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 15b4c3ebbf24..b4c83a089a02 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
ACPI_GED_PWR_DOWN_EVT,
ACPI_GED_NVDIMM_HOTPLUG_EVT,
ACPI_GED_CPU_HOTPLUG_EVT,
+ ACPI_GED_ERROR_EVT,
};
/*
@@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
aml_int(0x80)));
break;
+ case ACPI_GED_ERROR_EVT:
+ aml_append(if_ctx,
+ aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
+ aml_int(0x80)));
+ break;
case ACPI_GED_NVDIMM_HOTPLUG_EVT:
aml_append(if_ctx,
aml_notify(aml_name("\\_SB.NVDR"),
@@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
sel = ACPI_GED_MEM_HOTPLUG_EVT;
} else if (ev & ACPI_POWER_DOWN_STATUS) {
sel = ACPI_GED_PWR_DOWN_EVT;
+ } else if (ev & ACPI_GENERIC_ERROR) {
+ sel = ACPI_GED_ERROR_EVT;
} else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
} else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50aa..8294f8f0ccca 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
ACPI_NVDIMM_HOTPLUG_STATUS = 16,
ACPI_VMGENID_CHANGE_STATUS = 32,
ACPI_POWER_DOWN_STATUS = 64,
+ ACPI_GENERIC_ERROR = 128,
} AcpiEventStatusBits;
#define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index a3784155cb33..44d1a6af0c69 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -252,6 +252,7 @@ struct CrsRangeSet {
/* Consumer/Producer */
#define AML_SERIAL_BUS_FLAG_CONSUME_ONLY (1 << 1)
+#define ACPI_APEI_ERROR_DEVICE "GEDD"
/**
* init_aml_allocator:
*
@@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz,
uint8_t channel);
Aml *aml_sleep(uint64_t msec);
Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
+Aml *aml_error_device(void);
/* Block AML object primitives */
Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
index 40af3550b56d..9ace8fe70328 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -98,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
#define ACPI_GED_PWR_DOWN_EVT 0x2
#define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
#define ACPI_GED_CPU_HOTPLUG_EVT 0x8
+#define ACPI_GED_ERROR_EVT 0x10
typedef struct GEDState {
MemoryRegion evt;
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 07/12] arm/virt: Wire up a GED error device for ACPI / GHES
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (5 preceding siblings ...)
2024-08-25 3:46 ` [PATCH v9 06/12] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
` (4 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Igor Mammedov, Peter Maydell, Shannon Zhao, linux-kernel,
qemu-arm, qemu-devel, Jonathan Cameron
Adds support to ARM virtualization to allow handling
generic error ACPI Event via GED & error source device.
It is aligned with Linux Kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
---
Changes from v8:
- Added a call to the function that produces GHES generic
records, as this is now added earlier in this series.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/arm/virt-acpi-build.c | 1 +
hw/arm/virt.c | 12 +++++++++++-
include/hw/arm/virt.h | 1 +
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 39100c2822c2..9c36da3c831f 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
}
acpi_dsdt_add_power_button(scope);
+ aml_append(scope, aml_error_device());
#ifdef CONFIG_TPM
acpi_dsdt_add_tpm(scope, vms);
#endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 687fe0bb8bc9..22448e5c5b73 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -677,7 +677,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
DeviceState *dev;
MachineState *ms = MACHINE(vms);
int irq = vms->irqmap[VIRT_ACPI_GED];
- uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+ uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
if (ms->ram_slots) {
event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -1009,6 +1009,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
}
}
+static void virt_generic_error_req(Notifier *n, void *opaque)
+{
+ VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
+
+ acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
+}
+
static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
uint32_t phandle)
{
@@ -2385,6 +2392,9 @@ static void machvirt_init(MachineState *machine)
if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
vms->acpi_dev = create_acpi_ged(vms);
+ vms->generic_error_notifier.notify = virt_generic_error_req;
+ notifier_list_add(&acpi_generic_error_notifiers,
+ &vms->generic_error_notifier);
} else {
create_gpio_devices(vms, VIRT_GPIO, sysmem);
}
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index d62d8d9db5ae..1c682d43fdac 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -175,6 +175,7 @@ struct VirtMachineState {
DeviceState *gic;
DeviceState *acpi_dev;
Notifier powerdown_notifier;
+ Notifier generic_error_notifier;
PCIBus *bus;
char *oem_id;
char *oem_table_id;
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (6 preceding siblings ...)
2024-08-25 3:46 ` [PATCH v9 07/12] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
2024-08-27 11:11 ` Markus Armbruster
2024-08-25 3:46 ` [PATCH v9 09/12] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
` (3 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Eric Blake, Igor Mammedov, Markus Armbruster,
Michael Roth, Paolo Bonzini, Peter Maydell, Shannon Zhao,
linux-kernel, qemu-arm, qemu-devel, Jonathan Cameron, Shiju Jose
Creates a QMP command to be used for generic ACPI APEI hardware error
injection (HEST) via GHESv2, and add support for it for ARM guests.
Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
independent. This is mapped at arch virt bindings, depending on the
types supported by QEMU and by the BIOS. So, on ARM, this is supported
via ACPI_GHES_NOTIFY_GPIO notification type.
This patch is co-authored:
- original ghes logic to inject a simple ARM record by Shiju Jose;
- generic logic to handle block addresses by Jonathan Cameron;
- generic GHESv2 error inject by Mauro Carvalho Chehab;
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
MAINTAINERS | 7 +++++++
hw/acpi/Kconfig | 5 +++++
hw/acpi/ghes_cper.c | 32 ++++++++++++++++++++++++++++++++
hw/acpi/ghes_cper_stub.c | 19 +++++++++++++++++++
hw/acpi/meson.build | 2 ++
hw/arm/Kconfig | 5 +++++
hw/arm/virt-acpi-build.c | 1 +
include/hw/acpi/ghes.h | 4 ++++
include/hw/arm/virt.h | 2 ++
qapi/acpi-hest.json | 36 ++++++++++++++++++++++++++++++++++++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
12 files changed, 115 insertions(+)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 hw/acpi/ghes_cper_stub.c
create mode 100644 qapi/acpi-hest.json
diff --git a/MAINTAINERS b/MAINTAINERS
index 3584d6a6c6da..1d8091818899 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2077,6 +2077,13 @@ F: hw/acpi/ghes.c
F: include/hw/acpi/ghes.h
F: docs/specs/acpi_hest_ghes.rst
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+S: Maintained
+F: hw/arm/ghes_cper.c
+F: hw/acpi/ghes_cper_stub.c
+F: qapi/acpi-hest.json
+
ppc4xx
L: qemu-ppc@nongnu.org
S: Orphan
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index e07d3204eb36..73ffbb82c150 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,11 @@ config ACPI_APEI
bool
depends on ACPI
+config GHES_CPER
+ bool
+ depends on ACPI_APEI
+ default y
+
config ACPI_PCI
bool
depends on ACPI && PCI
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
new file mode 100644
index 000000000000..a10a5e7ab29b
--- /dev/null
+++ b/hw/acpi/ghes_cper.c
@@ -0,0 +1,32 @@
+/*
+ * CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/base64.h"
+#include "qemu/error-report.h"
+#include "qemu/uuid.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_ghes_cper(const char *qmp_cper, Error **errp)
+{
+
+ uint8_t *cper;
+ size_t len;
+
+ cper = qbase64_decode(qmp_cper, -1, &len, errp);
+ if (!cper) {
+ error_setg(errp, "missing GHES CPER payload");
+ return;
+ }
+
+ ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
+}
diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
new file mode 100644
index 000000000000..36138c462ac9
--- /dev/null
+++ b/hw/acpi/ghes_cper_stub.c
@@ -0,0 +1,19 @@
+/*
+ * Stub interface for CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_ghes_cper(const char *cper, Error **errp)
+{
+ error_setg(errp, "GHES QMP error inject is not compiled in");
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fa5c07db9068..6cbf430eb66d 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -34,4 +34,6 @@ endif
system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
+system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
+system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
system_ss.add(files('acpi-qmp-cmds.c'))
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 1ad60da7aa2d..bed6ba27d715 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -712,3 +712,8 @@ config ARMSSE
select UNIMP
select SSE_COUNTER
select SSE_TIMER
+
+config GHES_CPER
+ bool
+ depends on ARM
+ default y if AARCH64
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 9c36da3c831f..a6b03b16d922 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -893,6 +893,7 @@ static void acpi_align_size(GArray *blob, unsigned align)
static const uint16_t hest_ghes_notify[] = {
[ARM_ACPI_HEST_SRC_ID_SEA] = ACPI_GHES_NOTIFY_SEA,
+ [ARM_ACPI_HEST_SRC_ID_GPIO] = ACPI_GHES_NOTIFY_GPIO,
};
static
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index b1ec9795270f..ea6b33e133d6 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -66,6 +66,10 @@ typedef struct AcpiGhesState {
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
+
+/* Source ID associated with qapi/acpi-hest.json QMP error injection */
+#define ACPI_HEST_SRC_ID_QMP 0
+
void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
BIOSLinker *linker,
const uint16_t * const notify,
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 1c682d43fdac..8b042053dfa1 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -33,6 +33,7 @@
#include "exec/hwaddr.h"
#include "qemu/notify.h"
#include "hw/boards.h"
+#include "hw/acpi/ghes.h"
#include "hw/arm/boot.h"
#include "hw/arm/bsa.h"
#include "hw/block/flash.h"
@@ -194,6 +195,7 @@ bool virt_is_acpi_enabled(VirtMachineState *vms);
* ID numbers used to fill HEST source ID field
*/
enum {
+ ARM_ACPI_HEST_SRC_ID_GPIO = ACPI_HEST_SRC_ID_QMP,
ARM_ACPI_HEST_SRC_ID_SEA,
};
diff --git a/qapi/acpi-hest.json b/qapi/acpi-hest.json
new file mode 100644
index 000000000000..0255695c95ad
--- /dev/null
+++ b/qapi/acpi-hest.json
@@ -0,0 +1,36 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# = GHESv2 CPER Error Injection
+#
+# Defined since ACPI Specification 6.1,
+# section 18.3.2.8 Generic Hardware Error Source version 2. See:
+#
+# https://uefi.org/sites/default/files/resources/ACPI_6_1.pdf
+##
+
+
+##
+# @ghes-cper:
+#
+# Inject a CPER error data to be filled according to ACPI 6.1
+# spec via GHESv2.
+#
+# @cper: contains a base64 encoded string with raw data for a single CPER
+# record with Generic Error Status Block, Generic Error Data Entry and
+# generic error data payload, as described at
+# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#format
+#
+# Features:
+#
+# @unstable: This command is experimental.
+#
+# Since: 9.2
+##
+{ 'command': 'ghes-cper',
+ 'data': {
+ 'cper': 'str'
+ },
+ 'features': [ 'unstable' ]
+}
diff --git a/qapi/meson.build b/qapi/meson.build
index e7bc54e5d047..35cea6147262 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -59,6 +59,7 @@ qapi_all_modules = [
if have_system
qapi_all_modules += [
'acpi',
+ 'acpi-hest',
'audio',
'cryptodev',
'qdev',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index b1581988e4eb..baf19ab73afe 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -75,6 +75,7 @@
{ 'include': 'misc-target.json' }
{ 'include': 'audio.json' }
{ 'include': 'acpi.json' }
+{ 'include': 'acpi-hest.json' }
{ 'include': 'pci.json' }
{ 'include': 'stats.json' }
{ 'include': 'virtio.json' }
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 09/12] docs: acpi_hest_ghes: fix documentation for CPER size
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (7 preceding siblings ...)
2024-08-25 3:46 ` [PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 10/12] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
` (2 subsequent siblings)
11 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Dongjiu Geng, linux-kernel, qemu-arm,
qemu-devel, Igor Mammedov
While the spec defines a CPER size of 4KiB for each record,
currently it is set to 1KiB. Fix the documentation and add
a pointer to the macro name there, as this may help to keep
it updated.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
---
docs/specs/acpi_hest_ghes.rst | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index 68f1fbe0a4af..c3e9f8d9a702 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -67,8 +67,10 @@ Design Details
(3) The address registers table contains N Error Block Address entries
and N Read Ack Register entries. The size for each entry is 8-byte.
The Error Status Data Block table contains N Error Status Data Block
- entries. The size for each entry is 4096(0x1000) bytes. The total size
- for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+ entries. The size for each entry is defined at the source code as
+ ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
+ for the "etc/hardware_errors" fw_cfg blob is
+ (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
N is the number of the kinds of hardware error sources.
(4) QEMU generates the ACPI linker/loader script for the firmware. The
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 10/12] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (8 preceding siblings ...)
2024-08-25 3:46 ` [PATCH v9 09/12] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 12/12] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
11 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Cleber Rosa, John Snow, linux-kernel,
qemu-devel
Using the QMP GHESv2 API requires preparing a raw data array
containing a CPER record.
Add a helper script with subcommands to prepare such data.
Currently, only ARM Processor error CPER record is supported.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
MAINTAINERS | 3 +
scripts/arm_processor_error.py | 377 ++++++++++++++++++
scripts/ghes_inject.py | 51 +++
scripts/qmp_helper.py | 702 +++++++++++++++++++++++++++++++++
4 files changed, 1133 insertions(+)
create mode 100644 scripts/arm_processor_error.py
create mode 100755 scripts/ghes_inject.py
create mode 100644 scripts/qmp_helper.py
diff --git a/MAINTAINERS b/MAINTAINERS
index 1d8091818899..249ed2858198 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2083,6 +2083,9 @@ S: Maintained
F: hw/arm/ghes_cper.c
F: hw/acpi/ghes_cper_stub.c
F: qapi/acpi-hest.json
+F: scripts/ghes_inject.py
+F: scripts/arm_processor_error.py
+F: scripts/qmp_helper.py
ppc4xx
L: qemu-ppc@nongnu.org
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
new file mode 100644
index 000000000000..62e0c5662232
--- /dev/null
+++ b/scripts/arm_processor_error.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+# TODO: current implementation has dummy defaults.
+#
+# For a better implementation, a QMP addition/call is needed to
+# retrieve some data for ARM Processor Error injection:
+#
+# - ARM registers: power_state, mpidr.
+
+import argparse
+import re
+
+from qmp_helper import qmp, util, cper_guid
+
+class ArmProcessorEinj:
+ """
+ Implements ARM Processor Error injection via GHES
+ """
+
+ DESC = """
+ Generates an ARM processor error CPER, compatible with
+ UEFI 2.9A Errata.
+ """
+
+ ACPI_GHES_ARM_CPER_LENGTH = 40
+ ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
+
+ # Context types
+ CONTEXT_AARCH32_EL1 = 1
+ CONTEXT_AARCH64_EL1 = 5
+ CONTEXT_MISC_REG = 8
+
+ def __init__(self, subparsers):
+ """Initialize the error injection class and add subparser"""
+
+ # Valid choice values
+ self.arm_valid_bits = {
+ "mpidr": util.bit(0),
+ "affinity": util.bit(1),
+ "running": util.bit(2),
+ "vendor": util.bit(3),
+ }
+
+ self.pei_flags = {
+ "first": util.bit(0),
+ "last": util.bit(1),
+ "propagated": util.bit(2),
+ "overflow": util.bit(3),
+ }
+
+ self.pei_error_types = {
+ "cache": util.bit(1),
+ "tlb": util.bit(2),
+ "bus": util.bit(3),
+ "micro-arch": util.bit(4),
+ }
+
+ self.pei_valid_bits = {
+ "multiple-error": util.bit(0),
+ "flags": util.bit(1),
+ "error-info": util.bit(2),
+ "virt-addr": util.bit(3),
+ "phy-addr": util.bit(4),
+ }
+
+ self.data = bytearray()
+
+ parser = subparsers.add_parser("arm", description=self.DESC)
+
+ arm_valid_bits = ",".join(self.arm_valid_bits.keys())
+ flags = ",".join(self.pei_flags.keys())
+ error_types = ",".join(self.pei_error_types.keys())
+ pei_valid_bits = ",".join(self.pei_valid_bits.keys())
+
+ # UEFI N.16 ARM Validation bits
+ g_arm = parser.add_argument_group("ARM processor")
+ g_arm.add_argument("--arm", "--arm-valid",
+ help=f"ARM valid bits: {arm_valid_bits}")
+ g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level",
+ type=lambda x: int(x, 0),
+ help="Affinity level (when multiple levels apply)")
+ g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
+ help="Multiprocessor Affinity Register")
+ g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
+ help="Main ID Register")
+ g_arm.add_argument("-r", "--running",
+ action=argparse.BooleanOptionalAction,
+ default=None,
+ help="Indicates if the processor is running or not")
+ g_arm.add_argument("--psci", "--psci-state",
+ type=lambda x: int(x, 0),
+ help="Power State Coordination Interface - PSCI state")
+
+ # TODO: Add vendor-specific support
+
+ # UEFI N.17 bitmaps (type and flags)
+ g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
+ g_pei.add_argument("-t", "--type", nargs="+",
+ help=f"one or more error types: {error_types}")
+ g_pei.add_argument("-f", "--flags", nargs="*",
+ help=f"zero or more error flags: {flags}")
+ g_pei.add_argument("-V", "--pei-valid", "--error-valid", nargs="*",
+ help=f"zero or more PEI valid bits: {pei_valid_bits}")
+
+ # UEFI N.17 Integer values
+ g_pei.add_argument("-m", "--multiple-error", nargs="+",
+ help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")
+ g_pei.add_argument("-e", "--error-info", nargs="+",
+ help="Error information (UEFI 2.10 tables N.18 to N.20)")
+ g_pei.add_argument("-p", "--physical-address", nargs="+",
+ help="Physical address")
+ g_pei.add_argument("-v", "--virtual-address", nargs="+",
+ help="Virtual address")
+
+ # UEFI N.21 Context
+ g_ctx = parser.add_argument_group("Processor Context")
+ g_ctx.add_argument("--ctx-type", "--context-type", nargs="*",
+ help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")
+ g_ctx.add_argument("--ctx-size", "--context-size", nargs="*",
+ help="Minimal size of the context")
+ g_ctx.add_argument("--ctx-array", "--context-array", nargs="*",
+ help="Comma-separated arrays for each context")
+
+ # Vendor-specific data
+ g_vendor = parser.add_argument_group("Vendor-specific data")
+ g_vendor.add_argument("--vendor", "--vendor-specific", nargs="+",
+ help="Vendor-specific byte arrays of data")
+
+ # Add arguments for Generic Error Data
+ qmp.argparse(parser)
+
+ parser.set_defaults(func=self.send_cper)
+
+ def send_cper(self, args):
+ """Parse subcommand arguments and send a CPER via QMP"""
+
+ qmp_cmd = qmp(args.host, args.port, args.debug)
+
+ # Handle Generic Error Data arguments if any
+ qmp_cmd.set_args(args)
+
+ is_cpu_type = re.compile(r"^([\w+]+\-)?arm\-cpu$")
+ cpus = qmp_cmd.search_qom("/machine/unattached/device",
+ "type", is_cpu_type)
+
+ cper = {}
+ pei = {}
+ ctx = {}
+ vendor = {}
+
+ arg = vars(args)
+
+ # Handle global parameters
+ if args.arm:
+ arm_valid_init = False
+ cper["valid"] = util.get_choice(name="valid",
+ value=args.arm,
+ choices=self.arm_valid_bits,
+ suffixes=["-error", "-err"])
+ else:
+ cper["valid"] = 0
+ arm_valid_init = True
+
+ if "running" in arg:
+ if args.running:
+ cper["running-state"] = util.bit(0)
+ else:
+ cper["running-state"] = 0
+ else:
+ cper["running-state"] = 0
+
+ if arm_valid_init:
+ if args.affinity:
+ cper["valid"] |= self.arm_valid_bits["affinity"]
+
+ if args.mpidr:
+ cper["valid"] |= self.arm_valid_bits["mpidr"]
+
+ if "running-state" in cper:
+ cper["valid"] |= self.arm_valid_bits["running"]
+
+ if args.psci:
+ cper["valid"] |= self.arm_valid_bits["running"]
+
+ # Handle PEI
+ if not args.type:
+ args.type = ["cache-error"]
+
+ util.get_mult_choices(
+ pei,
+ name="valid",
+ values=args.pei_valid,
+ choices=self.pei_valid_bits,
+ suffixes=["-valid", "--addr"],
+ )
+ util.get_mult_choices(
+ pei,
+ name="type",
+ values=args.type,
+ choices=self.pei_error_types,
+ suffixes=["-error", "-err"],
+ )
+ util.get_mult_choices(
+ pei,
+ name="flags",
+ values=args.flags,
+ choices=self.pei_flags,
+ suffixes=["-error", "-cap"],
+ )
+ util.get_mult_int(pei, "error-info", args.error_info)
+ util.get_mult_int(pei, "multiple-error", args.multiple_error)
+ util.get_mult_int(pei, "phy-addr", args.physical_address)
+ util.get_mult_int(pei, "virt-addr", args.virtual_address)
+
+ # Handle context
+ util.get_mult_int(ctx, "type", args.ctx_type, allow_zero=True)
+ util.get_mult_int(ctx, "minimal-size", args.ctx_size, allow_zero=True)
+ util.get_mult_array(ctx, "register", args.ctx_array, allow_zero=True)
+
+ util.get_mult_array(vendor, "bytes", args.vendor, max_val=255)
+
+ # Store PEI
+ pei_data = bytearray()
+ default_flags = self.pei_flags["first"]
+ default_flags |= self.pei_flags["last"]
+
+ error_info_num = 0
+
+ for i, p in pei.items(): # pylint: disable=W0612
+ error_info_num += 1
+
+ # UEFI 2.10 doesn't define how to encode error information
+ # when multiple types are raised. So, provide a default only
+ # if a single type is there
+ if "error-info" not in p:
+ if p["type"] == util.bit(1):
+ p["error-info"] = 0x0091000F
+ if p["type"] == util.bit(2):
+ p["error-info"] = 0x0054007F
+ if p["type"] == util.bit(3):
+ p["error-info"] = 0x80D6460FFF
+ if p["type"] == util.bit(4):
+ p["error-info"] = 0x78DA03FF
+
+ if "valid" not in p:
+ p["valid"] = 0
+ if "multiple-error" in p:
+ p["valid"] |= self.pei_valid_bits["multiple-error"]
+
+ if "flags" in p:
+ p["valid"] |= self.pei_valid_bits["flags"]
+
+ if "error-info" in p:
+ p["valid"] |= self.pei_valid_bits["error-info"]
+
+ if "phy-addr" in p:
+ p["valid"] |= self.pei_valid_bits["phy-addr"]
+
+ if "virt-addr" in p:
+ p["valid"] |= self.pei_valid_bits["virt-addr"]
+
+ # Version
+ util.data_add(pei_data, 0, 1)
+
+ util.data_add(pei_data,
+ self.ACPI_GHES_ARM_CPER_PEI_LENGTH, 1)
+
+ util.data_add(pei_data, p["valid"], 2)
+ util.data_add(pei_data, p["type"], 1)
+ util.data_add(pei_data, p.get("multiple-error", 1), 2)
+ util.data_add(pei_data, p.get("flags", default_flags), 1)
+ util.data_add(pei_data, p.get("error-info", 0), 8)
+ util.data_add(pei_data, p.get("virt-addr", 0xDEADBEEF), 8)
+ util.data_add(pei_data, p.get("phy-addr", 0xABBA0BAD), 8)
+
+ # Store Context
+ ctx_data = bytearray()
+ context_info_num = 0
+
+ if ctx:
+ ret = qmp_cmd.send_cmd("query-target", may_open=True)
+
+ default_ctx = self.CONTEXT_MISC_REG
+
+ if "arch" in ret:
+ if ret["arch"] == "aarch64":
+ default_ctx = self.CONTEXT_AARCH64_EL1
+ elif ret["arch"] == "arm":
+ default_ctx = self.CONTEXT_AARCH32_EL1
+
+ for k in sorted(ctx.keys()):
+ context_info_num += 1
+
+ if "type" not in ctx[k]:
+ ctx[k]["type"] = default_ctx
+
+ if "register" not in ctx[k]:
+ ctx[k]["register"] = []
+
+ reg_size = len(ctx[k]["register"])
+ size = 0
+
+ if "minimal-size" in ctx:
+ size = ctx[k]["minimal-size"]
+
+ size = max(size, reg_size)
+
+ size = (size + 1) % 0xFFFE
+
+ # Version
+ util.data_add(ctx_data, 0, 2)
+
+ util.data_add(ctx_data, ctx[k]["type"], 2)
+
+ util.data_add(ctx_data, 8 * size, 4)
+
+ for r in ctx[k]["register"]:
+ util.data_add(ctx_data, r, 8)
+
+ for i in range(reg_size, size): # pylint: disable=W0612
+ util.data_add(ctx_data, 0, 8)
+
+ # Vendor-specific bytes are not grouped
+ vendor_data = bytearray()
+ if vendor:
+ for k in sorted(vendor.keys()):
+ for b in vendor[k]["bytes"]:
+ util.data_add(vendor_data, b, 1)
+
+ # Encode ARM Processor Error
+ data = bytearray()
+
+ util.data_add(data, cper["valid"], 4)
+
+ util.data_add(data, error_info_num, 2)
+ util.data_add(data, context_info_num, 2)
+
+ # Calculate the length of the CPER data
+ cper_length = self.ACPI_GHES_ARM_CPER_LENGTH
+ cper_length += len(pei_data)
+ cper_length += len(vendor_data)
+ cper_length += len(ctx_data)
+ util.data_add(data, cper_length, 4)
+
+ util.data_add(data, arg.get("affinity-level", 0), 1)
+
+ # Reserved
+ util.data_add(data, 0, 3)
+
+ if "midr-el1" not in arg:
+ if cpus:
+ cmd_arg = {
+ 'path': cpus[0],
+ 'property': "midr"
+ }
+ ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True)
+ if isinstance(ret, int):
+ arg["midr-el1"] = ret
+
+ util.data_add(data, arg.get("mpidr-el1", 0), 8)
+ util.data_add(data, arg.get("midr-el1", 0), 8)
+ util.data_add(data, cper["running-state"], 4)
+ util.data_add(data, arg.get("psci-state", 0), 4)
+
+ # Add PEI
+ data.extend(pei_data)
+ data.extend(ctx_data)
+ data.extend(vendor_data)
+
+ self.data = data
+
+ qmp_cmd.send_cper(cper_guid.CPER_PROC_ARM, self.data)
diff --git a/scripts/ghes_inject.py b/scripts/ghes_inject.py
new file mode 100755
index 000000000000..67cb6077bec8
--- /dev/null
+++ b/scripts/ghes_inject.py
@@ -0,0 +1,51 @@
+#!/usr/bin/env python3
+#
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+"""
+Handle ACPI GHESv2 error injection logic QEMU QMP interface.
+"""
+
+import argparse
+import sys
+
+from arm_processor_error import ArmProcessorEinj
+
+EINJ_DESC = """
+Handle ACPI GHESv2 error injection logic QEMU QMP interface.
+
+It allows using UEFI BIOS EINJ features to generate GHES records.
+
+It helps testing CPER and GHES drivers at the guest OS and how
+userspace applications at the guest handle them.
+"""
+
+def main():
+ """Main program"""
+
+ # Main parser - handle generic args like QEMU QMP TCP socket options
+ parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+ usage="%(prog)s [options]",
+ description=EINJ_DESC)
+
+ g_options = parser.add_argument_group("QEMU QMP socket options")
+ g_options.add_argument("-H", "--host", default="localhost", type=str,
+ help="host name")
+ g_options.add_argument("-P", "--port", default=4445, type=int,
+ help="TCP port number")
+ g_options.add_argument('-d', '--debug', action='store_true')
+
+ subparsers = parser.add_subparsers()
+
+ ArmProcessorEinj(subparsers)
+
+ args = parser.parse_args()
+ if "func" in args:
+ args.func(args)
+ else:
+ sys.exit(f"Please specify a valid command for {sys.argv[0]}")
+
+if __name__ == "__main__":
+ main()
diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
new file mode 100644
index 000000000000..348db9275c36
--- /dev/null
+++ b/scripts/qmp_helper.py
@@ -0,0 +1,702 @@
+#!/usr/bin/env python3
+#
+# # pylint: disable=C0103,E0213,E1135,E1136,E1137,R0902,R0903,R0912,R0913
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+"""
+Helper classes to be used by ghes_inject command classes.
+"""
+
+import json
+import sys
+
+from datetime import datetime
+from os import path as os_path
+
+try:
+ qemu_dir = os_path.abspath(os_path.dirname(os_path.dirname(__file__)))
+ sys.path.append(os_path.join(qemu_dir, 'python'))
+
+ from qemu.qmp.legacy import QEMUMonitorProtocol
+
+except ModuleNotFoundError as exc:
+ print(f"Module '{exc.name}' not found.")
+ print("Try export PYTHONPATH=top-qemu-dir/python or run from top-qemu-dir")
+ sys.exit(1)
+
+from base64 import b64encode
+
+class util:
+ """
+ Ancillary functions to deal with bitmaps, parse arguments,
+ generate GUID and encode data on a bytearray buffer.
+ """
+
+ #
+ # Helper routines to handle multiple choice arguments
+ #
+ def get_choice(name, value, choices, suffixes=None, bitmask=True):
+ """Produce a list from multiple choice argument"""
+
+ new_values = 0
+
+ if not value:
+ return new_values
+
+ for val in value.split(","):
+ val = val.lower()
+
+ if suffixes:
+ for suffix in suffixes:
+ val = val.removesuffix(suffix)
+
+ if val not in choices.keys():
+ if suffixes:
+ for suffix in suffixes:
+ if val + suffix in choices.keys():
+ val += suffix
+ break
+
+ if val not in choices.keys():
+ sys.exit(f"Error on '{name}': choice '{val}' is invalid.")
+
+ val = choices[val]
+
+ if bitmask:
+ new_values |= val
+ else:
+ if new_values:
+ sys.exit(f"Error on '{name}': only one value is accepted.")
+
+ new_values = val
+
+ return new_values
+
+ def get_array(name, values, max_val=None):
+ """Add numbered hashes from integer lists into an array"""
+
+ array = []
+
+ for value in values:
+ for val in value.split(","):
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if max_val and val > max_val:
+ sys.exit(f"Error on '{name}': {val} is too little")
+
+ array.append(val)
+
+ return array
+
+ def get_mult_array(mult, name, values, allow_zero=False, max_val=None):
+ """Add numbered hashes from integer lists"""
+
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ if not values:
+ i = 0
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = []
+ return
+
+ i = 0
+ for value in values:
+ for val in value.split(","):
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if max_val and val > max_val:
+ sys.exit(f"Error on '{name}': {val} is too little")
+
+ if i not in mult:
+ mult[i] = {}
+
+ if name not in mult[i]:
+ mult[i][name] = []
+
+ mult[i][name].append(val)
+
+ i += 1
+
+
+ def get_mult_choices(mult, name, values, choices,
+ suffixes=None, allow_zero=False):
+ """Add numbered hashes from multiple choice arguments"""
+
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ i = 0
+ for val in values:
+ new_values = util.get_choice(name, val, choices, suffixes)
+
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = new_values
+ i += 1
+
+
+ def get_mult_int(mult, name, values, allow_zero=False):
+ """Add numbered hashes from integer arguments"""
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ i = 0
+ for val in values:
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = val
+ i += 1
+
+
+ #
+ # Data encode helper functions
+ #
+ def bit(b):
+ """Simple macro to define a bit on a bitmask"""
+ return 1 << b
+
+
+ def data_add(data, value, num_bytes):
+ """Adds bytes from value inside a bitarray"""
+
+ data.extend(value.to_bytes(num_bytes, byteorder="little")) # pylint: disable=E1101
+
+ def dump_bytearray(name, data):
+ """Does an hexdump of a byte array, grouping in bytes"""
+
+ print(f"{name} ({len(data)} bytes):")
+
+ for ln_start in range(0, len(data), 16):
+ ln_end = min(ln_start + 16, len(data))
+ print(f" {ln_start:08x} ", end="")
+ for i in range(ln_start, ln_end):
+ print(f"{data[i]:02x} ", end="")
+ for i in range(ln_end, ln_start + 16):
+ print(" ", end="")
+ print(" ", end="")
+ for i in range(ln_start, ln_end):
+ if data[i] >= 32 and data[i] < 127:
+ print(chr(data[i]), end="")
+ else:
+ print(".", end="")
+
+ print()
+ print()
+
+ def time(string):
+ """Handle BCD timestamps used on Generic Error Data Block"""
+
+ time = None
+
+ # Formats to be used when parsing time stamps
+ formats = [
+ "%Y-%m-%d %H:%M:%S",
+ ]
+
+ if string == "now":
+ time = datetime.now()
+
+ if time is None:
+ for fmt in formats:
+ try:
+ time = datetime.strptime(string, fmt)
+ break
+ except ValueError:
+ pass
+
+ if time is None:
+ raise ValueError("Invalid time format")
+
+ return time
+
+class guid:
+ """
+ Simple class to handle GUID fields.
+ """
+
+ def __init__(self, time_low, time_mid, time_high, nodes):
+ """Initialize a GUID value"""
+
+ assert len(nodes) == 8
+
+ self.time_low = time_low
+ self.time_mid = time_mid
+ self.time_high = time_high
+ self.nodes = nodes
+
+ @classmethod
+ def UUID(cls, guid_str):
+ """Initialize a GUID using a string on its standard format"""
+
+ if len(guid_str) != 36:
+ print("Size not 36")
+ raise ValueError('Invalid GUID size')
+
+ # It is easier to parse without separators. So, drop them
+ guid_str = guid_str.replace('-', '')
+
+ if len(guid_str) != 32:
+ print("Size not 32", guid_str, len(guid_str))
+ raise ValueError('Invalid GUID hex size')
+
+ time_low = 0
+ time_mid = 0
+ time_high = 0
+ nodes = []
+
+ for i in reversed(range(16, 32, 2)):
+ h = guid_str[i:i + 2]
+ value = int(h, 16)
+ nodes.insert(0, value)
+
+ time_high = int(guid_str[12:16], 16)
+ time_mid = int(guid_str[8:12], 16)
+ time_low = int(guid_str[0:8], 16)
+
+ return cls(time_low, time_mid, time_high, nodes)
+
+ def __str__(self):
+ """Output a GUID value on its default string representation"""
+
+ clock = self.nodes[0] << 8 | self.nodes[1]
+
+ node = 0
+ for i in range(2, len(self.nodes)):
+ node = node << 8 | self.nodes[i]
+
+ s = f"{self.time_low:08x}-{self.time_mid:04x}-"
+ s += f"{self.time_high:04x}-{clock:04x}-{node:012x}"
+ return s
+
+ def to_bytes(self):
+ """Output a GUID value in bytes"""
+
+ data = bytearray()
+
+ util.data_add(data, self.time_low, 4)
+ util.data_add(data, self.time_mid, 2)
+ util.data_add(data, self.time_high, 2)
+ data.extend(bytearray(self.nodes))
+
+ return data
+
+class qmp:
+ """
+ Opens a connection and send/receive QMP commands.
+ """
+
+ def send_cmd(self, command, args=None, may_open=False, return_error=True):
+ """Send a command to QMP, optinally opening a connection"""
+
+ if may_open:
+ self._connect()
+ elif not self.connected:
+ return False
+
+ msg = { 'execute': command }
+ if args:
+ msg['arguments'] = args
+
+ try:
+ obj = self.qmp_monitor.cmd_obj(msg)
+ # Can we use some other exception class here?
+ except Exception as e: # pylint: disable=W0718
+ print(f"Command: {command}")
+ print(f"Failed to inject error: {e}.")
+ return None
+
+ if "return" in obj:
+ if isinstance(obj.get("return"), dict):
+ if obj["return"]:
+ return obj["return"]
+ return "OK"
+
+ return obj["return"]
+
+ if isinstance(obj.get("error"), dict):
+ error = obj["error"]
+ if return_error:
+ print(f"Command: {msg}")
+ print(f'{error["class"]}: {error["desc"]}')
+ else:
+ print(json.dumps(obj))
+
+ return None
+
+ def _close(self):
+ """Shutdown and close the socket, if opened"""
+ if not self.connected:
+ return
+
+ self.qmp_monitor.close()
+ self.connected = False
+
+ def _connect(self):
+ """Connect to a QMP TCP/IP port, if not connected yet"""
+
+ if self.connected:
+ return True
+
+ try:
+ self.qmp_monitor.connect(negotiate=True)
+ except ConnectionError:
+ sys.exit(f"Can't connect to QMP host {self.host}:{self.port}")
+
+ self.connected = True
+
+ return True
+
+ BLOCK_STATUS_BITS = {
+ "uncorrectable": util.bit(0),
+ "correctable": util.bit(1),
+ "multi-uncorrectable": util.bit(2),
+ "multi-correctable": util.bit(3),
+ }
+
+ ERROR_SEVERITY = {
+ "recoverable": 0,
+ "fatal": 1,
+ "corrected": 2,
+ "none": 3,
+ }
+
+ VALIDATION_BITS = {
+ "fru-id": util.bit(0),
+ "fru-text": util.bit(1),
+ "timestamp": util.bit(2),
+ }
+
+ GEDB_FLAGS_BITS = {
+ "recovered": util.bit(0),
+ "prev-error": util.bit(1),
+ "simulated": util.bit(2),
+ }
+
+ GENERIC_DATA_SIZE = 72
+
+ def argparse(parser):
+ """Prepare a parser group to query generic error data"""
+
+ block_status_bits = ",".join(qmp.BLOCK_STATUS_BITS.keys())
+ error_severity_enum = ",".join(qmp.ERROR_SEVERITY.keys())
+ validation_bits = ",".join(qmp.VALIDATION_BITS.keys())
+ gedb_flags_bits = ",".join(qmp.GEDB_FLAGS_BITS.keys())
+
+ g_gen = parser.add_argument_group("Generic Error Data") # pylint: disable=E1101
+ g_gen.add_argument("--block-status",
+ help=f"block status bits: {block_status_bits}")
+ g_gen.add_argument("--raw-data", nargs="+",
+ help="Raw data inside the Error Status Block")
+ g_gen.add_argument("--error-severity", "--severity",
+ help=f"error severity: {error_severity_enum}")
+ g_gen.add_argument("--gen-err-valid-bits",
+ "--generic-error-validation-bits",
+ help=f"validation bits: {validation_bits}")
+ g_gen.add_argument("--fru-id", type=guid.UUID,
+ help="GUID representing a physical device")
+ g_gen.add_argument("--fru-text",
+ help="ASCII string identifying the FRU hardware")
+ g_gen.add_argument("--timestamp", type=util.time,
+ help="Time when the error info was collected")
+ g_gen.add_argument("--precise", "--precise-timestamp",
+ action='store_true',
+ help="Marks the timestamp as precise if --timestamp is used")
+ g_gen.add_argument("--gedb-flags",
+ help=f"General Error Data Block flags: {gedb_flags_bits}")
+
+ def set_args(self, args):
+ """Set the arguments optionally defined via self.argparse()"""
+
+ if args.block_status:
+ self.block_status = util.get_choice(name="block-status",
+ value=args.block_status,
+ choices=self.BLOCK_STATUS_BITS,
+ bitmask=False)
+ if args.raw_data:
+ self.raw_data = util.get_array("raw-data", args.raw_data,
+ max_val=255)
+ print(self.raw_data)
+
+ if args.error_severity:
+ self.error_severity = util.get_choice(name="error-severity",
+ value=args.error_severity,
+ choices=self.ERROR_SEVERITY,
+ bitmask=False)
+
+ if args.fru_id:
+ self.fru_id = args.fru_id.to_bytes()
+ if not args.gen_err_valid_bits:
+ self.validation_bits |= self.VALIDATION_BITS["fru-id"]
+
+ if args.fru_text:
+ text = bytearray(args.fru_text.encode('ascii'))
+ if len(text) > 20:
+ sys.exit("FRU text is too big to fit")
+
+ self.fru_text = text
+ if not args.gen_err_valid_bits:
+ self.validation_bits |= self.VALIDATION_BITS["fru-text"]
+
+ if args.timestamp:
+ time = args.timestamp
+ century = int(time.year / 100)
+
+ bcd = bytearray()
+ util.data_add(bcd, (time.second // 10) << 4 | (time.second % 10), 1)
+ util.data_add(bcd, (time.minute // 10) << 4 | (time.minute % 10), 1)
+ util.data_add(bcd, (time.hour // 10) << 4 | (time.hour % 10), 1)
+
+ if args.precise:
+ util.data_add(bcd, 1, 1)
+ else:
+ util.data_add(bcd, 0, 1)
+
+ util.data_add(bcd, (time.day // 10) << 4 | (time.day % 10), 1)
+ util.data_add(bcd, (time.month // 10) << 4 | (time.month % 10), 1)
+ util.data_add(bcd,
+ ((time.year % 100) // 10) << 4 | (time.year % 10), 1)
+ util.data_add(bcd, ((century % 100) // 10) << 4 | (century % 10), 1)
+
+ self.timestamp = bcd
+ if not args.gen_err_valid_bits:
+ self.validation_bits |= self.VALIDATION_BITS["timestamp"]
+
+ if args.gen_err_valid_bits:
+ self.validation_bits = util.get_choice(name="validation",
+ value=args.gen_err_valid_bits,
+ choices=self.VALIDATION_BITS)
+
+ def __init__(self, host, port, debug=False):
+ """Initialize variables used by the QMP send logic"""
+
+ self.connected = False
+ self.host = host
+ self.port = port
+ self.debug = debug
+
+ # ACPI 6.1: 18.3.2.7.1 Generic Error Data: Generic Error Status Block
+ self.block_status = self.BLOCK_STATUS_BITS["uncorrectable"]
+ self.raw_data = []
+ self.error_severity = self.ERROR_SEVERITY["recoverable"]
+
+ # ACPI 6.1: 18.3.2.7.1 Generic Error Data: Generic Error Data Entry
+ self.validation_bits = 0
+ self.flags = 0
+ self.fru_id = bytearray(16)
+ self.fru_text = bytearray(20)
+ self.timestamp = bytearray(8)
+
+ self.qmp_monitor = QEMUMonitorProtocol(address=(self.host, self.port))
+
+ #
+ # Socket QMP send command
+ #
+ def send_cper_raw(self, cper_data):
+ """Send a raw CPER data to QEMU though QMP TCP socket"""
+
+ data = b64encode(bytes(cper_data)).decode('ascii')
+
+ cmd_arg = {
+ 'cper': data
+ }
+
+ self._connect()
+
+ if self.send_cmd("ghes-cper", cmd_arg):
+ print("Error injected.")
+
+ def send_cper(self, notif_type, payload):
+ """Send commands to QEMU though QMP TCP socket"""
+
+ # Fill CPER record header
+
+ # NOTE: bits 4 to 13 of block status contain the number of
+ # data entries in the data section. This is currently unsupported.
+
+ cper_length = len(payload)
+ data_length = cper_length + len(self.raw_data) + self.GENERIC_DATA_SIZE
+
+ # Generic Error Data Entry
+ gede = bytearray()
+
+ gede.extend(notif_type.to_bytes())
+ util.data_add(gede, self.error_severity, 4)
+ util.data_add(gede, 0x300, 2)
+ util.data_add(gede, self.validation_bits, 1)
+ util.data_add(gede, self.flags, 1)
+ util.data_add(gede, cper_length, 4)
+ gede.extend(self.fru_id)
+ gede.extend(self.fru_text)
+ gede.extend(self.timestamp)
+
+ # Generic Error Status Block
+ gebs = bytearray()
+
+ if self.raw_data:
+ raw_data_offset = len(gebs)
+ else:
+ raw_data_offset = 0
+
+ util.data_add(gebs, self.block_status, 4)
+ util.data_add(gebs, raw_data_offset, 4)
+ util.data_add(gebs, len(self.raw_data), 4)
+ util.data_add(gebs, data_length, 4)
+ util.data_add(gebs, self.error_severity, 4)
+
+ cper_data = bytearray()
+ cper_data.extend(gebs)
+ cper_data.extend(gede)
+ cper_data.extend(bytearray(self.raw_data))
+ cper_data.extend(bytearray(payload))
+
+ if self.debug:
+ print(f"GUID: {notif_type}")
+
+ util.dump_bytearray("Generic Error Status Block", gebs)
+ util.dump_bytearray("Generic Error Data Entry", gede)
+
+ if self.raw_data:
+ util.dump_bytearray("Raw data", bytearray(self.raw_data))
+
+ util.dump_bytearray("Payload", payload)
+
+ self.send_cper_raw(cper_data)
+
+
+ def search_qom(self, path, prop, regex):
+ """
+ Return a list of devices that match path array like:
+
+ /machine/unattached/device
+ /machine/peripheral-anon/device
+ ...
+ """
+
+ found = []
+
+ i = 0
+ while 1:
+ dev = f"{path}[{i}]"
+ args = {
+ 'path': dev,
+ 'property': prop
+ }
+ ret = self.send_cmd("qom-get", args, may_open=True, return_error=False)
+ if not ret:
+ break
+
+ if isinstance(ret, str):
+ if regex.search(ret):
+ found.append(dev)
+
+ i += 1
+ if i > 10000:
+ print("Too many objects returned by qom-get!")
+ break
+
+ return found
+
+class cper_guid:
+ """
+ Contains CPER GUID, as per:
+ https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
+ """
+
+ CPER_PROC_GENERIC = guid(0x9876CCAD, 0x47B4, 0x4bdb,
+ [0xB6, 0x5E, 0x16, 0xF1,
+ 0x93, 0xC4, 0xF3, 0xDB])
+
+ CPER_PROC_X86 = guid(0xDC3EA0B0, 0xA144, 0x4797,
+ [0xB9, 0x5B, 0x53, 0xFA,
+ 0x24, 0x2B, 0x6E, 0x1D])
+
+ CPER_PROC_ITANIUM = guid(0xe429faf1, 0x3cb7, 0x11d4,
+ [0xbc, 0xa7, 0x00, 0x80,
+ 0xc7, 0x3c, 0x88, 0x81])
+
+ CPER_PROC_ARM = guid(0xE19E3D16, 0xBC11, 0x11E4,
+ [0x9C, 0xAA, 0xC2, 0x05,
+ 0x1D, 0x5D, 0x46, 0xB0])
+
+ CPER_PLATFORM_MEM = guid(0xA5BC1114, 0x6F64, 0x4EDE,
+ [0xB8, 0x63, 0x3E, 0x83,
+ 0xED, 0x7C, 0x83, 0xB1])
+
+ CPER_PLATFORM_MEM2 = guid(0x61EC04FC, 0x48E6, 0xD813,
+ [0x25, 0xC9, 0x8D, 0xAA,
+ 0x44, 0x75, 0x0B, 0x12])
+
+ CPER_PCIE = guid(0xD995E954, 0xBBC1, 0x430F,
+ [0xAD, 0x91, 0xB4, 0x4D,
+ 0xCB, 0x3C, 0x6F, 0x35])
+
+ CPER_PCI_BUS = guid(0xC5753963, 0x3B84, 0x4095,
+ [0xBF, 0x78, 0xED, 0xDA,
+ 0xD3, 0xF9, 0xC9, 0xDD])
+
+ CPER_PCI_DEV = guid(0xEB5E4685, 0xCA66, 0x4769,
+ [0xB6, 0xA2, 0x26, 0x06,
+ 0x8B, 0x00, 0x13, 0x26])
+
+ CPER_FW_ERROR = guid(0x81212A96, 0x09ED, 0x4996,
+ [0x94, 0x71, 0x8D, 0x72,
+ 0x9C, 0x8E, 0x69, 0xED])
+
+ CPER_DMA_GENERIC = guid(0x5B51FEF7, 0xC79D, 0x4434,
+ [0x8F, 0x1B, 0xAA, 0x62,
+ 0xDE, 0x3E, 0x2C, 0x64])
+
+ CPER_DMA_VT = guid(0x71761D37, 0x32B2, 0x45cd,
+ [0xA7, 0xD0, 0xB0, 0xFE,
+ 0xDD, 0x93, 0xE8, 0xCF])
+
+ CPER_DMA_IOMMU = guid(0x036F84E1, 0x7F37, 0x428c,
+ [0xA7, 0x9E, 0x57, 0x5F,
+ 0xDF, 0xAA, 0x84, 0xEC])
+
+ CPER_CCIX_PER = guid(0x91335EF6, 0xEBFB, 0x4478,
+ [0xA6, 0xA6, 0x88, 0xB7,
+ 0x28, 0xCF, 0x75, 0xD7])
+
+ CPER_CXL_PROT_ERR = guid(0x80B9EFB4, 0x52B5, 0x4DE3,
+ [0xA7, 0x77, 0x68, 0x78,
+ 0x4B, 0x77, 0x10, 0x48])
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (9 preceding siblings ...)
2024-08-25 3:46 ` [PATCH v9 10/12] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
2024-08-25 11:34 ` Peter Maydell
2024-08-25 3:46 ` [PATCH v9 12/12] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
11 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Peter Maydell, linux-kernel, qemu-arm,
qemu-devel
Accurately injecting an ARM Processor error ACPI/APEI GHES
error record requires the value of the ARM Multiprocessor
Affinity Register (mpidr).
While ARM implements it, this is currently not visible.
Add a field at CPU storing it, and place it at arm_cpu_properties
as experimental, thus allowing it to be queried via QMP using
qom-get function.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
target/arm/cpu.c | 1 +
target/arm/cpu.h | 1 +
target/arm/helper.c | 10 ++++++++--
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 19191c239181..30fcf0a10f46 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2619,6 +2619,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model)
static Property arm_cpu_properties[] = {
DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
+ DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
mp_affinity, ARM64_AFFINITY_INVALID),
DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9a3fd595621f..3ad4de793409 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1033,6 +1033,7 @@ struct ArchCPU {
uint64_t reset_pmcr_el0;
} isar;
uint64_t midr;
+ uint64_t mpidr;
uint32_t revidr;
uint32_t reset_fpsid;
uint64_t ctr;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0a582c1cd3b3..d6e7aa069489 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4690,7 +4690,7 @@ static uint64_t mpidr_read_val(CPUARMState *env)
return mpidr;
}
-static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+static uint64_t mpidr_read(CPUARMState *env)
{
unsigned int cur_el = arm_current_el(env);
@@ -4700,6 +4700,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
return mpidr_read_val(env);
}
+static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+ return mpidr_read(env);
+}
+
static const ARMCPRegInfo lpae_cp_reginfo[] = {
/* NOP AMAIR0/1 */
{ .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
@@ -9721,7 +9726,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
{ .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
.opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
.fgt = FGT_MPIDR_EL1,
- .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
+ .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW },
};
#ifdef CONFIG_USER_ONLY
static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = {
@@ -9731,6 +9736,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo);
#endif
define_arm_cp_regs(cpu, mpidr_cp_reginfo);
+ cpu->mpidr = mpidr_read(env);
}
if (arm_feature(env, ARM_FEATURE_AUXCR)) {
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v9 12/12] scripts/arm_processor_error.py: retrieve mpidr if not filled
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (10 preceding siblings ...)
2024-08-25 3:46 ` [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
@ 2024-08-25 3:46 ` Mauro Carvalho Chehab
11 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-25 3:46 UTC (permalink / raw)
Cc: Mauro Carvalho Chehab, Cleber Rosa, John Snow, linux-kernel,
qemu-devel
Add support to retrieve mpidr value via qom-get.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
scripts/arm_processor_error.py | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
index 62e0c5662232..0a16d4f0d8b1 100644
--- a/scripts/arm_processor_error.py
+++ b/scripts/arm_processor_error.py
@@ -5,12 +5,10 @@
#
# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-# TODO: current implementation has dummy defaults.
-#
-# For a better implementation, a QMP addition/call is needed to
-# retrieve some data for ARM Processor Error injection:
-#
-# - ARM registers: power_state, mpidr.
+# Note: currently it lacks a method to fill the ARM Processor Error CPER
+# psci field from emulation. On a real hardware, this is filled only
+# when a CPU is not running. Implementing support for it to simulate a
+# real hardware is not trivial.
import argparse
import re
@@ -174,11 +172,24 @@ def send_cper(self, args):
else:
cper["running-state"] = 0
+ if args.mpidr:
+ cper["mpidr-el1"] = arg["mpidr"]
+ elif cpus:
+ cmd_arg = {
+ 'path': cpus[0],
+ 'property': "x-mpidr"
+ }
+ ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True)
+ if isinstance(ret, int):
+ cper["mpidr-el1"] = ret
+ else:
+ cper["mpidr-el1"] = 0
+
if arm_valid_init:
if args.affinity:
cper["valid"] |= self.arm_valid_bits["affinity"]
- if args.mpidr:
+ if "mpidr-el1" in cper:
cper["valid"] |= self.arm_valid_bits["mpidr"]
if "running-state" in cper:
@@ -362,7 +373,7 @@ def send_cper(self, args):
if isinstance(ret, int):
arg["midr-el1"] = ret
- util.data_add(data, arg.get("mpidr-el1", 0), 8)
+ util.data_add(data, cper["mpidr-el1"], 8)
util.data_add(data, arg.get("midr-el1", 0), 8)
util.data_add(data, cper["running-state"], 4)
util.data_add(data, arg.get("psci-state", 0), 4)
--
2.46.0
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
2024-08-25 3:46 ` [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
@ 2024-08-25 11:34 ` Peter Maydell
2024-08-26 1:53 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 29+ messages in thread
From: Peter Maydell @ 2024-08-25 11:34 UTC (permalink / raw)
To: Mauro Carvalho Chehab; +Cc: linux-kernel, qemu-arm, qemu-devel
On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Accurately injecting an ARM Processor error ACPI/APEI GHES
> error record requires the value of the ARM Multiprocessor
> Affinity Register (mpidr).
>
> While ARM implements it, this is currently not visible.
>
> Add a field at CPU storing it, and place it at arm_cpu_properties
> as experimental, thus allowing it to be queried via QMP using
> qom-get function.
> static Property arm_cpu_properties[] = {
> DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> + DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
> DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> mp_affinity, ARM64_AFFINITY_INVALID),
> DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
Why do we need this? Why is it experimental? The later patch
seems to use it via QMP, which I'm not super enthusiastic
about -- the preexisting mpidr and mp-affinity properties are
there for code that is creating CPU objects to configure
the CPU object, not as a query interface for QOM.
thanks
-- PMM
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
2024-08-25 11:34 ` Peter Maydell
@ 2024-08-26 1:53 ` Mauro Carvalho Chehab
2024-08-30 16:27 ` Peter Maydell
0 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-26 1:53 UTC (permalink / raw)
To: Peter Maydell; +Cc: linux-kernel, qemu-arm, qemu-devel
Em Sun, 25 Aug 2024 12:34:14 +0100
Peter Maydell <peter.maydell@linaro.org> escreveu:
> On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
> <mchehab+huawei@kernel.org> wrote:
> >
> > Accurately injecting an ARM Processor error ACPI/APEI GHES
> > error record requires the value of the ARM Multiprocessor
> > Affinity Register (mpidr).
> >
> > While ARM implements it, this is currently not visible.
> >
> > Add a field at CPU storing it, and place it at arm_cpu_properties
> > as experimental, thus allowing it to be queried via QMP using
> > qom-get function.
>
> > static Property arm_cpu_properties[] = {
> > DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> > + DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
> > DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> > mp_affinity, ARM64_AFFINITY_INVALID),
> > DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
>
> Why do we need this?
The ACPI HEST tables, in particular when using GHESv2 provide
several kinds of errors. Among them, we have ARM Processor Error,
as defined at UEFI 2.10 spec (and earlier versions), the Common
Platform Error Record (CPER) is defined as:
https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section
There are two fields that are part of the CPER record. One of them is
mandatory (MIDR); the other one is optional, but needed to decode another
field.
So, basically those errors need them.
> Why is it experimental?
This was a suggestion from Igor. As for now the QAPI for external
error injection is experimental, It makes sense to me to keep it
experimental as well.
> The later patch
> seems to use it via QMP, which I'm not super enthusiastic
> about -- the preexisting mpidr and mp-affinity properties are
> there for code that is creating CPU objects to configure
> the CPU object, not as a query interface for QOM.
I saw that. Basically the decoding by OS guest depends on MPIDR,
as explained at the description of Error affinity level field:
"For errors that can be attributed to a specific affinity level,
this field defines the affinity level at which the error was
produced, detected, and/or consumed. This is a value between 0
and 3. All other values (4-255) are reserved
For example, a vendor may choose to define affinity levels as
follows:
Level 0: errors that can be precisely attributed to a specific CPU
(e.g. due to a synchronous external abort)
Level 1: Cache parity and/or ECC errors detected at cache of affinity
level 1 (e.g. only attributed to higher level cache due to
prefetching and/or error propagation)
NOTE: Detailed meanings and groupings of affinity level are chip
and/or platform specific. The affinity level described here must
be consistent with the platform definitions used MPIDR. For
cache/TLB errors, the cache/TLB level is provided by the cache/TLB
error structure, which may differ from affinity level."
Regards,
Mauro
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection
2024-08-25 3:46 ` [PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
@ 2024-08-27 11:11 ` Markus Armbruster
0 siblings, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2024-08-27 11:11 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Eric Blake,
Igor Mammedov, Michael Roth, Paolo Bonzini, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel,
Jonathan Cameron, Shiju Jose
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> Creates a QMP command to be used for generic ACPI APEI hardware error
> injection (HEST) via GHESv2, and add support for it for ARM guests.
>
> Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
> independent. This is mapped at arch virt bindings, depending on the
> types supported by QEMU and by the BIOS. So, on ARM, this is supported
> via ACPI_GHES_NOTIFY_GPIO notification type.
>
> This patch is co-authored:
> - original ghes logic to inject a simple ARM record by Shiju Jose;
> - generic logic to handle block addresses by Jonathan Cameron;
> - generic GHESv2 error inject by Mauro Carvalho Chehab;
>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> MAINTAINERS | 7 +++++++
> hw/acpi/Kconfig | 5 +++++
> hw/acpi/ghes_cper.c | 32 ++++++++++++++++++++++++++++++++
> hw/acpi/ghes_cper_stub.c | 19 +++++++++++++++++++
> hw/acpi/meson.build | 2 ++
> hw/arm/Kconfig | 5 +++++
> hw/arm/virt-acpi-build.c | 1 +
> include/hw/acpi/ghes.h | 4 ++++
> include/hw/arm/virt.h | 2 ++
> qapi/acpi-hest.json | 36 ++++++++++++++++++++++++++++++++++++
> qapi/meson.build | 1 +
> qapi/qapi-schema.json | 1 +
> 12 files changed, 115 insertions(+)
> create mode 100644 hw/acpi/ghes_cper.c
> create mode 100644 hw/acpi/ghes_cper_stub.c
> create mode 100644 qapi/acpi-hest.json
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3584d6a6c6da..1d8091818899 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2077,6 +2077,13 @@ F: hw/acpi/ghes.c
> F: include/hw/acpi/ghes.h
> F: docs/specs/acpi_hest_ghes.rst
>
> +ACPI/HEST/GHES/ARM processor CPER
> +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +S: Maintained
> +F: hw/arm/ghes_cper.c
> +F: hw/acpi/ghes_cper_stub.c
> +F: qapi/acpi-hest.json
> +
> ppc4xx
> L: qemu-ppc@nongnu.org
> S: Orphan
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index e07d3204eb36..73ffbb82c150 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -51,6 +51,11 @@ config ACPI_APEI
> bool
> depends on ACPI
>
> +config GHES_CPER
> + bool
> + depends on ACPI_APEI
> + default y
> +
> config ACPI_PCI
> bool
> depends on ACPI && PCI
> diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
> new file mode 100644
> index 000000000000..a10a5e7ab29b
> --- /dev/null
> +++ b/hw/acpi/ghes_cper.c
> @@ -0,0 +1,32 @@
> +/*
> + * CPER payload parser for error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "qemu/base64.h"
> +#include "qemu/error-report.h"
> +#include "qemu/uuid.h"
> +#include "qapi/qapi-commands-acpi-hest.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_ghes_cper(const char *qmp_cper, Error **errp)
> +{
> +
> + uint8_t *cper;
> + size_t len;
> +
> + cper = qbase64_decode(qmp_cper, -1, &len, errp);
> + if (!cper) {
> + error_setg(errp, "missing GHES CPER payload");
> + return;
> + }
> +
> + ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
> +}
> diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
> new file mode 100644
> index 000000000000..36138c462ac9
> --- /dev/null
> +++ b/hw/acpi/ghes_cper_stub.c
> @@ -0,0 +1,19 @@
> +/*
> + * Stub interface for CPER payload parser for error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-commands-acpi-hest.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_ghes_cper(const char *cper, Error **errp)
> +{
> + error_setg(errp, "GHES QMP error inject is not compiled in");
> +}
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index fa5c07db9068..6cbf430eb66d 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -34,4 +34,6 @@ endif
> system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
> system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
> system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
> +system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
> +system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
> system_ss.add(files('acpi-qmp-cmds.c'))
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 1ad60da7aa2d..bed6ba27d715 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -712,3 +712,8 @@ config ARMSSE
> select UNIMP
> select SSE_COUNTER
> select SSE_TIMER
> +
> +config GHES_CPER
> + bool
> + depends on ARM
> + default y if AARCH64
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index 9c36da3c831f..a6b03b16d922 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -893,6 +893,7 @@ static void acpi_align_size(GArray *blob, unsigned align)
>
> static const uint16_t hest_ghes_notify[] = {
> [ARM_ACPI_HEST_SRC_ID_SEA] = ACPI_GHES_NOTIFY_SEA,
> + [ARM_ACPI_HEST_SRC_ID_GPIO] = ACPI_GHES_NOTIFY_GPIO,
> };
>
> static
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index b1ec9795270f..ea6b33e133d6 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -66,6 +66,10 @@ typedef struct AcpiGhesState {
> bool present; /* True if GHES is present at all on this board */
> } AcpiGhesState;
>
> +
> +/* Source ID associated with qapi/acpi-hest.json QMP error injection */
> +#define ACPI_HEST_SRC_ID_QMP 0
> +
> void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> BIOSLinker *linker,
> const uint16_t * const notify,
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 1c682d43fdac..8b042053dfa1 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -33,6 +33,7 @@
> #include "exec/hwaddr.h"
> #include "qemu/notify.h"
> #include "hw/boards.h"
> +#include "hw/acpi/ghes.h"
> #include "hw/arm/boot.h"
> #include "hw/arm/bsa.h"
> #include "hw/block/flash.h"
> @@ -194,6 +195,7 @@ bool virt_is_acpi_enabled(VirtMachineState *vms);
> * ID numbers used to fill HEST source ID field
> */
> enum {
> + ARM_ACPI_HEST_SRC_ID_GPIO = ACPI_HEST_SRC_ID_QMP,
> ARM_ACPI_HEST_SRC_ID_SEA,
> };
>
> diff --git a/qapi/acpi-hest.json b/qapi/acpi-hest.json
> new file mode 100644
> index 000000000000..0255695c95ad
> --- /dev/null
> +++ b/qapi/acpi-hest.json
> @@ -0,0 +1,36 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +
> +##
> +# = GHESv2 CPER Error Injection
This makes it a section next to section ACPI. Should it be a subjection
of ACPI? If yes, add a second =.
> +#
> +# Defined since ACPI Specification 6.1,
> +# section 18.3.2.8 Generic Hardware Error Source version 2. See:
> +#
> +# https://uefi.org/sites/default/files/resources/ACPI_6_1.pdf
> +##
> +
> +
> +##
> +# @ghes-cper:
Can we pick a command name that provides more of a clue to the
uninitiated on what the command does? inject-ghes-cper?
inject-ghes-cper-error?
> +#
> +# Inject a CPER error data to be filled according to ACPI 6.1
> +# spec via GHESv2.
It's either "a datum", or "data". Scratch "data"?
What do you mean by "to be filled"? Who's doing the filling?
What about something like "Inject an error with additional ACPI 6.1 GHES
v2 error information"?
> +#
> +# @cper: contains a base64 encoded string with raw data for a single CPER
> +# record with Generic Error Status Block, Generic Error Data Entry and
> +# generic error data payload, as described at
> +# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#format
docs/devel/qapi-code-gen.rst:
For legibility, wrap text paragraphs so every line is at most 70
characters long.
Thus:
# @cper: contains a base64 encoded string with raw data for a single
# CPER record with Generic Error Status Block, Generic Error Data
# Entry and generic error data payload, as described at
# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#format
> +#
> +# Features:
> +#
> +# @unstable: This command is experimental.
> +#
> +# Since: 9.2
> +##
> +{ 'command': 'ghes-cper',
> + 'data': {
> + 'cper': 'str'
> + },
> + 'features': [ 'unstable' ]
> +}
> diff --git a/qapi/meson.build b/qapi/meson.build
> index e7bc54e5d047..35cea6147262 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -59,6 +59,7 @@ qapi_all_modules = [
> if have_system
> qapi_all_modules += [
> 'acpi',
> + 'acpi-hest',
> 'audio',
> 'cryptodev',
> 'qdev',
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index b1581988e4eb..baf19ab73afe 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -75,6 +75,7 @@
> { 'include': 'misc-target.json' }
> { 'include': 'audio.json' }
> { 'include': 'acpi.json' }
> +{ 'include': 'acpi-hest.json' }
> { 'include': 'pci.json' }
> { 'include': 'stats.json' }
> { 'include': 'virtio.json' }
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
2024-08-26 1:53 ` Mauro Carvalho Chehab
@ 2024-08-30 16:27 ` Peter Maydell
2024-09-01 6:40 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 29+ messages in thread
From: Peter Maydell @ 2024-08-30 16:27 UTC (permalink / raw)
To: Mauro Carvalho Chehab; +Cc: linux-kernel, qemu-arm, qemu-devel
On Mon, 26 Aug 2024 at 04:12, Mauro Carvalho Chehab
<mchehab+huawei@kernel.org> wrote:
>
> Em Sun, 25 Aug 2024 12:34:14 +0100
> Peter Maydell <peter.maydell@linaro.org> escreveu:
>
> > On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
> > <mchehab+huawei@kernel.org> wrote:
> > >
> > > Accurately injecting an ARM Processor error ACPI/APEI GHES
> > > error record requires the value of the ARM Multiprocessor
> > > Affinity Register (mpidr).
> > >
> > > While ARM implements it, this is currently not visible.
> > >
> > > Add a field at CPU storing it, and place it at arm_cpu_properties
> > > as experimental, thus allowing it to be queried via QMP using
> > > qom-get function.
> >
> > > static Property arm_cpu_properties[] = {
> > > DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> > > + DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
> > > DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> > > mp_affinity, ARM64_AFFINITY_INVALID),
> > > DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> >
> > Why do we need this?
>
> The ACPI HEST tables, in particular when using GHESv2 provide
> several kinds of errors. Among them, we have ARM Processor Error,
> as defined at UEFI 2.10 spec (and earlier versions), the Common
> Platform Error Record (CPER) is defined as:
>
> https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section
>
> There are two fields that are part of the CPER record. One of them is
> mandatory (MIDR); the other one is optional, but needed to decode another
> field.
>
> So, basically those errors need them.
OK, but why do scripts outside of QEMU need the information,
as opposed to telling QEMU "hey, generate an error" and
QEMU knowing the format to use? Do we have any other
QMP APIs where something external provides raw ACPI
data like this?
-- PMM
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object
2024-08-30 16:27 ` Peter Maydell
@ 2024-09-01 6:40 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-01 6:40 UTC (permalink / raw)
To: Peter Maydell; +Cc: linux-kernel, qemu-arm, qemu-devel
Em Fri, 30 Aug 2024 17:27:27 +0100
Peter Maydell <peter.maydell@linaro.org> escreveu:
> On Mon, 26 Aug 2024 at 04:12, Mauro Carvalho Chehab
> <mchehab+huawei@kernel.org> wrote:
> >
> > Em Sun, 25 Aug 2024 12:34:14 +0100
> > Peter Maydell <peter.maydell@linaro.org> escreveu:
> >
> > > On Sun, 25 Aug 2024 at 04:46, Mauro Carvalho Chehab
> > > <mchehab+huawei@kernel.org> wrote:
> > > >
> > > > Accurately injecting an ARM Processor error ACPI/APEI GHES
> > > > error record requires the value of the ARM Multiprocessor
> > > > Affinity Register (mpidr).
> > > >
> > > > While ARM implements it, this is currently not visible.
> > > >
> > > > Add a field at CPU storing it, and place it at arm_cpu_properties
> > > > as experimental, thus allowing it to be queried via QMP using
> > > > qom-get function.
> > >
> > > > static Property arm_cpu_properties[] = {
> > > > DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
> > > > + DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
> > > > DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
> > > > mp_affinity, ARM64_AFFINITY_INVALID),
> > > > DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
> > >
> > > Why do we need this?
> >
> > The ACPI HEST tables, in particular when using GHESv2 provide
> > several kinds of errors. Among them, we have ARM Processor Error,
> > as defined at UEFI 2.10 spec (and earlier versions), the Common
> > Platform Error Record (CPER) is defined as:
> >
> > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html?highlight=ghes#arm-processor-error-section
> >
> > There are two fields that are part of the CPER record. One of them is
> > mandatory (MIDR); the other one is optional, but needed to decode another
> > field.
> >
> > So, basically those errors need them.
>
> OK, but why do scripts outside of QEMU need the information,
> as opposed to telling QEMU "hey, generate an error" and
> QEMU knowing the format to use? Do we have any other
> QMP APIs where something external provides raw ACPI
> data like this?
This was discussed during the review of this patch series.
See, the ACPI Platform Error Interfaces (APEI) code currently in QEMU
implements limited support for ACPI HEST - Hardware Error Source Table [1].
[1] https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source
HEST consists of, currently, 9 error types (plus 3 obsoleted ones). Among
them, there is support for generic errors via GHES and GHESv2 types.
While not officially obsoleted, GHES is superseded by GHESv2.
GHESv2 (and GHES) has a section type field to identify which error type it
is [2]. Currently, there are +10 defined UUIDs for the section type.
[2] https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#section-descriptor
The current code on ghes.c implements GHESv2 support for a single
type (memory error), received from the host OS via SIGBUS.
Testing such code and injecting such error is not easy, as the host OS needs
to send a SIGBUS to the guest, this reflecting an error at the main OS.
Such code also has several limitations.
-
At the first three versions of this patch set, the code was just doing
like what you said: it was adding an error injection for a HEST GHESv2
ARM Processor Error. So the error record (CPER) were produced in QEMU using
some optional parameters passed via QMP to change fields when needed.
With such approach, QEMU could use directly the value from MIDR and MPIDR.
The main disadvantage is that, to make full support of HEST, a lot
of code will be needed to add support for every GHESv2 type and for
every GHESv2 section type. So, the feedback we had were to re-implement
it into a generic way.
The generic CPER error inject approach (since v4 of this series), has
soma advantages:
- it is easy to do fuzz testing, as the entire CPER is built via a python
script;
- no need to modify QEMU to support other GHESv2 types of record and
to support other types of processors;
- GHESv2 fields can also be dynamically generated;
- It shouldn't be hard to change the code to support other types of
HEST table (currently, only GHESv2 is supported).
The disadvantage is that queries are needed to pick configuration and
register values from the current emulation to do error injection. For
ARM Processor Error, it means that MPIDR and MIDR, are needed. Other
processors and other error types will also require to query other data
from QEMU, either using already-existing QMP code or by adding new ones.
Yet, the amount of code for such queries seem to be smaller than the
amount of code to be added for every single GHESv2/HEST type.
-
Worth saying that QEMU may still require internal HEST/GHES errors to be
able to reflect at the guests hardware problems detected at the host OS.
So, for instance, if a host OS memory is poisoned due to hardware errors,
QEMU and guests need to know, in order to kill processes affected
by a bad memory.
Regards,
Mauro
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-08-25 3:45 ` [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2024-09-11 13:51 ` Igor Mammedov
2024-09-13 5:44 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-09-11 13:51 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
On Sun, 25 Aug 2024 05:45:56 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Store HEST table address at GPA, placing its content at
> hest_addr_le variable.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
This looks good to me.
in addition to this, it needs a patch on top to make sure
that we migrate hest_addr_le.
See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
and fixes on top of that for an example.
> ---
>
> Change from v8:
> - hest_addr_lr is now pointing to the error source size and data.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 15 +++++++++++++++
> include/hw/acpi/ghes.h | 1 +
> 2 files changed, 16 insertions(+)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index e9511d9b8f71..529c14e3289f 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -30,6 +30,7 @@
>
> #define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
> #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
> +#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
>
> /* The max size in bytes for one error block */
> #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
> @@ -367,11 +368,22 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
>
> acpi_table_begin(&table, table_data);
>
> + int hest_offset = table_data->len;
> +
> /* Error Source Count */
> build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
>
> acpi_table_end(linker, &table);
> +
> + /*
> + * tell firmware to write into GPA the address of HEST via fw_cfg,
> + * once initialized.
> + */
> + bios_linker_loader_write_pointer(linker,
> + ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> + sizeof(uint64_t),
> + ACPI_BUILD_TABLE_FILE, hest_offset);
> }
>
> void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> @@ -385,6 +397,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
>
> + fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> + NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> +
> ags->present = true;
> }
>
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 674f6958e905..28b956acb19a 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -63,6 +63,7 @@ enum {
> };
>
> typedef struct AcpiGhesState {
> + uint64_t hest_addr_le;
> uint64_t ghes_addr_le;
> bool present; /* True if GHES is present at all on this board */
> } AcpiGhesState;
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID
2024-08-25 3:45 ` [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
@ 2024-09-11 15:01 ` Igor Mammedov
2024-09-13 15:14 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-09-11 15:01 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
Peter Maydell, Shannon Zhao, kvm, linux-kernel, qemu-arm,
qemu-devel
On Sun, 25 Aug 2024 05:45:57 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The current logic is based on a lot of duct tape, with
> offsets calculated based on one define with the number of
> source IDs and an enum.
>
> Rewrite the logic in a way that it would be more resilient
> of code changes, by moving the source ID count to an enum
> and make the offset calculus more explicit.
>
> Such change was inspired on a patch from Jonathan Cameron
> splitting the logic to get the CPER address on a separate
> function, as this will be needed to support generic error
> injection.
patch is too large and does too many things at once,
see inline suggestions on how to split it in more
manageable chunks.
(I'll mark preferred patch order with numbers)
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> ---
>
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
> That should make easier to add support for 86.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 275 ++++++++++++++++++++++++---------------
> hw/arm/virt-acpi-build.c | 10 +-
> include/hw/acpi/ghes.h | 18 +--
> include/hw/arm/virt.h | 7 +
> target/arm/kvm.c | 3 +-
> 5 files changed, 198 insertions(+), 115 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 529c14e3289f..965fb1b36587 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -35,9 +35,6 @@
> /* The max size in bytes for one error block */
> #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
>
> -/* Now only support ARMv8 SEA notification type error source */
> -#define ACPI_GHES_ERROR_SOURCE_COUNT 1
[patch 4] getting rid of this and introducing num_sources
(aka variable size HEST)
> /* Generic Hardware Error Source version 2 */
> #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
>
> @@ -64,6 +61,19 @@
> */
> #define ACPI_GHES_GESB_SIZE 20
>
> +/*
> + * Offsets with regards to the start of the HEST table stored at
> + * ags->hest_addr_le, according with the memory layout map at
> + * docs/specs/acpi_hest_ghes.rst.
> + */
perhaps mention in comment/commit message, that hest lookup
is implemented only GHESv2 error sources.
That will work as far we do forward migration only
(i.e. old qemu -> new qemu), which is what upstream supports.
However it won't work for backward migration (new qemu -> old qemu)
since old one doesn't know about new non-GHESv2 sources.
And that means we would need to introduce compat knobs for every
new non-GHESv2 source is added. Which is easy to overlook and
it adds up to maintenance.
(You've already described zoo of types ACPI spec has in v8 review,
but I don't thing it's too complex to implement lookup of all
known types. compared to headache we would have with compat
settings if anyone remembers)
I won't insist on adding all known sources lookup in this series,
if you agree to do it as a patch on top of this series within this
dev cycle (~2 months time-frame).
> +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 */
+ ,Table 18-383
> +#define HEST_GHES_V2_TABLE_SIZE 92
> +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET)
> +
> +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source */
Table 18-380 'Error Status Address' field
> +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET)
> +
> /*
> * Values for error_severity field
> */
> @@ -185,51 +195,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
> build_append_int_noprefix(table, 0, 7);
> }
>
> -static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> - uint64_t error_physical_addr)
> +static void
> +ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
> + const uint8_t *section_type,
> + int data_length)
[patch 2] splitting acpi_ghes_record_mem_error() on reusable and mem specific
code
> {
> - GArray *block;
> -
> - /* Memory Error Section Type */
> - const uint8_t uefi_cper_mem_sec[] =
> - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> - 0xED, 0x7C, 0x83, 0xB1);
> -
> /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> * Table 17-13 Generic Error Data Entry
> */
> QemuUUID fru_id = {};
> - uint32_t data_length;
>
> - block = g_array_new(false, true /* clear */, 1);
> -
> - /* This is the length if adding a new generic error data entry*/
> - data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> /*
> - * It should not run out of the preallocated memory if adding a new generic
> - * error data entry
> + * Calculate the size with this block. No need to check for
> + * too big CPER, as CPER size is checked at ghes_record_cper_errors()
> */
> - assert((data_length + ACPI_GHES_GESB_SIZE) <=
> - ACPI_GHES_MAX_RAW_DATA_LENGTH);
> + data_length += ACPI_GHES_GESB_SIZE;
>
> /* Build the new generic error status block header */
> acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
>
> /* Build this new generic error data entry header */
> - acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
> + acpi_ghes_generic_error_data(block, section_type,
> ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
> -
> - /* Build the memory section CPER for above new generic error data entry */
> - acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> -
> - /* Write the generic error data entry into guest memory */
> - cpu_physical_memory_write(error_block_address, block->data, block->len);
> -
> - g_array_free(block, true);
> -
> - return 0;
> }
>
> /*
> @@ -237,17 +226,18 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
> * See docs/specs/acpi_hest_ghes.rst for blobs format.
> */
> -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> + int num_sources)
> {
> int i, error_status_block_offset;
>
> /* Build error_block_address */
> - for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> + for (i = 0; i < num_sources; i++) {
> build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
> }
>
> /* Build read_ack_register */
> - for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> + for (i = 0; i < num_sources; i++) {
> /*
> * Initialize the value of read_ack_register to 1, so GHES can be
> * writable after (re)boot.
> @@ -262,13 +252,13 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
>
> /* Reserve space for Error Status Data Block */
> acpi_data_push(hardware_errors,
> - ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
> + ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
>
> /* Tell guest firmware to place hardware_errors blob into RAM */
> bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> hardware_errors, sizeof(uint64_t), false);
>
> - for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> + for (i = 0; i < num_sources; i++) {
> /*
> * Tell firmware to patch error_block_address entries to point to
> * corresponding "Generic Error Status Block"
> @@ -283,14 +273,20 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> * tell firmware to write hardware_errors GPA into
> * hardware_errors_addr fw_cfg, once the former has been initialized.
> */
> - bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> - 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> + bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, 0,
> + sizeof(uint64_t),
> + ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
[patch 1] all indent changes in its own patch, or just drop them altogether
> }
>
> /* Build Generic Hardware Error Source version 2 (GHESv2) */
> -static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> +static void build_ghes_v2(GArray *table_data,
> + BIOSLinker *linker,
> + enum AcpiGhesNotifyType notify,
> + uint16_t source_id,
> + int num_sources)
> {
> uint64_t address_offset;
> +
> /*
> * Type:
> * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> @@ -317,21 +313,13 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> 4 /* QWord access */, 0);
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> - address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
> - ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
> + address_offset + GAS_ADDR_OFFSET,
> + sizeof(uint64_t),
> + ACPI_GHES_ERRORS_FW_CFG_FILE,
> + source_id * sizeof(uint64_t));
ditto
> - switch (source_id) {
> - case ACPI_HEST_SRC_ID_SEA:
> - /*
> - * Notification Structure
> - * Now only enable ARMv8 SEA notification type
> - */
> - build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
> - break;
> - default:
> - error_report("Not support this error source");
> - abort();
> - }
> + /* Notification Structure */
> + build_ghes_hw_error_notification(table_data, notify);
>
> /* Error Status Block Length */
> build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> @@ -345,9 +333,11 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> 4 /* QWord access */, 0);
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> - address_offset + GAS_ADDR_OFFSET,
> - sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
> - (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t));
> + address_offset + GAS_ADDR_OFFSET,
> + sizeof(uint64_t),
> + ACPI_GHES_ERRORS_FW_CFG_FILE,
> + (num_sources + source_id) *
> + sizeof(uint64_t));
ditto
> /*
> * Read Ack Preserve field
> @@ -360,19 +350,28 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> }
>
> /* Build Hardware Error Source Table */
> -void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> +void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> + BIOSLinker *linker,
> + const uint16_t * const notify,
> + int num_sources,
> const char *oem_id, const char *oem_table_id)
> {
> AcpiTable table = { .sig = "HEST", .rev = 1,
> .oem_id = oem_id, .oem_table_id = oem_table_id };
> + int i;
> +
> + build_ghes_error_table(hardware_errors, linker, num_sources);
>
> acpi_table_begin(&table, table_data);
>
> + /* Beginning at the HEST Error Source struct count and data */
> int hest_offset = table_data->len;
>
> /* Error Source Count */
> - build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> - build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
> + build_append_int_noprefix(table_data, num_sources, 4);
> + for (i = 0; i < num_sources; i++) {
> + build_ghes_v2(table_data, linker, notify[i], i, num_sources);
> + }
>
> acpi_table_end(linker, &table);
>
> @@ -403,60 +402,132 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +void ghes_record_cper_errors(const void *cper, size_t len,
> + uint16_t source_id, Error **errp)
[patch 3] switching to hest source id lookup method
> {
> - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> - uint64_t start_addr;
> - bool ret = -1;
> + uint64_t hest_read_ack_start_addr, read_ack_start_addr;
> + uint64_t hest_addr, cper_addr, err_source_struct;
> + uint64_t hest_err_block_addr, error_block_addr;
> + uint32_t num_sources, i;
> AcpiGedState *acpi_ged_state;
> AcpiGhesState *ags;
> + uint64_t read_ack;
>
> - assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> + if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> + error_setg(errp, "GHES CPER record is too big: %ld", len);
> + }
>
> acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> NULL));
> g_assert(acpi_ged_state);
> ags = &acpi_ged_state->ghes_state;
>
> - start_addr = le64_to_cpu(ags->ghes_addr_le);
> -
> - if (physical_address) {
> -
> - if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
> - start_addr += source_id * sizeof(uint64_t);
> - }
> -
> - cpu_physical_memory_read(start_addr, &error_block_addr,
> - sizeof(error_block_addr));
> -
> - error_block_addr = le64_to_cpu(error_block_addr);
> -
> - read_ack_register_addr = start_addr +
> - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> -
> - cpu_physical_memory_read(read_ack_register_addr,
> - &read_ack_register, sizeof(read_ack_register));
> -
> - /* zero means OSPM does not acknowledge the error */
> - if (!read_ack_register) {
> - error_report("OSPM does not acknowledge previous error,"
> - " so can not record CPER for current error anymore");
> - } else if (error_block_addr) {
> - read_ack_register = cpu_to_le64(0);
> - /*
> - * Clear the Read Ack Register, OSPM will write it to 1 when
> - * it acknowledges this error.
> - */
> - cpu_physical_memory_write(read_ack_register_addr,
> - &read_ack_register, sizeof(uint64_t));
> -
> - ret = acpi_ghes_record_mem_error(error_block_addr,
> - physical_address);
> - } else
> - error_report("can not find Generic Error Status Block");
> + hest_addr = le64_to_cpu(ags->hest_addr_le);
> +
> + cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
> +
> + if (source_id >= num_sources) {
> + error_setg(errp,
> + "GHES: Source %d not found. Only %d sources are defined",
> + source_id, num_sources);
> + return;
> + }
> + err_source_struct = hest_addr + sizeof(num_sources);
> +
> + for (i = 0; i < num_sources; i++) {
> + uint64_t addr = err_source_struct;
> + uint16_t type, src_id;
> +
> + cpu_physical_memory_read(addr, &type, sizeof(type));
> +
> + /* For now, we only know the size of GHESv2 table */
> + assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
> +
> + /* It is GHES. Compare CPER source address */
> + addr += sizeof(type);
> + cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> +
> + if (src_id == source_id)
> + break;
> +
> + err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> + }
> + if (i == num_sources) {
> + error_setg(errp, "HEST: Source %d not found.", source_id);
> + return;
> + }
> +
> + /* Check if BIOS addr pointers were properly generated */
> +
> + hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> + hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
> +
> + cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> + sizeof(error_block_addr));
> +
> + cpu_physical_memory_read(error_block_addr, &cper_addr,
> + sizeof(error_block_addr));
> +
> + cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
> + sizeof(read_ack_start_addr));
> +
> + /* Update ACK offset to notify about a new error */
> +
> + cpu_physical_memory_read(read_ack_start_addr,
> + &read_ack, sizeof(read_ack));
> +
> + /* zero means OSPM does not acknowledge the error */
> + if (!read_ack) {
> + error_setg(errp,
> + "Last CPER record was not acknowledged yet");
> + read_ack = 1;
> + cpu_physical_memory_write(read_ack_start_addr,
> + &read_ack, sizeof(read_ack));
> + return;
> + }
> +
> + read_ack = cpu_to_le64(0);
> + cpu_physical_memory_write(read_ack_start_addr,
> + &read_ack, sizeof(read_ack));
> +
> + /* Write the generic error data entry into guest memory */
> + cpu_physical_memory_write(cper_addr, cper, len);
> +}
> +
> +int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
> +{
> + /* Memory Error Section Type */
> + const uint8_t guid[] =
> + UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> + 0xED, 0x7C, 0x83, 0xB1);
> + Error *errp = NULL;
> + GArray *block;
> +
> + if (!physical_address) {
> + error_report("can not find Generic Error Status Block for source id %d",
> + source_id);
> + return -1;
> + }
> +
> + block = g_array_new(false, true /* clear */, 1);
> +
> + ghes_gen_err_data_uncorrectable_recoverable(block, guid,
> + ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +
> + /* Build the memory section CPER for above new generic error data entry */
> + acpi_ghes_build_append_mem_cper(block, physical_address);
> +
> + /* Report the error */
> + ghes_record_cper_errors(block->data, block->len, source_id, &errp);
> +
> + g_array_free(block, true);
> +
> + if (errp) {
> + error_report_err(errp);
> + return -1;
> }
>
> - return ret;
> + return 0;
> }
>
> bool acpi_ghes_present(void)
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f76fb117adff..39100c2822c2 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -890,6 +890,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
> g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
> }
>
> +static const uint16_t hest_ghes_notify[] = {
> + [ARM_ACPI_HEST_SRC_ID_SEA] = ACPI_GHES_NOTIFY_SEA,
> +};
I agree that machine/platform shall opt in for a specific source id,
but I'm not sure about whether we need platform specific source ids,
it seems to complicate things needlessly.
For example if one would define different src_id for error injection
for ARM and X86, then we would somehow need to take that in account
when QMP command X called so it would use correct ID
Maybe this needs it's own patch with a commit message that
would explain need for this approach (but so far I'm not seeing the point).
PS:
I'd prefer common/shared SRC_ID registry, from which boards would pick
applicable ones.
> +
> static
> void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> {
> @@ -943,10 +947,10 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> build_dbg2(tables_blob, tables->linker, vms);
>
> if (vms->ras) {
> - build_ghes_error_table(tables->hardware_errors, tables->linker);
> acpi_add_table(table_offsets, tables_blob);
> - acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
> - vms->oem_table_id);
> + acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> + hest_ghes_notify, sizeof(hest_ghes_notify),
> + vms->oem_id, vms->oem_table_id);
> }
>
> if (ms->numa_state->num_nodes > 0) {
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 28b956acb19a..4b5af86ec077 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -23,6 +23,7 @@
> #define ACPI_GHES_H
>
> #include "hw/acpi/bios-linker-loader.h"
> +#include "qapi/error.h"
>
> /*
> * Values for Hardware Error Notification Type field
> @@ -56,24 +57,23 @@ enum AcpiGhesNotifyType {
> ACPI_GHES_NOTIFY_RESERVED = 12
> };
>
> -enum {
> - ACPI_HEST_SRC_ID_SEA = 0,
> - /* future ids go here */
> - ACPI_HEST_SRC_ID_RESERVED,
> -};
> -
> typedef struct AcpiGhesState {
> uint64_t hest_addr_le;
> uint64_t ghes_addr_le;
> bool present; /* True if GHES is present at all on this board */
> } AcpiGhesState;
>
> -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
> -void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> +void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> + BIOSLinker *linker,
> + const uint16_t * const notify,
> + int num_sources,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
> +int acpi_ghes_record_errors(int source_id,
> + uint64_t error_physical_addr);
> +void ghes_record_cper_errors(const void *cper, size_t len,
use GArray for cper so you won't have to pass down len
> + uint16_t source_id, Error **errp);
>
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index a4d937ed45ac..d62d8d9db5ae 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -189,6 +189,13 @@ OBJECT_DECLARE_TYPE(VirtMachineState, VirtMachineClass, VIRT_MACHINE)
> void virt_acpi_setup(VirtMachineState *vms);
> bool virt_is_acpi_enabled(VirtMachineState *vms);
>
> +/*
> + * ID numbers used to fill HEST source ID field
> + */
> +enum {
> + ARM_ACPI_HEST_SRC_ID_SEA,
> +};
> +
> /* Return number of redistributors that fit in the specified region */
> static uint32_t virt_redist_capacity(VirtMachineState *vms, int region)
> {
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 849e2e21b304..8c4c8263b85a 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2373,7 +2373,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> */
> if (code == BUS_MCEERR_AR) {
> kvm_cpu_synchronize_state(c);
> - if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
> + if (!acpi_ghes_record_errors(ARM_ACPI_HEST_SRC_ID_SEA,
> + paddr)) {
> kvm_inject_arm_sea(c);
> } else {
> error_report("failed to record the error");
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-09-11 13:51 ` Igor Mammedov
@ 2024-09-13 5:44 ` Mauro Carvalho Chehab
2024-09-13 13:25 ` Igor Mammedov
0 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-13 5:44 UTC (permalink / raw)
To: Igor Mammedov
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Em Wed, 11 Sep 2024 15:51:08 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> On Sun, 25 Aug 2024 05:45:56 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Store HEST table address at GPA, placing its content at
> > hest_addr_le variable.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> This looks good to me.
>
> in addition to this, it needs a patch on top to make sure
> that we migrate hest_addr_le.
> See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
> and fixes on top of that for an example.
Hmm... If I understood such change well, vmstate_ghes_state() will
use this structure as basis to do migration:
/* ghes.h */
typedef struct AcpiGhesState {
uint64_t hest_addr_le;
uint64_t ghes_addr_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
/* generic_event_device.c */
static const VMStateDescription vmstate_ghes_state = {
.name = "acpi-ged/ghes",
.version_id = 1,
.minimum_version_id = 1,
.needed = ghes_needed,
.fields = (VMStateField[]) {
VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
vmstate_ghes_state, AcpiGhesState),
VMSTATE_END_OF_LIST()
}
};
/* hw/arm/virt-acpi-build.c */
void virt_acpi_setup(VirtMachineState *vms)
{
...
if (vms->ras) {
assert(vms->acpi_dev);
acpi_ged_state = ACPI_GED(vms->acpi_dev);
acpi_ghes_add_fw_cfg(&acpi_ged_state->ghes_state,
vms->fw_cfg, tables.hardware_errors);
}
Which relies on acpi_ghes_add_fw_cfg() function to setup callbacks for
the migration:
/* ghes.c */
void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
GArray *hardware_error)
{
/* Create a read-only fw_cfg file for GHES */
fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
hardware_error->len);
/* Create a read-write fw_cfg file for Address */
fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
ags->present = true;
}
It sounds to me that both ghes_addr_le and hest_addr_le will be migrated
altogether.
Did I miss something?
Thanks,
Mauro
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-09-13 5:44 ` Mauro Carvalho Chehab
@ 2024-09-13 13:25 ` Igor Mammedov
2024-09-14 5:33 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-09-13 13:25 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
On Fri, 13 Sep 2024 07:44:35 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Wed, 11 Sep 2024 15:51:08 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > On Sun, 25 Aug 2024 05:45:56 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > > Store HEST table address at GPA, placing its content at
> > > hest_addr_le variable.
> > >
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >
> > This looks good to me.
> >
> > in addition to this, it needs a patch on top to make sure
> > that we migrate hest_addr_le.
> > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
> > and fixes on top of that for an example.
>
> Hmm... If I understood such change well, vmstate_ghes_state() will
> use this structure as basis to do migration:
>
> /* ghes.h */
> typedef struct AcpiGhesState {
> uint64_t hest_addr_le;
> uint64_t ghes_addr_le;
> bool present; /* True if GHES is present at all on this board */
> } AcpiGhesState;
>
> /* generic_event_device.c */
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> .version_id = 1,
> .minimum_version_id = 1,
> .needed = ghes_needed,
> .fields = (VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> vmstate_ghes_state, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> }
> };
current code looks like that:
static const VMStateDescription vmstate_ghes = {
.name = "acpi-ghes",
.version_id = 1,
.minimum_version_id = 1,
.fields = (const VMStateField[]) {
VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), <<===
VMSTATE_END_OF_LIST()
},
};
static bool ghes_needed(void *opaque)
{
AcpiGedState *s = opaque;
return s->ghes_state.ghes_addr_le;
}
static const VMStateDescription vmstate_ghes_state = {
.name = "acpi-ged/ghes",
.version_id = 1,
.minimum_version_id = 1,
.needed = ghes_needed,
.fields = (const VMStateField[]) {
VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
vmstate_ghes, AcpiGhesState),
VMSTATE_END_OF_LIST()
}
};
where
VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
explicitly defines field(s) within structure to be sent over wire.
we need to add a conditional field for ghes_addr_le
which will be sent only with new machine types, but not with old ones
to avoid migration breakage.
I don't know much about migration, but maybe we can get away with
similar condition as in ghes_needed(), or enabling QMP error injection
based on machine type version.
Or maybe adding a CLI option to enable QMP error injection in which
case the explicit option would serve as a trigger enable QMP command and
to migrate hest_addr_le.
It might be even better this way, since machine wouldn't need to
carry extra error source that will be used only for testing
and practically never in production VMs (aka reduced attack surface).
You can easily test it locally:
new-qemu: with your patches
old-qemu: qemu-9.1
and then try to do forth & back migration for following cases:
1. (ping-pong case with OLD firmware/ACPI tables)
start old-qemu with 9.1 machine type ->
migrate to file ->
start new-qemu with 9.1 machine type -> restore from file ->
migrate to file ->
start old-qemu with 9.1 machine type ->restore from file ->
2. (ping-pong case with NEW firmware/ACPI tables)
do the same as #1 but starting with new-qemu binary
(from upstream pov #2 is optional, but not implementing it
is pain for downstream so it's better to have it if it's not
too much work)
> /* hw/arm/virt-acpi-build.c */
> void virt_acpi_setup(VirtMachineState *vms)
> {
> ...
> if (vms->ras) {
> assert(vms->acpi_dev);
> acpi_ged_state = ACPI_GED(vms->acpi_dev);
> acpi_ghes_add_fw_cfg(&acpi_ged_state->ghes_state,
> vms->fw_cfg, tables.hardware_errors);
> }
>
> Which relies on acpi_ghes_add_fw_cfg() function to setup callbacks for
> the migration:
>
> /* ghes.c */
> void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> GArray *hardware_error)
> {
> /* Create a read-only fw_cfg file for GHES */
> fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
> hardware_error->len);
>
> /* Create a read-write fw_cfg file for Address */
> fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
>
> fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
>
> ags->present = true;
> }
>
> It sounds to me that both ghes_addr_le and hest_addr_le will be migrated
> altogether.
fwcfg callbacks are irrelevant to migration, they tell firmware what to do
with specified addresses (in our case, write into ags->hest_addr_le address
of HEST), so that HW (qemu) would know where firmware placed the table.
But that's about all it does.
>
> Did I miss something?
>
> Thanks,
> Mauro
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros
2024-08-25 3:45 ` [PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
@ 2024-09-13 13:27 ` Igor Mammedov
0 siblings, 0 replies; 29+ messages in thread
From: Igor Mammedov @ 2024-09-13 13:27 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
On Sun, 25 Aug 2024 05:45:58 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Now that we have also have a file to store HEST data location,
> which is part of GHES, better name the file where CPER records
> are stored.
>
> No functional changes.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 965fb1b36587..3190eb954de4 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -28,8 +28,8 @@
> #include "hw/nvram/fw_cfg.h"
> #include "qemu/uuid.h"
>
> -#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
> -#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
> +#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors"
> +#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
> #define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
>
> /* The max size in bytes for one error block */
> @@ -255,7 +255,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
>
> /* Tell guest firmware to place hardware_errors blob into RAM */
> - bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> + bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
> hardware_errors, sizeof(uint64_t), false);
>
> for (i = 0; i < num_sources; i++) {
> @@ -264,8 +264,8 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> * corresponding "Generic Error Status Block"
> */
> bios_linker_loader_add_pointer(linker,
> - ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i,
> - sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
> + ACPI_HW_ERROR_FW_CFG_FILE, sizeof(uint64_t) * i,
> + sizeof(uint64_t), ACPI_HW_ERROR_FW_CFG_FILE,
> error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
> }
>
> @@ -273,9 +273,9 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> * tell firmware to write hardware_errors GPA into
> * hardware_errors_addr fw_cfg, once the former has been initialized.
> */
> - bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, 0,
> + bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
> sizeof(uint64_t),
> - ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> + ACPI_HW_ERROR_FW_CFG_FILE, 0);
> }
>
> /* Build Generic Hardware Error Source version 2 (GHESv2) */
> @@ -315,7 +315,7 @@ static void build_ghes_v2(GArray *table_data,
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> address_offset + GAS_ADDR_OFFSET,
> sizeof(uint64_t),
> - ACPI_GHES_ERRORS_FW_CFG_FILE,
> + ACPI_HW_ERROR_FW_CFG_FILE,
> source_id * sizeof(uint64_t));
>
> /* Notification Structure */
> @@ -335,7 +335,7 @@ static void build_ghes_v2(GArray *table_data,
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> address_offset + GAS_ADDR_OFFSET,
> sizeof(uint64_t),
> - ACPI_GHES_ERRORS_FW_CFG_FILE,
> + ACPI_HW_ERROR_FW_CFG_FILE,
> (num_sources + source_id) *
> sizeof(uint64_t));
>
> @@ -389,11 +389,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> GArray *hardware_error)
> {
> /* Create a read-only fw_cfg file for GHES */
> - fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
> + fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
> hardware_error->len);
>
> /* Create a read-write fw_cfg file for Address */
> - fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> + fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
>
> fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 04/12] acpi/ghes: better name GHES memory error function
2024-08-25 3:45 ` [PATCH v9 04/12] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
@ 2024-09-13 13:31 ` Igor Mammedov
0 siblings, 0 replies; 29+ messages in thread
From: Igor Mammedov @ 2024-09-13 13:31 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
Peter Maydell, kvm, linux-kernel, qemu-arm, qemu-devel
On Sun, 25 Aug 2024 05:45:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The current function used to generate GHES data is specific for
> memory errors. Give a better name for it, as we now have a generic
> function as well.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes-stub.c | 2 +-
> hw/acpi/ghes.c | 2 +-
> include/hw/acpi/ghes.h | 4 ++--
> target/arm/kvm.c | 2 +-
> 4 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index c315de1802d6..dd41b3fd91df 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -11,7 +11,7 @@
> #include "qemu/osdep.h"
> #include "hw/acpi/ghes.h"
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +int acpi_ghes_memory_errors(uint8_t source_id, uint64_t physical_address)
> {
> return -1;
> }
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 3190eb954de4..10ed9c0614ff 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -494,7 +494,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> cpu_physical_memory_write(cper_addr, cper, len);
> }
>
> -int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
> +int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
> {
> /* Memory Error Section Type */
> const uint8_t guid[] =
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 4b5af86ec077..be53b7c53c91 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -70,7 +70,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(int source_id,
> +int acpi_ghes_memory_errors(int source_id,
> uint64_t error_physical_addr);
> void ghes_record_cper_errors(const void *cper, size_t len,
> uint16_t source_id, Error **errp);
> @@ -79,7 +79,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> * acpi_ghes_present: Report whether ACPI GHES table is present
> *
> * Returns: true if the system has an ACPI GHES table and it is
> - * safe to call acpi_ghes_record_errors() to record a memory error.
> + * safe to call acpi_ghes_memory_errors() to record a memory error.
> */
> bool acpi_ghes_present(void);
> #endif
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 8c4c8263b85a..8e63e9a59a5e 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2373,7 +2373,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> */
> if (code == BUS_MCEERR_AR) {
> kvm_cpu_synchronize_state(c);
> - if (!acpi_ghes_record_errors(ARM_ACPI_HEST_SRC_ID_SEA,
> + if (!acpi_ghes_memory_errors(ARM_ACPI_HEST_SRC_ID_SEA,
> paddr)) {
> kvm_inject_arm_sea(c);
> } else {
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID
2024-09-11 15:01 ` Igor Mammedov
@ 2024-09-13 15:14 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-13 15:14 UTC (permalink / raw)
To: Igor Mammedov
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
Peter Maydell, Shannon Zhao, kvm, linux-kernel, qemu-arm,
qemu-devel
Em Wed, 11 Sep 2024 17:01:57 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> On Sun, 25 Aug 2024 05:45:57 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > The current logic is based on a lot of duct tape, with
> > offsets calculated based on one define with the number of
> > source IDs and an enum.
> >
> > Rewrite the logic in a way that it would be more resilient
> > of code changes, by moving the source ID count to an enum
> > and make the offset calculus more explicit.
> >
> > Such change was inspired on a patch from Jonathan Cameron
> > splitting the logic to get the CPER address on a separate
> > function, as this will be needed to support generic error
> > injection.
>
> patch is too large and does too many things at once,
> see inline suggestions on how to split it in more
> manageable chunks.
> (I'll mark preferred patch order with numbers)
I ended adding more patches to make changes more logic.
>
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >
> > ---
> >
> > Changes from v8:
> > - Non-rename/cleanup changes merged altogether;
> > - source ID is now more generic, defined per guest target.
> > That should make easier to add support for 86.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> > hw/acpi/ghes.c | 275 ++++++++++++++++++++++++---------------
> > hw/arm/virt-acpi-build.c | 10 +-
> > include/hw/acpi/ghes.h | 18 +--
> > include/hw/arm/virt.h | 7 +
> > target/arm/kvm.c | 3 +-
> > 5 files changed, 198 insertions(+), 115 deletions(-)
> >
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 529c14e3289f..965fb1b36587 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -35,9 +35,6 @@
> > /* The max size in bytes for one error block */
> > #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
> >
> > -/* Now only support ARMv8 SEA notification type error source */
> > -#define ACPI_GHES_ERROR_SOURCE_COUNT 1
>
> [patch 4] getting rid of this and introducing num_sources
> (aka variable size HEST)
ok.
>
> > /* Generic Hardware Error Source version 2 */
> > #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
> >
> > @@ -64,6 +61,19 @@
> > */
> > #define ACPI_GHES_GESB_SIZE 20
> >
> > +/*
> > + * Offsets with regards to the start of the HEST table stored at
> > + * ags->hest_addr_le, according with the memory layout map at
> > + * docs/specs/acpi_hest_ghes.rst.
> > + */
> perhaps mention in comment/commit message, that hest lookup
> is implemented only GHESv2 error sources.
Ok, will add a comment, but IMO, it fits better at the routine which
handles HEST error sources, so I added this there:
/*
* Currently, HEST Error source navigates only for GHESv2 tables
*/
for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
uint64_t addr = err_source_struct;
uint16_t type, src_id;
...
>
> That will work as far we do forward migration only
> (i.e. old qemu -> new qemu), which is what upstream supports.
>
> However it won't work for backward migration (new qemu -> old qemu)
> since old one doesn't know about new non-GHESv2 sources.
> And that means we would need to introduce compat knobs for every
> new non-GHESv2 source is added. Which is easy to overlook and
> it adds up to maintenance.
> (You've already described zoo of types ACPI spec has in v8 review,
> but I don't thing it's too complex to implement lookup of all
> known types. compared to headache we would have with compat
> settings if anyone remembers)
>
> I won't insist on adding all known sources lookup in this series,
> if you agree to do it as a patch on top of this series within this
> dev cycle (~2 months time-frame).
Seems fine to me to place it at the dev cycle.
> > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2 */
>
> + ,Table 18-383
>
> > +#define HEST_GHES_V2_TABLE_SIZE 92
> > +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET)
> > +
> > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source */
>
> Table 18-380 'Error Status Address' field
Actually on ACPI 6.2, those tables are 18-382 and 18-379.
I'll change the above to reflect that:
/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
* Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
*/
#define HEST_GHES_V2_TABLE_SIZE 92
#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET)
/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
* Table 18-379: 'Error Status Address' field
>
> > +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET)
> > +
> > /*
> > * Values for error_severity field
> > */
> > @@ -185,51 +195,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
> > build_append_int_noprefix(table, 0, 7);
> > }
> >
> > -static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> > - uint64_t error_physical_addr)
> > +static void
> > +ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
> > + const uint8_t *section_type,
> > + int data_length)
> [patch 2] splitting acpi_ghes_record_mem_error() on reusable and mem specific
> code
Ok, will move to a separate patch after this one.
>
> > {
> > - GArray *block;
> > -
> > - /* Memory Error Section Type */
> > - const uint8_t uefi_cper_mem_sec[] =
> > - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > - 0xED, 0x7C, 0x83, 0xB1);
> > -
> > /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> > * Table 17-13 Generic Error Data Entry
> > */
> > QemuUUID fru_id = {};
> > - uint32_t data_length;
> >
> > - block = g_array_new(false, true /* clear */, 1);
> > -
> > - /* This is the length if adding a new generic error data entry*/
> > - data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> > /*
> > - * It should not run out of the preallocated memory if adding a new generic
> > - * error data entry
> > + * Calculate the size with this block. No need to check for
> > + * too big CPER, as CPER size is checked at ghes_record_cper_errors()
> > */
> > - assert((data_length + ACPI_GHES_GESB_SIZE) <=
> > - ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > + data_length += ACPI_GHES_GESB_SIZE;
> >
> > /* Build the new generic error status block header */
> > acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> > 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
> >
> > /* Build this new generic error data entry header */
> > - acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
> > + acpi_ghes_generic_error_data(block, section_type,
> > ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> > ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
> > -
> > - /* Build the memory section CPER for above new generic error data entry */
> > - acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> > -
> > - /* Write the generic error data entry into guest memory */
> > - cpu_physical_memory_write(error_block_address, block->data, block->len);
> > -
> > - g_array_free(block, true);
> > -
> > - return 0;
> > }
> >
> > /*
> > @@ -237,17 +226,18 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> > * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
> > * See docs/specs/acpi_hest_ghes.rst for blobs format.
> > */
> > -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> > +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> > + int num_sources)
> > {
> > int i, error_status_block_offset;
> >
> > /* Build error_block_address */
> > - for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > + for (i = 0; i < num_sources; i++) {
> > build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
> > }
> >
> > /* Build read_ack_register */
> > - for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > + for (i = 0; i < num_sources; i++) {
> > /*
> > * Initialize the value of read_ack_register to 1, so GHES can be
> > * writable after (re)boot.
> > @@ -262,13 +252,13 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> >
> > /* Reserve space for Error Status Data Block */
> > acpi_data_push(hardware_errors,
> > - ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
> > + ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
> >
> > /* Tell guest firmware to place hardware_errors blob into RAM */
> > bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
> > hardware_errors, sizeof(uint64_t), false);
> >
> > - for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > + for (i = 0; i < num_sources; i++) {
> > /*
> > * Tell firmware to patch error_block_address entries to point to
> > * corresponding "Generic Error Status Block"
>
> > @@ -283,14 +273,20 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> > * tell firmware to write hardware_errors GPA into
> > * hardware_errors_addr fw_cfg, once the former has been initialized.
> > */
> > - bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
> > - 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> > + bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, 0,
> > + sizeof(uint64_t),
> > + ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
>
> [patch 1] all indent changes in its own patch, or just drop them altogether
I'll drop the pure reformat changes from this patch. They'll
be placed at the patches that rename ACPI_GHES_*_FW_CFG_*.
Same for other occurrences.
> >
...
> > /*
> > * Read Ack Preserve field
> > @@ -360,19 +350,28 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> > }
> >
> > /* Build Hardware Error Source Table */
> > -void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> > +void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > + BIOSLinker *linker,
> > + const uint16_t * const notify,
> > + int num_sources,
> > const char *oem_id, const char *oem_table_id)
> > {
> > AcpiTable table = { .sig = "HEST", .rev = 1,
> > .oem_id = oem_id, .oem_table_id = oem_table_id };
> > + int i;
> > +
> > + build_ghes_error_table(hardware_errors, linker, num_sources);
> >
> > acpi_table_begin(&table, table_data);
> >
> > + /* Beginning at the HEST Error Source struct count and data */
> > int hest_offset = table_data->len;
> >
> > /* Error Source Count */
> > - build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> > - build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
> > + build_append_int_noprefix(table_data, num_sources, 4);
> > + for (i = 0; i < num_sources; i++) {
> > + build_ghes_v2(table_data, linker, notify[i], i, num_sources);
> > + }
> >
> > acpi_table_end(linker, &table);
> >
> > @@ -403,60 +402,132 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > ags->present = true;
> > }
> >
> > -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > +void ghes_record_cper_errors(const void *cper, size_t len,
> > + uint16_t source_id, Error **errp)
>
> [patch 3] switching to hest source id lookup method
Ok.
> > {
> > - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > - uint64_t start_addr;
> > - bool ret = -1;
> > + uint64_t hest_read_ack_start_addr, read_ack_start_addr;
> > + uint64_t hest_addr, cper_addr, err_source_struct;
> > + uint64_t hest_err_block_addr, error_block_addr;
> > + uint32_t num_sources, i;
> > AcpiGedState *acpi_ged_state;
> > AcpiGhesState *ags;
> > + uint64_t read_ack;
> >
> > - assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> > + if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> > + error_setg(errp, "GHES CPER record is too big: %ld", len);
> > + }
> >
> > acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > NULL));
> > g_assert(acpi_ged_state);
> > ags = &acpi_ged_state->ghes_state;
> >
> > - start_addr = le64_to_cpu(ags->ghes_addr_le);
> > -
> > - if (physical_address) {
> > -
> > - if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
> > - start_addr += source_id * sizeof(uint64_t);
> > - }
> > -
> > - cpu_physical_memory_read(start_addr, &error_block_addr,
> > - sizeof(error_block_addr));
> > -
> > - error_block_addr = le64_to_cpu(error_block_addr);
> > -
> > - read_ack_register_addr = start_addr +
> > - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > -
> > - cpu_physical_memory_read(read_ack_register_addr,
> > - &read_ack_register, sizeof(read_ack_register));
> > -
> > - /* zero means OSPM does not acknowledge the error */
> > - if (!read_ack_register) {
> > - error_report("OSPM does not acknowledge previous error,"
> > - " so can not record CPER for current error anymore");
> > - } else if (error_block_addr) {
> > - read_ack_register = cpu_to_le64(0);
> > - /*
> > - * Clear the Read Ack Register, OSPM will write it to 1 when
> > - * it acknowledges this error.
> > - */
> > - cpu_physical_memory_write(read_ack_register_addr,
> > - &read_ack_register, sizeof(uint64_t));
> > -
> > - ret = acpi_ghes_record_mem_error(error_block_addr,
> > - physical_address);
> > - } else
> > - error_report("can not find Generic Error Status Block");
> > + hest_addr = le64_to_cpu(ags->hest_addr_le);
> > +
> > + cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
> > +
> > + if (source_id >= num_sources) {
> > + error_setg(errp,
> > + "GHES: Source %d not found. Only %d sources are defined",
> > + source_id, num_sources);
> > + return;
> > + }
> > + err_source_struct = hest_addr + sizeof(num_sources);
> > +
> > + for (i = 0; i < num_sources; i++) {
> > + uint64_t addr = err_source_struct;
> > + uint16_t type, src_id;
> > +
> > + cpu_physical_memory_read(addr, &type, sizeof(type));
> > +
> > + /* For now, we only know the size of GHESv2 table */
> > + assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
> > +
> > + /* It is GHES. Compare CPER source address */
> > + addr += sizeof(type);
> > + cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> > +
> > + if (src_id == source_id)
> > + break;
> > +
> > + err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > + }
> > + if (i == num_sources) {
> > + error_setg(errp, "HEST: Source %d not found.", source_id);
> > + return;
> > + }
> > +
> > + /* Check if BIOS addr pointers were properly generated */
> > +
> > + hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> > + hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
> > +
> > + cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > + sizeof(error_block_addr));
> > +
> > + cpu_physical_memory_read(error_block_addr, &cper_addr,
> > + sizeof(error_block_addr));
> > +
> > + cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
> > + sizeof(read_ack_start_addr));
> > +
> > + /* Update ACK offset to notify about a new error */
> > +
> > + cpu_physical_memory_read(read_ack_start_addr,
> > + &read_ack, sizeof(read_ack));
> > +
> > + /* zero means OSPM does not acknowledge the error */
> > + if (!read_ack) {
> > + error_setg(errp,
> > + "Last CPER record was not acknowledged yet");
> > + read_ack = 1;
> > + cpu_physical_memory_write(read_ack_start_addr,
> > + &read_ack, sizeof(read_ack));
> > + return;
> > + }
> > +
> > + read_ack = cpu_to_le64(0);
> > + cpu_physical_memory_write(read_ack_start_addr,
> > + &read_ack, sizeof(read_ack));
> > +
> > + /* Write the generic error data entry into guest memory */
> > + cpu_physical_memory_write(cper_addr, cper, len);
> > +}
> > +
> > +int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
> > +{
> > + /* Memory Error Section Type */
> > + const uint8_t guid[] =
> > + UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > + 0xED, 0x7C, 0x83, 0xB1);
> > + Error *errp = NULL;
> > + GArray *block;
> > +
> > + if (!physical_address) {
> > + error_report("can not find Generic Error Status Block for source id %d",
> > + source_id);
> > + return -1;
> > + }
> > +
> > + block = g_array_new(false, true /* clear */, 1);
> > +
> > + ghes_gen_err_data_uncorrectable_recoverable(block, guid,
> > + ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > +
> > + /* Build the memory section CPER for above new generic error data entry */
> > + acpi_ghes_build_append_mem_cper(block, physical_address);
> > +
> > + /* Report the error */
> > + ghes_record_cper_errors(block->data, block->len, source_id, &errp);
> > +
> > + g_array_free(block, true);
> > +
> > + if (errp) {
> > + error_report_err(errp);
> > + return -1;
> > }
> >
> > - return ret;
> > + return 0;
> > }
> >
> > bool acpi_ghes_present(void)
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index f76fb117adff..39100c2822c2 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -890,6 +890,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
> > g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
> > }
> >
> > +static const uint16_t hest_ghes_notify[] = {
> > + [ARM_ACPI_HEST_SRC_ID_SEA] = ACPI_GHES_NOTIFY_SEA,
> > +};
>
> I agree that machine/platform shall opt in for a specific source id,
> but I'm not sure about whether we need platform specific source ids,
> it seems to complicate things needlessly.
>
> For example if one would define different src_id for error injection
> for ARM and X86, then we would somehow need to take that in account
> when QMP command X called so it would use correct ID
>
> Maybe this needs it's own patch with a commit message that
> would explain need for this approach (but so far I'm not seeing the point).
>
> PS:
> I'd prefer common/shared SRC_ID registry, from which boards would pick
> applicable ones.
I'll use a different approach, adding this to ghes.h:
/*
* ID numbers used to fill HEST source ID field
*/
enum AcpiGhesSourceID {
ACPI_HEST_SRC_ID_SYNC,
ACPI_HEST_SRC_ID_QMP, /* Use it only for QMP injected errors */
};
typedef struct AcpiNotificationSourceId {
enum AcpiGhesSourceID source_id;
enum AcpiGhesNotifyType notify;
} AcpiNotificationSourceId;
And, at the binding logic (at arm/virt-acpi-build):
static const AcpiNotificationSourceId hest_ghes_notify[] = {
{ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
{ACPI_HEST_SRC_ID_QMP, ACPI_GHES_NOTIFY_GPIO},
};
...
acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
hest_ghes_notify, sizeof(hest_ghes_notify),
vms->oem_id, vms->oem_table_id);
For x86, with just QMP implemented, this will be:
static const AcpiNotificationSourceId hest_ghes_notify[] = {
{ACPI_HEST_SRC_ID_QMP, ACPI_GHES_NOTIFY_SCI},
};
...
acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
hest_ghes_notify, sizeof(hest_ghes_notify),
vms->oem_id, vms->oem_table_id);
As the current logic doesn't assume anymore that source_id is an
index, but instead searches for it along the error structures,
such logic works fine and allows each arch to define what IDs
they'll use and what notification is associated to each one of
them.
> > +
> > static
> > void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> > {
> > @@ -943,10 +947,10 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> > build_dbg2(tables_blob, tables->linker, vms);
> >
> > if (vms->ras) {
> > - build_ghes_error_table(tables->hardware_errors, tables->linker);
> > acpi_add_table(table_offsets, tables_blob);
> > - acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
> > - vms->oem_table_id);
> > + acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> > + hest_ghes_notify, sizeof(hest_ghes_notify),
> > + vms->oem_id, vms->oem_table_id);
> > }
> >
> > if (ms->numa_state->num_nodes > 0) {
> > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > index 28b956acb19a..4b5af86ec077 100644
> > --- a/include/hw/acpi/ghes.h
> > +++ b/include/hw/acpi/ghes.h
> > @@ -23,6 +23,7 @@
> > #define ACPI_GHES_H
> >
> > #include "hw/acpi/bios-linker-loader.h"
> > +#include "qapi/error.h"
> >
> > /*
> > * Values for Hardware Error Notification Type field
> > @@ -56,24 +57,23 @@ enum AcpiGhesNotifyType {
> > ACPI_GHES_NOTIFY_RESERVED = 12
> > };
> >
> > -enum {
> > - ACPI_HEST_SRC_ID_SEA = 0,
> > - /* future ids go here */
> > - ACPI_HEST_SRC_ID_RESERVED,
> > -};
> > -
> > typedef struct AcpiGhesState {
> > uint64_t hest_addr_le;
> > uint64_t ghes_addr_le;
> > bool present; /* True if GHES is present at all on this board */
> > } AcpiGhesState;
> >
> > -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
> > -void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> > +void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > + BIOSLinker *linker,
> > + const uint16_t * const notify,
> > + int num_sources,
> > const char *oem_id, const char *oem_table_id);
> > void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> > GArray *hardware_errors);
> > -int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
> > +int acpi_ghes_record_errors(int source_id,
> > + uint64_t error_physical_addr);
> > +void ghes_record_cper_errors(const void *cper, size_t len,
>
> use GArray for cper so you won't have to pass down len
Here, it would work fine, but when adding hw/acpi/ghes_cper.c,
the logic there is:
cper = qbase64_decode(qmp_cper, -1, &len, errp);
if (!cper) {
error_setg(errp, "missing GHES CPER payload");
return;
}
ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
If I use a GArray, it would mean an extra memory allocation
for no good reason.
So, IMO, better to keep passing buffer and length.
Regards,
Mauro
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-09-13 13:25 ` Igor Mammedov
@ 2024-09-14 5:33 ` Mauro Carvalho Chehab
2024-09-16 11:05 ` Igor Mammedov
0 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 5:33 UTC (permalink / raw)
To: Igor Mammedov
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Hi Igor,
Em Fri, 13 Sep 2024 15:25:18 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > > in addition to this, it needs a patch on top to make sure
> > > that we migrate hest_addr_le.
> > > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
> > > and fixes on top of that for an example.
> >
> > Hmm... If I understood such change well, vmstate_ghes_state() will
> > use this structure as basis to do migration:
> >
> > /* ghes.h */
> > typedef struct AcpiGhesState {
> > uint64_t hest_addr_le;
> > uint64_t ghes_addr_le;
> > bool present; /* True if GHES is present at all on this board */
> > } AcpiGhesState;
> >
> > /* generic_event_device.c */
> > static const VMStateDescription vmstate_ghes_state = {
> > .name = "acpi-ged/ghes",
> > .version_id = 1,
> > .minimum_version_id = 1,
> > .needed = ghes_needed,
> > .fields = (VMStateField[]) {
> > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> > vmstate_ghes_state, AcpiGhesState),
> > VMSTATE_END_OF_LIST()
> > }
> > };
>
> current code looks like that:
>
> static const VMStateDescription vmstate_ghes = {
> .name = "acpi-ghes",
> .version_id = 1,
> .minimum_version_id = 1,
> .fields = (const VMStateField[]) {
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), <<===
> VMSTATE_END_OF_LIST()
> },
> };
>
> static bool ghes_needed(void *opaque)
> {
> AcpiGedState *s = opaque;
> return s->ghes_state.ghes_addr_le;
> }
>
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> .version_id = 1,
> .minimum_version_id = 1,
> .needed = ghes_needed,
> .fields = (const VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> vmstate_ghes, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> }
> };
>
> where
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> explicitly defines field(s) within structure to be sent over wire.
>
> we need to add a conditional field for ghes_addr_le
> which will be sent only with new machine types, but not with old ones
> to avoid migration breakage.
>
> I don't know much about migration, but maybe we can get away with
> similar condition as in ghes_needed(), or enabling QMP error injection
> based on machine type version.
>
> Or maybe adding a CLI option to enable QMP error injection in which
> case the explicit option would serve as a trigger enable QMP command and
> to migrate hest_addr_le.
> It might be even better this way, since machine wouldn't need to
> carry extra error source that will be used only for testing
> and practically never in production VMs (aka reduced attack surface).
>
> You can easily test it locally:
> new-qemu: with your patches
> old-qemu: qemu-9.1
>
> and then try to do forth & back migration for following cases:
> 1. (ping-pong case with OLD firmware/ACPI tables)
> start old-qemu with 9.1 machine type ->
> migrate to file ->
> start new-qemu with 9.1 machine type -> restore from file ->
> migrate to file ->
As I never used migration, I'm a little stuck with the command line
parameters.
I guess I got the one to do the migration at the monitor:
(qemu) migrate file://tmp/migrate
But no idea how to start a machine using a saved state.
> start old-qemu with 9.1 machine type ->restore from file ->
>
> 2. (ping-pong case with NEW firmware/ACPI tables)
> do the same as #1 but starting with new-qemu binary
>
> (from upstream pov #2 is optional, but not implementing it
> is pain for downstream so it's better to have it if it's not
> too much work)
If I understood the migration documentation, every when new fields
are added, we should increment .version_id. If new version is
not backward-compatible, .minimum_version_id is also incremented.
So, for a migration-compatible code with a 9.1 VM, the code needs to
handle the case where hest_addr_le is not defined, e. g. use offsets
relative to ghes_addr_le, just like the current version, e.g.:
uint64_t cper_addr, read_ack_start_addr;
AcpiGedState *acpi_ged_state =
ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
AcpiGhesState *ags = &acpi_ged_state->ghes_state;
if (!ags->hest_addr_le) {
// Backward-compatible migration code
uint64_t base = le64_to_cpu(ags->ghes_addr_le);
*read_ack_start_addr = base +
ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
error_source_to_index[notify] * sizeof(uint64_t);
*cper_addr = base +
ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
} else {
// Use the new logic from ags->hest_addr_le
}
There are two problems with that:
1. On your reviews, if I understood right, the code above is not
migration safe. So, while implementing it would be technically
correct, migration still won't work;
2. With the new code, ACPI_GHES_ERROR_SOURCE_COUNT is not
defined anymore, as the size of the error source structure can
be different on different architectures, being 2 for new
VMs and 1 for old ones.
Basically the new code gets it right because it can see a
pointer to the HEST table, so it can get the number from there:
hest_addr = le64_to_cpu(ags->hest_addr_le);
cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
But, without hest_addr_le, getting num_sources is not possible.
An alternative would be to add a hacky code that works only for
arm machines (as new versions may support more archs).
Something like:
#define V1_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 1
#define V2_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 2
And have a hardcoded logic that would work before/after this
changeset but may break on newer versions, if the number of
source IDs change, if we add other HEST types, etc.
Now, assuming that such hack would work, it sounds too hacky to
my taste.
So, IMO it is a lot safer to not support migrations from v1 (only
ghes_addr_le), using a patch like the enclosed one to ensure that.
Btw, checking existing migration structs, it sounds that for almost all
structures, .version_id is identical to .minimum_version_id, meaning that
migration between different versions aren't supported on most cases.
Thanks,
Mauro
---
[PATCH] acpi/generic_event_device: Update GHES migration to cover hest addr
The GHES migration logic at GED should now support HEST table
location too.
Increase migration version and change needed to check for both
ghes_addr_le and hest_addr_le.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index b4c83a089a02..efae0ff62c7b 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -351,10 +351,11 @@ static const VMStateDescription vmstate_ged_state = {
static const VMStateDescription vmstate_ghes = {
.name = "acpi-ghes",
- .version_id = 1,
- .minimum_version_id = 1,
+ .version_id = 2,
+ .minimum_version_id = 2,
.fields = (const VMStateField[]) {
VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
+ VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
VMSTATE_END_OF_LIST()
},
};
@@ -362,13 +363,13 @@ static const VMStateDescription vmstate_ghes = {
static bool ghes_needed(void *opaque)
{
AcpiGedState *s = opaque;
- return s->ghes_state.ghes_addr_le;
+ return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
}
static const VMStateDescription vmstate_ghes_state = {
.name = "acpi-ged/ghes",
- .version_id = 1,
- .minimum_version_id = 1,
+ .version_id = 2,
+ .minimum_version_id = 2,
.needed = ghes_needed,
.fields = (const VMStateField[]) {
VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-09-14 5:33 ` Mauro Carvalho Chehab
@ 2024-09-16 11:05 ` Igor Mammedov
2024-10-01 8:54 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 29+ messages in thread
From: Igor Mammedov @ 2024-09-16 11:05 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
On Sat, 14 Sep 2024 07:33:14 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Hi Igor,
>
> Em Fri, 13 Sep 2024 15:25:18 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > > > in addition to this, it needs a patch on top to make sure
> > > > that we migrate hest_addr_le.
> > > > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
> > > > and fixes on top of that for an example.
> > >
> > > Hmm... If I understood such change well, vmstate_ghes_state() will
> > > use this structure as basis to do migration:
> > >
> > > /* ghes.h */
> > > typedef struct AcpiGhesState {
> > > uint64_t hest_addr_le;
> > > uint64_t ghes_addr_le;
> > > bool present; /* True if GHES is present at all on this board */
> > > } AcpiGhesState;
> > >
> > > /* generic_event_device.c */
> > > static const VMStateDescription vmstate_ghes_state = {
> > > .name = "acpi-ged/ghes",
> > > .version_id = 1,
> > > .minimum_version_id = 1,
> > > .needed = ghes_needed,
> > > .fields = (VMStateField[]) {
> > > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> > > vmstate_ghes_state, AcpiGhesState),
> > > VMSTATE_END_OF_LIST()
> > > }
> > > };
> >
> > current code looks like that:
> >
> > static const VMStateDescription vmstate_ghes = {
> > .name = "acpi-ghes",
> > .version_id = 1,
> > .minimum_version_id = 1,
> > .fields = (const VMStateField[]) {
> > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), <<===
> > VMSTATE_END_OF_LIST()
> > },
> > };
> >
> > static bool ghes_needed(void *opaque)
> > {
> > AcpiGedState *s = opaque;
> > return s->ghes_state.ghes_addr_le;
> > }
> >
> > static const VMStateDescription vmstate_ghes_state = {
> > .name = "acpi-ged/ghes",
> > .version_id = 1,
> > .minimum_version_id = 1,
> > .needed = ghes_needed,
> > .fields = (const VMStateField[]) {
> > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> > vmstate_ghes, AcpiGhesState),
> > VMSTATE_END_OF_LIST()
> > }
> > };
> >
> > where
> > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> > explicitly defines field(s) within structure to be sent over wire.
> >
> > we need to add a conditional field for ghes_addr_le
> > which will be sent only with new machine types, but not with old ones
> > to avoid migration breakage.
> >
> > I don't know much about migration, but maybe we can get away with
> > similar condition as in ghes_needed(), or enabling QMP error injection
> > based on machine type version.
> >
> > Or maybe adding a CLI option to enable QMP error injection in which
> > case the explicit option would serve as a trigger enable QMP command and
> > to migrate hest_addr_le.
> > It might be even better this way, since machine wouldn't need to
> > carry extra error source that will be used only for testing
> > and practically never in production VMs (aka reduced attack surface).
> >
> > You can easily test it locally:
> > new-qemu: with your patches
> > old-qemu: qemu-9.1
> >
> > and then try to do forth & back migration for following cases:
> > 1. (ping-pong case with OLD firmware/ACPI tables)
> > start old-qemu with 9.1 machine type ->
> > migrate to file ->
> > start new-qemu with 9.1 machine type -> restore from file ->
> > migrate to file ->
>
> As I never used migration, I'm a little stuck with the command line
> parameters.
>
> I guess I got the one to do the migration at the monitor:
>
> (qemu) migrate file://tmp/migrate
>
> But no idea how to start a machine using a saved state.
see https://www.linux-kvm.org/page/Migration
'savevm/loadvm to an external state file (using pseudo-migration)' section
>
> > start old-qemu with 9.1 machine type ->restore from file ->
> >
> > 2. (ping-pong case with NEW firmware/ACPI tables)
> > do the same as #1 but starting with new-qemu binary
> >
> > (from upstream pov #2 is optional, but not implementing it
> > is pain for downstream so it's better to have it if it's not
> > too much work)
>
> If I understood the migration documentation, every when new fields
> are added, we should increment .version_id. If new version is
> not backward-compatible, .minimum_version_id is also incremented.
>
> So, for a migration-compatible code with a 9.1 VM, the code needs to
> handle the case where hest_addr_le is not defined, e. g. use offsets
> relative to ghes_addr_le, just like the current version, e.g.:
>
> uint64_t cper_addr, read_ack_start_addr;
>
> AcpiGedState *acpi_ged_state =
> ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> AcpiGhesState *ags = &acpi_ged_state->ghes_state;
>
> if (!ags->hest_addr_le) {
> // Backward-compatible migration code
> uint64_t base = le64_to_cpu(ags->ghes_addr_le);
>
> *read_ack_start_addr = base +
> ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> error_source_to_index[notify] * sizeof(uint64_t);
>
> *cper_addr = base +
> ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> } else {
> // Use the new logic from ags->hest_addr_le
> }
>
> There are two problems with that:
>
> 1. On your reviews, if I understood right, the code above is not
> migration safe. So, while implementing it would be technically
> correct, migration still won't work;
>
> 2. With the new code, ACPI_GHES_ERROR_SOURCE_COUNT is not
> defined anymore, as the size of the error source structure can
> be different on different architectures, being 2 for new
> VMs and 1 for old ones.
>
> Basically the new code gets it right because it can see a
> pointer to the HEST table, so it can get the number from there:
>
> hest_addr = le64_to_cpu(ags->hest_addr_le);
> cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
>
> But, without hest_addr_le, getting num_sources is not possible.
>
> An alternative would be to add a hacky code that works only for
> arm machines (as new versions may support more archs).
>
> Something like:
> #define V1_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 1
> #define V2_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 2
>
> And have a hardcoded logic that would work before/after this
> changeset but may break on newer versions, if the number of
> source IDs change, if we add other HEST types, etc.
>
> Now, assuming that such hack would work, it sounds too hacky to
> my taste.
>
> So, IMO it is a lot safer to not support migrations from v1 (only
> ghes_addr_le), using a patch like the enclosed one to ensure that.
>
> Btw, checking existing migration structs, it sounds that for almost all
> structures, .version_id is identical to .minimum_version_id, meaning that
> migration between different versions aren't supported on most cases.
let me try to find more examples on how to implement migration bits
hopefully more comprehensible.
>
> Thanks,
> Mauro
>
> ---
>
> [PATCH] acpi/generic_event_device: Update GHES migration to cover hest addr
>
> The GHES migration logic at GED should now support HEST table
> location too.
>
> Increase migration version and change needed to check for both
> ghes_addr_le and hest_addr_le.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index b4c83a089a02..efae0ff62c7b 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -351,10 +351,11 @@ static const VMStateDescription vmstate_ged_state = {
>
> static const VMStateDescription vmstate_ghes = {
> .name = "acpi-ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .fields = (const VMStateField[]) {
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> + VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> },
> };
> @@ -362,13 +363,13 @@ static const VMStateDescription vmstate_ghes = {
> static bool ghes_needed(void *opaque)
> {
> AcpiGedState *s = opaque;
> - return s->ghes_state.ghes_addr_le;
> + return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
> }
>
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .needed = ghes_needed,
> .fields = (const VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
>
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-09-16 11:05 ` Igor Mammedov
@ 2024-10-01 8:54 ` Mauro Carvalho Chehab
2024-10-03 11:48 ` Igor Mammedov
0 siblings, 1 reply; 29+ messages in thread
From: Mauro Carvalho Chehab @ 2024-10-01 8:54 UTC (permalink / raw)
To: Igor Mammedov
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Em Mon, 16 Sep 2024 13:05:06 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > But no idea how to start a machine using a saved state.
>
> see https://www.linux-kvm.org/page/Migration
> 'savevm/loadvm to an external state file (using pseudo-migration)' section
>
It didn't work. Is migration currently working between 9.1 and 9.2?
I did a compilation of qemu version v9.1.0-rc0 and saved the state.
Then, on vanilla 9.2 (changeset 01dc65a3bc26), I tried to restore the
state with both "virt" and "virt-9.1". None worked:
$ qemu-system-aarch64 -m 4g,maxmem=8G,slots=8 -M type=virt-9.1,nvdimm=on,gic-version=3,ras=on -cpu max -smp 4 -numa node,nodeid=0,cpus=0-3,memdev=mem0 --nographic -monitor telnet:127.0.0.1:1234,server,nowait -incoming "exec: gzip -c -d statefile.gz" -no-reboot -bios /new_devel/edac/emulator/QEMU_EFI-silent.fd -kernel /new_devel/edac/work/arm64_build/arch/arm64/boot/Image.gz -device pcie-root-port,id=root_port1 -device virtio-blk-pci,drive=hd -device virtio-net-pci,netdev=mynet,id=bob -drive if=none,file=/new_devel/edac/emulator/debian.qcow2,format=qcow2,id=hd -object memory-backend-ram,size=4G,id=mem0 -netdev type=user,id=mynet,hostfwd=tcp::5555-:22 -qmp tcp:localhost:4445,server=on,wait=off -append 'earlycon nomodeset root=/dev/vda1 fsck.mode=skip tp_printk maxcpus=4'
qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
qemu-system-aarch64: load of migration failed: Operation not permitted
$ qemu-system-aarch64 -m 4g,maxmem=8G,slots=8 -M type=virt,nvdimm=on,gic-version=3,ras=on -cpu max -smp 4 -numa node,nodeid=0,cpus=0-3,memdev=mem0 --nographic -monitor telnet:127.0.0.1:1234,server,nowait -incoming "exec: gzip -c -d statefile.gz" -no-reboot -bios /new_devel/edac/emulator/QEMU_EFI-silent.fd -kernel /new_devel/edac/work/arm64_build/arch/arm64/boot/Image.gz -device pcie-root-port,id=root_port1 -device virtio-blk-pci,drive=hd -device virtio-net-pci,netdev=mynet,id=bob -drive if=none,file=/new_devel/edac/emulator/debian.qcow2,format=qcow2,id=hd -object memory-backend-ram,size=4G,id=mem0 -netdev type=user,id=mynet,hostfwd=tcp::5555-:22 -qmp tcp:localhost:4445,server=on,wait=off -append 'earlycon nomodeset root=/dev/vda1 fsck.mode=skip tp_printk maxcpus=4'
qemu-system-aarch64: Machine type received is 'virt-9.1' and local is 'virt-9.2'
qemu-system-aarch64: load of migration failed: Invalid argument
Did I made something wrong?
Regards,
Mauro
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
2024-10-01 8:54 ` Mauro Carvalho Chehab
@ 2024-10-03 11:48 ` Igor Mammedov
0 siblings, 0 replies; 29+ messages in thread
From: Igor Mammedov @ 2024-10-03 11:48 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
On Tue, 1 Oct 2024 10:54:26 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Mon, 16 Sep 2024 13:05:06 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > > But no idea how to start a machine using a saved state.
> >
> > see https://www.linux-kvm.org/page/Migration
> > 'savevm/loadvm to an external state file (using pseudo-migration)' section
> >
>
> It didn't work. Is migration currently working between 9.1 and 9.2?
>
> I did a compilation of qemu version v9.1.0-rc0 and saved the state.
but it's better to use actually released 9.1 source code.
that should be effectively virt-9.1 machine type
(unless something changed between rc0 and release version)
show us CLI on source side (i.e. qemu instance where you saved state,
it should be the same as below modulo '-incoming' option)
>
> Then, on vanilla 9.2 (changeset 01dc65a3bc26), I tried to restore the
> state with both "virt" and "virt-9.1". None worked:
>
>
> $ qemu-system-aarch64 -m 4g,maxmem=8G,slots=8 -M type=virt-9.1,nvdimm=on,gic-version=3,ras=on -cpu max -smp 4 -numa node,nodeid=0,cpus=0-3,memdev=mem0 --nographic -monitor telnet:127.0.0.1:1234,server,nowait -incoming "exec: gzip -c -d statefile.gz" -no-reboot -bios /new_devel/edac/emulator/QEMU_EFI-silent.fd -kernel /new_devel/edac/work/arm64_build/arch/arm64/boot/Image.gz -device pcie-root-port,id=root_port1 -device virtio-blk-pci,drive=hd -device virtio-net-pci,netdev=mynet,id=bob -drive if=none,file=/new_devel/edac/emulator/debian.qcow2,format=qcow2,id=hd -object memory-backend-ram,size=4G,id=mem0 -netdev type=user,id=mynet,hostfwd=tcp::5555-:22 -qmp tcp:localhost:4445,server=on,wait=off -append 'earlycon nomodeset root=/dev/vda1 fsck.mode=skip tp_printk maxcpus=4'
> qemu-system-aarch64: error while loading state for instance 0x0 of device 'cpu'
> qemu-system-aarch64: load of migration failed: Operation not permitted
and that should've worked
> $ qemu-system-aarch64 -m 4g,maxmem=8G,slots=8 -M type=virt,nvdimm=on,gic-version=3,ras=on -cpu max -smp 4 -numa node,nodeid=0,cpus=0-3,memdev=mem0 --nographic -monitor telnet:127.0.0.1:1234,server,nowait -incoming "exec: gzip -c -d statefile.gz" -no-reboot -bios /new_devel/edac/emulator/QEMU_EFI-silent.fd -kernel /new_devel/edac/work/arm64_build/arch/arm64/boot/Image.gz -device pcie-root-port,id=root_port1 -device virtio-blk-pci,drive=hd -device virtio-net-pci,netdev=mynet,id=bob -drive if=none,file=/new_devel/edac/emulator/debian.qcow2,format=qcow2,id=hd -object memory-backend-ram,size=4G,id=mem0 -netdev type=user,id=mynet,hostfwd=tcp::5555-:22 -qmp tcp:localhost:4445,server=on,wait=off -append 'earlycon nomodeset root=/dev/vda1 fsck.mode=skip tp_printk maxcpus=4'
> qemu-system-aarch64: Machine type received is 'virt-9.1' and local is 'virt-9.2'
> qemu-system-aarch64: load of migration failed: Invalid argument
that isn't meant to work.
>
> Did I made something wrong?
let's see src CLI first, and then ask someone who knows more about migration and how to troubleshoot it
> Regards,
> Mauro
>
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2024-10-03 11:49 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-09-11 13:51 ` Igor Mammedov
2024-09-13 5:44 ` Mauro Carvalho Chehab
2024-09-13 13:25 ` Igor Mammedov
2024-09-14 5:33 ` Mauro Carvalho Chehab
2024-09-16 11:05 ` Igor Mammedov
2024-10-01 8:54 ` Mauro Carvalho Chehab
2024-10-03 11:48 ` Igor Mammedov
2024-08-25 3:45 ` [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
2024-09-11 15:01 ` Igor Mammedov
2024-09-13 15:14 ` Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
2024-09-13 13:27 ` Igor Mammedov
2024-08-25 3:45 ` [PATCH v9 04/12] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-13 13:31 ` Igor Mammedov
2024-08-25 3:46 ` [PATCH v9 05/12] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 06/12] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 07/12] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-08-27 11:11 ` Markus Armbruster
2024-08-25 3:46 ` [PATCH v9 09/12] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 10/12] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
2024-08-25 11:34 ` Peter Maydell
2024-08-26 1:53 ` Mauro Carvalho Chehab
2024-08-30 16:27 ` Peter Maydell
2024-09-01 6:40 ` Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 12/12] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).