* [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
@ 2024-09-14 6:13 Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
0 siblings, 2 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Cleber Rosa, Dongjiu Geng,
Eric Blake, John Snow, Markus Armbruster, Michael Roth,
Paolo Bonzini, Peter Maydell, Shannon Zhao, kvm, linux-kernel,
qemu-arm, qemu-devel
This series adds support for injecting generic CPER records. Such records
are generated outside QEMU via a provided script.
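A minimal sketch of how such an externally generated record could be wrapped for QMP, based only on what the changelog of this series states: the full raw CPER is sent (v7), it is base64-encoded (v5), and the command is named inject-ghes-error (v10). The argument name "cper" and the helper build_inject_command() are assumptions made for illustration; qapi/acpi-hest.json holds the real schema.

```python
import base64
import json


def build_inject_command(cper: bytes) -> str:
    """Return the QMP message (JSON text) carrying a raw CPER record.

    Assumption: the command takes one argument, here called "cper",
    holding the base64-encoded record. Check qapi/acpi-hest.json for
    the actual argument name.
    """
    message = {
        "execute": "inject-ghes-error",
        "arguments": {"cper": base64.b64encode(cper).decode("ascii")},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Four zero bytes stand in for a real record built by the script.
    print(build_inject_command(bytes(4)))
```

The sketch only builds the message; sending it over a QMP socket is left to the scripts in this series.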
In this version, the patch reworking the way offsets are calculated was
split into several patches, to have one logical change per patch and
make review easier.
Although the number of patches increased from 12 to 21, there is just one
really new patch (the other ones come from splitting a big change):
acpi/generic_event_device: Update GHES migration to cover hest addr
---
v10:
- Patch 1 split into several patches to make reviews easier;
- Added a migration patch;
- CPER QMP command was renamed;
- Updated some comments to better reflect exact ACPI version;
- Removed the code that reset acks when OSPM fails to read records;
- Removed a duplicated config GHES_CPER symbol;
- There is now an arch-independent namespace for GHES source IDs;
- Fixed the size of hest_ghes_notify array when creating tables;
- acpi-hest.json is now a section of ACPI;
- QMP command renamed from @ghes-cper to inject-ghes-error.
v9:
- Patches reorganized to make review easier;
- source ID is now guest-OS specific;
- Some patches got a revision history since v8;
- Several minor cleanups.
v8:
- Fixed one of the BIOS links, which was incorrect;
- Changed mem error internal injection to use common code;
- No more hardcoded values for CPER: instead of using just the
payload at the QAPI, it now has the full raw CPER there;
- Error injection script now supports changing fields at the
Generic Error Data section of the CPER;
- Several minor cleanups.
v7:
- Change the way offsets are calculated and used on HEST table.
Now, it is compatible with migrations as all offsets are relative
to the HEST table;
- GHES interface is now more generic: the entire CPER is sent via
QMP, instead of just the payload;
- Some code cleanups to make the code more robust;
- The python script now uses QEMUMonitorProtocol class.
v6:
- PNP0C33 device creation moved to aml-build.c;
- acpi_ghes record functions now use ACPI notify parameter,
instead of source ID;
- the number of source IDs is now automatically calculated;
- some code cleanups and function/var renames;
- some fixes and cleanups at the error injection script;
- ghes cper stub now produces an error if cper JSON is not compiled;
- Offset calculation logic for GHES was refactored;
- Updated documentation to reflect the GHES allocated size;
- Added an x-mpidr object for QOM usage;
- Added a patch making use of the x-mpidr field in the ARM injection
script.
v5:
- CPER GUID is now passed as a string;
- raw-data is now passed base64-encoded;
- Removed several GPIO left-overs from arm/virt.c changes;
- Lots of cleanups and improvements at the error injection script.
It now better handles QMP dialog and doesn't print debug messages.
Also, the code was split into two modules, to make it easier to add more
error injection commands.
v4:
- CPER generation moved to happen outside QEMU;
- One patch adding support for mpidr query was removed.
v3:
- patch 1 cleanups with some comment changes and adding another place where
the poweroff GPIO define should be used. No changes on other patches (except
due to conflict resolution).
v2:
- added a new patch using a define for GPIO power pin;
- patch 2 changed to also use a define for generic error GPIO pin;
- a couple of cleanups at patch 2 removing unneeded else clauses.
Example of generating a CPER record:
$ scripts/ghes_inject.py -d arm -p 0xdeadbeef
GUID: e19e3d16-bc11-11e4-9caa-c2051d5d46b0
Generic Error Status Block (20 bytes):
00000000 01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00 ................
00000010 00 00 00 00 ....
Generic Error Data Entry (72 bytes):
00000000 16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0 .=...........]F.
00000010 00 00 00 00 00 03 00 00 48 00 00 00 00 00 00 00 ........H.......
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000040 00 00 00 00 00 00 00 00 ........
Payload (72 bytes):
00000000 05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00 ........H.......
00000010 00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00 ................
00000020 00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03 ......... ......
00000030 0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00 ................
00000040 ef be ad de 00 00 00 00 ........
Error injected.
[ 9.358364] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 9.359027] {1}[Hardware Error]: event severity: recoverable
[ 9.359586] {1}[Hardware Error]: Error 0, type: recoverable
[ 9.360124] {1}[Hardware Error]: section_type: ARM processor error
[ 9.360561] {1}[Hardware Error]: MIDR: 0x00000000000f0510
[ 9.361160] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[ 9.361643] {1}[Hardware Error]: running state: 0x0
[ 9.362142] {1}[Hardware Error]: Power State Coordination Interface state: 0
[ 9.362682] {1}[Hardware Error]: Error info structure 0:
[ 9.363030] {1}[Hardware Error]: num errors: 2
[ 9.363656] {1}[Hardware Error]: error_type: 0x02: cache error
[ 9.364163] {1}[Hardware Error]: error_info: 0x000000000091000f
[ 9.364834] {1}[Hardware Error]: transaction type: Data Access
[ 9.365599] {1}[Hardware Error]: cache error, operation type: Data write
[ 9.366441] {1}[Hardware Error]: cache level: 2
[ 9.367005] {1}[Hardware Error]: processor context not corrupted
[ 9.367753] {1}[Hardware Error]: physical fault address: 0x00000000deadbeef
[ 9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered
This script currently supports ARM processor error CPER, but can easily be
extended to other GHES notification types.
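The hexdumps in the example can be cross-checked against the kernel log with a short decoder. This is a sketch, not the code in scripts/qmp_helper.py: the function names are made up for illustration, and the field layout follows the ACPI Generic Error Status Block and Generic Error Data Entry (revision 0x300) definitions. The byte strings are copied from the dumps above.

```python
import struct
import uuid

# Generic Error Status Block, 20 bytes, copied from the dump above.
STATUS_BLOCK = bytes.fromhex("01000000 00000000 00000000 90000000 00000000")
# First 28 bytes of the Generic Error Data Entry: GUID plus fixed fields.
DATA_ENTRY_HEAD = bytes.fromhex(
    "163d9ee1 11bce411 9caac205 1d5d46b0 00000000 0003 00 00 48000000"
)
# Last 8 bytes of the payload (offset 0x40): physical fault address.
PHYS_ADDR = bytes.fromhex("efbeadde 00000000")


def decode_status_block(raw: bytes) -> dict:
    """Split the status block into its five little-endian 32-bit fields."""
    fields = ("block_status", "raw_data_offset", "raw_data_length",
              "data_length", "error_severity")
    return dict(zip(fields, struct.unpack("<5I", raw[:20])))


def decode_data_entry_head(raw: bytes) -> dict:
    """Decode the section type GUID and the fixed fields that follow it."""
    severity, revision, valid_bits, flags, data_len = struct.unpack(
        "<IHBBI", raw[16:28])
    return {
        # GUIDs are stored mixed-endian, which is what bytes_le expects.
        "section_type": str(uuid.UUID(bytes_le=raw[:16])),
        "error_severity": severity,
        "revision": revision,
        "validation_bits": valid_bits,
        "flags": flags,
        "error_data_length": data_len,
    }


if __name__ == "__main__":
    print(decode_status_block(STATUS_BLOCK))
    print(decode_data_entry_head(DATA_ENTRY_HEAD))
    (paddr,) = struct.unpack("<Q", PHYS_ADDR)
    # Assuming 4 KiB pages, the page frame number is the address >> 12.
    print(hex(paddr), hex(paddr >> 12))
```

The decoded data_length is 144, which is the 72-byte data entry plus the 72-byte payload. Severity 0 is "recoverable", the GUID matches the one printed by the script, and 0xdeadbeef >> 12 gives the 0xdeadb page reported by the memory failure message (assuming 4 KiB pages in the guest).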
Mauro Carvalho Chehab (21):
acpi/ghes: add a firmware file with HEST address
acpi/generic_event_device: Update GHES migration to cover hest addr
acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
acpi/ghes: simplify acpi_ghes_record_errors() code
acpi/ghes: better handle source_id and notification
acpi/ghes: Remove a duplicated out of bounds check
acpi/ghes: rework the logic to handle HEST source ID
acpi/ghes: Change the type for source_id
acpi/ghes: Don't hardcode the number of sources on ghes
acpi/ghes: make the GHES record generation more generic
acpi/ghes: don't crash QEMU if ghes GED is not found
acpi/ghes: rename etc/hardware_error file macros
acpi/ghes: better name GHES memory error function
acpi/ghes: add a notifier to notify when error data is ready
acpi/generic_event_device: add an APEI error device
arm/virt: Wire up a GED error device for ACPI / GHES
qapi/acpi-hest: add an interface to do generic CPER error injection
docs: acpi_hest_ghes: fix documentation for CPER size
scripts/ghes_inject: add a script to generate GHES error inject
target/arm: add an experimental mpidr arm cpu property object
scripts/arm_processor_error.py: retrieve mpidr if not filled
MAINTAINERS | 10 +
docs/specs/acpi_hest_ghes.rst | 6 +-
hw/acpi/Kconfig | 5 +
hw/acpi/aml-build.c | 10 +
hw/acpi/generic_event_device.c | 19 +-
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 312 +++++++----
hw/acpi/ghes_cper.c | 32 ++
hw/acpi/ghes_cper_stub.c | 19 +
hw/acpi/meson.build | 2 +
hw/arm/virt-acpi-build.c | 12 +-
hw/arm/virt.c | 19 +-
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/aml-build.h | 2 +
include/hw/acpi/generic_event_device.h | 1 +
include/hw/acpi/ghes.h | 37 +-
include/hw/arm/virt.h | 2 +
qapi/acpi-hest.json | 35 ++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
scripts/arm_processor_error.py | 388 ++++++++++++++
scripts/ghes_inject.py | 51 ++
scripts/qmp_helper.py | 702 +++++++++++++++++++++++++
target/arm/cpu.c | 1 +
target/arm/cpu.h | 1 +
target/arm/helper.c | 10 +-
target/arm/kvm.c | 3 +-
27 files changed, 1552 insertions(+), 132 deletions(-)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 hw/acpi/ghes_cper_stub.c
create mode 100644 qapi/acpi-hest.json
create mode 100644 scripts/arm_processor_error.py
create mode 100755 scripts/ghes_inject.py
create mode 100644 scripts/qmp_helper.py
--
2.46.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v10 13/21] acpi/ghes: better name GHES memory error function
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
1 sibling, 0 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
Peter Maydell, kvm, linux-kernel, qemu-arm, qemu-devel

The current function used to generate GHES data is specific for
memory errors. Give a better name for it, as we now have a generic
function as well.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes-stub.c    | 2 +-
 hw/acpi/ghes.c         | 2 +-
 include/hw/acpi/ghes.h | 4 ++--
 target/arm/kvm.c       | 3 ++-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 58a04e935142..b0f053d5998f 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
 #include "qemu/osdep.h"
 #include "hw/acpi/ghes.h"
 
-int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
 {
     return -1;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index dc15d6a693d6..a8feb39c9f30 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -501,7 +501,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
     cpu_physical_memory_write(cper_addr, cper, len);
 }
 
-int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
 {
     /* Memory Error Section Type */
     const uint8_t guid[] =
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 344919f1f75c..7a7961e6078a 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -82,7 +82,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
                      const char *oem_id, const char *oem_table_id);
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
-int acpi_ghes_record_errors(int source_id,
+int acpi_ghes_memory_errors(int source_id,
                             uint64_t error_physical_addr);
 void ghes_record_cper_errors(const void *cper, size_t len,
                              uint16_t source_id, Error **errp);
@@ -91,7 +91,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
  * acpi_ghes_present: Report whether ACPI GHES table is present
  *
  * Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_record_errors() to record a memory error.
+ * safe to call acpi_ghes_memory_errors() to record a memory error.
  */
 bool acpi_ghes_present(void);
 #endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 849e2e21b304..57192285fb96 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2373,7 +2373,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
              */
             if (code == BUS_MCEERR_AR) {
                 kvm_cpu_synchronize_state(c);
-                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+                if (!acpi_ghes_memory_errors(ARM_ACPI_HEST_SRC_ID_SYNC,
+                                             paddr)) {
                     kvm_inject_arm_sea(c);
                 } else {
                     error_report("failed to record the error");
--
2.46.0

^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
@ 2024-09-17 12:15 ` Igor Mammedov
2024-09-24 13:00 ` Mauro Carvalho Chehab
1 sibling, 1 reply; 6+ messages in thread
From: Igor Mammedov @ 2024-09-17 12:15 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow, Markus Armbruster,
Michael Roth, Paolo Bonzini, Peter Maydell, Shannon Zhao, kvm,
linux-kernel, qemu-arm, qemu-devel

On Sat, 14 Sep 2024 08:13:21 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> This series adds support for injecting generic CPER records. Such records
> are generated outside QEMU via a provided script.
>
> In this version, the patch reworking the way offsets are calculated was
> split into several patches, to have one logical change per patch and
> make review easier.
>
> Although the number of patches increased from 12 to 21, there is just one
> really new patch (the other ones come from splitting a big change):
>
> acpi/generic_event_device: Update GHES migration to cover hest addr

I'm done with this round of review.

Given that the series accumulated a bunch of cleanups,
I'd suggest to move all cleanups/renamings not related
to new HEST lookup and new src id mapping to the beginning
of the series, so once they are reviewed they could be split off into
a separate series that could be merged while we are ironing out
the new functionality.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
@ 2024-09-24 13:00 ` Mauro Carvalho Chehab
2024-09-24 13:14 ` Igor Mammedov
0 siblings, 1 reply; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-24 13:00 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow, Markus Armbruster,
Michael Roth, Paolo Bonzini, Peter Maydell, Shannon Zhao, kvm,
linux-kernel, qemu-arm, qemu-devel

On Tue, 17 Sep 2024 14:15:19 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> I'm done with this round of review.
>
> Given that the series accumulated a bunch of cleanups,
> I'd suggest to move all cleanups/renamings not related
> to new HEST lookup and new src id mapping to the beginning
> of the series, so once they are reviewed they could be split off into
> a separate series that could be merged while we are ironing out
> the new functionality.

I've rebased the series placing the preparation stuff (cleanups
and renames) at the beginning. So, what I have now is:

1) preparation patches:

41709f0898e1 acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
5409daa41c78 acpi/ghes: simplify acpi_ghes_record_errors() code
2539f1f662b9 acpi/ghes: better handle source_id and notification
3f19400549c1 acpi/ghes: Remove a duplicated out of bounds check
f0b06ecede46 acpi/ghes: Change the type for source_id
9f08301ac195 acpi/ghes: Prepare to support multiple sources on ghes
2426cd76e868 acpi/ghes: make the GHES record generation more generic
3fb7ec864700 acpi/ghes: better name GHES memory error function
1a22dad3211e acpi/ghes: don't crash QEMU if ghes GED is not found
726968d4ee20 acpi/ghes: rename etc/hardware_error file macros
f562380da7ce docs: acpi_hest_ghes: fix documentation for CPER size
69850f550f99 acpi/generic_event_device: add an APEI error device

Patches were changed to ensure that they won't add any new
features. They are just code shifts in order to make the diff
of the next patches smaller.

There is a small point here: the logic was simplified to only
support a single source ID (I added an assert() to enforce it), and
the offset calculation was simplified in preparation for the HEST and
migration series.

2) add a BIOS pointer to HEST, using it. The migration stuff
will be along those:

c24f1a8708e3 acpi/ghes: add a firmware file with HEST address
853dce23ec39 acpi/ghes: Use HEST table offsets when preparing GHES records
c148716fd7c8 acpi/generic_event_device: Update GHES migration to cover hest addr

Up to that, still no new features, but the offset calculation will be
relative to the HEST table and will use the BIOS pointers stored there;

3) Add support for generic error inject:

f5ec0d197d82 acpi/ghes: add a notifier to notify when error data is ready
f5e015537209 arm/virt: Wire up a GED error device for ACPI / GHES
3b6692dbf473 qapi/acpi-hest: add an interface to do generic CPER error injection
620a5a49f218 scripts/ghes_inject: add a script to generate GHES error inject

4) MPIDR property:

2dd6e3aae450 target/arm: add an experimental mpidr arm cpu property object
02c88cd4daa2 scripts/arm_processor_error.py: retrieve mpidr if not filled

I'm still testing if the rebase didn't cause any issues. So, the above
may still change a little bit. I also need to address your comments to the
cleanup patches and work on the migration, but I just want to double check if
this is what you want.

If that is OK with you, my plan is to submit the cleanup patches after I
finish testing the whole series.

The migration logic will require some time, and I don't want to bother
with the cleanup stuff while doing it. So, perhaps while I'm doing it,
you could review/merge the cleanups.

We can do the same for each of the 4 above series of patches, as it
makes review simpler: there will be fewer patches to look into in
each series.

Would it work for you?

Thanks,
Mauro

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-24 13:00 ` Mauro Carvalho Chehab
@ 2024-09-24 13:14 ` Igor Mammedov
2024-09-25 4:26 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 6+ messages in thread
From: Igor Mammedov @ 2024-09-24 13:14 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow, Markus Armbruster,
Michael Roth, Paolo Bonzini, Peter Maydell, Shannon Zhao, kvm,
linux-kernel, qemu-arm, qemu-devel

On Tue, 24 Sep 2024 15:00:58 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> On Tue, 17 Sep 2024 14:15:19 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
>
> > I'm done with this round of review.
> >
> > Given that the series accumulated a bunch of cleanups,
> > I'd suggest to move all cleanups/renamings not related
> > to new HEST lookup and new src id mapping to the beginning
> > of the series, so once they are reviewed they could be split off into
> > a separate series that could be merged while we are ironing out
> > the new functionality.
>
> I've rebased the series placing the preparation stuff (cleanups
> and renames) at the beginning. So, what I have now is:
>
> 1) preparation patches:
>
> 41709f0898e1 acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
> 5409daa41c78 acpi/ghes: simplify acpi_ghes_record_errors() code
> 2539f1f662b9 acpi/ghes: better handle source_id and notification
> 3f19400549c1 acpi/ghes: Remove a duplicated out of bounds check
> f0b06ecede46 acpi/ghes: Change the type for source_id
> 9f08301ac195 acpi/ghes: Prepare to support multiple sources on ghes
> 2426cd76e868 acpi/ghes: make the GHES record generation more generic
> 3fb7ec864700 acpi/ghes: better name GHES memory error function
> 1a22dad3211e acpi/ghes: don't crash QEMU if ghes GED is not found
> 726968d4ee20 acpi/ghes: rename etc/hardware_error file macros
> f562380da7ce docs: acpi_hest_ghes: fix documentation for CPER size
> 69850f550f99 acpi/generic_event_device: add an APEI error device

this one doesn't belong to clean ups, I think.
Let's move this to #3 part

>
> Patches were changed to ensure that they won't add any new
> features. They are just code shifts in order to make the diff
> of the next patches smaller.
>
> There is a small point here: the logic was simplified to only
> support a single source ID (I added an assert() to enforce it), and
> the offset calculation was simplified in preparation for the HEST and
> migration series.
>
>
> 2) add a BIOS pointer to HEST, using it. The migration stuff
> will be along those:
>
> c24f1a8708e3 acpi/ghes: add a firmware file with HEST address
> 853dce23ec39 acpi/ghes: Use HEST table offsets when preparing GHES records
> c148716fd7c8 acpi/generic_event_device: Update GHES migration to cover hest addr
>
> Up to that, still no new features, but the offset calculation will be
> relative to the HEST table and will use the BIOS pointers stored there;
>
> 3) Add support for generic error inject:
>
> f5ec0d197d82 acpi/ghes: add a notifier to notify when error data is ready
> f5e015537209 arm/virt: Wire up a GED error device for ACPI / GHES
> 3b6692dbf473 qapi/acpi-hest: add an interface to do generic CPER error injection
> 620a5a49f218 scripts/ghes_inject: add a script to generate GHES error inject
>
> 4) MPIDR property:
> 2dd6e3aae450 target/arm: add an experimental mpidr arm cpu property object
> 02c88cd4daa2 scripts/arm_processor_error.py: retrieve mpidr if not filled
>
> I'm still testing if the rebase didn't cause any issues. So, the above
> may still change a little bit. I also need to address your comments to the
> cleanup patches and work on the migration, but I just want to double check if
> this is what you want.
>
> If that is OK with you, my plan is to submit the cleanup patches after I
> finish testing the whole series.
>
> The migration logic will require some time, and I don't want to bother
> with the cleanup stuff while doing it. So, perhaps while I'm doing it,
> you could review/merge the cleanups.
>
> We can do the same for each of the 4 above series of patches, as it
> makes review simpler: there will be fewer patches to look into in
> each series.
>
> Would it work for you?

other than nit above, LGTM

>
> Thanks,
> Mauro
>

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-24 13:14 ` Igor Mammedov
@ 2024-09-25 4:26 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 6+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:26 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow, Markus Armbruster,
Michael Roth, Paolo Bonzini, Peter Maydell, Shannon Zhao, kvm,
linux-kernel, qemu-arm, qemu-devel

On Tue, 24 Sep 2024 15:14:29 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> > 1) preparation patches:
...
> > 69850f550f99 acpi/generic_event_device: add an APEI error device
> this one doesn't belong to clean ups, I think.
> Let's move this to #3 part

Ok.

> > The migration logic will require some time, and I don't want to bother
> > with the cleanup stuff while doing it. So, perhaps while I'm doing it,
> > you could review/merge the cleanups.
> >
> > We can do the same for each of the 4 above series of patches, as it
> > makes review simpler: there will be fewer patches to look into in
> > each series.
> >
> > Would it work for you?
>
> other than nit above, LGTM
>

Ok, sent a PR with the first set (cleanups) at:

https://lore.kernel.org/qemu-devel/cover.1727236561.git.mchehab+huawei@kernel.org/

You can see the full series at:

https://gitlab.com/mchehab_kernel/qemu/-/commits/qemu_submission_v11b?ref_type=heads

It works fine, except for the migration part that I'm still working on.

For the migration, there are now two functions in ghes.c:

The one compatible with current behavior (up to version 9.1):
https://gitlab.com/mchehab_kernel/qemu/-/blob/qemu_submission_v11b/hw/acpi/ghes.c?ref_type=heads#L411

And the new one using offsets calculated from HEST (newer versions):
https://gitlab.com/mchehab_kernel/qemu/-/blob/qemu_submission_v11b/hw/acpi/ghes.c?ref_type=heads#L437

With that, the migration logic can decide what function should be called
(currently, it is just checking if hest_addr_le is zero, but I guess I'll
need to change it to match some variable added by the migration path).

Also, in preparation for the migration tests, I created a separate branch at:
https://gitlab.com/mchehab_kernel/qemu/-/commits/ghes_on_v9.1.0?ref_type=heads

which contains the same patches on top of 9.1, except for the HEST ones.
It also contains a hack to use ACPI_GHES_NOTIFY_GPIO instead of
ACPI_GHES_NOTIFY_SEA. With that, we have a way to use the same error
injection logic on both 9.1 and upstream, which hopefully is enough to test
whether migration works.

Thanks,
Mauro

^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-09-25 4:26 UTC | newest]
Thread overview: 6+ messages
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
2024-09-24 13:00 ` Mauro Carvalho Chehab
2024-09-24 13:14 ` Igor Mammedov
2024-09-25 4:26 ` Mauro Carvalho Chehab