From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Shiju Jose <shiju.jose@huawei.com>,
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
Ani Sinha <anisinha@redhat.com>,
Dongjiu Geng <gengdongjiu1@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Peter Maydell <peter.maydell@linaro.org>,
Shannon Zhao <shannon.zhaosl@gmail.com>,
qemu-arm@nongnu.org, qemu-devel@nongnu.org
Subject: [PATCH v4 0/7] Add ACPI CPER firmware first error injection on ARM emulation
Date: Thu, 1 Aug 2024 16:47:03 +0200
Message-ID: <cover.1722523312.git.mchehab+huawei@kernel.org>
Testing OS kernel ACPI APEI CPER support is tricky, as it depends on
having special-purpose BIOS and/or hardware.
With QEMU, it becomes a lot easier, as it can be done via QMP.
This series adds support for injecting CPER records on ARM emulation.
The QEMU side changes add a QAPI interface able to do CPER error
injection on ARM, with a raw data parameter, making it very flexible.
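As a rough illustration of that interface, the sketch below assembles the
QMP message by hand. The command name and argument names are the ones
visible in the script output quoted later in this letter; the helper name
build_ghes_cper_command() is hypothetical and is not part of the series.
The GUID is simply what the "notification-type" bytes in that output read
as when interpreted as a little-endian GUID.

```python
import json
import uuid

# GUID obtained by reading the 16 "notification-type" bytes from the
# script output as a mixed-endian (bytes_le) GUID.
ARM_PROC_ERR_GUID = uuid.UUID("e19e3d16-bc11-11e4-9caa-c2051d5d46b0")


def build_ghes_cper_command(guid, raw_data):
    """Hypothetical helper: return a QMP "ghes-cper" command as JSON text."""
    return json.dumps({
        "execute": "ghes-cper",
        "arguments": {
            "cper": {
                # Both fields are sent as plain arrays of byte values.
                "notification-type": list(guid.bytes_le),
                "raw-data": list(raw_data),
            }
        },
    })


if __name__ == "__main__":
    # An all-zero payload, only to show the shape of the message.
    print(build_ghes_cper_command(ARM_PROC_ERR_GUID, bytes(8)))
```

Because the payload is just an array of bytes, everything about the record
layout can be decided by the caller rather than by QEMU.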
A script is provided in the final patch, implementing support for
ARM Processor CPER error injection according to the ACPI 6.x and
UEFI 2.9A/2.10 specs, via QMP.
Injecting such errors can be done using the provided script:
$ ./scripts/ghes_inject.py arm
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-2621-g3de6991b870a"}, "capabilities": ["oob"]}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "ghes-cper", "arguments": {"cper": {"notification-type": [22, 61, 158, 225, 17, 188, 228, 17, 156, 170, 194, 5, 29, 93, 70, 176], "raw-data": [0, 0, 0, 0, 1, 0, 0, 0, 72, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 4, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0]}} }
{"return": {}}
This produces a simple CPER record, properly handled by the Linux
kernel:
[ 5876.041410] {18}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 5876.041775] {18}[Hardware Error]: event severity: recoverable
[ 5876.042023] {18}[Hardware Error]: Error 0, type: recoverable
[ 5876.042280] {18}[Hardware Error]: section_type: ARM processor error
[ 5876.042538] {18}[Hardware Error]: MIDR: 0x0000000000000000
[ 5876.042781] {18}[Hardware Error]: Error info structure 0:
[ 5876.043013] {18}[Hardware Error]: num errors: 2
[ 5876.043222] {18}[Hardware Error]: error_type: 0x02: cache error
[ 5876.043500] {18}[Hardware Error]: error_info: 0x0000000000000000
[ 5876.043800] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
More complex use cases are also possible, for example:
$ ./scripts/ghes_inject.py arm --mpidr 0x444 --running --affinity 1 --error-info 12345678 --vendor 0x13,123,4,5,1 --ctx-array 0,1,2,3,4,5 -t cache tlb bus vendor tlb,vendor
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-2621-g3de6991b870a"}, "capabilities": ["oob"]}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "ghes-cper", "arguments": {"cper": {"notification-type": [22, 61, 158, 225, 17, 188, 228, 17, 156, 170, 194, 5, 29, 93, 70, 176], "raw-data": [7, 0, 0, 0, 5, 0, 1, 0, 13, 1, 0, 0, 1, 0, 0, 0, 68, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 32, 4, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 4, 0, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 4, 0, 8, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 4, 0, 16, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 32, 0, 0, 20, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 239, 190, 173, 222, 0, 0, 0, 0, 173, 11, 186, 171, 0, 0, 0, 0, 0, 0, 5, 0, 56, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 123, 4, 5, 1]}} }
{"return": {}}
[ 5964.134325] {19}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 5964.134692] {19}[Hardware Error]: event severity: recoverable
[ 5964.134942] {19}[Hardware Error]: Error 0, type: recoverable
[ 5964.135200] {19}[Hardware Error]: section_type: ARM processor error
[ 5964.135466] {19}[Hardware Error]: MIDR: 0x0000000000000000
[ 5964.135700] {19}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000000000444
[ 5964.136025] {19}[Hardware Error]: error affinity level: 1
[ 5964.136255] {19}[Hardware Error]: running state: 0x1
[ 5964.136468] {19}[Hardware Error]: Power State Coordination Interface state: 0
[ 5964.136767] {19}[Hardware Error]: Error info structure 0:
[ 5964.137001] {19}[Hardware Error]: num errors: 2
[ 5964.137210] {19}[Hardware Error]: error_type: 0x02: cache error
[ 5964.137472] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.137737] {19}[Hardware Error]: Error info structure 1:
[ 5964.137976] {19}[Hardware Error]: num errors: 2
[ 5964.138192] {19}[Hardware Error]: error_type: 0x04: TLB error
[ 5964.138459] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.138727] {19}[Hardware Error]: Error info structure 2:
[ 5964.138967] {19}[Hardware Error]: num errors: 2
[ 5964.139185] {19}[Hardware Error]: error_type: 0x08: bus error
[ 5964.139451] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.139751] {19}[Hardware Error]: Error info structure 3:
[ 5964.139993] {19}[Hardware Error]: num errors: 2
[ 5964.140210] {19}[Hardware Error]: error_type: 0x10: micro-architectural error
[ 5964.140522] {19}[Hardware Error]: error_info: 0x0000000000000000
[ 5964.140790] {19}[Hardware Error]: Error info structure 4:
[ 5964.141030] {19}[Hardware Error]: num errors: 2
[ 5964.141261] {19}[Hardware Error]: error_type: 0x14: TLB error|micro-architectural error
[ 5964.141599] {19}[Hardware Error]: Context info structure 0:
[ 5964.141843] {19}[Hardware Error]: register context type: AArch64 EL1 context registers
[ 5964.142195] {19}[Hardware Error]: 00000000: 00000000 00000000 00000001 00000000
[ 5964.142534] {19}[Hardware Error]: 00000010: 00000002 00000000 00000003 00000000
[ 5964.142867] {19}[Hardware Error]: 00000020: 00000004 00000000 00000005 00000000
[ 5964.143193] {19}[Hardware Error]: 00000030: 00000000 00000000
[ 5964.143464] {19}[Hardware Error]: Vendor specific error info has 5 bytes:
[ 5964.143750] {19}[Hardware Error]: 00000000: 13 7b 04 05 01 .{...
[ 5964.144164] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
[ 5964.144483] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[ 5964.144793] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error
[ 5964.145099] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[ 5964.145454] [Firmware Warn]: GHES: Unhandled processor error type 0x14: TLB error|micro-architectural error
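A few numbers in this second example can be cross-checked with a short
sketch. The type names below are copied from the kernel log above; the
sizes (40-byte header, 32-byte error info entry, 8-byte context header)
are assumptions taken from the UEFI ARM Processor Error Section layout.
All function names are hypothetical, and parsing the --vendor list with
int(token, 0) is only a guess at how mixed hex/decimal values could be
accepted.

```python
# Error type bits and their names, as printed in the kernel log above.
ERROR_TYPE_BITS = {
    0x02: "cache error",
    0x04: "TLB error",
    0x08: "bus error",
    0x10: "micro-architectural error",
}


def describe_error_type(value):
    """Join the names of all bits set in an error type value."""
    names = [name for bit, name in ERROR_TYPE_BITS.items() if value & bit]
    return "|".join(names)


def section_length(n_err_info, ctx_payload_sizes, vendor_len):
    """Sum the assumed sizes of every part of the section."""
    hdr, err_info, ctx_hdr = 40, 32, 8
    ctx_total = sum(ctx_hdr + size for size in ctx_payload_sizes)
    return hdr + n_err_info * err_info + ctx_total + vendor_len


def parse_byte_list(text):
    """Parse a comma-separated list, accepting both 0x13 and 123 forms."""
    return [int(token, 0) for token in text.split(",")]


if __name__ == "__main__":
    print(describe_error_type(0x14))
    # 5 error info entries, one 56-byte context, 5 vendor bytes
    print(section_length(5, [56], 5))
    print(parse_byte_list("0x13,123,4,5,1"))
```

Under these assumptions the total is 40 + 5 * 32 + (8 + 56) + 5 = 269
bytes, which is 0x10d and matches the 13, 1, 0, 0 section length bytes in
the raw-data array.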
---
v4:
- CPER generation moved to happen outside QEMU;
- One patch adding support for mpidr query was removed.
v3:
- patch 1 cleanups, with some comment changes and one more place where
the poweroff GPIO define should be used. No changes to other patches (except
due to conflict resolution).
v2:
- added a new patch using a define for GPIO power pin;
- patch 2 changed to also use a define for generic error GPIO pin;
- a couple of cleanups in patch 2 removing unneeded else clauses.
Jonathan Cameron (1):
acpi/ghes: Support GPIO error source
Mauro Carvalho Chehab (6):
arm/virt: place power button pin number on a define
acpi/generic_event_device: add an APEI error device
arm/virt: Wire up GPIO error source for ACPI / GHES
qapi/ghes-cper: add an interface to do generic CPER error injection
acpi/ghes: add support for generic error injection via QAPI
scripts/ghes_inject: add a script to generate GHES error inject
MAINTAINERS | 8 +
hw/acpi/Kconfig | 5 +
hw/acpi/generic_event_device.c | 17 +
hw/acpi/ghes.c | 178 ++++++-
hw/acpi/ghes_cper.c | 53 ++
hw/acpi/meson.build | 2 +
hw/arm/Kconfig | 5 +
hw/arm/virt-acpi-build.c | 25 +-
hw/arm/virt.c | 33 +-
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/generic_event_device.h | 3 +
include/hw/acpi/ghes.h | 14 +-
include/hw/arm/virt.h | 5 +
qapi/ghes-cper.json | 54 ++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
scripts/ghes_inject.py | 673 +++++++++++++++++++++++++
17 files changed, 1048 insertions(+), 30 deletions(-)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 qapi/ghes-cper.json
create mode 100755 scripts/ghes_inject.py
--
2.45.2
Thread overview: 9+ messages
2024-08-01 14:47 Mauro Carvalho Chehab [this message]
2024-08-01 14:47 ` [PATCH v4 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
2024-08-01 14:47 ` [PATCH v4 2/7] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-08-01 14:47 ` [PATCH v4 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
2024-08-01 14:47 ` [PATCH v4 4/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
2024-08-01 14:47 ` [PATCH v4 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-08-02 9:44 ` Markus Armbruster
2024-08-01 14:47 ` [PATCH v4 6/7] acpi/ghes: add support for generic error injection via QAPI Mauro Carvalho Chehab
2024-08-01 14:47 ` [PATCH v4 7/7] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab