From: Igor Mammedov <imammedo@redhat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Shiju Jose <shiju.jose@huawei.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Ani Sinha <anisinha@redhat.com>, Cleber Rosa <crosa@redhat.com>,
Dongjiu Geng <gengdongjiu1@gmail.com>,
Eric Blake <eblake@redhat.com>, John Snow <jsnow@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
Michael Roth <michael.roth@amd.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Peter Maydell <peter.maydell@linaro.org>,
Shannon Zhao <shannon.zhaosl@gmail.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
qemu-arm@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
Date: Tue, 17 Sep 2024 14:15:19 +0200
Message-ID: <20240917141519.57766bb6@imammedo.users.ipa.redhat.com>
In-Reply-To: <cover.1726293808.git.mchehab+huawei@kernel.org>
On Sat, 14 Sep 2024 08:13:21 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> This series adds support for injecting generic CPER records. Such records
> are generated outside QEMU via a provided script.
>
> In this version, the patch reworking the way offsets are calculated was
> split into several patches, to make one logical change per patch and
> make review easier.
>
> Although the number of patches increased from 12 to 21, there is just one
> real new patch (the other ones are splits of a big change):
>
> acpi/generic_event_device: Update GHES migration to cover hest addr
I'm done with this round of review.
Given that the series has accumulated a bunch of cleanups,
I'd suggest moving all cleanups/renamings not related
to the new HEST lookup and the new src id mapping to the beginning
of the series, so that once they are reviewed they can be split out into
a separate series that could be merged while we are ironing out
the new functionality.
> ---
>
> v10:
> - Patch 1 split into several patches to make reviews easier;
> - Added a migration patch;
> - CPER QMP command was renamed;
> - Updated some comments to better reflect exact ACPI version;
> - Removed the code that reset acks when OSPM fails to read records;
> - Removed a duplicated config GHES_CPER symbol;
> - There is now an arch-independent namespace for GHES source IDs;
> - Fixed the size of hest_ghes_notify array when creating tables;
> - acpi-hest.json is now a section of ACPI;
> - QMP command renamed from @ghes-cper to inject-ghes-error.
>
> v9:
> - Patches reorganized to make review easier;
> - source ID is now guest-OS specific;
> - Some patches got a revision history since v8;
> - Several minor cleanups.
>
> v8:
> - Fixed one of the BIOS links, which was incorrect;
> - Changed internal mem error injection to use common code;
> - No more hardcoded values for CPER: instead of using just the
> payload at the QAPI, it now has the full raw CPER there;
> - Error injection script now supports changing fields at the
> Generic Error Data section of the CPER;
> - Several minor cleanups.
>
> v7:
> - Change the way offsets are calculated and used on HEST table.
> Now, it is compatible with migrations as all offsets are relative
> to the HEST table;
> - GHES interface is now more generic: the entire CPER is sent via
> QMP, instead of just the payload;
> - Some code cleanups to make the code more robust;
> - The python script now uses QEMUMonitorProtocol class.
>
> v6:
> - PNP0C33 device creation moved to aml-build.c;
> - acpi_ghes record functions now use ACPI notify parameter,
> instead of source ID;
> - the number of source IDs is now automatically calculated;
> - some code cleanups and function/var renames;
> - some fixes and cleanups at the error injection script;
> - ghes cper stub now produces an error if cper JSON is not compiled;
> - Offset calculation logic for GHES was refactored;
> - Updated documentation to reflect the GHES allocated size;
> - Added an x-mpidr object for QOM usage;
> - Added a patch making use of the x-mpidr field in the ARM injection
> script;
>
> v5:
> - CPER guid is now passed as a string;
> - raw-data is now passed base64-encoded;
> - Removed several GPIO left-overs from arm/virt.c changes;
> - Lots of cleanups and improvements to the error injection script.
> It now handles the QMP dialog better and doesn't print debug messages.
> Also, the code was split into two modules, to make it easier to add more
> error injection commands.
>
> v4:
> - CPER generation moved to happen outside QEMU;
> - One patch adding support for mpidr query was removed.
>
> v3:
> - patch 1 cleanups with some comment changes and one more place where
> the poweroff GPIO define should be used. No changes to the other patches
> (except due to conflict resolution).
>
> v2:
> - added a new patch using a define for GPIO power pin;
> - patch 2 changed to also use a define for generic error GPIO pin;
> - a couple of cleanups in patch 2 removing unneeded else clauses.
>
> Example of generating a CPER record:
>
> $ scripts/ghes_inject.py -d arm -p 0xdeadbeef
> GUID: e19e3d16-bc11-11e4-9caa-c2051d5d46b0
> Generic Error Status Block (20 bytes):
> 00000000 01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00 ................
> 00000010 00 00 00 00 ....
>
> Generic Error Data Entry (72 bytes):
> 00000000 16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0 .=...........]F.
> 00000010 00 00 00 00 00 03 00 00 48 00 00 00 00 00 00 00 ........H.......
> 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00000040 00 00 00 00 00 00 00 00 ........
>
> Payload (72 bytes):
> 00000000 05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00 ........H.......
> 00000010 00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00 ................
> 00000020 00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03 ......... ......
> 00000030 0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00 ................
> 00000040 ef be ad de 00 00 00 00 ........
>
> Error injected.
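
The dump above can be cross-checked by hand. A minimal sketch, assuming the
20-byte block follows the ACPI Generic Error Status Block layout (five
little-endian 32-bit fields) and that the section type GUID is stored in the
usual mixed-endian form. The variable names are illustrative only and are not
taken from the series:

```python
import struct
import uuid

# Generic Error Status Block, copied from the hex dump above (20 bytes).
gesb = bytes.fromhex(
    "01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00 00 00 00 00"
)

# Assumed layout: block status, raw data offset, raw data length,
# data length, error severity; all little-endian u32.
(block_status, raw_data_offset, raw_data_length,
 data_length, error_severity) = struct.unpack("<IIIII", gesb)

# First 16 bytes of the Generic Error Data Entry: the section type GUID.
# bytes_le decodes the first three GUID fields as little-endian.
section_type = uuid.UUID(
    bytes_le=bytes.fromhex("16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0")
)

print(f"data_length = {data_length} bytes")
print(f"error_severity = {error_severity}")
print(f"section_type = {section_type}")
```

The data length of 0x90 (144) bytes is the 72-byte data entry plus the 72-byte
payload, and the decoded GUID matches the one printed at the top of the script
output.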
>
> [ 9.358364] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [ 9.359027] {1}[Hardware Error]: event severity: recoverable
> [ 9.359586] {1}[Hardware Error]: Error 0, type: recoverable
> [ 9.360124] {1}[Hardware Error]: section_type: ARM processor error
> [ 9.360561] {1}[Hardware Error]: MIDR: 0x00000000000f0510
> [ 9.361160] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
> [ 9.361643] {1}[Hardware Error]: running state: 0x0
> [ 9.362142] {1}[Hardware Error]: Power State Coordination Interface state: 0
> [ 9.362682] {1}[Hardware Error]: Error info structure 0:
> [ 9.363030] {1}[Hardware Error]: num errors: 2
> [ 9.363656] {1}[Hardware Error]: error_type: 0x02: cache error
> [ 9.364163] {1}[Hardware Error]: error_info: 0x000000000091000f
> [ 9.364834] {1}[Hardware Error]: transaction type: Data Access
> [ 9.365599] {1}[Hardware Error]: cache error, operation type: Data write
> [ 9.366441] {1}[Hardware Error]: cache level: 2
> [ 9.367005] {1}[Hardware Error]: processor context not corrupted
> [ 9.367753] {1}[Hardware Error]: physical fault address: 0x00000000deadbeef
> [ 9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered
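
The values in the guest log can be traced back to the payload bytes. A rough
sketch, assuming the payload uses the UEFI ARM Processor Error Section layout
(a 40-byte header followed by 32-byte error information structures, all
little-endian) and a 4 KiB guest page size. Field names here are illustrative:

```python
import struct

# Payload, copied from the hex dump above (72 bytes).
payload = bytes.fromhex(
    "05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00"
    "00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03"
    "0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00"
    "ef be ad de 00 00 00 00"
)

# Assumed section header: validation bits, error info count, context count,
# section length, affinity level, 3 reserved bytes, MPIDR, MIDR,
# running state, PSCI state.
HEADER = struct.Struct("<IHHIB3xQQII")
(valid, err_info_num, ctx_info_num, section_length, affinity,
 mpidr, midr, running_state, psci_state) = HEADER.unpack_from(payload, 0)

# Assumed error information structure: version, length, validation bits,
# type, multiple-error count, flags, error info, virtual and physical
# fault addresses.
ERR_INFO = struct.Struct("<BBHBHBQQQ")
(version, length, info_valid, err_type, multiple_error, flags,
 error_info, virt_addr, phys_addr) = ERR_INFO.unpack_from(payload, HEADER.size)

print(f"MIDR:  {midr:#018x}")
print(f"MPIDR: {mpidr:#018x}")
print(f"error_type: {err_type:#04x}, error_info: {error_info:#018x}")
print(f"physical fault address: {phys_addr:#018x}")
print(f"page frame (4 KiB pages assumed): {phys_addr >> 12:#x}")
```

Under these assumptions the decoded MIDR, MPIDR, error_info and physical fault
address agree with the log lines above, and the page frame number agrees with
the 0xdeadb reported by the memory failure handler.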
>
> The script currently supports the ARM processor error CPER, but can easily be
> extended to other GHES notification types.
>
>
> Mauro Carvalho Chehab (21):
> acpi/ghes: add a firmware file with HEST address
> acpi/generic_event_device: Update GHES migration to cover hest addr
> acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
> acpi/ghes: simplify acpi_ghes_record_errors() code
> acpi/ghes: better handle source_id and notification
> acpi/ghes: Remove a duplicated out of bounds check
> acpi/ghes: rework the logic to handle HEST source ID
> acpi/ghes: Change the type for source_id
> acpi/ghes: Don't hardcode the number of sources on ghes
> acpi/ghes: make the GHES record generation more generic
> acpi/ghes: don't crash QEMU if ghes GED is not found
> acpi/ghes: rename etc/hardware_error file macros
> acpi/ghes: better name GHES memory error function
> acpi/ghes: add a notifier to notify when error data is ready
> acpi/generic_event_device: add an APEI error device
> arm/virt: Wire up a GED error device for ACPI / GHES
> qapi/acpi-hest: add an interface to do generic CPER error injection
> docs: acpi_hest_ghes: fix documentation for CPER size
> scripts/ghes_inject: add a script to generate GHES error inject
> target/arm: add an experimental mpidr arm cpu property object
> scripts/arm_processor_error.py: retrieve mpidr if not filled
>
> MAINTAINERS | 10 +
> docs/specs/acpi_hest_ghes.rst | 6 +-
> hw/acpi/Kconfig | 5 +
> hw/acpi/aml-build.c | 10 +
> hw/acpi/generic_event_device.c | 19 +-
> hw/acpi/ghes-stub.c | 2 +-
> hw/acpi/ghes.c | 312 +++++++----
> hw/acpi/ghes_cper.c | 32 ++
> hw/acpi/ghes_cper_stub.c | 19 +
> hw/acpi/meson.build | 2 +
> hw/arm/virt-acpi-build.c | 12 +-
> hw/arm/virt.c | 19 +-
> include/hw/acpi/acpi_dev_interface.h | 1 +
> include/hw/acpi/aml-build.h | 2 +
> include/hw/acpi/generic_event_device.h | 1 +
> include/hw/acpi/ghes.h | 37 +-
> include/hw/arm/virt.h | 2 +
> qapi/acpi-hest.json | 35 ++
> qapi/meson.build | 1 +
> qapi/qapi-schema.json | 1 +
> scripts/arm_processor_error.py | 388 ++++++++++++++
> scripts/ghes_inject.py | 51 ++
> scripts/qmp_helper.py | 702 +++++++++++++++++++++++++
> target/arm/cpu.c | 1 +
> target/arm/cpu.h | 1 +
> target/arm/helper.c | 10 +-
> target/arm/kvm.c | 3 +-
> 27 files changed, 1552 insertions(+), 132 deletions(-)
> create mode 100644 hw/acpi/ghes_cper.c
> create mode 100644 hw/acpi/ghes_cper_stub.c
> create mode 100644 qapi/acpi-hest.json
> create mode 100644 scripts/arm_processor_error.py
> create mode 100755 scripts/ghes_inject.py
> create mode 100644 scripts/qmp_helper.py
>
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-09-17 9:34 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
2024-09-17 9:19 ` Igor Mammedov
2024-09-17 15:22 ` Peter Xu
2024-09-25 8:04 ` Mauro Carvalho Chehab
2024-09-17 12:01 ` Igor Mammedov
2024-09-25 7:32 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
2024-09-17 9:22 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
2024-09-17 9:28 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 05/21] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
2024-09-17 9:01 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
2024-09-17 10:39 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
2024-09-17 11:59 ` Igor Mammedov
2024-10-01 11:57 ` Mauro Carvalho Chehab
2024-10-03 14:51 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 08/21] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
2024-09-17 12:04 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 09/21] acpi/ghes: Don't hardcode the number of sources on ghes Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 10/21] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 11/21] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 12/21] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 14/21] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 15/21] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 16/21] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 17/21] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 18/21] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 19/21] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 20/21] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 21/21] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
2024-09-17 12:15 ` Igor Mammedov [this message]
2024-09-24 13:00 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-24 13:14 ` Igor Mammedov
2024-09-25 4:26 ` Mauro Carvalho Chehab