qemu-devel.nongnu.org archive mirror
* [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation
@ 2024-08-02 21:43 Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2024-08-02 21:43 UTC
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Ani Sinha,
	Dongjiu Geng, Paolo Bonzini, Peter Maydell, Shannon Zhao,
	qemu-arm, qemu-devel

Testing OS kernel ACPI APEI CPER support is tricky, as it depends on
having special-purpose hardware and/or BIOS.

With QEMU, it becomes a lot easier, as it can be done via QMP.

This series adds support for injecting CPER records on ARM emulation.

The QEMU-side changes add a QAPI interface for CPER error injection
on ARM. It takes a raw data parameter, which makes it very flexible.

The final patch provides a script that implements ARM Processor CPER
error injection via QMP, according to the ACPI 6.x and
UEFI 2.9A/2.10 specs.
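
As a rough illustration of what passing a GUID string plus base64
raw data over QMP could look like, here is a minimal sketch. The
command name and the argument keys are placeholders invented for this
example, not the names defined by the series' QAPI schema, and the
GUID (the UEFI ARM Processor Error section type) is only a sample
value:

```python
import base64
import json
import uuid


def build_inject_message(command, guid, raw_cper):
    """Build a QMP-style JSON message carrying a CPER payload.

    'command' and the argument keys used below are hypothetical; check
    qapi/ghes-cper.json in the series for the real names.
    """
    # Validate and normalise the GUID string.
    guid_str = str(uuid.UUID(guid))
    # QMP is JSON-based, so binary data has to travel as text.
    encoded = base64.b64encode(bytes(raw_cper)).decode("ascii")
    return json.dumps({
        "execute": command,
        "arguments": {"guid": guid_str, "raw-data": encoded},
    })


msg = build_inject_message(
    "hypothetical-cper-inject",              # placeholder command name
    "e19e3d16-bc11-11e4-9caa-c2051d5d46b0",  # sample GUID value
    b"\x13\x7b\x04\x05\x01",                 # sample payload bytes
)
print(msg)
```

Keeping the record generation outside QEMU means the sender only
needs to produce bytes and encode them.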

Injecting such errors can be done using the provided script:

	$ ./scripts/ghes_inject.py arm 
	Error injected.

This produces a simple CPER record, which the Linux kernel handles
properly:

[  794.983753] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  794.984150] {4}[Hardware Error]: event severity: recoverable
[  794.984391] {4}[Hardware Error]:  Error 0, type: recoverable
[  794.984652] {4}[Hardware Error]:   section_type: ARM processor error
[  794.984926] {4}[Hardware Error]:   MIDR: 0x0000000000000000
[  794.985184] {4}[Hardware Error]:   running state: 0x0
[  794.985411] {4}[Hardware Error]:   Power State Coordination Interface state: 0
[  794.985720] {4}[Hardware Error]:   Error info structure 0:
[  794.985960] {4}[Hardware Error]:   num errors: 2
[  794.986175] {4}[Hardware Error]:    error_type: 0x02: cache error
[  794.986442] {4}[Hardware Error]:    error_info: 0x000000000091000f
[  794.986755] {4}[Hardware Error]:     transaction type: Data Access
[  794.987027] {4}[Hardware Error]:     cache error, operation type: Data write
[  794.987310] {4}[Hardware Error]:     cache level: 2
[  794.987529] {4}[Hardware Error]:     processor context not corrupted
[  794.987867] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
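
To make the error_info value in the log above easier to read, here is
a small decoding sketch. The bit positions are my reading of the
cache/TLB error information layout in the UEFI spec and should be
treated as an assumption; the function name is made up for this
example and is not part of the series:

```python
# Assumed layout: bits 0-15 are validation bits; the fields they
# guard start at bit 16.
TRANSACTION_TYPES = ["Instruction", "Data Access", "Generic"]


def decode_error_info(value):
    """Decode the valid fields of a cache/TLB error_info value."""
    valid = value & 0xFFFF
    out = {}
    if valid & (1 << 0):
        tt = (value >> 16) & 0x3
        out["transaction_type"] = (
            TRANSACTION_TYPES[tt] if tt < len(TRANSACTION_TYPES) else "unknown"
        )
    if valid & (1 << 1):
        out["operation"] = (value >> 18) & 0xF
    if valid & (1 << 2):
        out["level"] = (value >> 22) & 0x7
    if valid & (1 << 3):
        out["context_corrupted"] = bool(value & (1 << 25))
    if valid & (1 << 4):
        out["corrected"] = bool(value & (1 << 26))
    if valid & (1 << 5):
        out["precise_pc"] = bool(value & (1 << 27))
    if valid & (1 << 6):
        out["restartable_pc"] = bool(value & (1 << 28))
    return out


print(decode_error_info(0x000000000091000F))
```

For 0x91000f this yields transaction type "Data Access", operation 4,
level 2 and an uncorrupted processor context, which lines up with the
kernel output above (where operation 4 is printed as "Data write").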

More complex use cases are also possible, such as:

	$ ./scripts/ghes_inject.py arm --mpidr 0x444 --running --affinity 1 \
	  --error-info 12345678 --vendor 0x13,123,4,5,1 --ctx-array 0,1,2,3,4,5 \
	  -t cache tlb bus micro-arch tlb,micro-arch
	Error injected.

[  899.181246] {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  899.181769] {5}[Hardware Error]: event severity: recoverable
[  899.182069] {5}[Hardware Error]:  Error 0, type: recoverable
[  899.182370] {5}[Hardware Error]:   section_type: ARM processor error
[  899.182689] {5}[Hardware Error]:   MIDR: 0x0000000000000000
[  899.182980] {5}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000000000000
[  899.183395] {5}[Hardware Error]:   error affinity level: 0
[  899.183683] {5}[Hardware Error]:   running state: 0x1
[  899.183962] {5}[Hardware Error]:   Power State Coordination Interface state: 0
[  899.184332] {5}[Hardware Error]:   Error info structure 0:
[  899.184610] {5}[Hardware Error]:   num errors: 2
[  899.184864] {5}[Hardware Error]:    error_type: 0x02: cache error
[  899.185181] {5}[Hardware Error]:    error_info: 0x0000000000bc614e
[  899.185504] {5}[Hardware Error]:     cache level: 2
[  899.185771] {5}[Hardware Error]:     processor context not corrupted
[  899.186082] {5}[Hardware Error]:   Error info structure 1:
[  899.186366] {5}[Hardware Error]:   num errors: 2
[  899.186613] {5}[Hardware Error]:    error_type: 0x04: TLB error
[  899.186929] {5}[Hardware Error]:    error_info: 0x000000000054007f
[  899.187236] {5}[Hardware Error]:     transaction type: Instruction
[  899.187588] {5}[Hardware Error]:     TLB error, operation type: Instruction fetch
[  899.187962] {5}[Hardware Error]:     TLB level: 1
[  899.188209] {5}[Hardware Error]:     processor context not corrupted
[  899.188535] {5}[Hardware Error]:     the error has not been corrected
[  899.188853] {5}[Hardware Error]:     PC is imprecise
[  899.189114] {5}[Hardware Error]:   Error info structure 2:
[  899.189404] {5}[Hardware Error]:   num errors: 2
[  899.189653] {5}[Hardware Error]:    error_type: 0x08: bus error
[  899.189967] {5}[Hardware Error]:    error_info: 0x00000080d6460fff
[  899.190293] {5}[Hardware Error]:     transaction type: Generic
[  899.190611] {5}[Hardware Error]:     bus error, operation type: Generic read (type of instruction or data request cannot be determined)
[  899.191174] {5}[Hardware Error]:     affinity level at which the bus error occurred: 1
[  899.191563] {5}[Hardware Error]:     processor context corrupted
[  899.191872] {5}[Hardware Error]:     the error has been corrected
[  899.192185] {5}[Hardware Error]:     PC is imprecise
[  899.192445] {5}[Hardware Error]:     Program execution can be restarted reliably at the PC associated with the error.
[  899.192939] {5}[Hardware Error]:     participation type: Local processor observed
[  899.193324] {5}[Hardware Error]:     request timed out
[  899.193596] {5}[Hardware Error]:     address space: External Memory Access
[  899.193945] {5}[Hardware Error]:     memory access attributes:0x20
[  899.194273] {5}[Hardware Error]:     access mode: secure
[  899.194544] {5}[Hardware Error]:   Error info structure 3:
[  899.194838] {5}[Hardware Error]:   num errors: 2
[  899.195088] {5}[Hardware Error]:    error_type: 0x10: micro-architectural error
[  899.195456] {5}[Hardware Error]:    error_info: 0x0000000078da03ff
[  899.195782] {5}[Hardware Error]:   Error info structure 4:
[  899.196070] {5}[Hardware Error]:   num errors: 2
[  899.196331] {5}[Hardware Error]:    error_type: 0x14: TLB error|micro-architectural error
[  899.196733] {5}[Hardware Error]:   Context info structure 0:
[  899.197024] {5}[Hardware Error]:    register context type: AArch64 EL1 context registers
[  899.197427] {5}[Hardware Error]:    00000000: 00000000 00000000
[  899.197741] {5}[Hardware Error]:   Vendor specific error info has 5 bytes:
[  899.198096] {5}[Hardware Error]:    00000000: 13 7b 04 05 01                                   .{...
[  899.198610] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
[  899.199000] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[  899.199388] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error
[  899.199767] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[  899.200194] [Firmware Warn]: GHES: Unhandled processor error type 0x14: TLB error|micro-architectural error
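
Two details of the output above can be reproduced with a short
sketch: the error_type field is a bitmask (so 0x14 is reported as
"TLB error|micro-architectural error"), and the --vendor list mixes
hex and decimal values. The helper names below are illustrative only,
and the parsing shown is an assumption about one reasonable way to
handle such a list, not necessarily what the script does:

```python
# Bit values and names as they appear in the kernel log above.
ERROR_TYPES = {
    0x02: "cache error",
    0x04: "TLB error",
    0x08: "bus error",
    0x10: "micro-architectural error",
}


def describe_error_type(value):
    """Join the names of all error type bits set in 'value'."""
    names = [name for bit, name in ERROR_TYPES.items() if value & bit]
    return "|".join(names) if names else "unknown"


def parse_byte_list(text):
    """Parse a comma-separated list such as '0x13,123,4,5,1'."""
    # int(item, 0) accepts both "0x13" (hex) and "123" (decimal).
    return bytes(int(item, 0) for item in text.split(","))


print(describe_error_type(0x14))
print(parse_byte_list("0x13,123,4,5,1").hex(" "))
```

The second line printed is "13 7b 04 05 01", the same five bytes the
kernel shows in the vendor specific error info dump.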

---

v5:
- CPER GUID is now passed as a string;
- raw-data is now passed base64-encoded;
- Removed several GPIO left-overs from arm/virt.c changes;
- Lots of cleanups and improvements to the error injection script.
  It now handles the QMP dialog better and doesn't print debug messages.
  Also, the code was split into two modules, to make it easier to add
  more error injection commands.

v4:
- CPER generation moved to happen outside QEMU;
- One patch adding support for mpidr query was removed.

v3:
- patch 1: cleanups, some comment changes, and one more place where
  the poweroff GPIO define should be used. No changes to the other patches
  (except for conflict resolution).

v2:
- added a new patch using a define for the GPIO power pin;
- patch 2 changed to also use a define for the generic error GPIO pin;
- a couple of cleanups in patch 2 removing unneeded else clauses.


Jonathan Cameron (1):
  acpi/ghes: Support GPIO error source

Mauro Carvalho Chehab (6):
  arm/virt: place power button pin number on a define
  acpi/generic_event_device: add an APEI error device
  arm/virt: Wire up GPIO error source for ACPI / GHES
  qapi/ghes-cper: add an interface to do generic CPER error injection
  acpi/ghes: add support for generic error injection via QAPI
  scripts/ghes_inject: add a script to generate GHES error inject

 MAINTAINERS                            |  10 +
 hw/acpi/Kconfig                        |   5 +
 hw/acpi/generic_event_device.c         |  17 ++
 hw/acpi/ghes.c                         | 178 +++++++++++--
 hw/acpi/ghes_cper.c                    |  45 ++++
 hw/acpi/ghes_cper_stub.c               |  18 ++
 hw/acpi/meson.build                    |   2 +
 hw/arm/Kconfig                         |   5 +
 hw/arm/virt-acpi-build.c               |   7 +-
 hw/arm/virt.c                          |  23 +-
 include/hw/acpi/acpi_dev_interface.h   |   1 +
 include/hw/acpi/generic_event_device.h |   3 +
 include/hw/acpi/ghes.h                 |  16 +-
 include/hw/arm/virt.h                  |   4 +
 qapi/ghes-cper.json                    |  55 ++++
 qapi/meson.build                       |   1 +
 qapi/qapi-schema.json                  |   1 +
 scripts/arm_processor_error.py         | 352 +++++++++++++++++++++++++
 scripts/ghes_inject.py                 |  59 +++++
 scripts/qmp_helper.py                  | 249 +++++++++++++++++
 20 files changed, 1026 insertions(+), 25 deletions(-)
 create mode 100644 hw/acpi/ghes_cper.c
 create mode 100644 hw/acpi/ghes_cper_stub.c
 create mode 100644 qapi/ghes-cper.json
 create mode 100644 scripts/arm_processor_error.py
 create mode 100755 scripts/ghes_inject.py
 create mode 100644 scripts/qmp_helper.py

-- 
2.45.2



