qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor
@ 2024-07-29 13:21 Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 1/6] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 13:21 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Philippe Mathieu-Daudé, Ani Sinha, Dongjiu Geng,
	Paolo Bonzini, Peter Maydell, Shannon Zhao, Yanan Wang, Zhao Liu,
	qemu-arm, qemu-devel

Testing OS kernel ACPI APEI CPER support is tricky, as one depends on
having hardware with special-purpose BIOS and/or hardware.

With QEMU, it becomes a lot easier, as it can be done via QMP.

This series add support for ARM Processor CPER error injection,
according with ACPI 6.x and UEFI 2.9A/2.10 specs.

This series consists of:

- one patch using a define for ARM virt GPIO power pin
  (requested during last review);
- three patches from Jonathan (one coauthored with Shiju) with basic
  EINJ features, already submitted as RFC (but not merged yet) at:
    https://lore.kernel.org/qemu-devel/20240628090605.529-1-shiju.jose@huawei.com/
- three patches from me extending it to optionally allow to
  generate all sorts of possible valid combinations for
  ARM Processor CPER record.

I've been using it to test a Linux Kernel patch series fixing
UEFI 2.9A errata and ARM processor trace event:
   https://lore.kernel.org/linux-edac/3853853f820a666253ca8ed6c7c724dc3d50044a.1720679234.git.mchehab+huawei@kernel.org/T/#t

I also wrote some Wiki pages for rasdaemon (a Linux daemon
widely used to monitor and react to RAS events):
   https://github.com/mchehab/rasdaemon/wiki/error-injection

Being really helpful to test the Linux Kernel behavior when
firmware-first RAS events for ARM processor arrives there,
helping to validate how CPER and GHES driver handles them
(and further testing userspace apps like rasdaemon):

Sending this command to QMP:
    { "execute": "qmp_capabilities" } 
    { "execute": "arm-inject-error", "arguments": {"error": [{"type": ["cache-error"]}]} }

Produces a simple CPER register, properly handled by the Linux
Kernel:

[  839.952678] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  839.953145] {4}[Hardware Error]: event severity: recoverable
[  839.953451] {4}[Hardware Error]:  Error 0, type: recoverable
[  839.953763] {4}[Hardware Error]:   section_type: ARM processor error
[  839.954094] {4}[Hardware Error]:   MIDR: 0x0000000000000000
[  839.954383] {4}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[  839.954802] {4}[Hardware Error]:   running state: 0x0
[  839.955066] {4}[Hardware Error]:   Power State Coordination Interface state: 0
[  839.955424] {4}[Hardware Error]:   Error info structure 0:
[  839.955712] {4}[Hardware Error]:   num errors: 1
[  839.955983] {4}[Hardware Error]:    first error captured
[  839.956260] {4}[Hardware Error]:    propagated error captured
[  839.956561] {4}[Hardware Error]:    error_type: 0x02: cache error
[  839.956882] {4}[Hardware Error]:    error_info: 0x000000000054007f
[  839.957192] {4}[Hardware Error]:     transaction type: Instruction
[  839.957495] {4}[Hardware Error]:     cache error, operation type: Instruction fetch
[  839.957888] {4}[Hardware Error]:     cache level: 1
[  839.958166] {4}[Hardware Error]:     processor context not corrupted
[  839.958459] {4}[Hardware Error]:     the error has not been corrected
[  839.958771] {4}[Hardware Error]:     PC is imprecise
[  839.959074] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error

rasdaemon output (rasdaemon still needs to be patched for
UEFI 2.9A errata):

           <...>-211   [002] d..1.     0.000129 arm_event 2024-07-11 09:50:45 +0000 affinity: -1 MPIDR: 0x80000000 MIDR: 0x0 running_state: 0 psci_state: 0 ARM Processor Err Info data len: 32
<CANT FIND FIELD buf>cpu: 0; error: 2; affinity level: 255; MPIDR: 0000000080000000; MIDR: 0000000000000000; running state: 0; PSCI state: 0; ARM Processor Err Info data len: 32; ARM Processor Err Info raw data: 00 20 06 00 02 00 00 05 7f 00 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00; ARM Processor Err Context Info data len: 0; ARM Processor Err Context Info raw data: ; Vendor Specific Err Info data len: 0; Vendor Specific Err Info raw data: 

More complex events with multiple Processor Error Information structures
can be produced like:

    { "execute": "arm-inject-error", "arguments":  {
        "validation": ["mpidr-valid", "affinity-valid", "running-state-valid", "vendor-specific-valid"],
        "running-state": [], "psci-state": 1229279264, 
        "error": [{
            "validation": ["multiple-error-valid", "flags-valid"], 
	"type": ["tlb-error", "bus-error", "micro-arch-error"], 
                               "multiple-error": 3, "phy-addr": 57005, "virt-addr": 48879},
                 {"type": ["micro-arch-error"]}, 
                 {"type": ["tlb-error"]}, 
                 {"type": ["bus-error"]},
                 {"type": ["cache-error"]}],
                 "context": [{"register": [57005, 48879, 43962, 47787]}],
                 "vendor-specific": [12, 23, 53, 52, 3, 123, 243, 255]} }

[  925.340284] {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  925.340662] {5}[Hardware Error]: event severity: recoverable
[  925.340924] {5}[Hardware Error]:  Error 0, type: recoverable
[  925.341280] {5}[Hardware Error]:   section_type: ARM processor error
[  925.341631] {5}[Hardware Error]:   MIDR: 0x0000000000000000
[  925.341893] {5}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[  925.342278] {5}[Hardware Error]:   error affinity level: 0
[  925.342571] {5}[Hardware Error]:   running state: 0x0
[  925.342835] {5}[Hardware Error]:   Power State Coordination Interface state: 1229279264
[  925.343157] {5}[Hardware Error]:   Error info structure 0:
[  925.343388] {5}[Hardware Error]:   num errors: 4
[  925.343602] {5}[Hardware Error]:    error_type: 0x1c: TLB error|bus error|micro-architectural error
[  925.343960] {5}[Hardware Error]:    virtual fault address: 0x000000000000beef
[  925.344241] {5}[Hardware Error]:    physical fault address: 0x000000000000dead
[  925.344526] {5}[Hardware Error]:   Error info structure 1:
[  925.344757] {5}[Hardware Error]:   num errors: 1
[  925.344965] {5}[Hardware Error]:    first error captured
[  925.345183] {5}[Hardware Error]:    propagated error captured
[  925.345416] {5}[Hardware Error]:    error_type: 0x10: micro-architectural error
[  925.345714] {5}[Hardware Error]:   Error info structure 2:
[  925.345946] {5}[Hardware Error]:   num errors: 1
[  925.346148] {5}[Hardware Error]:    first error captured
[  925.346413] {5}[Hardware Error]:    propagated error captured
[  925.346719] {5}[Hardware Error]:    error_type: 0x04: TLB error
[  925.346988] {5}[Hardware Error]:    error_info: 0x00000080d6460fff
[  925.347248] {5}[Hardware Error]:     transaction type: Generic
[  925.347492] {5}[Hardware Error]:     TLB error, operation type: Generic read (type of instruction or data request cannot be determined)
[  925.347945] {5}[Hardware Error]:     TLB level: 1
[  925.348153] {5}[Hardware Error]:     processor context corrupted
[  925.348392] {5}[Hardware Error]:     the error has been corrected
[  925.348635] {5}[Hardware Error]:     PC is imprecise
[  925.348848] {5}[Hardware Error]:     Program execution can be restarted reliably at the PC associated with the error.
[  925.349232] {5}[Hardware Error]:   Error info structure 3:
[  925.349459] {5}[Hardware Error]:   num errors: 1
[  925.349662] {5}[Hardware Error]:    first error captured
[  925.349884] {5}[Hardware Error]:    propagated error captured
[  925.350115] {5}[Hardware Error]:    error_type: 0x08: bus error
[  925.350371] {5}[Hardware Error]:    error_info: 0x0000000078da03ff
[  925.350629] {5}[Hardware Error]:     transaction type: Generic
[  925.350878] {5}[Hardware Error]:     bus error, operation type: Prefetch
[  925.351144] {5}[Hardware Error]:     affinity level at which the bus error occurred: 3
[  925.351451] {5}[Hardware Error]:     processor context not corrupted
[  925.351702] {5}[Hardware Error]:     the error has not been corrected
[  925.351960] {5}[Hardware Error]:     PC is precise
[  925.352164] {5}[Hardware Error]:     Program execution can be restarted reliably at the PC associated with the error.
[  925.352546] {5}[Hardware Error]:     participation type: Generic
[  925.352801] {5}[Hardware Error]:     address space: External Memory Access
[  925.353071] {5}[Hardware Error]:   Error info structure 4:
[  925.353299] {5}[Hardware Error]:   num errors: 1
[  925.353502] {5}[Hardware Error]:    first error captured
[  925.353720] {5}[Hardware Error]:    propagated error captured
[  925.353963] {5}[Hardware Error]:    error_type: 0x02: cache error
[  925.354222] {5}[Hardware Error]:    error_info: 0x000000000054007f
[  925.354478] {5}[Hardware Error]:     transaction type: Instruction
[  925.354782] {5}[Hardware Error]:     cache error, operation type: Instruction fetch
[  925.355203] {5}[Hardware Error]:     cache level: 1
[  925.355495] {5}[Hardware Error]:     processor context not corrupted
[  925.355848] {5}[Hardware Error]:     the error has not been corrected
[  925.356206] {5}[Hardware Error]:     PC is imprecise
[  925.356493] {5}[Hardware Error]:   Context info structure 0:
[  925.356809] {5}[Hardware Error]:    register context type: AArch64 EL1 context registers
[  925.357282] {5}[Hardware Error]:    00000000: 0000dead 00000000 0000beef 00000000
[  925.357800] {5}[Hardware Error]:    00000010: 0000abba 00000000 0000baab 00000000
[  925.358267] {5}[Hardware Error]:    00000020: 00000000 00000000
[  925.358523] {5}[Hardware Error]:   Vendor specific error info has 8 bytes:
[  925.358822] {5}[Hardware Error]:    00000000: 3435170c fff37b03                    ..54.{..
[  925.359192] [Firmware Warn]: GHES: Unhandled processor error type 0x1c: TLB error|bus error|micro-architectural error
[  925.359590] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[  925.359935] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[  925.360235] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error
[  925.360534] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error

---

v4:
- patches 4 and 7 folded;
- Several improvements at QAPI documentation: now, both man-page and html
  outputs look nicer and have tables to better define some fields;
- QAPI updates are now changed to QEMU version 9.2 and upper;
- Minor coding style improvements;
- added a MAINTAINERS entry for arm error inject;
- Generic Error Device notify callback renamed to generic_error_device_notify();
- GED patch description fixed;
- running_state/psci logic fixed.

v3:
- patch 1 cleanups with some comment changes and adding another place where
  the poweroff GPIO define should be used. No changes on other patches (except
  due to conflict resolution).

v2:
- added a new patch using a define for GPIO power pin;
- patch 2 changed to also use a define for generic error GPIO pin;
- a couple cleanups at patch 2 removing uneeded else clauses.

Jonathan Cameron (2):
  arm/virt: Wire up GPIO error source for ACPI / GHES
  acpi/ghes: Support GPIO error source.

Mauro Carvalho Chehab (4):
  arm/virt: place power button pin number on a define
  target/arm: preserve mpidr value
  acpi/ghes: update comments to point to newer ACPI specs
  acpi/ghes: Add a logic to inject ARM processor CPER

 MAINTAINERS                         |   7 +
 configs/targets/aarch64-softmmu.mak |   1 +
 hw/acpi/ghes.c                      | 339 +++++++++++++++++++---
 hw/arm/Kconfig                      |   4 +
 hw/arm/arm_error_inject.c           | 420 ++++++++++++++++++++++++++++
 hw/arm/arm_error_inject_stubs.c     |  34 +++
 hw/arm/meson.build                  |   3 +
 hw/arm/virt-acpi-build.c            |  34 ++-
 hw/arm/virt.c                       |  21 +-
 include/hw/acpi/ghes.h              |  41 +++
 include/hw/arm/virt.h               |   4 +
 include/hw/boards.h                 |   1 +
 qapi/arm-error-inject.json          | 284 +++++++++++++++++++
 qapi/meson.build                    |   1 +
 qapi/qapi-schema.json               |   1 +
 target/arm/cpu.h                    |   1 +
 target/arm/helper.c                 |  10 +-
 17 files changed, 1160 insertions(+), 46 deletions(-)
 create mode 100644 hw/arm/arm_error_inject.c
 create mode 100644 hw/arm/arm_error_inject_stubs.c
 create mode 100644 qapi/arm-error-inject.json

-- 
2.45.2




^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/6] arm/virt: place power button pin number on a define
  2024-07-29 13:21 [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
@ 2024-07-29 13:21 ` Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 13:21 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Igor Mammedov, Peter Maydell,
	Shannon Zhao, linux-kernel, qemu-arm, qemu-devel

Having magic numbers inside the code is not a good idea, as it
is error-prone. So, instead, create a macro with the number
definition.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/arm/virt-acpi-build.c | 6 +++---
 hw/arm/virt.c            | 7 ++++---
 include/hw/arm/virt.h    | 3 +++
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e10cad86dd73..f76fb117adff 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -154,10 +154,10 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
     aml_append(dev, aml_name_decl("_CRS", crs));
 
     Aml *aei = aml_resource_template();
-    /* Pin 3 for power button */
-    const uint32_t pin_list[1] = {3};
+
+    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
     aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
-                                 AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
+                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
                                  "GPO0", NULL, 0));
     aml_append(dev, aml_name_decl("_AEI", aei));
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 719e83e6a1e7..687fe0bb8bc9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
     if (s->acpi_dev) {
         acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
     } else {
-        /* use gpio Pin 3 for power button event */
+        /* use gpio Pin for power button event */
         qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
     }
 }
@@ -1013,7 +1013,8 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
                              uint32_t phandle)
 {
     gpio_key_dev = sysbus_create_simple("gpio-key", -1,
-                                        qdev_get_gpio_in(pl061_dev, 3));
+                                        qdev_get_gpio_in(pl061_dev,
+                                                         GPIO_PIN_POWER_BUTTON));
 
     qemu_fdt_add_subnode(fdt, "/gpio-keys");
     qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
@@ -1024,7 +1025,7 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
     qemu_fdt_setprop_cell(fdt, "/gpio-keys/poweroff", "linux,code",
                           KEY_POWER);
     qemu_fdt_setprop_cells(fdt, "/gpio-keys/poweroff",
-                           "gpios", phandle, 3, 0);
+                           "gpios", phandle, GPIO_PIN_POWER_BUTTON, 0);
 }
 
 #define SECURE_GPIO_POWEROFF 0
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ab961bb6a9b8..a4d937ed45ac 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -47,6 +47,9 @@
 /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
 #define PVTIME_SIZE_PER_CPU 64
 
+/* GPIO pins */
+#define GPIO_PIN_POWER_BUTTON  3
+
 enum {
     VIRT_FLASH,
     VIRT_MEM,
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-29 13:21 [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 1/6] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
@ 2024-07-29 13:21 ` Mauro Carvalho Chehab
  2024-07-29 16:08   ` Jonathan Cameron via
  2024-07-30  8:11   ` Zhao Liu
  2024-07-29 13:21 ` [PATCH v4 3/6] target/arm: preserve mpidr value Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 13:21 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Eduardo Habkost,
	Igor Mammedov, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, Zhao Liu, linux-kernel, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Creates a Generic Event Device (GED) as specified at
ACPI 6.5 specification at 18.3.2.7.2:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
with HID PNP0C33.

The PNP0C33 device is used to report hardware errors to
the bios via ACPI APEI Generic Hardware Error Source (GHES).

It is aligned with Linux Kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/

[mchehab: use a define for the generic event pin number and do some cleanups]
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
 hw/arm/virt.c            | 14 ++++++++++++--
 include/hw/arm/virt.h    |  1 +
 include/hw/boards.h      |  1 +
 4 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f76fb117adff..c502ccf40909 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -63,6 +63,7 @@
 
 #define ARM_SPI_BASE 32
 
+#define ACPI_GENERIC_EVENT_DEVICE "GEDD"
 #define ACPI_BUILD_TABLE_SIZE             0x20000
 
 static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
@@ -142,6 +143,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
 static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
                                            uint32_t gpio_irq)
 {
+    uint32_t pin;
+
     Aml *dev = aml_device("GPO0");
     aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0061")));
     aml_append(dev, aml_name_decl("_UID", aml_int(0)));
@@ -155,7 +158,12 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
 
     Aml *aei = aml_resource_template();
 
-    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
+    pin = GPIO_PIN_POWER_BUTTON;
+    aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
+                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
+                                 "GPO0", NULL, 0));
+    /* Pin for generic error */
+    pin = GPIO_PIN_GENERIC_ERROR;
     aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
                                  AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
                                  "GPO0", NULL, 0));
@@ -166,6 +174,11 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
     aml_append(method, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
                                   aml_int(0x80)));
     aml_append(dev, method);
+    method = aml_method("_E06", 0, AML_NOTSERIALIZED);
+    aml_append(method, aml_notify(aml_name(ACPI_GENERIC_EVENT_DEVICE),
+                                  aml_int(0x80)));
+    aml_append(dev, method);
+
     aml_append(scope, dev);
 }
 
@@ -800,6 +813,15 @@ static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
     build_fadt(table_data, linker, &fadt, vms->oem_id, vms->oem_table_id);
 }
 
+static void acpi_dsdt_add_generic_event_device(Aml *scope)
+{
+    Aml *dev = aml_device(ACPI_GENERIC_EVENT_DEVICE);
+    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
+    aml_append(scope, dev);
+}
+
 /* DSDT */
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
                       HOTPLUG_HANDLER(vms->acpi_dev),
                       irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
                       memmap[VIRT_ACPI_GED].base);
-    } else {
-        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
-                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
     }
+    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
+                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
 
     if (vms->acpi_dev) {
         uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
@@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     }
 
     acpi_dsdt_add_power_button(scope);
+    acpi_dsdt_add_generic_event_device(scope);
 #ifdef CONFIG_TPM
     acpi_dsdt_add_tpm(scope, vms);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 687fe0bb8bc9..5a11691f29f6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -997,6 +997,13 @@ static void create_rtc(const VirtMachineState *vms)
 }
 
 static DeviceState *gpio_key_dev;
+
+static DeviceState *gpio_error_dev;
+static void virt_set_error(void)
+{
+    qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
+}
+
 static void virt_powerdown_req(Notifier *n, void *opaque)
 {
     VirtMachineState *s = container_of(n, VirtMachineState, powerdown_notifier);
@@ -1015,6 +1022,9 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
     gpio_key_dev = sysbus_create_simple("gpio-key", -1,
                                         qdev_get_gpio_in(pl061_dev,
                                                          GPIO_PIN_POWER_BUTTON));
+    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
+                                          qdev_get_gpio_in(pl061_dev,
+                                                           GPIO_PIN_GENERIC_ERROR));
 
     qemu_fdt_add_subnode(fdt, "/gpio-keys");
     qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
@@ -2385,9 +2395,8 @@ static void machvirt_init(MachineState *machine)
 
     if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
         vms->acpi_dev = create_acpi_ged(vms);
-    } else {
-        create_gpio_devices(vms, VIRT_GPIO, sysmem);
     }
+    create_gpio_devices(vms, VIRT_GPIO, sysmem);
 
     if (vms->secure && !vmc->no_secure_gpio) {
         create_gpio_devices(vms, VIRT_SECURE_GPIO, secure_sysmem);
@@ -3101,6 +3110,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->default_ram_id = "mach-virt.ram";
     mc->default_nic = "virtio-net-pci";
 
+    mc->generic_error_device_notify = virt_set_error;
     object_class_property_add(oc, "acpi", "OnOffAuto",
         virt_get_acpi, virt_set_acpi,
         NULL, NULL);
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a4d937ed45ac..c9769d7d4d7f 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -49,6 +49,7 @@
 
 /* GPIO pins */
 #define GPIO_PIN_POWER_BUTTON  3
+#define GPIO_PIN_GENERIC_ERROR 6
 
 enum {
     VIRT_FLASH,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 48ff6d8b93f7..991f99138e57 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -308,6 +308,7 @@ struct MachineClass {
     int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
     ram_addr_t (*fixup_ram_size)(ram_addr_t size);
     uint64_t smbios_memory_device_size;
+    void (*generic_error_device_notify)(void);
 };
 
 /**
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 3/6] target/arm: preserve mpidr value
  2024-07-29 13:21 [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 1/6] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
@ 2024-07-29 13:21 ` Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 4/6] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 13:21 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Peter Maydell, linux-kernel, qemu-arm, qemu-devel

There is a logic at helper to properly fill the mpidr information.
This is needed for ARM Processor error injection, so store the
value inside a cpu opaque value, to allow it to be used.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 target/arm/cpu.h    |  1 +
 target/arm/helper.c | 10 ++++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a12859fc5335..d2e86f0877cc 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1033,6 +1033,7 @@ struct ArchCPU {
         uint64_t reset_pmcr_el0;
     } isar;
     uint64_t midr;
+    uint64_t mpidr;
     uint32_t revidr;
     uint32_t reset_fpsid;
     uint64_t ctr;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index ce319572354a..2432b5b09607 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4692,7 +4692,7 @@ static uint64_t mpidr_read_val(CPUARMState *env)
     return mpidr;
 }
 
-static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+static uint64_t mpidr_read(CPUARMState *env)
 {
     unsigned int cur_el = arm_current_el(env);
 
@@ -4702,6 +4702,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
     return mpidr_read_val(env);
 }
 
+static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    return mpidr_read(env);
+}
+
 static const ARMCPRegInfo lpae_cp_reginfo[] = {
     /* NOP AMAIR0/1 */
     { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
@@ -9723,7 +9728,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
               .fgt = FGT_MPIDR_EL1,
-              .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
+              .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW },
         };
 #ifdef CONFIG_USER_ONLY
         static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = {
@@ -9733,6 +9738,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
         modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo);
 #endif
         define_arm_cp_regs(cpu, mpidr_cp_reginfo);
+        cpu->mpidr = mpidr_read(env);
     }
 
     if (arm_feature(env, ARM_FEATURE_AUXCR)) {
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 4/6] acpi/ghes: update comments to point to newer ACPI specs
  2024-07-29 13:21 [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2024-07-29 13:21 ` [PATCH v4 3/6] target/arm: preserve mpidr value Mauro Carvalho Chehab
@ 2024-07-29 13:21 ` Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 5/6] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER Mauro Carvalho Chehab
  5 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 13:21 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Igor Mammedov,
	linux-kernel, qemu-arm, qemu-devel

There is one reference to ACPI 4.0 and several references
to ACPI 6.x versions.

Update them to point to ACPI 6.5 whenever possible.

There's one reference that was kept pointing to ACPI 6.4,
though, with HEST revision 1.

ACPI 6.5 now defines HEST revision 2, and defined a new
way to handle source types starting from 12. According
with ACPI 6.5 revision history:

	2312 Update to the HEST table and adding new error
	     source descriptor - Table 18.2.

Yet, the spec doesn't define yet any new source
descriptors. It just defines a different behavior when
source type is above 11.

I also double-checked GHES implementation on an open
source project (Linux Kernel). Currently upstream
doesn't currently handle HEST revision, ignoring such
field.

In any case, revision 2 seems to be backward-compatible
with revison 1 when type <= 11 and just one error is
contained on a HEST record.

So, while it is probably safe to update it, there's no
real need. So, let's keep the implementation using
an ACPI 6.4 compatible table, e. g. HEST revision 1.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c | 48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..8816af8280be 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -44,9 +44,9 @@
 #define GAS_ADDR_OFFSET 4
 
 /*
- * The total size of Generic Error Data Entry
- * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
- * Table 18-343 Generic Error Data Entry
+ * The total size of Generic Error Data Entry before data field
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.12 Generic Error Data Entry
  */
 #define ACPI_GHES_DATA_LENGTH               72
 
@@ -58,8 +58,8 @@
 
 /*
  * Total size for Generic Error Status Block except Generic Error Data Entries
- * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
- * Table 18-380 Generic Error Status Block
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.11 Generic Error Status Block
  */
 #define ACPI_GHES_GESB_SIZE                 20
 
@@ -75,7 +75,8 @@ enum AcpiGenericErrorSeverity {
 
 /*
  * Hardware Error Notification
- * ACPI 4.0: 17.3.2.7 Hardware Error Notification
+ * ACPI 6.5: 18.3.2.9 Hardware Error Notification,
+ * Table 18.14 - Hardware Error Notification Structure
  * Composes dummy Hardware Error Notification descriptor of specified type
  */
 static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
@@ -105,7 +106,8 @@ static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
 
 /*
  * Generic Error Data Entry
- * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.12 - Generic Error Data Entry
  */
 static void acpi_ghes_generic_error_data(GArray *table,
                 const uint8_t *section_type, uint32_t error_severity,
@@ -141,7 +143,8 @@ static void acpi_ghes_generic_error_data(GArray *table,
 
 /*
  * Generic Error Status Block
- * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.11 - Generic Hardware Error Source Structure
  */
 static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
                 uint32_t raw_data_offset, uint32_t raw_data_length,
@@ -286,15 +289,18 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
         0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
 }
 
-/* Build Generic Hardware Error Source version 2 (GHESv2) */
+/*
+ * Build Generic Hardware Error Source version 2 (GHESv2)
+ * ACPI 6.5: 18.3.2.8 Generic Hardware Error Source version 2 (GHESv2 - Type 10),
+ * Table 18.13: Generic Hardware Error Source version 2 (GHESv2)
+ */
 static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
 {
     uint64_t address_offset;
-    /*
-     * Type:
-     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
-     */
+    /* Type: (GHESv2 - Type 10) */
     build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
+
+    /* ACPI 6.5: Table 18.10 - Generic Hardware Error Source Structure */
     /* Source Id */
     build_append_int_noprefix(table_data, source_id, 2);
     /* Related Source Id */
@@ -335,11 +341,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
     /* Error Status Block Length */
     build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
 
-    /*
-     * Read Ack Register
-     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
-     * version 2 (GHESv2 - Type 10)
-     */
+    /* ACPI 6.5: fields defined at GHESv2 table */
+    /* Read Ack Register */
     address_offset = table_data->len;
     build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
                      4 /* QWord access */, 0);
@@ -358,11 +361,16 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
     build_append_int_noprefix(table_data, 0x1, 8);
 }
 
-/* Build Hardware Error Source Table */
+/*
+ * Build Hardware Error Source Table
+ * ACPI 6.4: 18.3.2 ACPI Error Source
+ * Table 18.2: Hardware Error Source Table (HEST)
+ */
 void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
                      const char *oem_id, const char *oem_table_id)
 {
-    AcpiTable table = { .sig = "HEST", .rev = 1,
+    AcpiTable table = { .sig = "HEST",
+                        .rev = 1,                   /* ACPI 4.0 to 6.4 */
                         .oem_id = oem_id, .oem_table_id = oem_table_id };
 
     acpi_table_begin(&table, table_data);
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 5/6] acpi/ghes: Support GPIO error source.
  2024-07-29 13:21 [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2024-07-29 13:21 ` [PATCH v4 4/6] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
@ 2024-07-29 13:21 ` Mauro Carvalho Chehab
  2024-07-29 13:21 ` [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER Mauro Carvalho Chehab
  5 siblings, 0 replies; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 13:21 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Igor Mammedov, linux-kernel, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Add error notification to GHES v2 using the GPIO source.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c         | 8 ++++++--
 include/hw/acpi/ghes.h | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 8816af8280be..9346f45c59a5 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -34,8 +34,8 @@
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
 
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT        1
+/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
+#define ACPI_GHES_ERROR_SOURCE_COUNT        2
 
 /* Generic Hardware Error Source version 2 */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
@@ -333,6 +333,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
          */
         build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
         break;
+    case ACPI_HEST_SRC_ID_GPIO:
+        build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
+        break;
     default:
         error_report("Not support this error source");
         abort();
@@ -378,6 +381,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
     /* Error Source Count */
     build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
     build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+    build_ghes_v2(table_data, ACPI_HEST_SRC_ID_GPIO, linker);
 
     acpi_table_end(linker, &table);
 }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..4f1ab1a73a06 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
 
 enum {
     ACPI_HEST_SRC_ID_SEA = 0,
+    ACPI_HEST_SRC_ID_GPIO = 1,
     /* future ids go here */
     ACPI_HEST_SRC_ID_RESERVED,
 };
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER
  2024-07-29 13:21 [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2024-07-29 13:21 ` [PATCH v4 5/6] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
@ 2024-07-29 13:21 ` Mauro Carvalho Chehab
  2024-07-29 16:31   ` Jonathan Cameron via
  5 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 13:21 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Eric Blake,
	Igor Mammedov, Markus Armbruster, Michael Roth, Paolo Bonzini,
	Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Add an ACPI APEI GHES error injection logic for ARM
processor CPER, allowing to set fields at from
UEFI spec 2.10 tables N.16 and N.17 to any valid
value.

As some GHES functions require handling addresses, add
a helper function to support it.

Before starting erorr inject, the QAPI requires to negociate
QMP with:

{ "execute": "qmp_capabilities" }

Afterwards, errors can be injected with:

	{ "execute": "arm-inject-error" }

The error injection events supports several optional arguments,
having Processor Error Information (PEI) mapped into an array.

So, it is possible to inject multiple errors at the same CPER record,
as defined at UEFI spec, with:

	{ "execute": "arm-inject-error", "arguments": {
	   "error": [ {"type": [ "cache-error" ]},
		      {"type": [ "tlb-error" ]} ] } }

The above generates a single CPER record with two PEI info, one
reporting a cache error, and the other one a TLB error, using
default values for other fields.

As all fields from ARM Processor CPER are mapped, so, the error
could contain physical/virtual addresses, register dumps,
vendor-specific data, etc.

This patch is co-authored:
- ghes logic to inject a simple ARM record by Shiju Jose;
- generic logic to handle block addresses by Jonathan Cameron;
- logic to allow changing all fields by Mauro Carvalho Chehab;

Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 MAINTAINERS                         |   7 +
 configs/targets/aarch64-softmmu.mak |   1 +
 hw/acpi/ghes.c                      | 283 ++++++++++++++++++-
 hw/arm/Kconfig                      |   4 +
 hw/arm/arm_error_inject.c           | 420 ++++++++++++++++++++++++++++
 hw/arm/arm_error_inject_stubs.c     |  34 +++
 hw/arm/meson.build                  |   3 +
 include/hw/acpi/ghes.h              |  40 +++
 qapi/arm-error-inject.json          | 284 +++++++++++++++++++
 qapi/meson.build                    |   1 +
 qapi/qapi-schema.json               |   1 +
 11 files changed, 1067 insertions(+), 11 deletions(-)
 create mode 100644 hw/arm/arm_error_inject.c
 create mode 100644 hw/arm/arm_error_inject_stubs.c
 create mode 100644 qapi/arm-error-inject.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 98eddf7ae155..713a104ef901 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
 F: include/hw/acpi/ghes.h
 F: docs/specs/acpi_hest_ghes.rst
 
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+S: Maintained
+F: hw/arm/arm_error_inject.c
+F: hw/arm/arm_error_inject_stubs.c
+F: qapi/arm-error-inject.json
+
 ppc4xx
 L: qemu-ppc@nongnu.org
 S: Orphan
diff --git a/configs/targets/aarch64-softmmu.mak b/configs/targets/aarch64-softmmu.mak
index 84cb32dc2f4f..b4b3cd97934a 100644
--- a/configs/targets/aarch64-softmmu.mak
+++ b/configs/targets/aarch64-softmmu.mak
@@ -5,3 +5,4 @@ TARGET_KVM_HAVE_GUEST_DEBUG=y
 TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml gdb-xml/aarch64-pauth.xml
 # needed by boot.c
 TARGET_NEED_FDT=y
+CONFIG_ARM_EINJ=y
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 9346f45c59a5..e435c9aa0961 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -27,6 +27,7 @@
 #include "hw/acpi/generic_event_device.h"
 #include "hw/nvram/fw_cfg.h"
 #include "qemu/uuid.h"
+#include "qapi/qapi-types-arm-error-inject.h"
 
 #define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
 #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
@@ -53,6 +54,12 @@
 /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
 #define ACPI_GHES_MEM_CPER_LENGTH           80
 
+/*
+ * ARM Processor error section CPER sizes - UEFI 2.10: N.2.4.4
+ */
+#define ACPI_GHES_ARM_CPER_LENGTH           40
+#define ACPI_GHES_ARM_CPER_PEI_LENGTH       32
+
 /* Masks for block_status flags */
 #define ACPI_GEBS_UNCORRECTABLE         1
 
@@ -234,6 +241,152 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
     return 0;
 }
 
+/* UEFI 2.9: N.2.4.4 ARM Processor Error Section */
+static void acpi_ghes_build_append_arm_cper(ArmError err, uint32_t cper_length,
+                                            GArray *table)
+{
+    unsigned int i, j;
+
+    /*
+     * ARM Processor Error Record
+     */
+
+    /* Validation Bits */
+    build_append_int_noprefix(table, err.validation, 4);
+
+    /* Error Info Num */
+    build_append_int_noprefix(table, err.err_info_num, 2);
+
+    /* Context Info Num */
+    build_append_int_noprefix(table, err.context_info_num, 2);
+
+    /* Section length */
+    build_append_int_noprefix(table, cper_length, 4);
+
+    /* Error affinity level */
+    build_append_int_noprefix(table, err.affinity_level, 1);
+
+    /* Reserved */
+    build_append_int_noprefix(table, 0, 3);
+
+    /* MPIDR_EL1 */
+    build_append_int_noprefix(table, err.mpidr_el1, 8);
+
+    /* MIDR_EL1 */
+    build_append_int_noprefix(table, err.midr_el1, 8);
+
+    /* Running state */
+    build_append_int_noprefix(table, err.running_state, 4);
+
+    /* PSCI state: only valid when running state is zero  */
+    build_append_int_noprefix(table, err.psci_state, 4);
+
+    for (i = 0; i < err.err_info_num; i++) {
+        /* ARM Propcessor error information */
+        /* Version */
+        build_append_int_noprefix(table, 0, 1);
+
+        /*  Length */
+        build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1);
+
+        /* Validation Bits */
+        build_append_int_noprefix(table, err.pei[i].validation, 2);
+
+        /* Type */
+        build_append_int_noprefix(table, err.pei[i].type, 1);
+
+        /* Multiple error count */
+        build_append_int_noprefix(table, err.pei[i].multiple_error, 2);
+
+        /* Flags  */
+        build_append_int_noprefix(table, err.pei[i].flags, 1);
+
+        /* Error information  */
+        build_append_int_noprefix(table, err.pei[i].error_info, 8);
+
+        /* Virtual fault address  */
+        build_append_int_noprefix(table, err.pei[i].virt_addr, 8);
+
+        /* Physical fault address  */
+        build_append_int_noprefix(table, err.pei[i].phy_addr, 8);
+    }
+
+    for (i = 0; i < err.context_info_num; i++) {
+        /* ARM Propcessor error context information */
+        /* Version */
+        build_append_int_noprefix(table, 0, 2);
+
+        /* Validation type */
+        build_append_int_noprefix(table, err.context[i].type, 2);
+
+        /* Register array size */
+        build_append_int_noprefix(table, err.context[i].size * 8, 4);
+
+        /* Register array (byte 8 of Context info) */
+        for (j = 0; j < err.context[i].size; j++) {
+            build_append_int_noprefix(table, err.context[i].array[j], 8);
+        }
+    }
+
+    for (i = 0; i < err.vendor_num; i++) {
+        build_append_int_noprefix(table, err.vendor[i], 1);
+    }
+}
+
+static int acpi_ghes_record_arm_error(ArmError error,
+                                      uint64_t error_block_address)
+{
+    GArray *block;
+
+    /* ARM processor Error Section Type */
+    const uint8_t uefi_cper_arm_sec[] =
+          UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05, \
+                  0x1D, 0x5D, 0x46, 0xB0);
+
+    /*
+     * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
+     * Table 17-13 Generic Error Data Entry
+     */
+    QemuUUID fru_id = {};
+    uint32_t cper_length, data_length;
+
+    block = g_array_new(false, true /* clear */, 1);
+
+    /* This is the length if adding a new generic error data entry */
+    cper_length = ACPI_GHES_ARM_CPER_LENGTH;
+    cper_length += ACPI_GHES_ARM_CPER_PEI_LENGTH * error.err_info_num;
+    cper_length += error.context_length;
+    cper_length += error.vendor_num;
+
+    data_length = ACPI_GHES_DATA_LENGTH + cper_length;
+
+    /*
+     * It should not run out of the preallocated memory if adding a new generic
+     * error data entry
+     */
+    assert((data_length + ACPI_GHES_GESB_SIZE) <=
+            ACPI_GHES_MAX_RAW_DATA_LENGTH);
+
+    /* Build the new generic error status block header */
+    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
+        0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
+
+    /* Build this new generic error data entry header */
+    acpi_ghes_generic_error_data(block, uefi_cper_arm_sec,
+                                 ACPI_CPER_SEV_RECOVERABLE, 0, 0,
+                                 cper_length, fru_id, 0);
+
+    /* Build the ARM processor error section CPER */
+    acpi_ghes_build_append_arm_cper(error, cper_length, block);
+
+    /* Write the generic error data entry into guest memory */
+    cpu_physical_memory_write(error_block_address, block->data, block->len);
+
+    g_array_free(block, true);
+
+    return 0;
+}
+
 /*
  * Build table for the hardware error fw_cfg blob.
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
@@ -400,23 +553,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     ags->present = true;
 }
 
+static uint64_t ghes_get_state_start_address(void)
+{
+    AcpiGedState *acpi_ged_state =
+        ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
+    AcpiGhesState *ags = &acpi_ged_state->ghes_state;
+
+    return le64_to_cpu(ags->ghes_addr_le);
+}
+
 int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 {
     uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
-    uint64_t start_addr;
+    uint64_t start_addr = ghes_get_state_start_address();
     bool ret = -1;
-    AcpiGedState *acpi_ged_state;
-    AcpiGhesState *ags;
-
     assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
 
-    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
-                                                       NULL));
-    g_assert(acpi_ged_state);
-    ags = &acpi_ged_state->ghes_state;
-
-    start_addr = le64_to_cpu(ags->ghes_addr_le);
-
     if (physical_address) {
 
         if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
@@ -456,6 +608,115 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
     return ret;
 }
 
+/*
+ * Error register block data layout
+ *
+ * | +---------------------+ ges.ghes_addr_le
+ * | |error_block_address0 |
+ * | +---------------------+
+ * | |error_block_address1 |
+ * | +---------------------+ --+--
+ * | |    .............    | GHES_ADDRESS_SIZE
+ * | +---------------------+ --+--
+ * | |error_block_addressN |
+ * | +---------------------+
+ * | | read_ack0           |
+ * | +---------------------+ --+--
+ * | | read_ack1           | GHES_ADDRESS_SIZE
+ * | +---------------------+ --+--
+ * | |   .............     |
+ * | +---------------------+
+ * | | read_ackN           |
+ * | +---------------------+ --+--
+ * | |      CPER           |   |
+ * | |      ....           | GHES_MAX_RAW_DATA_LENGT
+ * | |      CPER           |   |
+ * | +---------------------+ --+--
+ * | |    ..........       |
+ * | +---------------------+
+ * | |      CPER           |
+ * | |      ....           |
+ * | |      CPER           |
+ * | +---------------------+
+ */
+
+/* Map from uint32_t notify to entry offset in GHES */
+static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
+                                                 0xff, 0xff, 0xff, 1, 0};
+
+static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
+                          uint64_t *read_ack_addr)
+{
+    uint64_t base;
+
+    if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
+        return false;
+    }
+
+    /* Find and check the source id for this new CPER */
+    if (error_source_to_index[notify] == 0xff) {
+        return false;
+    }
+
+    base = ghes_get_state_start_address();
+
+    *read_ack_addr = base +
+        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+        error_source_to_index[notify] * sizeof(uint64_t);
+
+    /* Could also be read back from the error_block_address register */
+    *error_block_addr = base +
+        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+        error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
+
+    return true;
+}
+
+/* Notify BIOS about an error via Generic Error Device - GED */
+static void generic_error_device_notify(void)
+{
+    MachineState *machine = MACHINE(qdev_get_machine());
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+
+    if (mc->generic_error_device_notify) {
+        mc->generic_error_device_notify();
+    }
+}
+
+bool ghes_record_arm_errors(ArmError error, uint32_t notify)
+{
+    int rc, read_ack = 0;
+    uint64_t read_ack_addr = 0;
+    uint64_t error_block_addr = 0;
+
+    if (!ghes_get_addr(notify, &error_block_addr, &read_ack_addr)) {
+        return false;
+    }
+
+    cpu_physical_memory_read(read_ack_addr,
+                             &read_ack, sizeof(uint64_t));
+    /* zero means OSPM does not acknowledge the error */
+    if (!read_ack) {
+        error_report("Last time OSPM does not acknowledge the error,"
+                     " record CPER failed this time, set the ack value to"
+                     " avoid blocking next time CPER record! exit");
+        read_ack = 1;
+        cpu_physical_memory_write(read_ack_addr,
+                                  &read_ack, sizeof(uint64_t));
+        return false;
+    }
+
+    read_ack = cpu_to_le64(0);
+    cpu_physical_memory_write(read_ack_addr,
+                              &read_ack, sizeof(uint64_t));
+    rc = acpi_ghes_record_arm_error(error, error_block_addr);
+
+    generic_error_device_notify();
+
+    return rc;
+}
+
 bool acpi_ghes_present(void)
 {
     AcpiGedState *acpi_ged_state;
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 1ad60da7aa2d..bafac82f9fd3 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -712,3 +712,7 @@ config ARMSSE
     select UNIMP
     select SSE_COUNTER
     select SSE_TIMER
+
+config ARM_EINJ
+    bool
+    default y if AARCH64
diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
new file mode 100644
index 000000000000..5ebbdf2b2adc
--- /dev/null
+++ b/hw/arm/arm_error_inject.c
@@ -0,0 +1,420 @@
+/*
+ * ARM Processor error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/boards.h"
+#include "hw/acpi/ghes.h"
+#include "cpu.h"
+
+#define ACPI_GHES_ARM_CPER_CTX_DEFAULT_NREGS 74
+
+/* Handle ARM Processor Error Information (PEI) */
+static const ArmProcessorErrorInformationList *default_pei = { 0 };
+
+static ArmPEI *qmp_arm_pei(uint16_t *err_info_num,
+              bool has_error,
+              ArmProcessorErrorInformationList const *error_list)
+{
+    ArmProcessorErrorInformationList const *next;
+    ArmPeiValidationBitsList const *validation_list;
+    ArmPEI *pei = NULL;
+    uint16_t i;
+
+    if (!has_error) {
+        error_list = default_pei;
+    }
+
+    *err_info_num = 0;
+
+    for (next = error_list; next; next = next->next) {
+        (*err_info_num)++;
+
+        if (*err_info_num >= 255) {
+            break;
+        }
+    }
+
+    pei = g_new0(ArmPEI, (*err_info_num));
+
+    for (next = error_list, i = 0;
+                i < *err_info_num; i++, next = next->next) {
+        ArmProcessorErrorTypeList *type_list = next->value->type;
+        uint16_t pei_validation = 0;
+        uint8_t flags = 0;
+        uint8_t type = 0;
+
+        if (next->value->has_validation) {
+            validation_list = next->value->validation;
+
+            while (validation_list) {
+                pei_validation |= BIT(next->value->validation->value);
+                validation_list = validation_list->next;
+            }
+        }
+
+        /*
+         * According with UEFI 2.9A errata, the meaning of this field is
+         * given by the following bitmap:
+         *
+         *   +-----|---------------------------+
+         *   | Bit | Meaning                   |
+         *   +=====+===========================+
+         *   |  1  | Cache Error               |
+         *   |  2  | TLB Error                 |
+         *   |  3  | Bus Error                 |
+         *   |  4  | Micro-architectural Error |
+         *   +-----|---------------------------+
+         *
+         *   All other values are reserved.
+         *
+         * As bit 0 is reserved, QAPI ArmProcessorErrorType starts from bit 1.
+         */
+        while (type_list) {
+            type |= BIT(type_list->value + 1);
+            type_list = type_list->next;
+        }
+        if (!has_error) {
+            type = BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR);
+        }
+        pei[i].type = type;
+
+        if (next->value->has_flags) {
+            ArmProcessorFlagsList *flags_list = next->value->flags;
+
+            while (flags_list) {
+                flags |= BIT(flags_list->value);
+                flags_list = flags_list->next;
+            }
+        } else {
+            flags = BIT(ARM_PROCESSOR_FLAGS_FIRST_ERROR_CAP) |
+                    BIT(ARM_PROCESSOR_FLAGS_PROPAGATED);
+        }
+        pei[i].flags = flags;
+
+        if (next->value->has_multiple_error) {
+            pei[i].multiple_error = next->value->multiple_error;
+            pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_MULTIPLE_ERROR_VALID);
+        }
+
+        if (next->value->has_error_info) {
+            pei[i].error_info = next->value->error_info;
+        } else {
+            switch (type) {
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR):
+                pei[i].error_info = 0x0091000F;
+                break;
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR):
+                pei[i].error_info = 0x0054007F;
+                break;
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR):
+                pei[i].error_info = 0x80D6460FFF;
+                break;
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR):
+                pei[i].error_info = 0x78DA03FF;
+                break;
+            default:
+                /*
+                 * UEFI 2.9A/2.10 doesn't define how this should be filled
+                 * when multiple types are there. So, set default to zero,
+                 * causing it to be removed from validation bits.
+                 */
+                pei[i].error_info = 0;
+            }
+        }
+
+        if (next->value->has_virt_addr) {
+            pei[i].virt_addr = next->value->virt_addr;
+            pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_VIRT_ADDR_VALID);
+        }
+
+        if (next->value->has_phy_addr) {
+            pei[i].phy_addr = next->value->phy_addr;
+            pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_PHY_ADDR_VALID);
+        }
+
+        if (!next->value->has_validation) {
+            if (pei[i].flags) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_FLAGS_VALID);
+            }
+            if (pei[i].error_info) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_ERROR_INFO_VALID);
+            }
+            if (next->value->has_virt_addr) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_VIRT_ADDR_VALID);
+            }
+
+            if (next->value->has_phy_addr) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_PHY_ADDR_VALID);
+            }
+        }
+
+        pei[i].validation = pei_validation;
+    }
+
+    return pei;
+}
+
+/*
+ * UEFI 2.10 default context register type (See UEFI 2.10 table N.21 for more)
+ */
+#define CONTEXT_AARCH32_EL1   1
+#define CONTEXT_AARCH64_EL1   5
+
+static int get_default_context_type(void)
+{
+    ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
+    bool aarch64;
+
+    aarch64 = object_property_get_bool(OBJECT(cpu), "aarch64", NULL);
+
+    if (aarch64) {
+        return CONTEXT_AARCH64_EL1;
+    }
+    return CONTEXT_AARCH32_EL1;
+}
+
+/* Handle ARM Context */
+static ArmContext *qmp_arm_context(uint16_t *context_info_num,
+                                   uint32_t *context_length,
+                                   bool has_context,
+                                   ArmProcessorContextList const *context_list)
+{
+    ArmProcessorContextList const *next;
+    ArmContext *context = NULL;
+    uint16_t i, j, num, default_type;
+
+    default_type = get_default_context_type();
+
+    if (!has_context) {
+        *context_info_num = 0;
+        *context_length = 0;
+
+        return NULL;
+    }
+
+    /* Calculate sizes */
+    num = 0;
+    for (next = context_list; next; next = next->next) {
+        uint32_t n_regs = 0;
+
+        if (next->value->has_q_register) {
+            uint64List *reg = next->value->q_register;
+
+            while (reg) {
+                n_regs++;
+                reg = reg->next;
+            }
+
+            if (next->value->has_minimal_size &&
+                next->value->minimal_size < n_regs) {
+                n_regs = next->value->minimal_size;
+            }
+        } else if (!next->value->has_minimal_size) {
+            n_regs = ACPI_GHES_ARM_CPER_CTX_DEFAULT_NREGS;
+        }
+
+        if (!n_regs) {
+            next->value->minimal_size = 0;
+        } else {
+            next->value->minimal_size = (n_regs + 1) % 0xfffe;
+        }
+
+        num++;
+        if (num >= 65535) {
+            break;
+        }
+    }
+
+    context = g_new0(ArmContext, num);
+
+    /* Fill context data */
+
+    *context_length = 0;
+    *context_info_num = 0;
+
+    next = context_list;
+    for (i = 0; i < num; i++, next = next->next) {
+        if (!next->value->minimal_size) {
+            continue;
+        }
+
+        if (next->value->has_type) {
+            context[*context_info_num].type = next->value->type;
+        } else {
+            context[*context_info_num].type = default_type;
+        }
+        context[*context_info_num].size = next->value->minimal_size;
+        context[*context_info_num].array = g_malloc0(context[*context_info_num].size * 8);
+
+        (*context_info_num)++;
+
+        /* length = 64 bits * (size of the reg array + context type) */
+        *context_length += (context->size + 1) * 8;
+
+        if (!next->value->has_q_register) {
+            *context->array = 0xDEADBEEF;
+        } else {
+            uint64_t *pos = context->array;
+            uint64List *reg = next->value->q_register;
+
+            for (j = 0; j < context->size; j++) {
+                if (!reg) {
+                    break;
+                }
+
+                *(pos++) = reg->value;
+                reg = reg->next;
+            }
+        }
+    }
+
+    if (!*context_info_num) {
+        g_free(context);
+        return NULL;
+    }
+
+    return context;
+}
+
+static uint8_t *qmp_arm_vendor(uint32_t *vendor_num, bool has_vendor_specific,
+                               uint8List const *vendor_specific_list)
+{
+    uint8List const *next = vendor_specific_list;
+    uint8_t *vendor, *p;
+
+    if (!has_vendor_specific) {
+        return NULL;
+    }
+
+    *vendor_num = 0;
+
+    while (next) {
+        next = next->next;
+        (*vendor_num)++;
+    }
+
+    vendor = g_malloc(*vendor_num);
+
+    p = vendor;
+    next = vendor_specific_list;
+    while (next) {
+        *p = next->value;
+        next = next->next;
+        p++;
+    }
+
+    return vendor;
+}
+
+/* For ARM processor errors */
+void qmp_arm_inject_error(bool has_validation,
+                    ArmProcessorValidationBitsList *validation_list,
+                    bool has_affinity_level,
+                    uint8_t affinity_level,
+                    bool has_mpidr_el1,
+                    uint64_t mpidr_el1,
+                    bool has_midr_el1,
+                    uint64_t midr_el1,
+                    bool has_running_state,
+                    ArmProcessorRunningStateList *running_state_list,
+                    bool has_psci_state,
+                    uint32_t psci_state,
+                    bool has_context,
+                    ArmProcessorContextList *context_list,
+                    bool has_vendor_specific,
+                    uint8List *vendor_specific_list,
+                    bool has_error,
+                    ArmProcessorErrorInformationList *error_list,
+                    Error **errp)
+{
+    ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0));
+    uint32_t running_state = 0;
+    uint16_t validation = 0;
+    ArmError error;
+    uint16_t i;
+
+    /* Handle UEFI 2.0 N.16 specific fields, setting defaults when needed */
+
+    if (!has_midr_el1) {
+        mpidr_el1 = armcpu->midr;
+    }
+
+    if (!has_mpidr_el1) {
+        mpidr_el1 = armcpu->mpidr;
+    }
+
+    if (!has_psci_state) {
+        psci_state = armcpu->power_state;
+    }
+
+    if (has_running_state) {
+        while (running_state_list) {
+            running_state |= BIT(running_state_list->value);
+            running_state_list = running_state_list->next;
+        }
+
+        if (running_state) {
+            psci_state = 0;
+        }
+    }
+
+    if (has_validation) {
+        while (validation_list) {
+            validation |= BIT(validation_list->value);
+            validation_list = validation_list->next;
+        }
+    } else {
+        if (has_vendor_specific) {
+            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_VENDOR_SPECIFIC_VALID);
+        }
+
+        if (has_affinity_level) {
+            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_AFFINITY_VALID);
+        }
+
+        if (mpidr_el1) {
+            validation = BIT(ARM_PROCESSOR_VALIDATION_BITS_MPIDR_VALID);
+        }
+
+        if (running_state) {
+            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_RUNNING_STATE_VALID);
+        }
+    }
+
+    /* Fill an error record */
+
+    error.validation = validation;
+    error.affinity_level = affinity_level;
+    error.mpidr_el1 = mpidr_el1;
+    error.midr_el1 = midr_el1;
+    error.running_state = running_state;
+    error.psci_state = psci_state;
+
+    error.pei = qmp_arm_pei(&error.err_info_num, has_error, error_list);
+    error.context = qmp_arm_context(&error.context_info_num,
+                                    &error.context_length,
+                                    has_context, context_list);
+    error.vendor = qmp_arm_vendor(&error.vendor_num, has_vendor_specific,
+                                  vendor_specific_list);
+
+    ghes_record_arm_errors(error, ACPI_GHES_NOTIFY_GPIO);
+
+    if (error.context) {
+        for (i = 0; i < error.context_info_num; i++) {
+            g_free(error.context[i].array);
+        }
+    }
+    g_free(error.context);
+    g_free(error.pei);
+    g_free(error.vendor);
+
+    return;
+}
diff --git a/hw/arm/arm_error_inject_stubs.c b/hw/arm/arm_error_inject_stubs.c
new file mode 100644
index 000000000000..be6e8be2d0d9
--- /dev/null
+++ b/hw/arm/arm_error_inject_stubs.c
@@ -0,0 +1,34 @@
+/*
+ * QMP stub for ARM processor error injection.
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_arm_inject_error(bool has_validation,
+                        ArmProcessorValidationBitsList *validation,
+                        bool has_affinity_level,
+                        uint8_t affinity_level,
+                        bool has_mpidr_el1,
+                        uint64_t mpidr_el1,
+                        bool has_midr_el1,
+                        uint64_t midr_el1,
+                        bool has_running_state,
+                        ArmProcessorRunningStateList *running_state,
+                        bool has_psci_state,
+                        uint32_t psci_state,
+                        bool has_context, ArmProcessorContextList *context,
+                        bool has_vendor_specific, uint8List *vendor_specific,
+                        bool has_error,
+                        ArmProcessorErrorInformationList *error,
+                        Error **errp)
+{
+    error_setg(errp, "ARM processor error support is not compiled in");
+}
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 0c07ab522f4c..cb7fe09fc87b 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -60,6 +60,7 @@ arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
 arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
 arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
+arm_ss.add(when: 'CONFIG_ARM_EINJ', if_true: files('arm_error_inject.c'))
 
 system_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
 system_ss.add(when: 'CONFIG_CHEETAH', if_true: files('palm.c'))
@@ -77,5 +78,7 @@ system_ss.add(when: 'CONFIG_TOSA', if_true: files('tosa.c'))
 system_ss.add(when: 'CONFIG_VERSATILE', if_true: files('versatilepb.c'))
 system_ss.add(when: 'CONFIG_VEXPRESS', if_true: files('vexpress.c'))
 system_ss.add(when: 'CONFIG_Z2', if_true: files('z2.c'))
+system_ss.add(when: 'CONFIG_ARM_EINJ', if_false: files('arm_error_inject_stubs.c'))
+system_ss.add(when: 'CONFIG_ALL', if_true: files('arm_error_inject_stubs.c'))
 
 hw_arch += {'arm': arm_ss}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 4f1ab1a73a06..c591a5fb02c4 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
 #define ACPI_GHES_H
 
 #include "hw/acpi/bios-linker-loader.h"
+#include "qapi/qapi-commands-arm-error-inject.h"
 
 /*
  * Values for Hardware Error Notification Type field
@@ -68,6 +69,43 @@ typedef struct AcpiGhesState {
     bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
+typedef struct ArmPEI {
+    uint16_t validation;
+    uint8_t type;
+    uint16_t multiple_error;
+    uint8_t flags;
+    uint64_t error_info;
+    uint64_t virt_addr;
+    uint64_t phy_addr;
+} ArmPEI;
+
+typedef struct ArmContext {
+    uint16_t type;
+    uint32_t size;
+    uint64_t *array;
+} ArmContext;
+
+/* ARM processor - UEFI 2.10 table N.16 */
+typedef struct ArmError {
+    uint16_t validation;
+
+    uint8_t affinity_level;
+    uint64_t mpidr_el1;
+    uint64_t midr_el1;
+    uint32_t running_state;
+    uint32_t psci_state;
+
+    /* Those are calculated based on the input data */
+    uint16_t err_info_num;
+    uint16_t context_info_num;
+    uint32_t vendor_num;
+    uint32_t context_length;
+
+    ArmPEI *pei;
+    ArmContext *context;
+    uint8_t *vendor;
+} ArmError;
+
 void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
 void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
                      const char *oem_id, const char *oem_table_id);
@@ -75,6 +113,8 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
 
+bool ghes_record_arm_errors(ArmError error, uint32_t notify);
+
 /**
  * acpi_ghes_present: Report whether ACPI GHES table is present
  *
diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
new file mode 100644
index 000000000000..6e3445c7fe0e
--- /dev/null
+++ b/qapi/arm-error-inject.json
@@ -0,0 +1,284 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# = ARM Processor Errors
+#
+# These are defined at
+# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html.
+# See tables N.16, N.17 and N.21.
+##
+
+##
+# @ArmProcessorValidationBits:
+#
+# Indicates whether or not fields of ARM processor CPER record are
+# valid.
+#
+# @mpidr-valid:  MPIDR is valid
+#
+# @affinity-valid: Error affinity level is valid
+#
+# @running-state-valid: Running State is valid
+#
+# @vendor-specific-valid: Vendor Specific Info is valid
+#
+# Since: 9.2
+##
+{ 'enum': 'ArmProcessorValidationBits',
+  'data': ['mpidr-valid',
+           'affinity-valid',
+           'running-state-valid',
+           'vendor-specific-valid']
+}
+
+##
+# @ArmProcessorFlags:
+#
+# Indicates error attributes at the Error info section.
+#
+# @first-error-cap: First error captured
+#
+# @last-error-cap:  Last error captured
+#
+# @propagated: Propagated
+#
+# @overflow: Overflow
+#
+# Since: 9.2
+##
+{ 'enum': 'ArmProcessorFlags',
+  'data': ['first-error-cap',
+           'last-error-cap',
+           'propagated',
+           'overflow']
+}
+
+##
+# @ArmProcessorRunningState:
+#
+# Indicates if the processor is running.
+#
+# @processor-running: indicates that the processor is running
+#
+# Since: 9.2
+##
+{ 'enum': 'ArmProcessorRunningState',
+  'data': ['processor-running']
+}
+
+##
+# @ArmProcessorErrorType:
+#
+# Type of ARM processor error information to inject.
+#
+# @cache-error: Cache error
+#
+# @tlb-error: TLB error
+#
+# @bus-error: Bus error
+#
+# @micro-arch-error: Micro architectural error
+#
+# Since: 9.2
+##
+{ 'enum': 'ArmProcessorErrorType',
+  'data': ['cache-error',
+           'tlb-error',
+           'bus-error',
+           'micro-arch-error']
+ }
+
+##
+# @ArmPeiValidationBits:
+#
+# Indicates whether or not fields of Processor Error Info section
+# are valid.
+#
+# @multiple-error-valid: Information at multiple-error field is valid
+#
+# @flags-valid: Information at flags field is valid
+#
+# @error-info-valid: Information at error-info field is valid
+#
+# @virt-addr-valid: Information at virt-addr field is valid
+#
+# @phy-addr-valid: Information at phy-addr field is valid
+#
+# Since: 9.2
+##
+{ 'enum': 'ArmPeiValidationBits',
+  'data': ['multiple-error-valid',
+           'flags-valid',
+           'error-info-valid',
+           'virt-addr-valid',
+           'phy-addr-valid']
+}
+
+##
+# @ArmProcessorErrorInformation:
+#
+# Contains ARM processor error information (PEI) data according
+# with UEFI CPER table N.17.
+#
+# @validation: Valid validation bits for error-info section.
+#   Argument is optional. If not specified, those flags will
+#   be enabled: first-error-cap and propagated.
+#
+# @type: ARM processor error types to inject. Argument is mandatory.
+#
+# @multiple-error: Indicates whether multiple errors have occurred.
+#   Argument is optional. If not specified and @validation not
+#   forced, this field will be marked as invalid at CPER record.
+#   When valid, the meaning of this field is:
+#
+#       ============== ===================================
+#       multiple-error meaning
+#       ============== ===================================
+#       0              single error
+#       1              multiple errors (with a lost count)
+#       2 or more      actual count of multiple errors
+#       ============== ===================================
+#
+# @flags: Indicates flags that describe the error attributes.
+#   Argument is optional. If not specified and defaults to
+#   first-error and propagated.
+#
+# @error-info: Error information structure is specific to each error
+#   type. Argument is optional, and its value depends on the PEI
+#   type(s). If not defined, the default depends on the type:
+#
+#       ================  ==================
+#       For type          error-info default
+#       ================  ==================
+#       cache-error       ``0x0091000F``
+#       tlb-error         ``0x0054007F``
+#       bus-error         ``0x80D6460FFF``
+#       micro-arch-error  ``0x78DA03FF``
+#       ================  ==================
+#
+#    - if multiple types used, this bit is disabled from
+#      @validation bits, as UEFI doesn't define the expected behavior.
+#
+# @virt-addr: Virtual fault address associated with the error.
+#   Argument is optional. If not specified and @validation not
+#   forced, this field will be marked as invalid at CPER record.
+#
+# @phy-addr: Physical fault address associated with the error.
+#   Argument is optional. If not specified and @validation not
+#   forced, this field will be marked as invalid at CPER record.
+#
+# Since: 9.2
+##
+{ 'struct': 'ArmProcessorErrorInformation',
+  'data': { '*validation': ['ArmPeiValidationBits'],
+            'type': ['ArmProcessorErrorType'],
+            '*multiple-error': 'uint16',
+            '*flags': ['ArmProcessorFlags'],
+            '*error-info': 'uint64',
+            '*virt-addr':  'uint64',
+            '*phy-addr': 'uint64'}
+}
+
+##
+# @ArmProcessorContext:
+#
+# Provide processor context state specific to the ARM processor
+# architecture, according with UEFI 2.10 CPER table N.21.
+#
+# @type: Contains an integer value indicating the type of context
+#   state being reported. Argument is optional. If not defined, it
+#   will be set to be EL1 register for the emulation, e. g.:
+#
+#       ========  =============================
+#       on arm32  AArch32 EL1 context registers
+#       on arm64  AArch64 EL1 context registers
+#       ========  =============================
+#
+# @register: Provides the contents of the actual registers or raw
+#   data, depending on the context type. Argument is optional. If
+#   not defined, it will fill the first register with 0xDEADBEEF,
+#   and the other ones with zero.
+#
+# @minimal-size: Argument is optional. If provided, define the minimal
+#   size of the context register array. The actual size is defined by
+#   checking the number of register values plus the content of this
+#   field (if used), ensuring that each processor context information
+#   structure array is padded with zeros if the size is not a multiple
+#   of 16 bytes.
+#
+# Since: 9.2
+##
+{ 'struct': 'ArmProcessorContext',
+  'data': { '*type': 'uint16',
+            '*minimal-size': 'uint32',
+            '*register': ['uint64']}
+}
+
+##
+# @arm-inject-error:
+#
+# Inject ARM Processor error with data to be filled accordign with
+# UEFI 2.10 CPER table N.16.
+#
+# @validation: Valid validation bits for ARM processor CPER.
+#   Argument is optional. If not specified, the default is calculated
+#   based on having the corresponding arguments filled.
+#
+# @affinity-level: Error affinity level for errors that can be
+#   attributed to a specific affinity level. Argument is optional.
+#   If not specified and @validation not forced, this field will be
+#   marked as invalid at CPER record.
+#
+# @mpidr-el1: Processor’s unique ID in the system. Argument is
+#   optional. If not specified, it will use the cpu mpidr field from
+#   the emulation data. If zero and @validation is not forced, this
+#   field will be marked as invalid at CPER record.
+#
+# @midr-el1: Identification info of the chip. Argument is optional.
+#   If not specified, it will use the cpu mpidr field from the
+#   emulation data. If zero and @validation is not forced, this
+#   field will be marked as invalid at CPER record.
+#
+# @running-state: Indicates the running state of the processor.
+#   Argument is optional. If not specified and @validation not
+#   forced, this field will be marked as invalid at CPER record.
+#
+# @psci-state: Provides PSCI state of the processor, as defined in
+#   ARM PSCI document. Argument is optional. If not specified, it
+#   will use the cpu power state field from the emulation data.
+#
+# @context: Contains an array of processor context registers.
+#   Argument is optional. If not specified, no context will be added.
+#
+# @vendor-specific: Contains a byte array of vendor-specific data.
+#   Argument is optional. If not specified, no vendor-specific data
+#   will be added.
+#
+# @error: Contains an array of ARM processor error information (PEI)
+#   sections. Argument is optional. If not specified, defaults to a
+#   single Program Error Information record with a cache error, e. g.
+#   it would be equivalent of filling the PEI argument with::
+#
+#       "error" = { "type"= {[ "cache-error" ]} }
+#
+# Features:
+#
+# @unstable: This command is experimental.
+#
+# Since: 9.2
+##
+{ 'command': 'arm-inject-error',
+  'data': {
+    '*validation': ['ArmProcessorValidationBits'],
+    '*affinity-level': 'uint8',
+    '*mpidr-el1': 'uint64',
+    '*midr-el1': 'uint64',
+    '*running-state':  ['ArmProcessorRunningState'],
+    '*psci-state': 'uint32',
+    '*context': ['ArmProcessorContext'],
+    '*vendor-specific': ['uint8'],
+    '*error': ['ArmProcessorErrorInformation']
+  },
+  'features': [ 'unstable' ]
+}
diff --git a/qapi/meson.build b/qapi/meson.build
index e7bc54e5d047..5927932c4be3 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -22,6 +22,7 @@ if have_system or have_tools or have_ga
 endif
 
 qapi_all_modules = [
+  'arm-error-inject',
   'authz',
   'block',
   'block-core',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index b1581988e4eb..479a22de7e43 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -81,3 +81,4 @@
 { 'include': 'vfio.json' }
 { 'include': 'cryptodev.json' }
 { 'include': 'cxl.json' }
+{ 'include': 'arm-error-inject.json' }
-- 
2.45.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-29 13:21 ` [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
@ 2024-07-29 16:08   ` Jonathan Cameron via
  2024-07-30  5:13     ` Mauro Carvalho Chehab
  2024-07-30  8:11   ` Zhao Liu
  1 sibling, 1 reply; 16+ messages in thread
From: Jonathan Cameron via @ 2024-07-29 16:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shiju Jose, Michael S. Tsirkin, Philippe Mathieu-Daudé,
	Ani Sinha, Eduardo Habkost, Igor Mammedov, Marcel Apfelbaum,
	Peter Maydell, Shannon Zhao, Yanan Wang, Zhao Liu, linux-kernel,
	qemu-arm, qemu-devel

On Mon, 29 Jul 2024 15:21:06 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Creates a Generic Event Device (GED) as specified at

I wrote this a while back and wasn't aware of the naming
mess around GED in the ACPI spec.  This one is just
referred to as 'error device' whereas there is also
a Generic Event Device.

Linux solved this clash by going with Hardware Error Device
I think we should do the same here.

> ACPI 6.5 specification at 18.3.2.7.2:
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> with HID PNP0C33.
> 
> The PNP0C33 device is used to report hardware errors to
> the bios via ACPI APEI Generic Hardware Error Source (GHES).
> 
> It is aligned with Linux Kernel patch:
> https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
> 
> [mchehab: use a define for the generic event pin number and do some cleanups]
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> ---
>  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
>  hw/arm/virt.c            | 14 ++++++++++++--
>  include/hw/arm/virt.h    |  1 +
>  include/hw/boards.h      |  1 +
>  4 files changed, 40 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f76fb117adff..c502ccf40909 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -63,6 +63,7 @@
>  
>  #define ARM_SPI_BASE 32
>  
> +#define ACPI_GENERIC_EVENT_DEVICE "GEDD"

Ah. My mistake. This is the confusing named GENERIC_ERROR_DEVICE
or HARDWARE_ERROR_DEVICE (which is what Linux called it because
in the ACPI Spec it is just (all lower case) error device).

>  #define ACPI_BUILD_TABLE_SIZE             0x20000

>  /* DSDT */
>  static void
>  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> @@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>                        HOTPLUG_HANDLER(vms->acpi_dev),
>                        irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
>                        memmap[VIRT_ACPI_GED].base);
> -    } else {
> -        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> -                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
>      }
> +    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> +                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));

Arguably excess brackets, but obviously this is just a code move
so fine to keep it the same.

>  
>      if (vms->acpi_dev) {
>          uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
> @@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      }
>  
>      acpi_dsdt_add_power_button(scope);
> +    acpi_dsdt_add_generic_event_device(scope);
>  #ifdef CONFIG_TPM
>      acpi_dsdt_add_tpm(scope, vms);
>  #endif



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER
  2024-07-29 13:21 ` [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER Mauro Carvalho Chehab
@ 2024-07-29 16:31   ` Jonathan Cameron via
  2024-07-30  6:16     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Cameron via @ 2024-07-29 16:31 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
	Eric Blake, Igor Mammedov, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Mon, 29 Jul 2024 15:21:10 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Add an ACPI APEI GHES error injection logic for ARM
> processor CPER, allowing to set fields at from
> UEFI spec 2.10 tables N.16 and N.17 to any valid
> value.
> 
> As some GHES functions require handling addresses, add
> a helper function to support it.
> 
> Before starting erorr inject, the QAPI requires to negociate
> QMP with:
> 
> { "execute": "qmp_capabilities" }
> 
> Afterwards, errors can be injected with:
> 
> 	{ "execute": "arm-inject-error" }
> 
> The error injection events supports several optional arguments,
> having Processor Error Information (PEI) mapped into an array.
> 
> So, it is possible to inject multiple errors at the same CPER record,
> as defined at UEFI spec, with:
> 
> 	{ "execute": "arm-inject-error", "arguments": {
> 	   "error": [ {"type": [ "cache-error" ]},
> 		      {"type": [ "tlb-error" ]} ] } }
> 
> The above generates a single CPER record with two PEI info, one
> reporting a cache error, and the other one a TLB error, using
> default values for other fields.
> 
> As all fields from ARM Processor CPER are mapped, so, the error
> could contain physical/virtual addresses, register dumps,
> vendor-specific data, etc.
> 
> This patch is co-authored:
> - ghes logic to inject a simple ARM record by Shiju Jose;
> - generic logic to handle block addresses by Jonathan Cameron;
> - logic to allow changing all fields by Mauro Carvalho Chehab;
> 
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

A few minor comments inline.
This crossed (I think) with a reply from Markus though so there are
some other bits to address from that.

Jonathan

> diff --git a/configs/targets/aarch64-softmmu.mak b/configs/targets/aarch64-softmmu.mak
> index 84cb32dc2f4f..b4b3cd97934a 100644
> --- a/configs/targets/aarch64-softmmu.mak
> +++ b/configs/targets/aarch64-softmmu.mak
> @@ -5,3 +5,4 @@ TARGET_KVM_HAVE_GUEST_DEBUG=y
>  TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml gdb-xml/aarch64-pauth.xml
>  # needed by boot.c
>  TARGET_NEED_FDT=y
> +CONFIG_ARM_EINJ=y
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 9346f45c59a5..e435c9aa0961 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -27,6 +27,7 @@
>  #include "hw/acpi/generic_event_device.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "qemu/uuid.h"
> +#include "qapi/qapi-types-arm-error-inject.h"
>  
>  #define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
>  #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> @@ -53,6 +54,12 @@
>  /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
>  #define ACPI_GHES_MEM_CPER_LENGTH           80
>  
> +/*
> + * ARM Processor error section CPER sizes - UEFI 2.10: N.2.4.4
> + */
> +#define ACPI_GHES_ARM_CPER_LENGTH           40
> +#define ACPI_GHES_ARM_CPER_PEI_LENGTH       32
> +
>  /* Masks for block_status flags */
>  #define ACPI_GEBS_UNCORRECTABLE         1
>  
> @@ -234,6 +241,152 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>      return 0;
>  }
>  
> +/* UEFI 2.9: N.2.4.4 ARM Processor Error Section */
> +static void acpi_ghes_build_append_arm_cper(ArmError err, uint32_t cper_length,
> +                                            GArray *table)
> +{
> +    unsigned int i, j;
> +
> +    /*
> +     * ARM Processor Error Record
> +     */
> +
> +    /* Validation Bits */

Given nice naming, maybe drop the comments where the field
name makes it obvious?

> +    build_append_int_noprefix(table, err.validation, 4);
> +
> +    /* Error Info Num */
> +    build_append_int_noprefix(table, err.err_info_num, 2);
> +
> +    /* Context Info Num */
> +    build_append_int_noprefix(table, err.context_info_num, 2);
> +
> +    /* Section length */
> +    build_append_int_noprefix(table, cper_length, 4);
> +
> +    /* Error affinity level */
> +    build_append_int_noprefix(table, err.affinity_level, 1);
> +
> +    /* Reserved */
> +    build_append_int_noprefix(table, 0, 3);
> +
> +    /* MPIDR_EL1 */
> +    build_append_int_noprefix(table, err.mpidr_el1, 8);
> +
> +    /* MIDR_EL1 */
> +    build_append_int_noprefix(table, err.midr_el1, 8);
> +
> +    /* Running state */
> +    build_append_int_noprefix(table, err.running_state, 4);
> +
> +    /* PSCI state: only valid when running state is zero  */
> +    build_append_int_noprefix(table, err.psci_state, 4);
> +
> +    for (i = 0; i < err.err_info_num; i++) {
> +        /* ARM Propcessor error information */
> +        /* Version */
> +        build_append_int_noprefix(table, 0, 1);
> +
> +        /*  Length */
> +        build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1);
> +
> +        /* Validation Bits */
> +        build_append_int_noprefix(table, err.pei[i].validation, 2);
> +
> +        /* Type */
> +        build_append_int_noprefix(table, err.pei[i].type, 1);
> +
> +        /* Multiple error count */
> +        build_append_int_noprefix(table, err.pei[i].multiple_error, 2);
> +
> +        /* Flags  */
> +        build_append_int_noprefix(table, err.pei[i].flags, 1);
> +
> +        /* Error information  */
> +        build_append_int_noprefix(table, err.pei[i].error_info, 8);
> +
> +        /* Virtual fault address  */
> +        build_append_int_noprefix(table, err.pei[i].virt_addr, 8);
> +
> +        /* Physical fault address  */
> +        build_append_int_noprefix(table, err.pei[i].phy_addr, 8);
> +    }
> +
> +    for (i = 0; i < err.context_info_num; i++) {
> +        /* ARM Propcessor error context information */
> +        /* Version */
> +        build_append_int_noprefix(table, 0, 2);
> +
> +        /* Validation type */
> +        build_append_int_noprefix(table, err.context[i].type, 2);
> +
> +        /* Register array size */
> +        build_append_int_noprefix(table, err.context[i].size * 8, 4);
> +
> +        /* Register array (byte 8 of Context info) */
> +        for (j = 0; j < err.context[i].size; j++) {
> +            build_append_int_noprefix(table, err.context[i].array[j], 8);
> +        }
> +    }
> +
> +    for (i = 0; i < err.vendor_num; i++) {
> +        build_append_int_noprefix(table, err.vendor[i], 1);
> +    }
> +}


...

> diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
> new file mode 100644
> index 000000000000..5ebbdf2b2adc
> --- /dev/null
> +++ b/hw/arm/arm_error_inject.c
> @@ -0,0 +1,420 @@
> +/*
> + * ARM Processor error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/boards.h"
> +#include "hw/acpi/ghes.h"
> +#include "cpu.h"
> +
> +#define ACPI_GHES_ARM_CPER_CTX_DEFAULT_NREGS 74
> +
> +/* Handle ARM Processor Error Information (PEI) */
> +static const ArmProcessorErrorInformationList *default_pei = { 0 };
> +
> +static ArmPEI *qmp_arm_pei(uint16_t *err_info_num,
> +              bool has_error,
> +              ArmProcessorErrorInformationList const *error_list)
> +{
> +    ArmProcessorErrorInformationList const *next;
> +    ArmPeiValidationBitsList const *validation_list;
> +    ArmPEI *pei = NULL;
> +    uint16_t i;
> +
> +    if (!has_error) {
> +        error_list = default_pei;
> +    }
> +
> +    *err_info_num = 0;
> +
> +    for (next = error_list; next; next = next->next) {
> +        (*err_info_num)++;
> +
> +        if (*err_info_num >= 255) {
> +            break;
> +        }
> +    }
> +
> +    pei = g_new0(ArmPEI, (*err_info_num));
> +
> +    for (next = error_list, i = 0;
> +                i < *err_info_num; i++, next = next->next) {
> 
Odd alignment.


> +/* For ARM processor errors */
> +void qmp_arm_inject_error(bool has_validation,
> +                    ArmProcessorValidationBitsList *validation_list,
> +                    bool has_affinity_level,
> +                    uint8_t affinity_level,
> +                    bool has_mpidr_el1,
> +                    uint64_t mpidr_el1,
> +                    bool has_midr_el1,
> +                    uint64_t midr_el1,
> +                    bool has_running_state,
> +                    ArmProcessorRunningStateList *running_state_list,
> +                    bool has_psci_state,
> +                    uint32_t psci_state,
> +                    bool has_context,
> +                    ArmProcessorContextList *context_list,
> +                    bool has_vendor_specific,
> +                    uint8List *vendor_specific_list,
> +                    bool has_error,
> +                    ArmProcessorErrorInformationList *error_list,
> +                    Error **errp)
> +{
...

> +
> +    if (error.context) {

No need for check. If context_info_num is zero their
won't be a loop iteration. 

> +        for (i = 0; i < error.context_info_num; i++) {
> +            g_free(error.context[i].array);
> +        }
> +    }
> +    g_free(error.context);
> +    g_free(error.pei);
> +    g_free(error.vendor);
> +
> +    return;
> +}
> diff --git a/hw/arm/arm_error_inject_stubs.c b/hw/arm/arm_error_inject_stubs.c
> new file mode 100644
> index 000000000000..be6e8be2d0d9
> --- /dev/null
> +++ b/hw/arm/arm_error_inject_stubs.c
> @@ -0,0 +1,34 @@
> +/*
> + * QMP stub for ARM processor error injection.
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_arm_inject_error(bool has_validation,
> +                        ArmProcessorValidationBitsList *validation,
> +                        bool has_affinity_level,
> +                        uint8_t affinity_level,
> +                        bool has_mpidr_el1,
> +                        uint64_t mpidr_el1,
> +                        bool has_midr_el1,
> +                        uint64_t midr_el1,
> +                        bool has_running_state,
> +                        ArmProcessorRunningStateList *running_state,
> +                        bool has_psci_state,
> +                        uint32_t psci_state,
> +                        bool has_context, ArmProcessorContextList *context,
> +                        bool has_vendor_specific, uint8List *vendor_specific,
> +                        bool has_error,
> +                        ArmProcessorErrorInformationList *error,
> +                        Error **errp)
> +{
> +    error_setg(errp, "ARM processor error support is not compiled in");
> +}
Markus suggested:

> A target-specific command like this one should be conditional.  Try
> this:
> 
>     { 'command': 'arm-inject-error',
>       'data': { 'errortypes': ['ArmProcessorErrorType'] },
>       'features': [ 'unstable' ],
>       'if': 'TARGET_ARM' }
>
> No need to provide a qmp_arm_inject_error() stub then.

(I noticed because never knew you could do this.)

Probably crossed with your v4 posting.

> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 4f1ab1a73a06..c591a5fb02c4 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h

...

> +/* ARM processor - UEFI 2.10 table N.16 */
> +typedef struct ArmError {
> +    uint16_t validation;
> +
> +    uint8_t affinity_level;
> +    uint64_t mpidr_el1;
> +    uint64_t midr_el1;
> +    uint32_t running_state;
> +    uint32_t psci_state;
> +
> +    /* Those are calculated based on the input data */
    /* Calculated based on the input data */
> +    uint16_t err_info_num;
> +    uint16_t context_info_num;
> +    uint32_t vendor_num;
> +    uint32_t context_length;
> +
> +    ArmPEI *pei;
> +    ArmContext *context;
> +    uint8_t *vendor;
> +} ArmError;



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-29 16:08   ` Jonathan Cameron via
@ 2024-07-30  5:13     ` Mauro Carvalho Chehab
  2024-07-30  8:39       ` Jonathan Cameron via
  0 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-30  5:13 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shiju Jose, Michael S. Tsirkin, Philippe Mathieu-Daudé,
	Ani Sinha, Eduardo Habkost, Igor Mammedov, Marcel Apfelbaum,
	Peter Maydell, Shannon Zhao, Yanan Wang, Zhao Liu, linux-kernel,
	qemu-arm, qemu-devel

Em Mon, 29 Jul 2024 17:08:40 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:

> On Mon, 29 Jul 2024 15:21:06 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > Creates a Generic Event Device (GED) as specified at  
> 
> I wrote this a while back and wasn't aware of the naming
> mess around GED in the ACPI spec.  This one is just
> referred to as 'error device' whereas there is also
> a Generic Event Device.
> 
> Linux solved this clash by going with Hardware Error Device
> I think we should do the same here.

I opted to do it a little bit different to stay closer to ACPI 6.5
18.3.2.7.2. - Event Notification For Generic Error Sources.

There, it is actually talking about a General Purpose Event (GPE).
Current ACPI spec doesn't mention "GED", so maybe such term was fixed
on some previous ACPI spec revision.

Basically, it currently mentions:
	- error device
	- GPE / General Purpose Event
	- Generic Hardware Error Source Structure 

I guess Linux crafted the term Hardware Error device by mixing
those.

As we don't need to really preserve such names here, as this appears
only at the patch description, I opted to rewrite the patch description
to:

    arm/virt: Wire up GPIO error source for ACPI / GHES
    
    Creates a hardware event device to support General Purpose
    Event (GPE) as specified at ACPI 6.5 specification at 18.3.2.7.2:
    https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
    using HID PNP0C33.
    
    The PNP0C33 device is used to report hardware errors to
    the bios via ACPI APEI Generic Hardware Error Source (GHES).
    
    It is aligned with Linux Kernel patch:
    https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
    
    [mchehab: use a define for the generic event pin number and do some cleanups]
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Clearly associating "hardware event device" with ACPI GPE. That sounds
good enough to be stored at the git description associated with such
change.

> > ACPI 6.5 specification at 18.3.2.7.2:
> > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> > with HID PNP0C33.
> > 
> > The PNP0C33 device is used to report hardware errors to
> > the bios via ACPI APEI Generic Hardware Error Source (GHES).
> > 
> > It is aligned with Linux Kernel patch:
> > https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
> > 
> > [mchehab: use a define for the generic event pin number and do some cleanups]
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> 
> > ---
> >  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
> >  hw/arm/virt.c            | 14 ++++++++++++--
> >  include/hw/arm/virt.h    |  1 +
> >  include/hw/boards.h      |  1 +
> >  4 files changed, 40 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index f76fb117adff..c502ccf40909 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -63,6 +63,7 @@
> >  
> >  #define ARM_SPI_BASE 32
> >  
> > +#define ACPI_GENERIC_EVENT_DEVICE "GEDD"  
> 
> Ah. My mistake. This is the confusing named GENERIC_ERROR_DEVICE
> or HARDWARE_ERROR_DEVICE (which is what Linux called it because
> in the ACPI Spec it is just (all lower case) error device).

I opted to use a different name there, using just error device,
together with the name of the PNP device. So:

	#define PNP0C33_ERROR_DEVICE "GEDD"

This is clear enough for people just looking at the driver, and
even clearer for people familiar with session 18.3.2.7.2 of the
ACPI spec.

> 
> >  #define ACPI_BUILD_TABLE_SIZE             0x20000  
> 
> >  /* DSDT */
> >  static void
> >  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > @@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >                        HOTPLUG_HANDLER(vms->acpi_dev),
> >                        irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
> >                        memmap[VIRT_ACPI_GED].base);
> > -    } else {
> > -        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > -                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> >      }
> > +    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > +                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));  
> 
> Arguably excess brackets, but obviously this is just a code move
> so fine to keep it the same.

I'll drop the extra brackets.

> >  
> >      if (vms->acpi_dev) {
> >          uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
> > @@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >      }
> >  
> >      acpi_dsdt_add_power_button(scope);

> > +    acpi_dsdt_add_generic_event_device(scope);

I'm also renaming this function/function call to run away from GED,
calling it as:

	 acpi_dsdt_add_error_device(scope);

> >  #ifdef CONFIG_TPM
> >      acpi_dsdt_add_tpm(scope, vms);
> >  #endif  
> 

Thanks,
Mauro


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER
  2024-07-29 16:31   ` Jonathan Cameron via
@ 2024-07-30  6:16     ` Mauro Carvalho Chehab
  2024-07-31  7:23       ` Markus Armbruster
  0 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-30  6:16 UTC (permalink / raw)
  To: Jonathan Cameron, Markus Armbruster
  Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
	Eric Blake, Igor Mammedov, Michael Roth, Paolo Bonzini,
	Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Em Mon, 29 Jul 2024 17:31:09 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:

> On Mon, 29 Jul 2024 15:21:10 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

...

> Markus suggested:
> 
> > A target-specific command like this one should be conditional.  Try
> > this:
> > 
> >     { 'command': 'arm-inject-error',
> >       'data': { 'errortypes': ['ArmProcessorErrorType'] },
> >       'features': [ 'unstable' ],
> >       'if': 'TARGET_ARM' }
> >
> > No need to provide a qmp_arm_inject_error() stub then.  
> 
> (I noticed because never knew you could do this.)
> 
> Probably crossed with your v4 posting.

Tried it, but can't figure out how to properly set it up at meson.build,
as it is basically producing build time errors during qapi file generation
on non-ARM platforms. For instance:

FAILED: libqemuutil.a.p/meson-generated_.._qapi_qapi-visit-arm-error-inject.c.o 
cc -m64 -Ilibqemuutil.a.p -I. -I.. -Isubprojects/libvhost-user -I../subprojects/libvhost-user -Iqapi -Itrace -Iui -Iui/shader -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/sysprof-6 -I/usr/include/gio-unix-2.0 -I/usr/include/p11-kit-1 -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g -fstack-protector-strong -Wempty-body -Wendif-labels -Wexpansion-to-defined -Wformat-security -Wformat-y2k -Wignored-qualifiers -Wimplicit-fallthrough=2 -Winit-self -Wmissing-format-attribute -Wmissing-prototypes -Wnested-externs -Wold-style-declaration -Wold-style-definition -Wredundant-decls -Wshadow=local -Wstrict-prototypes -Wtype-limits -Wundef -Wvla -Wwrite-strings -Wno-missing-include-dirs -Wno-psabi -Wno-shift-negative-value -isystem /new_devel/edac/qemu/linux-headers -isystem linux-headers -iquote . -iquote /new_devel/edac/qemu -iquote /new_devel/edac/qemu/include -iquote /new_devel/edac/qemu/host/include/x8
 6_64 -iquote /new_devel/edac/qemu/host/include/generic -iquote /new_devel/edac/qemu/tcg/i386 -pthread -msse2 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv -ftrivial-auto-var-init=zero -fzero-call-used-regs=used-gpr -fPIE -DWITH_GZFILEOP -MD -MQ libqemuutil.a.p/meson-generated_.._qapi_qapi-visit-arm-error-inject.c.o -MF libqemuutil.a.p/meson-generated_.._qapi_qapi-visit-arm-error-inject.c.o.d -o libqemuutil.a.p/meson-generated_.._qapi_qapi-visit-arm-error-inject.c.o -c qapi/qapi-visit-arm-error-inject.c
In file included from qapi/qapi-visit-arm-error-inject.h:17,
                 from qapi/qapi-visit-arm-error-inject.c:15:
qapi/qapi-types-arm-error-inject.h:18:13: error: attempt to use poisoned "TARGET_ARM"
   18 | #if defined(TARGET_ARM)
      |             ^
In file included from /new_devel/edac/qemu/include/exec/poison.h:7,
                 from /new_devel/edac/qemu/include/qemu/osdep.h:38,
                 from qapi/qapi-visit-arm-error-inject.c:13:
./config-poison.h:718:20: note: poisoned here

Such error is created by two files generated from qapi, due
to this change:

	diff --git a/qapi/meson.build b/qapi/meson.build
	index e7bc54e5d047..5927932c4be3 100644
	--- a/qapi/meson.build
	+++ b/qapi/meson.build
	@@ -24,2 +24,3 @@ endif
	 qapi_all_modules = [
	+  'arm-error-inject',
	   'authz',

No idea how to fix it.

Thanks,
Mauro


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-29 13:21 ` [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
  2024-07-29 16:08   ` Jonathan Cameron via
@ 2024-07-30  8:11   ` Zhao Liu
  2024-07-31  5:21     ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 16+ messages in thread
From: Zhao Liu @ 2024-07-30  8:11 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Eduardo Habkost,
	Igor Mammedov, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, linux-kernel, qemu-arm, qemu-devel

Hi Mauro,

On Mon, Jul 29, 2024 at 03:21:06PM +0200, Mauro Carvalho Chehab wrote:
> Date: Mon, 29 Jul 2024 15:21:06 +0200
> From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Subject: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
> X-Mailer: git-send-email 2.45.2
> 
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Creates a Generic Event Device (GED) as specified at
> ACPI 6.5 specification at 18.3.2.7.2:
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> with HID PNP0C33.
> 
> The PNP0C33 device is used to report hardware errors to
> the bios via ACPI APEI Generic Hardware Error Source (GHES).
> 
> It is aligned with Linux Kernel patch:
> https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
> 
> [mchehab: use a define for the generic event pin number and do some cleanups]
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
>  hw/arm/virt.c            | 14 ++++++++++++--
>  include/hw/arm/virt.h    |  1 +
>  include/hw/boards.h      |  1 +
>  4 files changed, 40 insertions(+), 6 deletions(-)

[snip]

> +static void virt_set_error(void)
> +{
> +    qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
> +}
> +

[snip]

> +    mc->generic_error_device_notify = virt_set_error;

[snip]

> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 48ff6d8b93f7..991f99138e57 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -308,6 +308,7 @@ struct MachineClass {
>      int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
>      ram_addr_t (*fixup_ram_size)(ram_addr_t size);
>      uint64_t smbios_memory_device_size;
> +    void (*generic_error_device_notify)(void);

The name looks inconsistent with the style of other MachineClass virtual
methods. What about the name like "notify_xxx"? And pls add the comment
about this new method.

BTW, I found this method is called in generic_error_device_notify() of
Patch 6. And the mc->generic_error_device_notify() - as the virtual
metchod of MachineClass looks just to implement a hook, and it doesn't
seem to have anything to do with MachineClass/MachineState, so my
question is why do we need to add this method to MachineClass?

Could we maintain a notifier list in ghes.c and expose an interface
to allow arm code register a notifier? This eliminates the need to add
the “notify” method to MachineClass.

Regards,
Zhao



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-30  5:13     ` Mauro Carvalho Chehab
@ 2024-07-30  8:39       ` Jonathan Cameron via
  0 siblings, 0 replies; 16+ messages in thread
From: Jonathan Cameron via @ 2024-07-30  8:39 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shiju Jose, Michael S. Tsirkin, Philippe Mathieu-Daudé,
	Ani Sinha, Eduardo Habkost, Igor Mammedov, Marcel Apfelbaum,
	Peter Maydell, Shannon Zhao, Yanan Wang, Zhao Liu, linux-kernel,
	qemu-arm, qemu-devel

On Tue, 30 Jul 2024 07:13:30 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Em Mon, 29 Jul 2024 17:08:40 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> 
> > On Mon, 29 Jul 2024 15:21:06 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > 
> > > Creates a Generic Event Device (GED) as specified at    
> > 
> > I wrote this a while back and wasn't aware of the naming
> > mess around GED in the ACPI spec.  This one is just
> > referred to as 'error device' whereas there is also
> > a Generic Event Device.
> > 
> > Linux solved this clash by going with Hardware Error Device
> > I think we should do the same here.  
> 
> I opted to do it a little bit different to stay closer to ACPI 6.5
> 18.3.2.7.2. - Event Notification For Generic Error Sources.
> 
> There, it is actually talking about a General Purpose Event (GPE).
> Current ACPI spec doesn't mention "GED", so maybe such term was fixed
> on some previous ACPI spec revision.
> 
> Basically, it currently mentions:
> 	- error device
> 	- GPE / General Purpose Event
> 	- Generic Hardware Error Source Structure 
> 
> I guess Linux crafted the term Hardware Error device by mixing
> those.
> 
> As we don't need to really preserve such names here, as this appears
> only at the patch description, I opted to rewrite the patch description
> to:
> 
>     arm/virt: Wire up GPIO error source for ACPI / GHES
>     
>     Creates a hardware event device to support General Purpose
>     Event (GPE) as specified at ACPI 6.5 specification at 18.3.2.7.2:
>     https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
>     using HID PNP0C33.
>     
>     The PNP0C33 device is used to report hardware errors to
>     the bios via ACPI APEI Generic Hardware Error Source (GHES).
>     
>     It is aligned with Linux Kernel patch:
>     https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
>     
>     [mchehab: use a define for the generic event pin number and do some cleanups]
>     Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>     Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> 
> Clearly associating "hardware event device" with ACPI GPE. That sounds
> good enough to be stored at the git description associated with such
> change.
All looks good to me.

Thanks,

Jonathan

> 
> > > ACPI 6.5 specification at 18.3.2.7.2:
> > > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> > > with HID PNP0C33.
> > > 
> > > The PNP0C33 device is used to report hardware errors to
> > > the bios via ACPI APEI Generic Hardware Error Source (GHES).
> > > 
> > > It is aligned with Linux Kernel patch:
> > > https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
> > > 
> > > [mchehab: use a define for the generic event pin number and do some cleanups]
> > > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>    
> >   
> > > ---
> > >  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
> > >  hw/arm/virt.c            | 14 ++++++++++++--
> > >  include/hw/arm/virt.h    |  1 +
> > >  include/hw/boards.h      |  1 +
> > >  4 files changed, 40 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > > index f76fb117adff..c502ccf40909 100644
> > > --- a/hw/arm/virt-acpi-build.c
> > > +++ b/hw/arm/virt-acpi-build.c
> > > @@ -63,6 +63,7 @@
> > >  
> > >  #define ARM_SPI_BASE 32
> > >  
> > > +#define ACPI_GENERIC_EVENT_DEVICE "GEDD"    
> > 
> > Ah. My mistake. This is the confusing named GENERIC_ERROR_DEVICE
> > or HARDWARE_ERROR_DEVICE (which is what Linux called it because
> > in the ACPI Spec it is just (all lower case) error device).  
> 
> I opted to use a different name there, using just error device,
> together with the name of the PNP device. So:
> 
> 	#define PNP0C33_ERROR_DEVICE "GEDD"
> 
> This is clear enough for people just looking at the driver, and
> even clearer for people familiar with session 18.3.2.7.2 of the
> ACPI spec.
> 
> >   
> > >  #define ACPI_BUILD_TABLE_SIZE             0x20000    
> >   
> > >  /* DSDT */
> > >  static void
> > >  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > > @@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > >                        HOTPLUG_HANDLER(vms->acpi_dev),
> > >                        irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
> > >                        memmap[VIRT_ACPI_GED].base);
> > > -    } else {
> > > -        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > > -                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> > >      }
> > > +    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > > +                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));    
> > 
> > Arguably excess brackets, but obviously this is just a code move
> > so fine to keep it the same.  
> 
> I'll drop the extra brackets.
> 
> > >  
> > >      if (vms->acpi_dev) {
> > >          uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
> > > @@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > >      }
> > >  
> > >      acpi_dsdt_add_power_button(scope);  
> 
> > > +    acpi_dsdt_add_generic_event_device(scope);  
> 
> I'm also renaming this function/function call to run away from GED,
> calling it as:
> 
> 	 acpi_dsdt_add_error_device(scope);
> 
> > >  #ifdef CONFIG_TPM
> > >      acpi_dsdt_add_tpm(scope, vms);
> > >  #endif    
> >   
> 
> Thanks,
> Mauro



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-30  8:11   ` Zhao Liu
@ 2024-07-31  5:21     ` Mauro Carvalho Chehab
  2024-07-31  8:00       ` Zhao Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-31  5:21 UTC (permalink / raw)
  To: Zhao Liu
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Eduardo Habkost,
	Igor Mammedov, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, linux-kernel, qemu-arm, qemu-devel

Em Tue, 30 Jul 2024 16:11:42 +0800
Zhao Liu <zhao1.liu@intel.com> escreveu:

> Hi Mauro,
> 
> On Mon, Jul 29, 2024 at 03:21:06PM +0200, Mauro Carvalho Chehab wrote:
> > Date: Mon, 29 Jul 2024 15:21:06 +0200
> > From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Subject: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
> > X-Mailer: git-send-email 2.45.2
> > 
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > Creates a Generic Event Device (GED) as specified at
> > ACPI 6.5 specification at 18.3.2.7.2:
> > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> > with HID PNP0C33.
> > 
> > The PNP0C33 device is used to report hardware errors to
> > the bios via ACPI APEI Generic Hardware Error Source (GHES).
> > 
> > It is aligned with Linux Kernel patch:
> > https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
> > 
> > [mchehab: use a define for the generic event pin number and do some cleanups]
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
> >  hw/arm/virt.c            | 14 ++++++++++++--
> >  include/hw/arm/virt.h    |  1 +
> >  include/hw/boards.h      |  1 +
> >  4 files changed, 40 insertions(+), 6 deletions(-)  
> 
> [snip]
> 
> > +static void virt_set_error(void)
> > +{
> > +    qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
> > +}
> > +  
> 
> [snip]
> 
> > +    mc->generic_error_device_notify = virt_set_error;  
> 
> [snip]
> 
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index 48ff6d8b93f7..991f99138e57 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -308,6 +308,7 @@ struct MachineClass {
> >      int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
> >      ram_addr_t (*fixup_ram_size)(ram_addr_t size);
> >      uint64_t smbios_memory_device_size;
> > +    void (*generic_error_device_notify)(void);  
> 
> The name looks inconsistent with the style of other MachineClass virtual
> methods. What about the name like "notify_xxx"? And pls add the comment
> about this new method.
> 
> BTW, I found this method is called in generic_error_device_notify() of
> Patch 6. And the mc->generic_error_device_notify() - as the virtual
> metchod of MachineClass looks just to implement a hook, and it doesn't
> seem to have anything to do with MachineClass/MachineState, so my
> question is why do we need to add this method to MachineClass?
> 
> Could we maintain a notifier list in ghes.c and expose an interface
> to allow arm code register a notifier? This eliminates the need to add
> the “notify” method to MachineClass.

Makes sense. I'll change the logic to use this notifier list code inside
ghes.c, and drop generic_error_device_notify():

	NotifierList generic_error_notifiers =
	    NOTIFIER_LIST_INITIALIZER(error_device_notifiers);

	/* Notify BIOS about an error via Generic Error Device - GED */
	static void generic_error_device_notify(void)
	{
	    notifier_list_notify(&generic_error_notifiers, NULL);
	}

Regards,
Mauro


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER
  2024-07-30  6:16     ` Mauro Carvalho Chehab
@ 2024-07-31  7:23       ` Markus Armbruster
  0 siblings, 0 replies; 16+ messages in thread
From: Markus Armbruster @ 2024-07-31  7:23 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> Em Mon, 29 Jul 2024 17:31:09 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
>
>> On Mon, 29 Jul 2024 15:21:10 +0200
>> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> ...
>
>> Markus suggested:
>> 
>> > A target-specific command like this one should be conditional.  Try
>> > this:
>> > 
>> >     { 'command': 'arm-inject-error',
>> >       'data': { 'errortypes': ['ArmProcessorErrorType'] },
>> >       'features': [ 'unstable' ],
>> >       'if': 'TARGET_ARM' }
>> >
>> > No need to provide a qmp_arm_inject_error() stub then.  
>> 
>> (I noticed because never knew you could do this.)
>> 
>> Probably crossed with your v4 posting.
>
> Tried it, but can't figure out how to properly set it up at meson.build,
> as it is basically producing build time errors during qapi file generation
> on non-ARM platforms. For instance:
>
> FAILED: libqemuutil.a.p/meson-generated_.._qapi_qapi-visit-arm-error-inject.c.o 

[...]

> In file included from qapi/qapi-visit-arm-error-inject.h:17,
>                  from qapi/qapi-visit-arm-error-inject.c:15:
> qapi/qapi-types-arm-error-inject.h:18:13: error: attempt to use poisoned "TARGET_ARM"
>    18 | #if defined(TARGET_ARM)
>       |             ^
> In file included from /new_devel/edac/qemu/include/exec/poison.h:7,
>                  from /new_devel/edac/qemu/include/qemu/osdep.h:38,
>                  from qapi/qapi-visit-arm-error-inject.c:13:
> ./config-poison.h:718:20: note: poisoned here
>
> Such error is created by two files generated from qapi, due
> to this change:
>
> 	diff --git a/qapi/meson.build b/qapi/meson.build
> 	index e7bc54e5d047..5927932c4be3 100644
> 	--- a/qapi/meson.build
> 	+++ b/qapi/meson.build
> 	@@ -24,2 +24,3 @@ endif
> 	 qapi_all_modules = [
> 	+  'arm-error-inject',
> 	   'authz',
>
> No idea how to fix it.

Uh, I neglected to point out an important detail.  Sorry about that!

The 'if' condition uses a symbol that is poisoned in target-independent
compiles.  Such conditions work only in target modules.  By convention,
these are named FOO-target.json.

So, you can either change new module's filename to end in -target.json,
or you stick the command into one of the two existing target modules,
machine-target.json and misc-target.json.

Speaking of modules: i'm not sure dedicating a module just to ARM error
injection is a good idea.  Perhaps we could have arm-target.json for
things that will only ever make sense on ARM.  Same for other targets.

But I recommend to first reach consensus on Igor's objection in review
of v3:

    Message-ID: <20240730131709.10e72c7d@imammedo.users.ipa.redhat.com>
    https://lore.kernel.org/qemu-devel/20240730131709.10e72c7d@imammedo.users.ipa.redhat.com/



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-31  5:21     ` Mauro Carvalho Chehab
@ 2024-07-31  8:00       ` Zhao Liu
  0 siblings, 0 replies; 16+ messages in thread
From: Zhao Liu @ 2024-07-31  8:00 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Eduardo Habkost,
	Igor Mammedov, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, linux-kernel, qemu-arm, qemu-devel

On Wed, Jul 31, 2024 at 07:21:58AM +0200, Mauro Carvalho Chehab wrote:

[snip]

> > The name looks inconsistent with the style of other MachineClass virtual
> > methods. What about the name like "notify_xxx"? And pls add the comment
> > about this new method.
> > 
> > BTW, I found this method is called in generic_error_device_notify() of
> > Patch 6. And the mc->generic_error_device_notify() - as the virtual
> > metchod of MachineClass looks just to implement a hook, and it doesn't
> > seem to have anything to do with MachineClass/MachineState, so my
> > question is why do we need to add this method to MachineClass?
> > 
> > Could we maintain a notifier list in ghes.c and expose an interface
> > to allow arm code register a notifier? This eliminates the need to add
> > the “notify” method to MachineClass.
> 
> Makes sense. I'll change the logic to use this notifier list code inside
> ghes.c, and drop generic_error_device_notify():
> 
> 	NotifierList generic_error_notifiers =
> 	    NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
> 
> 	/* Notify BIOS about an error via Generic Error Device - GED */
> 	static void generic_error_device_notify(void)
> 	{
> 	    notifier_list_notify(&generic_error_notifiers, NULL);
> 	}

Fine for me.

Regards,
Zhao




^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-07-31  7:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-29 13:21 [PATCH v4 0/6] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
2024-07-29 13:21 ` [PATCH v4 1/6] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
2024-07-29 13:21 ` [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
2024-07-29 16:08   ` Jonathan Cameron via
2024-07-30  5:13     ` Mauro Carvalho Chehab
2024-07-30  8:39       ` Jonathan Cameron via
2024-07-30  8:11   ` Zhao Liu
2024-07-31  5:21     ` Mauro Carvalho Chehab
2024-07-31  8:00       ` Zhao Liu
2024-07-29 13:21 ` [PATCH v4 3/6] target/arm: preserve mpidr value Mauro Carvalho Chehab
2024-07-29 13:21 ` [PATCH v4 4/6] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
2024-07-29 13:21 ` [PATCH v4 5/6] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
2024-07-29 13:21 ` [PATCH v4 6/6] acpi/ghes: Add a logic to inject ARM processor CPER Mauro Carvalho Chehab
2024-07-29 16:31   ` Jonathan Cameron via
2024-07-30  6:16     ` Mauro Carvalho Chehab
2024-07-31  7:23       ` Markus Armbruster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).