qemu-devel.nongnu.org archive mirror
* [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor
@ 2024-07-22  6:45 Mauro Carvalho Chehab
  2024-07-22  6:45 ` [PATCH v3 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
                   ` (6 more replies)
  0 siblings, 7 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Alex Bennée, Philippe Mathieu-Daudé, Ani Sinha,
	Beraldo Leal, Dongjiu Geng, Paolo Bonzini, Peter Maydell,
	Shannon Zhao, Thomas Huth, Wainer dos Santos Moschetta,
	Yanan Wang, qemu-arm, qemu-devel

Testing OS kernel ACPI APEI CPER support is tricky, as it depends on
having special-purpose hardware and/or a special-purpose BIOS.

With QEMU, it becomes a lot easier, as errors can be injected via QMP.

This series adds support for ARM Processor CPER error injection,
according to the ACPI 6.x and UEFI 2.9A/2.10 specs.

This series consists of:

- one patch using a define for ARM virt GPIO power pin
  (requested during last review);
- three patches from Jonathan (one coauthored with Shiju) with basic
  EINJ features, already submitted as RFC (but not merged yet) at:
    https://lore.kernel.org/qemu-devel/20240628090605.529-1-shiju.jose@huawei.com/
- three patches from me extending it to optionally allow generating
  all sorts of valid combinations of fields for an
  ARM Processor CPER record.

I've been using it to test a Linux Kernel patch series fixing
UEFI 2.9A errata and ARM processor trace event:
   https://lore.kernel.org/linux-edac/3853853f820a666253ca8ed6c7c724dc3d50044a.1720679234.git.mchehab+huawei@kernel.org/T/#t

I also wrote some Wiki pages for rasdaemon (a Linux daemon
widely used to monitor and react to RAS events):
   https://github.com/mchehab/rasdaemon/wiki/error-injection

This is really helpful for testing the Linux kernel's behavior when
firmware-first RAS events for the ARM processor arrive there, and
for validating how the CPER and GHES drivers handle them
(as well as for testing userspace apps like rasdaemon).

Sending this command to QMP:
    { "execute": "qmp_capabilities" } 
    { "execute": "arm-inject-error", "arguments": {"error": [{"type": ["cache-error"]}]} }

Produces a simple CPER record, properly handled by the Linux
kernel:

[  839.952678] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  839.953145] {4}[Hardware Error]: event severity: recoverable
[  839.953451] {4}[Hardware Error]:  Error 0, type: recoverable
[  839.953763] {4}[Hardware Error]:   section_type: ARM processor error
[  839.954094] {4}[Hardware Error]:   MIDR: 0x0000000000000000
[  839.954383] {4}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[  839.954802] {4}[Hardware Error]:   running state: 0x0
[  839.955066] {4}[Hardware Error]:   Power State Coordination Interface state: 0
[  839.955424] {4}[Hardware Error]:   Error info structure 0:
[  839.955712] {4}[Hardware Error]:   num errors: 1
[  839.955983] {4}[Hardware Error]:    first error captured
[  839.956260] {4}[Hardware Error]:    propagated error captured
[  839.956561] {4}[Hardware Error]:    error_type: 0x02: cache error
[  839.956882] {4}[Hardware Error]:    error_info: 0x000000000054007f
[  839.957192] {4}[Hardware Error]:     transaction type: Instruction
[  839.957495] {4}[Hardware Error]:     cache error, operation type: Instruction fetch
[  839.957888] {4}[Hardware Error]:     cache level: 1
[  839.958166] {4}[Hardware Error]:     processor context not corrupted
[  839.958459] {4}[Hardware Error]:     the error has not been corrected
[  839.958771] {4}[Hardware Error]:     PC is imprecise
[  839.959074] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
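
For readers who want to cross-check the "error_type:" lines, here is a
minimal sketch of how the type bitmask maps to the names the kernel
prints. The bit values and names are taken from the logs in this cover
letter (0x02, 0x04, 0x08, 0x10); the macro and function names are
hypothetical and are not part of this series or of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Error type bits, as seen in the "error_type:" lines of the logs */
#define ARM_ERR_CACHE       0x02
#define ARM_ERR_TLB         0x04
#define ARM_ERR_BUS         0x08
#define ARM_ERR_MICRO_ARCH  0x10

/*
 * Write the names of all bits set in @type into @buf, separated by '|'.
 * A name that does not fit is dropped. Returns the resulting string length.
 * (Hypothetical helper, for illustration only.)
 */
static size_t arm_err_type_to_str(uint8_t type, char *buf, size_t len)
{
    static const struct {
        uint8_t bit;
        const char *name;
    } names[] = {
        { ARM_ERR_CACHE,      "cache error" },
        { ARM_ERR_TLB,        "TLB error" },
        { ARM_ERR_BUS,        "bus error" },
        { ARM_ERR_MICRO_ARCH, "micro-architectural error" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        int n;

        if (!(type & names[i].bit)) {
            continue;
        }
        n = snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Calling arm_err_type_to_str(0x02, buf, sizeof(buf)) leaves "cache error"
in buf, which can be compared against the log with strcmp().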

rasdaemon output (rasdaemon still needs to be patched for
UEFI 2.9A errata):

           <...>-211   [002] d..1.     0.000129 arm_event 2024-07-11 09:50:45 +0000 affinity: -1 MPIDR: 0x80000000 MIDR: 0x0 running_state: 0 psci_state: 0 ARM Processor Err Info data len: 32
<CANT FIND FIELD buf>cpu: 0; error: 2; affinity level: 255; MPIDR: 0000000080000000; MIDR: 0000000000000000; running state: 0; PSCI state: 0; ARM Processor Err Info data len: 32; ARM Processor Err Info raw data: 00 20 06 00 02 00 00 05 7f 00 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00; ARM Processor Err Context Info data len: 0; ARM Processor Err Context Info raw data: ; Vendor Specific Err Info data len: 0; Vendor Specific Err Info raw data: 

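The 32 bytes of "ARM Processor Err Info raw data" reported by rasdaemon
can be decoded by hand. The sketch below assumes the field layout of the
ARM Processor Error Information structure from the UEFI spec (version,
length, validation bits, type, multiple error, flags, error information,
virtual and physical fault addresses, all little-endian). The struct and
function names are hypothetical, not taken from QEMU, the kernel or
rasdaemon.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARM_ERR_INFO_LEN 32

/* Decoded view of one Error Information structure (illustrative only) */
struct arm_err_info {
    uint8_t  version;
    uint8_t  length;
    uint16_t validation;
    uint8_t  type;
    uint16_t multiple_error;
    uint8_t  flags;
    uint64_t error_info;
    uint64_t virt_addr;
    uint64_t phys_addr;
};

/* The raw bytes from the rasdaemon output; the trailing bytes are zero */
static const uint8_t sample_err_info[ARM_ERR_INFO_LEN] = {
    0x00, 0x20, 0x06, 0x00, 0x02, 0x00, 0x00, 0x05,
    0x7f, 0x00, 0x54, 0x00,
};

/* Read an @n byte little-endian unsigned integer starting at @p */
static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    while (n--) {
        v = (v << 8) | p[n];
    }
    return v;
}

/* Split a 32-byte raw buffer into its fields */
static void arm_err_info_parse(const uint8_t *raw, struct arm_err_info *out)
{
    assert(raw != NULL && out != NULL);

    out->version        = raw[0];
    out->length         = raw[1];
    out->validation     = (uint16_t)get_le(raw + 2, 2);
    out->type           = raw[4];
    out->multiple_error = (uint16_t)get_le(raw + 5, 2);
    out->flags          = raw[7];
    out->error_info     = get_le(raw + 8, 8);
    out->virt_addr      = get_le(raw + 16, 8);
    out->phys_addr      = get_le(raw + 24, 8);
}
```

For the sample bytes this gives type 0x02 and error_info 0x54007f, which
agrees with the "error_type: 0x02: cache error" and "error_info:
0x000000000054007f" lines in the kernel log for the same injection.
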
More complex events with multiple Processor Error Information structures
can be produced like:

    { "execute": "arm-inject-error", "arguments": {
        "validation": ["mpidr-valid", "affinity-valid", "running-state-valid", "vendor-specific-valid"],
        "running-state": [], "psci-state": 1229279264,
        "error": [
            {"validation": ["multiple-error-valid", "flags-valid"],
             "type": ["tlb-error", "bus-error", "micro-arch-error"],
             "multiple-error": 3, "phy-addr": 57005, "virt-addr": 48879},
            {"type": ["micro-arch-error"]},
            {"type": ["tlb-error"]},
            {"type": ["bus-error"]},
            {"type": ["cache-error"]}],
        "context": [{"register": [57005, 48879, 43962, 47787]}],
        "vendor-specific": [12, 23, 53, 52, 3, 123, 243, 255]} }

[  925.340284] {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[  925.340662] {5}[Hardware Error]: event severity: recoverable
[  925.340924] {5}[Hardware Error]:  Error 0, type: recoverable
[  925.341280] {5}[Hardware Error]:   section_type: ARM processor error
[  925.341631] {5}[Hardware Error]:   MIDR: 0x0000000000000000
[  925.341893] {5}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
[  925.342278] {5}[Hardware Error]:   error affinity level: 0
[  925.342571] {5}[Hardware Error]:   running state: 0x0
[  925.342835] {5}[Hardware Error]:   Power State Coordination Interface state: 1229279264
[  925.343157] {5}[Hardware Error]:   Error info structure 0:
[  925.343388] {5}[Hardware Error]:   num errors: 4
[  925.343602] {5}[Hardware Error]:    error_type: 0x1c: TLB error|bus error|micro-architectural error
[  925.343960] {5}[Hardware Error]:    virtual fault address: 0x000000000000beef
[  925.344241] {5}[Hardware Error]:    physical fault address: 0x000000000000dead
[  925.344526] {5}[Hardware Error]:   Error info structure 1:
[  925.344757] {5}[Hardware Error]:   num errors: 1
[  925.344965] {5}[Hardware Error]:    first error captured
[  925.345183] {5}[Hardware Error]:    propagated error captured
[  925.345416] {5}[Hardware Error]:    error_type: 0x10: micro-architectural error
[  925.345714] {5}[Hardware Error]:   Error info structure 2:
[  925.345946] {5}[Hardware Error]:   num errors: 1
[  925.346148] {5}[Hardware Error]:    first error captured
[  925.346413] {5}[Hardware Error]:    propagated error captured
[  925.346719] {5}[Hardware Error]:    error_type: 0x04: TLB error
[  925.346988] {5}[Hardware Error]:    error_info: 0x00000080d6460fff
[  925.347248] {5}[Hardware Error]:     transaction type: Generic
[  925.347492] {5}[Hardware Error]:     TLB error, operation type: Generic read (type of instruction or data request cannot be determined)
[  925.347945] {5}[Hardware Error]:     TLB level: 1
[  925.348153] {5}[Hardware Error]:     processor context corrupted
[  925.348392] {5}[Hardware Error]:     the error has been corrected
[  925.348635] {5}[Hardware Error]:     PC is imprecise
[  925.348848] {5}[Hardware Error]:     Program execution can be restarted reliably at the PC associated with the error.
[  925.349232] {5}[Hardware Error]:   Error info structure 3:
[  925.349459] {5}[Hardware Error]:   num errors: 1
[  925.349662] {5}[Hardware Error]:    first error captured
[  925.349884] {5}[Hardware Error]:    propagated error captured
[  925.350115] {5}[Hardware Error]:    error_type: 0x08: bus error
[  925.350371] {5}[Hardware Error]:    error_info: 0x0000000078da03ff
[  925.350629] {5}[Hardware Error]:     transaction type: Generic
[  925.350878] {5}[Hardware Error]:     bus error, operation type: Prefetch
[  925.351144] {5}[Hardware Error]:     affinity level at which the bus error occurred: 3
[  925.351451] {5}[Hardware Error]:     processor context not corrupted
[  925.351702] {5}[Hardware Error]:     the error has not been corrected
[  925.351960] {5}[Hardware Error]:     PC is precise
[  925.352164] {5}[Hardware Error]:     Program execution can be restarted reliably at the PC associated with the error.
[  925.352546] {5}[Hardware Error]:     participation type: Generic
[  925.352801] {5}[Hardware Error]:     address space: External Memory Access
[  925.353071] {5}[Hardware Error]:   Error info structure 4:
[  925.353299] {5}[Hardware Error]:   num errors: 1
[  925.353502] {5}[Hardware Error]:    first error captured
[  925.353720] {5}[Hardware Error]:    propagated error captured
[  925.353963] {5}[Hardware Error]:    error_type: 0x02: cache error
[  925.354222] {5}[Hardware Error]:    error_info: 0x000000000054007f
[  925.354478] {5}[Hardware Error]:     transaction type: Instruction
[  925.354782] {5}[Hardware Error]:     cache error, operation type: Instruction fetch
[  925.355203] {5}[Hardware Error]:     cache level: 1
[  925.355495] {5}[Hardware Error]:     processor context not corrupted
[  925.355848] {5}[Hardware Error]:     the error has not been corrected
[  925.356206] {5}[Hardware Error]:     PC is imprecise
[  925.356493] {5}[Hardware Error]:   Context info structure 0:
[  925.356809] {5}[Hardware Error]:    register context type: AArch64 EL1 context registers
[  925.357282] {5}[Hardware Error]:    00000000: 0000dead 00000000 0000beef 00000000
[  925.357800] {5}[Hardware Error]:    00000010: 0000abba 00000000 0000baab 00000000
[  925.358267] {5}[Hardware Error]:    00000020: 00000000 00000000
[  925.358523] {5}[Hardware Error]:   Vendor specific error info has 8 bytes:
[  925.358822] {5}[Hardware Error]:    00000000: 3435170c fff37b03                    ..54.{..
[  925.359192] [Firmware Warn]: GHES: Unhandled processor error type 0x1c: TLB error|bus error|micro-architectural error
[  925.359590] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[  925.359935] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[  925.360235] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error
[  925.360534] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
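
To relate the QMP arguments to the hex dumps in the log: the decimal
values 57005, 48879, 43962 and 47787 are 0xdead, 0xbeef, 0xabba and
0xbaab, and the dump lines appear to group the data into little-endian
32-bit words. A minimal sketch of that grouping, applied to the
"vendor-specific" byte list, follows. The helper name is hypothetical
and the little-endian grouping is an assumption inferred from the log
output, not something stated by the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "vendor-specific" bytes from the QMP command */
static const uint8_t vendor_bytes[8] = { 12, 23, 53, 52, 3, 123, 243, 255 };

/*
 * Pack @n bytes (@n must be a multiple of 4) into little-endian
 * 32-bit words. Returns the number of words written to @out.
 */
static size_t bytes_to_le32(const uint8_t *in, size_t n, uint32_t *out)
{
    assert(in != NULL && out != NULL && n % 4 == 0);

    for (size_t i = 0; i < n / 4; i++) {
        out[i] = (uint32_t)in[4 * i] |
                 ((uint32_t)in[4 * i + 1] << 8) |
                 ((uint32_t)in[4 * i + 2] << 16) |
                 ((uint32_t)in[4 * i + 3] << 24);
    }
    return n / 4;
}
```

With vendor_bytes as input, the two resulting words are 0x3435170c and
0xfff37b03, matching the "Vendor specific error info" dump line.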

---

v3:
- patch 1 cleanups with some comment changes and adding another place where
  the poweroff GPIO define should be used. No changes on other patches (except
  due to conflict resolution).

v2:
- added a new patch using a define for GPIO power pin;
- patch 2 changed to also use a define for generic error GPIO pin;
- a couple of cleanups in patch 2 removing unneeded else clauses.

Jonathan Cameron (3):
  arm/virt: Wire up GPIO error source for ACPI / GHES
  acpi/ghes: Support GPIO error source.
  acpi/ghes: Add a logic to handle block addresses and FW first ARM
    processor error injection

Mauro Carvalho Chehab (4):
  arm/virt: place power button pin number on a define
  target/arm: preserve mpidr value
  acpi/ghes: update comments to point to newer ACPI specs
  acpi/ghes: extend arm error injection logic

 configs/targets/aarch64-softmmu.mak |   1 +
 hw/acpi/ghes.c                      | 324 ++++++++++++++++++---
 hw/arm/Kconfig                      |   4 +
 hw/arm/arm_error_inject.c           | 420 ++++++++++++++++++++++++++++
 hw/arm/arm_error_inject_stubs.c     |  34 +++
 hw/arm/meson.build                  |   3 +
 hw/arm/virt-acpi-build.c            |  34 ++-
 hw/arm/virt.c                       |  21 +-
 include/hw/acpi/ghes.h              |  41 +++
 include/hw/arm/virt.h               |   4 +
 include/hw/boards.h                 |   1 +
 qapi/arm-error-inject.json          | 277 ++++++++++++++++++
 qapi/meson.build                    |   1 +
 qapi/qapi-schema.json               |   1 +
 target/arm/cpu.h                    |   1 +
 target/arm/helper.c                 |  10 +-
 tests/lcitool/libvirt-ci            |   2 +-
 17 files changed, 1132 insertions(+), 47 deletions(-)
 create mode 100644 hw/arm/arm_error_inject.c
 create mode 100644 hw/arm/arm_error_inject_stubs.c
 create mode 100644 qapi/arm-error-inject.json

-- 
2.45.2





* [PATCH v3 1/7] arm/virt: place power button pin number on a define
  2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
@ 2024-07-22  6:45 ` Mauro Carvalho Chehab
  2024-07-30  7:25   ` Igor Mammedov
  2024-07-22  6:45 ` [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Igor Mammedov, Peter Maydell,
	Shannon Zhao, linux-kernel, qemu-arm, qemu-devel

Having magic numbers inside the code is not a good idea, as it
is error-prone. So, instead, create a macro with the number
definition.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
 hw/arm/virt-acpi-build.c | 6 +++---
 hw/arm/virt.c            | 7 ++++---
 include/hw/arm/virt.h    | 3 +++
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e10cad86dd73..f76fb117adff 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -154,10 +154,10 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
     aml_append(dev, aml_name_decl("_CRS", crs));
 
     Aml *aei = aml_resource_template();
-    /* Pin 3 for power button */
-    const uint32_t pin_list[1] = {3};
+
+    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
     aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
-                                 AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
+                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
                                  "GPO0", NULL, 0));
     aml_append(dev, aml_name_decl("_AEI", aei));
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0c68d66a345..c99c8b1713c6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
     if (s->acpi_dev) {
         acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
     } else {
-        /* use gpio Pin 3 for power button event */
+        /* use gpio Pin for power button event */
         qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
     }
 }
@@ -1013,7 +1013,8 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
                              uint32_t phandle)
 {
     gpio_key_dev = sysbus_create_simple("gpio-key", -1,
-                                        qdev_get_gpio_in(pl061_dev, 3));
+                                        qdev_get_gpio_in(pl061_dev,
+                                                         GPIO_PIN_POWER_BUTTON));
 
     qemu_fdt_add_subnode(fdt, "/gpio-keys");
     qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
@@ -1024,7 +1025,7 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
     qemu_fdt_setprop_cell(fdt, "/gpio-keys/poweroff", "linux,code",
                           KEY_POWER);
     qemu_fdt_setprop_cells(fdt, "/gpio-keys/poweroff",
-                           "gpios", phandle, 3, 0);
+                           "gpios", phandle, GPIO_PIN_POWER_BUTTON, 0);
 }
 
 #define SECURE_GPIO_POWEROFF 0
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ab961bb6a9b8..a4d937ed45ac 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -47,6 +47,9 @@
 /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
 #define PVTIME_SIZE_PER_CPU 64
 
+/* GPIO pins */
+#define GPIO_PIN_POWER_BUTTON  3
+
 enum {
     VIRT_FLASH,
     VIRT_MEM,
-- 
2.45.2




* [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
  2024-07-22  6:45 ` [PATCH v3 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
@ 2024-07-22  6:45 ` Mauro Carvalho Chehab
  2024-07-26 12:30   ` Jonathan Cameron via
  2024-07-30  8:36   ` Igor Mammedov
  2024-07-22  6:45 ` [PATCH v3 3/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Eduardo Habkost,
	Igor Mammedov, Marcel Apfelbaum, Peter Maydell, Shannon Zhao,
	Yanan Wang, linux-kernel, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Create a GED (Generic Event Device) and set up a GPIO to
be used for error injection.

[mchehab: use a define for the generic event pin number and do some cleanups]
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
 hw/arm/virt.c            | 14 ++++++++++++--
 include/hw/arm/virt.h    |  1 +
 include/hw/boards.h      |  1 +
 4 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f76fb117adff..c502ccf40909 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -63,6 +63,7 @@
 
 #define ARM_SPI_BASE 32
 
+#define ACPI_GENERIC_EVENT_DEVICE "GEDD"
 #define ACPI_BUILD_TABLE_SIZE             0x20000
 
 static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
@@ -142,6 +143,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
 static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
                                            uint32_t gpio_irq)
 {
+    uint32_t pin;
+
     Aml *dev = aml_device("GPO0");
     aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0061")));
     aml_append(dev, aml_name_decl("_UID", aml_int(0)));
@@ -155,7 +158,12 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
 
     Aml *aei = aml_resource_template();
 
-    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
+    pin = GPIO_PIN_POWER_BUTTON;
+    aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
+                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
+                                 "GPO0", NULL, 0));
+    /* Pin for generic error */
+    pin = GPIO_PIN_GENERIC_ERROR;
     aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
                                  AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
                                  "GPO0", NULL, 0));
@@ -166,6 +174,11 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
     aml_append(method, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
                                   aml_int(0x80)));
     aml_append(dev, method);
+    method = aml_method("_E06", 0, AML_NOTSERIALIZED);
+    aml_append(method, aml_notify(aml_name(ACPI_GENERIC_EVENT_DEVICE),
+                                  aml_int(0x80)));
+    aml_append(dev, method);
+
     aml_append(scope, dev);
 }
 
@@ -800,6 +813,15 @@ static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
     build_fadt(table_data, linker, &fadt, vms->oem_id, vms->oem_table_id);
 }
 
+static void acpi_dsdt_add_generic_event_device(Aml *scope)
+{
+    Aml *dev = aml_device(ACPI_GENERIC_EVENT_DEVICE);
+    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
+    aml_append(scope, dev);
+}
+
 /* DSDT */
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
                       HOTPLUG_HANDLER(vms->acpi_dev),
                       irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
                       memmap[VIRT_ACPI_GED].base);
-    } else {
-        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
-                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
     }
+    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
+                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
 
     if (vms->acpi_dev) {
         uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
@@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     }
 
     acpi_dsdt_add_power_button(scope);
+    acpi_dsdt_add_generic_event_device(scope);
 #ifdef CONFIG_TPM
     acpi_dsdt_add_tpm(scope, vms);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c99c8b1713c6..f81cf3a69961 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -997,6 +997,13 @@ static void create_rtc(const VirtMachineState *vms)
 }
 
 static DeviceState *gpio_key_dev;
+
+static DeviceState *gpio_error_dev;
+static void virt_set_error(void)
+{
+    qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
+}
+
 static void virt_powerdown_req(Notifier *n, void *opaque)
 {
     VirtMachineState *s = container_of(n, VirtMachineState, powerdown_notifier);
@@ -1015,6 +1022,9 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
     gpio_key_dev = sysbus_create_simple("gpio-key", -1,
                                         qdev_get_gpio_in(pl061_dev,
                                                          GPIO_PIN_POWER_BUTTON));
+    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
+                                          qdev_get_gpio_in(pl061_dev,
+                                                           GPIO_PIN_GENERIC_ERROR));
 
     qemu_fdt_add_subnode(fdt, "/gpio-keys");
     qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
@@ -2385,9 +2395,8 @@ static void machvirt_init(MachineState *machine)
 
     if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
         vms->acpi_dev = create_acpi_ged(vms);
-    } else {
-        create_gpio_devices(vms, VIRT_GPIO, sysmem);
     }
+    create_gpio_devices(vms, VIRT_GPIO, sysmem);
 
     if (vms->secure && !vmc->no_secure_gpio) {
         create_gpio_devices(vms, VIRT_SECURE_GPIO, secure_sysmem);
@@ -3101,6 +3110,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     mc->default_ram_id = "mach-virt.ram";
     mc->default_nic = "virtio-net-pci";
 
+    mc->set_error = virt_set_error;
     object_class_property_add(oc, "acpi", "OnOffAuto",
         virt_get_acpi, virt_set_acpi,
         NULL, NULL);
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a4d937ed45ac..c9769d7d4d7f 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -49,6 +49,7 @@
 
 /* GPIO pins */
 #define GPIO_PIN_POWER_BUTTON  3
+#define GPIO_PIN_GENERIC_ERROR 6
 
 enum {
     VIRT_FLASH,
diff --git a/include/hw/boards.h b/include/hw/boards.h
index ef6f18f2c1a7..6cf01f3934ae 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -304,6 +304,7 @@ struct MachineClass {
     const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
     int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
     ram_addr_t (*fixup_ram_size)(ram_addr_t size);
+    void (*set_error)(void);
 };
 
 /**
-- 
2.45.2




* [PATCH v3 3/7] acpi/ghes: Support GPIO error source.
  2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
  2024-07-22  6:45 ` [PATCH v3 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
  2024-07-22  6:45 ` [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
@ 2024-07-22  6:45 ` Mauro Carvalho Chehab
  2024-07-30  8:40   ` Igor Mammedov
  2024-07-22  6:45 ` [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Igor Mammedov, linux-kernel, qemu-arm, qemu-devel,
	Mauro Carvalho Chehab

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Add error notification to GHES v2 using the GPIO source.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c         | 8 ++++++--
 include/hw/acpi/ghes.h | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..5b8bc6eeb437 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -34,8 +34,8 @@
 /* The max size in bytes for one error block */
 #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
 
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT        1
+/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
+#define ACPI_GHES_ERROR_SOURCE_COUNT        2
 
 /* Generic Hardware Error Source version 2 */
 #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
@@ -327,6 +327,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
          */
         build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
         break;
+    case ACPI_HEST_SRC_ID_GPIO:
+        build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
+        break;
     default:
         error_report("Not support this error source");
         abort();
@@ -370,6 +373,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
     /* Error Source Count */
     build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
     build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+    build_ghes_v2(table_data, ACPI_HEST_SRC_ID_GPIO, linker);
 
     acpi_table_end(linker, &table);
 }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..4f1ab1a73a06 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
 
 enum {
     ACPI_HEST_SRC_ID_SEA = 0,
+    ACPI_HEST_SRC_ID_GPIO = 1,
     /* future ids go here */
     ACPI_HEST_SRC_ID_RESERVED,
 };
-- 
2.45.2




* [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2024-07-22  6:45 ` [PATCH v3 3/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
@ 2024-07-22  6:45 ` Mauro Carvalho Chehab
  2024-07-25  9:48   ` Markus Armbruster
                     ` (2 more replies)
  2024-07-22  6:45 ` [PATCH v3 5/7] target/arm: preserve mpidr value Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Markus Armbruster,
	Michael Roth, Paolo Bonzini, Peter Maydell, linux-kernel,
	qemu-arm, qemu-devel, Mauro Carvalho Chehab

From: Jonathan Cameron <Jonathan.Cameron@huawei.com>

1. Some GHES functions require handling addresses. Add a helper function
   to support it.

2. Add support for ACPI CPER (firmware-first) ARM processor error injection.

The implementation complies with section N.2.4.4 (ARM Processor Error
Section) of UEFI 2.6 and later specs, using the error type bit encoding
detailed in the UEFI 2.9A errata.

Error injection examples:

{ "execute": "qmp_capabilities" }

{ "execute": "arm-inject-error",
      "arguments": {
        "errortypes": ["cache-error"]
      }
}

{ "execute": "arm-inject-error",
      "arguments": {
        "errortypes": ["tlb-error"]
      }
}

{ "execute": "arm-inject-error",
      "arguments": {
        "errortypes": ["bus-error"]
      }
}

{ "execute": "arm-inject-error",
      "arguments": {
        "errortypes": ["cache-error", "tlb-error"]
      }
}

{ "execute": "arm-inject-error",
      "arguments": {
        "errortypes": ["cache-error", "tlb-error", "bus-error", "micro-arch-error"]
      }
}
...

Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
For Add a logic to handle block addresses,
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
For FW first ARM processor error injection,
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
---
 configs/targets/aarch64-softmmu.mak |   1 +
 hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
 hw/arm/Kconfig                      |   4 +
 hw/arm/arm_error_inject.c           |  35 ++++
 hw/arm/arm_error_inject_stubs.c     |  18 ++
 hw/arm/meson.build                  |   3 +
 include/hw/acpi/ghes.h              |   2 +
 qapi/arm-error-inject.json          |  49 ++++++
 qapi/meson.build                    |   1 +
 qapi/qapi-schema.json               |   1 +
 10 files changed, 361 insertions(+), 11 deletions(-)
 create mode 100644 hw/arm/arm_error_inject.c
 create mode 100644 hw/arm/arm_error_inject_stubs.c
 create mode 100644 qapi/arm-error-inject.json

diff --git a/configs/targets/aarch64-softmmu.mak b/configs/targets/aarch64-softmmu.mak
index 84cb32dc2f4f..b4b3cd97934a 100644
--- a/configs/targets/aarch64-softmmu.mak
+++ b/configs/targets/aarch64-softmmu.mak
@@ -5,3 +5,4 @@ TARGET_KVM_HAVE_GUEST_DEBUG=y
 TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml gdb-xml/aarch64-pauth.xml
 # needed by boot.c
 TARGET_NEED_FDT=y
+CONFIG_ARM_EINJ=y
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 5b8bc6eeb437..6075ef5893ce 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -27,6 +27,7 @@
 #include "hw/acpi/generic_event_device.h"
 #include "hw/nvram/fw_cfg.h"
 #include "qemu/uuid.h"
+#include "qapi/qapi-types-arm-error-inject.h"
 
 #define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
 #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
@@ -53,6 +54,12 @@
 /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
 #define ACPI_GHES_MEM_CPER_LENGTH           80
 
+/*
+ * ARM Processor section CPER size, UEFI 2.10: N.2.4.4
+ * ARM Processor Error Section
+ */
+#define ACPI_GHES_ARM_CPER_LENGTH (72 + 600)
+
 /* Masks for block_status flags */
 #define ACPI_GEBS_UNCORRECTABLE         1
 
@@ -231,6 +238,142 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
     return 0;
 }
 
+/* UEFI 2.9: N.2.4.4 ARM Processor Error Section */
+static void acpi_ghes_build_append_arm_cper(uint8_t error_types, GArray *table)
+{
+    /*
+     * ARM Processor Error Record
+     */
+
+    /* Validation Bits */
+    build_append_int_noprefix(table,
+                              (1ULL << 3) | /* Vendor specific info Valid */
+                              (1ULL << 2) | /* Running status Valid */
+                              (1ULL << 1) | /* Error affinity level Valid */
+                              (1ULL << 0), /* MPIDR Valid */
+                              4);
+    /* Error Info Num */
+    build_append_int_noprefix(table, 1, 2);
+    /* Context Info Num */
+    build_append_int_noprefix(table, 1, 2);
+    /* Section length */
+    build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_LENGTH, 4);
+    /* Error affinity level */
+    build_append_int_noprefix(table, 2, 1);
+    /* Reserved */
+    build_append_int_noprefix(table, 0, 3);
+    /* MPIDR_EL1 */
+    build_append_int_noprefix(table, 0xAB12, 8);
+    /* MIDR_EL1 */
+    build_append_int_noprefix(table, 0xCD24, 8);
+    /* Running state */
+    build_append_int_noprefix(table, 0x1, 4);
+    /* PSCI state */
+    build_append_int_noprefix(table, 0x1234, 4);
+
+    /* ARM Propcessor error information */
+    /* Version */
+    build_append_int_noprefix(table, 0, 1);
+    /*  Length */
+    build_append_int_noprefix(table, 32, 1);
+    /* Validation Bits */
+    build_append_int_noprefix(table,
+                              (1ULL << 4) | /* Physical fault address Valid */
+                             (1ULL << 3) | /* Virtual fault address Valid */
+                             (1ULL << 2) | /* Error information Valid */
+                              (1ULL << 1) | /* Flags Valid */
+                              (1ULL << 0), /* Multiple error count Valid */
+                              2);
+    /* Type */
+    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR) ||
+        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR) ||
+        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR) ||
+        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
+        build_append_int_noprefix(table, error_types, 1);
+    } else {
+        return;
+    }
+    /* Multiple error count */
+    build_append_int_noprefix(table, 2, 2);
+    /* Flags  */
+    build_append_int_noprefix(table, 0xD, 1);
+    /* Error information  */
+    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR)) {
+        build_append_int_noprefix(table, 0x0091000F, 8);
+    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR)) {
+        build_append_int_noprefix(table, 0x0054007F, 8);
+    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR)) {
+        build_append_int_noprefix(table, 0x80D6460FFF, 8);
+    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
+        build_append_int_noprefix(table, 0x78DA03FF, 8);
+    } else {
+        return;
+    }
+    /* Virtual fault address  */
+    build_append_int_noprefix(table, 0x67320230, 8);
+    /* Physical fault address  */
+    build_append_int_noprefix(table, 0x5CDFD492, 8);
+
+    /* ARM Propcessor error context information */
+    /* Version */
+    build_append_int_noprefix(table, 0, 2);
+    /* Validation Bits */
+    /* AArch64 EL1 context registers Valid */
+    build_append_int_noprefix(table, 5, 2);
+    /* Register array size */
+    build_append_int_noprefix(table, 592, 4);
+    /* Register array */
+    build_append_int_noprefix(table, 0x12ABDE67, 8);
+}
+
+static int acpi_ghes_record_arm_error(uint8_t error_types,
+                                      uint64_t error_block_address)
+{
+    GArray *block;
+
+    /* ARM processor Error Section Type */
+    const uint8_t uefi_cper_arm_sec[] =
+          UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05, \
+                  0x1D, 0x5D, 0x46, 0xB0);
+
+    /*
+     * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
+     * Table 17-13 Generic Error Data Entry
+     */
+    QemuUUID fru_id = {};
+    uint32_t data_length;
+
+    block = g_array_new(false, true /* clear */, 1);
+
+    /* This is the length if adding a new generic error data entry*/
+    data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_ARM_CPER_LENGTH;
+    /*
+     * It should not run out of the preallocated memory if adding a new generic
+     * error data entry
+     */
+    assert((data_length + ACPI_GHES_GESB_SIZE) <=
+            ACPI_GHES_MAX_RAW_DATA_LENGTH);
+
+    /* Build the new generic error status block header */
+    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
+        0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
+
+    /* Build this new generic error data entry header */
+    acpi_ghes_generic_error_data(block, uefi_cper_arm_sec,
+        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
+        ACPI_GHES_ARM_CPER_LENGTH, fru_id, 0);
+
+    /* Build the ARM processor error section CPER */
+    acpi_ghes_build_append_arm_cper(error_types, block);
+
+    /* Write the generic error data entry into guest memory */
+    cpu_physical_memory_write(error_block_address, block->data, block->len);
+
+    g_array_free(block, true);
+
+    return 0;
+}
+
 /*
  * Build table for the hardware error fw_cfg blob.
  * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
@@ -392,23 +535,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
     ags->present = true;
 }
 
+static uint64_t ghes_get_state_start_address(void)
+{
+    AcpiGedState *acpi_ged_state =
+        ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
+    AcpiGhesState *ags = &acpi_ged_state->ghes_state;
+
+    return le64_to_cpu(ags->ghes_addr_le);
+}
+
 int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
 {
     uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
-    uint64_t start_addr;
+    uint64_t start_addr = ghes_get_state_start_address();
     bool ret = -1;
-    AcpiGedState *acpi_ged_state;
-    AcpiGhesState *ags;
-
     assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
 
-    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
-                                                       NULL));
-    g_assert(acpi_ged_state);
-    ags = &acpi_ged_state->ghes_state;
-
-    start_addr = le64_to_cpu(ags->ghes_addr_le);
-
     if (physical_address) {
 
         if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
@@ -448,6 +590,100 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
     return ret;
 }
 
+/*
+ * Error register block data layout
+ *
+ * | +---------------------+ ges.ghes_addr_le
+ * | |error_block_address0 |
+ * | +---------------------+
+ * | |error_block_address1 |
+ * | +---------------------+ --+--
+ * | |    .............    | GHES_ADDRESS_SIZE
+ * | +---------------------+ --+--
+ * | |error_block_addressN |
+ * | +---------------------+
+ * | | read_ack_register0  |
+ * | +---------------------+ --+--
+ * | | read_ack_register1  | GHES_ADDRESS_SIZE
+ * | +---------------------+ --+--
+ * | |   .............     |
+ * | +---------------------+
+ * | | read_ack_registerN  |
+ * | +---------------------+ --+--
+ * | |      CPER           |   |
+ * | |      ....           | GHES_MAX_RAW_DATA_LENGT
+ * | |      CPER           |   |
+ * | +---------------------+ --+--
+ * | |    ..........       |
+ * | +---------------------+
+ * | |      CPER           |
+ * | |      ....           |
+ * | |      CPER           |
+ * | +---------------------+
+ */
+
+/* Map from uint32_t notify to entry offset in GHES */
+static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
+                                                 0xff, 0xff, 0xff, 1, 0};
+
+static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
+                          uint64_t *read_ack_register_addr)
+{
+    uint64_t base;
+
+    if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
+        return false;
+    }
+
+    /* Find and check the source id for this new CPER */
+    if (error_source_to_index[notify] == 0xff) {
+        return false;
+    }
+
+    base = ghes_get_state_start_address();
+
+    *read_ack_register_addr = base +
+        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+        error_source_to_index[notify] * sizeof(uint64_t);
+
+    /* Could also be read back from the error_block_address register */
+    *error_block_addr = base +
+        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+        error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
+
+    return true;
+}
+
+bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify)
+{
+    uint64_t read_ack_register = 0;
+    uint64_t read_ack_register_addr = 0;
+    uint64_t error_block_addr = 0;
+
+    if (!ghes_get_addr(notify, &error_block_addr, &read_ack_register_addr)) {
+        return false;
+    }
+
+    cpu_physical_memory_read(read_ack_register_addr,
+                             &read_ack_register, sizeof(uint64_t));
+    /* zero means OSPM does not acknowledge the error */
+    if (!read_ack_register) {
+        error_report("Last time OSPM does not acknowledge the error,"
+                     " record CPER failed this time, set the ack value to"
+                     " avoid blocking next time CPER record! exit");
+        read_ack_register = 1;
+        cpu_physical_memory_write(read_ack_register_addr,
+                                  &read_ack_register, sizeof(uint64_t));
+        return false;
+    }
+
+    read_ack_register = cpu_to_le64(0);
+    cpu_physical_memory_write(read_ack_register_addr,
+                              &read_ack_register, sizeof(uint64_t));
+    return acpi_ghes_record_arm_error(error_types, error_block_addr);
+}
+
 bool acpi_ghes_present(void)
 {
     AcpiGedState *acpi_ged_state;
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 1ad60da7aa2d..bafac82f9fd3 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -712,3 +712,7 @@ config ARMSSE
     select UNIMP
     select SSE_COUNTER
     select SSE_TIMER
+
+config ARM_EINJ
+    bool
+    default y if AARCH64
diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
new file mode 100644
index 000000000000..1da97d5d4fdc
--- /dev/null
+++ b/hw/arm/arm_error_inject.c
@@ -0,0 +1,35 @@
+/*
+ * ARM Processor error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi-commands-arm-error-inject.h"
+#include "hw/boards.h"
+#include "hw/acpi/ghes.h"
+
+/* For ARM processor errors */
+void qmp_arm_inject_error(ArmProcessorErrorTypeList *errortypes, Error **errp)
+{
+    MachineState *machine = MACHINE(qdev_get_machine());
+    MachineClass *mc = MACHINE_GET_CLASS(machine);
+    uint8_t error_types = 0;
+
+    while (errortypes) {
+        error_types |= BIT(errortypes->value);
+        errortypes = errortypes->next;
+    }
+
+    ghes_record_arm_errors(error_types, ACPI_GHES_NOTIFY_GPIO);
+    if (mc->set_error) {
+        mc->set_error();
+    }
+
+    return;
+}
diff --git a/hw/arm/arm_error_inject_stubs.c b/hw/arm/arm_error_inject_stubs.c
new file mode 100644
index 000000000000..b51f4202fe64
--- /dev/null
+++ b/hw/arm/arm_error_inject_stubs.c
@@ -0,0 +1,18 @@
+/*
+ * QMP stub for ARM processor error injection.
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi-commands-arm-error-inject.h"
+
+void qmp_arm_inject_error(ArmProcessorErrorTypeList *errortypes, Error **errp)
+{
+    error_setg(errp, "ARM processor error support is not compiled in");
+}
diff --git a/hw/arm/meson.build b/hw/arm/meson.build
index 0c07ab522f4c..cb7fe09fc87b 100644
--- a/hw/arm/meson.build
+++ b/hw/arm/meson.build
@@ -60,6 +60,7 @@ arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
 arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
 arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
 arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
+arm_ss.add(when: 'CONFIG_ARM_EINJ', if_true: files('arm_error_inject.c'))
 
 system_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
 system_ss.add(when: 'CONFIG_CHEETAH', if_true: files('palm.c'))
@@ -77,5 +78,7 @@ system_ss.add(when: 'CONFIG_TOSA', if_true: files('tosa.c'))
 system_ss.add(when: 'CONFIG_VERSATILE', if_true: files('versatilepb.c'))
 system_ss.add(when: 'CONFIG_VEXPRESS', if_true: files('vexpress.c'))
 system_ss.add(when: 'CONFIG_Z2', if_true: files('z2.c'))
+system_ss.add(when: 'CONFIG_ARM_EINJ', if_false: files('arm_error_inject_stubs.c'))
+system_ss.add(when: 'CONFIG_ALL', if_true: files('arm_error_inject_stubs.c'))
 
 hw_arch += {'arm': arm_ss}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 4f1ab1a73a06..dc531ffce7ae 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -75,6 +75,8 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
 
+bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify);
+
 /**
  * acpi_ghes_present: Report whether ACPI GHES table is present
  *
diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
new file mode 100644
index 000000000000..430e6cea6b60
--- /dev/null
+++ b/qapi/arm-error-inject.json
@@ -0,0 +1,49 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# = ARM Processor Errors
+##
+
+##
+# @ArmProcessorErrorType:
+#
+# Type of ARM processor error to inject
+#
+# @unknown-error: Unknown error
+#
+# @cache-error: Cache error
+#
+# @tlb-error: TLB error
+#
+# @bus-error: Bus error.
+#
+# @micro-arch-error: Micro architectural error.
+#
+# Since: 9.1
+##
+{ 'enum': 'ArmProcessorErrorType',
+  'data': ['unknown-error',
+	   'cache-error',
+           'tlb-error',
+           'bus-error',
+           'micro-arch-error']
+}
+
+##
+# @arm-inject-error:
+#
+# Inject ARM Processor error.
+#
+# @errortypes: ARM processor error types to inject
+#
+# Features:
+#
+# @unstable: This command is experimental.
+#
+# Since: 9.1
+##
+{ 'command': 'arm-inject-error',
+  'data': { 'errortypes': ['ArmProcessorErrorType'] },
+  'features': [ 'unstable' ]
+}
diff --git a/qapi/meson.build b/qapi/meson.build
index e7bc54e5d047..5927932c4be3 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -22,6 +22,7 @@ if have_system or have_tools or have_ga
 endif
 
 qapi_all_modules = [
+  'arm-error-inject',
   'authz',
   'block',
   'block-core',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index b1581988e4eb..479a22de7e43 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -81,3 +81,4 @@
 { 'include': 'vfio.json' }
 { 'include': 'cryptodev.json' }
 { 'include': 'cxl.json' }
+{ 'include': 'arm-error-inject.json' }
-- 
2.45.2
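
The address arithmetic done by ghes_get_addr() in the patch above can be
checked in isolation. The following is only a sketch: the SKETCH_*
constants are assumed values chosen for illustration (the real
ACPI_GHES_ERROR_SOURCE_COUNT and ACPI_GHES_MAX_RAW_DATA_LENGTH come from
QEMU headers not shown here), and the base address is passed in instead
of being looked up from the GED device. The lookup table is sized so that
every notification type passing the range check has an entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_SOURCE_COUNT        2     /* number of GHES error sources */
#define SKETCH_MAX_RAW_DATA_LENGTH 1024  /* bytes reserved per source */
#define SKETCH_NOTIFY_RESERVED     12    /* first invalid notification type */
#define SKETCH_NO_INDEX            0xff

/* Same mapping as in the patch: only types 7 and 8 have a slot. */
static const uint8_t sketch_source_to_index[SKETCH_NOTIFY_RESERVED] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 1, 0, 0xff, 0xff, 0xff
};

static bool sketch_ghes_get_addr(uint64_t base, uint32_t notify,
                                 uint64_t *error_block_addr,
                                 uint64_t *read_ack_register_addr)
{
    assert(error_block_addr && read_ack_register_addr);

    if (notify >= SKETCH_NOTIFY_RESERVED ||
        sketch_source_to_index[notify] == SKETCH_NO_INDEX) {
        return false;
    }

    uint64_t index = sketch_source_to_index[notify];
    /* Size of one table of 64-bit entries (one entry per source). */
    uint64_t table_size = SKETCH_SOURCE_COUNT * sizeof(uint64_t);

    /* Skip the error_block_address table, then index the ack table. */
    *read_ack_register_addr = base + table_size + index * sizeof(uint64_t);

    /* Skip both tables, then index the per-source CPER areas. */
    *error_block_addr = base + 2 * table_size +
                        index * SKETCH_MAX_RAW_DATA_LENGTH;
    return true;
}
```

With these assumed constants and a base of 0x1000, notification type 7
resolves to a read ack register at 0x1018 and an error block at 0x1420.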




* [PATCH v3 5/7] target/arm: preserve mpidr value
  2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2024-07-22  6:45 ` [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection Mauro Carvalho Chehab
@ 2024-07-22  6:45 ` Mauro Carvalho Chehab
  2024-07-26 12:50   ` Jonathan Cameron via
  2024-07-22  6:45 ` [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
  2024-07-22  6:45 ` [PATCH v3 7/7] acpi/ghes: extend arm error injection logic Mauro Carvalho Chehab
  6 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Peter Maydell, linux-kernel, qemu-arm, qemu-devel

The helper code already has logic to compute the MPIDR value.
ARM processor error injection needs this value as well, so store it
in the CPU state when the register is defined, allowing it to be
reused later.
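
A minimal sketch of this caching pattern, using hypothetical stand-in
names (SketchCPU, sketch_compute_mpidr) rather than QEMU's real ArchCPU
and mpidr_read(). The way the value is derived here is invented; only
the "compute once at registration, read back later" shape is meant to
mirror the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the CPU object. */
typedef struct SketchCPU {
    uint32_t cpu_index;
    uint64_t mpidr;   /* cached copy, filled in at registration time */
} SketchCPU;

/* Placeholder for the helper logic; the bit packing is invented. */
static uint64_t sketch_compute_mpidr(const SketchCPU *cpu)
{
    return (UINT64_C(1) << 31) | (cpu->cpu_index & 0xff);
}

/* Called once while the system registers are being defined. */
static void sketch_register_regs(SketchCPU *cpu)
{
    assert(cpu);
    cpu->mpidr = sketch_compute_mpidr(cpu);
}

/* Later consumers (e.g. error injection) read the cached value. */
static uint64_t sketch_cper_mpidr(const SketchCPU *cpu)
{
    assert(cpu);
    return cpu->mpidr;
}
```

The point of the cache is that code outside the register accessors can
obtain the value without going through the register read callback.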

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 target/arm/cpu.h    |  1 +
 target/arm/helper.c | 10 ++++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a12859fc5335..d2e86f0877cc 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1033,6 +1033,7 @@ struct ArchCPU {
         uint64_t reset_pmcr_el0;
     } isar;
     uint64_t midr;
+    uint64_t mpidr;
     uint32_t revidr;
     uint32_t reset_fpsid;
     uint64_t ctr;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index ce319572354a..2432b5b09607 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4692,7 +4692,7 @@ static uint64_t mpidr_read_val(CPUARMState *env)
     return mpidr;
 }
 
-static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+static uint64_t mpidr_read(CPUARMState *env)
 {
     unsigned int cur_el = arm_current_el(env);
 
@@ -4702,6 +4702,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
     return mpidr_read_val(env);
 }
 
+static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    return mpidr_read(env);
+}
+
 static const ARMCPRegInfo lpae_cp_reginfo[] = {
     /* NOP AMAIR0/1 */
     { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
@@ -9723,7 +9728,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
             { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
               .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
               .fgt = FGT_MPIDR_EL1,
-              .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
+              .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW },
         };
 #ifdef CONFIG_USER_ONLY
         static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = {
@@ -9733,6 +9738,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
         modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo);
 #endif
         define_arm_cp_regs(cpu, mpidr_cp_reginfo);
+        cpu->mpidr = mpidr_read(env);
     }
 
     if (arm_feature(env, ARM_FEATURE_AUXCR)) {
-- 
2.45.2




* [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs
  2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2024-07-22  6:45 ` [PATCH v3 5/7] target/arm: preserve mpidr value Mauro Carvalho Chehab
@ 2024-07-22  6:45 ` Mauro Carvalho Chehab
  2024-07-30 11:24   ` Igor Mammedov
  2024-07-22  6:45 ` [PATCH v3 7/7] acpi/ghes: extend arm error injection logic Mauro Carvalho Chehab
  6 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Igor Mammedov,
	linux-kernel, qemu-arm, qemu-devel

There is one reference to ACPI 4.0 and several references
to ACPI 6.x versions.

Update them to point to ACPI 6.5 whenever possible.

One reference was kept pointing to ACPI 6.4, though: the HEST
table, which is built as revision 1.

ACPI 6.5 defines HEST revision 2, along with a new way to
handle source types starting from 12. According to the
ACPI 6.5 revision history:

	2312 Update to the HEST table and adding new error
	     source descriptor - Table 18.2.

However, the spec does not yet define any new source
descriptors. It only defines a different behavior when the
source type is above 11.

I also double-checked the GHES implementation in an open
source project (the Linux kernel). Upstream currently
ignores the HEST revision field.

In any case, revision 2 seems to be backward-compatible
with revision 1 when type <= 11 and a HEST record contains
just one error.

So, while it is probably safe to update it, there is no
real need. Keep the implementation using an ACPI 6.4
compatible table, i.e. HEST revision 1.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c | 48 ++++++++++++++++++++++++++++--------------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 6075ef5893ce..ebf1b812aaaa 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -45,9 +45,9 @@
 #define GAS_ADDR_OFFSET 4
 
 /*
- * The total size of Generic Error Data Entry
- * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
- * Table 18-343 Generic Error Data Entry
+ * The total size of Generic Error Data Entry before data field
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.12 Generic Error Data Entry
  */
 #define ACPI_GHES_DATA_LENGTH               72
 
@@ -65,8 +65,8 @@
 
 /*
  * Total size for Generic Error Status Block except Generic Error Data Entries
- * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
- * Table 18-380 Generic Error Status Block
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.11 Generic Error Status Block
  */
 #define ACPI_GHES_GESB_SIZE                 20
 
@@ -82,7 +82,8 @@ enum AcpiGenericErrorSeverity {
 
 /*
  * Hardware Error Notification
- * ACPI 4.0: 17.3.2.7 Hardware Error Notification
+ * ACPI 6.5: 18.3.2.9 Hardware Error Notification,
+ * Table 18.14 - Hardware Error Notification Structure
  * Composes dummy Hardware Error Notification descriptor of specified type
  */
 static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
@@ -112,7 +113,8 @@ static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
 
 /*
  * Generic Error Data Entry
- * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.12 - Generic Error Data Entry
  */
 static void acpi_ghes_generic_error_data(GArray *table,
                 const uint8_t *section_type, uint32_t error_severity,
@@ -148,7 +150,8 @@ static void acpi_ghes_generic_error_data(GArray *table,
 
 /*
  * Generic Error Status Block
- * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
+ * Table 18.11 - Generic Hardware Error Source Structure
  */
 static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
                 uint32_t raw_data_offset, uint32_t raw_data_length,
@@ -429,15 +432,18 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
         0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
 }
 
-/* Build Generic Hardware Error Source version 2 (GHESv2) */
+/*
+ * Build Generic Hardware Error Source version 2 (GHESv2)
+ * ACPI 6.5: 18.3.2.8 Generic Hardware Error Source version 2 (GHESv2 - Type 10),
+ * Table 18.13: Generic Hardware Error Source version 2 (GHESv2)
+ */
 static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
 {
     uint64_t address_offset;
-    /*
-     * Type:
-     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
-     */
+    /* Type: (GHESv2 - Type 10) */
     build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
+
+    /* ACPI 6.5: Table 18.10 - Generic Hardware Error Source Structure */
     /* Source Id */
     build_append_int_noprefix(table_data, source_id, 2);
     /* Related Source Id */
@@ -481,11 +487,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
     /* Error Status Block Length */
     build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
 
-    /*
-     * Read Ack Register
-     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
-     * version 2 (GHESv2 - Type 10)
-     */
+    /* ACPI 6.5: fields defined at GHESv2 table */
+    /* Read Ack Register */
     address_offset = table_data->len;
     build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
                      4 /* QWord access */, 0);
@@ -504,11 +507,16 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
     build_append_int_noprefix(table_data, 0x1, 8);
 }
 
-/* Build Hardware Error Source Table */
+/*
+ * Build Hardware Error Source Table
+ * ACPI 6.4: 18.3.2 ACPI Error Source
+ * Table 18.2: Hardware Error Source Table (HEST)
+ */
 void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
                      const char *oem_id, const char *oem_table_id)
 {
-    AcpiTable table = { .sig = "HEST", .rev = 1,
+    AcpiTable table = { .sig = "HEST",
+                        .rev = 1,                   /* ACPI 4.0 to 6.4 */
                         .oem_id = oem_id, .oem_table_id = oem_table_id };
 
     acpi_table_begin(&table, table_data);
-- 
2.45.2




* [PATCH v3 7/7] acpi/ghes: extend arm error injection logic
  2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2024-07-22  6:45 ` [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
@ 2024-07-22  6:45 ` Mauro Carvalho Chehab
  2024-07-25 10:03   ` Markus Armbruster
  2024-07-26 13:22   ` Jonathan Cameron via
  6 siblings, 2 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-22  6:45 UTC (permalink / raw)
  Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
	Alex Bennée, Michael S. Tsirkin, Philippe Mathieu-Daudé,
	Ani Sinha, Beraldo Leal, Dongjiu Geng, Eric Blake, Igor Mammedov,
	Markus Armbruster, Peter Maydell, Thomas Huth,
	Wainer dos Santos Moschetta, linux-kernel, qemu-arm, qemu-devel

Extend the CPER error injection logic for ARM processors to allow
setting the values of the fields from UEFI 2.10 tables N.16 and N.17.

Note that, with this change, all arguments are now optional. So,
once QMP is negotiated with:

	{ "execute": "qmp_capabilities" }

the simplest way to generate a cache error is to use:

	{ "execute": "arm-inject-error" }

Also, as the Processor Error Information (PEI) is now mapped into an
array, it is possible to inject multiple errors in the same CPER
record with:

	{ "execute": "arm-inject-error", "arguments": {
	   "error": [ {"type": [ "cache-error" ]},
		      {"type": [ "tlb-error" ]} ] } }

This would generate both cache and TLB errors, using default
values for other fields.

As all fields from ARM Processor CPER are now mapped, all
types of CPER records can be generated with the new QAPI.
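
Since the record is now variable-sized, its section length depends on
how many entries are present. The sketch below derives the length from
the field widths appended by the builder in this patch: a 40-byte fixed
header, 32 bytes per PEI entry, 8 bytes of header plus 8 bytes per
register for each context entry, and one byte per vendor-specific value.
The SKETCH_* names and the SketchContext type are illustrative only, and
how the caller actually computes cper_length is not shown in this part
of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ARM_CPER_LENGTH      40  /* fixed header */
#define SKETCH_ARM_CPER_PEI_LENGTH  32  /* one error information entry */
#define SKETCH_ARM_CTX_HEADER        8  /* version(2) + type(2) + size(4) */

/* Illustrative stand-in for one context entry. */
typedef struct SketchContext {
    uint32_t num_regs;  /* number of 64-bit registers in the array */
} SketchContext;

static uint32_t sketch_arm_cper_length(uint16_t err_info_num,
                                       const SketchContext *ctx,
                                       uint16_t context_info_num,
                                       uint32_t vendor_num)
{
    assert(ctx != NULL || context_info_num == 0);

    uint32_t len = SKETCH_ARM_CPER_LENGTH;

    /* Each processor error information entry has a fixed size. */
    len += (uint32_t)err_info_num * SKETCH_ARM_CPER_PEI_LENGTH;

    /* Each context entry: fixed header plus its register array. */
    for (size_t i = 0; i < context_info_num; i++) {
        len += SKETCH_ARM_CTX_HEADER + ctx[i].num_regs * 8;
    }

    /* Vendor-specific data is appended byte by byte. */
    len += vendor_num;
    return len;
}
```

With one PEI entry and nothing else this gives 72 bytes, and a single
context entry with 74 registers adds 600 bytes, which appears to match
the two terms of the old (72 + 600) constant.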

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 hw/acpi/ghes.c                  | 168 +++++++-------
 hw/arm/arm_error_inject.c       | 399 +++++++++++++++++++++++++++++++-
 hw/arm/arm_error_inject_stubs.c |  20 +-
 include/hw/acpi/ghes.h          |  40 +++-
 qapi/arm-error-inject.json      | 250 +++++++++++++++++++-
 tests/lcitool/libvirt-ci        |   2 +-
 6 files changed, 778 insertions(+), 101 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index ebf1b812aaaa..afd1d098a7e3 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -55,10 +55,10 @@
 #define ACPI_GHES_MEM_CPER_LENGTH           80
 
 /*
- * ARM Processor section CPER size, UEFI 2.10: N.2.4.4
- * ARM Processor Error Section
+ * ARM Processor error section CPER sizes - UEFI 2.10: N.2.4.4
  */
-#define ACPI_GHES_ARM_CPER_LENGTH (72 + 600)
+#define ACPI_GHES_ARM_CPER_LENGTH           40
+#define ACPI_GHES_ARM_CPER_PEI_LENGTH       32
 
 /* Masks for block_status flags */
 #define ACPI_GEBS_UNCORRECTABLE         1
@@ -242,94 +242,98 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
 }
 
 /* UEFI 2.9: N.2.4.4 ARM Processor Error Section */
-static void acpi_ghes_build_append_arm_cper(uint8_t error_types, GArray *table)
+static void acpi_ghes_build_append_arm_cper(ArmError err, uint32_t cper_length,
+                                            GArray *table)
 {
+    unsigned int i, j;
+
     /*
      * ARM Processor Error Record
      */
 
     /* Validation Bits */
-    build_append_int_noprefix(table,
-                              (1ULL << 3) | /* Vendor specific info Valid */
-                              (1ULL << 2) | /* Running status Valid */
-                              (1ULL << 1) | /* Error affinity level Valid */
-                              (1ULL << 0), /* MPIDR Valid */
-                              4);
+    build_append_int_noprefix(table, err.validation, 4);
+
     /* Error Info Num */
-    build_append_int_noprefix(table, 1, 2);
+    build_append_int_noprefix(table, err.err_info_num, 2);
+
     /* Context Info Num */
-    build_append_int_noprefix(table, 1, 2);
+    build_append_int_noprefix(table, err.context_info_num, 2);
+
     /* Section length */
-    build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_LENGTH, 4);
+    build_append_int_noprefix(table, cper_length, 4);
+
     /* Error affinity level */
-    build_append_int_noprefix(table, 2, 1);
+    build_append_int_noprefix(table, err.affinity_level, 1);
+
     /* Reserved */
     build_append_int_noprefix(table, 0, 3);
+
     /* MPIDR_EL1 */
-    build_append_int_noprefix(table, 0xAB12, 8);
+    build_append_int_noprefix(table, err.mpidr_el1, 8);
+
     /* MIDR_EL1 */
-    build_append_int_noprefix(table, 0xCD24, 8);
+    build_append_int_noprefix(table, err.midr_el1, 8);
+
     /* Running state */
-    build_append_int_noprefix(table, 0x1, 4);
-    /* PSCI state */
-    build_append_int_noprefix(table, 0x1234, 4);
-
-    /* ARM Propcessor error information */
-    /* Version */
-    build_append_int_noprefix(table, 0, 1);
-    /*  Length */
-    build_append_int_noprefix(table, 32, 1);
-    /* Validation Bits */
-    build_append_int_noprefix(table,
-                              (1ULL << 4) | /* Physical fault address Valid */
-                             (1ULL << 3) | /* Virtual fault address Valid */
-                             (1ULL << 2) | /* Error information Valid */
-                              (1ULL << 1) | /* Flags Valid */
-                              (1ULL << 0), /* Multiple error count Valid */
-                              2);
-    /* Type */
-    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR) ||
-        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR) ||
-        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR) ||
-        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
-        build_append_int_noprefix(table, error_types, 1);
-    } else {
-        return;
+    build_append_int_noprefix(table, err.running_state, 4);
+
+    /* PSCI state: only valid when running state is zero  */
+    build_append_int_noprefix(table, err.psci_state, 4);
+
+    for (i = 0; i < err.err_info_num; i++) {
+        /* ARM Processor error information */
+        /* Version */
+        build_append_int_noprefix(table, 0, 1);
+
+        /*  Length */
+        build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1);
+
+        /* Validation Bits */
+        build_append_int_noprefix(table, err.pei[i].validation, 2);
+
+        /* Type */
+        build_append_int_noprefix(table, err.pei[i].type, 1);
+
+        /* Multiple error count */
+        build_append_int_noprefix(table, err.pei[i].multiple_error, 2);
+
+        /* Flags  */
+        build_append_int_noprefix(table, err.pei[i].flags, 1);
+
+        /* Error information  */
+        build_append_int_noprefix(table, err.pei[i].error_info, 8);
+
+        /* Virtual fault address  */
+        build_append_int_noprefix(table, err.pei[i].virt_addr, 8);
+
+        /* Physical fault address  */
+        build_append_int_noprefix(table, err.pei[i].phy_addr, 8);
+    }
+
+    for (i = 0; i < err.context_info_num; i++) {
+        /* ARM Processor error context information */
+        /* Version */
+        build_append_int_noprefix(table, 0, 2);
+
+        /* Validation type */
+        build_append_int_noprefix(table, err.context[i].type, 2);
+
+        /* Register array size */
+        build_append_int_noprefix(table, err.context[i].size * 8, 4);
+
+        /* Register array (byte 8 of Context info) */
+        for (j = 0; j < err.context[i].size; j++) {
+            build_append_int_noprefix(table, err.context[i].array[j], 8);
+        }
     }
-    /* Multiple error count */
-    build_append_int_noprefix(table, 2, 2);
-    /* Flags  */
-    build_append_int_noprefix(table, 0xD, 1);
-    /* Error information  */
-    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR)) {
-        build_append_int_noprefix(table, 0x0091000F, 8);
-    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR)) {
-        build_append_int_noprefix(table, 0x0054007F, 8);
-    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR)) {
-        build_append_int_noprefix(table, 0x80D6460FFF, 8);
-    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
-        build_append_int_noprefix(table, 0x78DA03FF, 8);
-    } else {
-        return;
+
+    for (i = 0; i < err.vendor_num; i++) {
+        build_append_int_noprefix(table, err.vendor[i], 1);
     }
-    /* Virtual fault address  */
-    build_append_int_noprefix(table, 0x67320230, 8);
-    /* Physical fault address  */
-    build_append_int_noprefix(table, 0x5CDFD492, 8);
-
-    /* ARM Propcessor error context information */
-    /* Version */
-    build_append_int_noprefix(table, 0, 2);
-    /* Validation Bits */
-    /* AArch64 EL1 context registers Valid */
-    build_append_int_noprefix(table, 5, 2);
-    /* Register array size */
-    build_append_int_noprefix(table, 592, 4);
-    /* Register array */
-    build_append_int_noprefix(table, 0x12ABDE67, 8);
 }
 
-static int acpi_ghes_record_arm_error(uint8_t error_types,
+static int acpi_ghes_record_arm_error(ArmError error,
                                       uint64_t error_block_address)
 {
     GArray *block;
@@ -344,12 +348,18 @@ static int acpi_ghes_record_arm_error(uint8_t error_types,
      * Table 17-13 Generic Error Data Entry
      */
     QemuUUID fru_id = {};
-    uint32_t data_length;
+    uint32_t cper_length, data_length;
 
     block = g_array_new(false, true /* clear */, 1);
 
     /* This is the length if adding a new generic error data entry*/
-    data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_ARM_CPER_LENGTH;
+    cper_length = ACPI_GHES_ARM_CPER_LENGTH;
+    cper_length += ACPI_GHES_ARM_CPER_PEI_LENGTH * error.err_info_num;
+    cper_length += error.context_length;
+    cper_length += error.vendor_num;
+
+    data_length = ACPI_GHES_DATA_LENGTH + cper_length;
+
     /*
      * It should not run out of the preallocated memory if adding a new generic
      * error data entry
@@ -363,11 +373,11 @@ static int acpi_ghes_record_arm_error(uint8_t error_types,
 
     /* Build this new generic error data entry header */
     acpi_ghes_generic_error_data(block, uefi_cper_arm_sec,
-        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
-        ACPI_GHES_ARM_CPER_LENGTH, fru_id, 0);
+                                 ACPI_CPER_SEV_RECOVERABLE, 0, 0,
+                                 cper_length, fru_id, 0);
 
     /* Build the ARM processor error section CPER */
-    acpi_ghes_build_append_arm_cper(error_types, block);
+    acpi_ghes_build_append_arm_cper(error, cper_length, block);
 
     /* Write the generic error data entry into guest memory */
     cpu_physical_memory_write(error_block_address, block->data, block->len);
@@ -663,7 +673,7 @@ static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
     return true;
 }
 
-bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify)
+bool ghes_record_arm_errors(ArmError error, uint32_t notify)
 {
     int read_ack_register = 0;
     uint64_t read_ack_register_addr = 0;
@@ -689,7 +699,7 @@ bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify)
     read_ack_register = cpu_to_le64(0);
     cpu_physical_memory_write(read_ack_register_addr,
                               &read_ack_register, sizeof(uint64_t));
-    return acpi_ghes_record_arm_error(error_types, error_block_addr);
+    return acpi_ghes_record_arm_error(error, error_block_addr);
 }
 
 bool acpi_ghes_present(void)
diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
index 1da97d5d4fdc..67f1c77546b9 100644
--- a/hw/arm/arm_error_inject.c
+++ b/hw/arm/arm_error_inject.c
@@ -10,23 +10,408 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qapi-commands-arm-error-inject.h"
 #include "hw/boards.h"
 #include "hw/acpi/ghes.h"
+#include "cpu.h"
+
+#define ACPI_GHES_ARM_CPER_CTX_DEFAULT_NREGS 74
+
+/* Handle ARM Processor Error Information (PEI) */
+static const ArmProcessorErrorInformationList *default_pei = { 0 };
+
+static ArmPEI *qmp_arm_pei(uint16_t *err_info_num,
+              bool has_error,
+              ArmProcessorErrorInformationList const *error_list)
+{
+    ArmProcessorErrorInformationList const *next;
+    ArmPeiValidationBitsList const *validation_list;
+    ArmPEI *pei = NULL;
+    uint16_t i;
+
+    if (!has_error) {
+        error_list = default_pei;
+    }
+
+    *err_info_num = 0;
+
+    for (next = error_list; next; next = next->next) {
+        (*err_info_num)++;
+
+        if (*err_info_num >= 255) {
+            break;
+        }
+    }
+
+    pei = g_new0(ArmPEI, (*err_info_num));
+
+    for (next = error_list, i = 0;
+                i < *err_info_num; i++, next = next->next) {
+        ArmProcessorErrorTypeList *type_list = next->value->type;
+        uint16_t pei_validation = 0;
+        uint8_t flags = 0;
+        uint8_t type = 0;
+
+        if (next->value->has_validation) {
+            validation_list = next->value->validation;
+
+            while (validation_list) {
+                pei_validation |= BIT(validation_list->value);
+                validation_list = validation_list->next;
+            }
+        }
+
+        /*
+         * According to UEFI 2.9A errata, the meaning of this field is
+         * given by the following bitmap:
+         *
+         *   +-----|---------------------------+
+         *   | Bit | Meaning                   |
+         *   +=====+===========================+
+         *   |  1  | Cache Error               |
+         *   |  2  | TLB Error                 |
+         *   |  3  | Bus Error                 |
+         *   |  4  | Micro-architectural Error |
+         *   +-----|---------------------------+
+         *
+         *   All other values are reserved.
+         *
+         * As bit 0 is reserved, QAPI ArmProcessorErrorType starts from bit 1.
+         */
+        while (type_list) {
+            type |= BIT(type_list->value + 1);
+            type_list = type_list->next;
+        }
+        if (!has_error) {
+            type = BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR + 1);
+        }
+        pei[i].type = type;
+
+        if (next->value->has_flags) {
+            ArmProcessorFlagsList *flags_list = next->value->flags;
+
+            while (flags_list) {
+                flags |= BIT(flags_list->value);
+                flags_list = flags_list->next;
+            }
+        } else {
+            flags = BIT(ARM_PROCESSOR_FLAGS_FIRST_ERROR_CAP) |
+                    BIT(ARM_PROCESSOR_FLAGS_PROPAGATED);
+        }
+        pei[i].flags = flags;
+
+        if (next->value->has_multiple_error) {
+            pei[i].multiple_error = next->value->multiple_error;
+            pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_MULTIPLE_ERROR_VALID);
+        }
+
+        if (next->value->has_error_info) {
+            pei[i].error_info = next->value->error_info;
+        } else {
+            switch (type) {
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR + 1):
+                pei[i].error_info = 0x0091000F;
+                break;
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR + 1):
+                pei[i].error_info = 0x0054007F;
+                break;
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR + 1):
+                pei[i].error_info = 0x80D6460FFF;
+                break;
+            case BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR + 1):
+                pei[i].error_info = 0x78DA03FF;
+                break;
+            default:
+                /*
+                 * UEFI 2.9A/2.10 doesn't define how this should be filled
+                 * when multiple types are there. So, set default to zero,
+                 * causing it to be removed from validation bits.
+                 */
+                pei[i].error_info = 0;
+            }
+        }
+
+        if (next->value->has_virt_addr) {
+            pei[i].virt_addr = next->value->virt_addr;
+            pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_VIRT_ADDR_VALID);
+        }
+
+        if (next->value->has_phy_addr) {
+            pei[i].phy_addr = next->value->phy_addr;
+            pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_PHY_ADDR_VALID);
+        }
+
+        if (!next->value->has_validation) {
+            if (pei[i].flags) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_FLAGS_VALID);
+            }
+            if (pei[i].error_info) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_ERROR_INFO_VALID);
+            }
+            if (next->value->has_virt_addr) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_VIRT_ADDR_VALID);
+            }
+
+            if (next->value->has_phy_addr) {
+                pei_validation |= BIT(ARM_PEI_VALIDATION_BITS_PHY_ADDR_VALID);
+            }
+        }
+
+        pei[i].validation = pei_validation;
+    }
+
+    return pei;
+}
+
+/*
+ * UEFI 2.10 default context register type (See UEFI 2.10 table N.21 for more)
+ */
+#define CONTEXT_AARCH32_EL1   1
+#define CONTEXT_AARCH64_EL1   5
+
+static int get_default_context_type(void)
+{
+    ARMCPU *cpu = ARM_CPU(qemu_get_cpu(0));
+    bool aarch64;
+
+    aarch64 = object_property_get_bool(OBJECT(cpu), "aarch64", NULL);
+
+    if (aarch64) {
+        return CONTEXT_AARCH64_EL1;
+    }
+    return CONTEXT_AARCH32_EL1;
+}
+
+/* Handle ARM Context */
+static ArmContext *qmp_arm_context(uint16_t *context_info_num,
+                                   uint32_t *context_length,
+                                   bool has_context,
+                                   ArmProcessorContextList const *context_list)
+{
+    ArmProcessorContextList const *next;
+    ArmContext *context = NULL;
+    uint16_t i, j, num, default_type;
+
+    default_type = get_default_context_type();
+
+    if (!has_context) {
+        *context_info_num = 0;
+        *context_length = 0;
+
+        return NULL;
+    }
+
+    /* Calculate sizes */
+    num = 0;
+    for (next = context_list; next; next = next->next) {
+        uint32_t n_regs = 0;
+
+        if (next->value->has_q_register) {
+            uint64List *reg = next->value->q_register;
+
+            while (reg) {
+                n_regs++;
+                reg = reg->next;
+            }
+
+            if (next->value->has_minimal_size &&
+                                        next->value->minimal_size < n_regs) {
+                n_regs = next->value->minimal_size;
+            }
+        } else if (!next->value->has_minimal_size) {
+            n_regs = ACPI_GHES_ARM_CPER_CTX_DEFAULT_NREGS;
+        }
+
+        if (!n_regs) {
+            next->value->minimal_size = 0;
+        } else {
+            next->value->minimal_size = (n_regs + 1) % 0xfffe;
+        }
+
+        num++;
+        if (num >= 65535) {
+            break;
+        }
+    }
+
+    context = g_new0(ArmContext, num);
+
+    /* Fill context data */
+
+    *context_length = 0;
+    *context_info_num = 0;
+
+    next = context_list;
+    for (i = 0; i < num; i++, next = next->next) {
+        if (!next->value->minimal_size) {
+            continue;
+        }
+
+        if (next->value->has_type) {
+            context[*context_info_num].type = next->value->type;
+        } else {
+            context[*context_info_num].type = default_type;
+        }
+        context[*context_info_num].size = next->value->minimal_size;
+        context[*context_info_num].array = g_malloc0(context[*context_info_num].size * 8);
+
+        /* length = 64 bits * (size of the reg array + context type) */
+        *context_length += (context[*context_info_num].size + 1) * 8;
+
+        if (!next->value->has_q_register) {
+            *context[*context_info_num].array = 0xDEADBEEF;
+        } else {
+            uint64_t *pos = context[*context_info_num].array;
+            uint64List *reg = next->value->q_register;
+
+            for (j = 0; j < context[*context_info_num].size; j++) {
+                if (!reg) {
+                    break;
+                }
+
+                *(pos++) = reg->value;
+                reg = reg->next;
+            }
+        }
+
+        (*context_info_num)++;
+    }
+
+    if (!*context_info_num) {
+        g_free(context);
+        return NULL;
+    }
+
+    return context;
+}
+
+static uint8_t *qmp_arm_vendor(uint32_t *vendor_num, bool has_vendor_specific,
+                               uint8List const *vendor_specific_list)
+{
+    uint8List const *next = vendor_specific_list;
+    uint8_t *vendor = NULL, *p;
+
+    *vendor_num = 0;
+
+    if (!has_vendor_specific) {
+        return NULL;
+    }
+
+    while (next) {
+        next = next->next;
+        (*vendor_num)++;
+    }
+
+    vendor = g_malloc(*vendor_num);
+
+    p = vendor;
+    next = vendor_specific_list;
+    while (next) {
+        *p = next->value;
+        next = next->next;
+        p++;
+    }
+
+    return vendor;
+}
 
 /* For ARM processor errors */
-void qmp_arm_inject_error(ArmProcessorErrorTypeList *errortypes, Error **errp)
+void qmp_arm_inject_error(bool has_validation,
+                    ArmProcessorValidationBitsList *validation_list,
+                    bool has_affinity_level,
+                    uint8_t affinity_level,
+                    bool has_mpidr_el1,
+                    uint64_t mpidr_el1,
+                    bool has_midr_el1,
+                    uint64_t midr_el1,
+                    bool has_running_state,
+                    ArmProcessorRunningStateList *running_state_list,
+                    bool has_psci_state,
+                    uint32_t psci_state,
+                    bool has_context, ArmProcessorContextList *context_list,
+                    bool has_vendor_specific, uint8List *vendor_specific_list,
+                    bool has_error,
+                    ArmProcessorErrorInformationList *error_list,
+                    Error **errp)
 {
     MachineState *machine = MACHINE(qdev_get_machine());
     MachineClass *mc = MACHINE_GET_CLASS(machine);
-    uint8_t error_types = 0;
+    ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0));
+    uint32_t running_state = 0;
+    uint16_t validation = 0;
+    ArmError error;
+    uint16_t i;
 
-    while (errortypes) {
-        error_types |= BIT(errortypes->value);
-        errortypes = errortypes->next;
+    /* Handle UEFI 2.10 N.16 specific fields, setting defaults when needed */
+
+    if (!has_midr_el1) {
+        midr_el1 = armcpu->midr;
+    }
+
+    if (!has_mpidr_el1) {
+        mpidr_el1 = armcpu->mpidr;
+    }
+
+    if (has_running_state) {
+        while (running_state_list) {
+            running_state |= BIT(running_state_list->value);
+            running_state_list = running_state_list->next;
+        }
+
+        if (running_state) {
+            psci_state = 0;
+        }
+    }
+
+    if (has_validation) {
+        while (validation_list) {
+            validation |= BIT(validation_list->value);
+            validation_list = validation_list->next;
+        }
+    } else {
+        if (has_vendor_specific) {
+            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_VENDOR_SPECIFIC_VALID);
+        }
+
+        if (has_affinity_level) {
+            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_AFFINITY_VALID);
+        }
+
+        if (mpidr_el1) {
+            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_MPIDR_VALID);
+        }
+
+        if (has_running_state) {
+            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_RUNNING_STATE_VALID);
+        }
+    }
+
+    /* Fill an error record */
+
+    error.validation = validation;
+    error.affinity_level = affinity_level;
+    error.mpidr_el1 = mpidr_el1;
+    error.midr_el1 = midr_el1;
+    error.running_state = running_state;
+    error.psci_state = psci_state;
+
+    error.pei = qmp_arm_pei(&error.err_info_num, has_error, error_list);
+    error.context = qmp_arm_context(&error.context_info_num,
+                                    &error.context_length,
+                                    has_context, context_list);
+    error.vendor = qmp_arm_vendor(&error.vendor_num, has_vendor_specific,
+                                  vendor_specific_list);
+
+    ghes_record_arm_errors(error, ACPI_GHES_NOTIFY_GPIO);
+
+    if (error.context) {
+        for (i = 0; i < error.context_info_num; i++) {
+            g_free(error.context[i].array);
+        }
     }
+    g_free(error.context);
+    g_free(error.pei);
+    g_free(error.vendor);
 
-    ghes_record_arm_errors(error_types, ACPI_GHES_NOTIFY_GPIO);
     if (mc->set_error) {
         mc->set_error();
     }
diff --git a/hw/arm/arm_error_inject_stubs.c b/hw/arm/arm_error_inject_stubs.c
index b51f4202fe64..be6e8be2d0d9 100644
--- a/hw/arm/arm_error_inject_stubs.c
+++ b/hw/arm/arm_error_inject_stubs.c
@@ -10,9 +10,25 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "qapi-commands-arm-error-inject.h"
+#include "hw/acpi/ghes.h"
 
-void qmp_arm_inject_error(ArmProcessorErrorTypeList *errortypes, Error **errp)
+void qmp_arm_inject_error(bool has_validation,
+                        ArmProcessorValidationBitsList *validation,
+                        bool has_affinity_level,
+                        uint8_t affinity_level,
+                        bool has_mpidr_el1,
+                        uint64_t mpidr_el1,
+                        bool has_midr_el1,
+                        uint64_t midr_el1,
+                        bool has_running_state,
+                        ArmProcessorRunningStateList *running_state,
+                        bool has_psci_state,
+                        uint32_t psci_state,
+                        bool has_context, ArmProcessorContextList *context,
+                        bool has_vendor_specific, uint8List *vendor_specific,
+                        bool has_error,
+                        ArmProcessorErrorInformationList *error,
+                        Error **errp)
 {
     error_setg(errp, "ARM processor error support is not compiled in");
 }
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index dc531ffce7ae..c591a5fb02c4 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
 #define ACPI_GHES_H
 
 #include "hw/acpi/bios-linker-loader.h"
+#include "qapi/qapi-commands-arm-error-inject.h"
 
 /*
  * Values for Hardware Error Notification Type field
@@ -68,6 +69,43 @@ typedef struct AcpiGhesState {
     bool present; /* True if GHES is present at all on this board */
 } AcpiGhesState;
 
+typedef struct ArmPEI {
+    uint16_t validation;
+    uint8_t type;
+    uint16_t multiple_error;
+    uint8_t flags;
+    uint64_t error_info;
+    uint64_t virt_addr;
+    uint64_t phy_addr;
+} ArmPEI;
+
+typedef struct ArmContext {
+    uint16_t type;
+    uint32_t size;
+    uint64_t *array;
+} ArmContext;
+
+/* ARM processor - UEFI 2.10 table N.16 */
+typedef struct ArmError {
+    uint16_t validation;
+
+    uint8_t affinity_level;
+    uint64_t mpidr_el1;
+    uint64_t midr_el1;
+    uint32_t running_state;
+    uint32_t psci_state;
+
+    /* Those are calculated based on the input data */
+    uint16_t err_info_num;
+    uint16_t context_info_num;
+    uint32_t vendor_num;
+    uint32_t context_length;
+
+    ArmPEI *pei;
+    ArmContext *context;
+    uint8_t *vendor;
+} ArmError;
+
 void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
 void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
                      const char *oem_id, const char *oem_table_id);
@@ -75,7 +113,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
 
-bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify);
+bool ghes_record_arm_errors(ArmError error, uint32_t notify);
 
 /**
  * acpi_ghes_present: Report whether ACPI GHES table is present
diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
index 430e6cea6b60..2a314830fe60 100644
--- a/qapi/arm-error-inject.json
+++ b/qapi/arm-error-inject.json
@@ -2,40 +2,258 @@
 # vim: filetype=python
 
 ##
-# = ARM Processor Errors
+# = ARM Processor Errors as defined at:
+# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
+# See tables N.16, N.17 and N.21.
 ##
 
+##
+# @ArmProcessorValidationBits:
+#
+# Indicates whether or not fields of ARM processor CPER record are valid.
+#
+# @mpidr-valid:  MPIDR Valid
+#
+# @affinity-valid: Error affinity level Valid
+#
+# @running-state-valid: Running State
+#
+# @vendor-specific-valid: Vendor Specific Info Valid
+#
+# Since: 9.1
+##
+{ 'enum': 'ArmProcessorValidationBits',
+  'data': ['mpidr-valid',
+           'affinity-valid',
+           'running-state-valid',
+           'vendor-specific-valid']
+}
+
+##
+# @ArmProcessorFlags:
+#
+# Indicates error attributes at the Error info section.
+#
+# @first-error-cap: First error captured
+#
+# @last-error-cap:  Last error captured
+#
+# @propagated: Propagated
+#
+# @overflow: Overflow
+#
+# Since: 9.1
+##
+{ 'enum': 'ArmProcessorFlags',
+  'data': ['first-error-cap',
+           'last-error-cap',
+           'propagated',
+           'overflow']
+}
+
+##
+# @ArmProcessorRunningState:
+#
+# Indicates if the processor is running.
+#
+# @processor-running: indicates that the processor is running
+#
+# Since: 9.1
+##
+{ 'enum': 'ArmProcessorRunningState',
+  'data': ['processor-running']
+}
+
 ##
 # @ArmProcessorErrorType:
 #
-# Type of ARM processor error to inject
-#
-# @unknown-error: Unknown error
+# Type of ARM processor error information to inject.
 #
 # @cache-error: Cache error
 #
 # @tlb-error: TLB error
 #
-# @bus-error: Bus error.
+# @bus-error: Bus error
 #
-# @micro-arch-error: Micro architectural error.
+# @micro-arch-error: Micro architectural error
 #
 # Since: 9.1
 ##
 { 'enum': 'ArmProcessorErrorType',
-  'data': ['unknown-error',
-	   'cache-error',
+  'data': ['cache-error',
            'tlb-error',
            'bus-error',
            'micro-arch-error']
+}
+
+##
+# @ArmPeiValidationBits:
+#
+# Indicates whether or not fields of Processor Error Info section are valid.
+#
+# @multiple-error-valid: Information at multiple-error field is valid
+#
+# @flags-valid: Information at flags field is valid
+#
+# @error-info-valid: Information at error-info field is valid
+#
+# @virt-addr-valid: Information at virt-addr field is valid
+#
+# @phy-addr-valid: Information at phy-addr field is valid
+#
+# Since: 9.1
+##
+{ 'enum': 'ArmPeiValidationBits',
+  'data': ['multiple-error-valid',
+           'flags-valid',
+           'error-info-valid',
+           'virt-addr-valid',
+           'phy-addr-valid']
+}
+
+##
+# @ArmProcessorErrorInformation:
+#
+# Contains ARM processor error information (PEI) data according to UEFI
+# CPER table N.17.
+#
+# @validation:
+#       Valid validation bits for error-info section.
+#       Argument is optional. If not specified, the default is
+#       calculated based on which of the fields below are filled.
+#
+# @type:
+#       ARM processor error types to inject. Argument is mandatory.
+#
+# @multiple-error:
+#       Indicates whether multiple errors have occurred.
+#       Argument is optional. If not specified and @validation not enforced,
+#       this field will be marked as invalid at CPER record.
+#
+# @flags:
+#       Indicates flags that describe the error attributes.
+#       Argument is optional. If not specified, defaults to
+#       first-error-cap and propagated.
+#
+# @error-info:
+#       Error information structure is specific to each error type.
+#       Argument is optional, and its value depends on the PEI type(s).
+#       If not defined, the default depends on the type:
+#       - for cache-error: 0x0091000F;
+#       - for tlb-error: 0x0054007F;
+#       - for bus-error: 0x80D6460FFF;
+#       - for micro-arch-error: 0x78DA03FF;
+#       - if multiple types used, this bit is disabled from @validation bits.
+#
+# @virt-addr:
+#       Virtual fault address associated with the error.
+#       Argument is optional. If not specified and @validation not enforced,
+#       this field will be marked as invalid at CPER record.
+#
+# @phy-addr:
+#       Physical fault address associated with the error.
+#       Argument is optional. If not specified and @validation not enforced,
+#       this field will be marked as invalid at CPER record.
+#
+# Since: 9.1
+##
+{ 'struct': 'ArmProcessorErrorInformation',
+  'data': { '*validation': ['ArmPeiValidationBits'],
+            'type': ['ArmProcessorErrorType'],
+            '*multiple-error': 'uint16',
+            '*flags': ['ArmProcessorFlags'],
+            '*error-info': 'uint64',
+            '*virt-addr':  'uint64',
+            '*phy-addr': 'uint64'}
+}
+
+##
+# @ArmProcessorContext:
+#
+# Provide processor context state specific to the ARM processor architecture,
+# according to UEFI 2.10 CPER table N.21.
+# Argument is optional. If not specified, no context will be used.
+#
+# @type:
+#       Contains an integer value indicating the type of context state being
+#       reported.
+#       Argument is optional. If not defined, it will be set to be EL1 register
+#       for the emulation, e.g.:
+#       - on arm32: AArch32 EL1 context registers;
+#       - on arm64: AArch64 EL1 context registers.
+#
+# @register:
+#       Provides the contents of the actual registers or raw data, depending
+#       on the context type.
+#       Argument is optional. If not defined, it will fill the first register
+#       with 0xDEADBEEF, and the other ones with zero.
+#
+# @minimal-size:
+#       Argument is optional. If provided, define the minimal size of the
+#       context register array. The actual size is defined by checking the
+#       number of register values plus the content of this field (if used),
+#       ensuring that each processor context information structure array is
+#       padded with zeros if the size is not a multiple of 16 bytes.
+#
+# Since: 9.1
+##
+{ 'struct': 'ArmProcessorContext',
+  'data': { '*type': 'uint16',
+            '*minimal-size': 'uint32',
+            '*register': ['uint64']}
 }
 
 ##
 # @arm-inject-error:
 #
-# Inject ARM Processor error.
+# Inject ARM Processor error with data to be filled according to UEFI 2.10
+# CPER table N.16.
 #
-# @errortypes: ARM processor error types to inject
+# @validation:
+#       Valid validation bits for ARM processor CPER.
+#       Argument is optional. If not specified, the default is
+#       calculated based on having the corresponding arguments filled.
+#
+# @affinity-level:
+#       Error affinity level for errors that can be attributed to a specific
+#       affinity level.
+#       Argument is optional. If not specified and @validation not enforced,
+#       this field will be marked as invalid at CPER record.
+#
+# @mpidr-el1:
+#       Processor’s unique ID in the system.
+#       Argument is optional. If not specified, it will use the cpu mpidr
+#       field from the emulation data. If zero and @validation is not
+#       enforced, this field will be marked as invalid at CPER record.
+#
+# @midr-el1:  Identification info of the chip
+#       Argument is optional. If not specified, it will use the cpu midr
+#       field from the emulation data. If zero and @validation is not
+#       enforced, this field will be marked as invalid at CPER record.
+#
+# @running-state:
+#       Indicates the running state of the processor.
+#       Argument is optional. If not specified and @validation not enforced,
+#       this field will be marked as invalid at CPER record.
+#
+# @psci-state:
+#       Provides PSCI state of the processor, as defined in ARM PSCI document.
+#       Argument is optional. If not specified, it will use the cpu power
+#       state field from the emulation data.
+#
+# @context:
+#       Contains an array of processor context registers.
+#       Argument is optional. If not specified, no context will be added.
+#
+# @vendor-specific:
+#       Contains a byte array of vendor-specific data.
+#       Argument is optional. If not specified, no vendor-specific data
+#       will be added.
+#
+# @error:
+#       Contains an array of ARM processor error information (PEI) sections.
+#       Argument is optional. If not specified, defaults to a single
+#       Processor Error Information record defaulting to type=cache-error.
 #
 # Features:
 #
@@ -44,6 +262,16 @@
 # Since: 9.1
 ##
 { 'command': 'arm-inject-error',
-  'data': { 'errortypes': ['ArmProcessorErrorType'] },
+  'data': {
+    '*validation': ['ArmProcessorValidationBits'],
+    '*affinity-level': 'uint8',
+    '*mpidr-el1': 'uint64',
+    '*midr-el1': 'uint64',
+    '*running-state':  ['ArmProcessorRunningState'],
+    '*psci-state': 'uint32',
+    '*context': ['ArmProcessorContext'],
+    '*vendor-specific': ['uint8'],
+    '*error': ['ArmProcessorErrorInformation']
+  },
   'features': [ 'unstable' ]
 }
diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
index 0e9490cebc72..77c800186f34 160000
--- a/tests/lcitool/libvirt-ci
+++ b/tests/lcitool/libvirt-ci
@@ -1 +1 @@
-Subproject commit 0e9490cebc726ef772b6c9e27dac32e7ae99f9b2
+Subproject commit 77c800186f34b21be7660750577cc5582a914deb
-- 
2.45.2
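
For readers following the size bookkeeping in this patch, here is a minimal
sketch of how the CPER section length and the PEI type bitmap fit together.
It is not part of the patch. The two size constants are assumptions derived
from the field widths appended above (they stand in for
ACPI_GHES_ARM_CPER_LENGTH and ACPI_GHES_ARM_CPER_PEI_LENGTH, whose
definitions are not shown here), and all sketch_* names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed sizes, summed from the field widths the patch appends:
 *  - fixed part of the ARM Processor Error Section (UEFI table N.16)
 *  - one Processor Error Information entry (UEFI table N.17):
 *    1 + 1 + 2 + 1 + 2 + 1 + 8 + 8 + 8 bytes
 */
#define SKETCH_ARM_CPER_BASE_LENGTH 40u
#define SKETCH_ARM_PEI_LENGTH       32u

/*
 * Hypothetical helper: bytes used by one context entry carrying
 * nregs 64-bit values. The entry has an 8-byte header
 * (version, type, array size) followed by the register array.
 */
static uint32_t sketch_context_length(uint32_t nregs)
{
    return (nregs + 1u) * 8u;
}

/*
 * Hypothetical helper mirroring the cper_length sum in
 * acpi_ghes_record_arm_error(): fixed header, then the PEI entries,
 * then all context entries, then the vendor-specific bytes.
 */
static uint32_t sketch_cper_length(uint16_t err_info_num,
                                   uint32_t context_length,
                                   uint32_t vendor_num)
{
    return SKETCH_ARM_CPER_BASE_LENGTH +
           SKETCH_ARM_PEI_LENGTH * err_info_num +
           context_length +
           vendor_num;
}

/*
 * Hypothetical helper: map a zero-based QAPI error type value to its
 * bit in the PEI type field. Bit 0 is reserved, so the first enum
 * value lands on bit 1.
 */
static uint8_t sketch_pei_type_bit(unsigned int qapi_value)
{
    return (uint8_t)(1u << (qapi_value + 1u));
}
```

Under these assumptions, a record with one PEI entry and neither context nor
vendor data is 72 bytes long, and each extra context entry adds eight bytes
of header on top of its register array.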




* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-22  6:45 ` [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection Mauro Carvalho Chehab
@ 2024-07-25  9:48   ` Markus Armbruster
  2024-07-26 12:46     ` Jonathan Cameron via
  2024-07-29 12:21     ` Mauro Carvalho Chehab
  2024-07-26 12:44   ` Jonathan Cameron via
  2024-07-30 11:17   ` Igor Mammedov
  2 siblings, 2 replies; 42+ messages in thread
From: Markus Armbruster @ 2024-07-25  9:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> 1. Some GHES functions require handling addresses. Add a helper function
>    to support it.
>
> 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
>
> Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and
> upper specs, using error type bit encoding as detailed at UEFI 2.9A
> errata.
>
> Error injection examples:
>
> { "execute": "qmp_capabilities" }
>
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error']
>       }
> }
>
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['tlb-error']
>       }
> }
>
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['bus-error']
>       }
> }
>
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error', 'tlb-error']
>       }
> }
>
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
>       }
> }
> ...
>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> For Add a logic to handle block addresses,
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> For FW first ARM processor error injection,
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  configs/targets/aarch64-softmmu.mak |   1 +
>  hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
>  hw/arm/Kconfig                      |   4 +
>  hw/arm/arm_error_inject.c           |  35 ++++
>  hw/arm/arm_error_inject_stubs.c     |  18 ++
>  hw/arm/meson.build                  |   3 +
>  include/hw/acpi/ghes.h              |   2 +
>  qapi/arm-error-inject.json          |  49 ++++++
>  qapi/meson.build                    |   1 +
>  qapi/qapi-schema.json               |   1 +
>  10 files changed, 361 insertions(+), 11 deletions(-)
>  create mode 100644 hw/arm/arm_error_inject.c
>  create mode 100644 hw/arm/arm_error_inject_stubs.c
>  create mode 100644 qapi/arm-error-inject.json

Since the new file is not covered in MAINTAINERS, get_maintainer.pl will
blame it on the QAPI maintainers alone.  No good.

[...]

> diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> new file mode 100644
> index 000000000000..430e6cea6b60
> --- /dev/null
> +++ b/qapi/arm-error-inject.json
> @@ -0,0 +1,49 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +
> +##
> +# = ARM Processor Errors
> +##
> +
> +##
> +# @ArmProcessorErrorType:
> +#
> +# Type of ARM processor error to inject
> +#
> +# @unknown-error: Unknown error

Removed in PATCH 7, and unused until then.  Why add it in the first
place?

> +#
> +# @cache-error: Cache error
> +#
> +# @tlb-error: TLB error
> +#
> +# @bus-error: Bus error.
> +#
> +# @micro-arch-error: Micro architectural error.
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'ArmProcessorErrorType',
> +  'data': ['unknown-error',
> +	   'cache-error',

Tab in this line.  Please convert to spaces.

> +           'tlb-error',
> +           'bus-error',
> +           'micro-arch-error']
> +}
> +
> +##
> +# @arm-inject-error:
> +#
> +# Inject ARM Processor error.
> +#
> +# @errortypes: ARM processor error types to inject
> +#
> +# Features:
> +#
> +# @unstable: This command is experimental.
> +#
> +# Since: 9.1
> +##
> +{ 'command': 'arm-inject-error',
> +  'data': { 'errortypes': ['ArmProcessorErrorType'] },

Please separate words with dashes: 'error-types'.

> +  'features': [ 'unstable' ]
> +}

Is this used only with TARGET_ARM?

Why is being able to inject multiple error types at once useful?

I'd expect at least some of these errors to come with additional
information.  For instance, I imagine a bus error is associated with
some address.

If we encode the error to inject as an enum value, adding more
information later will be hard.

If we wrap the enum in a struct

    { 'struct': 'ArmProcessorError',
      'data': { 'type': 'ArmProcessorErrorType' } }

we can later extend it like

    { 'union': 'ArmProcessorError',
      'base': { 'type': 'ArmProcessorErrorType' },
      'discriminator': 'type',
      'data': {
          'bus-error': 'ArmProcessorBusErrorData' } }

    { 'struct': 'ArmProcessorBusErrorData',
      'data': ... }
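
To make the wire-level effect concrete, here is a minimal Python sketch
of the QMP messages such a schema would accept.  The argument name
"errors", the "address" member of the bus-error branch and the helper
name inject_cmd() are all made up for illustration; none of them is part
of the posted patch.

```python
import json


def inject_cmd(errors):
    """Build an arm-inject-error QMP message from a list of error objects."""
    # "errors" is a hypothetical argument name, not taken from the patch.
    return {"execute": "arm-inject-error", "arguments": {"errors": errors}}


# With the plain struct, each error carries only its type.
plain = inject_cmd([{"type": "cache-error"}, {"type": "tlb-error"}])

# Once the struct becomes a union keyed on "type", the old objects stay
# valid, and a bus error can gain extra members ("address" is hypothetical).
extended = inject_cmd([{"type": "cache-error"},
                       {"type": "bus-error", "address": 0x5CDFD492}])

print(json.dumps(plain))
print(json.dumps(extended))
```

The point of the sketch is that the first message is valid under both
schema versions, so the extension does not break existing clients.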

> diff --git a/qapi/meson.build b/qapi/meson.build
> index e7bc54e5d047..5927932c4be3 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -22,6 +22,7 @@ if have_system or have_tools or have_ga
>  endif
>  
>  qapi_all_modules = [
> +  'arm-error-inject',
>    'authz',
>    'block',
>    'block-core',
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index b1581988e4eb..479a22de7e43 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -81,3 +81,4 @@
>  { 'include': 'vfio.json' }
>  { 'include': 'cryptodev.json' }
>  { 'include': 'cxl.json' }
> +{ 'include': 'arm-error-inject.json' }




* Re: [PATCH v3 7/7] acpi/ghes: extend arm error injection logic
  2024-07-22  6:45 ` [PATCH v3 7/7] acpi/ghes: extend arm error injection logic Mauro Carvalho Chehab
@ 2024-07-25 10:03   ` Markus Armbruster
  2024-07-29 11:18     ` Mauro Carvalho Chehab
  2024-07-26 13:22   ` Jonathan Cameron via
  1 sibling, 1 reply; 42+ messages in thread
From: Markus Armbruster @ 2024-07-25 10:03 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Alex Bennée,
	Michael S. Tsirkin, Philippe Mathieu-Daudé, Ani Sinha,
	Beraldo Leal, Dongjiu Geng, Eric Blake, Igor Mammedov,
	Peter Maydell, Thomas Huth, Wainer dos Santos Moschetta,
	linux-kernel, qemu-arm, qemu-devel

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> Enrich CPER error injection logic for ARM processor to allow
> setting values to  from UEFI 2.10 tables N.16 and N.17.
>
> It should be noticed that, with such change, all arguments are
> now optional, so, once QMP is negotiated with:
>
> 	{ "execute": "qmp_capabilities" }
>
> the simplest way to generate a cache error is to use:
>
> 	{ "execute": "arm-inject-error" }
>
> Also, as now PEI is mapped into an array, it is possible to
> inject multiple errors at the same CPER record with:
>
> 	{ "execute": "arm-inject-error", "arguments": {
> 	   "error": [ {"type": [ "cache-error" ]},
> 		      {"type": [ "tlb-error" ]} ] } }
>
> This would generate both cache and TLB errors, using default
> values for other fields.
>
> As all fields from ARM Processor CPER are now mapped, all
> types of CPER records can be generated with the new QAPI.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

[...]

> diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> index 430e6cea6b60..2a314830fe60 100644
> --- a/qapi/arm-error-inject.json
> +++ b/qapi/arm-error-inject.json
> @@ -2,40 +2,258 @@
>  # vim: filetype=python
>  
>  ##
> -# = ARM Processor Errors
> +# = ARM Processor Errors as defined at:
> +# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
> +# See tables N.16, N.17 and N.21.
>  ##

This comes out badly in HTML.

Try something like

   # = ARM Processor Errors
   #
   # These are defined at
   # https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
   # See tables N.16, N.17 and N.21.

If any part of this is relevant in PATCH 4 already, squash the relevant
parts into that patch please.

>  
> +##
> +# @ArmProcessorValidationBits:
> +#
> +# Indcates whether or not fields of ARM processor CPER record are valid.

docs/devel/qapi-code-gen.rst section "Documentation markup":

    For legibility, wrap text paragraphs so every line is at most 70
    characters long.
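
A quick way to check a paragraph against that limit, as a sketch only:
the helper name wrap_doc() is invented here, and it relies on Python's
standard textwrap module, not on any QEMU tooling.  It assumes the
70-character limit includes the leading "# ".

```python
import textwrap


def wrap_doc(text, width=70):
    """Re-wrap one doc paragraph as '# '-prefixed lines of at most `width`."""
    return textwrap.fill(" ".join(text.split()), width=width,
                         initial_indent="# ", subsequent_indent="# ",
                         break_long_words=False, break_on_hyphens=False)


para = ("Indicates whether or not fields of ARM processor CPER record "
        "are valid.")
wrapped = wrap_doc(para)
print(wrapped)
```

Words are never split, so a single token longer than the limit (a URL,
say) still ends up on an over-long line of its own.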

> +#
> +# @mpidr-valid:  MPIDR Valid
> +#
> +# @affinity-valid: Error affinity level Valid
> +#
> +# @running-state-valid: Running State
> +#
> +# @vendor-specific-valid: Vendor Specific Info Valid
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'ArmProcessorValidationBits',
> +  'data': ['mpidr-valid',
> +           'affinity-valid',
> +           'running-state-valid',
> +           'vendor-specific-valid']
> +}
> +
> +##
> +# @ArmProcessorFlags:
> +#
> +# Indicates error attributes at the Error info section.
> +#
> +# @first-error-cap: First error captured
> +#
> +# @last-error-cap:  Last error captured
> +#
> +# @propagated: Propagated
> +#
> +# @overflow: Overflow
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'ArmProcessorFlags',
> +  'data': ['first-error-cap',
> +           'last-error-cap',
> +           'propagated',
> +           'overflow']
> +}
> +
> +##
> +# @ArmProcessorRunningState:
> +#
> +# Indicates if the processor is running.
> +#
> +# @processor-running: indicates that the processor is running
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'ArmProcessorRunningState',
> +  'data': ['processor-running']
> +}
> +
>  ##
>  # @ArmProcessorErrorType:
>  #
> -# Type of ARM processor error to inject
> -#
> -# @unknown-error: Unknown error
> +# Type of ARM processor error information to inject.
>  #
>  # @cache-error: Cache error
>  #
>  # @tlb-error: TLB error
>  #
> -# @bus-error: Bus error.
> +# @bus-error: Bus error
>  #
> -# @micro-arch-error: Micro architectural error.
> +# @micro-arch-error: Micro architectural error
>  #
>  # Since: 9.1
>  ##
>  { 'enum': 'ArmProcessorErrorType',
> -  'data': ['unknown-error',
> -	   'cache-error',
> +  'data': ['cache-error',
>             'tlb-error',
>             'bus-error',
>             'micro-arch-error']
> + }

Squash the changes to this type into PATCH 4, please.

> +
> +##
> +# @ArmPeiValidationBits:
> +#
> +# Indcates whether or not fields of Processor Error Info section are valid.
> +#
> +# @multiple-error-valid: Information at multiple-error field is valid
> +#
> +# @flags-valid: Information at flags field is valid
> +#
> +# @error-info-valid: Information at error-info field is valid
> +#
> +# @virt-addr-valid: Information at virt-addr field is valid
> +#
> +# @phy-addr-valid: Information at phy-addr field is valid
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'ArmPeiValidationBits',
> +  'data': ['multiple-error-valid',
> +           'flags-valid',
> +           'error-info-valid',
> +           'virt-addr-valid',
> +           'phy-addr-valid']
> +}
> +
> +##
> +# @ArmProcessorErrorInformation:
> +#
> +# Contains ARM processor error information (PEI) data according with UEFI
> +# CPER table N.17.
> +#
> +# @validation:
> +#       Valid validation bits for error-info section.
> +#       Argument is optional. If not specified, those flags will be enabled:
> +#       first-error-cap and propagated.

Please format like this for consistency:

   # @validation: Valid validation bits for error-info section.
   #     Argument is optional.  If not specified, those flags will be
   #     enabled: first-error-cap and propagated.

> +#
> +# @type:
> +#       ARM processor error types to inject. Argument is mandatory.
> +#
> +# @multiple-error:
> +#       Indicates whether multiple errors have occurred.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record..
> +#
> +# @flags:
> +#       Indicates flags that describe the error attributes.
> +#       Argument is optional. If not specified and defaults to
> +#       first-error and propagated.
> +#
> +# @error-info:
> +#       Error information structure is specific to each error type.
> +#       Argument is optional, and its value depends on the PEI type(s).
> +#       If not defined, the default depends on the type:
> +#       - for cache-error: 0x0091000F;
> +#       - for tlb-error: 0x0054007F;
> +#       - for bus-error: 0x80D6460FFF;
> +#       - for micro-arch-error: 0x78DA03FF;
> +#       - if multiple types used, this bit is disabled from @validation bits.
> +#
> +# @virt-addr:
> +#       Virtual fault address associated with the error.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record..
> +#
> +# @phy-addr:
> +#       Physical fault address associated with the error.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record..
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'ArmProcessorErrorInformation',
> +  'data': { '*validation': ['ArmPeiValidationBits'],
> +            'type': ['ArmProcessorErrorType'],
> +            '*multiple-error': 'uint16',
> +            '*flags': ['ArmProcessorFlags'],
> +            '*error-info': 'uint64',
> +            '*virt-addr':  'uint64',
> +            '*phy-addr': 'uint64'}
> +}
> +
> +##
> +# @ArmProcessorContext:
> +#
> +# Provide processor context state specific to the ARM processor architecture,
> +# According with UEFI 2.10 CPER table N.21.
> +# Argument is optional.If not specified, no context will be used.
> +#
> +# @type:
> +#       Contains an integer value indicating the type of context state being
> +#       reported.
> +#       Argument is optional. If not defined, it will be set to be EL1 register
> +#       for the emulation, e. g.:
> +#       - on arm32: AArch32 EL1 context registers;
> +#       - on arm64: AArch64 EL1 context registers.
> +#
> +# @register:
> +#       Provides the contents of the actual registers or raw data, depending
> +#       on the context type.
> +#       Argument is optional. If not defined, it will fill the first register
> +#       with 0xDEADBEEF, and the other ones with zero.
> +#
> +# @minimal-size:
> +#       Argument is optional. If provided, define the minimal size of the
> +#       context register array. The actual size is defined by checking the
> +#       number of register values plus the content of this field (if used),
> +#       ensuring that each processor context information structure array is
> +#       padded with zeros if the size is not a multiple of 16 bytes.
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'ArmProcessorContext',
> +  'data': { '*type': 'uint16',
> +            '*minimal-size': 'uint32',
> +            '*register': ['uint64']}
>  }
>  
>  ##
>  # @arm-inject-error:
>  #
> -# Inject ARM Processor error.
> +# Inject ARM Processor error with data to be filled accordign with UEFI 2.10
> +# CPER table N.16.
>  #
> -# @errortypes: ARM processor error types to inject
> +# @validation:
> +#       Valid validation bits for ARM processor CPER.
> +#       Argument is optional. If not specified, the default is
> +#       calculated based on having the corresponding arguments filled.
> +#
> +# @affinity-level:
> +#       Error affinity level for errors that can be attributed to a specific
> +#       affinity level.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record.
> +#
> +# @mpidr-el1:
> +#       Processor’s unique ID in the system.
> +#       Argument is optional. If not specified, it will use the cpu mpidr
> +#       field from the emulation data. If zero and @validation is not
> +#       enforced, this field will be marked as invalid at CPER record.
> +#
> +# @midr-el1:  Identification info of the chip
> +#       Argument is optional. If not specified, it will use the cpu mpidr
> +#       field from the emulation data. If zero and @validation is not
> +#       enforced, this field will be marked as invalid at CPER record.
> +#
> +# @running-state:
> +#       Indicates the running state of the processor.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record.
> +#
> +# @psci-state:
> +#       Provides PSCI state of the processor, as defined in ARM PSCI document.
> +#       Argument is optional. If not specified, it will use the cpu power
> +#       state field from the emulation data.
> +#
> +# @context:
> +#       Contains an array of processor context registers.
> +#       Argument is optional. If not specified, no context will be added.
> +#
> +# @vendor-specific:
> +#       Contains a byte array of vendor-specific data.
> +#       Argument is optional. If not specified, no vendor-specific data
> +#       will be added.
> +#
> +# @error:
> +#       Contains an array of ARM processor error information (PEI) sections.
> +#       Argument is optional. If not specified, defaults to a single
> +#       Program Error Information record defaulting to type=cache-error.
>  #
>  # Features:
>  #
> @@ -44,6 +262,16 @@
>  # Since: 9.1
>  ##
>  { 'command': 'arm-inject-error',
> -  'data': { 'errortypes': ['ArmProcessorErrorType'] },
> +  'data': {
> +    '*validation': ['ArmProcessorValidationBits'],
> +    '*affinity-level': 'uint8',
> +    '*mpidr-el1': 'uint64',
> +    '*midr-el1': 'uint64',
> +    '*running-state':  ['ArmProcessorRunningState'],
> +    '*psci-state': 'uint32',
> +    '*context': ['ArmProcessorContext'],
> +    '*vendor-specific': ['uint8'],
> +    '*error': ['ArmProcessorErrorInformation']
> +  },
>    'features': [ 'unstable' ]
>  }

This changes the command pretty much completely.  Why is the previous
state worth capturing in git?
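
To illustrate, a small Python sketch comparing the two argument shapes,
both copied from the respective commit messages; the variable names are
mine.

```python
# Shape introduced in PATCH 4 (from its commit message).
patch4 = {"execute": "arm-inject-error",
          "arguments": {"errortypes": ["cache-error", "tlb-error"]}}

# Shape after PATCH 7 (from its commit message).
patch7 = {"execute": "arm-inject-error",
          "arguments": {"error": [{"type": ["cache-error"]},
                                  {"type": ["tlb-error"]}]}}

# Argument names that a client could keep using across both versions.
shared = set(patch4["arguments"]) & set(patch7["arguments"])
print("shared argument names:", sorted(shared))
```

No argument name survives from one form to the other, so a client
written against the PATCH 4 form would need rewriting for PATCH 7.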

> diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
> index 0e9490cebc72..77c800186f34 160000
> --- a/tests/lcitool/libvirt-ci
> +++ b/tests/lcitool/libvirt-ci
> @@ -1 +1 @@
> -Subproject commit 0e9490cebc726ef772b6c9e27dac32e7ae99f9b2
> +Subproject commit 77c800186f34b21be7660750577cc5582a914deb

Accident?




* Re: [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-22  6:45 ` [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
@ 2024-07-26 12:30   ` Jonathan Cameron via
  2024-07-30  8:36   ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Jonathan Cameron via @ 2024-07-26 12:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shiju Jose, Michael S. Tsirkin, Philippe Mathieu-Daudé,
	Ani Sinha, Eduardo Habkost, Igor Mammedov, Marcel Apfelbaum,
	Peter Maydell, Shannon Zhao, Yanan Wang, linux-kernel, qemu-arm,
	qemu-devel

On Mon, 22 Jul 2024 08:45:54 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Creates a GED - Generic Event Device and set a GPIO to
The wonder of confusing names in ACPI.  I thought I'd
fixed this but clearly not.

This GED isn't a Generic Event Device, it's a Generic Error Device
(PNP0C33), as per ACPI 6.5 section 18.3.2.7.2, "Event Notification
for Generic Error Sources".  The Generic Event Device is unrelated :(
and has ID ACPI0013.

This one is a bit of ACPI glue logic.  The kernel patch at
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
provides a bit more info.  In the kernel world this is called the
Hardware Error Device, I guess to avoid tripping over the GED naming?
Maybe we are better off using that here, so HED_*.

> be used or error injection.

The commit message should probably shout a bit more about the
set_error callback added to struct MachineClass.

Whilst that works, I suspect it will be controversial.  I'd like to
hear other suggestions on how to provide a convenient hook to signal
that we've put an event in the firmware buffer, given that the signal
can come from pretty much any hw device emulation in QEMU.
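
One shape such a hook could take, offered only as a sketch of the idea
and not as what this series does: a notifier list that the machine
subscribes to once and that any device emulation can fire.  The Python
below models the pattern with made-up names; QEMU's own NotifierList
from include/qemu/notify.h would be the obvious C counterpart, though
whether it fits here is an open question.

```python
class NotifierList:
    """Minimal observer list: callbacks run in registration order."""

    def __init__(self):
        self._callbacks = []

    def add(self, callback):
        self._callbacks.append(callback)

    def notify(self, data=None):
        for callback in self._callbacks:
            callback(data)


# Hypothetical list meaning "a record was put in the firmware buffer".
generic_error_notifiers = NotifierList()

raised = []


def virt_raise_error_gpio(_data):
    # Stand-in for pulsing the board's generic-error GPIO pin.
    raised.append("GPIO_PIN_GENERIC_ERROR")


generic_error_notifiers.add(virt_raise_error_gpio)  # once, by the machine
generic_error_notifiers.notify()                    # by any device model
print(raised)
```

Compared with a MachineClass callback, the device model does not need
to know which machine it runs on, only that somebody may be listening.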

Jonathan



> 
> [mchehab: use a define for the generic event pin number and do some cleanups]
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
>  hw/arm/virt.c            | 14 ++++++++++++--
>  include/hw/arm/virt.h    |  1 +
>  include/hw/boards.h      |  1 +
>  4 files changed, 40 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f76fb117adff..c502ccf40909 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -63,6 +63,7 @@
>  
>  #define ARM_SPI_BASE 32
>  
> +#define ACPI_GENERIC_EVENT_DEVICE "GEDD"
>  #define ACPI_BUILD_TABLE_SIZE             0x20000
>  
>  static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
> @@ -142,6 +143,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
>  static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
>                                             uint32_t gpio_irq)
>  {
> +    uint32_t pin;
> +
>      Aml *dev = aml_device("GPO0");
>      aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0061")));
>      aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> @@ -155,7 +158,12 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
>  
>      Aml *aei = aml_resource_template();
>  
> -    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
> +    pin = GPIO_PIN_POWER_BUTTON;
> +    aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
> +                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
> +                                 "GPO0", NULL, 0));
> +    /* Pin for generic error */
> +    pin = GPIO_PIN_GENERIC_ERROR;
>      aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
>                                   AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
>                                   "GPO0", NULL, 0));
> @@ -166,6 +174,11 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
>      aml_append(method, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
>                                    aml_int(0x80)));
>      aml_append(dev, method);
> +    method = aml_method("_E06", 0, AML_NOTSERIALIZED);
> +    aml_append(method, aml_notify(aml_name(ACPI_GENERIC_EVENT_DEVICE),
> +                                  aml_int(0x80)));
> +    aml_append(dev, method);
> +
>      aml_append(scope, dev);
>  }
>  
> @@ -800,6 +813,15 @@ static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
>      build_fadt(table_data, linker, &fadt, vms->oem_id, vms->oem_table_id);
>  }
>  
> +static void acpi_dsdt_add_generic_event_device(Aml *scope)
> +{
> +    Aml *dev = aml_device(ACPI_GENERIC_EVENT_DEVICE);
> +    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> +    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> +    aml_append(scope, dev);
> +}
> +
>  /* DSDT */
>  static void
>  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> @@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>                        HOTPLUG_HANDLER(vms->acpi_dev),
>                        irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
>                        memmap[VIRT_ACPI_GED].base);
> -    } else {
> -        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> -                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
>      }
> +    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> +                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
>  
>      if (vms->acpi_dev) {
>          uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
> @@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      }
>  
>      acpi_dsdt_add_power_button(scope);
> +    acpi_dsdt_add_generic_event_device(scope);
>  #ifdef CONFIG_TPM
>      acpi_dsdt_add_tpm(scope, vms);
>  #endif
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index c99c8b1713c6..f81cf3a69961 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -997,6 +997,13 @@ static void create_rtc(const VirtMachineState *vms)
>  }
>  
>  static DeviceState *gpio_key_dev;
> +
> +static DeviceState *gpio_error_dev;
> +static void virt_set_error(void)
> +{
> +    qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
> +}
> +
>  static void virt_powerdown_req(Notifier *n, void *opaque)
>  {
>      VirtMachineState *s = container_of(n, VirtMachineState, powerdown_notifier);
> @@ -1015,6 +1022,9 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
>      gpio_key_dev = sysbus_create_simple("gpio-key", -1,
>                                          qdev_get_gpio_in(pl061_dev,
>                                                           GPIO_PIN_POWER_BUTTON));
> +    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
> +                                          qdev_get_gpio_in(pl061_dev,
> +                                                           GPIO_PIN_GENERIC_ERROR));
>  
>      qemu_fdt_add_subnode(fdt, "/gpio-keys");
>      qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
> @@ -2385,9 +2395,8 @@ static void machvirt_init(MachineState *machine)
>  
>      if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
>          vms->acpi_dev = create_acpi_ged(vms);
> -    } else {
> -        create_gpio_devices(vms, VIRT_GPIO, sysmem);
>      }
> +    create_gpio_devices(vms, VIRT_GPIO, sysmem);
>  
>      if (vms->secure && !vmc->no_secure_gpio) {
>          create_gpio_devices(vms, VIRT_SECURE_GPIO, secure_sysmem);
> @@ -3101,6 +3110,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      mc->default_ram_id = "mach-virt.ram";
>      mc->default_nic = "virtio-net-pci";
>  
> +    mc->set_error = virt_set_error;
>      object_class_property_add(oc, "acpi", "OnOffAuto",
>          virt_get_acpi, virt_set_acpi,
>          NULL, NULL);
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index a4d937ed45ac..c9769d7d4d7f 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -49,6 +49,7 @@
>  
>  /* GPIO pins */
>  #define GPIO_PIN_POWER_BUTTON  3
> +#define GPIO_PIN_GENERIC_ERROR 6
>  
>  enum {
>      VIRT_FLASH,
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index ef6f18f2c1a7..6cf01f3934ae 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -304,6 +304,7 @@ struct MachineClass {
>      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
>      int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
>      ram_addr_t (*fixup_ram_size)(ram_addr_t size);
> +    void (*set_error)(void);
>  };
>  
>  /**




* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-22  6:45 ` [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection Mauro Carvalho Chehab
  2024-07-25  9:48   ` Markus Armbruster
@ 2024-07-26 12:44   ` Jonathan Cameron via
  2024-07-29 11:40     ` Mauro Carvalho Chehab
  2024-07-30 11:17   ` Igor Mammedov
  2 siblings, 1 reply; 42+ messages in thread
From: Jonathan Cameron via @ 2024-07-26 12:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
	Eric Blake, Igor Mammedov, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Mon, 22 Jul 2024 08:45:56 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> 1. Some GHES functions require handling addresses. Add a helper function
>    to support it.
> 
> 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
> 
> Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and
> upper specs, using error type bit encoding as detailed at UEFI 2.9A
> errata.
> 
> Error injection examples:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['tlb-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['bus-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error', 'tlb-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
>       }
> }
> ...
> 
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> For Add a logic to handle block addresses,
I think comment lines like this one need a leading '#'?
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> For FW first ARM processor error injection,
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
I can't remember what I wrote in here so may well be commenting on
my past self ;)

> ---
>  configs/targets/aarch64-softmmu.mak |   1 +
>  hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
>  hw/arm/Kconfig                      |   4 +
>  hw/arm/arm_error_inject.c           |  35 ++++
>  hw/arm/arm_error_inject_stubs.c     |  18 ++
>  hw/arm/meson.build                  |   3 +
>  include/hw/acpi/ghes.h              |   2 +
>  qapi/arm-error-inject.json          |  49 ++++++
>  qapi/meson.build                    |   1 +
>  qapi/qapi-schema.json               |   1 +
>  10 files changed, 361 insertions(+), 11 deletions(-)
>  create mode 100644 hw/arm/arm_error_inject.c
>  create mode 100644 hw/arm/arm_error_inject_stubs.c
>  create mode 100644 qapi/arm-error-inject.json
> 
> diff --git a/configs/targets/aarch64-softmmu.mak b/configs/targets/aarch64-softmmu.mak
> index 84cb32dc2f4f..b4b3cd97934a 100644
> --- a/configs/targets/aarch64-softmmu.mak
> +++ b/configs/targets/aarch64-softmmu.mak
> @@ -5,3 +5,4 @@ TARGET_KVM_HAVE_GUEST_DEBUG=y
>  TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml gdb-xml/aarch64-pauth.xml
>  # needed by boot.c
>  TARGET_NEED_FDT=y
> +CONFIG_ARM_EINJ=y
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 5b8bc6eeb437..6075ef5893ce 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -27,6 +27,7 @@
>  #include "hw/acpi/generic_event_device.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "qemu/uuid.h"
> +#include "qapi/qapi-types-arm-error-inject.h"
>  
>  #define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
>  #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> @@ -53,6 +54,12 @@
>  /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
>  #define ACPI_GHES_MEM_CPER_LENGTH           80
>  
> +/*
> + * ARM Processor section CPER size, UEFI 2.10: N.2.4.4
> + * ARM Processor Error Section
> + */
> +#define ACPI_GHES_ARM_CPER_LENGTH (72 + 600)
> +
>  /* Masks for block_status flags */
>  #define ACPI_GEBS_UNCORRECTABLE         1
>  
> @@ -231,6 +238,142 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>      return 0;
>  }
>  
> +/* UEFI 2.9: N.2.4.4 ARM Processor Error Section */
> +static void acpi_ghes_build_append_arm_cper(uint8_t error_types, GArray *table)
> +{
> +    /*
> +     * ARM Processor Error Record
> +     */
> +
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,
> +                              (1ULL << 3) | /* Vendor specific info Valid */
> +                              (1ULL << 2) | /* Running status Valid */
> +                              (1ULL << 1) | /* Error affinity level Valid */
> +                              (1ULL << 0), /* MPIDR Valid */
> +                              4);
> +    /* Error Info Num */
> +    build_append_int_noprefix(table, 1, 2);
> +    /* Context Info Num */
> +    build_append_int_noprefix(table, 1, 2);
> +    /* Section length */
> +    build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_LENGTH, 4);
> +    /* Error affinity level */
> +    build_append_int_noprefix(table, 2, 1);
> +    /* Reserved */
> +    build_append_int_noprefix(table, 0, 3);
> +    /* MPIDR_EL1 */
> +    build_append_int_noprefix(table, 0xAB12, 8);

These values need to be real.  I see you fix that in later
patches, but I'd be tempted to pull it back here.  Or maybe just
add a comment to say you will rewrite this later.

I know you aren't keen to smash patches with different authorship
together, but here I think you should just have this
correct from the start (so combine this and 5-7),
perhaps with some links back to the version where they are split?
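
For reference, the fixed part of the section built above can be
modelled in a few lines of Python.  This is only a sketch: the field
order and widths are read off the quoted build_append_int_noprefix()
calls, little-endian packing is assumed, and the function name and the
sample values are placeholders, not real register contents.

```python
import struct

# validation(4) err_info_num(2) ctx_info_num(2) section_len(4)
# affinity(1) reserved(3) MPIDR_EL1(8) MIDR_EL1(8) running_state(4) PSCI(4)
ARM_CPER_HEADER = struct.Struct("<IHHIB3xQQII")


def pack_arm_cper_header(mpidr, midr, section_len, affinity=2,
                         running_state=1, psci_state=0):
    """Pack the header of an ARM Processor Error Section."""
    # Bits 0..3: MPIDR, affinity level, running state, vendor info valid.
    validation = (1 << 3) | (1 << 2) | (1 << 1) | (1 << 0)
    # One error info entry and one context entry, as in the quoted code.
    return ARM_CPER_HEADER.pack(validation, 1, 1, section_len, affinity,
                                mpidr, midr, running_state, psci_state)


hdr = pack_arm_cper_header(mpidr=0xAB12, midr=0xCD24, section_len=72 + 600)
print(len(hdr), hdr.hex())
```

Passing the emulated CPU's MPIDR and MIDR in place of the 0xAB12 and
0xCD24 placeholders is the change being asked for here.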

> +    /* MIDR_EL1 */
> +    build_append_int_noprefix(table, 0xCD24, 8);
> +    /* Running state */
> +    build_append_int_noprefix(table, 0x1, 4);
> +    /* PSCI state */
> +    build_append_int_noprefix(table, 0x1234, 4);
> +
> +    /* ARM Propcessor error information */
> +    /* Version */
> +    build_append_int_noprefix(table, 0, 1);
> +    /*  Length */
> +    build_append_int_noprefix(table, 32, 1);
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,
> +                              (1ULL << 4) | /* Physical fault address Valid */

Some tabs hiding in here that need to be spaces.

> +                             (1ULL << 3) | /* Virtual fault address Valid */
> +                             (1ULL << 2) | /* Error information Valid */
> +                              (1ULL << 1) | /* Flags Valid */
> +                              (1ULL << 0), /* Multiple error count Valid */
> +                              2);
> +    /* Type */
> +    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR) ||
> +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR) ||
> +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR) ||
> +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
> +        build_append_int_noprefix(table, error_types, 1);
> +    } else {
> +        return;
> +    }
> +    /* Multiple error count */
> +    build_append_int_noprefix(table, 2, 2);
> +    /* Flags  */
> +    build_append_int_noprefix(table, 0xD, 1);
> +    /* Error information  */
> +    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR)) {
> +        build_append_int_noprefix(table, 0x0091000F, 8);
> +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR)) {
> +        build_append_int_noprefix(table, 0x0054007F, 8);
> +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR)) {
> +        build_append_int_noprefix(table, 0x80D6460FFF, 8);
> +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
> +        build_append_int_noprefix(table, 0x78DA03FF, 8);
> +    } else {
> +        return;
> +    }
> +    /* Virtual fault address  */
> +    build_append_int_noprefix(table, 0x67320230, 8);
> +    /* Physical fault address  */
> +    build_append_int_noprefix(table, 0x5CDFD492, 8);
> +
> +    /* ARM Propcessor error context information */
> +    /* Version */
> +    build_append_int_noprefix(table, 0, 2);
> +    /* Validation Bits */
> +    /* AArch64 EL1 context registers Valid */
> +    build_append_int_noprefix(table, 5, 2);
> +    /* Register array size */
> +    build_append_int_noprefix(table, 592, 4);
> +    /* Register array */
> +    build_append_int_noprefix(table, 0x12ABDE67, 8);
> +}
> +
> +static int acpi_ghes_record_arm_error(uint8_t error_types,
> +                                      uint64_t error_block_address)
> +{
> +    GArray *block;
> +
> +    /* ARM processor Error Section Type */
> +    const uint8_t uefi_cper_arm_sec[] =
> +          UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05, \
> +                  0x1D, 0x5D, 0x46, 0xB0);
> +
> +    /*
> +     * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> +     * Table 17-13 Generic Error Data Entry
> +     */
> +    QemuUUID fru_id = {};
> +    uint32_t data_length;
> +
> +    block = g_array_new(false, true /* clear */, 1);
> +
> +    /* This is the length if adding a new generic error data entry*/

space before *

> +    data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_ARM_CPER_LENGTH;
> +    /*
> +     * It should not run out of the preallocated memory if adding a new generic
> +     * error data entry
> +     */
> +    assert((data_length + ACPI_GHES_GESB_SIZE) <=
> +            ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +
> +    /* Build the new generic error status block header */
> +    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> +        0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
> +
> +    /* Build this new generic error data entry header */
> +    acpi_ghes_generic_error_data(block, uefi_cper_arm_sec,
> +        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> +        ACPI_GHES_ARM_CPER_LENGTH, fru_id, 0);
> +
> +    /* Build the ARM processor error section CPER */
> +    acpi_ghes_build_append_arm_cper(error_types, block);
> +
> +    /* Write the generic error data entry into guest memory */
> +    cpu_physical_memory_write(error_block_address, block->data, block->len);
> +
> +    g_array_free(block, true);
> +
> +    return 0;
> +}


> +bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify)
> +{
> +    int read_ack_register = 0;
> +    uint64_t read_ack_register_addr = 0;
> +    uint64_t error_block_addr = 0;
> +
> +    if (!ghes_get_addr(notify, &error_block_addr, &read_ack_register_addr)) {
> +        return false;
> +    }
> +
> +    cpu_physical_memory_read(read_ack_register_addr,
> +                             &read_ack_register, sizeof(uint64_t));

Longer, but I'd prefer sizeof(read_ack_register).
Maybe we can shorten the names to read_ack and read_ack_addr?
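
To illustrate, here is a minimal sketch of the sizeof(variable) pattern.
It assumes the variable is widened to uint64_t (the quoted code declares
it as int while transferring sizeof(uint64_t) bytes). fake_guest_mem and
fake_guest_read() are hypothetical stand-ins for guest memory and
cpu_physical_memory_read(), not QEMU APIs, and the little-endian
conversion a real version would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest memory; not part of QEMU. */
static uint8_t fake_guest_mem[16];

/* Hypothetical stand-in for cpu_physical_memory_read(). */
static void fake_guest_read(uint64_t addr, void *buf, size_t len)
{
    /* Keep the copy inside the fake memory window. */
    assert(addr <= sizeof(fake_guest_mem) &&
           len <= sizeof(fake_guest_mem) - addr);
    memcpy(buf, &fake_guest_mem[addr], len);
}

/* Fetch the read-ack value; the transfer size follows the variable's type. */
static uint64_t read_ack_value(uint64_t read_ack_addr)
{
    uint64_t read_ack = 0;

    fake_guest_read(read_ack_addr, &read_ack, sizeof(read_ack));
    return read_ack;
}
```

With sizeof(read_ack), changing the variable's type later cannot leave a
stale size behind at the call sites.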

> +    /* zero means OSPM does not acknowledge the error */
> +    if (!read_ack_register) {
> +        error_report("Last time OSPM does not acknowledge the error,"
> +                     " record CPER failed this time, set the ack value to"
> +                     " avoid blocking next time CPER record! exit");
> +        read_ack_register = 1;
> +        cpu_physical_memory_write(read_ack_register_addr,
> +                                  &read_ack_register, sizeof(uint64_t));
sizeof(read_ack_register)

> +        return false;
> +    }
> +
> +    read_ack_register = cpu_to_le64(0);
> +    cpu_physical_memory_write(read_ack_register_addr,
> +                              &read_ack_register, sizeof(uint64_t));

sizeof(read_ack_register)

> +    return acpi_ghes_record_arm_error(error_types, error_block_addr);
> +}
> +

> diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> new file mode 100644
> index 000000000000..430e6cea6b60
> --- /dev/null
> +++ b/qapi/arm-error-inject.json

> +##
> +# @arm-inject-error:
> +#
> +# Inject ARM Processor error.
> +#
> +# @errortypes: ARM processor error types to inject
> +#
> +# Features:
> +#
> +# @unstable: This command is experimental.
> +#
> +# Since: 9.1
Update to 9.2 on next version.
> +##
> +{ 'command': 'arm-inject-error',
> +  'data': { 'errortypes': ['ArmProcessorErrorType'] },
> +  'features': [ 'unstable' ]
> +}




* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-25  9:48   ` Markus Armbruster
@ 2024-07-26 12:46     ` Jonathan Cameron via
  2024-07-29 12:49       ` Mauro Carvalho Chehab
  2024-07-29 12:21     ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 42+ messages in thread
From: Jonathan Cameron via @ 2024-07-26 12:46 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Mauro Carvalho Chehab, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

A few quick replies from me.
I'm sure Mauro will add more info.

> > +           'tlb-error',
> > +           'bus-error',
> > +           'micro-arch-error']
> > +}
> > +
> > +##
> > +# @arm-inject-error:
> > +#
> > +# Inject ARM Processor error.
> > +#
> > +# @errortypes: ARM processor error types to inject
> > +#
> > +# Features:
> > +#
> > +# @unstable: This command is experimental.
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'command': 'arm-inject-error',
> > +  'data': { 'errortypes': ['ArmProcessorErrorType'] },  
> 
> Please separate words with dashes: 'error-types'.
> 
> > +  'features': [ 'unstable' ]
> > +}  
> 
> Is this used only with TARGET_ARM?
> 
> Why is being able to inject multiple error types at once useful?

It pokes a weird corner of the specification that I think previously 
tripped up Linux.

> 
> I'd expect at least some of these errors to come with additional
> information.  For instance, I imagine a bus error is associated with
> some address.

Absolutely agree that in the sane case you wouldn't have multiple errors,
but we want to hit the insane ones :(

There is only provision for one set of data in the record, despite
it providing a bitmap for the type of error.
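
A minimal sketch of that mismatch, with hypothetical names throughout:
the type field can carry several bits at once, but the entry has a
single error_info slot, so its meaning becomes ambiguous. The bit
positions below are illustrative assumptions and should be checked
against the UEFI table before being relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only (assumption, not taken from the patch). */
enum {
    PEI_TYPE_CACHE      = 1u << 1,
    PEI_TYPE_TLB        = 1u << 2,
    PEI_TYPE_BUS        = 1u << 3,
    PEI_TYPE_MICRO_ARCH = 1u << 4,
};

#define PEI_TYPE_MASK (PEI_TYPE_CACHE | PEI_TYPE_TLB | \
                       PEI_TYPE_BUS | PEI_TYPE_MICRO_ARCH)

/* Hypothetical, trimmed-down view of one error information entry. */
struct pei_sketch {
    uint8_t type;        /* bitmap: several types may be set at once */
    uint64_t error_info; /* ...but there is only one of these */
};

static struct pei_sketch pei_make(uint8_t type, uint64_t error_info)
{
    struct pei_sketch pei = { .type = type, .error_info = error_info };

    /* This sketch rejects bits outside the four known types. */
    assert((type & ~PEI_TYPE_MASK) == 0);
    return pei;
}

/* True when more than one type bit is set, i.e. error_info is ambiguous. */
static bool pei_has_multiple_types(const struct pei_sketch *pei)
{
    uint8_t t = pei->type;

    return t != 0 && (t & (t - 1)) != 0;
}
```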

> 
> If we encode the the error to inject as an enum value, adding more will
> be hard.
> 
> If we wrap the enum in a struct
> 
>     { 'struct': 'ArmProcessorError',
>       'data': { 'type': 'ArmProcessorErrorType' } }
> 
> we can later extend it like
> 
>     { 'union': 'ArmProcessorError',
>       'base: { 'type': 'ArmProcessorErrorType' }
>       'data': {
>           'bus-error': 'ArmProcessorBusErrorData' } }
> 
>     { 'struct': 'ArmProcessorBusErrorData',
>       'data': ... }
> 
> > diff --git a/qapi/meson.build b/qapi/meson.build
> > index e7bc54e5d047..5927932c4be3 100644
> > --- a/qapi/meson.build
> > +++ b/qapi/meson.build
> > @@ -22,6 +22,7 @@ if have_system or have_tools or have_ga
> >  endif
> >  
> >  qapi_all_modules = [
> > +  'arm-error-inject',
> >    'authz',
> >    'block',
> >    'block-core',
> > diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> > index b1581988e4eb..479a22de7e43 100644
> > --- a/qapi/qapi-schema.json
> > +++ b/qapi/qapi-schema.json
> > @@ -81,3 +81,4 @@
> >  { 'include': 'vfio.json' }
> >  { 'include': 'cryptodev.json' }
> >  { 'include': 'cxl.json' }
> > +{ 'include': 'arm-error-inject.json' }  
> 
> 




* Re: [PATCH v3 5/7] target/arm: preserve mpidr value
  2024-07-22  6:45 ` [PATCH v3 5/7] target/arm: preserve mpidr value Mauro Carvalho Chehab
@ 2024-07-26 12:50   ` Jonathan Cameron via
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron via @ 2024-07-26 12:50 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shiju Jose, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Mon, 22 Jul 2024 08:45:57 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> There is a logic at helper to properly fill the mpidr information.
> This is needed for ARM Processor error injection, so store the
> value inside a cpu opaque value, to allow it to be used.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Seems reasonable to me, but not really my area of expertise.
FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  target/arm/cpu.h    |  1 +
>  target/arm/helper.c | 10 ++++++++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index a12859fc5335..d2e86f0877cc 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1033,6 +1033,7 @@ struct ArchCPU {
>          uint64_t reset_pmcr_el0;
>      } isar;
>      uint64_t midr;
> +    uint64_t mpidr;
>      uint32_t revidr;
>      uint32_t reset_fpsid;
>      uint64_t ctr;
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index ce319572354a..2432b5b09607 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -4692,7 +4692,7 @@ static uint64_t mpidr_read_val(CPUARMState *env)
>      return mpidr;
>  }
>  
> -static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
> +static uint64_t mpidr_read(CPUARMState *env)
>  {
>      unsigned int cur_el = arm_current_el(env);
>  
> @@ -4702,6 +4702,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
>      return mpidr_read_val(env);
>  }
>  
> +static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri)
> +{
> +    return mpidr_read(env);
> +}
> +
>  static const ARMCPRegInfo lpae_cp_reginfo[] = {
>      /* NOP AMAIR0/1 */
>      { .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
> @@ -9723,7 +9728,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>              { .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
>                .opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
>                .fgt = FGT_MPIDR_EL1,
> -              .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
> +              .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW },
>          };
>  #ifdef CONFIG_USER_ONLY
>          static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = {
> @@ -9733,6 +9738,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
>          modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo);
>  #endif
>          define_arm_cp_regs(cpu, mpidr_cp_reginfo);
> +        cpu->mpidr = mpidr_read(env);
>      }
>  
>      if (arm_feature(env, ARM_FEATURE_AUXCR)) {




* Re: [PATCH v3 7/7] acpi/ghes: extend arm error injection logic
  2024-07-22  6:45 ` [PATCH v3 7/7] acpi/ghes: extend arm error injection logic Mauro Carvalho Chehab
  2024-07-25 10:03   ` Markus Armbruster
@ 2024-07-26 13:22   ` Jonathan Cameron via
  2024-07-29 11:10     ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 42+ messages in thread
From: Jonathan Cameron via @ 2024-07-26 13:22 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shiju Jose, Alex Bennée, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Beraldo Leal,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Markus Armbruster,
	Peter Maydell, Thomas Huth, Wainer dos Santos Moschetta,
	linux-kernel, qemu-arm, qemu-devel

On Mon, 22 Jul 2024 08:45:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Enrich CPER error injection logic for ARM processor to allow
> setting values to  from UEFI 2.10 tables N.16 and N.17.
> 
> It should be noticed that, with such change, all arguments are
> now optional, so, once QMP is negotiated with:
> 
> 	{ "execute": "qmp_capabilities" }
> 
> the simplest way to generate a cache error is to use:
> 
> 	{ "execute": "arm-inject-error" }
> 
> Also, as now PEI is mapped into an array, it is possible to
> inject multiple errors at the same CPER record with:
> 
> 	{ "execute": "arm-inject-error", "arguments": {
> 	   "error": [ {"type": [ "cache-error" ]},
> 		      {"type": [ "tlb-error" ]} ] } }
> 
> This would generate both cache and TLB errors, using default
> values for other fields.
> 
> As all fields from ARM Processor CPER are now mapped, all
> types of CPER records can be generated with the new QAPI.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
If you are happy to smash this into patch 4 then also take ownership
of the result and change the author, as I wrote almost none of the code
that ended up in the result: only the GHESv2 stuff was mine,
even before you joined this effort; the rest was Shiju's.

If you want, I'm fine with a Co-developed-by on the result.

Jonathan



> ---
>  hw/acpi/ghes.c                  | 168 +++++++-------
>  hw/arm/arm_error_inject.c       | 399 +++++++++++++++++++++++++++++++-
>  hw/arm/arm_error_inject_stubs.c |  20 +-
>  include/hw/acpi/ghes.h          |  40 +++-
>  qapi/arm-error-inject.json      | 250 +++++++++++++++++++-
>  tests/lcitool/libvirt-ci        |   2 +-
>  6 files changed, 778 insertions(+), 101 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index ebf1b812aaaa..afd1d098a7e3 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c

> +    build_append_int_noprefix(table, err.running_state, 4);
> +
> +    /* PSCI state: only valid when running state is zero  */
> +    build_append_int_noprefix(table, err.psci_state, 4);
> +
> +    for (i = 0; i < err.err_info_num; i++) {
> +        /* ARM Propcessor error information */
> +        /* Version */
> +        build_append_int_noprefix(table, 0, 1);
> +
> +        /*  Length */
> +        build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1);
> +
> +        /* Validation Bits */
> +        build_append_int_noprefix(table, err.pei[i].validation, 2);

Maybe drop some comments when the data being written makes it obvious?

> +
> +        /* Type */
> +        build_append_int_noprefix(table, err.pei[i].type, 1);
> +
> +        /* Multiple error count */
> +        build_append_int_noprefix(table, err.pei[i].multiple_error, 2);
> +
> +        /* Flags  */
> +        build_append_int_noprefix(table, err.pei[i].flags, 1);
> +
> +        /* Error information  */
> +        build_append_int_noprefix(table, err.pei[i].error_info, 8);
> +
> +        /* Virtual fault address  */
> +        build_append_int_noprefix(table, err.pei[i].virt_addr, 8);
> +
> +        /* Physical fault address  */
> +        build_append_int_noprefix(table, err.pei[i].phy_addr, 8);
> +    }
> +
> +    for (i = 0; i < err.context_info_num; i++) {
> +        /* ARM Propcessor error context information */
> +        /* Version */
> +        build_append_int_noprefix(table, 0, 2);
> +
> +        /* Validation type */
> +        build_append_int_noprefix(table, err.context[i].type, 2);
> +
> +        /* Register array size */
> +        build_append_int_noprefix(table, err.context[i].size * 8, 4);
> +
> +        /* Register array (byte 8 of Context info) */
> +        for (j = 0; j < err.context[i].size; j++) {
> +            build_append_int_noprefix(table, err.context[i].array[j], 8);
> +        }
>      }

> diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
> index 1da97d5d4fdc..67f1c77546b9 100644
> --- a/hw/arm/arm_error_inject.c
> +++ b/hw/arm/arm_error_inject.c
> @@ -10,23 +10,408 @@

> +
> +/* Handle ARM Context */
> +static ArmContext *qmp_arm_context(uint16_t *context_info_num,
> +                                   uint32_t *context_length,
> +                                   bool has_context,
> +                                   ArmProcessorContextList const *context_list)
> +{
> +    ArmProcessorContextList const *next;
> +    ArmContext *context = NULL;
> +    uint16_t i, j, num, default_type;
> +
> +    default_type = get_default_context_type();
> +
> +    if (!has_context) {
> +        *context_info_num = 0;
> +        *context_length = 0;
> +
> +        return NULL;
> +    }
> +
> +    /* Calculate sizes */
> +    num = 0;
> +    for (next = context_list; next; next = next->next) {
> +        uint32_t n_regs = 0;
> +
> +        if (next->value->has_q_register) {
> +            uint64List *reg = next->value->q_register;
> +
> +            while (reg) {
> +                n_regs++;
> +                reg = reg->next;
> +            }
> +
> +            if (next->value->has_minimal_size &&
> +                                        next->value->minimal_size < n_regs) {
I'd align just after (

> +
> +static uint8_t *qmp_arm_vendor(uint32_t *vendor_num, bool has_vendor_specific,
> +                               uint8List const *vendor_specific_list)
> +{
> +    uint8List const *next = vendor_specific_list;
> +    uint8_t *vendor = NULL, *p;

vendor always set before use.

> +
> +    if (!has_vendor_specific) {
> +        return NULL;
> +    }
> +
> +    *vendor_num = 0;
> +
> +    while (next) {
> +        next = next->next;
> +        (*vendor_num)++;
> +    }
> +
> +    vendor = g_malloc(*vendor_num);
> +
> +    p = vendor;
> +    next = vendor_specific_list;
> +    while (next) {
> +        *p = next->value;
> +        next = next->next;
> +        p++;
> +    }
> +
> +    return vendor;
> +}
> diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> index 430e6cea6b60..2a314830fe60 100644
> --- a/qapi/arm-error-inject.json
> +++ b/qapi/arm-error-inject.json

> +
> +##
> +# @ArmProcessorErrorInformation:
> +#
> +# Contains ARM processor error information (PEI) data according with UEFI
> +# CPER table N.17.
> +#
> +# @validation:
> +#       Valid validation bits for error-info section.
> +#       Argument is optional. If not specified, those flags will be enabled:
> +#       first-error-cap and propagated.
> +#
> +# @type:
> +#       ARM processor error types to inject. Argument is mandatory.
> +#
> +# @multiple-error:
> +#       Indicates whether multiple errors have occurred.
> +#       Argument is optional. If not specified and @validation not enforced,

forced probably rather than enforced.

> +#       this field will be marked as invalid at CPER record..
Just one '.' here.

Good to mention the odd encoding: 0 = single error, 1 = multiple (lost count),
2+ = actual count of errors.
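
A small sketch of that encoding, using hypothetical helper names.
Saturating at the 16-bit maximum is an assumption of this sketch, not
something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Raw value meaning "multiple errors, but the count was lost". */
#define MULTIPLE_ERROR_LOST_COUNT 1u

/* Encode a known error count (>= 1) into the 16-bit field. */
static uint16_t encode_error_count(uint32_t n_errors)
{
    assert(n_errors >= 1);
    if (n_errors == 1) {
        return 0;                 /* 0 = single error */
    }
    /* 2+ = actual count; saturate rather than wrap (sketch assumption). */
    return n_errors > UINT16_MAX ? UINT16_MAX : (uint16_t)n_errors;
}

/* The count is unknown only for the special value 1. */
static bool error_count_is_known(uint16_t raw)
{
    return raw != MULTIPLE_ERROR_LOST_COUNT;
}

/* Decode the field; the result is meaningful only when the count is known. */
static uint32_t decode_error_count(uint16_t raw)
{
    return raw == 0 ? 1 : raw;
}
```

Note that a count of exactly one is written as 0, so the raw value 1
never means "one error".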

> +#
> +# @flags:
> +#       Indicates flags that describe the error attributes.
> +#       Argument is optional. If not specified and defaults to
> +#       first-error and propagated.
> +#
> +# @error-info:
> +#       Error information structure is specific to each error type.
> +#       Argument is optional, and its value depends on the PEI type(s).
> +#       If not defined, the default depends on the type:
> +#       - for cache-error: 0x0091000F;
> +#       - for tlb-error: 0x0054007F;
> +#       - for bus-error: 0x80D6460FFF;
> +#       - for micro-arch-error: 0x78DA03FF;
> +#       - if multiple types used, this bit is disabled from @validation bits.
> +#
> +# @virt-addr:
> +#       Virtual fault address associated with the error.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record..
> +#
> +# @phy-addr:
> +#       Physical fault address associated with the error.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record..
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'ArmProcessorErrorInformation',
> +  'data': { '*validation': ['ArmPeiValidationBits'],
> +            'type': ['ArmProcessorErrorType'],
> +            '*multiple-error': 'uint16',
> +            '*flags': ['ArmProcessorFlags'],
> +            '*error-info': 'uint64',
> +            '*virt-addr':  'uint64',
> +            '*phy-addr': 'uint64'}
> +}
> +
> +##
> +# @ArmProcessorContext:
> +#
> +# Provide processor context state specific to the ARM processor architecture,
> +# According with UEFI 2.10 CPER table N.21.
> +# Argument is optional.If not specified, no context will be used.
                          ^ space
> +#
> +# @type:
> +#       Contains an integer value indicating the type of context state being
> +#       reported.
> +#       Argument is optional. If not defined, it will be set to be EL1 register
> +#       for the emulation, e. g.:
> +#       - on arm32: AArch32 EL1 context registers;
> +#       - on arm64: AArch64 EL1 context registers.
> +#
> +# @register:
> +#       Provides the contents of the actual registers or raw data, depending
> +#       on the context type.
> +#       Argument is optional. If not defined, it will fill the first register
> +#       with 0xDEADBEEF, and the other ones with zero.
We could fill this in with a valid snapshot, I think?  It's just a set of CPU registers.
Obviously the content would be pretty random and meaningless, given the
error isn't correlated with any particular activity (as we triggered it), but maybe it
would be useful for testing the parsing?

Perhaps that's a job for the future as we will want to be able to override it
anyway.

> +#
> +# @minimal-size:
> +#       Argument is optional. If provided, define the minimal size of the
> +#       context register array. The actual size is defined by checking the
> +#       number of register values plus the content of this field (if used),
> +#       ensuring that each processor context information structure array is
> +#       padded with zeros if the size is not a multiple of 16 bytes.
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'ArmProcessorContext',
> +  'data': { '*type': 'uint16',
> +            '*minimal-size': 'uint32',
> +            '*register': ['uint64']}
>  }
>  
>  ##
>  # @arm-inject-error:
>  #
> -# Inject ARM Processor error.
> +# Inject ARM Processor error with data to be filled accordign with UEFI 2.10
> +# CPER table N.16.
>  #
> -# @errortypes: ARM processor error types to inject
> +# @validation:
> +#       Valid validation bits for ARM processor CPER.
> +#       Argument is optional. If not specified, the default is
> +#       calculated based on having the corresponding arguments filled.
> +#
> +# @affinity-level:
> +#       Error affinity level for errors that can be attributed to a specific
> +#       affinity level.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record.
As below.

> +#
> +# @mpidr-el1:
> +#       Processor’s unique ID in the system.
> +#       Argument is optional. If not specified, it will use the cpu mpidr
> +#       field from the emulation data. If zero and @validation is not
> +#       enforced, this field will be marked as invalid at CPER record.
The zero case is obscure enough I'd be tempted to say that if we want
to test that then we will override the validation field.

The logic will end up simpler and still allow the same level of corner
case testing for no valid mpidr (which is really odd if it occurs!)
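
Roughly what I have in mind, as a sketch only. The bit positions follow
the validation bits in the patch 4 code quoted earlier in this thread;
struct arm_args_sketch and arm_validation_bits() are hypothetical names,
not part of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Validation bits, as laid out in the quoted patch 4 code. */
#define ARM_VALID_MPIDR    (1u << 0)
#define ARM_VALID_AFFINITY (1u << 1)
#define ARM_VALID_RUNNING  (1u << 2)
#define ARM_VALID_VENDOR   (1u << 3)
#define ARM_VALID_ALL      (ARM_VALID_MPIDR | ARM_VALID_AFFINITY | \
                            ARM_VALID_RUNNING | ARM_VALID_VENDOR)

/* Hypothetical view of which optional QMP arguments were supplied. */
struct arm_args_sketch {
    bool has_validation;
    uint32_t validation;
    bool has_affinity_level;
    bool has_running_state;
    bool has_vendor_specific;
};

static uint32_t arm_validation_bits(const struct arm_args_sketch *a)
{
    uint32_t bits;

    if (a->has_validation) {
        /* Explicit override wins; this sketch rejects undefined bits. */
        assert((a->validation & ~ARM_VALID_ALL) == 0);
        return a->validation;
    }

    /* MPIDR is valid by default, whatever its value (no zero special case). */
    bits = ARM_VALID_MPIDR;
    if (a->has_affinity_level) {
        bits |= ARM_VALID_AFFINITY;
    }
    if (a->has_running_state) {
        bits |= ARM_VALID_RUNNING;
    }
    if (a->has_vendor_specific) {
        bits |= ARM_VALID_VENDOR;
    }
    return bits;
}
```

Testing the "no valid mpidr" corner then just means passing an explicit
validation list without the MPIDR bit.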

> +#
> +# @midr-el1:  Identification info of the chip
> +#       Argument is optional. If not specified, it will use the cpu mpidr
> +#       field from the emulation data. If zero and @validation is not
> +#       enforced, this field will be marked as invalid at CPER record.

Same as above.

> +#
> +# @running-state:
> +#       Indicates the running state of the processor.
> +#       Argument is optional. If not specified and @validation not enforced,
> +#       this field will be marked as invalid at CPER record.

Fun corners of the spec.  Can't allow both bit 0 of this and psci-state.
We should perhaps enforce that? I don't think we need to inject completely
invalid states (just corners of what is valid).
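
A sketch of the check, assuming bit 0 of running-state means "processor
running" and that PSCI state is only meaningful when that bit is clear
(as the comment in the ghes.c hunk says). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed meaning of bit 0 of the running-state field. */
#define RUNNING_STATE_RUNNING (1u << 0)

/* A running processor must not also report a PSCI state. */
static bool states_are_consistent(uint32_t running_state, bool has_psci_state)
{
    bool running = (running_state & RUNNING_STATE_RUNNING) != 0;

    return !(running && has_psci_state);
}

/* Value to place in the PSCI state field once the pair has been checked. */
static uint32_t psci_state_to_write(uint32_t running_state,
                                    bool has_psci_state, uint32_t psci_state)
{
    assert(states_are_consistent(running_state, has_psci_state));
    return has_psci_state ? psci_state : 0;
}
```

In the QMP handler the assert would of course become an error returned
to the caller rather than an abort.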

> +#
> +# @psci-state:
> +#       Provides PSCI state of the processor, as defined in ARM PSCI document.
> +#       Argument is optional. If not specified, it will use the cpu power
> +#       state field from the emulation data.
Hmm. Do you think validation is meant to cover this? Is it under running-state?

> +#
> +# @context:
> +#       Contains an array of processor context registers.
> +#       Argument is optional. If not specified, no context will be added.
> +#
> +# @vendor-specific:
> +#       Contains a byte array of vendor-specific data.
> +#       Argument is optional. If not specified, no vendor-specific data
> +#       will be added.
> +#
> +# @error:
> +#       Contains an array of ARM processor error information (PEI) sections.
> +#       Argument is optional. If not specified, defaults to a single
> +#       Program Error Information record defaulting to type=cache-error.
>  #
>  # Features:
>  #
> @@ -44,6 +262,16 @@
>  # Since: 9.1
>  ##
>  { 'command': 'arm-inject-error',
> -  'data': { 'errortypes': ['ArmProcessorErrorType'] },
> +  'data': {
> +    '*validation': ['ArmProcessorValidationBits'],
> +    '*affinity-level': 'uint8',
> +    '*mpidr-el1': 'uint64',
> +    '*midr-el1': 'uint64',
> +    '*running-state':  ['ArmProcessorRunningState'],
> +    '*psci-state': 'uint32',
> +    '*context': ['ArmProcessorContext'],
> +    '*vendor-specific': ['uint8'],
> +    '*error': ['ArmProcessorErrorInformation']
> +  },
>    'features': [ 'unstable' ]
>  }



* Re: [PATCH v3 7/7] acpi/ghes: extend arm error injection logic
  2024-07-26 13:22   ` Jonathan Cameron via
@ 2024-07-29 11:10     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 11:10 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shiju Jose, Alex Bennée, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Beraldo Leal,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Markus Armbruster,
	Peter Maydell, Thomas Huth, Wainer dos Santos Moschetta,
	linux-kernel, qemu-arm, qemu-devel

On Fri, 26 Jul 2024 14:22:25 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Mon, 22 Jul 2024 08:45:59 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Enrich CPER error injection logic for ARM processor to allow
> > setting values to  from UEFI 2.10 tables N.16 and N.17.
> > 
> > It should be noticed that, with such change, all arguments are
> > now optional, so, once QMP is negotiated with:
> > 
> > 	{ "execute": "qmp_capabilities" }
> > 
> > the simplest way to generate a cache error is to use:
> > 
> > 	{ "execute": "arm-inject-error" }
> > 
> > Also, as now PEI is mapped into an array, it is possible to
> > inject multiple errors at the same CPER record with:
> > 
> > 	{ "execute": "arm-inject-error", "arguments": {
> > 	   "error": [ {"type": [ "cache-error" ]},
> > 		      {"type": [ "tlb-error" ]} ] } }
> > 
> > This would generate both cache and TLB errors, using default
> > values for other fields.
> > 
> > As all fields from ARM Processor CPER are now mapped, all
> > types of CPER records can be generated with the new QAPI.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> If you are happy to smash this into patch 4 then also take ownership
> of the result and change the author, as I wrote almost none of the code
> that ended up in the result: only the GHESv2 stuff was mine,
> even before you joined this effort; the rest was Shiju's.
> 
> If you want, I'm fine with a Co-developed-by on the result.

Ok, I'll fold both patches.

> > ---
> >  hw/acpi/ghes.c                  | 168 +++++++-------
> >  hw/arm/arm_error_inject.c       | 399 +++++++++++++++++++++++++++++++-
> >  hw/arm/arm_error_inject_stubs.c |  20 +-
> >  include/hw/acpi/ghes.h          |  40 +++-
> >  qapi/arm-error-inject.json      | 250 +++++++++++++++++++-
> >  tests/lcitool/libvirt-ci        |   2 +-
> >  6 files changed, 778 insertions(+), 101 deletions(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index ebf1b812aaaa..afd1d098a7e3 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c  
> 
> > +    build_append_int_noprefix(table, err.running_state, 4);
> > +
> > +    /* PSCI state: only valid when running state is zero  */
> > +    build_append_int_noprefix(table, err.psci_state, 4);
> > +
> > +    for (i = 0; i < err.err_info_num; i++) {
> > +        /* ARM Propcessor error information */
> > +        /* Version */
> > +        build_append_int_noprefix(table, 0, 1);
> > +
> > +        /*  Length */
> > +        build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1);
> > +
> > +        /* Validation Bits */
> > +        build_append_int_noprefix(table, err.pei[i].validation, 2);  
> 
> Maybe drop some comments when the data being written makes it obvious?

The current change preserves the existing coding style of ghes.c.
See, for instance, acpi_ghes_generic_error_data(). There, except for
revision, all other comments are obvious. For my part, I prefer to have
this patch preserve the same style.

Yet, if agreed, we can later send a cleanup patch dropping all obvious
comments at once.

> > +
> > +        /* Type */
> > +        build_append_int_noprefix(table, err.pei[i].type, 1);
> > +
> > +        /* Multiple error count */
> > +        build_append_int_noprefix(table, err.pei[i].multiple_error, 2);
> > +
> > +        /* Flags  */
> > +        build_append_int_noprefix(table, err.pei[i].flags, 1);
> > +
> > +        /* Error information  */
> > +        build_append_int_noprefix(table, err.pei[i].error_info, 8);
> > +
> > +        /* Virtual fault address  */
> > +        build_append_int_noprefix(table, err.pei[i].virt_addr, 8);
> > +
> > +        /* Physical fault address  */
> > +        build_append_int_noprefix(table, err.pei[i].phy_addr, 8);
> > +    }
> > +
> > +    for (i = 0; i < err.context_info_num; i++) {
> > +        /* ARM Propcessor error context information */
> > +        /* Version */
> > +        build_append_int_noprefix(table, 0, 2);
> > +
> > +        /* Validation type */
> > +        build_append_int_noprefix(table, err.context[i].type, 2);
> > +
> > +        /* Register array size */
> > +        build_append_int_noprefix(table, err.context[i].size * 8, 4);
> > +
> > +        /* Register array (byte 8 of Context info) */
> > +        for (j = 0; j < err.context[i].size; j++) {
> > +            build_append_int_noprefix(table, err.context[i].array[j], 8);
> > +        }
> >      }  
> 
> > diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
> > index 1da97d5d4fdc..67f1c77546b9 100644
> > --- a/hw/arm/arm_error_inject.c
> > +++ b/hw/arm/arm_error_inject.c
> > @@ -10,23 +10,408 @@  
> 
> > +
> > +/* Handle ARM Context */
> > +static ArmContext *qmp_arm_context(uint16_t *context_info_num,
> > +                                   uint32_t *context_length,
> > +                                   bool has_context,
> > +                                   ArmProcessorContextList const *context_list)
> > +{
> > +    ArmProcessorContextList const *next;
> > +    ArmContext *context = NULL;
> > +    uint16_t i, j, num, default_type;
> > +
> > +    default_type = get_default_context_type();
> > +
> > +    if (!has_context) {
> > +        *context_info_num = 0;
> > +        *context_length = 0;
> > +
> > +        return NULL;
> > +    }
> > +
> > +    /* Calculate sizes */
> > +    num = 0;
> > +    for (next = context_list; next; next = next->next) {
> > +        uint32_t n_regs = 0;
> > +
> > +        if (next->value->has_q_register) {
> > +            uint64List *reg = next->value->q_register;
> > +
> > +            while (reg) {
> > +                n_regs++;
> > +                reg = reg->next;
> > +            }
> > +
> > +            if (next->value->has_minimal_size &&
> > +                                        next->value->minimal_size < n_regs) {  
> I'd align just after (
> > +
> > +static uint8_t *qmp_arm_vendor(uint32_t *vendor_num, bool has_vendor_specific,
> > +                               uint8List const *vendor_specific_list)
> > +{
> > +    uint8List const *next = vendor_specific_list;
> > +    uint8_t *vendor = NULL, *p;  
> 
> vendor always set before use.

True. Will remove the assignment to NULL.

> > +
> > +    if (!has_vendor_specific) {
> > +        return NULL;
> > +    }
> > +
> > +    *vendor_num = 0;
> > +
> > +    while (next) {
> > +        next = next->next;
> > +        (*vendor_num)++;
> > +    }
> > +
> > +    vendor = g_malloc(*vendor_num);
> > +
> > +    p = vendor;
> > +    next = vendor_specific_list;
> > +    while (next) {
> > +        *p = next->value;
> > +        next = next->next;
> > +        p++;
> > +    }
> > +
> > +    return vendor;
> > +}
> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> > index 430e6cea6b60..2a314830fe60 100644
> > --- a/qapi/arm-error-inject.json
> > +++ b/qapi/arm-error-inject.json  
> 
> > +
> > +##
> > +# @ArmProcessorErrorInformation:
> > +#
> > +# Contains ARM processor error information (PEI) data according with UEFI
> > +# CPER table N.17.
> > +#
> > +# @validation:
> > +#       Valid validation bits for error-info section.
> > +#       Argument is optional. If not specified, those flags will be enabled:
> > +#       first-error-cap and propagated.
> > +#
> > +# @type:
> > +#       ARM processor error types to inject. Argument is mandatory.
> > +#
> > +# @multiple-error:
> > +#       Indicates whether multiple errors have occurred.
> > +#       Argument is optional. If not specified and @validation not enforced,  
> 
> forced probably rather than enforced.

Changed everywhere at this patch.

> 
> > +#       this field will be marked as invalid at CPER record..  
> . only
> 
> Good to mention the odd encoding of 0 = single error, 1 = multiple (lost count)
> 2+ = actual count of errors

Added.
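
For reference, the encoding described above (0 = single error, 1 =
multiple errors with the count lost, 2+ = actual count) could be decoded
roughly as below. This is only a sketch: the helper name
pei_multiple_error_str() is made up for illustration and is not part of
the patch.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): render the "multiple
 * error" field of an ARM Processor Error Information entry as text.
 *   0  -> a single error
 *   1  -> multiple errors, count not known
 *   2+ -> the actual number of errors
 */
static const char *pei_multiple_error_str(uint16_t multiple_error,
                                          char *buf, size_t len)
{
    if (multiple_error == 0) {
        snprintf(buf, len, "single error");
    } else if (multiple_error == 1) {
        snprintf(buf, len, "multiple errors (count unknown)");
    } else {
        snprintf(buf, len, "%u errors", (unsigned)multiple_error);
    }
    return buf;
}
```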

> 
> > +#
> > +# @flags:
> > +#       Indicates flags that describe the error attributes.
> > +#       Argument is optional. If not specified and defaults to
> > +#       first-error and propagated.
> > +#
> > +# @error-info:
> > +#       Error information structure is specific to each error type.
> > +#       Argument is optional, and its value depends on the PEI type(s).
> > +#       If not defined, the default depends on the type:
> > +#       - for cache-error: 0x0091000F;
> > +#       - for tlb-error: 0x0054007F;
> > +#       - for bus-error: 0x80D6460FFF;
> > +#       - for micro-arch-error: 0x78DA03FF;
> > +#       - if multiple types used, this bit is disabled from @validation bits.
> > +#
> > +# @virt-addr:
> > +#       Virtual fault address associated with the error.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record..
> > +#
> > +# @phy-addr:
> > +#       Physical fault address associated with the error.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record..
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'ArmProcessorErrorInformation',
> > +  'data': { '*validation': ['ArmPeiValidationBits'],
> > +            'type': ['ArmProcessorErrorType'],
> > +            '*multiple-error': 'uint16',
> > +            '*flags': ['ArmProcessorFlags'],
> > +            '*error-info': 'uint64',
> > +            '*virt-addr':  'uint64',
> > +            '*phy-addr': 'uint64'}
> > +}
> > +
> > +##
> > +# @ArmProcessorContext:
> > +#
> > +# Provide processor context state specific to the ARM processor architecture,
> > +# According with UEFI 2.10 CPER table N.21.
> > +# Argument is optional.If not specified, no context will be used.  
>                           ^ space
> > +#
> > +# @type:
> > +#       Contains an integer value indicating the type of context state being
> > +#       reported.
> > +#       Argument is optional. If not defined, it will be set to be EL1 register
> > +#       for the emulation, e. g.:
> > +#       - on arm32: AArch32 EL1 context registers;
> > +#       - on arm64: AArch64 EL1 context registers.
> > +#
> > +# @register:
> > +#       Provides the contents of the actual registers or raw data, depending
> > +#       on the context type.
> > +#       Argument is optional. If not defined, it will fill the first register
> > +#       with 0xDEADBEEF, and the other ones with zero.  
> We could fill this in with a valid snapshot I think?  It's just a set of CPU registers.
> Obviously the content would be pretty random and meaningless given the
> error isn't correlated with particular activity (as we triggered it) but maybe it would
> be useful for testing the parsing?

I considered this as well, but the goal of having a default context set
is just to check whether the OS is properly receiving/handling such data.

If we use an EL1 context register dump, instead of a fixed default:

1. the values will be pretty much random, and, as you said, not related
   to a real issue - so the values will probably be bogus anyway;
2. there won't be an easy way to tell whether the OS is handling it
   the right way, as there won't be any way to associate the value sent
   to BIOS/Kernel with an expected behavior. With a fixed value, one can
   check if 0xDEADBEEF is the first thing that shows up in the context
   dump;
3. if one wants to simulate a real hardware error, they can instead send
   a proper set of register values;
4. defaulting to reporting the EL1 context registers may not be what is
   wanted for tests.
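
To make the fixed-default idea concrete, here is a minimal sketch of how
such a default register array could be built and then recognized on the
test side. Both function names are hypothetical and the code is not
taken from the patch; it only assumes the documented default (first
register 0xDEADBEEF, the others zero).

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Sketch only: allocate n_regs 64-bit context registers with
 * 0xDEADBEEF in the first slot and zeros everywhere else.
 * Returns NULL if n_regs is zero or the allocation fails; the
 * caller owns (and frees) the result.
 */
static uint64_t *default_context_regs(size_t n_regs)
{
    uint64_t *regs;

    if (n_regs == 0) {
        return NULL;
    }

    regs = calloc(n_regs, sizeof(*regs));
    if (!regs) {
        return NULL;
    }

    regs[0] = 0xDEADBEEFULL;
    return regs;
}

/* Test-side check: does the dump start with the well-known marker? */
static int context_has_marker(const uint64_t *regs, size_t n_regs)
{
    return n_regs > 0 && regs[0] == 0xDEADBEEFULL;
}
```

With a random snapshot there would be nothing equivalent to
context_has_marker() to assert on.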

> Perhaps that's a job for the future as we will want to be able to override it
> anyway.

This can already be overridden:

	$ qemu_einj.py arm --ctx-array 0xffee,0xdeadbeef,0xabbabaa,0x0,0xbeafdeed
		 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-2620-g8e5b224ee328-dirty"}, "capabilities": ["oob"]}}
	{ "execute": "qmp_capabilities" } 
		 {"return": {}}
	{ "execute": "arm-inject-error", "arguments": {"error": [{"type": ["cache-error"]}], "context": [{"register": [65518, 3735928559, 180071338, 0, 3199196909]}]} }

[   52.044695] {3}[Hardware Error]:     PC is imprecise
[   52.044909] {3}[Hardware Error]:   Context info structure 0:
[   52.045147] {3}[Hardware Error]:    register context type: AArch64 EL1 context registers
[   52.045500] {3}[Hardware Error]:    00000000: 0000ffee 00000000 deadbeef 00000000
[   52.045866] {3}[Hardware Error]:    00000010: 0abbabaa 00000000 00000000 00000000
[   52.046184] {3}[Hardware Error]:    00000020: beafdeed 00000000 00000000 00000000

And other types of register dump can also be set, like:

	$ qemu_einj.py arm --ctx-array 0xffee,0xdeadbeef,0xabbabaa,0x0,0xbeafdeed --ctx-type 7
		 {"QMP": {"version": {"qemu": {"micro": 50, "minor": 0, "major": 9}, "package": "v9.0.0-2620-g8e5b224ee328-dirty"}, "capabilities": ["oob"]}}
	{ "execute": "qmp_capabilities" } 
		 {"return": {}}
	{ "execute": "arm-inject-error", "arguments": {"error": [{"type": ["cache-error"]}], "context": [{"type": 7, "register": [65518, 3735928559, 180071338, 0, 3199196909]}]} }
		 {"return": {}}

[  172.693339] {4}[Hardware Error]:   Context info structure 0:
[  172.693643] {4}[Hardware Error]:    register context type: AArch64 EL3 context registers
[  172.694050] {4}[Hardware Error]:    00000000: 0000ffee 00000000 deadbeef 00000000
[  172.694445] {4}[Hardware Error]:    00000010: 0abbabaa 00000000 00000000 00000000
[  172.694859] {4}[Hardware Error]:    00000020: beafdeed 00000000 00000000 00000000

So, one can replicate any context needed - preferably reproducing a
real error condition that happened on some hardware.
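
As a reading aid for the dumps above: each 64-bit register shows up as
two 32-bit words, low word first, and the five registers passed via QMP
occupy 0x30 bytes because the array is padded to a multiple of 16 bytes.
A small sketch that mirrors this layout follows; the function names are
invented for illustration and this is not the kernel's or QEMU's actual
code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Number of 64-bit slots after padding the array to a multiple of 16 bytes. */
static size_t ctx_padded_regs(size_t n_regs)
{
    return (n_regs + 1) & ~(size_t)1;
}

/*
 * Format one 16-byte row (two registers) like the dmesg lines above:
 * byte offset, then four 32-bit words, with the low word of each
 * register printed first.
 */
static void ctx_format_row(char *buf, size_t len, size_t offset,
                           uint64_t reg0, uint64_t reg1)
{
    snprintf(buf, len,
             "%08zx: %08" PRIx32 " %08" PRIx32 " %08" PRIx32 " %08" PRIx32,
             offset,
             (uint32_t)reg0, (uint32_t)(reg0 >> 32),
             (uint32_t)reg1, (uint32_t)(reg1 >> 32));
}
```

For example, ctx_format_row(buf, sizeof(buf), 0, 0xffee, 0xdeadbeef)
yields the same text as the first dump row above.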

> 
> > +#
> > +# @minimal-size:
> > +#       Argument is optional. If provided, define the minimal size of the
> > +#       context register array. The actual size is defined by checking the
> > +#       number of register values plus the content of this field (if used),
> > +#       ensuring that each processor context information structure array is
> > +#       padded with zeros if the size is not a multiple of 16 bytes.
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'ArmProcessorContext',
> > +  'data': { '*type': 'uint16',
> > +            '*minimal-size': 'uint32',
> > +            '*register': ['uint64']}
> >  }
> >  
> >  ##
> >  # @arm-inject-error:
> >  #
> > -# Inject ARM Processor error.
> > +# Inject ARM Processor error with data to be filled accordign with UEFI 2.10
> > +# CPER table N.16.
> >  #
> > -# @errortypes: ARM processor error types to inject
> > +# @validation:
> > +#       Valid validation bits for ARM processor CPER.
> > +#       Argument is optional. If not specified, the default is
> > +#       calculated based on having the corresponding arguments filled.
> > +#
> > +# @affinity-level:
> > +#       Error affinity level for errors that can be attributed to a specific
> > +#       affinity level.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record.  
> As below.
> 
> > +#
> > +# @mpidr-el1:
> > +#       Processor’s unique ID in the system.
> > +#       Argument is optional. If not specified, it will use the cpu mpidr
> > +#       field from the emulation data. If zero and @validation is not
> > +#       enforced, this field will be marked as invalid at CPER record.  
> The zero case is obscure enough I'd be tempted to say that if we want
> to test that then we will override the validation field.
>
> The logic will end up simpler and still allow the same level of corner
> case testing for no valid mpidr (which is really odd if it occurs!)

The zero case may happen if the MPIDR CPU field inside the emulation is
not properly set by the QEMU arm32/64-specific machine.

I opted to make it explicit, as this happened to me in some of my
tests. So I ended up adding the check and this comment.
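
In other words, the check boils down to something like the sketch below.
The function name and signature are hypothetical and only illustrate
the rule; the bit position follows the "MPIDR Valid" bit (bit 0) used
when building the validation field in ghes.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* "MPIDR Valid" is bit 0 of the ARM Processor CPER validation field. */
#define ARM_VALID_MPIDR (1u << 0)

/*
 * Sketch only: decide the MPIDR validation bit.
 * - If the caller forced @validation, their bits are used untouched.
 * - Otherwise the bit is set only when MPIDR is non-zero.
 */
static uint32_t mpidr_validation(bool has_validation, uint32_t forced_bits,
                                 uint64_t mpidr)
{
    if (has_validation) {
        return forced_bits;
    }

    return mpidr ? ARM_VALID_MPIDR : 0;
}
```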

> > +#
> > +# @midr-el1:  Identification info of the chip
> > +#       Argument is optional. If not specified, it will use the cpu mpidr
> > +#       field from the emulation data. If zero and @validation is not
> > +#       enforced, this field will be marked as invalid at CPER record.  
> 
> Same as above.
> 
> > +#
> > +# @running-state:
> > +#       Indicates the running state of the processor.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record.  
> 
> Fun corners of the spec.  Can't allow bit0 of this and psci-state.
> We should perhaps enforce that? I don't think we need to inject completely
> invalid states (just corners of what is valid).

The logic there is already checking it, filling psci-state only if
the bit is not set.

> 
> > +#
> > +# @psci-state:
> > +#       Provides PSCI state of the processor, as defined in ARM PSCI document.
> > +#       Argument is optional. If not specified, it will use the cpu power
> > +#       state field from the emulation data.  
> Hmm. Do you think validation is meant to cover this? Is it under running-state?

IMO, filling it correctly according to the spec should be enforced,
with this logic:

    if (!has_psci_state) {
        psci_state = armcpu->power_state;
    }

    if (has_running_state) {
        while (running_state_list) {
            running_state |= BIT(running_state_list->value);
            running_state_list = running_state_list->next;
        }

        if (running_state) {
            psci_state = 0;
        }
    }

    error.psci_state = psci_state;

    if (has_validation) {
...
    } else {
...
        if (running_state) {
            validation |= BIT(ARM_PROCESSOR_VALIDATION_BITS_RUNNING_STATE_VALID);
        }
    }

I.e.:

- if running_state is enforced, psci_state will be zero;
- if not enforced:
	- if not defined: use CPU-defined power_state;
	- otherwise, use the value passed via QMP.

I'm changing the code to do the above, as there were some errors in the
check logic.
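
The rules above can be exercised in isolation with a stand-alone sketch
like the one below. It only covers the case where @validation is not
forced; the struct and function names are made up for this example, and
the running-state bit position (bit 2) follows the validation field
layout in ghes.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Running state Valid" is bit 2 of the validation field. */
#define RUNNING_STATE_VALID (1u << 2)

/* Hypothetical result holder, used only by this sketch. */
struct arm_state_result {
    uint32_t running_state;
    uint32_t psci_state;
    uint32_t validation;
};

/*
 * Sketch only: resolve running-state and psci-state when @validation
 * was not forced by the caller.
 */
static struct arm_state_result
resolve_state(bool has_running_state, uint32_t running_state,
              bool has_psci_state, uint32_t psci_state,
              uint32_t cpu_power_state)
{
    struct arm_state_result r = { 0, 0, 0 };

    /* Not passed via QMP: fall back to the CPU-defined power state. */
    if (!has_psci_state) {
        psci_state = cpu_power_state;
    }

    if (has_running_state) {
        r.running_state = running_state;
    }

    /* A running processor must report a zero PSCI state. */
    if (r.running_state) {
        psci_state = 0;
        r.validation |= RUNNING_STATE_VALID;
    }

    r.psci_state = psci_state;
    return r;
}
```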

> 
> > +#
> > +# @context:
> > +#       Contains an array of processor context registers.
> > +#       Argument is optional. If not specified, no context will be added.
> > +#
> > +# @vendor-specific:
> > +#       Contains a byte array of vendor-specific data.
> > +#       Argument is optional. If not specified, no vendor-specific data
> > +#       will be added.
> > +#
> > +# @error:
> > +#       Contains an array of ARM processor error information (PEI) sections.
> > +#       Argument is optional. If not specified, defaults to a single
> > +#       Program Error Information record defaulting to type=cache-error.
> >  #
> >  # Features:
> >  #
> > @@ -44,6 +262,16 @@
> >  # Since: 9.1
> >  ##
> >  { 'command': 'arm-inject-error',
> > -  'data': { 'errortypes': ['ArmProcessorErrorType'] },
> > +  'data': {
> > +    '*validation': ['ArmProcessorValidationBits'],
> > +    '*affinity-level': 'uint8',
> > +    '*mpidr-el1': 'uint64',
> > +    '*midr-el1': 'uint64',
> > +    '*running-state':  ['ArmProcessorRunningState'],
> > +    '*psci-state': 'uint32',
> > +    '*context': ['ArmProcessorContext'],
> > +    '*vendor-specific': ['uint8'],
> > +    '*error': ['ArmProcessorErrorInformation']
> > +  },
> >    'features': [ 'unstable' ]
> >  }  

Thanks,
Mauro



* Re: [PATCH v3 7/7] acpi/ghes: extend arm error injection logic
  2024-07-25 10:03   ` Markus Armbruster
@ 2024-07-29 11:18     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 11:18 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Jonathan Cameron, Shiju Jose, Alex Bennée,
	Michael S. Tsirkin, Philippe Mathieu-Daudé, Ani Sinha,
	Beraldo Leal, Dongjiu Geng, Eric Blake, Igor Mammedov,
	Peter Maydell, Thomas Huth, Wainer dos Santos Moschetta,
	linux-kernel, qemu-arm, qemu-devel

Em Thu, 25 Jul 2024 12:03:46 +0200
Markus Armbruster <armbru@redhat.com> escreveu:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> 
> > Enrich CPER error injection logic for ARM processor to allow
> > setting values to  from UEFI 2.10 tables N.16 and N.17.
> >
> > It should be noticed that, with such change, all arguments are
> > now optional, so, once QMP is negotiated with:
> >
> > 	{ "execute": "qmp_capabilities" }
> >
> > the simplest way to generate a cache error is to use:
> >
> > 	{ "execute": "arm-inject-error" }
> >
> > Also, as now PEI is mapped into an array, it is possible to
> > inject multiple errors at the same CPER record with:
> >
> > 	{ "execute": "arm-inject-error", "arguments": {
> > 	   "error": [ {"type": [ "cache-error" ]},
> > 		      {"type": [ "tlb-error" ]} ] } }
> >
> > This would generate both cache and TLB errors, using default
> > values for other fields.
> >
> > As all fields from ARM Processor CPER are now mapped, all
> > types of CPER records can be generated with the new QAPI.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> 
> [...]
> 
> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> > index 430e6cea6b60..2a314830fe60 100644
> > --- a/qapi/arm-error-inject.json
> > +++ b/qapi/arm-error-inject.json
> > @@ -2,40 +2,258 @@
> >  # vim: filetype=python
> >  
> >  ##
> > -# = ARM Processor Errors
> > +# = ARM Processor Errors as defined at:
> > +# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
> > +# See tables N.16, N.17 and N.21.
> >  ##  
> 
> This comes out badly in HTML.
> 
> Try something like
> 
>    # = ARM Processor Errors
>    #
>    # These are defined at
>    # https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
>    # See tables N.16, N.17 and N.21.

Ok. I double-checked the results of both manpage and html on the
version I'm about to submit. The parsed macros should now be OK
for both.

> 
> If any part of this is relevant in PATCH 4 already, squash the relevant
> parts into that patch please.

Ok. I ended squashing patch 7 with patch 4.

> 
> >  
> > +##
> > +# @ArmProcessorValidationBits:
> > +#
> > +# Indcates whether or not fields of ARM processor CPER record are valid.  
> 
> docs/devel/qapi-code-gen.rst section "Documentation markup":
> 
>     For legibility, wrap text paragraphs so every line is at most 70
>     characters long.

Ok.

> > +#
> > +# @mpidr-valid:  MPIDR Valid
> > +#
> > +# @affinity-valid: Error affinity level Valid
> > +#
> > +# @running-state-valid: Running State
> > +#
> > +# @vendor-specific-valid: Vendor Specific Info Valid
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'enum': 'ArmProcessorValidationBits',
> > +  'data': ['mpidr-valid',
> > +           'affinity-valid',
> > +           'running-state-valid',
> > +           'vendor-specific-valid']
> > +}
> > +
> > +##
> > +# @ArmProcessorFlags:
> > +#
> > +# Indicates error attributes at the Error info section.
> > +#
> > +# @first-error-cap: First error captured
> > +#
> > +# @last-error-cap:  Last error captured
> > +#
> > +# @propagated: Propagated
> > +#
> > +# @overflow: Overflow
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'enum': 'ArmProcessorFlags',
> > +  'data': ['first-error-cap',
> > +           'last-error-cap',
> > +           'propagated',
> > +           'overflow']
> > +}
> > +
> > +##
> > +# @ArmProcessorRunningState:
> > +#
> > +# Indicates if the processor is running.
> > +#
> > +# @processor-running: indicates that the processor is running
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'enum': 'ArmProcessorRunningState',
> > +  'data': ['processor-running']
> > +}
> > +
> >  ##
> >  # @ArmProcessorErrorType:
> >  #
> > -# Type of ARM processor error to inject
> > -#
> > -# @unknown-error: Unknown error
> > +# Type of ARM processor error information to inject.
> >  #
> >  # @cache-error: Cache error
> >  #
> >  # @tlb-error: TLB error
> >  #
> > -# @bus-error: Bus error.
> > +# @bus-error: Bus error
> >  #
> > -# @micro-arch-error: Micro architectural error.
> > +# @micro-arch-error: Micro architectural error
> >  #
> >  # Since: 9.1
> >  ##
> >  { 'enum': 'ArmProcessorErrorType',
> > -  'data': ['unknown-error',
> > -	   'cache-error',
> > +  'data': ['cache-error',
> >             'tlb-error',
> >             'bus-error',
> >             'micro-arch-error']
> > + }  
> 
> Squash the changes to this type into PATCH 4, please.

Ok.

> > +
> > +##
> > +# @ArmPeiValidationBits:
> > +#
> > +# Indcates whether or not fields of Processor Error Info section are valid.
> > +#
> > +# @multiple-error-valid: Information at multiple-error field is valid
> > +#
> > +# @flags-valid: Information at flags field is valid
> > +#
> > +# @error-info-valid: Information at error-info field is valid
> > +#
> > +# @virt-addr-valid: Information at virt-addr field is valid
> > +#
> > +# @phy-addr-valid: Information at phy-addr field is valid
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'enum': 'ArmPeiValidationBits',
> > +  'data': ['multiple-error-valid',
> > +           'flags-valid',
> > +           'error-info-valid',
> > +           'virt-addr-valid',
> > +           'phy-addr-valid']
> > +}
> > +
> > +##
> > +# @ArmProcessorErrorInformation:
> > +#
> > +# Contains ARM processor error information (PEI) data according with UEFI
> > +# CPER table N.17.
> > +#
> > +# @validation:
> > +#       Valid validation bits for error-info section.
> > +#       Argument is optional. If not specified, those flags will be enabled:
> > +#       first-error-cap and propagated.  
> 
> Please format like this for consistency:
> 
>    # @validation: Valid validation bits for error-info section.
>    #     Argument is optional.  If not specified, those flags will be
>    #     enabled: first-error-cap and propagated.

Ok.

> 
> > +#
> > +# @type:
> > +#       ARM processor error types to inject. Argument is mandatory.
> > +#
> > +# @multiple-error:
> > +#       Indicates whether multiple errors have occurred.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record..
> > +#
> > +# @flags:
> > +#       Indicates flags that describe the error attributes.
> > +#       Argument is optional. If not specified and defaults to
> > +#       first-error and propagated.
> > +#
> > +# @error-info:
> > +#       Error information structure is specific to each error type.
> > +#       Argument is optional, and its value depends on the PEI type(s).
> > +#       If not defined, the default depends on the type:
> > +#       - for cache-error: 0x0091000F;
> > +#       - for tlb-error: 0x0054007F;
> > +#       - for bus-error: 0x80D6460FFF;
> > +#       - for micro-arch-error: 0x78DA03FF;
> > +#       - if multiple types used, this bit is disabled from @validation bits.
> > +#
> > +# @virt-addr:
> > +#       Virtual fault address associated with the error.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record..
> > +#
> > +# @phy-addr:
> > +#       Physical fault address associated with the error.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record..
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'ArmProcessorErrorInformation',
> > +  'data': { '*validation': ['ArmPeiValidationBits'],
> > +            'type': ['ArmProcessorErrorType'],
> > +            '*multiple-error': 'uint16',
> > +            '*flags': ['ArmProcessorFlags'],
> > +            '*error-info': 'uint64',
> > +            '*virt-addr':  'uint64',
> > +            '*phy-addr': 'uint64'}
> > +}
> > +
> > +##
> > +# @ArmProcessorContext:
> > +#
> > +# Provide processor context state specific to the ARM processor architecture,
> > +# According with UEFI 2.10 CPER table N.21.
> > +# Argument is optional.If not specified, no context will be used.
> > +#
> > +# @type:
> > +#       Contains an integer value indicating the type of context state being
> > +#       reported.
> > +#       Argument is optional. If not defined, it will be set to be EL1 register
> > +#       for the emulation, e. g.:
> > +#       - on arm32: AArch32 EL1 context registers;
> > +#       - on arm64: AArch64 EL1 context registers.
> > +#
> > +# @register:
> > +#       Provides the contents of the actual registers or raw data, depending
> > +#       on the context type.
> > +#       Argument is optional. If not defined, it will fill the first register
> > +#       with 0xDEADBEEF, and the other ones with zero.
> > +#
> > +# @minimal-size:
> > +#       Argument is optional. If provided, define the minimal size of the
> > +#       context register array. The actual size is defined by checking the
> > +#       number of register values plus the content of this field (if used),
> > +#       ensuring that each processor context information structure array is
> > +#       padded with zeros if the size is not a multiple of 16 bytes.
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'ArmProcessorContext',
> > +  'data': { '*type': 'uint16',
> > +            '*minimal-size': 'uint32',
> > +            '*register': ['uint64']}
> >  }
> >  
> >  ##
> >  # @arm-inject-error:
> >  #
> > -# Inject ARM Processor error.
> > +# Inject ARM Processor error with data to be filled accordign with UEFI 2.10
> > +# CPER table N.16.
> >  #
> > -# @errortypes: ARM processor error types to inject
> > +# @validation:
> > +#       Valid validation bits for ARM processor CPER.
> > +#       Argument is optional. If not specified, the default is
> > +#       calculated based on having the corresponding arguments filled.
> > +#
> > +# @affinity-level:
> > +#       Error affinity level for errors that can be attributed to a specific
> > +#       affinity level.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record.
> > +#
> > +# @mpidr-el1:
> > +#       Processor’s unique ID in the system.
> > +#       Argument is optional. If not specified, it will use the cpu mpidr
> > +#       field from the emulation data. If zero and @validation is not
> > +#       enforced, this field will be marked as invalid at CPER record.
> > +#
> > +# @midr-el1:  Identification info of the chip
> > +#       Argument is optional. If not specified, it will use the cpu mpidr
> > +#       field from the emulation data. If zero and @validation is not
> > +#       enforced, this field will be marked as invalid at CPER record.
> > +#
> > +# @running-state:
> > +#       Indicates the running state of the processor.
> > +#       Argument is optional. If not specified and @validation not enforced,
> > +#       this field will be marked as invalid at CPER record.
> > +#
> > +# @psci-state:
> > +#       Provides PSCI state of the processor, as defined in ARM PSCI document.
> > +#       Argument is optional. If not specified, it will use the cpu power
> > +#       state field from the emulation data.
> > +#
> > +# @context:
> > +#       Contains an array of processor context registers.
> > +#       Argument is optional. If not specified, no context will be added.
> > +#
> > +# @vendor-specific:
> > +#       Contains a byte array of vendor-specific data.
> > +#       Argument is optional. If not specified, no vendor-specific data
> > +#       will be added.
> > +#
> > +# @error:
> > +#       Contains an array of ARM processor error information (PEI) sections.
> > +#       Argument is optional. If not specified, defaults to a single
> > +#       Program Error Information record defaulting to type=cache-error.
> >  #
> >  # Features:
> >  #
> > @@ -44,6 +262,16 @@
> >  # Since: 9.1
> >  ##
> >  { 'command': 'arm-inject-error',
> > -  'data': { 'errortypes': ['ArmProcessorErrorType'] },
> > +  'data': {
> > +    '*validation': ['ArmProcessorValidationBits'],
> > +    '*affinity-level': 'uint8',
> > +    '*mpidr-el1': 'uint64',
> > +    '*midr-el1': 'uint64',
> > +    '*running-state':  ['ArmProcessorRunningState'],
> > +    '*psci-state': 'uint32',
> > +    '*context': ['ArmProcessorContext'],
> > +    '*vendor-specific': ['uint8'],
> > +    '*error': ['ArmProcessorErrorInformation']
> > +  },
> >    'features': [ 'unstable' ]
> >  }  
> 
> This changes the command pretty much completely.  Why is the previous
> state worth capturing in git?

I was thinking of having the first patch with minimal stuff and
leaving everything else to patch 7, but after your and Jonathan's
comments, I opted to merge them together.

> 
> > diff --git a/tests/lcitool/libvirt-ci b/tests/lcitool/libvirt-ci
> > index 0e9490cebc72..77c800186f34 160000
> > --- a/tests/lcitool/libvirt-ci
> > +++ b/tests/lcitool/libvirt-ci
> > @@ -1 +1 @@
> > -Subproject commit 0e9490cebc726ef772b6c9e27dac32e7ae99f9b2
> > +Subproject commit 77c800186f34b21be7660750577cc5582a914deb  
> 
> Accident?
> 

Yes. Working with submodules is sometimes tricky, as git commit -a wants
to include everything, submodule changes too, and manually dropping a
submodule change from existing commits is tricky. I added this to my
environment, but it only affects the git diff porcelain:

	[diff]
	        ignoreSubmodules = all

I wonder if there are ways for git commit -a to also ignore submodules...
perhaps some git hook?

Thanks,
Mauro



* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-26 12:44   ` Jonathan Cameron via
@ 2024-07-29 11:40     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 11:40 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
	Eric Blake, Igor Mammedov, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Em Fri, 26 Jul 2024 13:44:12 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:

> On Mon, 22 Jul 2024 08:45:56 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > 1. Some GHES functions require handling addresses. Add a helper function
> >    to support it.
> > 
> > 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
> > 
> > Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and
> > upper specs, using error type bit encoding as detailed at UEFI 2.9A
> > errata.
> > 
> > Error injection examples:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['cache-error']
> >       }
> > }
> > 
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['tlb-error']
> >       }
> > }
> > 
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['bus-error']
> >       }
> > }
> > 
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['cache-error', 'tlb-error']
> >       }
> > }
> > 
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
> >       }
> > }
> > ...
> > 
> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> > For Add a logic to handle block addresses,  
> # before comments I think?
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > For FW first ARM processor error injection,
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>  
> I can't remember what I wrote in here so may well be commenting on
> my past self ;)
> 
> > ---
> >  configs/targets/aarch64-softmmu.mak |   1 +
> >  hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
> >  hw/arm/Kconfig                      |   4 +
> >  hw/arm/arm_error_inject.c           |  35 ++++
> >  hw/arm/arm_error_inject_stubs.c     |  18 ++
> >  hw/arm/meson.build                  |   3 +
> >  include/hw/acpi/ghes.h              |   2 +
> >  qapi/arm-error-inject.json          |  49 ++++++
> >  qapi/meson.build                    |   1 +
> >  qapi/qapi-schema.json               |   1 +
> >  10 files changed, 361 insertions(+), 11 deletions(-)
> >  create mode 100644 hw/arm/arm_error_inject.c
> >  create mode 100644 hw/arm/arm_error_inject_stubs.c
> >  create mode 100644 qapi/arm-error-inject.json
> > 
> > diff --git a/configs/targets/aarch64-softmmu.mak b/configs/targets/aarch64-softmmu.mak
> > index 84cb32dc2f4f..b4b3cd97934a 100644
> > --- a/configs/targets/aarch64-softmmu.mak
> > +++ b/configs/targets/aarch64-softmmu.mak
> > @@ -5,3 +5,4 @@ TARGET_KVM_HAVE_GUEST_DEBUG=y
> >  TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml gdb-xml/aarch64-pauth.xml
> >  # needed by boot.c
> >  TARGET_NEED_FDT=y
> > +CONFIG_ARM_EINJ=y
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 5b8bc6eeb437..6075ef5893ce 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -27,6 +27,7 @@
> >  #include "hw/acpi/generic_event_device.h"
> >  #include "hw/nvram/fw_cfg.h"
> >  #include "qemu/uuid.h"
> > +#include "qapi/qapi-types-arm-error-inject.h"
> >  
> >  #define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
> >  #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> > @@ -53,6 +54,12 @@
> >  /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
> >  #define ACPI_GHES_MEM_CPER_LENGTH           80
> >  
> > +/*
> > + * ARM Processor section CPER size, UEFI 2.10: N.2.4.4
> > + * ARM Processor Error Section
> > + */
> > +#define ACPI_GHES_ARM_CPER_LENGTH (72 + 600)
> > +
> >  /* Masks for block_status flags */
> >  #define ACPI_GEBS_UNCORRECTABLE         1
> >  
> > @@ -231,6 +238,142 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> >      return 0;
> >  }
> >  
> > +/* UEFI 2.9: N.2.4.4 ARM Processor Error Section */
> > +static void acpi_ghes_build_append_arm_cper(uint8_t error_types, GArray *table)
> > +{
> > +    /*
> > +     * ARM Processor Error Record
> > +     */
> > +
> > +    /* Validation Bits */
> > +    build_append_int_noprefix(table,
> > +                              (1ULL << 3) | /* Vendor specific info Valid */
> > +                              (1ULL << 2) | /* Running status Valid */
> > +                              (1ULL << 1) | /* Error affinity level Valid */
> > +                              (1ULL << 0), /* MPIDR Valid */
> > +                              4);
> > +    /* Error Info Num */
> > +    build_append_int_noprefix(table, 1, 2);
> > +    /* Context Info Num */
> > +    build_append_int_noprefix(table, 1, 2);
> > +    /* Section length */
> > +    build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_LENGTH, 4);
> > +    /* Error affinity level */
> > +    build_append_int_noprefix(table, 2, 1);
> > +    /* Reserved */
> > +    build_append_int_noprefix(table, 0, 3);
> > +    /* MPIDR_EL1 */
> > +    build_append_int_noprefix(table, 0xAB12, 8);  
> 
> These need to be real - I see you fix that in later
> patches, but I'd be tempted to pull it back here.  Or maybe just
> add a comment to say you will rewrite this later.
> 
> I know you aren't keen to smash patches with different authorship
> together, but here I think you should just have this
> correct from the start (so combine this and 5-7)
> perhaps with some links back to the version where they are split?

I folded this with patch 7. I kept patch 5 as a separate one,
as it is a different logical change.

After folding, this field is filled by default from the emulated
value; it can be overridden via QMP.

> > +    /* MIDR_EL1 */
> > +    build_append_int_noprefix(table, 0xCD24, 8);
> > +    /* Running state */
> > +    build_append_int_noprefix(table, 0x1, 4);
> > +    /* PSCI state */
> > +    build_append_int_noprefix(table, 0x1234, 4);
> > +
> > +    /* ARM Propcessor error information */
> > +    /* Version */
> > +    build_append_int_noprefix(table, 0, 1);
> > +    /*  Length */
> > +    build_append_int_noprefix(table, 32, 1);
> > +    /* Validation Bits */
> > +    build_append_int_noprefix(table,
> > +                              (1ULL << 4) | /* Physical fault address Valid */  
> 
> Some tabs hiding in here that need to be spaces.

Solved when folding with patch 7.

> > +                             (1ULL << 3) | /* Virtual fault address Valid */
> > +                             (1ULL << 2) | /* Error information Valid */
> > +                              (1ULL << 1) | /* Flags Valid */
> > +                              (1ULL << 0), /* Multiple error count Valid */
> > +                              2);
> > +    /* Type */
> > +    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR) ||
> > +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR) ||
> > +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR) ||
> > +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
> > +        build_append_int_noprefix(table, error_types, 1);
> > +    } else {
> > +        return;
> > +    }
> > +    /* Multiple error count */
> > +    build_append_int_noprefix(table, 2, 2);
> > +    /* Flags  */
> > +    build_append_int_noprefix(table, 0xD, 1);
> > +    /* Error information  */
> > +    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR)) {
> > +        build_append_int_noprefix(table, 0x0091000F, 8);
> > +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR)) {
> > +        build_append_int_noprefix(table, 0x0054007F, 8);
> > +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR)) {
> > +        build_append_int_noprefix(table, 0x80D6460FFF, 8);
> > +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
> > +        build_append_int_noprefix(table, 0x78DA03FF, 8);
> > +    } else {
> > +        return;
> > +    }
> > +    /* Virtual fault address  */
> > +    build_append_int_noprefix(table, 0x67320230, 8);
> > +    /* Physical fault address  */
> > +    build_append_int_noprefix(table, 0x5CDFD492, 8);
> > +
> > +    /* ARM Propcessor error context information */
> > +    /* Version */
> > +    build_append_int_noprefix(table, 0, 2);
> > +    /* Validation Bits */
> > +    /* AArch64 EL1 context registers Valid */
> > +    build_append_int_noprefix(table, 5, 2);
> > +    /* Register array size */
> > +    build_append_int_noprefix(table, 592, 4);
> > +    /* Register array */
> > +    build_append_int_noprefix(table, 0x12ABDE67, 8);
> > +}
> > +
> > +static int acpi_ghes_record_arm_error(uint8_t error_types,
> > +                                      uint64_t error_block_address)
> > +{
> > +    GArray *block;
> > +
> > +    /* ARM processor Error Section Type */
> > +    const uint8_t uefi_cper_arm_sec[] =
> > +          UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05, \
> > +                  0x1D, 0x5D, 0x46, 0xB0);
> > +
> > +    /*
> > +     * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> > +     * Table 17-13 Generic Error Data Entry
> > +     */
> > +    QemuUUID fru_id = {};
> > +    uint32_t data_length;
> > +
> > +    block = g_array_new(false, true /* clear */, 1);
> > +
> > +    /* This is the length if adding a new generic error data entry*/  
> 
> space before *

Fixed.

> 
> > +    data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_ARM_CPER_LENGTH;
> > +    /*
> > +     * It should not run out of the preallocated memory if adding a new generic
> > +     * error data entry
> > +     */
> > +    assert((data_length + ACPI_GHES_GESB_SIZE) <=
> > +            ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > +
> > +    /* Build the new generic error status block header */
> > +    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> > +        0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
> > +
> > +    /* Build this new generic error data entry header */
> > +    acpi_ghes_generic_error_data(block, uefi_cper_arm_sec,
> > +        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> > +        ACPI_GHES_ARM_CPER_LENGTH, fru_id, 0);
> > +
> > +    /* Build the ARM processor error section CPER */
> > +    acpi_ghes_build_append_arm_cper(error_types, block);
> > +
> > +    /* Write the generic error data entry into guest memory */
> > +    cpu_physical_memory_write(error_block_address, block->data, block->len);
> > +
> > +    g_array_free(block, true);
> > +
> > +    return 0;
> > +}  
> 
> 
> > +bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify)
> > +{
> > +    int read_ack_register = 0;
> > +    uint64_t read_ack_register_addr = 0;
> > +    uint64_t error_block_addr = 0;
> > +
> > +    if (!ghes_get_addr(notify, &error_block_addr, &read_ack_register_addr)) {
> > +        return false;
> > +    }
> > +
> > +    cpu_physical_memory_read(read_ack_register_addr,
> > +                             &read_ack_register, sizeof(uint64_t));  
> 
> longer but I'd prefer sizeof(read_ack_register)
> Maybe we can shorten to read_ack and read_ack_addr?
> 
> > +    /* zero means OSPM does not acknowledge the error */
> > +    if (!read_ack_register) {
> > +        error_report("Last time OSPM does not acknowledge the error,"
> > +                     " record CPER failed this time, set the ack value to"
> > +                     " avoid blocking next time CPER record! exit");
> > +        read_ack_register = 1;
> > +        cpu_physical_memory_write(read_ack_register_addr,
> > +                                  &read_ack_register, sizeof(uint64_t));  
> sizeof(read_ack_register)
> 
> > +        return false;
> > +    }
> > +
> > +    read_ack_register = cpu_to_le64(0);
> > +    cpu_physical_memory_write(read_ack_register_addr,
> > +                              &read_ack_register, sizeof(uint64_t));  
> 
> sizeof(read_ack_register)
> 
> > +    return acpi_ghes_record_arm_error(error_types, error_block_addr);
> > +}
> > +  

Changed as suggested.
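
For illustration only (this is not the code from the series): a minimal
sketch of the read-ack handshake with sizeof() taken from the variable.
It assumes the variable is declared as a 64-bit integer, as the
cpu_to_le64() and sizeof(uint64_t) in the quoted hunk imply. The
fake_guest_mem array and the mem_read()/mem_write() helpers are
hypothetical stand-ins for guest memory access, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for guest physical memory (not a QEMU API). */
static uint8_t fake_guest_mem[64];

static void mem_read(uint64_t addr, void *buf, size_t len)
{
    assert(addr + len <= sizeof(fake_guest_mem));
    memcpy(buf, &fake_guest_mem[addr], len);
}

static void mem_write(uint64_t addr, const void *buf, size_t len)
{
    assert(addr + len <= sizeof(fake_guest_mem));
    memcpy(&fake_guest_mem[addr], buf, len);
}

/*
 * Returns true when a new record may be written. The variable is
 * 64 bits wide, so sizeof(read_ack) matches the register size.
 */
static bool read_ack_claim(uint64_t read_ack_addr)
{
    uint64_t read_ack = 0;

    mem_read(read_ack_addr, &read_ack, sizeof(read_ack));
    if (!read_ack) {
        /* Previous record was not acknowledged: re-arm and give up. */
        read_ack = 1;
        mem_write(read_ack_addr, &read_ack, sizeof(read_ack));
        return false;
    }

    /* Clear the ack; the guest sets it again after consuming the record. */
    read_ack = 0;
    mem_write(read_ack_addr, &read_ack, sizeof(read_ack));
    return true;
}
```

With the register preset to 1, the first call succeeds and clears it; a
second call without a guest acknowledgement fails and re-arms it.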

> 
> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> > new file mode 100644
> > index 000000000000..430e6cea6b60
> > --- /dev/null
> > +++ b/qapi/arm-error-inject.json  
> 
> > +##
> > +# @arm-inject-error:
> > +#
> > +# Inject ARM Processor error.
> > +#
> > +# @errortypes: ARM processor error types to inject
> > +#
> > +# Features:
> > +#
> > +# @unstable: This command is experimental.
> > +#
> > +# Since: 9.1  
> Update to 9.2 on next version.

Ok.

Thanks,
Mauro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-25  9:48   ` Markus Armbruster
  2024-07-26 12:46     ` Jonathan Cameron via
@ 2024-07-29 12:21     ` Mauro Carvalho Chehab
  2024-07-29 14:32       ` Markus Armbruster
  1 sibling, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 12:21 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Em Thu, 25 Jul 2024 11:48:12 +0200
Markus Armbruster <armbru@redhat.com> escreveu:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> 
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >
> > 1. Some GHES functions require handling addresses. Add a helper function
> >    to support it.
> >
> > 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
> >
> > Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and
> > upper specs, using error type bit encoding as detailed at UEFI 2.9A
> > errata.
> >
> > Error injection examples:
> >
> > { "execute": "qmp_capabilities" }
> >
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['cache-error']
> >       }
> > }
> >
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['tlb-error']
> >       }
> > }
> >
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['bus-error']
> >       }
> > }
> >
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['cache-error', 'tlb-error']
> >       }
> > }
> >
> > { "execute": "arm-inject-error",
> >       "arguments": {
> >         "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
> >       }
> > }
> > ...
> >
> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> > For Add a logic to handle block addresses,
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > For FW first ARM processor error injection,
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> > ---
> >  configs/targets/aarch64-softmmu.mak |   1 +
> >  hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
> >  hw/arm/Kconfig                      |   4 +
> >  hw/arm/arm_error_inject.c           |  35 ++++
> >  hw/arm/arm_error_inject_stubs.c     |  18 ++
> >  hw/arm/meson.build                  |   3 +
> >  include/hw/acpi/ghes.h              |   2 +
> >  qapi/arm-error-inject.json          |  49 ++++++
> >  qapi/meson.build                    |   1 +
> >  qapi/qapi-schema.json               |   1 +
> >  10 files changed, 361 insertions(+), 11 deletions(-)
> >  create mode 100644 hw/arm/arm_error_inject.c
> >  create mode 100644 hw/arm/arm_error_inject_stubs.c
> >  create mode 100644 qapi/arm-error-inject.json  
> 
> Since the new file not covered in MAINTAINERS, get_maintainer.pl will
> blame it on the QAPI maintainers alone.  No good.

Added myself there:

diff --git a/MAINTAINERS b/MAINTAINERS
index 98eddf7ae155..713a104ef901 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
 F: include/hw/acpi/ghes.h
 F: docs/specs/acpi_hest_ghes.rst
 
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+S: Maintained
+F: hw/arm/arm_error_inject.c
+F: hw/arm/arm_error_inject_stubs.c
+F: qapi/arm-error-inject.json
+
 ppc4xx
 L: qemu-ppc@nongnu.org
 S: Orphan

> 
> [...]
> 
> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> > new file mode 100644
> > index 000000000000..430e6cea6b60
> > --- /dev/null
> > +++ b/qapi/arm-error-inject.json
> > @@ -0,0 +1,49 @@
> > +# -*- Mode: Python -*-
> > +# vim: filetype=python
> > +
> > +##
> > +# = ARM Processor Errors
> > +##
> > +
> > +##
> > +# @ArmProcessorErrorType:
> > +#
> > +# Type of ARM processor error to inject
> > +#
> > +# @unknown-error: Unknown error  
> 
> Removed in PATCH 7, and unused until then.  Why add it in the first
> place?

I folded this with patch 7, so this is gone now.

> 
> > +#
> > +# @cache-error: Cache error
> > +#
> > +# @tlb-error: TLB error
> > +#
> > +# @bus-error: Bus error.
> > +#
> > +# @micro-arch-error: Micro architectural error.
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'enum': 'ArmProcessorErrorType',
> > +  'data': ['unknown-error',
> > +	   'cache-error',  
> 
> Tab in this line.  Please convert to spaces.

Ok.

> 
> > +           'tlb-error',
> > +           'bus-error',
> > +           'micro-arch-error']
> > +}
> > +
> > +##
> > +# @arm-inject-error:
> > +#
> > +# Inject ARM Processor error.
> > +#
> > +# @errortypes: ARM processor error types to inject
> > +#
> > +# Features:
> > +#
> > +# @unstable: This command is experimental.
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'command': 'arm-inject-error',
> > +  'data': { 'errortypes': ['ArmProcessorErrorType'] },  
> 
> Please separate words with dashes: 'error-types'.

Done.

Folding with patch 7 split it into two separate fields: error and
type.

> 
> > +  'features': [ 'unstable' ]
> > +}  
> 
> Is this used only with TARGET_ARM?

Yes, as this CPER record is defined only for Arm. There are three other
processor error information records:
	- for x86;
	- for ia32;
	- for "generic cpu".

They have different structures, with different fields.

> Why is being able to inject multiple error types at once useful?

The UEFI spec defines the CPER ARM Processor record as carrying from 1 to
255 errors, which may or may not share the same type. The idea behind the
UEFI spec is that a single root error may be reflected as multiple errors.

It may also help to reduce BIOS interrupts to the OS by merging errors
together, as memory errors usually happen in bursts.

Due to that, a single Processor Error Information inside a CPER record
for an ARM processor can, according to the UEFI spec, have more than one
of the following bits set:

            +-----+---------------------------+
            | Bit | Meaning                   |
            +=====+===========================+
            |  1  | Cache Error               |
            |  2  | TLB Error                 |
            |  3  | Bus Error                 |
            |  4  | Micro-architectural Error |
            +-----+---------------------------+

So, the spec allows, for instance, a single Processor Error
Information (PEI) with the micro-arch and tlb-error flags raised at the
same time.

We need the capability of testing multiple error types in order to check
whether the OS implementation decodes them the right way. In particular,
Linux was not doing it right, as the CPER ARM Processor record handler was
written when UEFI 2.6 was current, while the actual encoding for the
error type was only defined in the UEFI 2.9A errata and newer.
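
As a rough sketch of that encoding (illustrative names, not the ones
used in the series): the helpers below fold a list of error types into
the one-byte PEI type field, using the bit positions from the table
above, and count how many types a given field flags. The enum and
function names are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions from the table above (UEFI 2.9A errata encoding). */
enum arm_pei_type_bit {
    ARM_PEI_TYPE_CACHE      = 1,
    ARM_PEI_TYPE_TLB        = 2,
    ARM_PEI_TYPE_BUS        = 3,
    ARM_PEI_TYPE_MICRO_ARCH = 4,
};

/* Fold a list of error types into the one-byte PEI type field. */
static uint8_t arm_pei_type_mask(const enum arm_pei_type_bit *types,
                                 size_t n)
{
    uint8_t mask = 0;

    for (size_t i = 0; i < n; i++) {
        assert(types[i] >= ARM_PEI_TYPE_CACHE &&
               types[i] <= ARM_PEI_TYPE_MICRO_ARCH);
        mask |= (uint8_t)(1u << types[i]);
    }
    return mask;
}

/* Number of distinct error types flagged in a PEI type field. */
static unsigned arm_pei_type_count(uint8_t mask)
{
    unsigned count = 0;
    int bit;

    for (bit = ARM_PEI_TYPE_CACHE; bit <= ARM_PEI_TYPE_MICRO_ARCH; bit++) {
        count += (mask >> bit) & 1u;
    }
    return count;
}
```

A decoder that assumes the field holds a single enumerated value would
misread any mask for which arm_pei_type_count() returns more than 1,
which is the case this kind of injection is meant to exercise.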

> I'd expect at least some of these errors to come with additional
> information.  For instance, I imagine a bus error is associated with
> some address.

It actually depends on the ARM and PEI valid fields: the address may or
may not be present, depending on whether the physical or virtual address
valid bit is set.
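
A consumer therefore has to gate each optional field on its valid bit.
A minimal sketch, using the PEI validation bit positions given in the
comments of the quoted ghes.c hunk; struct arm_pei and the function
name are hypothetical and only serve this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PEI validation bits, as commented in the quoted ghes.c hunk. */
#define ARM_PEI_VALID_MULTIPLE_ERROR  (1u << 0)
#define ARM_PEI_VALID_FLAGS           (1u << 1)
#define ARM_PEI_VALID_ERROR_INFO      (1u << 2)
#define ARM_PEI_VALID_VIRT_ADDR       (1u << 3)
#define ARM_PEI_VALID_PHY_ADDR        (1u << 4)

/* Hypothetical decoded view of one PEI entry (illustration only). */
struct arm_pei {
    uint16_t validation;
    uint64_t virt_addr;
    uint64_t phy_addr;
};

/* Store the physical address in *addr only if the entry marks it valid. */
static bool arm_pei_get_phy_addr(const struct arm_pei *pei, uint64_t *addr)
{
    assert(pei && addr);
    if (!(pei->validation & ARM_PEI_VALID_PHY_ADDR)) {
        return false;
    }
    *addr = pei->phy_addr;
    return true;
}
```

So a bus error injected without the physical address valid bit simply
carries no usable address, whatever bytes happen to be in the field.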

> 
> If we encode the the error to inject as an enum value, adding more will
> be hard.
> 
> If we wrap the enum in a struct
> 
>     { 'struct': 'ArmProcessorError',
>       'data': { 'type': 'ArmProcessorErrorType' } }
> 
> we can later extend it like
> 
>     { 'union': 'ArmProcessorError',
>       'base: { 'type': 'ArmProcessorErrorType' }
>       'data': {
>           'bus-error': 'ArmProcessorBusErrorData' } }
> 
>     { 'struct': 'ArmProcessorBusErrorData',
>       'data': ... }

I don't see this working as one might expect. See, the ARM error
information data can be repeated from 1 to 255 times. It is given 
by this struct (see patch 7):

	{ 'struct': 'ArmProcessorErrorInformation',
	  'data': { '*validation': ['ArmPeiValidationBits'],
	            'type': ['ArmProcessorErrorType'],
	            '*multiple-error': 'uint16',
	            '*flags': ['ArmProcessorFlags'],
	            '*error-info': 'uint64',
	            '*virt-addr':  'uint64',
	            '*phy-addr': 'uint64'}
	}

According to the UEFI spec, the type is always present.
The other fields are marked as valid or not via the "validation"
field. So, there is one bit per field indicating whether it is valid
in the PEI structure, namely:

	- multiple-error: multiple occurrences of the error;
	- flags;
	- error-info: error information;
	- virt-addr: virtual address;
	- phy-addr: physical address.

There are also other fields that are global for the entire record,
also marked as valid or not via another bitmask.

The contents of almost all those fields are independent of the error
type. The only field whose content is affected by the error type is
"error-info", and the definition of that field is not fully specified.

So, currently, the UEFI spec only defines it when:

1. the error type has just one bit set;
2. the error type is either cache, TLB or bus error[1].
   If the type is a micro-arch-specific error, the spec doesn't say how
   this field is filled.

To keep the API simple (yet powerful), I opted not to enforce any encoding
for error-info: userspace fills it as required, and a default that makes
sense is used if it is not passed via QMP.
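
To make the layout concrete, here is a hedged sketch of how one 32-byte
PEI entry could be serialized from optional arguments. Field sizes and
order follow the quoted ghes.c hunk. The defaulting policy shown (a
field is flagged valid exactly when the caller supplied it, and is
zero otherwise) is an assumption for this example and not necessarily
what patch 7 does; struct arm_pei_args, put_le() and arm_pei_encode()
are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARM_PEI_LENGTH 32

/* Hypothetical input: each optional member carries a has_* flag. */
struct arm_pei_args {
    uint8_t  type;
    bool     has_multiple_error;
    uint16_t multiple_error;
    bool     has_flags;
    uint8_t  flags;
    bool     has_error_info;
    uint64_t error_info;
    bool     has_virt_addr;
    uint64_t virt_addr;
    bool     has_phy_addr;
    uint64_t phy_addr;
};

/* Write an integer of the given size in little-endian byte order. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t val, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(val >> (8 * i));
    }
    return off + size;
}

static void arm_pei_encode(const struct arm_pei_args *a,
                           uint8_t out[ARM_PEI_LENGTH])
{
    /* Assumed policy: a field is valid exactly when it was supplied. */
    uint16_t valid = (uint16_t)((a->has_multiple_error << 0) |
                                (a->has_flags << 1) |
                                (a->has_error_info << 2) |
                                (a->has_virt_addr << 3) |
                                (a->has_phy_addr << 4));
    size_t off = 0;

    off = put_le(out, off, 0, 1);               /* Version */
    off = put_le(out, off, ARM_PEI_LENGTH, 1);  /* Length */
    off = put_le(out, off, valid, 2);           /* Validation bits */
    off = put_le(out, off, a->type, 1);         /* Type */
    off = put_le(out, off,
                 a->has_multiple_error ? a->multiple_error : 0, 2);
    off = put_le(out, off, a->has_flags ? a->flags : 0, 1);
    off = put_le(out, off, a->has_error_info ? a->error_info : 0, 8);
    off = put_le(out, off, a->has_virt_addr ? a->virt_addr : 0, 8);
    off = put_le(out, off, a->has_phy_addr ? a->phy_addr : 0, 8);
    assert(off == ARM_PEI_LENGTH);
}
```

Note that error_info is passed through untouched, which matches the
choice above of not enforcing any encoding for it.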

[1] See https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information

> > diff --git a/qapi/meson.build b/qapi/meson.build
> > index e7bc54e5d047..5927932c4be3 100644
> > --- a/qapi/meson.build
> > +++ b/qapi/meson.build
> > @@ -22,6 +22,7 @@ if have_system or have_tools or have_ga
> >  endif
> >  
> >  qapi_all_modules = [
> > +  'arm-error-inject',
> >    'authz',
> >    'block',
> >    'block-core',
> > diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> > index b1581988e4eb..479a22de7e43 100644
> > --- a/qapi/qapi-schema.json
> > +++ b/qapi/qapi-schema.json
> > @@ -81,3 +81,4 @@
> >  { 'include': 'vfio.json' }
> >  { 'include': 'cryptodev.json' }
> >  { 'include': 'cxl.json' }
> > +{ 'include': 'arm-error-inject.json' }  
> 

Thanks,
Mauro


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-26 12:46     ` Jonathan Cameron via
@ 2024-07-29 12:49       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-29 12:49 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Markus Armbruster, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Em Fri, 26 Jul 2024 13:46:46 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:

> A few quick replies from me.
> I'm sure Mauro will add more info.
> 
> > > +           'tlb-error',
> > > +           'bus-error',
> > > +           'micro-arch-error']
> > > +}
> > > +
> > > +##
> > > +# @arm-inject-error:
> > > +#
> > > +# Inject ARM Processor error.
> > > +#
> > > +# @errortypes: ARM processor error types to inject
> > > +#
> > > +# Features:
> > > +#
> > > +# @unstable: This command is experimental.
> > > +#
> > > +# Since: 9.1
> > > +##
> > > +{ 'command': 'arm-inject-error',
> > > +  'data': { 'errortypes': ['ArmProcessorErrorType'] },    
> > 
> > Please separate words with dashes: 'error-types'.
> >   
> > > +  'features': [ 'unstable' ]
> > > +}    
> > 
> > Is this used only with TARGET_ARM?
> > 
> > Why is being able to inject multiple error types at once useful?  
> 
> It pokes a weird corner of the specification that I think previously 
> tripped up Linux.
> 
> > 
> > I'd expect at least some of these errors to come with additional
> > information.  For instance, I imagine a bus error is associated with
> > some address.  
> 
> Absolutely agree that in sane case you wouldn't have multiple errors
> but we want to hit the insane ones :(

Yes.

> There is only prevision for one set of data in the record despite
> it providing a bitmap for the type of error.

Well, there isn't anything in the UEFI spec forbidding the use of
multiple bits.

On a "normal" bitmask field, having more than one bit set is supported.
So, as the spec doesn't forbid it, it should be valid to have more than
one bit set.

Now, when multiple error bits from this table are set:

            +-----+---------------------------+
            | Bit | Meaning                   |
            +=====+===========================+
            |  1  | Cache Error               |
            |  2  | TLB Error                 |
            |  3  | Bus Error                 |
            |  4  | Micro-architectural Error |
            +-----+---------------------------+

- if bit 4 is set, as specified in the spec, the error-info field is
  defined by the ARM vendor, according to:

	"N.2.4.4.1.1. ARM Vendor Specific Micro-Architecture ErrorStructure

	 This is a vendor specific structure. Please refer to your hardware
	 vendor documentation for the format of this structure."

  So, provided that the vendor-specific documentation explicitly allows
  setting bit 4 with other bits, I don't see a UEFI compliance problem.

- if bit 4 is not set, but multiple bits 1 to 3 are set, the content
  of error-info is currently undefined, as tables N.18 to N.20 won't
  apply.

Anyway, from the spec PoV, IMO the UEFI spec requires an errata to either
clearly enforce that just one bit should be set or define the behavior
when multiple ones are set.
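
The cases above can be summarized in a small classifier. This is only a
sketch of my reading of the spec as described in this message, with
hypothetical names; treating a type field with none of bits 1 to 4 set
as "undefined" is an extra assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_PEI_TYPE_CACHE       (1u << 1)
#define ARM_PEI_TYPE_TLB         (1u << 2)
#define ARM_PEI_TYPE_BUS         (1u << 3)
#define ARM_PEI_TYPE_MICRO_ARCH  (1u << 4)
#define ARM_PEI_TYPE_MASK        (ARM_PEI_TYPE_CACHE | ARM_PEI_TYPE_TLB | \
                                  ARM_PEI_TYPE_BUS | ARM_PEI_TYPE_MICRO_ARCH)

enum error_info_layout {
    ERROR_INFO_SPEC_DEFINED,    /* exactly one of cache, TLB or bus */
    ERROR_INFO_VENDOR_DEFINED,  /* bit 4 set, alone or with others */
    ERROR_INFO_UNDEFINED,       /* several of bits 1-3, or no type bit */
};

/* Tell which document, if any, defines the layout of error-info. */
static enum error_info_layout arm_pei_error_info_layout(uint8_t type)
{
    unsigned bits = type & ARM_PEI_TYPE_MASK;

    if (bits & ARM_PEI_TYPE_MICRO_ARCH) {
        return ERROR_INFO_VENDOR_DEFINED;
    }
    /* A non-zero value with a single bit set is a power of two. */
    if (bits != 0 && (bits & (bits - 1)) == 0) {
        return ERROR_INFO_SPEC_DEFINED;
    }
    return ERROR_INFO_UNDEFINED;
}

static const char *error_info_layout_name(enum error_info_layout layout)
{
    switch (layout) {
    case ERROR_INFO_SPEC_DEFINED:
        return "spec-defined";
    case ERROR_INFO_VENDOR_DEFINED:
        return "vendor-defined";
    case ERROR_INFO_UNDEFINED:
        return "undefined";
    }
    assert(!"unreachable");
    return "";
}
```

A test harness could use such a helper to decide whether a decoded
error-info value can be checked against tables N.18 to N.20 at all.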

Thanks,
Mauro


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-29 12:21     ` Mauro Carvalho Chehab
@ 2024-07-29 14:32       ` Markus Armbruster
  2024-08-01 14:34         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Markus Armbruster @ 2024-07-29 14:32 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> Em Thu, 25 Jul 2024 11:48:12 +0200
> Markus Armbruster <armbru@redhat.com> escreveu:
>
>> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>> 
>> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> >
>> > 1. Some GHES functions require handling addresses. Add a helper function
>> >    to support it.
>> >
>> > 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
>> >
>> > Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and
>> > upper specs, using error type bit encoding as detailed at UEFI 2.9A
>> > errata.
>> >
>> > Error injection examples:
>> >
>> > { "execute": "qmp_capabilities" }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['cache-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['tlb-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['bus-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['cache-error', 'tlb-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
>> >       }
>> > }
>> > ...
>> >
>> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
>> > For Add a logic to handle block addresses,
>> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > For FW first ARM processor error injection,
>> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> > ---
>> >  configs/targets/aarch64-softmmu.mak |   1 +
>> >  hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
>> >  hw/arm/Kconfig                      |   4 +
>> >  hw/arm/arm_error_inject.c           |  35 ++++
>> >  hw/arm/arm_error_inject_stubs.c     |  18 ++
>> >  hw/arm/meson.build                  |   3 +
>> >  include/hw/acpi/ghes.h              |   2 +
>> >  qapi/arm-error-inject.json          |  49 ++++++
>> >  qapi/meson.build                    |   1 +
>> >  qapi/qapi-schema.json               |   1 +
>> >  10 files changed, 361 insertions(+), 11 deletions(-)
>> >  create mode 100644 hw/arm/arm_error_inject.c
>> >  create mode 100644 hw/arm/arm_error_inject_stubs.c
>> >  create mode 100644 qapi/arm-error-inject.json  
>> 
>> Since the new file not covered in MAINTAINERS, get_maintainer.pl will
>> blame it on the QAPI maintainers alone.  No good.
>
> Added myself there:
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 98eddf7ae155..713a104ef901 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
>  F: include/hw/acpi/ghes.h
>  F: docs/specs/acpi_hest_ghes.rst
>  
> +ACPI/HEST/GHES/ARM processor CPER
> +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +S: Maintained
> +F: hw/arm/arm_error_inject.c
> +F: hw/arm/arm_error_inject_stubs.c
> +F: qapi/arm-error-inject.json
> +
>  ppc4xx
>  L: qemu-ppc@nongnu.org
>  S: Orphan
>
>> 
>> [...]
>> 
>> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
>> > new file mode 100644
>> > index 000000000000..430e6cea6b60
>> > --- /dev/null
>> > +++ b/qapi/arm-error-inject.json
>> > @@ -0,0 +1,49 @@
>> > +# -*- Mode: Python -*-
>> > +# vim: filetype=python
>> > +
>> > +##
>> > +# = ARM Processor Errors
>> > +##
>> > +
>> > +##
>> > +# @ArmProcessorErrorType:
>> > +#
>> > +# Type of ARM processor error to inject
>> > +#
>> > +# @unknown-error: Unknown error  
>> 
>> Removed in PATCH 7, and unused until then.  Why add it in the first
>> place?
>
> I folded this with patch 7, so this was gone now.
>
>> 
>> > +#
>> > +# @cache-error: Cache error
>> > +#
>> > +# @tlb-error: TLB error
>> > +#
>> > +# @bus-error: Bus error.
>> > +#
>> > +# @micro-arch-error: Micro architectural error.
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'enum': 'ArmProcessorErrorType',
>> > +  'data': ['unknown-error',
>> > +	   'cache-error',  
>> 
>> Tab in this line.  Please convert to spaces.
>
> Ok.
>
>> 
>> > +           'tlb-error',
>> > +           'bus-error',
>> > +           'micro-arch-error']
>> > +}
>> > +
>> > +##
>> > +# @arm-inject-error:
>> > +#
>> > +# Inject ARM Processor error.
>> > +#
>> > +# @errortypes: ARM processor error types to inject
>> > +#
>> > +# Features:
>> > +#
>> > +# @unstable: This command is experimental.
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'command': 'arm-inject-error',
>> > +  'data': { 'errortypes': ['ArmProcessorErrorType'] },  
>> 
>> Please separate words with dashes: 'error-types'.
>
> Done.
>
> Folding with patch 7 broke it on two separate fields: error and
> type.
>
>> 
>> > +  'features': [ 'unstable' ]
>> > +}  
>> 
>> Is this used only with TARGET_ARM?
>
> Yes, as this CPER record is defined only for arm. There are three other
> processor error info:
> 	- for x86;
> 	- for ia32;
> 	- for "generic cpu".
>
> They have different structures, with different fields.

A generic inject-error command feels nicer, but coding its arguments in
the schema could be more trouble than it's worth.  I'm not asking you to
try.

A target-specific command like this one should be conditional.  Try
this:

    { 'command': 'arm-inject-error',
      'data': { 'errortypes': ['ArmProcessorErrorType'] },
      'features': [ 'unstable' ],
      'if': 'TARGET_ARM' }

No need to provide a qmp_arm_inject_error() stub then.

>> Why is being able to inject multiple error types at once useful?
>
> The CPER ARM Processor record is defined at UEFI spec as having from 1 to
> 255 errors, that can be using the same type or not. The idea behind UEFI
> spec is that a single root error may be reflected on multiple errors.
>
> It may also help to reduce BIOS interrupts to OS, by merging errors
> altogether, as memory errors usually happen in bursts.
>
> Due to that, a single Processor Error Information inside a CPER record
> for ARM processor can, according with UEFI spec, contain more than one
> of the following bits set:
>
>             +-----|---------------------------+
>             | Bit | Meaning                   |
>             +=====+===========================+
>             |  1  | Cache Error               |
>             |  2  | TLB Error                 |
>             |  3  | Bus Error                 |
>             |  4  | Micro-architectural Error |
>             +-----|---------------------------+
>
> So, the spec allows, for instance, to have a single Processor Error
> Information (PEI) with micro-arch and tlb-error flags raised at the
> same time.
>
> We need the capability of testing multiple error types in order to check
> if OS implementation is decoding it the right way. In particular, Linux
> was not doing it right, as the CPER ARM Processor record handler was 
> written at the time UEFI 2.6 spec was written, while the actual encoding
> for the error type was only defined at UEFI 2.9A errata and newer.

I see.

>> I'd expect at least some of these errors to come with additional
>> information.  For instance, I imagine a bus error is associated with
>> some address.
>
> It actually depends on the ARM and PEI valid fields: the address may or 
> may not be present, depending if the phy/logical address valid field bit
> is set or not.
>
>> 
>> If we encode the error to inject as an enum value, adding more will
>> be hard.
>> 
>> If we wrap the enum in a struct
>> 
>>     { 'struct': 'ArmProcessorError',
>>       'data': { 'type': 'ArmProcessorErrorType' } }
>> 
>> we can later extend it like
>> 
>>     { 'union': 'ArmProcessorError',
>>       'base': { 'type': 'ArmProcessorErrorType' },
>>       'data': {
>>           'bus-error': 'ArmProcessorBusErrorData' } }
>> 
>>     { 'struct': 'ArmProcessorBusErrorData',
>>       'data': ... }
>
> I don't see this working as one might expect. See, the ARM error
> information data can be repeated from 1 to 255 times. It is given 
> by this struct (see patch 7):
>
> 	{ 'struct': 'ArmProcessorErrorInformation',
> 	  'data': { '*validation': ['ArmPeiValidationBits'],
> 	            'type': ['ArmProcessorErrorType'],
> 	            '*multiple-error': 'uint16',
> 	            '*flags': ['ArmProcessorFlags'],
> 	            '*error-info': 'uint64',
> 	            '*virt-addr':  'uint64',
> 	            '*phy-addr': 'uint64'}
> 	}
>
> According to the UEFI spec, the type is always present.
> The other fields are marked as valid or not via the field
> "validation". So, there's one bit indicating what is valid between
> the fields at the PEI structure, e. g.:
>
> 	- multiple-error: multiple occurrences of the error;
> 	- flags;
> 	- error-info: error information;
> 	- virt-addr: virtual address;
> 	- phy-addr: physical address.
>
> There are also other fields that are global for the entire record,
> also marked as valid or not via another bitmask.
>
> The contents of almost all those fields are independent of the error
> type. The only field which content is affected by the error type is
> "error-info", and the definition of such field is not fully specified.
>
> So, currently, UEFI spec only defines it when:
>
> 1. the error type has just one bit set;
> 2. the error type is either cache, TLB or bus error[1].
>    If the type is a micro-arch-specific error, the spec doesn't say how
>    this field is filled.
>
> To make the API simple (yet powerful), I opted to not enforce any encoding
> for error-info: let userspace fill it as required and use some default
> that would make sense, if this is not passed via QMP.
>
> [1] See https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information

I asked because designing for extensibility is good practice.

It's not a hard requirement here, because feature 'unstable' gives us
license to change the interface incompatibly.

[...]



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/7] arm/virt: place power button pin number on a define
  2024-07-22  6:45 ` [PATCH v3 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
@ 2024-07-30  7:25   ` Igor Mammedov
  2024-07-30  8:29     ` Peter Maydell
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-07-30  7:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm, qemu-devel

On Mon, 22 Jul 2024 08:45:53 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Having magic numbers inside the code is not a good idea, as it
> is error-prone. So, instead, create a macro with the number
> definition.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
>  hw/arm/virt-acpi-build.c | 6 +++---
>  hw/arm/virt.c            | 7 ++++---
>  include/hw/arm/virt.h    | 3 +++
>  3 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index e10cad86dd73..f76fb117adff 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -154,10 +154,10 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
>      aml_append(dev, aml_name_decl("_CRS", crs));
>  
>      Aml *aei = aml_resource_template();
> -    /* Pin 3 for power button */
> -    const uint32_t pin_list[1] = {3};
> +
> +    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
>      aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
> -                                 AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
> +                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
>                                   "GPO0", NULL, 0));
>      aml_append(dev, aml_name_decl("_AEI", aei));
>  
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index b0c68d66a345..c99c8b1713c6 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
>      if (s->acpi_dev) {
>          acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
>      } else {
> -        /* use gpio Pin 3 for power button event */
> +        /* use gpio Pin for power button event */
>          qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);

/me confused: the comment said Pin 3, but 0 is passed as the argument here,
whereas elsewhere you are passing 3. Is this a bug?

BTW: dropping '3' from the comment doesn't make it any better.

>      }
>  }
> @@ -1013,7 +1013,8 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
>                               uint32_t phandle)
>  {
>      gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> -                                        qdev_get_gpio_in(pl061_dev, 3));
> +                                        qdev_get_gpio_in(pl061_dev,
> +                                                         GPIO_PIN_POWER_BUTTON));
>  
>      qemu_fdt_add_subnode(fdt, "/gpio-keys");
>      qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
> @@ -1024,7 +1025,7 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
>      qemu_fdt_setprop_cell(fdt, "/gpio-keys/poweroff", "linux,code",
>                            KEY_POWER);
>      qemu_fdt_setprop_cells(fdt, "/gpio-keys/poweroff",
> -                           "gpios", phandle, 3, 0);
> +                           "gpios", phandle, GPIO_PIN_POWER_BUTTON, 0);
>  }
>  
>  #define SECURE_GPIO_POWEROFF 0
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index ab961bb6a9b8..a4d937ed45ac 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -47,6 +47,9 @@
>  /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
>  #define PVTIME_SIZE_PER_CPU 64
>  
> +/* GPIO pins */
> +#define GPIO_PIN_POWER_BUTTON  3
> +
>  enum {
>      VIRT_FLASH,
>      VIRT_MEM,



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/7] arm/virt: place power button pin number on a define
  2024-07-30  7:25   ` Igor Mammedov
@ 2024-07-30  8:29     ` Peter Maydell
  2024-07-30 11:26       ` Igor Mammedov
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Maydell @ 2024-07-30  8:29 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose,
	Michael S. Tsirkin, Ani Sinha, Shannon Zhao, linux-kernel,
	qemu-arm, qemu-devel

On Tue, 30 Jul 2024 at 08:26, Igor Mammedov <imammedo@redhat.com> wrote:
>
> On Mon, 22 Jul 2024 08:45:53 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Having magic numbers inside the code is not a good idea, as it
> > is error-prone. So, instead, create a macro with the number
> > definition.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index b0c68d66a345..c99c8b1713c6 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> >      if (s->acpi_dev) {
> >          acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
> >      } else {
> > -        /* use gpio Pin 3 for power button event */
> > +        /* use gpio Pin for power button event */
> >          qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
>
> /me confused, it was saying Pin 3 but is passing 0 as argument where as elsewhere
> you are passing 3. Is this a bug?

No. The gpio_key_dev is a gpio-key device which has one
input (which you assert to "press the key") and one output,
which goes high when the key is pressed and then falls
100ms later. The virt board wires up the output of the
gpio-key device to input 3 on the PL061 GPIO controller.
(This happens in create_gpio_keys().) So the code is correct
to assert input 0 on the gpio-key device and the comment
isn't wrong that this results in GPIO pin 3 being asserted:
the link is just indirect.
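
A toy model of that wiring, as a sketch only: the struct and function names below are hypothetical stand-ins rather than QEMU's qdev/irq API, and the 100ms auto-release timer is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define GPIO_PIN_POWER_BUTTON 3
#define TOY_PL061_NUM_PINS 8

/* Toy stand-in for the PL061 controller: it only records input levels. */
struct toy_pl061 {
    bool level[TOY_PL061_NUM_PINS];
};

/* Toy stand-in for gpio-key: one input (0) and one output wired elsewhere. */
struct toy_gpio_key {
    struct toy_pl061 *out_dev;
    int out_pin;
};

static void toy_pl061_set_input(struct toy_pl061 *p, int pin, bool level)
{
    assert(pin >= 0 && pin < TOY_PL061_NUM_PINS);
    p->level[pin] = level;
}

/* Asserting the key's only input drives whatever its output is wired to. */
static void toy_gpio_key_set_input(struct toy_gpio_key *k, int n, bool level)
{
    assert(n == 0);
    toy_pl061_set_input(k->out_dev, k->out_pin, level);
}
```

With the key's output wired to GPIO_PIN_POWER_BUTTON, calling toy_gpio_key_set_input(&key, 0, true) ends up raising input 3 of the toy controller, so the 0 and the 3 refer to pins on two different devices.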

thanks
-- PMM


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-22  6:45 ` [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
  2024-07-26 12:30   ` Jonathan Cameron via
@ 2024-07-30  8:36   ` Igor Mammedov
  2024-07-31  5:17     ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-07-30  8:36 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Eduardo Habkost,
	Marcel Apfelbaum, Peter Maydell, Shannon Zhao, Yanan Wang,
	linux-kernel, qemu-arm, qemu-devel, shameerali.kolothum.thodi

On Mon, 22 Jul 2024 08:45:54 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Creates a GED - Generic Event Device and set a GPIO to
> be used for error injection.

QEMU already has a GED device, so the question is why it wasn't
used for event delivery?
In a nutshell, I'd really prefer this series be rewritten
to reuse the existing GED instead of adding ad hoc GPIO and ACPI
plumbing.

PS:
As a side effect of that, error injection could be used not only for
ARM but also for other machines that use GED (provided they implement GHES).

Also CCing Shameer wrt touched power button code

> [mchehab: use a define for the generic event pin number and do some cleanups]
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
>  hw/arm/virt.c            | 14 ++++++++++++--
>  include/hw/arm/virt.h    |  1 +
>  include/hw/boards.h      |  1 +
>  4 files changed, 40 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f76fb117adff..c502ccf40909 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -63,6 +63,7 @@
>  
>  #define ARM_SPI_BASE 32
>  
> +#define ACPI_GENERIC_EVENT_DEVICE "GEDD"
>  #define ACPI_BUILD_TABLE_SIZE             0x20000
>  
>  static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
> @@ -142,6 +143,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
>  static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
>                                             uint32_t gpio_irq)

This function is supposed to be called only when acpi_dev (the existing GED
device) is not present, i.e. on old machine types only, so it should not be
called for recent machine types. I'd avoid adding anything to it.

See more comments about it below.

>  {
> +    uint32_t pin;
> +
>      Aml *dev = aml_device("GPO0");
>      aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0061")));
>      aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> @@ -155,7 +158,12 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
>  
>      Aml *aei = aml_resource_template();
>  
> -    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
> +    pin = GPIO_PIN_POWER_BUTTON;
> +    aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
> +                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
> +                                 "GPO0", NULL, 0));
> +    /* Pin for generic error */
> +    pin = GPIO_PIN_GENERIC_ERROR;
>      aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
>                                   AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
>                                   "GPO0", NULL, 0));
> @@ -166,6 +174,11 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
>      aml_append(method, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
>                                    aml_int(0x80)));
>      aml_append(dev, method);
> +    method = aml_method("_E06", 0, AML_NOTSERIALIZED);
> +    aml_append(method, aml_notify(aml_name(ACPI_GENERIC_EVENT_DEVICE),
> +                                  aml_int(0x80)));
> +    aml_append(dev, method);
> +
>      aml_append(scope, dev);
>  }
>  
> @@ -800,6 +813,15 @@ static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
>      build_fadt(table_data, linker, &fadt, vms->oem_id, vms->oem_table_id);
>  }
>  
> +static void acpi_dsdt_add_generic_event_device(Aml *scope)
> +{
> +    Aml *dev = aml_device(ACPI_GENERIC_EVENT_DEVICE);
> +    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
This is not an _event_ device; the spec refers to it as an _error_ device.

PS:
Please document new ACPI primitives/devices properly;
see the comment above aml_notify() for an example.
Use the earliest ACPI spec in which the device was first defined.

> +    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> +    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> +    aml_append(scope, dev);
> +}
> +
>  /* DSDT */
>  static void
>  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> @@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>                        HOTPLUG_HANDLER(vms->acpi_dev),
>                        irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
>                        memmap[VIRT_ACPI_GED].base);
> -    } else {
> -        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> -                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
>      }
> +    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> +                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));

Wouldn't that create double/conflicting power button handlers
(the GPIO one and the GED one)? On recent machine types GED should be used,
and the power button in acpi_dsdt_add_gpio() is used only if the
machine doesn't have GED.

>  
>      if (vms->acpi_dev) {
>          uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
> @@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>      }
>  
>      acpi_dsdt_add_power_button(scope);
> +    acpi_dsdt_add_generic_event_device(scope);
>  #ifdef CONFIG_TPM
>      acpi_dsdt_add_tpm(scope, vms);
>  #endif
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index c99c8b1713c6..f81cf3a69961 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -997,6 +997,13 @@ static void create_rtc(const VirtMachineState *vms)
>  }
>  
>  static DeviceState *gpio_key_dev;
> +
> +static DeviceState *gpio_error_dev;
> +static void virt_set_error(void)
> +{
> +    qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
> +}
> +
>  static void virt_powerdown_req(Notifier *n, void *opaque)
>  {
>      VirtMachineState *s = container_of(n, VirtMachineState, powerdown_notifier);
> @@ -1015,6 +1022,9 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
>      gpio_key_dev = sysbus_create_simple("gpio-key", -1,
>                                          qdev_get_gpio_in(pl061_dev,
>                                                           GPIO_PIN_POWER_BUTTON));
> +    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
> +                                          qdev_get_gpio_in(pl061_dev,
> +                                                           GPIO_PIN_GENERIC_ERROR));
>  
>      qemu_fdt_add_subnode(fdt, "/gpio-keys");
>      qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
> @@ -2385,9 +2395,8 @@ static void machvirt_init(MachineState *machine)
>  
>      if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
>          vms->acpi_dev = create_acpi_ged(vms);
> -    } else {
> -        create_gpio_devices(vms, VIRT_GPIO, sysmem);
>      }
> +    create_gpio_devices(vms, VIRT_GPIO, sysmem);

Again, this creates a duplicate/conflicting power button source.

>  
>      if (vms->secure && !vmc->no_secure_gpio) {
>          create_gpio_devices(vms, VIRT_SECURE_GPIO, secure_sysmem);
> @@ -3101,6 +3110,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      mc->default_ram_id = "mach-virt.ram";
>      mc->default_nic = "virtio-net-pci";
>  
> +    mc->set_error = virt_set_error;
>      object_class_property_add(oc, "acpi", "OnOffAuto",
>          virt_get_acpi, virt_set_acpi,
>          NULL, NULL);
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index a4d937ed45ac..c9769d7d4d7f 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -49,6 +49,7 @@
>  
>  /* GPIO pins */
>  #define GPIO_PIN_POWER_BUTTON  3
> +#define GPIO_PIN_GENERIC_ERROR 6
>  
>  enum {
>      VIRT_FLASH,
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index ef6f18f2c1a7..6cf01f3934ae 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -304,6 +304,7 @@ struct MachineClass {
>      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
>      int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
>      ram_addr_t (*fixup_ram_size)(ram_addr_t size);
> +    void (*set_error)(void);
>  };
>  
>  /**



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 3/7] acpi/ghes: Support GPIO error source.
  2024-07-22  6:45 ` [PATCH v3 3/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
@ 2024-07-30  8:40   ` Igor Mammedov
  2024-08-01 12:56     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-07-30  8:40 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Mon, 22 Jul 2024 08:45:55 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Add error notification to GHES v2 using the GPIO source.
> 
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/ghes.c         | 8 ++++++--
>  include/hw/acpi/ghes.h | 1 +
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index e9511d9b8f71..5b8bc6eeb437 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -34,8 +34,8 @@
>  /* The max size in bytes for one error block */
>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
>  
> -/* Now only support ARMv8 SEA notification type error source */
> -#define ACPI_GHES_ERROR_SOURCE_COUNT        1
> +/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
> +#define ACPI_GHES_ERROR_SOURCE_COUNT        2
>  
>  /* Generic Hardware Error Source version 2 */
>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> @@ -327,6 +327,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
>           */
>          build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
>          break;
> +    case ACPI_HEST_SRC_ID_GPIO:
> +        build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
> +        break;
>      default:
>          error_report("Not support this error source");
>          abort();
> @@ -370,6 +373,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
>      /* Error Source Count */
>      build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
>      build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
> +    build_ghes_v2(table_data, ACPI_HEST_SRC_ID_GPIO, linker);
>  
>      acpi_table_end(linker, &table);
>  }
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 674f6958e905..4f1ab1a73a06 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
>  
>  enum {
>      ACPI_HEST_SRC_ID_SEA = 0,
> +    ACPI_HEST_SRC_ID_GPIO = 1,
Is it defined by some spec, or is it just a made-up number?

>      /* future ids go here */
>      ACPI_HEST_SRC_ID_RESERVED,
>  };



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-22  6:45 ` [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection Mauro Carvalho Chehab
  2024-07-25  9:48   ` Markus Armbruster
  2024-07-26 12:44   ` Jonathan Cameron via
@ 2024-07-30 11:17   ` Igor Mammedov
  2024-07-31  7:11     ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-07-30 11:17 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Mon, 22 Jul 2024 08:45:56 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

That's quite a bit of code that in 99% of cases won't ever be used
(assuming an error injection testing scenario),
not to mention that it's hardware-dependent and governed by different specs.

Essentially, we would need to create a _whole_ lot of QAPI
commands to cover the possible errors, for no benefit to QEMU.

Let's take, for example, the very simple _OST status reporting.
QEMU could of course decode the values and present them to users in
a more 'presentable' form. However, instead of translating
numbers (aka spec language) into a made-up QEMU language,
QEMU just passes the values up the stack, and users can use
the well-defined spec to interpret their meaning.

The benefits are: QEMU doesn't have to maintain translation
code, and the QAPI ABI is limited to passing raw values.

Can we do a similar thing here as well,
i.e. simplify the error injection commands to
a single command that takes a raw value and passes it
to the guest (QEMU here acts as a proxy, if I'm not
mistaken)?

Preferably, make it generic enough to handle
not only ARM but also the other error formats HEST is
able to handle.

PS:
For user convenience, QEMU can carry a script that
helps generate this raw value in a user-friendly way,
but at the same time it won't put a maintenance
burden on QEMU itself.
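
To make the "raw value" idea concrete, here is a minimal sketch of what such a helper could do for one 32-byte ARM Processor Error Information entry. It assumes the field order and sizes used in the patch quoted below and little-endian encoding; struct pei_fields, put_le() and pei_pack() are hypothetical names, not existing QEMU code, and an actual helper script might well be written in another language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PEI_LENGTH 32

/* Hypothetical container for the per-entry fields a user would supply. */
struct pei_fields {
    uint16_t validation;
    uint8_t type;
    uint16_t multiple_error;
    uint8_t flags;
    uint64_t error_info;
    uint64_t virt_addr;
    uint64_t phy_addr;
};

/* Store 'size' bytes of 'value' in little-endian order; return new offset. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t value, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Serialize one entry into buf; returns the number of bytes written. */
static size_t pei_pack(const struct pei_fields *f, uint8_t buf[PEI_LENGTH])
{
    size_t off = 0;

    off = put_le(buf, off, 0, 1);                 /* Version */
    off = put_le(buf, off, PEI_LENGTH, 1);        /* Length */
    off = put_le(buf, off, f->validation, 2);     /* Validation bits */
    off = put_le(buf, off, f->type, 1);           /* Type */
    off = put_le(buf, off, f->multiple_error, 2); /* Multiple error count */
    off = put_le(buf, off, f->flags, 1);          /* Flags */
    off = put_le(buf, off, f->error_info, 8);     /* Error information */
    off = put_le(buf, off, f->virt_addr, 8);      /* Virtual fault address */
    off = put_le(buf, off, f->phy_addr, 8);       /* Physical fault address */
    assert(off == PEI_LENGTH);
    return off;
}
```

The resulting bytes are what a generic raw-injection command would carry; QEMU itself would not need to know the meaning of any individual field.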

> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> 1. Some GHES functions require handling addresses. Add a helper function
>    to support it.
> 
> 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
> 
> Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and
> upper specs, using error type bit encoding as detailed at UEFI 2.9A
> errata.
> 
> Error injection examples:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['tlb-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['bus-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error', 'tlb-error']
>       }
> }
> 
> { "execute": "arm-inject-error",
>       "arguments": {
>         "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
>       }
> }
> ...
> 
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> For Add a logic to handle block addresses,
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> For FW first ARM processor error injection,
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  configs/targets/aarch64-softmmu.mak |   1 +
>  hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
>  hw/arm/Kconfig                      |   4 +
>  hw/arm/arm_error_inject.c           |  35 ++++
>  hw/arm/arm_error_inject_stubs.c     |  18 ++
>  hw/arm/meson.build                  |   3 +
>  include/hw/acpi/ghes.h              |   2 +
>  qapi/arm-error-inject.json          |  49 ++++++
>  qapi/meson.build                    |   1 +
>  qapi/qapi-schema.json               |   1 +
>  10 files changed, 361 insertions(+), 11 deletions(-)
>  create mode 100644 hw/arm/arm_error_inject.c
>  create mode 100644 hw/arm/arm_error_inject_stubs.c
>  create mode 100644 qapi/arm-error-inject.json
> 
> diff --git a/configs/targets/aarch64-softmmu.mak b/configs/targets/aarch64-softmmu.mak
> index 84cb32dc2f4f..b4b3cd97934a 100644
> --- a/configs/targets/aarch64-softmmu.mak
> +++ b/configs/targets/aarch64-softmmu.mak
> @@ -5,3 +5,4 @@ TARGET_KVM_HAVE_GUEST_DEBUG=y
>  TARGET_XML_FILES= gdb-xml/aarch64-core.xml gdb-xml/aarch64-fpu.xml gdb-xml/arm-core.xml gdb-xml/arm-vfp.xml gdb-xml/arm-vfp3.xml gdb-xml/arm-vfp-sysregs.xml gdb-xml/arm-neon.xml gdb-xml/arm-m-profile.xml gdb-xml/arm-m-profile-mve.xml gdb-xml/aarch64-pauth.xml
>  # needed by boot.c
>  TARGET_NEED_FDT=y
> +CONFIG_ARM_EINJ=y
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 5b8bc6eeb437..6075ef5893ce 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -27,6 +27,7 @@
>  #include "hw/acpi/generic_event_device.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "qemu/uuid.h"
> +#include "qapi/qapi-types-arm-error-inject.h"
>  
>  #define ACPI_GHES_ERRORS_FW_CFG_FILE        "etc/hardware_errors"
>  #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE     "etc/hardware_errors_addr"
> @@ -53,6 +54,12 @@
>  /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error Section */
>  #define ACPI_GHES_MEM_CPER_LENGTH           80
>  
> +/*
> + * ARM Processor section CPER size, UEFI 2.10: N.2.4.4
> + * ARM Processor Error Section
> + */
> +#define ACPI_GHES_ARM_CPER_LENGTH (72 + 600)
> +
>  /* Masks for block_status flags */
>  #define ACPI_GEBS_UNCORRECTABLE         1
>  
> @@ -231,6 +238,142 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
>      return 0;
>  }
>  
> +/* UEFI 2.9: N.2.4.4 ARM Processor Error Section */
> +static void acpi_ghes_build_append_arm_cper(uint8_t error_types, GArray *table)
> +{
> +    /*
> +     * ARM Processor Error Record
> +     */
> +
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,
> +                              (1ULL << 3) | /* Vendor specific info Valid */
> +                              (1ULL << 2) | /* Running status Valid */
> +                              (1ULL << 1) | /* Error affinity level Valid */
> +                              (1ULL << 0), /* MPIDR Valid */
> +                              4);
> +    /* Error Info Num */
> +    build_append_int_noprefix(table, 1, 2);
> +    /* Context Info Num */
> +    build_append_int_noprefix(table, 1, 2);
> +    /* Section length */
> +    build_append_int_noprefix(table, ACPI_GHES_ARM_CPER_LENGTH, 4);
> +    /* Error affinity level */
> +    build_append_int_noprefix(table, 2, 1);
> +    /* Reserved */
> +    build_append_int_noprefix(table, 0, 3);
> +    /* MPIDR_EL1 */
> +    build_append_int_noprefix(table, 0xAB12, 8);
> +    /* MIDR_EL1 */
> +    build_append_int_noprefix(table, 0xCD24, 8);
> +    /* Running state */
> +    build_append_int_noprefix(table, 0x1, 4);
> +    /* PSCI state */
> +    build_append_int_noprefix(table, 0x1234, 4);
> +
> +    /* ARM Propcessor error information */
> +    /* Version */
> +    build_append_int_noprefix(table, 0, 1);
> +    /*  Length */
> +    build_append_int_noprefix(table, 32, 1);
> +    /* Validation Bits */
> +    build_append_int_noprefix(table,
> +                              (1ULL << 4) | /* Physical fault address Valid */
> +                             (1ULL << 3) | /* Virtual fault address Valid */
> +                             (1ULL << 2) | /* Error information Valid */
> +                              (1ULL << 1) | /* Flags Valid */
> +                              (1ULL << 0), /* Multiple error count Valid */
> +                              2);
> +    /* Type */
> +    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR) ||
> +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR) ||
> +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR) ||
> +        error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
> +        build_append_int_noprefix(table, error_types, 1);
> +    } else {
> +        return;
> +    }
> +    /* Multiple error count */
> +    build_append_int_noprefix(table, 2, 2);
> +    /* Flags  */
> +    build_append_int_noprefix(table, 0xD, 1);
> +    /* Error information  */
> +    if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_CACHE_ERROR)) {
> +        build_append_int_noprefix(table, 0x0091000F, 8);
> +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_TLB_ERROR)) {
> +        build_append_int_noprefix(table, 0x0054007F, 8);
> +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_BUS_ERROR)) {
> +        build_append_int_noprefix(table, 0x80D6460FFF, 8);
> +    } else if (error_types & BIT(ARM_PROCESSOR_ERROR_TYPE_MICRO_ARCH_ERROR)) {
> +        build_append_int_noprefix(table, 0x78DA03FF, 8);
> +    } else {
> +        return;
> +    }
> +    /* Virtual fault address  */
> +    build_append_int_noprefix(table, 0x67320230, 8);
> +    /* Physical fault address  */
> +    build_append_int_noprefix(table, 0x5CDFD492, 8);
> +
> +    /* ARM Propcessor error context information */
> +    /* Version */
> +    build_append_int_noprefix(table, 0, 2);
> +    /* Validation Bits */
> +    /* AArch64 EL1 context registers Valid */
> +    build_append_int_noprefix(table, 5, 2);
> +    /* Register array size */
> +    build_append_int_noprefix(table, 592, 4);
> +    /* Register array */
> +    build_append_int_noprefix(table, 0x12ABDE67, 8);
> +}
> +
> +static int acpi_ghes_record_arm_error(uint8_t error_types,
> +                                      uint64_t error_block_address)
> +{
> +    GArray *block;
> +
> +    /* ARM processor Error Section Type */
> +    const uint8_t uefi_cper_arm_sec[] =
> +          UUID_LE(0xE19E3D16, 0xBC11, 0x11E4, 0x9C, 0xAA, 0xC2, 0x05, \
> +                  0x1D, 0x5D, 0x46, 0xB0);
> +
> +    /*
> +     * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> +     * Table 17-13 Generic Error Data Entry
> +     */
> +    QemuUUID fru_id = {};
> +    uint32_t data_length;
> +
> +    block = g_array_new(false, true /* clear */, 1);
> +
> +    /* This is the length if adding a new generic error data entry*/
> +    data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_ARM_CPER_LENGTH;
> +    /*
> +     * It should not run out of the preallocated memory if adding a new generic
> +     * error data entry
> +     */
> +    assert((data_length + ACPI_GHES_GESB_SIZE) <=
> +            ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +
> +    /* Build the new generic error status block header */
> +    acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> +        0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
> +
> +    /* Build this new generic error data entry header */
> +    acpi_ghes_generic_error_data(block, uefi_cper_arm_sec,
> +        ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> +        ACPI_GHES_ARM_CPER_LENGTH, fru_id, 0);
> +
> +    /* Build the ARM processor error section CPER */
> +    acpi_ghes_build_append_arm_cper(error_types, block);
> +
> +    /* Write the generic error data entry into guest memory */
> +    cpu_physical_memory_write(error_block_address, block->data, block->len);
> +
> +    g_array_free(block, true);
> +
> +    return 0;
> +}
> +
>  /*
>   * Build table for the hardware error fw_cfg blob.
>   * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
> @@ -392,23 +535,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
>      ags->present = true;
>  }
>  
> +static uint64_t ghes_get_state_start_address(void)
> +{
> +    AcpiGedState *acpi_ged_state =
> +        ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> +    AcpiGhesState *ags = &acpi_ged_state->ghes_state;
> +
> +    return le64_to_cpu(ags->ghes_addr_le);
> +}
> +
>  int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
>  {
>      uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> -    uint64_t start_addr;
> +    uint64_t start_addr = ghes_get_state_start_address();
>      bool ret = -1;
> -    AcpiGedState *acpi_ged_state;
> -    AcpiGhesState *ags;
> -
>      assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
>  
> -    acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> -                                                       NULL));
> -    g_assert(acpi_ged_state);
> -    ags = &acpi_ged_state->ghes_state;
> -
> -    start_addr = le64_to_cpu(ags->ghes_addr_le);
> -
>      if (physical_address) {
>  
>          if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
> @@ -448,6 +590,100 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
>      return ret;
>  }
>  
> +/*
> + * Error register block data layout
> + *
> + * | +---------------------+ ges.ghes_addr_le
> + * | |error_block_address0 |
> + * | +---------------------+
> + * | |error_block_address1 |
> + * | +---------------------+ --+--
> + * | |    .............    | GHES_ADDRESS_SIZE
> + * | +---------------------+ --+--
> + * | |error_block_addressN |
> + * | +---------------------+
> + * | | read_ack_register0  |
> + * | +---------------------+ --+--
> + * | | read_ack_register1  | GHES_ADDRESS_SIZE
> + * | +---------------------+ --+--
> + * | |   .............     |
> + * | +---------------------+
> + * | | read_ack_registerN  |
> + * | +---------------------+ --+--
> + * | |      CPER           |   |
> + * | |      ....           | GHES_MAX_RAW_DATA_LENGT
> + * | |      CPER           |   |
> + * | +---------------------+ --+--
> + * | |    ..........       |
> + * | +---------------------+
> + * | |      CPER           |
> + * | |      ....           |
> + * | |      CPER           |
> + * | +---------------------+
> + */
> +
> +/* Map from uint32_t notify to entry offset in GHES */
> +static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
> +                                                 0xff, 0xff, 0xff, 1, 0};
> +
> +static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
> +                          uint64_t *read_ack_register_addr)
> +{
> +    uint64_t base;
> +
> +    if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
> +        return false;
> +    }
> +
> +    /* Find and check the source id for this new CPER */
> +    if (error_source_to_index[notify] == 0xff) {
> +        return false;
> +    }
> +
> +    base = ghes_get_state_start_address();
> +
> +    *read_ack_register_addr = base +
> +        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> +        error_source_to_index[notify] * sizeof(uint64_t);
> +
> +    /* Could also be read back from the error_block_address register */
> +    *error_block_addr = base +
> +        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> +        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> +        error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> +
> +    return true;
> +}
> +
> +bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify)
> +{
> +    int read_ack_register = 0;
> +    uint64_t read_ack_register_addr = 0;
> +    uint64_t error_block_addr = 0;
> +
> +    if (!ghes_get_addr(notify, &error_block_addr, &read_ack_register_addr)) {
> +        return false;
> +    }
> +
> +    cpu_physical_memory_read(read_ack_register_addr,
> +                             &read_ack_register, sizeof(uint64_t));
> +    /* zero means OSPM does not acknowledge the error */
> +    if (!read_ack_register) {
> +        error_report("Last time OSPM does not acknowledge the error,"
> +                     " record CPER failed this time, set the ack value to"
> +                     " avoid blocking next time CPER record! exit");
> +        read_ack_register = 1;
> +        cpu_physical_memory_write(read_ack_register_addr,
> +                                  &read_ack_register, sizeof(uint64_t));
> +        return false;
> +    }
> +
> +    read_ack_register = cpu_to_le64(0);
> +    cpu_physical_memory_write(read_ack_register_addr,
> +                              &read_ack_register, sizeof(uint64_t));
> +    return acpi_ghes_record_arm_error(error_types, error_block_addr);
> +}
> +
>  bool acpi_ghes_present(void)
>  {
>      AcpiGedState *acpi_ged_state;
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 1ad60da7aa2d..bafac82f9fd3 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -712,3 +712,7 @@ config ARMSSE
>      select UNIMP
>      select SSE_COUNTER
>      select SSE_TIMER
> +
> +config ARM_EINJ
> +    bool
> +    default y if AARCH64
> diff --git a/hw/arm/arm_error_inject.c b/hw/arm/arm_error_inject.c
> new file mode 100644
> index 000000000000..1da97d5d4fdc
> --- /dev/null
> +++ b/hw/arm/arm_error_inject.c
> @@ -0,0 +1,35 @@
> +/*
> + * ARM Processor error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qapi-commands-arm-error-inject.h"
> +#include "hw/boards.h"
> +#include "hw/acpi/ghes.h"
> +
> +/* For ARM processor errors */
> +void qmp_arm_inject_error(ArmProcessorErrorTypeList *errortypes, Error **errp)
> +{
> +    MachineState *machine = MACHINE(qdev_get_machine());
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    uint8_t error_types = 0;
> +
> +    while (errortypes) {
> +        error_types |= BIT(errortypes->value);
> +        errortypes = errortypes->next;
> +    }
> +
> +    ghes_record_arm_errors(error_types, ACPI_GHES_NOTIFY_GPIO);
> +    if (mc->set_error) {
> +        mc->set_error();
> +    }
> +
> +    return;
> +}
> diff --git a/hw/arm/arm_error_inject_stubs.c b/hw/arm/arm_error_inject_stubs.c
> new file mode 100644
> index 000000000000..b51f4202fe64
> --- /dev/null
> +++ b/hw/arm/arm_error_inject_stubs.c
> @@ -0,0 +1,18 @@
> +/*
> + * QMP stub for ARM processor error injection.
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qapi-commands-arm-error-inject.h"
> +
> +void qmp_arm_inject_error(ArmProcessorErrorTypeList *errortypes, Error **errp)
> +{
> +    error_setg(errp, "ARM processor error support is not compiled in");
> +}
> diff --git a/hw/arm/meson.build b/hw/arm/meson.build
> index 0c07ab522f4c..cb7fe09fc87b 100644
> --- a/hw/arm/meson.build
> +++ b/hw/arm/meson.build
> @@ -60,6 +60,7 @@ arm_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmuv3.c'))
>  arm_ss.add(when: 'CONFIG_FSL_IMX6UL', if_true: files('fsl-imx6ul.c', 'mcimx6ul-evk.c'))
>  arm_ss.add(when: 'CONFIG_NRF51_SOC', if_true: files('nrf51_soc.c'))
>  arm_ss.add(when: 'CONFIG_XEN', if_true: files('xen_arm.c'))
> +arm_ss.add(when: 'CONFIG_ARM_EINJ', if_true: files('arm_error_inject.c'))
>  
>  system_ss.add(when: 'CONFIG_ARM_SMMUV3', if_true: files('smmu-common.c'))
>  system_ss.add(when: 'CONFIG_CHEETAH', if_true: files('palm.c'))
> @@ -77,5 +78,7 @@ system_ss.add(when: 'CONFIG_TOSA', if_true: files('tosa.c'))
>  system_ss.add(when: 'CONFIG_VERSATILE', if_true: files('versatilepb.c'))
>  system_ss.add(when: 'CONFIG_VEXPRESS', if_true: files('vexpress.c'))
>  system_ss.add(when: 'CONFIG_Z2', if_true: files('z2.c'))
> +system_ss.add(when: 'CONFIG_ARM_EINJ', if_false: files('arm_error_inject_stubs.c'))
> +system_ss.add(when: 'CONFIG_ALL', if_true: files('arm_error_inject_stubs.c'))
>  
>  hw_arch += {'arm': arm_ss}
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 4f1ab1a73a06..dc531ffce7ae 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -75,6 +75,8 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>                            GArray *hardware_errors);
>  int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
>  
> +bool ghes_record_arm_errors(uint8_t error_types, uint32_t notify);
> +
>  /**
>   * acpi_ghes_present: Report whether ACPI GHES table is present
>   *
> diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
> new file mode 100644
> index 000000000000..430e6cea6b60
> --- /dev/null
> +++ b/qapi/arm-error-inject.json
> @@ -0,0 +1,49 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +
> +##
> +# = ARM Processor Errors
> +##
> +
> +##
> +# @ArmProcessorErrorType:
> +#
> +# Type of ARM processor error to inject
> +#
> +# @unknown-error: Unknown error
> +#
> +# @cache-error: Cache error
> +#
> +# @tlb-error: TLB error
> +#
> +# @bus-error: Bus error.
> +#
> +# @micro-arch-error: Micro architectural error.
> +#
> +# Since: 9.1
> +##
> +{ 'enum': 'ArmProcessorErrorType',
> +  'data': ['unknown-error',
> +	   'cache-error',
> +           'tlb-error',
> +           'bus-error',
> +           'micro-arch-error']
> +}
> +
> +##
> +# @arm-inject-error:
> +#
> +# Inject ARM Processor error.
> +#
> +# @errortypes: ARM processor error types to inject
> +#
> +# Features:
> +#
> +# @unstable: This command is experimental.
> +#
> +# Since: 9.1
> +##
> +{ 'command': 'arm-inject-error',
> +  'data': { 'errortypes': ['ArmProcessorErrorType'] },
> +  'features': [ 'unstable' ]
> +}
> diff --git a/qapi/meson.build b/qapi/meson.build
> index e7bc54e5d047..5927932c4be3 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -22,6 +22,7 @@ if have_system or have_tools or have_ga
>  endif
>  
>  qapi_all_modules = [
> +  'arm-error-inject',
>    'authz',
>    'block',
>    'block-core',
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index b1581988e4eb..479a22de7e43 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -81,3 +81,4 @@
>  { 'include': 'vfio.json' }
>  { 'include': 'cryptodev.json' }
>  { 'include': 'cxl.json' }
> +{ 'include': 'arm-error-inject.json' }
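
For readers following the layout comment above ghes_get_addr(), here is a
standalone sketch of the same offset arithmetic. It is not QEMU code: the
sketch_ names are hypothetical, and the source count (2) and per-source CPER
block size (1024 bytes) are assumed values picked only to make the numbers
concrete. Unlike the patch, the sketch bounds-checks the notification type
against the length of the lookup table itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_SOURCE_COUNT      2u     /* number of error sources */
#define SKETCH_MAX_RAW_DATA_LEN  1024u  /* bytes reserved per CPER block */
#define SKETCH_NO_SOURCE         0xffu  /* marks an unsupported notify type */

/* Notification type -> error source index, mirroring the table in the patch. */
static const uint8_t sketch_source_index[] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 1, 0
};

/*
 * Layout starting at base:
 *   SKETCH_SOURCE_COUNT error_block_address entries (8 bytes each)
 *   SKETCH_SOURCE_COUNT read_ack_register entries   (8 bytes each)
 *   SKETCH_SOURCE_COUNT CPER blocks (SKETCH_MAX_RAW_DATA_LEN bytes each)
 */
static bool sketch_ghes_get_addr(uint64_t base, uint32_t notify,
                                 uint64_t *error_block_addr,
                                 uint64_t *read_ack_register_addr)
{
    const uint32_t entries =
        sizeof(sketch_source_index) / sizeof(sketch_source_index[0]);
    const uint64_t table_size = SKETCH_SOURCE_COUNT * sizeof(uint64_t);
    uint64_t idx;

    assert(error_block_addr && read_ack_register_addr);

    /* Reject types outside the table or without an error source. */
    if (notify >= entries || sketch_source_index[notify] == SKETCH_NO_SOURCE) {
        return false;
    }
    idx = sketch_source_index[notify];

    /* The read ack registers follow the error block address table. */
    *read_ack_register_addr = base + table_size + idx * sizeof(uint64_t);

    /* The CPER blocks follow both 8-byte tables. */
    *error_block_addr = base + 2 * table_size + idx * SKETCH_MAX_RAW_DATA_LEN;

    return true;
}
```

With a base of 0x1000 and notify type 7 (GPIO, index 1 in the table), this
gives a read ack register at 0x1018 and a CPER block at 0x1420 under the
assumed sizes.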



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs
  2024-07-22  6:45 ` [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
@ 2024-07-30 11:24   ` Igor Mammedov
  2024-07-30 11:36     ` Michael S. Tsirkin
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-07-30 11:24 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Mon, 22 Jul 2024 08:45:58 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> There is one reference to ACPI 4.0 and several references
> to ACPI 6.x versions.
> 
> Update them to point to ACPI 6.5 whenever possible.

When it comes to ACPI doc comments, they should point to
the first (earliest) revision that provides the given feature/value/field/table.


> There's one reference that was kept pointing to ACPI 6.4,
> though, with HEST revision 1.
> 
> ACPI 6.5 now defines HEST revision 2, and defines a new
> way to handle source types starting from 12. According
> to the ACPI 6.5 revision history:
> 
> 	2312 Update to the HEST table and adding new error
> 	     source descriptor - Table 18.2.
> 
> However, the spec doesn't define any new source
> descriptors yet. It just defines a different behavior when
> the source type is above 11.
> 
> I also double-checked the GHES implementation in an open
> source project (Linux Kernel). Upstream currently
> doesn't handle the HEST revision, ignoring that
> field.
> 
> In any case, revision 2 seems to be backward-compatible
> with revision 1 when type <= 11 and just one error is
> contained in a HEST record.
> 
> So, while it is probably safe to update it, there's no
> real need. Let's keep the implementation using
> an ACPI 6.4 compatible table, i.e. HEST revision 1.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/ghes.c | 48 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 28 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 6075ef5893ce..ebf1b812aaaa 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -45,9 +45,9 @@
>  #define GAS_ADDR_OFFSET 4
>  
>  /*
> - * The total size of Generic Error Data Entry
> - * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> - * Table 18-343 Generic Error Data Entry
> + * The total size of Generic Error Data Entry before data field
> + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> + * Table 18.12 Generic Error Data Entry
>   */
>  #define ACPI_GHES_DATA_LENGTH               72
>  
> @@ -65,8 +65,8 @@
>  
>  /*
>   * Total size for Generic Error Status Block except Generic Error Data Entries
> - * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> - * Table 18-380 Generic Error Status Block
> + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> + * Table 18.11 Generic Error Status Block
>   */
>  #define ACPI_GHES_GESB_SIZE                 20
>  
> @@ -82,7 +82,8 @@ enum AcpiGenericErrorSeverity {
>  
>  /*
>   * Hardware Error Notification
> - * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> + * ACPI 6.5: 18.3.2.9 Hardware Error Notification,
> + * Table 18.14 - Hardware Error Notification Structure
>   * Composes dummy Hardware Error Notification descriptor of specified type
>   */
>  static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
> @@ -112,7 +113,8 @@ static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
>  
>  /*
>   * Generic Error Data Entry
> - * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> + * Table 18.12 - Generic Error Data Entry
>   */
>  static void acpi_ghes_generic_error_data(GArray *table,
>                  const uint8_t *section_type, uint32_t error_severity,
> @@ -148,7 +150,8 @@ static void acpi_ghes_generic_error_data(GArray *table,
>  
>  /*
>   * Generic Error Status Block
> - * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> + * Table 18.11 - Generic Hardware Error Source Structure
>   */
>  static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
>                  uint32_t raw_data_offset, uint32_t raw_data_length,
> @@ -429,15 +432,18 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
>          0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
>  }
>  
> -/* Build Generic Hardware Error Source version 2 (GHESv2) */
> +/*
> + * Build Generic Hardware Error Source version 2 (GHESv2)
> + * ACPI 6.5: 18.3.2.8 Generic Hardware Error Source version 2 (GHESv2 - Type 10),
> + * Table 18.13: Generic Hardware Error Source version 2 (GHESv2)
> + */
>  static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
>  {
>      uint64_t address_offset;
> -    /*
> -     * Type:
> -     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> -     */
> +    /* Type: (GHESv2 - Type 10) */
>      build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> +
> +    /* ACPI 6.5: Table 18.10 - Generic Hardware Error Source Structure */
>      /* Source Id */
>      build_append_int_noprefix(table_data, source_id, 2);
>      /* Related Source Id */
> @@ -481,11 +487,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
>      /* Error Status Block Length */
>      build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
>  
> -    /*
> -     * Read Ack Register
> -     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> -     * version 2 (GHESv2 - Type 10)
> -     */
> +    /* ACPI 6.5: fields defined at GHESv2 table */
> +    /* Read Ack Register */
>      address_offset = table_data->len;
>      build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
>                       4 /* QWord access */, 0);
> @@ -504,11 +507,16 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
>      build_append_int_noprefix(table_data, 0x1, 8);
>  }
>  
> -/* Build Hardware Error Source Table */
> +/*
> + * Build Hardware Error Source Table
> + * ACPI 6.4: 18.3.2 ACPI Error Source
> + * Table 18.2: Hardware Error Source Table (HEST)
> + */
>  void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
>                       const char *oem_id, const char *oem_table_id)
>  {
> -    AcpiTable table = { .sig = "HEST", .rev = 1,
> +    AcpiTable table = { .sig = "HEST",
> +                        .rev = 1,                   /* ACPI 4.0 to 6.4 */
>                          .oem_id = oem_id, .oem_table_id = oem_table_id };
>  
>      acpi_table_begin(&table, table_data);



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/7] arm/virt: place power button pin number on a define
  2024-07-30  8:29     ` Peter Maydell
@ 2024-07-30 11:26       ` Igor Mammedov
  2024-08-01 13:15         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-07-30 11:26 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose,
	Michael S. Tsirkin, Ani Sinha, Shannon Zhao, linux-kernel,
	qemu-arm, qemu-devel

On Tue, 30 Jul 2024 09:29:37 +0100
Peter Maydell <peter.maydell@linaro.org> wrote:

> On Tue, 30 Jul 2024 at 08:26, Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > On Mon, 22 Jul 2024 08:45:53 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >  
> > > Having magic numbers inside the code is not a good idea, as it
> > > is error-prone. So, instead, create a macro with the number
> > > definition.
> > >
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>  
> 
> > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > index b0c68d66a345..c99c8b1713c6 100644
> > > --- a/hw/arm/virt.c
> > > +++ b/hw/arm/virt.c
> > > @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> > >      if (s->acpi_dev) {
> > >          acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
> > >      } else {
> > > -        /* use gpio Pin 3 for power button event */
> > > +        /* use gpio Pin for power button event */
> > >          qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);  
> >
> > /me confused: the comment said Pin 3, but the code passes 0 as the argument,
> > whereas elsewhere you are passing 3. Is this a bug?
> 
> No. The gpio_key_dev is a gpio-key device which has one
> input (which you assert to "press the key") and one output,
> which goes high when the key is pressed and then falls
> 100ms later. The virt board wires up the output of the
> gpio-key device to input 3 on the PL061 GPIO controller.
> (This happens in create_gpio_keys().) So the code is correct
> to assert input 0 on the gpio-key device and the comment
> isn't wrong that this results in GPIO pin 3 being asserted:
> the link is just indirect.

It's likely obvious to ARM folks, but maybe the comment should
spell out the above for those who aren't aware.
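
To make the indirection Peter describes concrete, here is a toy standalone
model. It is not QEMU code: every sketch_ name is hypothetical, and the
100ms pulse of the real gpio-key device is left out. The only things taken
from the thread are that the key device has a single input (index 0) and that
its output is wired to pin 3 of the GPIO controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_POWER_BUTTON_PIN 3   /* controller pin named in the thread */

/* Toy GPIO controller: bit n of level holds the state of input pin n. */
typedef struct {
    uint8_t level;
} SketchGpioController;

/* Toy key device: one input, one output wired to a controller pin. */
typedef struct {
    SketchGpioController *ctrl;
    unsigned out_pin;
} SketchGpioKey;

static void sketch_ctrl_set_pin(SketchGpioController *c, unsigned pin,
                                bool level)
{
    assert(pin < 8);
    if (level) {
        c->level |= (uint8_t)(1u << pin);
    } else {
        c->level &= (uint8_t)~(1u << pin);
    }
}

/*
 * Drive the key device's input. Only input 0 exists; the level is forwarded
 * to whichever controller pin the key's output is wired to.
 */
static bool sketch_key_set_input(SketchGpioKey *key, unsigned input,
                                 bool level)
{
    if (input != 0) {
        return false;
    }
    sketch_ctrl_set_pin(key->ctrl, key->out_pin, level);
    return true;
}
```

Asserting input 0 on a key wired to SKETCH_POWER_BUTTON_PIN sets bit 3 in the
controller, which is why both "0" and "Pin 3" are correct at their own level.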
 
> 
> thanks
> -- PMM
> 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs
  2024-07-30 11:24   ` Igor Mammedov
@ 2024-07-30 11:36     ` Michael S. Tsirkin
  2024-07-31  6:05       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Michael S. Tsirkin @ 2024-07-30 11:36 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Tue, Jul 30, 2024 at 01:24:30PM +0200, Igor Mammedov wrote:
> On Mon, 22 Jul 2024 08:45:58 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > There is one reference to ACPI 4.0 and several references
> > to ACPI 6.x versions.
> > 
> > Update them to point to ACPI 6.5 whenever possible.
> 
> When it comes to ACPI doc comments, they should point to
> the first (earliest) revision that provides the given feature/value/field/table.

Yes. And the motivation is twofold.
First, guests are built against
old ACPI versions. Knowing in which version things appeared
helps us know which guests support a feature.
Second, the ACPI folks keep churning out new versions.
It makes no sense to try to update to the latest one;
it will soon get out of date again.

> 
> > There's one reference that was kept pointing to ACPI 6.4,
> > though, with HEST revision 1.
> > 
> > ACPI 6.5 now defines HEST revision 2, and defines a new
> > way to handle source types starting from 12. According
> > to the ACPI 6.5 revision history:
> > 
> > 	2312 Update to the HEST table and adding new error
> > 	     source descriptor - Table 18.2.
> > 
> > However, the spec doesn't define any new source
> > descriptors yet. It just defines a different behavior when
> > the source type is above 11.
> > 
> > I also double-checked the GHES implementation in an open
> > source project (Linux Kernel). Upstream currently
> > doesn't handle the HEST revision, ignoring that
> > field.
> > 
> > In any case, revision 2 seems to be backward-compatible
> > with revision 1 when type <= 11 and just one error is
> > contained in a HEST record.
> > 
> > So, while it is probably safe to update it, there's no
> > real need. Let's keep the implementation using
> > an ACPI 6.4 compatible table, i.e. HEST revision 1.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  hw/acpi/ghes.c | 48 ++++++++++++++++++++++++++++--------------------
> >  1 file changed, 28 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 6075ef5893ce..ebf1b812aaaa 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -45,9 +45,9 @@
> >  #define GAS_ADDR_OFFSET 4
> >  
> >  /*
> > - * The total size of Generic Error Data Entry
> > - * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > - * Table 18-343 Generic Error Data Entry
> > + * The total size of Generic Error Data Entry before data field
> > + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> > + * Table 18.12 Generic Error Data Entry
> >   */
> >  #define ACPI_GHES_DATA_LENGTH               72
> >  
> > @@ -65,8 +65,8 @@
> >  
> >  /*
> >   * Total size for Generic Error Status Block except Generic Error Data Entries
> > - * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> > - * Table 18-380 Generic Error Status Block
> > + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> > + * Table 18.11 Generic Error Status Block
> >   */
> >  #define ACPI_GHES_GESB_SIZE                 20
> >  
> > @@ -82,7 +82,8 @@ enum AcpiGenericErrorSeverity {
> >  
> >  /*
> >   * Hardware Error Notification
> > - * ACPI 4.0: 17.3.2.7 Hardware Error Notification
> > + * ACPI 6.5: 18.3.2.9 Hardware Error Notification,
> > + * Table 18.14 - Hardware Error Notification Structure
> >   * Composes dummy Hardware Error Notification descriptor of specified type
> >   */
> >  static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
> > @@ -112,7 +113,8 @@ static void build_ghes_hw_error_notification(GArray *table, const uint8_t type)
> >  
> >  /*
> >   * Generic Error Data Entry
> > - * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> > + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> > + * Table 18.12 - Generic Error Data Entry
> >   */
> >  static void acpi_ghes_generic_error_data(GArray *table,
> >                  const uint8_t *section_type, uint32_t error_severity,
> > @@ -148,7 +150,8 @@ static void acpi_ghes_generic_error_data(GArray *table,
> >  
> >  /*
> >   * Generic Error Status Block
> > - * ACPI 6.1: 18.3.2.7.1 Generic Error Data
> > + * ACPI 6.5: 18.3.2.7.1 Generic Error Data,
> > + * Table 18.11 - Generic Hardware Error Source Structure
> >   */
> >  static void acpi_ghes_generic_error_status(GArray *table, uint32_t block_status,
> >                  uint32_t raw_data_offset, uint32_t raw_data_length,
> > @@ -429,15 +432,18 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> >          0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
> >  }
> >  
> > -/* Build Generic Hardware Error Source version 2 (GHESv2) */
> > +/*
> > + * Build Generic Hardware Error Source version 2 (GHESv2)
> > + * ACPI 6.5: 18.3.2.8 Generic Hardware Error Source version 2 (GHESv2 - Type 10),
> > + * Table 18.13: Generic Hardware Error Source version 2 (GHESv2)
> > + */
> >  static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> >  {
> >      uint64_t address_offset;
> > -    /*
> > -     * Type:
> > -     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> > -     */
> > +    /* Type: (GHESv2 - Type 10) */
> >      build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
> > +
> > +    /* ACPI 6.5: Table 18.10 - Generic Hardware Error Source Structure */
> >      /* Source Id */
> >      build_append_int_noprefix(table_data, source_id, 2);
> >      /* Related Source Id */
> > @@ -481,11 +487,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> >      /* Error Status Block Length */
> >      build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> >  
> > -    /*
> > -     * Read Ack Register
> > -     * ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
> > -     * version 2 (GHESv2 - Type 10)
> > -     */
> > +    /* ACPI 6.5: fields defined at GHESv2 table */
> > +    /* Read Ack Register */
> >      address_offset = table_data->len;
> >      build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
> >                       4 /* QWord access */, 0);
> > @@ -504,11 +507,16 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> >      build_append_int_noprefix(table_data, 0x1, 8);
> >  }
> >  
> > -/* Build Hardware Error Source Table */
> > +/*
> > + * Build Hardware Error Source Table
> > + * ACPI 6.4: 18.3.2 ACPI Error Source
> > + * Table 18.2: Hardware Error Source Table (HEST)
> > + */
> >  void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> >                       const char *oem_id, const char *oem_table_id)
> >  {
> > -    AcpiTable table = { .sig = "HEST", .rev = 1,
> > +    AcpiTable table = { .sig = "HEST",
> > +                        .rev = 1,                   /* ACPI 4.0 to 6.4 */
> >                          .oem_id = oem_id, .oem_table_id = oem_table_id };
> >  
> >      acpi_table_begin(&table, table_data);



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES
  2024-07-30  8:36   ` Igor Mammedov
@ 2024-07-31  5:17     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-31  5:17 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Philippe Mathieu-Daudé, Ani Sinha, Eduardo Habkost,
	Marcel Apfelbaum, Peter Maydell, Shannon Zhao, Yanan Wang,
	linux-kernel, qemu-arm, qemu-devel, shameerali.kolothum.thodi

On Tue, 30 Jul 2024 10:36:15 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Mon, 22 Jul 2024 08:45:54 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > 
> > Creates a GED - Generic Event Device and sets a GPIO to
> > be used for error injection.
> 
> QEMU already has a GED device, so the question is why it wasn't
> used for event delivery?
> In a nutshell, I'd really prefer this series to be rewritten
> to reuse the existing GED instead of adding ad hoc GPIO and ACPI
> plumbing.

Makes sense. I'll split this one into two patches: the first
one adding the error device PNP to acpi/generic_event_device,
and the second one with the ghes and arm virt changes to support
it, using a notifier list inside ghes to signal the error
events.
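
As a rough illustration of the notifier-list idea, here is a standalone sketch
of the pattern. It is only an assumption about the shape the rework could
take, not code from the series: all sketch_ names are hypothetical, and a real
implementation would presumably use QEMU's own notifier helpers rather than
this hand-rolled list.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchNotifier SketchNotifier;

/* One registered listener: a callback plus a link to the next listener. */
struct SketchNotifier {
    void (*notify)(SketchNotifier *n, void *data);
    SketchNotifier *next;
};

/* List owned by the error source side (ghes, in the plan above). */
typedef struct {
    SketchNotifier *head;
} SketchNotifierList;

/* Register a listener at the head of the list. */
static void sketch_list_add(SketchNotifierList *list, SketchNotifier *n)
{
    assert(n->notify != NULL);
    n->next = list->head;
    list->head = n;
}

/* Called once a CPER record has been written: tell every listener. */
static void sketch_list_notify(SketchNotifierList *list, void *data)
{
    for (SketchNotifier *n = list->head; n != NULL; n = n->next) {
        n->notify(n, data);
    }
}

/* Hypothetical listener on the event device side. */
typedef struct {
    SketchNotifier notifier;    /* first member, so the cast below is valid */
    unsigned pending_errors;
} SketchGedListener;

static void sketch_ged_on_error(SketchNotifier *n, void *data)
{
    SketchGedListener *ged = (SketchGedListener *)n;

    (void)data;
    ged->pending_errors++;      /* stand-in for raising the error event */
}
```

The point of the pattern is that the code recording the error does not need to
know which device delivers the event; it only walks the list.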

Jonathan,

As the logic will be different, I'm placing you as co-author,
and adding you as Cc on the patches. If you're ok with that,
please reply with your SoB to them when I submit the next patch 
series.

> PS:
> As a side effect of that, error injection could be used not only for
> ARM but also for other machines that use GED (provided they implement GHES).
> 
> Also CCing Shameer wrt touched power button code
> 
> > [mchehab: use a define for the generic event pin number and do some cleanups]
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  hw/arm/virt-acpi-build.c | 30 ++++++++++++++++++++++++++----
> >  hw/arm/virt.c            | 14 ++++++++++++--
> >  include/hw/arm/virt.h    |  1 +
> >  include/hw/boards.h      |  1 +
> >  4 files changed, 40 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> > index f76fb117adff..c502ccf40909 100644
> > --- a/hw/arm/virt-acpi-build.c
> > +++ b/hw/arm/virt-acpi-build.c
> > @@ -63,6 +63,7 @@
> >  
> >  #define ARM_SPI_BASE 32
> >  
> > +#define ACPI_GENERIC_EVENT_DEVICE "GEDD"
> >  #define ACPI_BUILD_TABLE_SIZE             0x20000
> >  
> >  static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms)
> > @@ -142,6 +143,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
> >  static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
> >                                             uint32_t gpio_irq)  
> 
> This function is supposed to be called when acpi_dev (the existing GED device)
> is not present, i.e. on old machines only, so it should not be called for
> recent machine types. I'd avoid adding anything to it.
> 
> see more comment about it below
> 
> >  {
> > +    uint32_t pin;
> > +
> >      Aml *dev = aml_device("GPO0");
> >      aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0061")));
> >      aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > @@ -155,7 +158,12 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
> >  
> >      Aml *aei = aml_resource_template();
> >  
> > -    const uint32_t pin = GPIO_PIN_POWER_BUTTON;
> > +    pin = GPIO_PIN_POWER_BUTTON;
> > +    aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
> > +                                 AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
> > +                                 "GPO0", NULL, 0));
> > +    /* Pin for generic error */
> > +    pin = GPIO_PIN_GENERIC_ERROR;
> >      aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
> >                                   AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
> >                                   "GPO0", NULL, 0));
> > @@ -166,6 +174,11 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
> >      aml_append(method, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> >                                    aml_int(0x80)));
> >      aml_append(dev, method);
> > +    method = aml_method("_E06", 0, AML_NOTSERIALIZED);
> > +    aml_append(method, aml_notify(aml_name(ACPI_GENERIC_EVENT_DEVICE),
> > +                                  aml_int(0x80)));
> > +    aml_append(dev, method);
> > +
> >      aml_append(scope, dev);
> >  }
> >  
> > @@ -800,6 +813,15 @@ static void build_fadt_rev6(GArray *table_data, BIOSLinker *linker,
> >      build_fadt(table_data, linker, &fadt, vms->oem_id, vms->oem_table_id);
> >  }
> >  
> > +static void acpi_dsdt_add_generic_event_device(Aml *scope)
> > +{
> > +    Aml *dev = aml_device(ACPI_GENERIC_EVENT_DEVICE);
> > +    aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));  
> this is not an _event_ device; the spec refers to it as an _error_ device.
> 
> PS:
> please properly document new ACPI primitives/devices,
> see comment above aml_notify() for example.
> Use the earliest ACPI spec where the device was first defined.
> 
> > +    aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > +    aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> > +    aml_append(scope, dev);
> > +}
> > +
> >  /* DSDT */
> >  static void
> >  build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> > @@ -841,10 +863,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >                        HOTPLUG_HANDLER(vms->acpi_dev),
> >                        irqmap[VIRT_ACPI_GED] + ARM_SPI_BASE, AML_SYSTEM_MEMORY,
> >                        memmap[VIRT_ACPI_GED].base);
> > -    } else {
> > -        acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > -                           (irqmap[VIRT_GPIO] + ARM_SPI_BASE));
> >      }
> > +    acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
> > +                       (irqmap[VIRT_GPIO] + ARM_SPI_BASE));  
> 
> wouldn't that create double/conflicting power button handlers
> (GPIO and GED ones)? On recent machine types GED should be used,
> and the power button in acpi_dsdt_add_gpio() is used only if the
> machine doesn't have GED.
> 
> >  
> >      if (vms->acpi_dev) {
> >          uint32_t event = object_property_get_uint(OBJECT(vms->acpi_dev),
> > @@ -858,6 +879,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> >      }
> >  
> >      acpi_dsdt_add_power_button(scope);
> > +    acpi_dsdt_add_generic_event_device(scope);
> >  #ifdef CONFIG_TPM
> >      acpi_dsdt_add_tpm(scope, vms);
> >  #endif
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index c99c8b1713c6..f81cf3a69961 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -997,6 +997,13 @@ static void create_rtc(const VirtMachineState *vms)
> >  }
> >  
> >  static DeviceState *gpio_key_dev;
> > +
> > +static DeviceState *gpio_error_dev;
> > +static void virt_set_error(void)
> > +{
> > +    qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
> > +}
> > +
> >  static void virt_powerdown_req(Notifier *n, void *opaque)
> >  {
> >      VirtMachineState *s = container_of(n, VirtMachineState, powerdown_notifier);
> > @@ -1015,6 +1022,9 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> >      gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> >                                          qdev_get_gpio_in(pl061_dev,
> >                                                           GPIO_PIN_POWER_BUTTON));
> > +    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
> > +                                          qdev_get_gpio_in(pl061_dev,
> > +                                                           GPIO_PIN_GENERIC_ERROR));
> >  
> >      qemu_fdt_add_subnode(fdt, "/gpio-keys");
> >      qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
> > @@ -2385,9 +2395,8 @@ static void machvirt_init(MachineState *machine)
> >  
> >      if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
> >          vms->acpi_dev = create_acpi_ged(vms);
> > -    } else {
> > -        create_gpio_devices(vms, VIRT_GPIO, sysmem);
> >      }
> > +    create_gpio_devices(vms, VIRT_GPIO, sysmem);  
> 
> again, this creates a duplicate/conflicting power button source
> 
> >  
> >      if (vms->secure && !vmc->no_secure_gpio) {
> >          create_gpio_devices(vms, VIRT_SECURE_GPIO, secure_sysmem);
> > @@ -3101,6 +3110,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
> >      mc->default_ram_id = "mach-virt.ram";
> >      mc->default_nic = "virtio-net-pci";
> >  
> > +    mc->set_error = virt_set_error;
> >      object_class_property_add(oc, "acpi", "OnOffAuto",
> >          virt_get_acpi, virt_set_acpi,
> >          NULL, NULL);
> > diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> > index a4d937ed45ac..c9769d7d4d7f 100644
> > --- a/include/hw/arm/virt.h
> > +++ b/include/hw/arm/virt.h
> > @@ -49,6 +49,7 @@
> >  
> >  /* GPIO pins */
> >  #define GPIO_PIN_POWER_BUTTON  3
> > +#define GPIO_PIN_GENERIC_ERROR 6
> >  
> >  enum {
> >      VIRT_FLASH,
> > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > index ef6f18f2c1a7..6cf01f3934ae 100644
> > --- a/include/hw/boards.h
> > +++ b/include/hw/boards.h
> > @@ -304,6 +304,7 @@ struct MachineClass {
> >      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
> >      int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
> >      ram_addr_t (*fixup_ram_size)(ram_addr_t size);
> > +    void (*set_error)(void);
> >  };
> >  
> >  /**  
> 



Thanks,
Mauro



* Re: [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs
  2024-07-30 11:36     ` Michael S. Tsirkin
@ 2024-07-31  6:05       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-31  6:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Igor Mammedov, Jonathan Cameron, Shiju Jose, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Tue, 30 Jul 2024 07:36:32 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Jul 30, 2024 at 01:24:30PM +0200, Igor Mammedov wrote:
> > On Mon, 22 Jul 2024 08:45:58 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > There is one reference to ACPI 4.0 and several references
> > > to ACPI 6.x versions.
> > > 
> > > Update them to point to ACPI 6.5 whenever possible.  
> > 
> > when it comes to ACPI doc comments, they should point to
> > the 1st (earliest) revision that provides the given feature/value/field/table.  
> 
> Yes. And the motivation is twofold.
> First, guests are built against
> old acpi versions. knowing in which version things appeared
> helps us know which guests support a feature.

Good point, but IMO a comment like "since: ACPI 4.0" would
be better: as written, the comment may not reflect the first version
supporting a feature, but rather the version that was current when
someone added support for a particular feature set.

> Second, acpi guys keep churning out new versions.
> It makes no sense to try and update to latest one,
> it will soon get out of date again.

True, but having it updated helps people adding new code to
get things right.

Anyway, I got your point, I'll drop this patch.

> > >  void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> > >                       const char *oem_id, const char *oem_table_id)
> > >  {
> > > -    AcpiTable table = { .sig = "HEST", .rev = 1,
> > > +    AcpiTable table = { .sig = "HEST",
> > > +                        .rev = 1,                   /* ACPI 4.0 to 6.4 */
> > >                          .oem_id = oem_id, .oem_table_id = oem_table_id };
> > >  
> > >      acpi_table_begin(&table, table_data);  

This hunk might still make sense, though. When double-checking the links
against ACPI 6.5, I noticed that HEST now requires .rev = 2.

There are some future incompatibilities, but the current
implementation of acpi/ghes satisfies both rev 1 and rev 2 of HEST.

Also, this is not relevant on Linux, as the revision is not checked 
there.

So, currently this is not a problem.
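
To make the revision point concrete, here is a minimal sketch (Python,
not QEMU code) of the standard 36-byte ACPI table header. It only shows
that the table revision is a single byte at offset 8 and that changing
it requires nothing more than recomputing the checksum byte. The OEM and
creator strings and the empty error-source list are placeholders chosen
for illustration, and build_acpi_table() is a hypothetical helper:

```python
import struct

# Standard ACPI table header: signature, length, revision, checksum,
# OEM ID, OEM table ID, OEM revision, creator ID, creator revision.
ACPI_HEADER_FMT = "<4sIBB6s8sI4sI"  # 36 bytes, little endian, no padding


def build_acpi_table(signature, revision, body,
                     oem_id=b"OEMID ", oem_table_id=b"OEMTABLE"):
    """Return header + body with a valid checksum (placeholder OEM fields)."""
    length = struct.calcsize(ACPI_HEADER_FMT) + len(body)
    header = struct.pack(ACPI_HEADER_FMT, signature, length, revision, 0,
                         oem_id, oem_table_id, 1, b"TEST", 1)
    table = bytearray(header + body)
    # The checksum byte (offset 9) makes all bytes sum to 0 modulo 256.
    table[9] = (-sum(table)) & 0xFF
    return bytes(table)


# A HEST body starts with the 32-bit Error Source Count; 0 sources here.
hest_body = struct.pack("<I", 0)
hest_rev1 = build_acpi_table(b"HEST", 1, hest_body)
hest_rev2 = build_acpi_table(b"HEST", 2, hest_body)
```

Both tables differ only in the revision and checksum bytes, which is why
the same acpi/ghes payload can be advertised under either revision.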

Thanks,
Mauro



* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-30 11:17   ` Igor Mammedov
@ 2024-07-31  7:11     ` Mauro Carvalho Chehab
  2024-07-31  8:57       ` Jonathan Cameron via
  0 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-31  7:11 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Tue, 30 Jul 2024 13:17:09 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Mon, 22 Jul 2024 08:45:56 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> that's quite a bit of code that in 99% won't ever be used
> (assuming error injection testing scenario),
> not to mention it's a hw depended one and governed by different specs.
>
> Essentially we would need to create _whole_ lot of QAPI
> commands to cover possible errors for no benefit to QEMU.
> 
> Let take for example very simple _OST status reporting,
> QEMU of cause can decode values and present it to users in
> more 'presentable' form. However instead of translating
> numbers (aka. spec language) into a made up QEMU language,
> QEMU just passes values up the stack and users can use
> well defined spec to interpret its meaning.
> 
> benefits are: QEMU doesn't have to maintain translation
> code and QAPI ABI is limited to passing raw values.
> 
> Can we do similar thing here as well?
> i.e. simplify error injection commands to
> a command that takes raw value and passes it
> to guest (QEMU here acts as proxy, if I'm not
> mistaken)?
> 
> Preferably make it generic enough to handle
> not only ARM but other error formats HEST is
> able to handle.

An overly generic interface doesn't sound feasible to me, as the
EINJ code needs to check QEMU implementation details before
injecting the error.

See, processor is probably the simplest error injection
source, as most of the fields there aren't related to how
the hardware simulation is done.

Yet, if you see patch 7 of this series, you'll notice that some
fields should actually be filled based on the emulation.

On ARM, we have some IDs that depend on the emulation
(MIDR, MPIDR, power state). Filling them in from userspace may require
a QAPI to query them.
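
As a rough sketch of where those emulation-dependent IDs end up, the
Python snippet below packs the fixed 40-byte header of an ARM Processor
Error Section, following the layout as I understand it from the UEFI
CPER appendix (it matches Linux's struct cper_sec_proc_arm). The
function name and all sample values are made up for illustration; a real
tool would have to obtain MPIDR/MIDR/power state from QEMU somehow:

```python
import struct

# Validation bits of the ARM Processor Error Section header.
VALID_MPIDR = 1 << 0
VALID_AFFINITY = 1 << 1
VALID_RUNNING_STATE = 1 << 2
VALID_VENDOR_INFO = 1 << 3

# validation, err_info_num, context_info_num, section_length,
# affinity level, 3 reserved bytes, MPIDR_EL1, MIDR_EL1,
# running state, PSCI state.
ARM_HEADER_FMT = "<IHHIB3xQQII"  # 40 bytes
ERR_INFO_SIZE = 32               # each Processor Error Information entry


def arm_proc_section(mpidr, midr, running_state, psci_state,
                     err_infos, affinity=0):
    """Pack the section header followed by raw error-info entries."""
    for info in err_infos:
        if len(info) != ERR_INFO_SIZE:
            raise ValueError("each error-info entry must be 32 bytes")
    validation = VALID_MPIDR | VALID_AFFINITY | VALID_RUNNING_STATE
    length = struct.calcsize(ARM_HEADER_FMT) + ERR_INFO_SIZE * len(err_infos)
    header = struct.pack(ARM_HEADER_FMT, validation, len(err_infos), 0,
                         length, affinity, mpidr, midr,
                         running_state, psci_state)
    return header + b"".join(err_infos)


# Placeholder values only: one all-zero error-info entry, made-up IDs.
section = arm_proc_section(mpidr=0x80000000, midr=0x410FD0F0,
                           running_state=1, psci_state=0,
                           err_infos=[bytes(ERR_INFO_SIZE)])
```

Only mpidr, midr, running_state and psci_state depend on the emulated
CPU; everything else in this header can be computed by the caller.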

The memory layout, however, is the most complex one. Even for
an ARM processor CPER (which is the simplest scenario), the
physical/virtual addresses need to be checked against the emulation
environment.

Other error sources (like memory errors, CXL, etc) will require
a deep knowledge about how QEMU mapped such devices.

So, in practice, if we move this to an EINJ script, we'll need
to add a probably more complex QAPI to allow querying the memory
layout and other device and CPU specific bindings.

Also, we don't know what newer versions of the ACPI spec hold in store
for us. See, even the HEST table contents depend on the HEST
revision number, as made clear in the ACPI 6.5 notes:

	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source

and at:

	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#error-source-structure-header-type-12-onward

So, if we're willing to add support for a more generic "raw data"
QAPI, I would still do it per-type, and for the fields that won't
require knowledge of the device-emulation details.

Btw, my proposal on patch 7 of this series is to have raw data
for:
	- the error-info field;
	- registers dump;
	- micro-architecture specific data.

I don't mind trying to have more raw data there, as I see (marginal)
benefits in allowing the generation of invalid CPER records [1], but some
of those fields need to be validated and/or filled internally by QEMU, if
not forced to a specific value by the caller.

[1] a raw data EINJ can be useful for fuzzing-based fault detection, to
    check whether badly formed records can cause a Kernel panic or be
    used as an exploit. Yet, that is not really a concern for APEI, as if
    the hardware is faulty, a Kernel panic is not off the table. Also, if
    the BIOS is already compromised and has malicious code in it,
    the EINJ interface is not the main concern.
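
As a rough illustration of the fuzzing use case in [1], the sketch below
flips a few random bits in an otherwise valid raw record before it would
be handed to a raw injection interface. The helper name is hypothetical
and the 72-byte all-zero record merely stands in for a real CPER; a fixed
seed keeps a failing case reproducible:

```python
import random


def mutate_record(record: bytes, flips: int, seed: int) -> bytes:
    """Return a copy of record with `flips` randomly chosen bit flips."""
    rng = random.Random(seed)   # seeded, so every mutation is reproducible
    data = bytearray(record)
    if not data:
        return bytes(data)      # nothing to mutate
    for _ in range(flips):
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)
    return bytes(data)


valid_record = bytes(72)        # placeholder for a well-formed record
fuzzed = mutate_record(valid_record, flips=1, seed=0)
```

Logging the seed next to each injected record would be enough to replay
whatever input made the guest misbehave.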

> PS:
> For user convenience, QEMU can carry a script that
> could help generate this raw value in user friendly way
> but at the same time it won't put maintenance
> burden on QEMU itself.

The script will still require reviews, and the same code will
be there. So, in terms of maintenance burden, there won't be much
difference.

Btw, I'm actually using a script myself to test it, currently
sitting together with rasdaemon, which is the Linux tool to detect
and handle hardware errors:

	https://github.com/mchehab/rasdaemon/blob/master/contrib/qemu_einj.py

as it helps a lot when trying to simulate more complex errors.

Once QEMU gains support to inject processor errors, I can prepare a 
separate patch to move it to QEMU.

Thanks,
Mauro



* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-31  7:11     ` Mauro Carvalho Chehab
@ 2024-07-31  8:57       ` Jonathan Cameron via
  2024-07-31 10:30         ` Mauro Carvalho Chehab
  2024-08-01  8:36         ` Igor Mammedov
  0 siblings, 2 replies; 42+ messages in thread
From: Jonathan Cameron via @ 2024-07-31  8:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Wed, 31 Jul 2024 09:11:33 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> On Tue, 30 Jul 2024 13:17:09 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Mon, 22 Jul 2024 08:45:56 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > 
> > that's quite a bit of code that in 99% won't ever be used
> > (assuming error injection testing scenario),
> > not to mention it's a hw depended one and governed by different specs.
> >
> > Essentially we would need to create _whole_ lot of QAPI
> > commands to cover possible errors for no benefit to QEMU.

Fair point.  A 'few' error types might be helpful to general
users, like the original memory error injection this is built
on, which is used to reduce the blast radius of a real error (and is
used in production VMM cases), but most are about testing the rest
of the stack, not really QEMU.

So it's very helpful for a smallish group of users.

> > 
> > Let take for example very simple _OST status reporting,
> > QEMU of cause can decode values and present it to users in
> > more 'presentable' form. However instead of translating
> > numbers (aka. spec language) into a made up QEMU language,
> > QEMU just passes values up the stack and users can use
> > well defined spec to interpret its meaning.
> > 
> > benefits are: QEMU doesn't have to maintain translation
> > code and QAPI ABI is limited to passing raw values.
> > 
> > Can we do similar thing here as well?
> > i.e. simplify error injection commands to
> > a command that takes raw value and passes it
> > to guest (QEMU here acts as proxy, if I'm not
> > mistaken)?
> > 
> > Preferably make it generic enough to handle
> > not only ARM but other error formats HEST is
> > able to handle.  
> 
> A too generic interface doesn't sound feasible to me, as the
> EINJ code needs to check QEMU implementation details before
> doing the error inject.

To be clear, we are talking here about a script that
generates stuff 'similar' to what ACPI EINJ does and injects it
via QAPI, not guest injection (which is almost always locked
down on production machines / distros because of the footgun
aspect).  Also, the ACPI EINJ interface suffers from exactly the same
problems with state discoverability that we have with a raw interface here.
(I checked with Mauro offline that I'd interpreted this
comment correctly!)

> 
> See, processor is probably the simplest error injection
> source, as most of the fields there aren't related to how
> the hardware simulation is done.
> 
> Yet, if you see patch 7 of this series, you'll notice that some
> fields should actually be filled based on the emulation.
> 
> On ARM, we have some IDs that depend on the emulation
> (MIDR, MPIDR, power state). Doing that on userspace may require
> a QAPI to query them.

We could strip back the QAPI part to only the bits that are
not dependent on state.  However, the kicker to that is we'd
need to make sure all that state is available to an external
tool (or fully controllable from initial launch command line).
I'm not sure where the gaps are but, I'm fairly sure there
will be some.  Doesn't save much code other than documentation
of the QAPI.

> 
> The memory layout, however, is the most complex one. Even for
> an ARM processor CPER (which is the simplest scenario), the 
> physical/virtual address need to be checked against the emulation
> environment.
> 
> Other error sources (like memory errors, CXL, etc) will require
> a deep knowledge about how QEMU mapped such devices.

For CXL stuff we'll piggy back on native error injection interfaces
that are already there and couldn't be avoided because they
are writing a bunch of register state (that we elide in the FW
first path). 
https://lore.kernel.org/qemu-devel/20240205141940.31111-12-Jonathan.Cameron@huawei.com/
So we won't be adding new QAPI, but the error record generation logic
will be in QEMU.  For background, the CXL FW first error injection
has taken a back seat to the ARM errors because of the obvious
other factor that CXL isn't supported on ARM in upstream QEMU.
Once I escape a few near term deadlines I'll add the x86
support for GHESv2 / SCI interrupt signaling as you'd see on a
typical x86 server.

> 
> So, in practice, if we move this to an EINJ script, we'll need
> to add a probably more complex QAPI to allow querying the memory
> layout and other device and CPU specific bindings.
> 
> Also, we don't know what newer versions of ACPI spec will reserve
> us. See, even the HEST table contents is dependent of the HEST 
> revision number, as made clear at the ACPI 6.5 notes:
> 
> 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source
> 
> and at:
> 
> 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#error-source-structure-header-type-12-onward
> 
> So, if we're willing to add support for a more generic "raw data"
> QAPI, I would still do it per-type, and for the fields that won't
> require knowledge of the device-emulation details.

We could blend the two options and provide no QAPI for the bits
that are QEMU state dependent. When fuzzing, one can inject
the full record raw, as it doesn't have to be valid state anyway.

> 
> Btw, my proposal on patch 7 of this series is to have raw data
> for:
> 	- the error-info field;
> 	- registers dump;
> 	- micro-architecture specific data.
> 
> I don't mind trying to have more raw data there as I see (marginal) 
> benefits of allowing to generate CPER invalid records [1], but some of
> those  fields need to be validated and/or filled internally at QEMU - if
> not forced to an specific value by the caller.
> 
> [1] a raw data EINJ can be useful for fuzzy logic fault detection to 
>     check if badly formed packages won't cause a Kernel panic or be
>     an exploit. Yet, not really a concern for APEI, as if the hardware
>     is faulty, a Kernel panic is not out of the table. Also, if the
>     the BIOS is already compromised and has malicious code on it, 
>     the EINJ interface is not the main concern.
> 
> > PS:
> > For user convenience, QEMU can carry a script that
> > could help generate this raw value in user friendly way
> > but at the same time it won't put maintenance
> > burden on QEMU itself.  
> 
> The script will still require reviews, and the same code will 
> be there. So, from maintenance burden, there won't be much
> difference.

Agreed. I'd also be very keen that the script is tightly coupled to
QEMU, as it doesn't make sense to carry it with the kernel or the RAS
daemon, and I'd want to ultimately get this stuff into all the
appropriate CI flows.

> 
> Btw, I'm actually using myself a script to test it, currently
> sitting together with rasdaemon - which is the Linux tool to detect
> and handle hardware errors:
> 
> 	https://github.com/mchehab/rasdaemon/blob/master/contrib/qemu_einj.py
> 
> as it helps a lot when trying to simulate more complex errors.
> 
> Once QEMU gains support to inject processor errors, I can prepare a 
> separate patch to move it to QEMU.
> 
> Thanks,
> Mauro

So tricky questions. I'm not sure which way is the least painful!

Jonathan





* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-31  8:57       ` Jonathan Cameron via
@ 2024-07-31 10:30         ` Mauro Carvalho Chehab
  2024-08-01  8:36         ` Igor Mammedov
  1 sibling, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-07-31 10:30 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Wed, 31 Jul 2024 09:57:19 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Wed, 31 Jul 2024 09:11:33 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > On Tue, 30 Jul 2024 13:17:09 +0200
> > Igor Mammedov <imammedo@redhat.com> wrote:
> >   
> > > On Mon, 22 Jul 2024 08:45:56 +0200
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > 
> > > that's quite a bit of code that in 99% won't ever be used
> > > (assuming error injection testing scenario),
> > > not to mention it's a hw depended one and governed by different specs.
> > >
> > > Essentially we would need to create _whole_ lot of QAPI
> > > commands to cover possible errors for no benefit to QEMU.  
> 
> Fair point.  A 'few' error types might be helpful to general
> users like the original memory error injection this is built
> on which is reduce blast radius of a real error (and used in
> production VMM cases), but most are about testing the rest
> of the stack, not really QEMU.

My concern is that we may end up needing QAPI for querying stuff
that will be used in such a script.

As a more concrete example, let's suppose we want to produce a
HEST record for a CXL source, using the layout given by a QEMU
command line like this:

./qemu-system-aarch64 -M virt,nvdimm=on,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \
 -cpu max -smp 4 -kernel Image -drive if=none,file=debian.qcow2,format=qcow2,id=hd \
 -device pcie-root-port,id=root_port1 -device virtio-blk-pci,drive=hd \
 -netdev type=user,id=mynet,hostfwd=tcp::5555-:22 -qmp tcp:localhost:4445,server=on,wait=off \
 -device virtio-net-pci,netdev=mynet,id=bob -nographic -no-reboot \
 -append 'earlycon root=/dev/vda1 fsck.mode=skip tp_printk maxcpus=4' \
 -monitor telnet:127.0.0.1:1234,server,nowait -bios QEMU_EFI.fd \
 -object memory-backend-ram,size=4G,id=mem0 -numa node,nodeid=0,cpus=0-3,memdev=mem0 \
 -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=1M,align=1M \
 -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=1M,align=1M \
 -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=1M,align=1M \
 -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=1M,align=1M \
 -object memory-backend-file,id=cxl-mem5,share=on,mem-path=/tmp/cxltest5.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem6,share=on,mem-path=/tmp/cxltest6.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem7,share=on,mem-path=/tmp/cxltest7.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-mem8,share=on,mem-path=/tmp/cxltest8.raw,size=256M,align=256M \
 -object memory-backend-file,id=cxl-lsa5,share=on,mem-path=/tmp/lsa5.raw,size=1M,align=1M \
 -object memory-backend-file,id=cxl-lsa6,share=on,mem-path=/tmp/lsa6.raw,size=1M,align=1M \
 -object memory-backend-file,id=cxl-lsa7,share=on,mem-path=/tmp/lsa7.raw,size=1M,align=1M \
 -object memory-backend-file,id=cxl-lsa8,share=on,mem-path=/tmp/lsa8.raw,size=1M,align=1M \
 -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
 -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=2 \
 -device cxl-rp,port=1,bus=cxl.1,id=root_port2,chassis=0,slot=3 \
 -device virtio-rng-pci,bus=root_port2 \
 -device cxl-upstream,port=33,bus=root_port0,id=us0,multifunction=on,addr=0.0 \
 -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
 -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
 -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
 -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
 -device cxl-type3,bus=swport0,memdev=cxl-mem1,id=cxl-pmem0,lsa=cxl-lsa1,sn=3 \
 -device cxl-type3,bus=swport1,memdev=cxl-mem2,id=cxl-pmem1,lsa=cxl-lsa2,sn=4 \
 -device cxl-type3,bus=swport2,memdev=cxl-mem3,id=cxl-pmem2,lsa=cxl-lsa3,sn=5 \
 -device cxl-type3,bus=swport3,memdev=cxl-mem4,id=cxl-pmem3,lsa=cxl-lsa4,sn=6 \
 -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=1k \
 -machine ras=on

There is a complex set of CXL devices there. Doing error injection
requires knowing exactly how such devices were created and how the
memory was allocated: when testing a userspace application
running on the guest OS (like rasdaemon), we need an address that
makes sense, otherwise rasdaemon's memory poisoning won't
disable the right set of addresses.

So, while a raw HEST QAPI interface would have a very trivial API,
and the ghes code would also be simpler, we'd need a QAPI interface
good enough to describe all the CXL-specific details, so that the error
injection script can produce a proper HEST record, if this ends
up being mapped using such a QAPI.
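
To illustrate the kind of check such a script would need, here is a
minimal Python sketch that validates an injection address against a
memory map. The Region type, the region names and every base/size below
are hypothetical placeholders; the whole point of the paragraph above is
that the real values would have to come from QEMU through some query
interface:

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    name: str
    base: int
    size: int


def find_region(regions: Sequence[Region], addr: int) -> Optional[Region]:
    """Return the region containing addr, or None if it is unmapped."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            return region
    return None


# Hypothetical layout, for illustration only.
layout = [
    Region("ram", 0x4000_0000, 4 << 30),
    Region("cxl-window", 0x100_0000_0000, 4 << 30),
]

target = find_region(layout, 0x4000_1000)
if target is None:
    raise SystemExit("refusing to inject: address is not backed by memory")
```

Without the layout, the script can only guess, and a guessed address
makes the guest-side handling (e.g. page offlining) meaningless.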

> 
> So it's very helpful for a smallish group of users.
> 

My end goal is to be able to use QEMU to validate and identify
regressions in the Linux Kernel RAS subsystem and in the userspace
daemon responsible for receiving, logging and taking actions based on
the error information (in other words, rasdaemon). Not only for
ARM, but also for x86 (and maybe in the future for other archs,
depending on how they end up implementing RAS features).

At the end of the day, I'd like to have a GitHub action running QEMU
to check proposed rasdaemon patches against a docker container with
the latest version of Linux and QEMU.

> > > Let take for example very simple _OST status reporting,
> > > QEMU of cause can decode values and present it to users in
> > > more 'presentable' form. However instead of translating
> > > numbers (aka. spec language) into a made up QEMU language,
> > > QEMU just passes values up the stack and users can use
> > > well defined spec to interpret its meaning.
> > > 
> > > benefits are: QEMU doesn't have to maintain translation
> > > code and QAPI ABI is limited to passing raw values.
> > > 
> > > Can we do similar thing here as well?
> > > i.e. simplify error injection commands to
> > > a command that takes raw value and passes it
> > > to guest (QEMU here acts as proxy, if I'm not
> > > mistaken)?
> > > 
> > > Preferably make it generic enough to handle
> > > not only ARM but other error formats HEST is
> > > able to handle.    
> > 
> > A too generic interface doesn't sound feasible to me, as the
> > EINJ code needs to check QEMU implementation details before
> > doing the error inject.  
> 
> To be clear we are talking here about a script that
> generates 'similar' stuff to ACPI EINJ does and injects
> via qapi, not guest injection (which is almost always locked
> down on production machines / distros because of the footgun
> aspect).  + ACPI EINJ interface suffers exactly the same
> problems with state discoverability we have with a raw interface here.
> (I checked with Mauro offline that I'd interpreted this
> comment correctly!)

Yes, the end goal is to inject GHESv2 errors using a generic
event device, e.g.:

	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#generic-hardware-error-source-version-2-ghesv2-type-10

> > See, processor is probably the simplest error injection
> > source, as most of the fields there aren't related to how
> > the hardware simulation is done.
> > 
> > Yet, if you see patch 7 of this series, you'll notice that some
> > fields should actually be filled based on the emulation.
> > 
> > On ARM, we have some IDs that depend on the emulation
> > (MIDR, MPIDR, power state). Doing that on userspace may require
> > a QAPI to query them.  
> 
> We could strip back the QAPI part to only the bits that are
> not dependent on state. 

Works for me.

> However, the kicker to that is we'd
> need to make sure all that state is available to an external
> tool (or fully controllable from initial launch command line).
> I'm not sure where the gaps are but, I'm fairly sure there
> will be some.  Doesn't save much code other than documentation
> of the QAPI.

For CPU error injection, the amount of code for the error
injection certainly won't change much. The code will be split into QEMU
C code and Python code (inside a qemu-einj.py script). As raw data won't
be validated, some code may actually be removed.

On the other hand, some new query QAPI interfaces will be needed,
especially when handling memory-related HEST data, as the script will
need the memory layout in enough detail to properly produce
errors.

So, we're talking about adding a couple of additional QAPIs. The
advantage is that such QAPIs will be independent of the ghes
driver, but some may be arch-specific, like the ones reporting fields
like ARM mpidr, midr, power_state, for instance. In an initial
implementation, we can live without those, except for the ARM processor
error injection.

I'll try to craft a proposal with a very minimal QAPI for GHESv2
injection, implementing it for ARM processor (and maybe for some other
type, to check if the interface will fit). Let's see how it goes.
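
Just to give an idea of how small the script side of such an interface
could be, here is a Python sketch that builds the QMP messages for a raw
record injection. QMP's {"execute": ..., "arguments": ...} framing and
the initial qmp_capabilities handshake are real; the command name
"x-hypothetical-ghes-inject" and its "cper" argument are invented here
purely for illustration and do not exist in QEMU:

```python
import base64
import json

# Made-up command name: no such QMP command exists today.
HYPOTHETICAL_INJECT_CMD = "x-hypothetical-ghes-inject"


def qmp_request(command: str, **arguments) -> str:
    """Serialize one QMP request as a JSON string."""
    message = {"execute": command}
    if arguments:
        message["arguments"] = arguments
    return json.dumps(message)


def raw_inject_request(cper: bytes) -> str:
    """Wrap a raw CPER record, base64-encoded, in the hypothetical command."""
    encoded = base64.b64encode(cper).decode("ascii")
    return qmp_request(HYPOTHETICAL_INJECT_CMD, cper=encoded)


handshake = qmp_request("qmp_capabilities")   # must be sent first on QMP
request = raw_inject_request(bytes(72))       # placeholder 72-byte record
```

With this shape, all the record construction stays in the script, and
QEMU only has to copy the bytes into the error status block and raise the
notification.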

> > The memory layout, however, is the most complex one. Even for
> > an ARM processor CPER (which is the simplest scenario), the 
> > physical/virtual address need to be checked against the emulation
> > environment.
> > 
> > Other error sources (like memory errors, CXL, etc) will require
> > a deep knowledge about how QEMU mapped such devices.  
> 
> For CXL stuff we'll piggy back on native error injection interfaces
> that are already there and couldn't be avoided because they
> are writing a bunch of register state (that we elide in the FW
> first path). 
> https://lore.kernel.org/qemu-devel/20240205141940.31111-12-Jonathan.Cameron@huawei.com/
> So we won't be adding new QAPI, but the error record generation logic
> will be in QEMU.  For background, the CXL FW first error injection
> has taken a back seat to the ARM errors because of the obvious
> other factor that CXL isn't supported on ARM in upstream QEMU.

Makes sense to me.

> Once I escape a few near term deadlines I'll add the x86
> support for GHESv2 / SCI interrupt signaling as you'd see on a
> typical x86 server.

Well, if we go with a generic GHESv2 QAPI interface, arm/virt.c
will have everything in place to generate GHESv2. It could be used
to simulate an x86 processor event, as the changes will happen on
the script side. We'll still need to add the needed bits to the x86
virt code, though, if we want the error injection to be tested
against a guest doing x86 emulation.

> > 
> > So, in practice, if we move this to an EINJ script, we'll need
> > to add a probably more complex QAPI to allow querying the memory
> > layout and other device and CPU specific bindings.
> > 
> > Also, we don't know what newer versions of ACPI spec will reserve
> > us. See, even the HEST table contents is dependent of the HEST 
> > revision number, as made clear at the ACPI 6.5 notes:
> > 
> > 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source
> > 
> > and at:
> > 
> > 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#error-source-structure-header-type-12-onward
> > 
> > So, if we're willing to add support for a more generic "raw data"
> > QAPI, I would still do it per-type, and for the fields that won't
> > require knowledge of the device-emulation details.  
> 
> Could blend the two options and provide no qapi for the bits
> that are QEMU state dependent - if fuzzing, can inject
> the full record raw as doesn't have to be valid state anyway.

Not sure. I guess it will depend on whether we'll be using a simple raw
data buffer for CPER record(s) or whether we'll break it per CPER type.

The GHES interface for generic CPER records would be simpler, but it
may require some QEMU query QAPIs for the script to be able to get
what it needs.

Let me try to code it and see how it goes.

> > Btw, my proposal on patch 7 of this series is to have raw data
> > for:
> > 	- the error-info field;
> > 	- registers dump;
> > 	- micro-architecture specific data.
> > 
> > I don't mind trying to have more raw data there as I see (marginal) 
> > benefits of allowing to generate CPER invalid records [1], but some of
> > those  fields need to be validated and/or filled internally at QEMU - if
> > not forced to an specific value by the caller.
> > 
> > [1] a raw data EINJ can be useful for fuzzy logic fault detection to 
> >     check if badly formed packages won't cause a Kernel panic or be
> >     an exploit. Yet, not really a concern for APEI, as if the hardware
> >     is faulty, a Kernel panic is not out of the table. Also, if the
> >     the BIOS is already compromised and has malicious code on it, 
> >     the EINJ interface is not the main concern.
> >   
> > > PS:
> > > For user convenience, QEMU can carry a script that
> > > could help generate this raw value in user friendly way
> > > but at the same time it won't put maintenance
> > > burden on QEMU itself.    
> > 
> > The script will still require reviews, and the same code will 
> > be there. So, from maintenance burden, there won't be much
> > difference.  
> 
> Agreed. I'd also be very keen that the script is tightly coupled to
> QEMU as doesn't make sense to carry with kernel or RAS daemon and
> I'd want to ultimately get this stuff into all the appropriate
> CI flows.

Agreed: placing it together with QEMU is indeed the best location.

> > 
> > Btw, I'm actually using myself a script to test it, currently
> > sitting together with rasdaemon - which is the Linux tool to detect
> > and handle hardware errors:
> > 
> > 	https://github.com/mchehab/rasdaemon/blob/master/contrib/qemu_einj.py
> > 
> > as it helps a lot when trying to simulate more complex errors.
> > 
> > Once QEMU gains support to inject processor errors, I can prepare a 
> > separate patch to move it to QEMU.
> > 
> > Thanks,
> > Mauro  
> 
> So tricky questions. I'm not sure which way is the least painful!

Agreed.

Thanks,
Mauro



* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-31  8:57       ` Jonathan Cameron via
  2024-07-31 10:30         ` Mauro Carvalho Chehab
@ 2024-08-01  8:36         ` Igor Mammedov
  2024-08-01 14:26           ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-08-01  8:36 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mauro Carvalho Chehab, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

On Wed, 31 Jul 2024 09:57:19 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Wed, 31 Jul 2024 09:11:33 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Em Tue, 30 Jul 2024 13:17:09 +0200
> > Igor Mammedov <imammedo@redhat.com> escreveu:
> >   
> > > On Mon, 22 Jul 2024 08:45:56 +0200
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
[...]
> > > Preferably make it generic enough to handle
> > > not only ARM but other error formats HEST is
> > > able to handle.    
> > 
> > A too generic interface doesn't sound feasible to me, as the
> > EINJ code needs to check QEMU implementation details before
> > doing the error inject.  
> 
> To be clear we are talking here about a script that
> generates 'similar' stuff to ACPI EINJ does and injects
> via qapi, not guest injection (which is almost always locked
> down on production machines / distros because of the footgun
> aspect).  + ACPI EINJ interface suffers exactly the same
> problems with state discoverability we have with a raw interface here.
> (I checked with Mauro offline that I'd interpreted this
> comment correctly!)
> 
> > 
> > See, processor is probably the simplest error injection
> > source, as most of the fields there aren't related to how
> > the hardware simulation is done.
> > 
> > Yet, if you see patch 7 of this series, you'll notice that some
> > fields should actually be filled based on the emulation.
> > 
> > On ARM, we have some IDs that depend on the emulation
> > (MIDR, MPIDR, power state). Doing that on userspace may require
> > a QAPI to query them.  

QEMU has QMP commands to query the QOM tree; device properties are
likely what you'd be interested in.
Adding a new QAPI might not be necessary as long as the needed
data points are exposed via the device's properties.

Additional properties are relatively cheap, especially if their
names are prefixed with 'x-', which by convention means
/internal use, not stable, not ABI/.

Well, the whole QOM tree structure hasn't been declared as ABI (as far as I know),
but it's relatively stable and we try not to mess with it much
(especially for mainstream virt machines), as some external users
might (ab)use it anyway (no promises on the QEMU side though).

On the contrary, QAPI is mostly considered an ABI that QEMU provides
to its users, with the burden of maintaining its stability.

If the injection script is a tool internal to QEMU, it should be fine
for it to use QOM introspection to get data and to limit QAPI
to the necessary minimum only.
To make sure it won't be silently broken by 'innocent' QEMU
contributors, have a CI job that checks that it still works
as intended.
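
For illustration only, a minimal sketch of how a script could build the
QMP messages for that kind of QOM introspection. qom-list and qom-get are
existing QMP commands; the CPU path and the 'x-mpidr' property name below
are assumptions made up for this example, not something QEMU is known to
expose:

```python
import json


def qmp_command(name, **arguments):
    """Build one QMP command as a dict ready for json.dumps()."""
    command = {"execute": name}
    if arguments:
        command["arguments"] = arguments
    return command


def qom_list(path):
    """List the properties of the QOM object at 'path'."""
    return qmp_command("qom-list", path=path)


def qom_get(path, prop):
    """Read a single property of the QOM object at 'path'."""
    return qmp_command("qom-get", path=path, property=prop)


# Assumption: QOM path of the first CPU; it depends on the machine type.
CPU_PATH = "/machine/unattached/device[0]"
# Hypothetical property name, using the 'x-' prefix for unstable properties.
MPIDR_PROP = "x-mpidr"

# Serialized messages, as they would be written to the QMP socket.
list_msg = json.dumps(qom_list(CPU_PATH))
get_msg = json.dumps(qom_get(CPU_PATH, MPIDR_PROP))
```

Sending these messages needs a QMP connection that has already completed
the qmp_capabilities handshake; that part is left out of the sketch.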

> We could strip back the QAPI part to only the bits that are
> not dependent on state.  However, the kicker to that is we'd
> need to make sure all that state is available to an external
> tool (or fully controllable from initial launch command line).
> I'm not sure where the gaps are but, I'm fairly sure there
> will be some.  Doesn't save much code other than documentation
> of the QAPI.
> 
> > 
> > The memory layout, however, is the most complex one. Even for
> > an ARM processor CPER (which is the simplest scenario), the 
> > physical/virtual address need to be checked against the emulation
> > environment.
> > 
> > Other error sources (like memory errors, CXL, etc) will require
> > a deep knowledge about how QEMU mapped such devices.  
> 
> For CXL stuff we'll piggy back on native error injection interfaces
> that are already there and couldn't be avoided because they
> are writing a bunch of register state (that we elide in the FW
> first path). 
> https://lore.kernel.org/qemu-devel/20240205141940.31111-12-Jonathan.Cameron@huawei.com/
> So we won't be adding new QAPI, but the error record generation logic
> will be in QEMU.  For background, the CXL FW first error injection
> has taken a back seat to the ARM errors because of the obvious
> other factor that CXL isn't supported on ARM in upstream QEMU.
> Once I escape a few near term deadlines I'll add the x86
> support for GHESv2 / SCI interrupt signaling as you'd see on a
> typical x86 server.
> 
> > 
> > So, in practice, if we move this to an EINJ script, we'll need
> > to add a probably more complex QAPI to allow querying the memory
> > layout and other device and CPU specific bindings.
> > 
> > Also, we don't know what newer versions of ACPI spec will reserve
> > us. See, even the HEST table contents is dependent of the HEST 
> > revision number, as made clear at the ACPI 6.5 notes:
> > 
> > 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source
> > 
> > and at:
> > 
> > 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#error-source-structure-header-type-12-onward
> > 
> > So, if we're willing to add support for a more generic "raw data"
> > QAPI, I would still do it per-type, and for the fields that won't
> > require knowledge of the device-emulation details.  
> 
> Could blend the two options and provide no qapi for the bits
> that are QEMU state dependent - if fuzzing, can inject
> the full record raw as doesn't have to be valid state anyway.
> 
> > 
> > Btw, my proposal on patch 7 of this series is to have raw data
> > for:
> > 	- the error-info field;
> > 	- registers dump;
> > 	- micro-architecture specific data.
> > 
> > I don't mind trying to have more raw data there as I see (marginal) 
> > benefits of allowing to generate CPER invalid records [1], but some of
> > those  fields need to be validated and/or filled internally at QEMU - if
> > not forced to an specific value by the caller.
> > 
> > [1] a raw data EINJ can be useful for fuzzy logic fault detection to 
> >     check if badly formed packages won't cause a Kernel panic or be
> >     an exploit. Yet, not really a concern for APEI, as if the hardware
> >     is faulty, a Kernel panic is not out of the table. Also, if the
> >     the BIOS is already compromised and has malicious code on it, 
> >     the EINJ interface is not the main concern.
> >   
> > > PS:
> > > For user convenience, QEMU can carry a script that
> > > could help generate this raw value in user friendly way
> > > but at the same time it won't put maintenance
> > > burden on QEMU itself.    
> > 
> > The script will still require reviews, and the same code will 
> > be there. So, from maintenance burden, there won't be much
> > difference.  

It makes a lot of difference whether the code is an integral part of the
QEMU binary or not (with a script, fewer people have to spend time reviewing
it, the attack surface doesn't grow, ... (other made-up reasons)).

Implementing a shim/proxy in QEMU and putting all the error-composing logic
into a separate script (even if it's part of the QEMU source) shifts
most of the burden to whoever (I'd assume you'd volunteer yourself)
would maintain the script.

If the script breaks, it doesn't affect QEMU itself (nor, I believe, should
it affect the release process); the script's maintainer(s) can have their
own schedule/process for dealing with it.
 
> Agreed. I'd also be very keen that the script is tightly coupled to
> QEMU as doesn't make sense to carry with kernel or RAS daemon and
> I'd want to ultimately get this stuff into all the appropriate
> CI flows.

Agreed, it makes much more sense to carry such a script as part of QEMU.


> > 
> > Btw, I'm actually using myself a script to test it, currently
> > sitting together with rasdaemon - which is the Linux tool to detect
> > and handle hardware errors:
> > 
> > 	https://github.com/mchehab/rasdaemon/blob/master/contrib/qemu_einj.py
> > 
> > as it helps a lot when trying to simulate more complex errors.
> > 
> > Once QEMU gains support to inject processor errors, I can prepare a 
> > separate patch to move it to QEMU.
> > 
> > Thanks,
> > Mauro  
> 
> So tricky questions. I'm not sure which way is the least painful!
> 
> Jonathan
> 
> 




* Re: [PATCH v3 3/7] acpi/ghes: Support GPIO error source.
  2024-07-30  8:40   ` Igor Mammedov
@ 2024-08-01 12:56     ` Mauro Carvalho Chehab
  2024-08-01 14:32       ` Jonathan Cameron via
  0 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-01 12:56 UTC (permalink / raw)
  To: Igor Mammedov, Jonathan Cameron
  Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
	linux-kernel, qemu-arm, qemu-devel

Em Tue, 30 Jul 2024 10:40:28 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:

> > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > index 674f6958e905..4f1ab1a73a06 100644
> > --- a/include/hw/acpi/ghes.h
> > +++ b/include/hw/acpi/ghes.h
> > @@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
> >  
> >  enum {
> >      ACPI_HEST_SRC_ID_SEA = 0,
> > +    ACPI_HEST_SRC_ID_GPIO = 1,  
> is it defined by some spec, or just a made up number?

I don't know. Maybe Jonathan or Shiju knows better, as the original patch
came from them, but I didn't find any parts of the ACPI spec defining the
values for source ID.

Looking at the build_ghes_v2() implementation, this is used in two places:

1. as GHESv2 source ID:
    /*
     * Type:
     * Generic Hardware Error Source version 2(GHESv2 - Type 10)
     */
    build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
    /* Source Id */
    build_append_int_noprefix(table_data, source_id, 2);
    /* Related Source Id */
    build_append_int_noprefix(table_data, 0xffff, 2);

2. as an address offset:

    address_offset = table_data->len;
    /* Error Status Address */
    build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
                     4 /* QWord access */, 0);
    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
        address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
        ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));

So, if I had to guess, I'd say that this was made up, in a way that
the size of the table will fit just two sources, starting from zero.

So, I'll change the code to just:

	enum {
	    ACPI_HEST_SRC_ID_SEA = 0,
	    ACPI_HEST_SRC_ID_GPIO,
	    /* future ids go here */
	    ACPI_HEST_SRC_ID_RESERVED,
	};

to remove the false impression that these values could have originated
from the spec.
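
To make the arithmetic explicit, here is a small sketch (in Python, just
to model the numbers) of how the source ID turns into an offset inside the
error status address table. It mirrors the source_id * sizeof(uint64_t)
expression quoted above; treating ACPI_HEST_SRC_ID_RESERVED as the number
of sources is an assumption of this sketch, and the helper names are
hypothetical:

```python
from enum import IntEnum


class HestSrcId(IntEnum):
    """Mirror of the proposed enum; the values are QEMU-internal."""
    SEA = 0
    GPIO = 1
    RESERVED = 2  # assumed here to double as the number of sources


ADDR_SIZE = 8  # sizeof(uint64_t)


def error_status_offset(source_id):
    """Offset of the error status address entry for one source."""
    if not 0 <= source_id < HestSrcId.RESERVED:
        raise ValueError("unknown source id: %d" % source_id)
    return source_id * ADDR_SIZE


def address_table_size():
    """Bytes needed to hold one address entry per source."""
    return HestSrcId.RESERVED * ADDR_SIZE
```

With the two sources above, SEA lands at offset 0 and GPIO at offset 8,
in a 16-byte table. A source added before the RESERVED entry grows the
table without renumbering the existing ones.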

Thanks,
Mauro



* Re: [PATCH v3 1/7] arm/virt: place power button pin number on a define
  2024-07-30 11:26       ` Igor Mammedov
@ 2024-08-01 13:15         ` Mauro Carvalho Chehab
  2024-08-05 14:04           ` Igor Mammedov
  0 siblings, 1 reply; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-01 13:15 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Ani Sinha, Shannon Zhao, linux-kernel, qemu-arm, qemu-devel

Em Tue, 30 Jul 2024 13:26:20 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:

> On Tue, 30 Jul 2024 09:29:37 +0100
> Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> > On Tue, 30 Jul 2024 at 08:26, Igor Mammedov <imammedo@redhat.com> wrote:  
> > >
> > > On Mon, 22 Jul 2024 08:45:53 +0200
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > >    
> > > > Having magic numbers inside the code is not a good idea, as it
> > > > is error-prone. So, instead, create a macro with the number
> > > > definition.
> > > >
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>    
> >   
> > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > > index b0c68d66a345..c99c8b1713c6 100644
> > > > --- a/hw/arm/virt.c
> > > > +++ b/hw/arm/virt.c
> > > > @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> > > >      if (s->acpi_dev) {
> > > >          acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
> > > >      } else {
> > > > -        /* use gpio Pin 3 for power button event */
> > > > +        /* use gpio Pin for power button event */
> > > >          qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);    
> > >
> > > /me confused, it was saying Pin 3 but is passing 0 as argument where as elsewhere
> > > you are passing 3. Is this a bug?    
> > 
> > No. The gpio_key_dev is a gpio-key device which has one
> > input (which you assert to "press the key") and one output,
> > which goes high when the key is pressed and then falls
> > 100ms later. The virt board wires up the output of the
> > gpio-key device to input 3 on the PL061 GPIO controller.
> > (This happens in create_gpio_keys().) So the code is correct
> > to assert input 0 on the gpio-key device and the comment
> > isn't wrong that this results in GPIO pin 3 being asserted:
> > the link is just indirect.  
> 
> it's likely obvious to ARM folks, but maybe comment should
> clarify above for unaware.

Not sure if a comment here with the pin number is a good idea.
After all, this patch originated because we were using
Pin 6 for GPIO error, while the comment was outdated (stating
that it was pin 8 instead) :-)

After this series, there will be two GPIO pins used inside arm/virt,
both defined at arm/virt.h:

	/* GPIO pins */
	#define GPIO_PIN_POWER_BUTTON  3
	#define GPIO_PIN_GENERIC_ERROR 6

Those macros are used when GPIOs are created:

	static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
	                             uint32_t phandle)
	{
	    gpio_key_dev = sysbus_create_simple("gpio-key", -1,
	                                        qdev_get_gpio_in(pl061_dev,
	                                                         GPIO_PIN_POWER_BUTTON));
	    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
	                                          qdev_get_gpio_in(pl061_dev,
	                                                           GPIO_PIN_GENERIC_ERROR));
	    [...]

So, at least for me, it is clear that gpio_key_dev is using pin 3.

Thanks,
Mauro



* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-08-01  8:36         ` Igor Mammedov
@ 2024-08-01 14:26           ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-01 14:26 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Em Thu, 1 Aug 2024 10:36:23 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:

> On Wed, 31 Jul 2024 09:57:19 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> 
> > On Wed, 31 Jul 2024 09:11:33 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >   
> > > Em Tue, 30 Jul 2024 13:17:09 +0200
> > > Igor Mammedov <imammedo@redhat.com> escreveu:
> > >     
> > > > On Mon, 22 Jul 2024 08:45:56 +0200
> > > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:  
> [...]
> > > > Preferably make it generic enough to handle
> > > > not only ARM but other error formats HEST is
> > > > able to handle.      
> > > 
> > > A too generic interface doesn't sound feasible to me, as the
> > > EINJ code needs to check QEMU implementation details before
> > > doing the error inject.    
> > 
> > To be clear we are talking here about a script that
> > generates 'similar' stuff to ACPI EINJ does and injects
> > via qapi, not guest injection (which is almost always locked
> > down on production machines / distros because of the footgun
> > aspect).  + ACPI EINJ interface suffers exactly the same
> > problems with state discoverability we have with a raw interface here.
> > (I checked with Mauro offline that I'd interpreted this
> > comment correctly!)
> >   
> > > 
> > > See, processor is probably the simplest error injection
> > > source, as most of the fields there aren't related to how
> > > the hardware simulation is done.
> > > 
> > > Yet, if you see patch 7 of this series, you'll notice that some
> > > fields should actually be filled based on the emulation.
> > > 
> > > On ARM, we have some IDs that depend on the emulation
> > > (MIDR, MPIDR, power state). Doing that on userspace may require
> > > a QAPI to query them.    
> 
> QEMU has qmp commands to query QOM tree, device properties is
> likely what you'd be interested with.
> Adding new QAPI might be not necessary as long as needed
> data point are exposed via device's properties.
> 
> And additional properties are relatively cheap, especially if their
> names prefixed with 'x-' which by convention means
> /internal use, not stable, not ABI/
> 
> Well whole qmp tree structure hasn't been declared as ABI (as far as I know),
> but it's relatively stable and we try not to mess with it much
> (especially for mainstream virt machines), as some external users
> might (ab)use it anyway (no promises on QEMU side though).
> 
> On contrary QAPI is mostly considered as ABI QEMU provides
> to its users with burden to maintain it stability.
> 
> If injection script is internal tool to QEMU, it should be fine
> for it to use qom introspection to get data and limit QAPI
> necessary minimum only.

OK, good. Anyway, after sleeping on it, I decided not to focus on
the query for now, as my goal is to validate the Linux Kernel and
rasdaemon. So, I'll just fill them in when calling the error injection
script.

As a reference, for ARM Processor error injection, there are just 4
fields that would benefit from a query to fill in the default values:

	- mpidr-el1
	- midr-el1
	- power_state
	- ARM context registers and its type
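
A rough sketch of what "filling them when calling the script" could look
like on the command line. The option names are hypothetical and the
context registers are left out for brevity; only the values the caller
actually supplies are kept, so anything else can still be defaulted
elsewhere:

```python
import argparse


def int_any_base(text):
    """Accept decimal, hex (0x...) or octal (0o...) integers."""
    return int(text, 0)


def build_parser():
    """Command-line options for the fields that depend on the emulation."""
    parser = argparse.ArgumentParser(
        description="ARM processor error injection (sketch)")
    parser.add_argument("--mpidr", type=int_any_base,
                        help="value for the MPIDR_EL1 field")
    parser.add_argument("--midr", type=int_any_base,
                        help="value for the MIDR_EL1 field")
    parser.add_argument("--power-state", type=int_any_base,
                        help="value for the power state field")
    return parser


def supplied_fields(args):
    """Return only the fields explicitly given by the caller."""
    return {name: value for name, value in vars(args).items()
            if value is not None}
```

For instance, parsing "--mpidr 0x80000001 --power-state 1" yields a dict
with just the mpidr and power_state keys.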

> To make sure it won't be broken silently by 'innocent' QEMU
> contributors, have a CI job to make sure that it still works
> as intended.
> 
> > We could strip back the QAPI part to only the bits that are
> > not dependent on state.  However, the kicker to that is we'd
> > need to make sure all that state is available to an external
> > tool (or fully controllable from initial launch command line).
> > I'm not sure where the gaps are but, I'm fairly sure there
> > will be some.  Doesn't save much code other than documentation
> > of the QAPI.
> >   
> > > 
> > > The memory layout, however, is the most complex one. Even for
> > > an ARM processor CPER (which is the simplest scenario), the 
> > > physical/virtual address need to be checked against the emulation
> > > environment.
> > > 
> > > Other error sources (like memory errors, CXL, etc) will require
> > > a deep knowledge about how QEMU mapped such devices.    
> > 
> > For CXL stuff we'll piggy back on native error injection interfaces
> > that are already there and couldn't be avoided because they
> > are writing a bunch of register state (that we elide in the FW
> > first path). 
> > https://lore.kernel.org/qemu-devel/20240205141940.31111-12-Jonathan.Cameron@huawei.com/
> > So we won't be adding new QAPI, but the error record generation logic
> > will be in QEMU.  For background, the CXL FW first error injection
> > has taken a back seat to the ARM errors because of the obvious
> > other factor that CXL isn't supported on ARM in upstream QEMU.
> > Once I escape a few near term deadlines I'll add the x86
> > support for GHESv2 / SCI interrupt signaling as you'd see on a
> > typical x86 server.
> >   
> > > 
> > > So, in practice, if we move this to an EINJ script, we'll need
> > > to add a probably more complex QAPI to allow querying the memory
> > > layout and other device and CPU specific bindings.
> > > 
> > > Also, we don't know what newer versions of ACPI spec will reserve
> > > us. See, even the HEST table contents is dependent of the HEST 
> > > revision number, as made clear at the ACPI 6.5 notes:
> > > 
> > > 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#acpi-error-source
> > > 
> > > and at:
> > > 
> > > 	https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#error-source-structure-header-type-12-onward
> > > 
> > > So, if we're willing to add support for a more generic "raw data"
> > > QAPI, I would still do it per-type, and for the fields that won't
> > > require knowledge of the device-emulation details.    
> > 
> > Could blend the two options and provide no qapi for the bits
> > that are QEMU state dependent - if fuzzing, can inject
> > the full record raw as doesn't have to be valid state anyway.
> >   
> > > 
> > > Btw, my proposal on patch 7 of this series is to have raw data
> > > for:
> > > 	- the error-info field;
> > > 	- registers dump;
> > > 	- micro-architecture specific data.
> > > 
> > > I don't mind trying to have more raw data there as I see (marginal) 
> > > benefits of allowing to generate CPER invalid records [1], but some of
> > > those  fields need to be validated and/or filled internally at QEMU - if
> > > not forced to an specific value by the caller.
> > > 
> > > [1] a raw data EINJ can be useful for fuzzy logic fault detection to 
> > >     check if badly formed packages won't cause a Kernel panic or be
> > >     an exploit. Yet, not really a concern for APEI, as if the hardware
> > >     is faulty, a Kernel panic is not out of the table. Also, if the
> > >     the BIOS is already compromised and has malicious code on it, 
> > >     the EINJ interface is not the main concern.
> > >     
> > > > PS:
> > > > For user convenience, QEMU can carry a script that
> > > > could help generate this raw value in user friendly way
> > > > but at the same time it won't put maintenance
> > > > burden on QEMU itself.      
> > > 
> > > The script will still require reviews, and the same code will 
> > > be there. So, from maintenance burden, there won't be much
> > > difference.    
> 
> it makes a lot of difference if code is integral part qemu binary,
> (less people have to spend time on reviewing it, avoid increasing
> attack surface, ... (other made up reasons)).

I see.

> Implementing shim/proxy in QEMU and putting all error composing logic
> into a separate script (even if it's a part QEMU source), shifts
> most of the burden to whomever (I'd assume you'd volunteer yourself)
> would maintain the script.

Yes, I'll maintain it.

> If script breaks, it doesn't affect QEMU itself (nor I believe it
> should affect release process), script's maintainer(s) can have their
> own schedule/process on how to deal with it.

That's good.

> > Agreed. I'd also be very keen that the script is tightly coupled to
> > QEMU as doesn't make sense to carry with kernel or RAS daemon and
> > I'd want to ultimately get this stuff into all the appropriate
> > CI flows.  
> 
> Agreed, it makes much more sense to carry such script as a part of QEMU.

OK, I'll be submitting a v4 using the CPER raw data plus script approach.
I'll be placing the script in the final patch.

I opted to make the simplest possible QAPI (keeping it marked as unstable),
as we might need/want to improve it to support other features.

Btw, as error injection is not trivial, and using the script is the best
way to do it, I would prefer to keep such QAPI always marked as unstable,
as it is preferable for QEMU users to use the script (and submit patches
improving it) instead of manually crafting CPER records with their own scripts.
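
To give an idea of what the script side of a "raw data" approach involves,
below is a minimal sketch that packs a single ARM processor error
information entry into bytes. The field names follow the QAPI struct from
patch 7; the 32-byte little-endian layout and the validation bit order are
taken from a reading of the UEFI ARM Processor Error Information structure
and should be checked against the spec. The version value of 0, the example
type value and the base64 transport are assumptions of this sketch, not the
interface of the series:

```python
import base64
import struct

# Little-endian, packed: version, length, validation bits, type,
# multiple-error, flags, error-info, virt-addr, phy-addr.
ERR_INFO_FMT = "<BBHBHBQQQ"
ERR_INFO_LEN = struct.calcsize(ERR_INFO_FMT)  # 32 bytes

# Validation bits, one per optional field.
VALID_BITS = {
    "multiple_error": 1 << 0,
    "flags": 1 << 1,
    "error_info": 1 << 2,
    "virt_addr": 1 << 3,
    "phy_addr": 1 << 4,
}


def pack_error_info(err_type, **optional):
    """Pack one entry; optional fields not supplied are zero and not valid."""
    unknown = set(optional) - set(VALID_BITS)
    if unknown:
        raise TypeError("unknown field(s): %s" % ", ".join(sorted(unknown)))

    validation = 0
    for name in optional:
        validation |= VALID_BITS[name]

    return struct.pack(ERR_INFO_FMT,
                       0,             # version (assumed to be 0)
                       ERR_INFO_LEN,  # length of this entry
                       validation,
                       err_type,
                       optional.get("multiple_error", 0),
                       optional.get("flags", 0),
                       optional.get("error_info", 0),
                       optional.get("virt_addr", 0),
                       optional.get("phy_addr", 0))


def to_json_payload(raw):
    """Encode raw bytes as base64 text so they fit in a JSON string."""
    return base64.b64encode(raw).decode("ascii")


# Example entry with only the physical address marked as valid.
entry = pack_error_info(0x02, phy_addr=0x40001000)
payload = to_json_payload(entry)
```

A full record also needs the section header and the generic CPER wrapping
around such entries, which is exactly the kind of detail that is easier to
keep in one shared script.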

Thanks,
Mauro



* Re: [PATCH v3 3/7] acpi/ghes: Support GPIO error source.
  2024-08-01 12:56     ` Mauro Carvalho Chehab
@ 2024-08-01 14:32       ` Jonathan Cameron via
  0 siblings, 0 replies; 42+ messages in thread
From: Jonathan Cameron via @ 2024-08-01 14:32 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel

On Thu, 1 Aug 2024 14:56:37 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Em Tue, 30 Jul 2024 10:40:28 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
> 
> > > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > > index 674f6958e905..4f1ab1a73a06 100644
> > > --- a/include/hw/acpi/ghes.h
> > > +++ b/include/hw/acpi/ghes.h
> > > @@ -58,6 +58,7 @@ enum AcpiGhesNotifyType {
> > >  
> > >  enum {
> > >      ACPI_HEST_SRC_ID_SEA = 0,
> > > +    ACPI_HEST_SRC_ID_GPIO = 1,    
> > is it defined by some spec, or just a made up number?  
> 
> I don't know. Maybe Jonathan or Shiju knows better, as the original patch
> came from them, but I didn't find any parts of the ACPI spec defining the
> values for source ID.

> 
> Checking at build_ghes_v2() implementation, this is used on two places:
> 
> 1. as GHESv2 source ID:
>     /*
>      * Type:
>      * Generic Hardware Error Source version 2(GHESv2 - Type 10)
>      */
>     build_append_int_noprefix(table_data, ACPI_GHES_SOURCE_GENERIC_ERROR_V2, 2);
>     /* Source Id */
>     build_append_int_noprefix(table_data, source_id, 2);
>     /* Related Source Id */
>     build_append_int_noprefix(table_data, 0xffff, 2);
> 
> as an address offset:
> 
>     address_offset = table_data->len;
>     /* Error Status Address */
>     build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
>                      4 /* QWord access */, 0);
>     bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>         address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
>         ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
> 
> So, if I had to guess, I'd say that this was made up, in a way that
> the size of the table will fit just two sources, starting from zero.
> 
> So, I'll change the code to just:
> 
> 	enum {
>             ACPI_HEST_SRC_ID_SEA = 0,
>             ACPI_HEST_SRC_ID_GPIO, 

LGTM.  The naming is perhaps not ideal but the scheme predates my
involvement so I'm not sure of the reasoning.  Could change it
to QEMU_ACPI...
to make it really really clear these aren't an ACPI spec thing, but
may not be worth it.

J

> 	    /* future ids go here */
> 	    ACPI_HEST_SRC_ID_RESERVED,
> 	};
> 
> To remove the false impression that this could be originated from the
> spec.
> 
> Thanks,
> Mauro
> 




* Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
  2024-07-29 14:32       ` Markus Armbruster
@ 2024-08-01 14:34         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-01 14:34 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
	Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
	Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel

Em Mon, 29 Jul 2024 16:32:41 +0200
Markus Armbruster <armbru@redhat.com> escreveu:

> > Yes, as this CPER record is defined only for arm. There are three other
> > processor error info:
> > 	- for x86;
> > 	- for ia32;
> > 	- for "generic cpu".
> >
> > They have different structures, with different fields.  
> 
> A generic inject-error command feels nicer, but coding its arguments in
> the schema could be more trouble than it's worth.  I'm not asking you to
> try.
> 
> A target-specific command like this one should be conditional.  Try
> this:
> 
>     { 'command': 'arm-inject-error',
>       'data': { 'errortypes': ['ArmProcessorErrorType'] },
>       'features': [ 'unstable' ],
>       'if': 'TARGET_ARM' }
> 
> No need to provide a qmp_arm_inject_error() stub then.

I tried it, but it generates lots of poison errors. Basically, QAPI
generation includes poison.h, which makes the compiler complain on
non-ARM builds.

Anyway, the new version I'm about to submit is not dependent on
ARM anymore (as it is a generic GHES error injection that can be used
by any arch).

However, as I added a Kconfig symbol for it, I still needed a stub.

It would be nice not to need it, but on the other hand it doesn't
hurt much.

> >> If we encode the error to inject as an enum value, adding more will
> >> be hard.
> >> 
> >> If we wrap the enum in a struct
> >> 
> >>     { 'struct': 'ArmProcessorError',
> >>       'data': { 'type': 'ArmProcessorErrorType' } }
> >> 
> >> we can later extend it like
> >> 
> >>     { 'union': 'ArmProcessorError',
> >>       'base': { 'type': 'ArmProcessorErrorType' },
> >>       'data': {
> >>           'bus-error': 'ArmProcessorBusErrorData' } }
> >> 
> >>     { 'struct': 'ArmProcessorBusErrorData',
> >>       'data': ... }  
> >
> > I don't see this working as one might expect. See, the ARM error
> > information data can be repeated from 1 to 255 times. It is given 
> > by this struct (see patch 7):
> >
> > 	{ 'struct': 'ArmProcessorErrorInformation',
> > 	  'data': { '*validation': ['ArmPeiValidationBits'],
> > 	            'type': ['ArmProcessorErrorType'],
> > 	            '*multiple-error': 'uint16',
> > 	            '*flags': ['ArmProcessorFlags'],
> > 	            '*error-info': 'uint64',
> > 	            '*virt-addr':  'uint64',
> > 	            '*phy-addr': 'uint64'}
> > 	}
> >
> > According to the UEFI spec, the type is always present.
> > The other fields are marked as valid or not via the field
> > "validation". So, there is one bit per field indicating which of
> > the following fields in the PEI structure are valid:
> >
> > 	- multiple-error: multiple occurrences of the error;
> > 	- flags;
> > 	- error-info: error information;
> > 	- virt-addr: virtual address;
> > 	- phy-addr: physical address.
> >
> > There are also other fields that are global for the entire record,
> > also marked as valid or not via another bitmask.
> >
> > The contents of almost all those fields are independent of the error
> > type. The only field whose content is affected by the error type is
> > "error-info", and the definition of that field is not fully specified.
> >
> > So, currently, UEFI spec only defines it when:
> >
> > 1. the error type has just one bit set;
> > 2. the error type is either cache, TLB or bus error[1].
> >    If the type is a micro-arch-specific error, the spec doesn't say
> >    how this field is filled.
> >
> > To make the API simple (yet powerful), I opted to not enforce any encoding
> > for error-info: let userspace fill it as required and use some default
> > that would make sense, if this is not passed via QMP.
> >
> > [1] See https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information  
> 
> I asked because designing for extensibility is good practice.
> 
> It's not a hard requirement here, because feature 'unstable' gives us
> license to change the interface incompatibly.

IMO keeping it as unstable makes sense, as this QAPI is specific to
error injection, which is hardly a widely used feature. Also, with the
script approach, the actual CPER record generation happens in a script.

If we ship the script together with QEMU, then whenever the QAPI
changes, the script will be updated together with it. So, IMO, there is
no need to make it stable.

Thanks,
Mauro
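
As an illustration of the validation bits described in the quoted text
above, here is a minimal C sketch. It is not taken from the patches. It
assumes that the validation mask is simply derived from which optional
fields were supplied. The struct, macro and function names are
hypothetical, and the bit positions follow the ARM Processor Error
Information table of UEFI 2.10 (section N.2.4.4.1) as understood here,
so they should be checked against the spec before being relied upon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validation bits of one ARM Processor Error Information (PEI) entry.
 * Hypothetical names; bit positions assumed from UEFI 2.10, N.2.4.4.1.
 */
#define PEI_VALID_MULTIPLE_ERROR (1u << 0)
#define PEI_VALID_FLAGS          (1u << 1)
#define PEI_VALID_ERROR_INFO     (1u << 2)
#define PEI_VALID_VIRT_ADDR      (1u << 3)
#define PEI_VALID_PHY_ADDR       (1u << 4)

/* Optional fields of one entry, using has_<field> presence flags. */
struct pei_args {
    bool has_multiple_error;
    uint16_t multiple_error;
    bool has_flags;
    uint8_t flags;
    bool has_error_info;
    uint64_t error_info;
    bool has_virt_addr;
    uint64_t virt_addr;
    bool has_phy_addr;
    uint64_t phy_addr;
};

/* Set one validation bit for each optional field the caller supplied. */
uint16_t pei_validation_bits(const struct pei_args *a)
{
    uint16_t valid = 0;

    assert(a != NULL);

    if (a->has_multiple_error) {
        valid |= PEI_VALID_MULTIPLE_ERROR;
    }
    if (a->has_flags) {
        valid |= PEI_VALID_FLAGS;
    }
    if (a->has_error_info) {
        valid |= PEI_VALID_ERROR_INFO;
    }
    if (a->has_virt_addr) {
        valid |= PEI_VALID_VIRT_ADDR;
    }
    if (a->has_phy_addr) {
        valid |= PEI_VALID_PHY_ADDR;
    }
    return valid;
}
```

With only flags and phy-addr supplied, this helper would return a mask
with bits 1 and 4 set. An explicit "validation" argument, as in the
quoted struct, could override the derived value.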



* Re: [PATCH v3 1/7] arm/virt: place power button pin number on a define
  2024-08-01 13:15         ` Mauro Carvalho Chehab
@ 2024-08-05 14:04           ` Igor Mammedov
  2024-08-05 15:22             ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 42+ messages in thread
From: Igor Mammedov @ 2024-08-05 14:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Peter Maydell, Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Ani Sinha, Shannon Zhao, linux-kernel, qemu-arm, qemu-devel

On Thu, 1 Aug 2024 15:15:44 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Em Tue, 30 Jul 2024 13:26:20 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
> 
> > On Tue, 30 Jul 2024 09:29:37 +0100
> > Peter Maydell <peter.maydell@linaro.org> wrote:
> >   
> > > On Tue, 30 Jul 2024 at 08:26, Igor Mammedov <imammedo@redhat.com> wrote:    
> > > >
> > > > On Mon, 22 Jul 2024 08:45:53 +0200
> > > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > >      
> > > > > Having magic numbers inside the code is not a good idea, as it
> > > > > is error-prone. So, instead, create a macro with the number
> > > > > definition.
> > > > >
> > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>      
> > >     
> > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > > > index b0c68d66a345..c99c8b1713c6 100644
> > > > > --- a/hw/arm/virt.c
> > > > > +++ b/hw/arm/virt.c
> > > > > @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> > > > >      if (s->acpi_dev) {
> > > > >          acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
> > > > >      } else {
> > > > > -        /* use gpio Pin 3 for power button event */
> > > > > +        /* use gpio Pin for power button event */
> > > > >          qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);      
> > > >
> > > > > /me confused, it was saying Pin 3 but is passing 0 as argument whereas elsewhere
> > > > you are passing 3. Is this a bug?      
> > > 
> > > No. The gpio_key_dev is a gpio-key device which has one
> > > input (which you assert to "press the key") and one output,
> > > which goes high when the key is pressed and then falls
> > > 100ms later. The virt board wires up the output of the
> > > gpio-key device to input 3 on the PL061 GPIO controller.
> > > (This happens in create_gpio_keys().) So the code is correct
> > > to assert input 0 on the gpio-key device and the comment
> > > isn't wrong that this results in GPIO pin 3 being asserted:
> > > the link is just indirect.    
> > 
> > It's likely obvious to ARM folks, but maybe the comment should
> > clarify the above for those unaware.
> 
> Not sure if a comment here with the pin number is a good idea.
> After all, this patch originated because we were using
> Pin 6 for GPIO error, while the comment was outdated (stating
> that it was pin 8 instead) :-)
> 
> After this series, there will be two GPIO pins used inside arm/virt,
> both defined at arm/virt.h:
> 
> 	/* GPIO pins */
> 	#define GPIO_PIN_POWER_BUTTON  3
> 	#define GPIO_PIN_GENERIC_ERROR 6
> 
> Those macros are used when GPIOs are created:
> 
> 	static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> 	                             uint32_t phandle)
> 	{
> 	    gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> 	                                        qdev_get_gpio_in(pl061_dev,
> 	                                                         GPIO_PIN_POWER_BUTTON));
> 	    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
> 	                                          qdev_get_gpio_in(pl061_dev,
> 	                                                           GPIO_PIN_GENERIC_ERROR));
> 
> So, at least for me, it is clear that gpio_key_dev is using pin 3.

If you switch to using the already existing GED device,
then this patch will go away, since the event will be delivered by GED
instead of GPIO + _AEI.

> 
> Thanks,
> Mauro
> 




* Re: [PATCH v3 1/7] arm/virt: place power button pin number on a define
  2024-08-05 14:04           ` Igor Mammedov
@ 2024-08-05 15:22             ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 42+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-05 15:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Peter Maydell, Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
	Ani Sinha, Shannon Zhao, linux-kernel, qemu-arm, qemu-devel

Em Mon, 5 Aug 2024 16:04:39 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:

> On Thu, 1 Aug 2024 15:15:44 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > Em Tue, 30 Jul 2024 13:26:20 +0200
> > Igor Mammedov <imammedo@redhat.com> escreveu:
> >   
> > > On Tue, 30 Jul 2024 09:29:37 +0100
> > > Peter Maydell <peter.maydell@linaro.org> wrote:
> > >     
> > > > On Tue, 30 Jul 2024 at 08:26, Igor Mammedov <imammedo@redhat.com> wrote:      
> > > > >
> > > > > On Mon, 22 Jul 2024 08:45:53 +0200
> > > > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > > >        
> > > > > > Having magic numbers inside the code is not a good idea, as it
> > > > > > is error-prone. So, instead, create a macro with the number
> > > > > > definition.
> > > > > >
> > > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>        
> > > >       
> > > > > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > > > > index b0c68d66a345..c99c8b1713c6 100644
> > > > > > --- a/hw/arm/virt.c
> > > > > > +++ b/hw/arm/virt.c
> > > > > > @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> > > > > >      if (s->acpi_dev) {
> > > > > >          acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
> > > > > >      } else {
> > > > > > -        /* use gpio Pin 3 for power button event */
> > > > > > +        /* use gpio Pin for power button event */
> > > > > >          qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);        
> > > > >
> > > > > > /me confused, it was saying Pin 3 but is passing 0 as argument whereas elsewhere
> > > > > you are passing 3. Is this a bug?        
> > > > 
> > > > No. The gpio_key_dev is a gpio-key device which has one
> > > > input (which you assert to "press the key") and one output,
> > > > which goes high when the key is pressed and then falls
> > > > 100ms later. The virt board wires up the output of the
> > > > gpio-key device to input 3 on the PL061 GPIO controller.
> > > > (This happens in create_gpio_keys().) So the code is correct
> > > > to assert input 0 on the gpio-key device and the comment
> > > > isn't wrong that this results in GPIO pin 3 being asserted:
> > > > the link is just indirect.      
> > > 
> > > It's likely obvious to ARM folks, but maybe the comment should
> > > clarify the above for those unaware.
> > 
> > Not sure if a comment here with the pin number is a good idea.
> > After all, this patch originated because we were using
> > Pin 6 for GPIO error, while the comment was outdated (stating
> > that it was pin 8 instead) :-)
> > 
> > After this series, there will be two GPIO pins used inside arm/virt,
> > both defined at arm/virt.h:
> > 
> > 	/* GPIO pins */
> > 	#define GPIO_PIN_POWER_BUTTON  3
> > 	#define GPIO_PIN_GENERIC_ERROR 6
> > 
> > Those macros are used when GPIOs are created:
> > 
> > 	static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> > 	                             uint32_t phandle)
> > 	{
> > 	    gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> > 	                                        qdev_get_gpio_in(pl061_dev,
> > 	                                                         GPIO_PIN_POWER_BUTTON));
> > 	    gpio_error_dev = sysbus_create_simple("gpio-key", -1,
> > 	                                          qdev_get_gpio_in(pl061_dev,
> > 	                                                           GPIO_PIN_GENERIC_ERROR));
> > 
> > So, at least for me, it is clear that gpio_key_dev is using pin 3.
> 
> If you switch to using the already existing GED device,
> then this patch will go away, since the event will be delivered by GED
> instead of GPIO + _AEI.

This patch is actually independent of the rest. It is related to the
power-down event, and not related at all to error injection.

It was kept in this series because of the original patch 2 (otherwise
merge conflicts would arise). It can now be merged separately.

Btw, this is doing a cleanup requested by Michael and Peter:

	https://lore.kernel.org/qemu-devel/CAFEAcA-PYnZ-32MRX+PgvzhnoAV80zBKMYg61j2f=oHaGfwSsg@mail.gmail.com/

Thanks,
Mauro
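
For readers unfamiliar with the indirection Peter describes in the
quoted text, the wiring can be sketched with a toy model in plain C.
This is not QEMU code: the toy_* types and functions are hypothetical
stand-ins for the gpio-key device and the PL061 controller, and the
sketch leaves out the automatic release of the key after 100ms. Only
the pin number 3 comes from the thread.

```c
#include <assert.h>
#include <stdbool.h>

#define GPIO_PIN_POWER_BUTTON 3   /* value quoted in the thread */
#define TOY_PL061_NUM_PINS    8   /* the PL061 has 8 GPIO lines */

/* Toy stand-in for the PL061: records the level seen on each input. */
struct toy_pl061 {
    bool level[TOY_PL061_NUM_PINS];
};

/* Toy stand-in for gpio-key: one input (index 0) and one output line. */
struct toy_gpio_key {
    struct toy_pl061 *out_dev;  /* controller the output is wired to */
    int out_pin;                /* input pin on that controller */
};

/* Wiring step: connect the key's output to one PL061 input pin. */
void toy_gpio_key_wire(struct toy_gpio_key *key, struct toy_pl061 *gpio,
                       int pin)
{
    assert(pin >= 0 && pin < TOY_PL061_NUM_PINS);
    key->out_dev = gpio;
    key->out_pin = pin;
}

/* Driving the key's only input (index 0) drives its output line. */
void toy_gpio_key_set_input(struct toy_gpio_key *key, int input, bool level)
{
    assert(input == 0);         /* the key device has a single input */
    key->out_dev->level[key->out_pin] = level;
}
```

In this model, wiring the key with GPIO_PIN_POWER_BUTTON and then
asserting input 0 on the key raises pin 3 on the controller, which is
the same relationship as between the 0 passed in virt_powerdown_req()
and the 3 used in create_gpio_keys().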



Thread overview: 42+ messages
2024-07-22  6:45 [PATCH v3 0/7] Add ACPI CPER firmware first error injection for Arm Processor Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
2024-07-30  7:25   ` Igor Mammedov
2024-07-30  8:29     ` Peter Maydell
2024-07-30 11:26       ` Igor Mammedov
2024-08-01 13:15         ` Mauro Carvalho Chehab
2024-08-05 14:04           ` Igor Mammedov
2024-08-05 15:22             ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
2024-07-26 12:30   ` Jonathan Cameron via
2024-07-30  8:36   ` Igor Mammedov
2024-07-31  5:17     ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 3/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
2024-07-30  8:40   ` Igor Mammedov
2024-08-01 12:56     ` Mauro Carvalho Chehab
2024-08-01 14:32       ` Jonathan Cameron via
2024-07-22  6:45 ` [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection Mauro Carvalho Chehab
2024-07-25  9:48   ` Markus Armbruster
2024-07-26 12:46     ` Jonathan Cameron via
2024-07-29 12:49       ` Mauro Carvalho Chehab
2024-07-29 12:21     ` Mauro Carvalho Chehab
2024-07-29 14:32       ` Markus Armbruster
2024-08-01 14:34         ` Mauro Carvalho Chehab
2024-07-26 12:44   ` Jonathan Cameron via
2024-07-29 11:40     ` Mauro Carvalho Chehab
2024-07-30 11:17   ` Igor Mammedov
2024-07-31  7:11     ` Mauro Carvalho Chehab
2024-07-31  8:57       ` Jonathan Cameron via
2024-07-31 10:30         ` Mauro Carvalho Chehab
2024-08-01  8:36         ` Igor Mammedov
2024-08-01 14:26           ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 5/7] target/arm: preserve mpidr value Mauro Carvalho Chehab
2024-07-26 12:50   ` Jonathan Cameron via
2024-07-22  6:45 ` [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
2024-07-30 11:24   ` Igor Mammedov
2024-07-30 11:36     ` Michael S. Tsirkin
2024-07-31  6:05       ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 7/7] acpi/ghes: extend arm error injection logic Mauro Carvalho Chehab
2024-07-25 10:03   ` Markus Armbruster
2024-07-29 11:18     ` Mauro Carvalho Chehab
2024-07-26 13:22   ` Jonathan Cameron via
2024-07-29 11:10     ` Mauro Carvalho Chehab
