* [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation
@ 2024-08-02 21:43 Mauro Carvalho Chehab
2024-08-02 21:43 ` [PATCH v5 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
` (6 more replies)
0 siblings, 7 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:43 UTC
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Ani Sinha,
Dongjiu Geng, Paolo Bonzini, Peter Maydell, Shannon Zhao,
qemu-arm, qemu-devel
Testing OS kernel ACPI APEI CPER support is tricky, as it depends on
having special-purpose hardware and/or BIOS.
With QEMU, it becomes a lot easier, as it can be done via QMP.
This series adds support for injecting CPER records on ARM emulation.
The QEMU-side changes add a QAPI command able to do CPER error injection
on ARM, with a raw data parameter, making it very flexible.
A script is provided in the final patch, implementing support for
ARM Processor CPER error injection according to the ACPI 6.x and
UEFI 2.9A/2.10 specs, via QMP.
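As a rough illustration of what the raw data parameter looks like on the wire, the sketch below builds the QMP message for the ghes-cper command defined in qapi/ghes-cper.json. The field names come from that schema; the helper name build_ghes_cper_command, the placeholder payload, and the choice of the UEFI "ARM Processor Error" section GUID are assumptions made for illustration, not code taken from ghes_inject.py.

```python
import base64
import json

# UEFI CPER section GUID for "ARM Processor Error". Assumed here to be
# the value a client would send; the script's actual choice is not shown.
ARM_PROCESSOR_ERROR_GUID = "e19e3d16-bc11-11e4-9caa-c2051d5d46b0"


def build_ghes_cper_command(guid, payload):
    """Return the JSON text of a ghes-cper QMP command (hypothetical helper)."""
    command = {
        "execute": "ghes-cper",
        "arguments": {
            "cper": {
                "notification-type": guid,
                # raw-data travels as a base64 string, per the v5 changelog
                "raw-data": base64.b64encode(payload).decode("ascii"),
            }
        },
    }
    return json.dumps(command)


# Placeholder bytes only; a real CPER payload follows the UEFI layout.
message = build_ghes_cper_command(ARM_PROCESSOR_ERROR_GUID, b"\x00" * 8)
print(message)
```

Keeping the payload opaque to QEMU is what lets the record layout be generated outside QEMU.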
Injecting such errors can be done using the provided script:
$ ./scripts/ghes_inject.py arm
Error injected.
This produces a simple CPER record, properly handled by the Linux
kernel:
[ 794.983753] {4}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 794.984150] {4}[Hardware Error]: event severity: recoverable
[ 794.984391] {4}[Hardware Error]: Error 0, type: recoverable
[ 794.984652] {4}[Hardware Error]: section_type: ARM processor error
[ 794.984926] {4}[Hardware Error]: MIDR: 0x0000000000000000
[ 794.985184] {4}[Hardware Error]: running state: 0x0
[ 794.985411] {4}[Hardware Error]: Power State Coordination Interface state: 0
[ 794.985720] {4}[Hardware Error]: Error info structure 0:
[ 794.985960] {4}[Hardware Error]: num errors: 2
[ 794.986175] {4}[Hardware Error]: error_type: 0x02: cache error
[ 794.986442] {4}[Hardware Error]: error_info: 0x000000000091000f
[ 794.986755] {4}[Hardware Error]: transaction type: Data Access
[ 794.987027] {4}[Hardware Error]: cache error, operation type: Data write
[ 794.987310] {4}[Hardware Error]: cache level: 2
[ 794.987529] {4}[Hardware Error]: processor context not corrupted
[ 794.987867] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
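The error_type value printed by the kernel is a bitmask, so several error kinds can be combined in one error info structure. A minimal sketch of that decoding, using only the bit names that appear in the kernel output quoted in this message (the function name describe_error_type is hypothetical):

```python
# Bit names as printed by the kernel in the logs of this cover letter.
ERROR_TYPE_BITS = {
    0x02: "cache error",
    0x04: "TLB error",
    0x08: "bus error",
    0x10: "micro-architectural error",
}


def describe_error_type(value):
    """Join the names of all known bits set in an error_type value."""
    names = [name for bit, name in ERROR_TYPE_BITS.items() if value & bit]
    return "|".join(names) if names else "unknown"


print(describe_error_type(0x02))  # single bit, as in the log above
print(describe_error_type(0x14))  # two bits combined
```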
More complex use cases are also possible, such as:
$ ./scripts/ghes_inject.py arm --mpidr 0x444 --running --affinity 1 \
--error-info 12345678 --vendor 0x13,123,4,5,1 --ctx-array 0,1,2,3,4,5 \
-t cache tlb bus micro-arch tlb,micro-arch
Error injected.
[ 899.181246] {5}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 899.181769] {5}[Hardware Error]: event severity: recoverable
[ 899.182069] {5}[Hardware Error]: Error 0, type: recoverable
[ 899.182370] {5}[Hardware Error]: section_type: ARM processor error
[ 899.182689] {5}[Hardware Error]: MIDR: 0x0000000000000000
[ 899.182980] {5}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000000000000
[ 899.183395] {5}[Hardware Error]: error affinity level: 0
[ 899.183683] {5}[Hardware Error]: running state: 0x1
[ 899.183962] {5}[Hardware Error]: Power State Coordination Interface state: 0
[ 899.184332] {5}[Hardware Error]: Error info structure 0:
[ 899.184610] {5}[Hardware Error]: num errors: 2
[ 899.184864] {5}[Hardware Error]: error_type: 0x02: cache error
[ 899.185181] {5}[Hardware Error]: error_info: 0x0000000000bc614e
[ 899.185504] {5}[Hardware Error]: cache level: 2
[ 899.185771] {5}[Hardware Error]: processor context not corrupted
[ 899.186082] {5}[Hardware Error]: Error info structure 1:
[ 899.186366] {5}[Hardware Error]: num errors: 2
[ 899.186613] {5}[Hardware Error]: error_type: 0x04: TLB error
[ 899.186929] {5}[Hardware Error]: error_info: 0x000000000054007f
[ 899.187236] {5}[Hardware Error]: transaction type: Instruction
[ 899.187588] {5}[Hardware Error]: TLB error, operation type: Instruction fetch
[ 899.187962] {5}[Hardware Error]: TLB level: 1
[ 899.188209] {5}[Hardware Error]: processor context not corrupted
[ 899.188535] {5}[Hardware Error]: the error has not been corrected
[ 899.188853] {5}[Hardware Error]: PC is imprecise
[ 899.189114] {5}[Hardware Error]: Error info structure 2:
[ 899.189404] {5}[Hardware Error]: num errors: 2
[ 899.189653] {5}[Hardware Error]: error_type: 0x08: bus error
[ 899.189967] {5}[Hardware Error]: error_info: 0x00000080d6460fff
[ 899.190293] {5}[Hardware Error]: transaction type: Generic
[ 899.190611] {5}[Hardware Error]: bus error, operation type: Generic read (type of instruction or data request cannot be determined)
[ 899.191174] {5}[Hardware Error]: affinity level at which the bus error occurred: 1
[ 899.191563] {5}[Hardware Error]: processor context corrupted
[ 899.191872] {5}[Hardware Error]: the error has been corrected
[ 899.192185] {5}[Hardware Error]: PC is imprecise
[ 899.192445] {5}[Hardware Error]: Program execution can be restarted reliably at the PC associated with the error.
[ 899.192939] {5}[Hardware Error]: participation type: Local processor observed
[ 899.193324] {5}[Hardware Error]: request timed out
[ 899.193596] {5}[Hardware Error]: address space: External Memory Access
[ 899.193945] {5}[Hardware Error]: memory access attributes:0x20
[ 899.194273] {5}[Hardware Error]: access mode: secure
[ 899.194544] {5}[Hardware Error]: Error info structure 3:
[ 899.194838] {5}[Hardware Error]: num errors: 2
[ 899.195088] {5}[Hardware Error]: error_type: 0x10: micro-architectural error
[ 899.195456] {5}[Hardware Error]: error_info: 0x0000000078da03ff
[ 899.195782] {5}[Hardware Error]: Error info structure 4:
[ 899.196070] {5}[Hardware Error]: num errors: 2
[ 899.196331] {5}[Hardware Error]: error_type: 0x14: TLB error|micro-architectural error
[ 899.196733] {5}[Hardware Error]: Context info structure 0:
[ 899.197024] {5}[Hardware Error]: register context type: AArch64 EL1 context registers
[ 899.197427] {5}[Hardware Error]: 00000000: 00000000 00000000
[ 899.197741] {5}[Hardware Error]: Vendor specific error info has 5 bytes:
[ 899.198096] {5}[Hardware Error]: 00000000: 13 7b 04 05 01 .{...
[ 899.198610] [Firmware Warn]: GHES: Unhandled processor error type 0x02: cache error
[ 899.199000] [Firmware Warn]: GHES: Unhandled processor error type 0x04: TLB error
[ 899.199388] [Firmware Warn]: GHES: Unhandled processor error type 0x08: bus error
[ 899.199767] [Firmware Warn]: GHES: Unhandled processor error type 0x10: micro-architectural error
[ 899.200194] [Firmware Warn]: GHES: Unhandled processor error type 0x14: TLB error|micro-architectural error
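The command-line values can be cross-checked against the kernel output above: --error-info 12345678 is decimal and shows up as 0x0000000000bc614e, while --vendor 0x13,123,4,5,1 becomes the five bytes 13 7b 04 05 01. The sketch below reproduces that mapping, assuming the arguments are parsed with automatic base detection; how the real script parses them is not shown in this message, and parse_int_list is a hypothetical helper.

```python
def parse_int_list(text):
    """Parse comma-separated integers; a 0x prefix selects hexadecimal."""
    return [int(item, 0) for item in text.split(",")]


error_info = int("12345678", 0)
vendor = bytes(parse_int_list("0x13,123,4,5,1"))

print(f"error_info: 0x{error_info:016x}")
print("vendor:", vendor.hex(" "))
```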
---
v5:
- CPER guid is now passed as a string;
- raw-data is now passed with base64 encoding;
- Removed several GPIO left-overs from arm/virt.c changes;
- Lots of cleanups and improvements to the error injection script.
It now handles the QMP dialog better and doesn't print debug messages.
Also, the code was split into two modules, to make it easier to add more
error injection commands.
v4:
- CPER generation moved to happen outside QEMU;
- One patch adding support for mpidr query was removed.
v3:
- patch 1 cleanups with some comment changes and adding another place where
the poweroff GPIO define should be used. No changes on other patches (except
due to conflict resolution).
v2:
- added a new patch using a define for GPIO power pin;
- patch 2 changed to also use a define for generic error GPIO pin;
- a couple of cleanups in patch 2, removing unneeded else clauses.
Jonathan Cameron (1):
acpi/ghes: Support GPIO error source
Mauro Carvalho Chehab (6):
arm/virt: place power button pin number on a define
acpi/generic_event_device: add an APEI error device
arm/virt: Wire up GPIO error source for ACPI / GHES
qapi/ghes-cper: add an interface to do generic CPER error injection
acpi/ghes: add support for generic error injection via QAPI
scripts/ghes_inject: add a script to generate GHES error inject
MAINTAINERS | 10 +
hw/acpi/Kconfig | 5 +
hw/acpi/generic_event_device.c | 17 ++
hw/acpi/ghes.c | 178 +++++++++++--
hw/acpi/ghes_cper.c | 45 ++++
hw/acpi/ghes_cper_stub.c | 18 ++
hw/acpi/meson.build | 2 +
hw/arm/Kconfig | 5 +
hw/arm/virt-acpi-build.c | 7 +-
hw/arm/virt.c | 23 +-
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/generic_event_device.h | 3 +
include/hw/acpi/ghes.h | 16 +-
include/hw/arm/virt.h | 4 +
qapi/ghes-cper.json | 55 ++++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
scripts/arm_processor_error.py | 352 +++++++++++++++++++++++++
scripts/ghes_inject.py | 59 +++++
scripts/qmp_helper.py | 249 +++++++++++++++++
20 files changed, 1026 insertions(+), 25 deletions(-)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 hw/acpi/ghes_cper_stub.c
create mode 100644 qapi/ghes-cper.json
create mode 100644 scripts/arm_processor_error.py
create mode 100755 scripts/ghes_inject.py
create mode 100644 scripts/qmp_helper.py
--
2.45.2
* [PATCH v5 1/7] arm/virt: place power button pin number on a define
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
@ 2024-08-02 21:43 ` Mauro Carvalho Chehab
2024-08-06 8:57 ` Igor Mammedov
2024-08-02 21:43 ` [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
` (5 subsequent siblings)
6 siblings, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:43 UTC
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Igor Mammedov, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
Having magic numbers inside the code is not a good idea, as it
is error-prone. So, instead, create a macro with the number
definition.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
hw/arm/virt-acpi-build.c | 6 +++---
hw/arm/virt.c | 7 ++++---
include/hw/arm/virt.h | 3 +++
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index e10cad86dd73..f76fb117adff 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -154,10 +154,10 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
aml_append(dev, aml_name_decl("_CRS", crs));
Aml *aei = aml_resource_template();
- /* Pin 3 for power button */
- const uint32_t pin_list[1] = {3};
+
+ const uint32_t pin = GPIO_PIN_POWER_BUTTON;
aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
- AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
+ AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
"GPO0", NULL, 0));
aml_append(dev, aml_name_decl("_AEI", aei));
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 719e83e6a1e7..687fe0bb8bc9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
if (s->acpi_dev) {
acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
} else {
- /* use gpio Pin 3 for power button event */
+ /* use gpio Pin for power button event */
qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
}
}
@@ -1013,7 +1013,8 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
uint32_t phandle)
{
gpio_key_dev = sysbus_create_simple("gpio-key", -1,
- qdev_get_gpio_in(pl061_dev, 3));
+ qdev_get_gpio_in(pl061_dev,
+ GPIO_PIN_POWER_BUTTON));
qemu_fdt_add_subnode(fdt, "/gpio-keys");
qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
@@ -1024,7 +1025,7 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
qemu_fdt_setprop_cell(fdt, "/gpio-keys/poweroff", "linux,code",
KEY_POWER);
qemu_fdt_setprop_cells(fdt, "/gpio-keys/poweroff",
- "gpios", phandle, 3, 0);
+ "gpios", phandle, GPIO_PIN_POWER_BUTTON, 0);
}
#define SECURE_GPIO_POWEROFF 0
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index ab961bb6a9b8..a4d937ed45ac 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -47,6 +47,9 @@
/* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
#define PVTIME_SIZE_PER_CPU 64
+/* GPIO pins */
+#define GPIO_PIN_POWER_BUTTON 3
+
enum {
VIRT_FLASH,
VIRT_MEM,
--
2.45.2
* [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-08-02 21:43 ` [PATCH v5 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
@ 2024-08-02 21:43 ` Mauro Carvalho Chehab
2024-08-05 16:39 ` Jonathan Cameron via
2024-08-06 8:54 ` Igor Mammedov
2024-08-02 21:43 ` [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
` (4 subsequent siblings)
6 siblings, 2 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:43 UTC
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Igor Mammedov, linux-kernel,
qemu-devel
Add a Generic Event Device to handle generic hardware error
events, supporting General Purpose Event (GPE) notification as specified
in section 18.3.2.7.2 of the ACPI 6.5 specification:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
using HID PNP0C33.
The PNP0C33 device is used to report hardware errors to
the BIOS via ACPI APEI Generic Hardware Error Source (GHES).
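The event routing this patch extends can be modelled as a priority chain from status bits to GED selector bits. The sketch below is a simplified Python model that uses only the constants visible in this patch; ged_selector is a hypothetical name, and the memory and CPU hotplug branches of the C function are omitted.

```python
# AcpiEventStatusBits values visible in this patch.
ACPI_NVDIMM_HOTPLUG_STATUS = 16
ACPI_POWER_DOWN_STATUS = 64
ACPI_GENERIC_ERROR = 128

# GED selector bits visible in this patch.
ACPI_GED_PWR_DOWN_EVT = 0x2
ACPI_GED_NVDIMM_HOTPLUG_EVT = 0x4
ACPI_GED_ERROR_EVT = 0x10


def ged_selector(ev):
    """Model of the if/else chain in acpi_ged_send_event() (subset only)."""
    if ev & ACPI_POWER_DOWN_STATUS:
        return ACPI_GED_PWR_DOWN_EVT
    if ev & ACPI_GENERIC_ERROR:
        return ACPI_GED_ERROR_EVT
    if ev & ACPI_NVDIMM_HOTPLUG_STATUS:
        return ACPI_GED_NVDIMM_HOTPLUG_EVT
    return None  # other events are left out of this sketch


print(hex(ged_selector(ACPI_GENERIC_ERROR)))
```

Because the chain is if/else, only one selector is produced per call; in this model a power-down request takes precedence over a generic error raised in the same call.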
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/generic_event_device.c | 17 +++++++++++++++++
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/generic_event_device.h | 3 +++
3 files changed, 21 insertions(+)
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 15b4c3ebbf24..b9ad05e98c05 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
ACPI_GED_PWR_DOWN_EVT,
ACPI_GED_NVDIMM_HOTPLUG_EVT,
ACPI_GED_CPU_HOTPLUG_EVT,
+ ACPI_GED_ERROR_EVT
};
/*
@@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
aml_int(0x80)));
break;
+ case ACPI_GED_ERROR_EVT:
+ aml_append(if_ctx,
+ aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
+ aml_int(0x80)));
+ break;
case ACPI_GED_NVDIMM_HOTPLUG_EVT:
aml_append(if_ctx,
aml_notify(aml_name("\\_SB.NVDR"),
@@ -153,6 +159,15 @@ void acpi_dsdt_add_power_button(Aml *scope)
aml_append(scope, dev);
}
+void acpi_dsdt_add_error_device(Aml *scope)
+{
+ Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
+ aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+ aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+ aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
+ aml_append(scope, dev);
+}
+
/* Memory read by the GED _EVT AML dynamic method */
static uint64_t ged_evt_read(void *opaque, hwaddr addr, unsigned size)
{
@@ -295,6 +310,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
sel = ACPI_GED_MEM_HOTPLUG_EVT;
} else if (ev & ACPI_POWER_DOWN_STATUS) {
sel = ACPI_GED_PWR_DOWN_EVT;
+ } else if (ev & ACPI_GENERIC_ERROR) {
+ sel = ACPI_GED_ERROR_EVT;
} else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
} else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50aa..8294f8f0ccca 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
ACPI_NVDIMM_HOTPLUG_STATUS = 16,
ACPI_VMGENID_CHANGE_STATUS = 32,
ACPI_POWER_DOWN_STATUS = 64,
+ ACPI_GENERIC_ERROR = 128,
} AcpiEventStatusBits;
#define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
index 40af3550b56d..b8f2f1328e0c 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -66,6 +66,7 @@
#include "qom/object.h"
#define ACPI_POWER_BUTTON_DEVICE "PWRB"
+#define ACPI_APEI_ERROR_DEVICE "GEDD"
#define TYPE_ACPI_GED "acpi-ged"
OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
@@ -98,6 +99,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
#define ACPI_GED_PWR_DOWN_EVT 0x2
#define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
#define ACPI_GED_CPU_HOTPLUG_EVT 0x8
+#define ACPI_GED_ERROR_EVT 0x10
typedef struct GEDState {
MemoryRegion evt;
@@ -120,5 +122,6 @@ struct AcpiGedState {
void build_ged_aml(Aml *table, const char* name, HotplugHandler *hotplug_dev,
uint32_t ged_irq, AmlRegionSpace rs, hwaddr ged_base);
void acpi_dsdt_add_power_button(Aml *scope);
+void acpi_dsdt_add_error_device(Aml *scope);
#endif
--
2.45.2
* [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-08-02 21:43 ` [PATCH v5 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
2024-08-02 21:43 ` [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
@ 2024-08-02 21:43 ` Mauro Carvalho Chehab
2024-08-05 16:54 ` Jonathan Cameron via
2024-08-06 9:15 ` Igor Mammedov
2024-08-02 21:43 ` [PATCH v5 4/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
` (3 subsequent siblings)
6 siblings, 2 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:43 UTC
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Igor Mammedov,
Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
Add support to ARM virtualization for handling
a General Purpose Event (GPE) via the GED error device.
It is aligned with the Linux kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 3 +++
hw/arm/virt-acpi-build.c | 1 +
hw/arm/virt.c | 16 +++++++++++++++-
include/hw/acpi/ghes.h | 3 +++
include/hw/arm/virt.h | 1 +
5 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..8d0262e6c1aa 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -444,6 +444,9 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
return ret;
}
+NotifierList generic_error_notifiers =
+ NOTIFIER_LIST_INITIALIZER(generic_error_notifiers);
+
bool acpi_ghes_present(void)
{
AcpiGedState *acpi_ged_state;
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f76fb117adff..f8bbe3e7a0b8 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
}
acpi_dsdt_add_power_button(scope);
+ acpi_dsdt_add_error_device(scope);
#ifdef CONFIG_TPM
acpi_dsdt_add_tpm(scope, vms);
#endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 687fe0bb8bc9..8b315328154f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -73,6 +73,7 @@
#include "standard-headers/linux/input.h"
#include "hw/arm/smmuv3.h"
#include "hw/acpi/acpi.h"
+#include "hw/acpi/ghes.h"
#include "target/arm/cpu-qom.h"
#include "target/arm/internals.h"
#include "target/arm/multiprocessing.h"
@@ -677,7 +678,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
DeviceState *dev;
MachineState *ms = MACHINE(vms);
int irq = vms->irqmap[VIRT_ACPI_GED];
- uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+ uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
if (ms->ram_slots) {
event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -1009,6 +1010,15 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
}
}
+static void virt_generic_error_req(Notifier *n, void *opaque)
+{
+ VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
+
+ if (s->acpi_dev) {
+ acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
+ }
+}
+
static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
uint32_t phandle)
{
@@ -2397,6 +2407,10 @@ static void machvirt_init(MachineState *machine)
vms->powerdown_notifier.notify = virt_powerdown_req;
qemu_register_powerdown_notifier(&vms->powerdown_notifier);
+ vms->generic_error_notifier.notify = virt_generic_error_req;
+ notifier_list_add(&generic_error_notifiers,
+ &vms->generic_error_notifier);
+
/* Create mmio transports, so the user can create virtio backends
* (which will be automatically plugged in to the transports). If
* no backend is created the transport will just sit harmlessly idle.
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..6891eafff5ab 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,9 @@
#define ACPI_GHES_H
#include "hw/acpi/bios-linker-loader.h"
+#include "qemu/notify.h"
+
+extern NotifierList generic_error_notifiers;
/*
* Values for Hardware Error Notification Type field
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a4d937ed45ac..ad9f6e94dcc5 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -175,6 +175,7 @@ struct VirtMachineState {
DeviceState *gic;
DeviceState *acpi_dev;
Notifier powerdown_notifier;
+ Notifier generic_error_notifier;
PCIBus *bus;
char *oem_id;
char *oem_table_id;
--
2.45.2
* [PATCH v5 4/7] acpi/ghes: Support GPIO error source
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (2 preceding siblings ...)
2024-08-02 21:43 ` [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
@ 2024-08-02 21:43 ` Mauro Carvalho Chehab
2024-08-05 16:56 ` Jonathan Cameron via
2024-08-06 9:32 ` Igor Mammedov
2024-08-02 21:44 ` [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
` (2 subsequent siblings)
6 siblings, 2 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:43 UTC
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Igor Mammedov, linux-kernel, qemu-arm, qemu-devel,
Mauro Carvalho Chehab
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Add error notification to GHES v2 using the GPIO source.
[mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks]
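With two error sources, the source ID doubles as an index into a table of 8-byte error block address slots, which is what the start_addr arithmetic in this patch relies on. A minimal sketch of that indexing follows; the enum values match include/hw/acpi/ghes.h after this patch, while error_block_slot and the base address are hypothetical and only for illustration.

```python
ADDRESS_SLOT_SIZE = 8  # sizeof(uint64_t) in the C code

# Source IDs from the enum in include/hw/acpi/ghes.h after this patch.
ACPI_HEST_SRC_ID_SEA = 0
ACPI_HEST_SRC_ID_GPIO = 1
ACPI_HEST_SRC_ID_RESERVED = 2


def error_block_slot(ghes_addr, source_id):
    """Address of the 8-byte slot holding a source's error block address."""
    if not 0 <= source_id < ACPI_HEST_SRC_ID_RESERVED:
        raise ValueError(f"invalid GHES source id: {source_id}")
    return ghes_addr + source_id * ADDRESS_SLOT_SIZE


base = 0x1000  # hypothetical guest address, for illustration only
print(hex(error_block_slot(base, ACPI_HEST_SRC_ID_SEA)))
print(hex(error_block_slot(base, ACPI_HEST_SRC_ID_GPIO)))
```

The sketch validates the ID at lookup time; the patch instead asserts on the source ID when the table is built and drops the range check at lookup.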
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 16 ++++++++++------
include/hw/acpi/ghes.h | 3 ++-
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 8d0262e6c1aa..a745dcc7be5e 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -34,8 +34,8 @@
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT 1
+/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
+#define ACPI_GHES_ERROR_SOURCE_COUNT 2
/* Generic Hardware Error Source version 2 */
#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
@@ -290,6 +290,9 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
{
uint64_t address_offset;
+
+ assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
+
/*
* Type:
* Generic Hardware Error Source version 2(GHESv2 - Type 10)
@@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
*/
build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
break;
+ case ACPI_HEST_SRC_ID_GPIO:
+ build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
+ break;
default:
error_report("Not support this error source");
abort();
@@ -370,6 +376,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
/* Error Source Count */
build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+ build_ghes_v2(table_data, ACPI_HEST_SRC_ID_GPIO, linker);
acpi_table_end(linker, &table);
}
@@ -406,10 +413,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
start_addr = le64_to_cpu(ags->ghes_addr_le);
if (physical_address) {
-
- if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
- start_addr += source_id * sizeof(uint64_t);
- }
+ start_addr += source_id * sizeof(uint64_t);
cpu_physical_memory_read(start_addr, &error_block_addr,
sizeof(error_block_addr));
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 6891eafff5ab..33be1eb5acf4 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -59,9 +59,10 @@ enum AcpiGhesNotifyType {
ACPI_GHES_NOTIFY_RESERVED = 12
};
+/* Those are used as table indexes when building GHES tables */
enum {
ACPI_HEST_SRC_ID_SEA = 0,
- /* future ids go here */
+ ACPI_HEST_SRC_ID_GPIO,
ACPI_HEST_SRC_ID_RESERVED,
};
--
2.45.2
* [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (3 preceding siblings ...)
2024-08-02 21:43 ` [PATCH v5 4/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
@ 2024-08-02 21:44 ` Mauro Carvalho Chehab
2024-08-05 17:00 ` Jonathan Cameron via
` (3 more replies)
2024-08-02 21:44 ` [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI Mauro Carvalho Chehab
2024-08-02 21:44 ` [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
6 siblings, 4 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:44 UTC
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Eric Blake,
Igor Mammedov, Markus Armbruster, Michael Roth, Paolo Bonzini,
Peter Maydell, linux-kernel, qemu-arm, qemu-devel
Create a QMP command to be used for generic ACPI APEI hardware error
injection (HEST) via GHESv2.
The actual GHES code will be added in the follow-up patch.
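The decoding done on the QEMU side amounts to parsing the GUID string, converting it to little-endian byte order, and base64-decoding the payload. The Python sketch below mirrors those steps under one stated assumption: that qemu_uuid_bswap() swaps the first three UUID fields, which is what Python's bytes_le does. decode_cper is a hypothetical name, not part of the series.

```python
import base64
import uuid


def decode_cper(notification_type, raw_data):
    """Return (little-endian GUID bytes, payload bytes) or raise ValueError."""
    try:
        guid = uuid.UUID(notification_type)
    except ValueError:
        raise ValueError(f"GHES: Invalid UUID: {notification_type}") from None
    # bytes_le byte-swaps the first three fields; assumed here to match
    # the layout the C code obtains from qemu_uuid_bswap().
    payload = base64.b64decode(raw_data, validate=True)
    return guid.bytes_le, payload


guid_le, payload = decode_cper("e19e3d16-bc11-11e4-9caa-c2051d5d46b0", "AAEC")
print(guid_le.hex(), payload.hex())
```

Malformed base64 surfaces as binascii.Error, a ValueError subclass, so callers of this sketch can handle both failure modes with one except clause.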
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
MAINTAINERS | 7 +++++
hw/acpi/Kconfig | 5 ++++
hw/acpi/ghes_cper.c | 45 ++++++++++++++++++++++++++++++++
hw/acpi/ghes_cper_stub.c | 18 +++++++++++++
hw/acpi/meson.build | 2 ++
hw/arm/Kconfig | 5 ++++
include/hw/acpi/ghes.h | 7 +++++
qapi/ghes-cper.json | 55 ++++++++++++++++++++++++++++++++++++++++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
10 files changed, 146 insertions(+)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 hw/acpi/ghes_cper_stub.c
create mode 100644 qapi/ghes-cper.json
diff --git a/MAINTAINERS b/MAINTAINERS
index 98eddf7ae155..655edcb6688c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
F: include/hw/acpi/ghes.h
F: docs/specs/acpi_hest_ghes.rst
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+S: Maintained
+F: hw/acpi/ghes_cper.c
+F: hw/acpi/ghes_cper_stub.c
+F: qapi/ghes-cper.json
+
ppc4xx
L: qemu-ppc@nongnu.org
S: Orphan
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index e07d3204eb36..73ffbb82c150 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,11 @@ config ACPI_APEI
bool
depends on ACPI
+config GHES_CPER
+ bool
+ depends on ACPI_APEI
+ default y
+
config ACPI_PCI
bool
depends on ACPI && PCI
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
new file mode 100644
index 000000000000..7aa7e71e90dc
--- /dev/null
+++ b/hw/acpi/ghes_cper.c
@@ -0,0 +1,45 @@
+/*
+ * ARM Processor error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/base64.h"
+#include "qemu/error-report.h"
+#include "qemu/uuid.h"
+#include "qapi/qapi-commands-ghes-cper.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
+ Error **errp)
+{
+ int rc;
+ AcpiGhesCper cper;
+ QemuUUID be_uuid, le_uuid;
+
+ rc = qemu_uuid_parse(qmp_cper->notification_type, &be_uuid);
+ if (rc) {
+ error_setg(errp, "GHES: Invalid UUID: %s",
+ qmp_cper->notification_type);
+ return;
+ }
+
+ le_uuid = qemu_uuid_bswap(be_uuid);
+ cper.guid = le_uuid.data;
+
+ cper.data = qbase64_decode(qmp_cper->raw_data, -1,
+ &cper.data_len, errp);
+ if (!cper.data) {
+ return;
+ }
+
+ /* TODO: call a function at ghes */
+
+ g_free(cper.data);
+}
diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
new file mode 100644
index 000000000000..7ce6ed70a265
--- /dev/null
+++ b/hw/acpi/ghes_cper_stub.c
@@ -0,0 +1,18 @@
+/*
+ * ARM Processor error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-ghes-cper.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_ghes_cper(CommonPlatformErrorRecord *cper, Error **errp)
+{
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fa5c07db9068..6cbf430eb66d 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -34,4 +34,6 @@ endif
system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
+system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
+system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
system_ss.add(files('acpi-qmp-cmds.c'))
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 1ad60da7aa2d..bed6ba27d715 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -712,3 +712,8 @@ config ARMSSE
select UNIMP
select SSE_COUNTER
select SSE_TIMER
+
+config GHES_CPER
+ bool
+ depends on ARM
+ default y if AARCH64
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 33be1eb5acf4..06a5b8820cd5 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
#define ACPI_GHES_H
#include "hw/acpi/bios-linker-loader.h"
+#include "qapi/error.h"
#include "qemu/notify.h"
extern NotifierList generic_error_notifiers;
@@ -78,6 +79,12 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+typedef struct AcpiGhesCper {
+ uint8_t *guid;
+ uint8_t *data;
+ size_t data_len;
+} AcpiGhesCper;
+
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
*
diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
new file mode 100644
index 000000000000..3cc4f9f2aaa9
--- /dev/null
+++ b/qapi/ghes-cper.json
@@ -0,0 +1,55 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# = GHESv2 CPER Error Injection
+#
+# These are defined at
+# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
+# (GHESv2 - Type 10)
+##
+
+##
+# @CommonPlatformErrorRecord:
+#
+# Common Platform Error Record - CPER - as defined at the UEFI
+# specification. See
+# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
+# for more details.
+#
+# @notification-type: pre-assigned GUID string indicating the record
+# association with an error event notification type, as defined
+# at https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
+#
+# @raw-data: Contains a base64 encoded string with the payload of
+# the CPER.
+#
+# Since: 9.2
+##
+{ 'struct': 'CommonPlatformErrorRecord',
+ 'data': {
+ 'notification-type': 'str',
+ 'raw-data': 'str'
+ }
+}
+
+##
+# @ghes-cper:
+#
+# Inject an ARM Processor error with data to be filled according to
+# the ACPI 6.2 GHESv2 spec.
+#
+# @cper: a single CPER record to be sent to the guest OS.
+#
+# Features:
+#
+# @unstable: This command is experimental.
+#
+# Since: 9.2
+##
+{ 'command': 'ghes-cper',
+ 'data': {
+ 'cper': 'CommonPlatformErrorRecord'
+ },
+ 'features': [ 'unstable' ]
+}
diff --git a/qapi/meson.build b/qapi/meson.build
index e7bc54e5d047..bd13cd7d40c9 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -35,6 +35,7 @@ qapi_all_modules = [
'dump',
'ebpf',
'error',
+ 'ghes-cper',
'introspect',
'job',
'machine-common',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index b1581988e4eb..c1a267399fe5 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -75,6 +75,7 @@
{ 'include': 'misc-target.json' }
{ 'include': 'audio.json' }
{ 'include': 'acpi.json' }
+{ 'include': 'ghes-cper.json' }
{ 'include': 'pci.json' }
{ 'include': 'stats.json' }
{ 'include': 'virtio.json' }
--
2.45.2
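
As an illustration of the schema above, here is a minimal sketch of how a
client could assemble the ghes-cper command. The function name
build_ghes_cper_command is hypothetical and not part of the series; the
GUID is the ARM Processor Error one used by the helper script in patch 7/7,
and the payload here is just placeholder bytes, not a valid CPER section.

```python
import base64
import json


def build_ghes_cper_command(guid: str, payload: bytes) -> str:
    """Return the JSON text of a ghes-cper QMP command (illustrative helper)."""
    command = {
        "execute": "ghes-cper",
        "arguments": {
            "cper": {
                "notification-type": guid,
                # raw-data carries the CPER payload as base64 text
                "raw-data": base64.b64encode(payload).decode("ascii"),
            }
        },
    }
    return json.dumps(command)


# ARM Processor Error section GUID, as built by to_guid() in patch 7/7
ARM_PROCESSOR_GUID = "e19e3d16-bc11-11e4-9caa-c2051d5d46b0"

if __name__ == "__main__":
    # Placeholder payload: four zero bytes, only to show the encoding step
    print(build_ghes_cper_command(ARM_PROCESSOR_GUID, bytes(4)))
```

The resulting string would be sent after the usual qmp_capabilities
negotiation on the QMP socket.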
* [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (4 preceding siblings ...)
2024-08-02 21:44 ` [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
@ 2024-08-02 21:44 ` Mauro Carvalho Chehab
2024-08-05 17:03 ` Jonathan Cameron via
` (2 more replies)
2024-08-02 21:44 ` [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
6 siblings, 3 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:44 UTC (permalink / raw)
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Igor Mammedov,
linux-kernel, qemu-arm, qemu-devel
Provide a generic interface for error injection via GHESv2.
This patch is co-authored:
- original ghes logic to inject a simple ARM record by Shiju Jose;
- generic logic to handle block addresses by Jonathan Cameron;
- generic GHESv2 error inject by Mauro Carvalho Chehab;
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 159 ++++++++++++++++++++++++++++++++++++++---
hw/acpi/ghes_cper.c | 2 +-
include/hw/acpi/ghes.h | 3 +
3 files changed, 152 insertions(+), 12 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index a745dcc7be5e..e125c9475773 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
+static uint64_t ghes_get_state_start_address(void)
+{
+ AcpiGedState *acpi_ged_state =
+ ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
+ AcpiGhesState *ags = &acpi_ged_state->ghes_state;
+
+ return le64_to_cpu(ags->ghes_addr_le);
+}
+
int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
{
uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
- uint64_t start_addr;
+ uint64_t start_addr = ghes_get_state_start_address();
bool ret = -1;
- AcpiGedState *acpi_ged_state;
- AcpiGhesState *ags;
-
assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
- acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
- NULL));
- g_assert(acpi_ged_state);
- ags = &acpi_ged_state->ghes_state;
-
- start_addr = le64_to_cpu(ags->ghes_addr_le);
-
if (physical_address) {
start_addr += source_id * sizeof(uint64_t);
@@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
return ret;
}
+/*
+ * Error register block data layout
+ *
+ * | +---------------------+ ges.ghes_addr_le
+ * | |error_block_address0 |
+ * | +---------------------+
+ * | |error_block_address1 |
+ * | +---------------------+ --+--
+ * | | ............. | GHES_ADDRESS_SIZE
+ * | +---------------------+ --+--
+ * | |error_block_addressN |
+ * | +---------------------+
+ * | | read_ack0 |
+ * | +---------------------+ --+--
+ * | | read_ack1 | GHES_ADDRESS_SIZE
+ * | +---------------------+ --+--
+ * | | ............. |
+ * | +---------------------+
+ * | | read_ackN |
+ * | +---------------------+ --+--
+ * | | CPER | |
+ * | | .... | GHES_MAX_RAW_DATA_LENGT
+ * | | CPER | |
+ * | +---------------------+ --+--
+ * | | .......... |
+ * | +---------------------+
+ * | | CPER |
+ * | | .... |
+ * | | CPER |
+ * | +---------------------+
+ */
+
+/* Map from uint32_t notify to entry offset in GHES */
+static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
+ 0xff, 0xff, 0xff, 1, 0};
+
+static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
+ uint64_t *read_ack_addr)
+{
+ uint64_t base;
+
+ if (notify >= ARRAY_SIZE(error_source_to_index)) {
+ return false;
+ }
+
+ /* Find and check the source id for this new CPER */
+ if (error_source_to_index[notify] == 0xff) {
+ return false;
+ }
+
+ base = ghes_get_state_start_address();
+
+ *read_ack_addr = base +
+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+ error_source_to_index[notify] * sizeof(uint64_t);
+
+ /* Could also be read back from the error_block_address register */
+ *error_block_addr = base +
+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
+ error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
+
+ return true;
+}
+
NotifierList generic_error_notifiers =
NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
+void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
+ uint32_t notify)
+{
+ uint64_t read_ack = 0;
+ uint32_t i;
+ uint64_t read_ack_addr = 0;
+ uint64_t error_block_addr = 0;
+ uint32_t data_length;
+ GArray *block;
+
+ if (!ghes_get_addr(notify, &error_block_addr, &read_ack_addr)) {
+ error_setg(errp, "GHES: Invalid error block/ack address(es)");
+ return;
+ }
+
+ cpu_physical_memory_read(read_ack_addr,
+ &read_ack, sizeof(uint64_t));
+
+ /* zero means OSPM does not acknowledge the error */
+ if (!read_ack) {
+ error_setg(errp,
+ "Last CPER record was not acknowledged yet");
+ read_ack = 1;
+ cpu_physical_memory_write(read_ack_addr,
+ &read_ack, sizeof(uint64_t));
+ return;
+ }
+
+ read_ack = cpu_to_le64(0);
+ cpu_physical_memory_write(read_ack_addr,
+ &read_ack, sizeof(uint64_t));
+
+ /* Build CPER record */
+
+ /*
+ * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
+ * Table 17-13 Generic Error Data Entry
+ */
+ QemuUUID fru_id = {};
+
+ block = g_array_new(false, true /* clear */, 1);
+ data_length = ACPI_GHES_DATA_LENGTH + cper->data_len;
+
+ /*
+ * It should not run out of the preallocated memory if
+ * adding a new generic error data entry
+ */
+ assert((data_length + ACPI_GHES_GESB_SIZE) <=
+ ACPI_GHES_MAX_RAW_DATA_LENGTH);
+
+ /* Build the new generic error status block header */
+ acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
+ 0, 0, data_length,
+ ACPI_CPER_SEV_RECOVERABLE);
+
+ /* Build this new generic error data entry header */
+ acpi_ghes_generic_error_data(block, cper->guid,
+ ACPI_CPER_SEV_RECOVERABLE, 0, 0,
+ cper->data_len, fru_id, 0);
+
+ /* Add CPER data */
+ for (i = 0; i < cper->data_len; i++) {
+ build_append_int_noprefix(block, cper->data[i], 1);
+ }
+
+ /* Write the generic error data entry into guest memory */
+ cpu_physical_memory_write(error_block_addr, block->data, block->len);
+
+ g_array_free(block, true);
+
+ notifier_list_notify(&generic_error_notifiers, NULL);
+}
+
bool acpi_ghes_present(void)
{
AcpiGedState *acpi_ged_state;
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
index 7aa7e71e90dc..d7ff7debee74 100644
--- a/hw/acpi/ghes_cper.c
+++ b/hw/acpi/ghes_cper.c
@@ -39,7 +39,7 @@ void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
return;
}
- /* TODO: call a function at ghes */
+ ghes_record_cper_errors(&cper, errp, ACPI_GHES_NOTIFY_GPIO);
g_free(cper.data);
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 06a5b8820cd5..ee6f6cd96911 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -85,6 +85,9 @@ typedef struct AcpiGhesCper {
size_t data_len;
} AcpiGhesCper;
+void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
+ uint32_t notify);
+
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
*
--
2.45.2
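
The address arithmetic done by ghes_get_addr() in this patch can be sketched
in a few lines of Python, which may help when checking the layout diagram.
This is only a model: the two constants below are assumed values standing in
for ACPI_GHES_ERROR_SOURCE_COUNT and ACPI_GHES_MAX_RAW_DATA_LENGTH, and the
bounds check uses the table length rather than the C enum limit.

```python
# Assumed values for illustration; the real ones come from QEMU's GHES code.
ERROR_SOURCE_COUNT = 2       # stands in for ACPI_GHES_ERROR_SOURCE_COUNT
MAX_RAW_DATA_LENGTH = 1024   # stands in for ACPI_GHES_MAX_RAW_DATA_LENGTH
ADDRESS_SIZE = 8             # sizeof(uint64_t)

# Same content as error_source_to_index[] in the patch: notify type -> index
ERROR_SOURCE_TO_INDEX = [0xFF] * 7 + [1, 0]


def ghes_get_addr(base, notify):
    """Return (read_ack_addr, error_block_addr), or None if notify is unmapped."""
    if notify >= len(ERROR_SOURCE_TO_INDEX):
        return None
    index = ERROR_SOURCE_TO_INDEX[notify]
    if index == 0xFF:
        return None

    # The read_ack registers follow the error_block_address registers
    read_ack_addr = (base + ERROR_SOURCE_COUNT * ADDRESS_SIZE
                     + index * ADDRESS_SIZE)

    # The CPER storage follows both register arrays
    error_block_addr = (base + 2 * ERROR_SOURCE_COUNT * ADDRESS_SIZE
                        + index * MAX_RAW_DATA_LENGTH)

    return read_ack_addr, error_block_addr


if __name__ == "__main__":
    # Notify type 7 is GPIO in the table above
    print([hex(a) for a in ghes_get_addr(0x1000, 7)])
```

With the assumed constants, notify type 7 maps to index 1, so its read_ack
register sits 24 bytes after the base address.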
* [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (5 preceding siblings ...)
2024-08-02 21:44 ` [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI Mauro Carvalho Chehab
@ 2024-08-02 21:44 ` Mauro Carvalho Chehab
2024-08-06 14:56 ` Igor Mammedov
` (2 more replies)
6 siblings, 3 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-02 21:44 UTC (permalink / raw)
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Cleber Rosa,
John Snow, linux-kernel, qemu-devel
Using the QMP GHESv2 API requires preparing a raw data array
containing a CPER record.
Add a helper script with subcommands to prepare such data.
Currently, only ARM Processor error CPER record is supported.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
MAINTAINERS | 3 +
scripts/arm_processor_error.py | 352 +++++++++++++++++++++++++++++++++
scripts/ghes_inject.py | 59 ++++++
scripts/qmp_helper.py | 249 +++++++++++++++++++++++
4 files changed, 663 insertions(+)
create mode 100644 scripts/arm_processor_error.py
create mode 100755 scripts/ghes_inject.py
create mode 100644 scripts/qmp_helper.py
diff --git a/MAINTAINERS b/MAINTAINERS
index 655edcb6688c..e490f69da1de 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2081,6 +2081,9 @@ S: Maintained
F: hw/arm/ghes_cper.c
F: hw/acpi/ghes_cper_stub.c
F: qapi/ghes-cper.json
+F: scripts/ghes_inject.py
+F: scripts/arm_processor_error.py
+F: scripts/qmp_helper.py
ppc4xx
L: qemu-ppc@nongnu.org
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
new file mode 100644
index 000000000000..df4efa508790
--- /dev/null
+++ b/scripts/arm_processor_error.py
@@ -0,0 +1,352 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301, C0114, R0912, R0913, R0914, R0915, W0511
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+# TODO: current implementation has dummy defaults.
+#
+# For a better implementation, a QMP addition/call is needed to
+# retrieve some data for ARM Processor Error injection:
+#
+# - machine emulation architecture, as ARM current default is
+# for AArch64;
+# - ARM registers: power_state, midr, mpidr.
+
+import argparse
+import json
+
+from qmp_helper import (qmp_command, get_choice, get_mult_array,
+ get_mult_choices, get_mult_int, bit,
+ data_add, to_guid)
+
+# Arm processor EINJ logic
+#
+ACPI_GHES_ARM_CPER_LENGTH = 40
+ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
+
+# TODO: query it from emulation. Current default valid only for Aarch64
+CONTEXT_AARCH64_EL1 = 5
+
+class ArmProcessorEinj:
+ """
+ Implements ARM Processor Error injection via GHES
+ """
+
+ def __init__(self):
+ """Initialize the error injection class"""
+
+ # Valid choice values
+ self.arm_valid_bits = {
+ "mpidr": bit(0),
+ "affinity": bit(1),
+ "running": bit(2),
+ "vendor": bit(3),
+ }
+
+ self.pei_flags = {
+ "first": bit(0),
+ "last": bit(1),
+ "propagated": bit(2),
+ "overflow": bit(3),
+ }
+
+ self.pei_error_types = {
+ "cache": bit(1),
+ "tlb": bit(2),
+ "bus": bit(3),
+ "micro-arch": bit(4),
+ }
+
+ self.pei_valid_bits = {
+ "multiple-error": bit(0),
+ "flags": bit(1),
+ "error-info": bit(2),
+ "virt-addr": bit(3),
+ "phy-addr": bit(4),
+ }
+
+ self.data = bytearray()
+
+ def create_subparser(self, subparsers):
+ """Add a subparser to handle the error fields"""
+
+ parser = subparsers.add_parser("arm",
+ help="Generate an ARM processor CPER")
+
+ arm_valid_bits = ",".join(self.arm_valid_bits.keys())
+ flags = ",".join(self.pei_flags.keys())
+ error_types = ",".join(self.pei_error_types.keys())
+ pei_valid_bits = ",".join(self.pei_valid_bits.keys())
+
+ # UEFI N.16 ARM Validation bits
+ g_arm = parser.add_argument_group("ARM processor")
+ g_arm.add_argument("--arm", "--arm-valid",
+ help=f"ARM valid bits: {arm_valid_bits}")
+ g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level",
+ type=lambda x: int(x, 0),
+ help="Affinity level (when multiple levels apply)")
+ g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
+ help="Multiprocessor Affinity Register")
+ g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
+ help="Main ID Register")
+ g_arm.add_argument("-r", "--running",
+ action=argparse.BooleanOptionalAction,
+ default=None,
+ help="Indicates if the processor is running or not")
+ g_arm.add_argument("--psci", "--psci-state",
+ type=lambda x: int(x, 0),
+ help="Power State Coordination Interface - PSCI state")
+
+ # TODO: Add vendor-specific support
+
+ # UEFI N.17 bitmaps (type and flags)
+ g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
+ g_pei.add_argument("-t", "--type", nargs="+",
+ help=f"one or more error types: {error_types}")
+ g_pei.add_argument("-f", "--flags", nargs="*",
+ help=f"zero or more error flags: {flags}")
+ g_pei.add_argument("-V", "--pei-valid", "--error-valid", nargs="*",
+ help=f"zero or more PEI valid bits: {pei_valid_bits}")
+
+ # UEFI N.17 Integer values
+ g_pei.add_argument("-m", "--multiple-error", nargs="+",
+ help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")
+ g_pei.add_argument("-e", "--error-info", nargs="+",
+ help="Error information (UEFI 2.10 tables N.18 to N.20)")
+ g_pei.add_argument("-p", "--physical-address", nargs="+",
+ help="Physical address")
+ g_pei.add_argument("-v", "--virtual-address", nargs="+",
+ help="Virtual address")
+
+ # UEFI N.21 Context
+ g_ctx = parser.add_argument_group("Processor Context")
+ g_ctx.add_argument("--ctx-type", "--context-type", nargs="*",
+ help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")
+ g_ctx.add_argument("--ctx-size", "--context-size", nargs="*",
+ help="Minimal size of the context")
+ g_ctx.add_argument("--ctx-array", "--context-array", nargs="*",
+ help="Comma-separated arrays for each context")
+
+ # Vendor-specific data
+ g_vendor = parser.add_argument_group("Vendor-specific data")
+ g_vendor.add_argument("--vendor", "--vendor-specific", nargs="+",
+ help="Vendor-specific byte arrays of data")
+
+ def parse_args(self, args):
+ """Parse subcommand arguments"""
+
+ cper = {}
+ pei = {}
+ ctx = {}
+ vendor = {}
+
+ arg = vars(args)
+
+ # Handle global parameters
+ if args.arm:
+ arm_valid_init = False
+ cper["valid"] = get_choice(name="valid",
+ value=args.arm,
+ choices=self.arm_valid_bits,
+ suffixes=["-error", "-err"])
+ else:
+ cper["valid"] = 0
+ arm_valid_init = True
+
+ if "running" in arg:
+ if args.running:
+ cper["running-state"] = bit(0)
+ else:
+ cper["running-state"] = 0
+ else:
+ cper["running-state"] = 0
+
+ if arm_valid_init:
+ if args.affinity:
+ cper["valid"] |= self.arm_valid_bits["affinity"]
+
+ if args.mpidr:
+ cper["valid"] |= self.arm_valid_bits["mpidr"]
+
+ if "running-state" in cper:
+ cper["valid"] |= self.arm_valid_bits["running"]
+
+ if args.psci:
+ cper["valid"] |= self.arm_valid_bits["running"]
+
+ # Handle PEI
+ if not args.type:
+ args.type = ["cache-error"]
+
+ get_mult_choices(
+ pei,
+ name="valid",
+ values=args.pei_valid,
+ choices=self.pei_valid_bits,
+ suffixes=["-valid", "-info", "--information", "--addr"],
+ )
+ get_mult_choices(
+ pei,
+ name="type",
+ values=args.type,
+ choices=self.pei_error_types,
+ suffixes=["-error", "-err"],
+ )
+ get_mult_choices(
+ pei,
+ name="flags",
+ values=args.flags,
+ choices=self.pei_flags,
+ suffixes=["-error", "-cap"],
+ )
+ get_mult_int(pei, "error-info", args.error_info)
+ get_mult_int(pei, "multiple-error", args.multiple_error)
+ get_mult_int(pei, "phy-addr", args.physical_address)
+ get_mult_int(pei, "virt-addr", args.virtual_address)
+
+ # Handle context
+ get_mult_int(ctx, "type", args.ctx_type, allow_zero=True)
+ get_mult_int(ctx, "minimal-size", args.ctx_size, allow_zero=True)
+ get_mult_array(ctx, "register", args.ctx_array, allow_zero=True)
+
+ get_mult_array(vendor, "bytes", args.vendor, max_val=255)
+
+ # Store PEI
+ pei_data = bytearray()
+ default_flags = self.pei_flags["first"]
+ default_flags |= self.pei_flags["last"]
+
+ error_info_num = 0
+
+ for i, p in pei.items(): # pylint: disable=W0612
+ error_info_num += 1
+
+ # UEFI 2.10 doesn't define how to encode error information
+ # when multiple types are raised. So, provide a default only
+ # if a single type is there
+ if "error-info" not in p:
+ if p["type"] == bit(1):
+ p["error-info"] = 0x0091000F
+ if p["type"] == bit(2):
+ p["error-info"] = 0x0054007F
+ if p["type"] == bit(3):
+ p["error-info"] = 0x80D6460FFF
+ if p["type"] == bit(4):
+ p["error-info"] = 0x78DA03FF
+
+ if "valid" not in p:
+ p["valid"] = 0
+ if "multiple-error" in p:
+ p["valid"] |= self.pei_valid_bits["multiple-error"]
+
+ if "flags" in p:
+ p["valid"] |= self.pei_valid_bits["flags"]
+
+ if "error-info" in p:
+ p["valid"] |= self.pei_valid_bits["error-info"]
+
+ if "phy-addr" in p:
+ p["valid"] |= self.pei_valid_bits["phy-addr"]
+
+ if "virt-addr" in p:
+ p["valid"] |= self.pei_valid_bits["virt-addr"]
+
+ # Version
+ data_add(pei_data, 0, 1)
+
+ data_add(pei_data, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1)
+
+ data_add(pei_data, p["valid"], 2)
+ data_add(pei_data, p["type"], 1)
+ data_add(pei_data, p.get("multiple-error", 1), 2)
+ data_add(pei_data, p.get("flags", default_flags), 1)
+ data_add(pei_data, p.get("error-info", 0), 8)
+ data_add(pei_data, p.get("virt-addr", 0xDEADBEEF), 8)
+ data_add(pei_data, p.get("phy-addr", 0xABBA0BAD), 8)
+
+ # Store Context
+ ctx_data = bytearray()
+ context_info_num = 0
+
+ if ctx:
+ for k in sorted(ctx.keys()):
+ context_info_num += 1
+
+ if "type" not in ctx[k]:
+ ctx[k]["type"] = CONTEXT_AARCH64_EL1
+
+ if "register" not in ctx[k]:
+ ctx[k]["register"] = []
+
+ reg_size = len(ctx[k]["register"])
+ size = 0
+
+ if "minimal-size" in ctx[k]:
+ size = ctx[k]["minimal-size"]
+
+ size = max(size, reg_size)
+
+ size = (size + 1) % 0xFFFE
+
+ # Version
+ data_add(ctx_data, 0, 2)
+
+ data_add(ctx_data, ctx[k]["type"], 2)
+
+ data_add(ctx_data, 8 * size, 4)
+
+ for r in ctx[k]["register"]:
+ data_add(ctx_data, r, 8)
+
+ for i in range(reg_size, size): # pylint: disable=W0612
+ data_add(ctx_data, 0, 8)
+
+ # Vendor-specific bytes are not grouped
+ vendor_data = bytearray()
+ if vendor:
+ for k in sorted(vendor.keys()):
+ for b in vendor[k]["bytes"]:
+ data_add(vendor_data, b, 1)
+
+ # Encode ARM Processor Error
+ data = bytearray()
+
+ data_add(data, cper["valid"], 4)
+
+ data_add(data, error_info_num, 2)
+ data_add(data, context_info_num, 2)
+
+ # Calculate the length of the CPER data
+ cper_length = ACPI_GHES_ARM_CPER_LENGTH
+ cper_length += len(pei_data)
+ cper_length += len(vendor_data)
+ cper_length += len(ctx_data)
+ data_add(data, cper_length, 4)
+
+ data_add(data, arg.get("affinity") or 0, 1)
+
+ # Reserved
+ data_add(data, 0, 3)
+
+ data_add(data, arg.get("mpidr") or 0, 8)
+ data_add(data, arg.get("midr") or 0, 8)
+ data_add(data, cper["running-state"], 4)
+ data_add(data, arg.get("psci") or 0, 4)
+
+ # Add PEI
+ data.extend(pei_data)
+ data.extend(ctx_data)
+ data.extend(vendor_data)
+
+ self.data = data
+
+ def run(self, host, port):
+ """Execute QMP commands"""
+
+ guid = to_guid(0xE19E3D16, 0xBC11, 0x11E4,
+ [0x9C, 0xAA, 0xC2, 0x05,
+ 0x1D, 0x5D, 0x46, 0xB0])
+
+ qmp_command(host, port, guid, self.data)
diff --git a/scripts/ghes_inject.py b/scripts/ghes_inject.py
new file mode 100755
index 000000000000..8415ccbbc53d
--- /dev/null
+++ b/scripts/ghes_inject.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301, C0114
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+import argparse
+
+from arm_processor_error import ArmProcessorEinj
+
+EINJ_DESCRIPTION = """
+Handle ACPI GHESv2 error injection logic via the QEMU QMP interface.
+
+It allows using UEFI BIOS EINJ features to generate GHES records.
+
+It helps to test the Linux CPER and GHES drivers, as well as the
+rasdaemon error handling logic.
+
+Currently, it supports ARM processor error injection for ARM processor
+events, being compatible with UEFI 2.9A Errata.
+
+This small utility works together with those QEMU additions:
+- https://gitlab.com/mchehab_kernel/qemu/-/tree/arm-error-inject-v2
+"""
+
+def main():
+ """Main program"""
+
+ # Main parser - handle generic args like QEMU QMP TCP socket options
+ parser = argparse.ArgumentParser(prog="ghes_inject.py",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ usage="%(prog)s [options]",
+ description=EINJ_DESCRIPTION,
+ epilog="If a field is not defined, a default value will be applied by QEMU.")
+
+ g_options = parser.add_argument_group("QEMU QMP socket options")
+ g_options.add_argument("-H", "--host", default="localhost", type=str,
+ help="host name")
+ g_options.add_argument("-P", "--port", default=4445, type=int,
+ help="TCP port number")
+
+ arm_einj = ArmProcessorEinj()
+
+ # Call subparsers
+ subparsers = parser.add_subparsers(dest='command')
+
+ arm_einj.create_subparser(subparsers)
+
+ args = parser.parse_args()
+
+ # Handle subparser commands
+ if args.command == "arm":
+ arm_einj.parse_args(args)
+ arm_einj.run(args.host, args.port)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
new file mode 100644
index 000000000000..13fae7a7af0e
--- /dev/null
+++ b/scripts/qmp_helper.py
@@ -0,0 +1,249 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301, C0114, R0912, R0913, R0915, W0511
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+import json
+import socket
+import sys
+
+from base64 import b64encode
+
+#
+# Socket QMP send command
+#
+def qmp_command(host, port, guid, data):
+ """Send commands to QEMU through QMP TCP socket"""
+
+ # Fill the commands to be sent
+ commands = []
+
+ # Needed to negotiate QMP and for QEMU to accept the command
+ commands.append('{ "execute": "qmp_capabilities" } ')
+
+ base64_data = b64encode(bytes(data)).decode('ascii')
+
+ cmd_arg = {
+ 'cper': {
+ 'notification-type': guid,
+ "raw-data": base64_data
+ }
+ }
+
+ command = '{ "execute": "ghes-cper", '
+ command += '"arguments": ' + json.dumps(cmd_arg) + " }"
+
+ commands.append(command)
+
+ s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ try:
+ s.connect((host, port))
+ except ConnectionRefusedError:
+ sys.exit(f"Can't connect to QMP host {host}:{port}")
+
+ data = s.recv(1024)
+ try:
+ obj = json.loads(data.decode("utf-8"))
+ except json.JSONDecodeError as e:
+ print(f"Invalid QMP answer: {e}")
+ s.close()
+ return
+
+ if "QMP" not in obj:
+ print(f"Invalid QMP answer: {data.decode('utf-8')}")
+ s.close()
+ return
+
+ for i, command in enumerate(commands):
+ s.sendall(command.encode("utf-8"))
+ data = s.recv(1024)
+ try:
+ obj = json.loads(data.decode("utf-8"))
+ except json.JSONDecodeError as e:
+ print(f"Invalid QMP answer: {e}")
+ s.close()
+ return
+
+ if isinstance(obj.get("return"), dict):
+ if obj["return"]:
+ print(json.dumps(obj["return"]))
+ elif i > 0:
+ print("Error injected.")
+ elif isinstance(obj.get("error"), dict):
+ error = obj["error"]
+ print(f'{error["class"]}: {error["desc"]}')
+ else:
+ print(json.dumps(obj))
+
+ s.shutdown(socket.SHUT_WR)
+ while 1:
+ data = s.recv(1024)
+ if data == b"":
+ break
+ try:
+ obj = json.loads(data.decode("utf-8"))
+ except json.JSONDecodeError as e:
+ print(f"Invalid QMP answer: {e}")
+ s.close()
+ return
+
+ if isinstance(obj.get("return"), dict):
+ print(json.dumps(obj["return"]))
+ elif isinstance(obj.get("error"), dict):
+ error = obj["error"]
+ print(f'{error["class"]}: {error["desc"]}')
+ else:
+ print(json.dumps(obj))
+
+ s.close()
+
+
+#
+# Helper routines to handle multiple choice arguments
+#
+def get_choice(name, value, choices, suffixes=None):
+ """Produce a bitmask from a multiple choice argument"""
+
+ new_values = 0
+
+ if not value:
+ return new_values
+
+ for val in value.split(","):
+ val = val.lower()
+
+ if suffixes:
+ for suffix in suffixes:
+ val = val.removesuffix(suffix)
+
+ if val not in choices.keys():
+ sys.exit(f"Error on '{name}': choice {val} is invalid.")
+
+ val = choices[val]
+
+ new_values |= val
+
+ return new_values
+
+
+def get_mult_array(mult, name, values, allow_zero=False, max_val=None):
+ """Add numbered hashes from integer lists"""
+
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ if not values:
+ i = 0
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = []
+ return
+
+ i = 0
+ for value in values:
+ for val in value.split(","):
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if max_val and val > max_val:
+ sys.exit(f"Error on '{name}': {val} is too big")
+
+ if i not in mult:
+ mult[i] = {}
+
+ if name not in mult[i]:
+ mult[i][name] = []
+
+ mult[i][name].append(val)
+
+ i += 1
+
+
+def get_mult_choices(mult, name, values, choices,
+ suffixes=None, allow_zero=False):
+ """Add numbered hashes from multiple choice arguments"""
+
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ i = 0
+ for val in values:
+ new_values = get_choice(name, val, choices, suffixes)
+
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = new_values
+ i += 1
+
+
+def get_mult_int(mult, name, values, allow_zero=False):
+ """Add numbered hashes from integer arguments"""
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ i = 0
+ for val in values:
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = val
+ i += 1
+
+
+#
+# Data encode helper functions
+#
+def bit(b):
+ """Simple macro to define a bit on a bitmask"""
+ return 1 << b
+
+
+def data_add(data, value, num_bytes):
+ """Adds bytes from value inside a bytearray"""
+
+ data.extend(value.to_bytes(num_bytes, byteorder="little"))
+
+def to_guid(time_low, time_mid, time_high, nodes):
+ """Create a GUID string"""
+
+ assert(len(nodes) == 8)
+
+ clock = nodes[0] << 8 | nodes[1]
+
+ node = 0
+ for i in range(2, len(nodes)):
+ node = node << 8 | nodes[i]
+
+ s = f"{time_low:08x}-{time_mid:04x}-"
+ s += f"{time_high:04x}-{clock:04x}-{node:012x}"
+
+ return s
--
2.45.2
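
For reference, the byte layout that arm_processor_error.py builds with
data_add() can be cross-checked with the struct module. The sketch below is
not part of the series: build_arm_cper and the two Struct names are
hypothetical, and the field values are illustrative defaults rather than
what the script would emit for any particular command line.

```python
import struct

# "<" selects little-endian with no padding, matching data_add().
# Header: valid, error-info count, context count, section length,
# affinity level, 3 reserved bytes, MPIDR, MIDR, running state, PSCI state.
ARM_CPER_HEADER = struct.Struct("<IHHIB3xQQII")
# PEI entry: version, length, valid, type, multiple-error, flags,
# error-info, virtual address, physical address.
ARM_CPER_PEI = struct.Struct("<BBHBHBQQQ")


def build_arm_cper(error_info=0x0091000F):
    """Build a header plus one cache-error PEI entry (illustrative values)."""
    pei = ARM_CPER_PEI.pack(
        0,                   # version
        ARM_CPER_PEI.size,   # length of this entry
        1 << 2,              # valid bits: error-info only
        1 << 1,              # type: cache error
        1,                   # multiple-error
        0b11,                # flags: first | last
        error_info,
        0,                   # virtual address (not marked valid)
        0,                   # physical address (not marked valid)
    )
    header = ARM_CPER_HEADER.pack(
        0,                                  # valid bits
        1,                                  # number of PEI entries
        0,                                  # number of context entries
        ARM_CPER_HEADER.size + len(pei),    # total section length
        0,                                  # affinity level
        0,                                  # MPIDR
        0,                                  # MIDR
        0,                                  # running state
        0,                                  # PSCI state
    )
    return header + pei


if __name__ == "__main__":
    print(build_arm_cper().hex())
```

The two Struct sizes match ACPI_GHES_ARM_CPER_LENGTH and
ACPI_GHES_ARM_CPER_PEI_LENGTH from the script, which is a quick way to
confirm that the field widths passed to data_add() add up.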
* Re: [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device
2024-08-02 21:43 ` [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
@ 2024-08-05 16:39 ` Jonathan Cameron via
2024-08-06 5:50 ` Mauro Carvalho Chehab
2024-08-06 8:54 ` Igor Mammedov
1 sibling, 1 reply; 54+ messages in thread
From: Jonathan Cameron via @ 2024-08-05 16:39 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Igor Mammedov,
linux-kernel, qemu-devel
On Fri, 2 Aug 2024 23:43:57 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Adds a Generic Event Device to handle generic hardware error
> events, supporting General Purpose Event (GPE) as specified at
> ACPI 6.5 specification at 18.3.2.7.2:
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> using HID PNP0C33.
>
> The PNP0C33 device is used to report hardware errors to
> the bios via ACPI APEI Generic Hardware Error Source (GHES).
>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Much nicer with a GED event.
Happy to give SoB on this as you requested due to changes.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
One minor comment though.
The pnp0c33 device isn't technically coupled to the generic_event_device.
Perhaps that should be in aml_build.h/.c instead of where you
have it here?
Maybe we can move it later though if anyone implements non GED signalling?
Jonathan
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/generic_event_device.c | 17 +++++++++++++++++
> include/hw/acpi/acpi_dev_interface.h | 1 +
> include/hw/acpi/generic_event_device.h | 3 +++
> 3 files changed, 21 insertions(+)
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 15b4c3ebbf24..b9ad05e98c05 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
> ACPI_GED_PWR_DOWN_EVT,
> ACPI_GED_NVDIMM_HOTPLUG_EVT,
> ACPI_GED_CPU_HOTPLUG_EVT,
> + ACPI_GED_ERROR_EVT
> };
>
> /*
> @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
> aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> aml_int(0x80)));
> break;
> + case ACPI_GED_ERROR_EVT:
> + aml_append(if_ctx,
> + aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> + aml_int(0x80)));
> + break;
> case ACPI_GED_NVDIMM_HOTPLUG_EVT:
> aml_append(if_ctx,
> aml_notify(aml_name("\\_SB.NVDR"),
> @@ -153,6 +159,15 @@ void acpi_dsdt_add_power_button(Aml *scope)
> aml_append(scope, dev);
> }
>
> +void acpi_dsdt_add_error_device(Aml *scope)
> +{
> + Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> + aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> + aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> + aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> + aml_append(scope, dev);
> +}
> +
> /* Memory read by the GED _EVT AML dynamic method */
> static uint64_t ged_evt_read(void *opaque, hwaddr addr, unsigned size)
> {
> @@ -295,6 +310,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> sel = ACPI_GED_MEM_HOTPLUG_EVT;
> } else if (ev & ACPI_POWER_DOWN_STATUS) {
> sel = ACPI_GED_PWR_DOWN_EVT;
> + } else if (ev & ACPI_GENERIC_ERROR) {
> + sel = ACPI_GED_ERROR_EVT;
> } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
> sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
> } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
> index 68d9d15f50aa..8294f8f0ccca 100644
> --- a/include/hw/acpi/acpi_dev_interface.h
> +++ b/include/hw/acpi/acpi_dev_interface.h
> @@ -13,6 +13,7 @@ typedef enum {
> ACPI_NVDIMM_HOTPLUG_STATUS = 16,
> ACPI_VMGENID_CHANGE_STATUS = 32,
> ACPI_POWER_DOWN_STATUS = 64,
> + ACPI_GENERIC_ERROR = 128,
> } AcpiEventStatusBits;
>
> #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
> diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
> index 40af3550b56d..b8f2f1328e0c 100644
> --- a/include/hw/acpi/generic_event_device.h
> +++ b/include/hw/acpi/generic_event_device.h
> @@ -66,6 +66,7 @@
> #include "qom/object.h"
>
> #define ACPI_POWER_BUTTON_DEVICE "PWRB"
> +#define ACPI_APEI_ERROR_DEVICE "GEDD"
>
> #define TYPE_ACPI_GED "acpi-ged"
> OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> @@ -98,6 +99,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> #define ACPI_GED_PWR_DOWN_EVT 0x2
> #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
> #define ACPI_GED_CPU_HOTPLUG_EVT 0x8
> +#define ACPI_GED_ERROR_EVT 0x10
>
> typedef struct GEDState {
> MemoryRegion evt;
> @@ -120,5 +122,6 @@ struct AcpiGedState {
> void build_ged_aml(Aml *table, const char* name, HotplugHandler *hotplug_dev,
> uint32_t ged_irq, AmlRegionSpace rs, hwaddr ged_base);
> void acpi_dsdt_add_power_button(Aml *scope);
> +void acpi_dsdt_add_error_device(Aml *scope);
>
> #endif
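For readers following the dispatch added in the hunk above, here is a minimal standalone sketch of how an ACPI event status bit is translated into a GED selector bit. The constant values are copied from the quoted hunks; the helper name `ged_selector_for_event` is hypothetical and only a subset of the events is modelled, so treat it as an illustration rather than the QEMU implementation.

```c
#include <stdint.h>

/* Values as they appear in the quoted acpi_dev_interface.h hunk. */
enum {
    ACPI_NVDIMM_HOTPLUG_STATUS = 16,
    ACPI_POWER_DOWN_STATUS     = 64,
    ACPI_GENERIC_ERROR         = 128,
};

/* Values as they appear in the quoted generic_event_device.h hunk. */
#define ACPI_GED_PWR_DOWN_EVT        0x2
#define ACPI_GED_NVDIMM_HOTPLUG_EVT  0x4
#define ACPI_GED_ERROR_EVT           0x10

/*
 * Hypothetical helper: map an event status bit to the GED selector bit,
 * using the same if/else-if ordering as the patch. Returns 0 when the
 * event is not one of the modelled ones.
 */
static uint32_t ged_selector_for_event(uint32_t ev)
{
    if (ev & ACPI_POWER_DOWN_STATUS) {
        return ACPI_GED_PWR_DOWN_EVT;
    } else if (ev & ACPI_GENERIC_ERROR) {
        return ACPI_GED_ERROR_EVT;
    } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
        return ACPI_GED_NVDIMM_HOTPLUG_EVT;
    }
    return 0;
}
```

Because the chain is an if/else-if, only one selector is produced per call; if several status bits were set at once, the earliest branch would win.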
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES
2024-08-02 21:43 ` [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
@ 2024-08-05 16:54 ` Jonathan Cameron via
2024-08-06 5:56 ` Mauro Carvalho Chehab
2024-08-06 9:15 ` Igor Mammedov
1 sibling, 1 reply; 54+ messages in thread
From: Jonathan Cameron via @ 2024-08-05 16:54 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Igor Mammedov, Peter Maydell, Shannon Zhao, linux-kernel,
qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:43:58 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
Do we need to rename this now there is a GED involved?
Is it even technically a GPIO any more?
Spec says in 18.3.2.7
HW-reduced ACPI platforms signal the error using a GPIO
interrupt or another interrupt declared under
a generic event device (Interrupt-signaled ACPI events)
and goes on to say that a _CRS entry is used to
list the interrupt.
Given the Generic Event Device has a _CRS
with aml_interrupt() as the type, I think we should
even have the HEST entry say it's an interrupt (external?)
rather than a GPIO.
> Adds support to ARM virtualization to allow handling
> a General Purpose Event (GPE) via GED error device.
>
> It is aligned with Linux Kernel patch:
> https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Again, more or less fine with this
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
to go with that co-auth
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 3 +++
> hw/arm/virt-acpi-build.c | 1 +
> hw/arm/virt.c | 16 +++++++++++++++-
> include/hw/acpi/ghes.h | 3 +++
> include/hw/arm/virt.h | 1 +
> 5 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index e9511d9b8f71..8d0262e6c1aa 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -444,6 +444,9 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> return ret;
> }
>
> +NotifierList generic_error_notifiers =
> + NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
> +
> bool acpi_ghes_present(void)
> {
> AcpiGedState *acpi_ged_state;
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f76fb117adff..f8bbe3e7a0b8 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> }
>
> acpi_dsdt_add_power_button(scope);
> + acpi_dsdt_add_error_device(scope);
> #ifdef CONFIG_TPM
> acpi_dsdt_add_tpm(scope, vms);
> #endif
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 687fe0bb8bc9..8b315328154f 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -73,6 +73,7 @@
> #include "standard-headers/linux/input.h"
> #include "hw/arm/smmuv3.h"
> #include "hw/acpi/acpi.h"
> +#include "hw/acpi/ghes.h"
> #include "target/arm/cpu-qom.h"
> #include "target/arm/internals.h"
> #include "target/arm/multiprocessing.h"
> @@ -677,7 +678,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
> DeviceState *dev;
> MachineState *ms = MACHINE(vms);
> int irq = vms->irqmap[VIRT_ACPI_GED];
> - uint32_t event = ACPI_GED_PWR_DOWN_EVT;
> + uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
>
> if (ms->ram_slots) {
> event |= ACPI_GED_MEM_HOTPLUG_EVT;
> @@ -1009,6 +1010,15 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> }
> }
>
> +static void virt_generic_error_req(Notifier *n, void *opaque)
> +{
> + VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
> +
> + if (s->acpi_dev) {
> + acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
> + }
> +}
> +
> static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> uint32_t phandle)
> {
> @@ -2397,6 +2407,10 @@ static void machvirt_init(MachineState *machine)
> vms->powerdown_notifier.notify = virt_powerdown_req;
> qemu_register_powerdown_notifier(&vms->powerdown_notifier);
>
> + vms->generic_error_notifier.notify = virt_generic_error_req;
> + notifier_list_add(&generic_error_notifiers,
> + &vms->generic_error_notifier);
> +
> /* Create mmio transports, so the user can create virtio backends
> * (which will be automatically plugged in to the transports). If
> * no backend is created the transport will just sit harmlessly idle.
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 674f6958e905..6891eafff5ab 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -23,6 +23,9 @@
> #define ACPI_GHES_H
>
> #include "hw/acpi/bios-linker-loader.h"
> +#include "qemu/notify.h"
> +
> +extern NotifierList generic_error_notifiers;
>
> /*
> * Values for Hardware Error Notification Type field
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index a4d937ed45ac..ad9f6e94dcc5 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -175,6 +175,7 @@ struct VirtMachineState {
> DeviceState *gic;
> DeviceState *acpi_dev;
> Notifier powerdown_notifier;
> + Notifier generic_error_notifier;
> PCIBus *bus;
> char *oem_id;
> char *oem_table_id;
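The notifier wiring in this patch relies on the callback recovering its owning machine state from the embedded Notifier via container_of(). A minimal standalone sketch of that pattern follows; the struct and function names (`struct notifier`, `struct machine_state`, `generic_error_req`) are hypothetical stand-ins and not the QEMU types, and the event is modelled as a simple counter.

```c
#include <stddef.h>

/* Hypothetical stand-in for a notifier with a single callback. */
struct notifier {
    void (*notify)(struct notifier *n, void *data);
};

/* Hypothetical stand-in for the machine state that embeds the notifier. */
struct machine_state {
    int has_acpi_dev;   /* models the "s->acpi_dev" check */
    int events_sent;    /* models the event being sent */
    struct notifier generic_error_notifier;
};

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Callback: find the owning machine state and raise the event if possible. */
static void generic_error_req(struct notifier *n, void *data)
{
    struct machine_state *s =
        container_of(n, struct machine_state, generic_error_notifier);

    (void)data;
    if (s->has_acpi_dev) {
        s->events_sent++;
    }
}
```

The benefit of embedding the notifier in the state struct is that the callback needs no global pointer to find its context.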
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source
2024-08-02 21:43 ` [PATCH v5 4/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
@ 2024-08-05 16:56 ` Jonathan Cameron via
2024-08-06 6:09 ` Mauro Carvalho Chehab
2024-08-06 9:32 ` Igor Mammedov
1 sibling, 1 reply; 54+ messages in thread
From: Jonathan Cameron via @ 2024-08-05 16:56 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Igor Mammedov, linux-kernel, qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:43:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Add error notification to GHES v2 using the GPIO source.
The GPIO vs. external interrupt question from patch 3 carries through to this one.
>
> [mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks]
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 16 ++++++++++------
> include/hw/acpi/ghes.h | 3 ++-
> 2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 8d0262e6c1aa..a745dcc7be5e 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -34,8 +34,8 @@
> /* The max size in bytes for one error block */
> #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
>
> -/* Now only support ARMv8 SEA notification type error source */
> -#define ACPI_GHES_ERROR_SOURCE_COUNT 1
> +/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
> +#define ACPI_GHES_ERROR_SOURCE_COUNT 2
>
> /* Generic Hardware Error Source version 2 */
> #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
> @@ -290,6 +290,9 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> {
> uint64_t address_offset;
> +
> + assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> +
> /*
> * Type:
> * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> @@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> */
> build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
> break;
> + case ACPI_HEST_SRC_ID_GPIO:
> + build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
> + break;
> default:
> error_report("Not support this error source");
> abort();
> @@ -370,6 +376,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> /* Error Source Count */
> build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
> + build_ghes_v2(table_data, ACPI_HEST_SRC_ID_GPIO, linker);
>
> acpi_table_end(linker, &table);
> }
> @@ -406,10 +413,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> start_addr = le64_to_cpu(ags->ghes_addr_le);
>
> if (physical_address) {
> -
> - if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
> - start_addr += source_id * sizeof(uint64_t);
> - }
> + start_addr += source_id * sizeof(uint64_t);
>
> cpu_physical_memory_read(start_addr, &error_block_addr,
> sizeof(error_block_addr));
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 6891eafff5ab..33be1eb5acf4 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -59,9 +59,10 @@ enum AcpiGhesNotifyType {
> ACPI_GHES_NOTIFY_RESERVED = 12
> };
>
> +/* Those are used as table indexes when building GHES tables */
> enum {
> ACPI_HEST_SRC_ID_SEA = 0,
> - /* future ids go here */
> + ACPI_HEST_SRC_ID_GPIO,
> ACPI_HEST_SRC_ID_RESERVED,
> };
>
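The comment added to the enum says the source IDs double as table indexes. A minimal standalone sketch of that indexing, assuming one 64-bit address entry per source as in the `source_id * sizeof(uint64_t)` arithmetic of the hunk above, could look like this. The helper name `ghes_addr_slot` is hypothetical; the enum is copied from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Enum as it appears in the quoted ghes.h hunk. */
enum {
    ACPI_HEST_SRC_ID_SEA = 0,
    ACPI_HEST_SRC_ID_GPIO,
    ACPI_HEST_SRC_ID_RESERVED,
};

/*
 * Hypothetical helper: address of the 64-bit entry belonging to source_id,
 * given the base address of the table. The bound is asserted here, in the
 * same spirit as the assert the patch adds to build_ghes_v2().
 */
static uint64_t ghes_addr_slot(uint64_t table_base, unsigned source_id)
{
    assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
    return table_base + source_id * sizeof(uint64_t);
}
```

With this layout the SEA entry sits at the base and the second source 8 bytes after it, which is why ACPI_GHES_ERROR_SOURCE_COUNT has to track the number of enum values before ACPI_HEST_SRC_ID_RESERVED.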
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-02 21:44 ` [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
@ 2024-08-05 17:00 ` Jonathan Cameron via
2024-08-06 9:15 ` Shiju Jose via
` (2 subsequent siblings)
3 siblings, 0 replies; 54+ messages in thread
From: Jonathan Cameron via @ 2024-08-05 17:00 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Eric Blake, Igor Mammedov, Markus Armbruster, Michael Roth,
Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:44:00 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Creates a QMP command to be used for generic ACPI APEI hardware error
> injection (HEST) via GHESv2.
>
> The actual GHES code will be added at the followup patch.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Looks good to me.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-02 21:44 ` [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI Mauro Carvalho Chehab
@ 2024-08-05 17:03 ` Jonathan Cameron via
2024-08-06 11:13 ` Shiju Jose via
2024-08-06 14:31 ` Igor Mammedov
2 siblings, 0 replies; 54+ messages in thread
From: Jonathan Cameron via @ 2024-08-05 17:03 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Igor Mammedov, linux-kernel, qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:44:01 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Provide a generic interface for error injection via GHESv2.
>
> This patch is co-authored:
> - original ghes logic to inject a simple ARM record by Shiju Jose;
> - generic logic to handle block addresses by Jonathan Cameron;
> - generic GHESv2 error inject by Mauro Carvalho Chehab;
>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Looks fine to me.
Feel free to put in my SoB on the resulting co-auth
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
if appropriate. Does this work the same as the kernel's Co-developed-by?
> +void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
> + uint32_t notify)
> +{
> + int read_ack = 0;
> + uint32_t i;
> + uint64_t read_ack_addr = 0;
> + uint64_t error_block_addr = 0;
> + uint32_t data_length;
> + GArray *block;
> +
> + if (!ghes_get_addr(notify, &error_block_addr, &read_ack_addr)) {
> + error_setg(errp, "GHES: Invalid error block/ack address(es)");
> + return;
> + }
> +
> + cpu_physical_memory_read(read_ack_addr,
> + &read_ack, sizeof(uint64_t));
> +
> + /* zero means OSPM does not acknowledge the error */
> + if (!read_ack) {
> + error_setg(errp,
> + "Last CPER record was not acknowledged yet");
> + read_ack = 1;
> + cpu_physical_memory_write(read_ack_addr,
> + &read_ack, sizeof(uint64_t));
> + return;
> + }
> +
> + read_ack = cpu_to_le64(0);
> + cpu_physical_memory_write(read_ack_addr,
> + &read_ack, sizeof(uint64_t));
> +
> + /* Build CPER record */
> +
> + /*
> + * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> + * Table 17-13 Generic Error Data Entry
> + */
> + QemuUUID fru_id = {};
> +
> + block = g_array_new(false, true /* clear */, 1);
> + data_length = ACPI_GHES_DATA_LENGTH + cper->data_len;
> +
> + /*
Odd formatting.
> + * It should not run out of the preallocated memory if
> + * adding a new generic error data entry
> + */
> + assert((data_length + ACPI_GHES_GESB_SIZE) <=
> + ACPI_GHES_MAX_RAW_DATA_LENGTH);
> +
> + /* Build the new generic error status block header */
> + acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> + 0, 0, data_length,
> + ACPI_CPER_SEV_RECOVERABLE);
> +
> + /* Build this new generic error data entry header */
> + acpi_ghes_generic_error_data(block, cper->guid,
> + ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> + cper->data_len, fru_id, 0);
> +
> + /* Add CPER data */
> + for (i = 0; i < cper->data_len; i++) {
> + build_append_int_noprefix(block, cper->data[i], 1);
> + }
> +
> + /* Write the generic error data entry into guest memory */
> + cpu_physical_memory_write(error_block_addr, block->data, block->len);
> +
> + g_array_free(block, true);
> +
> + notifier_list_notify(&generic_error_notifiers, NULL);
> +}
> +
> bool acpi_ghes_present(void)
> {
> AcpiGedState *acpi_ged_state;
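The read-ack handshake at the top of ghes_record_cper_errors() can be sketched in isolation as below. This is a simplified model under stated assumptions: guest memory is a plain byte buffer, byte order is ignored, and the helper name `ghes_claim_error_block` is hypothetical. The sketch uses a 64-bit variable for the acknowledgment so that it matches the sizeof(uint64_t) transfer width; the quoted hunk declares `int read_ack`, which may be worth double-checking against that width.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper modelling the handshake:
 *  - a zero read-ack value means the previous record was not yet
 *    acknowledged; the value is set to 1 and the injection is refused;
 *  - a non-zero value means the block is free; it is cleared to 0 and
 *    the caller may go on to write a new record.
 */
static bool ghes_claim_error_block(uint8_t *guest_mem, size_t read_ack_off)
{
    uint64_t read_ack;

    memcpy(&read_ack, guest_mem + read_ack_off, sizeof(read_ack));

    if (!read_ack) {
        read_ack = 1;
        memcpy(guest_mem + read_ack_off, &read_ack, sizeof(read_ack));
        return false;
    }

    read_ack = 0;
    memcpy(guest_mem + read_ack_off, &read_ack, sizeof(read_ack));
    return true;
}
```

Note that, as in the quoted code, a refused injection still resets the acknowledgment to 1, so the next attempt goes through even if the guest never acknowledged the earlier record.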
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device
2024-08-05 16:39 ` Jonathan Cameron via
@ 2024-08-06 5:50 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-06 5:50 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Igor Mammedov,
linux-kernel, qemu-devel
Em Mon, 5 Aug 2024 17:39:46 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Fri, 2 Aug 2024 23:43:57 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Adds a Generic Event Device to handle generic hardware error
> > events, supporting General Purpose Event (GPE) as specified at
> > ACPI 6.5 specification at 18.3.2.7.2:
> > https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> > using HID PNP0C33.
> >
> > The PNP0C33 device is used to report hardware errors to
> > the bios via ACPI APEI Generic Hardware Error Source (GHES).
> >
> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Much nicer with a GED event.
> Happy to give SoB on this as you requested due to changes.
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> One minor comment though.
> The pnp0c33 device isn't technically coupled to the generic_event_device.
> Perhaps that should be in aml_build.h/.c instead of where you
> have it here?
>
> Maybe we can move it later though if anyone implements non GED signalling?
I opted to place it in hw/acpi/generic_event_device.c, just after
PNP0C0C, i.e.:
void acpi_dsdt_add_power_button(Aml *scope)
{
Aml *dev = aml_device(ACPI_POWER_BUTTON_DEVICE);
aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C0C")));
aml_append(dev, aml_name_decl("_UID", aml_int(0)));
aml_append(scope, dev);
}
void acpi_dsdt_add_error_device(Aml *scope)
{
Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
aml_append(dev, aml_name_decl("_UID", aml_int(0)));
aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
aml_append(scope, dev);
}
IMO this keeps it close to the other PNP devices. If the list starts
to grow, a later cleanup could move them to a separate file, but as
there are just two for now, I would keep both in the GED file.
>
> Jonathan
>
>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> > hw/acpi/generic_event_device.c | 17 +++++++++++++++++
> > include/hw/acpi/acpi_dev_interface.h | 1 +
> > include/hw/acpi/generic_event_device.h | 3 +++
> > 3 files changed, 21 insertions(+)
> >
> > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > index 15b4c3ebbf24..b9ad05e98c05 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
> > ACPI_GED_PWR_DOWN_EVT,
> > ACPI_GED_NVDIMM_HOTPLUG_EVT,
> > ACPI_GED_CPU_HOTPLUG_EVT,
> > + ACPI_GED_ERROR_EVT
> > };
> >
> > /*
> > @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
> > aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> > aml_int(0x80)));
> > break;
> > + case ACPI_GED_ERROR_EVT:
> > + aml_append(if_ctx,
> > + aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> > + aml_int(0x80)));
> > + break;
> > case ACPI_GED_NVDIMM_HOTPLUG_EVT:
> > aml_append(if_ctx,
> > aml_notify(aml_name("\\_SB.NVDR"),
> > @@ -153,6 +159,15 @@ void acpi_dsdt_add_power_button(Aml *scope)
> > aml_append(scope, dev);
> > }
> >
> > +void acpi_dsdt_add_error_device(Aml *scope)
> > +{
> > + Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> > + aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> > + aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> > + aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> > + aml_append(scope, dev);
> > +}
> > +
> > /* Memory read by the GED _EVT AML dynamic method */
> > static uint64_t ged_evt_read(void *opaque, hwaddr addr, unsigned size)
> > {
> > @@ -295,6 +310,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> > sel = ACPI_GED_MEM_HOTPLUG_EVT;
> > } else if (ev & ACPI_POWER_DOWN_STATUS) {
> > sel = ACPI_GED_PWR_DOWN_EVT;
> > + } else if (ev & ACPI_GENERIC_ERROR) {
> > + sel = ACPI_GED_ERROR_EVT;
> > } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
> > sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
> > } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> > diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
> > index 68d9d15f50aa..8294f8f0ccca 100644
> > --- a/include/hw/acpi/acpi_dev_interface.h
> > +++ b/include/hw/acpi/acpi_dev_interface.h
> > @@ -13,6 +13,7 @@ typedef enum {
> > ACPI_NVDIMM_HOTPLUG_STATUS = 16,
> > ACPI_VMGENID_CHANGE_STATUS = 32,
> > ACPI_POWER_DOWN_STATUS = 64,
> > + ACPI_GENERIC_ERROR = 128,
> > } AcpiEventStatusBits;
> >
> > #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
> > diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
> > index 40af3550b56d..b8f2f1328e0c 100644
> > --- a/include/hw/acpi/generic_event_device.h
> > +++ b/include/hw/acpi/generic_event_device.h
> > @@ -66,6 +66,7 @@
> > #include "qom/object.h"
> >
> > #define ACPI_POWER_BUTTON_DEVICE "PWRB"
> > +#define ACPI_APEI_ERROR_DEVICE "GEDD"
> >
> > #define TYPE_ACPI_GED "acpi-ged"
> > OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> > @@ -98,6 +99,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> > #define ACPI_GED_PWR_DOWN_EVT 0x2
> > #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
> > #define ACPI_GED_CPU_HOTPLUG_EVT 0x8
> > +#define ACPI_GED_ERROR_EVT 0x10
> >
> > typedef struct GEDState {
> > MemoryRegion evt;
> > @@ -120,5 +122,6 @@ struct AcpiGedState {
> > void build_ged_aml(Aml *table, const char* name, HotplugHandler *hotplug_dev,
> > uint32_t ged_irq, AmlRegionSpace rs, hwaddr ged_base);
> > void acpi_dsdt_add_power_button(Aml *scope);
> > +void acpi_dsdt_add_error_device(Aml *scope);
> >
> > #endif
>
Thanks,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES
2024-08-05 16:54 ` Jonathan Cameron via
@ 2024-08-06 5:56 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-06 5:56 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Igor Mammedov, Peter Maydell, Shannon Zhao, linux-kernel,
qemu-arm, qemu-devel
Em Mon, 5 Aug 2024 17:54:00 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Fri, 2 Aug 2024 23:43:58 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> Do we need to rename this now there is a GED involved?
> Is it even technically a GPIO any more?
> Spec says in 18.3.2.7
> HW-reduced ACPI platforms signal the error using a GPIO
> interrupt or another interrupt declared under
> a generic event device (Interrupt-signaled ACPI events)
> and goes on to say that a _CRS entry is used to
> list the interrupt.
>
> Given the Generic Event Device has a _CRS
> with aml_interrupt() as the type, I think we should
> even have the HEST entry say it's an interrupt (external?)
> rather than a GPIO.
True. I'll change the patch description to:
arm/virt: Wire up a GED error device for ACPI / GHES
Adds support to ARM virtualization to allow handling
a General Purpose Event (GPE) via GED error device.
It is aligned with Linux Kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
As the spec at
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
refers to it as:
"The implementation of Event notification requires the platform
to define a device with PNP ID PNP0C33 in the ACPI namespace,
referred to as the error device."
> > Adds support to ARM virtualization to allow handling
> > a General Purpose Event (GPE) via GED error device.
> >
> > It is aligned with Linux Kernel patch:
> > https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
> >
> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Again, more or less fine with this
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> to go with that co-auth
Thanks!
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source
2024-08-05 16:56 ` Jonathan Cameron via
@ 2024-08-06 6:09 ` Mauro Carvalho Chehab
2024-08-06 9:18 ` Igor Mammedov
0 siblings, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-06 6:09 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Igor Mammedov, linux-kernel, qemu-arm, qemu-devel
Em Mon, 5 Aug 2024 17:56:17 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Fri, 2 Aug 2024 23:43:59 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >
> > Add error notification to GHES v2 using the GPIO source.
>
> The gpio / external interrupt follows through.
True. As section 18.3.2.7 of the spec says:
The OSPM evaluates the control method associated with this event
as indicated in The Event Method for Handling GPIO Signaled Events
and The Event Method for Handling Interrupt Signaled Events.
I.e. it defines two methods:
- GED GPIO;
- GED interrupt.
I'm doing this rename:
ACPI_HEST_SRC_ID_GPIO -> ACPI_HEST_SRC_ID_GED_INT
to clearly state what is implemented there.
I'm also changing the patch description to:
acpi/ghes: Add support for General Purpose Event
As a GED error device is now defined, add another type
of notification.
Add error notification to GHES v2 using the GPIO source.
[mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks and
rename HEST event to better identify GED interrupt OSPM]
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Regards,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device
2024-08-02 21:43 ` [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-08-05 16:39 ` Jonathan Cameron via
@ 2024-08-06 8:54 ` Igor Mammedov
1 sibling, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 8:54 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
linux-kernel, qemu-devel
On Fri, 2 Aug 2024 23:43:57 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
subj: s/APEI/error/
to match spec
> Adds a Generic Event Device to handle generic hardware error
^^^^^
Did you want to say: Error ?
> events, supporting General Purpose Event (GPE) as specified at
even though GPE can be used (for example with non hw-reduced pc/q35 machines),
it's not what you are doing here.
s/General Purpose Event (GPE)/generic event device/
> ACPI 6.5 specification at 18.3.2.7.2:
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> using HID PNP0C33.
>
> The PNP0C33 device is used to report hardware errors to
> the bios via ACPI APEI Generic Hardware Error Source (GHES).
the event is sent not to the 'bios' but to the guest (OSPM in spec language)
>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/generic_event_device.c | 17 +++++++++++++++++
> include/hw/acpi/acpi_dev_interface.h | 1 +
> include/hw/acpi/generic_event_device.h | 3 +++
> 3 files changed, 21 insertions(+)
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 15b4c3ebbf24..b9ad05e98c05 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
> ACPI_GED_PWR_DOWN_EVT,
> ACPI_GED_NVDIMM_HOTPLUG_EVT,
> ACPI_GED_CPU_HOTPLUG_EVT,
> + ACPI_GED_ERROR_EVT
> };
>
> /*
> @@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
> aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
> aml_int(0x80)));
> break;
> + case ACPI_GED_ERROR_EVT:
> + aml_append(if_ctx,
> + aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
> + aml_int(0x80)));
> + break;
> case ACPI_GED_NVDIMM_HOTPLUG_EVT:
> aml_append(if_ctx,
> aml_notify(aml_name("\\_SB.NVDR"),
> @@ -153,6 +159,15 @@ void acpi_dsdt_add_power_button(Aml *scope)
> aml_append(scope, dev);
> }
put mandatory comment here, in format: earliest spec rev + chapter
> +void acpi_dsdt_add_error_device(Aml *scope)
s/void acpi_dsdt_add_error_device/Aml* aml_error_device()/
> +{
> + Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
> + aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
> + aml_append(dev, aml_name_decl("_UID", aml_int(0)));
> + aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
not necessary unless you want to set it to anything other than 0xF
> + aml_append(scope, dev);
> +}
and maybe move the function to aml-build.c
> /* Memory read by the GED _EVT AML dynamic method */
> static uint64_t ged_evt_read(void *opaque, hwaddr addr, unsigned size)
> {
> @@ -295,6 +310,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
> sel = ACPI_GED_MEM_HOTPLUG_EVT;
> } else if (ev & ACPI_POWER_DOWN_STATUS) {
> sel = ACPI_GED_PWR_DOWN_EVT;
> + } else if (ev & ACPI_GENERIC_ERROR) {
> + sel = ACPI_GED_ERROR_EVT;
> } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
> sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
> } else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
> diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
> index 68d9d15f50aa..8294f8f0ccca 100644
> --- a/include/hw/acpi/acpi_dev_interface.h
> +++ b/include/hw/acpi/acpi_dev_interface.h
> @@ -13,6 +13,7 @@ typedef enum {
> ACPI_NVDIMM_HOTPLUG_STATUS = 16,
> ACPI_VMGENID_CHANGE_STATUS = 32,
> ACPI_POWER_DOWN_STATUS = 64,
> + ACPI_GENERIC_ERROR = 128,
> } AcpiEventStatusBits;
>
> #define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
> diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
> index 40af3550b56d..b8f2f1328e0c 100644
> --- a/include/hw/acpi/generic_event_device.h
> +++ b/include/hw/acpi/generic_event_device.h
> @@ -66,6 +66,7 @@
> #include "qom/object.h"
>
> #define ACPI_POWER_BUTTON_DEVICE "PWRB"
> +#define ACPI_APEI_ERROR_DEVICE "GEDD"
perhaps aml_build.h would be a better place
(if you consider using it with pc/q35 machines)
> #define TYPE_ACPI_GED "acpi-ged"
> OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> @@ -98,6 +99,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
> #define ACPI_GED_PWR_DOWN_EVT 0x2
> #define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
> #define ACPI_GED_CPU_HOTPLUG_EVT 0x8
> +#define ACPI_GED_ERROR_EVT 0x10
>
> typedef struct GEDState {
> MemoryRegion evt;
> @@ -120,5 +122,6 @@ struct AcpiGedState {
> void build_ged_aml(Aml *table, const char* name, HotplugHandler *hotplug_dev,
> uint32_t ged_irq, AmlRegionSpace rs, hwaddr ged_base);
> void acpi_dsdt_add_power_button(Aml *scope);
> +void acpi_dsdt_add_error_device(Aml *scope);
>
> #endif
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 1/7] arm/virt: place power button pin number on a define
2024-08-02 21:43 ` [PATCH v5 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
@ 2024-08-06 8:57 ` Igor Mammedov
0 siblings, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 8:57 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:43:56 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Having magic numbers inside the code is not a good idea, as it
> is error-prone. So, instead, create a macro with the number
> definition.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/arm/virt-acpi-build.c | 6 +++---
> hw/arm/virt.c | 7 ++++---
> include/hw/arm/virt.h | 3 +++
> 3 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index e10cad86dd73..f76fb117adff 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -154,10 +154,10 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
> aml_append(dev, aml_name_decl("_CRS", crs));
>
> Aml *aei = aml_resource_template();
> - /* Pin 3 for power button */
> - const uint32_t pin_list[1] = {3};
> +
> + const uint32_t pin = GPIO_PIN_POWER_BUTTON;
> aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
> - AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
> + AML_EXCLUSIVE, AML_PULL_UP, 0, &pin, 1,
> "GPO0", NULL, 0));
> aml_append(dev, aml_name_decl("_AEI", aei));
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 719e83e6a1e7..687fe0bb8bc9 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1004,7 +1004,7 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> if (s->acpi_dev) {
> acpi_send_event(s->acpi_dev, ACPI_POWER_DOWN_STATUS);
> } else {
> - /* use gpio Pin 3 for power button event */
> + /* use gpio Pin for power button event */
> qemu_set_irq(qdev_get_gpio_in(gpio_key_dev, 0), 1);
> }
> }
> @@ -1013,7 +1013,8 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> uint32_t phandle)
> {
> gpio_key_dev = sysbus_create_simple("gpio-key", -1,
> - qdev_get_gpio_in(pl061_dev, 3));
> + qdev_get_gpio_in(pl061_dev,
> + GPIO_PIN_POWER_BUTTON));
>
> qemu_fdt_add_subnode(fdt, "/gpio-keys");
> qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
> @@ -1024,7 +1025,7 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> qemu_fdt_setprop_cell(fdt, "/gpio-keys/poweroff", "linux,code",
> KEY_POWER);
> qemu_fdt_setprop_cells(fdt, "/gpio-keys/poweroff",
> - "gpios", phandle, 3, 0);
> + "gpios", phandle, GPIO_PIN_POWER_BUTTON, 0);
> }
>
> #define SECURE_GPIO_POWEROFF 0
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index ab961bb6a9b8..a4d937ed45ac 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -47,6 +47,9 @@
> /* See Linux kernel arch/arm64/include/asm/pvclock-abi.h */
> #define PVTIME_SIZE_PER_CPU 64
>
> +/* GPIO pins */
> +#define GPIO_PIN_POWER_BUTTON 3
> +
> enum {
> VIRT_FLASH,
> VIRT_MEM,
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES
2024-08-02 21:43 ` [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
2024-08-05 16:54 ` Jonathan Cameron via
@ 2024-08-06 9:15 ` Igor Mammedov
1 sibling, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 9:15 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm,
qemu-devel
On Fri, 2 Aug 2024 23:43:58 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
subj: s/GPIO/GED signaled/
> Adds support to ARM virtualization to allow handling
> a General Purpose Event (GPE) via GED error device.
s/General Purpose Event (GPE).../
generic error ACPI Event via GED & error source device
> It is aligned with Linux Kernel patch:
> https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 3 +++
> hw/arm/virt-acpi-build.c | 1 +
> hw/arm/virt.c | 16 +++++++++++++++-
> include/hw/acpi/ghes.h | 3 +++
> include/hw/arm/virt.h | 1 +
> 5 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index e9511d9b8f71..8d0262e6c1aa 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -444,6 +444,9 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> return ret;
> }
>
> +NotifierList generic_error_notifiers =
> + NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
s/generic_error_notifiers/acpi_generic_error_notifiers/
> bool acpi_ghes_present(void)
> {
> AcpiGedState *acpi_ged_state;
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f76fb117adff..f8bbe3e7a0b8 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
> }
>
> acpi_dsdt_add_power_button(scope);
> + acpi_dsdt_add_error_device(scope);
with the suggested change in 2/7, this will look like:
aml_append(scope, aml_foo_device());
> #ifdef CONFIG_TPM
> acpi_dsdt_add_tpm(scope, vms);
> #endif
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 687fe0bb8bc9..8b315328154f 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -73,6 +73,7 @@
> #include "standard-headers/linux/input.h"
> #include "hw/arm/smmuv3.h"
> #include "hw/acpi/acpi.h"
> +#include "hw/acpi/ghes.h"
> #include "target/arm/cpu-qom.h"
> #include "target/arm/internals.h"
> #include "target/arm/multiprocessing.h"
> @@ -677,7 +678,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
> DeviceState *dev;
> MachineState *ms = MACHINE(vms);
> int irq = vms->irqmap[VIRT_ACPI_GED];
> - uint32_t event = ACPI_GED_PWR_DOWN_EVT;
> + uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
>
> if (ms->ram_slots) {
> event |= ACPI_GED_MEM_HOTPLUG_EVT;
> @@ -1009,6 +1010,15 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
> }
> }
>
> +static void virt_generic_error_req(Notifier *n, void *opaque)
> +{
> + VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
> +
> + if (s->acpi_dev) {
I'd assert here, and move the check to the caller so it won't even add
a notifier if acpi_dev is not present
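Roughly along these lines, as a standalone sketch rather than a patch: the
Machine struct and the helper names are made-up stand-ins (not the QEMU
types), and the two counters only take the place of notifier_list_add()
and acpi_send_event().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the machine state; not the QEMU definition. */
typedef struct Machine {
    void *acpi_dev;       /* NULL when no ACPI GED device exists */
    bool notifier_added;  /* stands in for notifier_list_add() */
    int events_sent;      /* stands in for acpi_send_event() */
} Machine;

/* Callback: the caller guarantees acpi_dev, so assert instead of testing. */
static void generic_error_req(Machine *s)
{
    assert(s->acpi_dev);
    s->events_sent++;
}

/* Caller: only register the notifier when the device is present. */
static void machine_init(Machine *s)
{
    if (s->acpi_dev) {
        s->notifier_added = true;
    }
}

/* An error is only delivered through a registered notifier. */
static void notify_generic_error(Machine *s)
{
    if (s->notifier_added) {
        generic_error_req(s);
    }
}
```

With this split, a machine without acpi_dev never reaches the callback,
so the assert documents an invariant rather than a runtime condition.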
> + acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
> + }
> +}
> +
> static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
> uint32_t phandle)
> {
> @@ -2397,6 +2407,10 @@ static void machvirt_init(MachineState *machine)
> vms->powerdown_notifier.notify = virt_powerdown_req;
> qemu_register_powerdown_notifier(&vms->powerdown_notifier);
>
> + vms->generic_error_notifier.notify = virt_generic_error_req;
> + notifier_list_add(&generic_error_notifiers,
> + &vms->generic_error_notifier);
> +
> /* Create mmio transports, so the user can create virtio backends
> * (which will be automatically plugged in to the transports). If
> * no backend is created the transport will just sit harmlessly idle.
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 674f6958e905..6891eafff5ab 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -23,6 +23,9 @@
> #define ACPI_GHES_H
>
> #include "hw/acpi/bios-linker-loader.h"
> +#include "qemu/notify.h"
> +
> +extern NotifierList generic_error_notifiers;
>
> /*
> * Values for Hardware Error Notification Type field
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index a4d937ed45ac..ad9f6e94dcc5 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -175,6 +175,7 @@ struct VirtMachineState {
> DeviceState *gic;
> DeviceState *acpi_dev;
> Notifier powerdown_notifier;
> + Notifier generic_error_notifier;
> PCIBus *bus;
> char *oem_id;
> char *oem_table_id;
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-02 21:44 ` [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-08-05 17:00 ` Jonathan Cameron via
@ 2024-08-06 9:15 ` Shiju Jose via
2024-08-06 12:51 ` Igor Mammedov
2024-08-08 8:50 ` Markus Armbruster
3 siblings, 0 replies; 54+ messages in thread
From: Shiju Jose via @ 2024-08-06 9:15 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Eric Blake, Igor Mammedov, Markus Armbruster, Michael Roth,
Paolo Bonzini, Peter Maydell, linux-kernel@vger.kernel.org,
qemu-arm@nongnu.org, qemu-devel@nongnu.org
>-----Original Message-----
>From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>Sent: 02 August 2024 22:44
>Cc: Jonathan Cameron <jonathan.cameron@huawei.com>; Shiju Jose
><shiju.jose@huawei.com>; Mauro Carvalho Chehab
><mchehab+huawei@kernel.org>; Michael S. Tsirkin <mst@redhat.com>; Ani
>Sinha <anisinha@redhat.com>; Dongjiu Geng <gengdongjiu1@gmail.com>; Eric
>Blake <eblake@redhat.com>; Igor Mammedov <imammedo@redhat.com>;
>Markus Armbruster <armbru@redhat.com>; Michael Roth
><michael.roth@amd.com>; Paolo Bonzini <pbonzini@redhat.com>; Peter
>Maydell <peter.maydell@linaro.org>; linux-kernel@vger.kernel.org; qemu-
>arm@nongnu.org; qemu-devel@nongnu.org
>Subject: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER
>error injection
>
>Creates a QMP command to be used for generic ACPI APEI hardware error
>injection (HEST) via GHESv2.
>
>The actual GHES code will be added at the followup patch.
>
>Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
A few minor comments inline.
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>---
> MAINTAINERS | 7 +++++
> hw/acpi/Kconfig | 5 ++++
> hw/acpi/ghes_cper.c | 45 ++++++++++++++++++++++++++++++++
> hw/acpi/ghes_cper_stub.c | 18 +++++++++++++
> hw/acpi/meson.build | 2 ++
> hw/arm/Kconfig | 5 ++++
> include/hw/acpi/ghes.h | 7 +++++
> qapi/ghes-cper.json | 55 ++++++++++++++++++++++++++++++++++++++++
> qapi/meson.build | 1 +
> qapi/qapi-schema.json | 1 +
> 10 files changed, 146 insertions(+)
> create mode 100644 hw/acpi/ghes_cper.c
> create mode 100644 hw/acpi/ghes_cper_stub.c
> create mode 100644 qapi/ghes-cper.json
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index 98eddf7ae155..655edcb6688c 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> F: include/hw/acpi/ghes.h
> F: docs/specs/acpi_hest_ghes.rst
>
>+ACPI/HEST/GHES/ARM processor CPER
>+R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>+S: Maintained
>+F: hw/arm/ghes_cper.c
>+F: hw/acpi/ghes_cper_stub.c
>+F: qapi/ghes-cper.json
>+
> ppc4xx
> L: qemu-ppc@nongnu.org
> S: Orphan
>diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
>index e07d3204eb36..73ffbb82c150 100644
>--- a/hw/acpi/Kconfig
>+++ b/hw/acpi/Kconfig
>@@ -51,6 +51,11 @@ config ACPI_APEI
> bool
> depends on ACPI
>
>+config GHES_CPER
>+ bool
>+ depends on ACPI_APEI
>+ default y
>+
> config ACPI_PCI
> bool
> depends on ACPI && PCI
>diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
>new file mode 100644
>index 000000000000..7aa7e71e90dc
>--- /dev/null
>+++ b/hw/acpi/ghes_cper.c
>@@ -0,0 +1,45 @@
>+/*
>+ * ARM Processor error injection
>+ *
>+ * Copyright(C) 2024 Huawei LTD.
>+ *
>+ * This code is licensed under the GPL version 2 or later. See the
>+ * COPYING file in the top-level directory.
>+ *
>+ */
>+
>+#include "qemu/osdep.h"
>+
>+#include "qemu/base64.h"
>+#include "qemu/error-report.h"
>+#include "qemu/uuid.h"
>+#include "qapi/qapi-commands-ghes-cper.h"
>+#include "hw/acpi/ghes.h"
>+
>+void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
>+ Error **errp)
>+{
>+ int rc;
>+ AcpiGhesCper cper;
>+ QemuUUID be_uuid, le_uuid;
>+
>+ rc = qemu_uuid_parse(qmp_cper->notification_type, &be_uuid);
>+ if (rc) {
>+ error_setg(errp, "GHES: Invalid UUID: %s",
>+ qmp_cper->notification_type);
>+ return;
>+ }
>+
>+ le_uuid = qemu_uuid_bswap(be_uuid);
>+ cper.guid = le_uuid.data;
>+
>+ cper.data = qbase64_decode(qmp_cper->raw_data, -1,
>+ &cper.data_len, errp);
>+ if (!cper.data) {
>+ return;
>+ }
>+
>+ /* TODO: call a function at ghes */
>+
>+ g_free(cper.data);
>+}
>diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
>new file mode 100644
>index 000000000000..7ce6ed70a265
>--- /dev/null
>+++ b/hw/acpi/ghes_cper_stub.c
>@@ -0,0 +1,18 @@
>+/*
>+ * ARM Processor error injection
>+ *
>+ * Copyright(C) 2024 Huawei LTD.
>+ *
>+ * This code is licensed under the GPL version 2 or later. See the
>+ * COPYING file in the top-level directory.
>+ *
>+ */
>+
>+#include "qemu/osdep.h"
>+#include "qapi/error.h"
>+#include "qapi/qapi-commands-ghes-cper.h"
>+#include "hw/acpi/ghes.h"
>+
>+void qmp_ghes_cper(CommonPlatformErrorRecord *cper, Error **errp)
>+{
>+}
Maybe add an "unsupported" or similar log message in the stub function?
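One possible shape for that, as a standalone sketch that compiles outside
QEMU: Error and set_error() below are hypothetical stand-ins for QEMU's
Error object and error_setg(), ghes_cper_stub() is an illustrative name,
and the message text is only an example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct Error {
    const char *msg;
} Error;

/* Hypothetical helper playing the role of error_setg(). */
static void set_error(Error **errp, const char *msg)
{
    static Error err;

    assert(msg != NULL);
    if (errp) {            /* a NULL errp means the caller ignores errors */
        err.msg = msg;
        *errp = &err;
    }
}

/* Stub: report the missing feature instead of returning silently. */
static void ghes_cper_stub(Error **errp)
{
    set_error(errp, "GHES CPER injection is not supported in this build");
}
```

That way a QMP client gets an explicit error back instead of an empty
success reply when the feature is compiled out.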
>diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
>index fa5c07db9068..6cbf430eb66d 100644
>--- a/hw/acpi/meson.build
>+++ b/hw/acpi/meson.build
>@@ -34,4 +34,6 @@ endif
> system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
> system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
> system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
>+system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
>+system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
> system_ss.add(files('acpi-qmp-cmds.c'))
>diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
>index 1ad60da7aa2d..bed6ba27d715 100644
>--- a/hw/arm/Kconfig
>+++ b/hw/arm/Kconfig
>@@ -712,3 +712,8 @@ config ARMSSE
> select UNIMP
> select SSE_COUNTER
> select SSE_TIMER
>+
>+config GHES_CPER
>+ bool
>+ depends on ARM
>+ default y if AARCH64
>diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
>index 33be1eb5acf4..06a5b8820cd5 100644
>--- a/include/hw/acpi/ghes.h
>+++ b/include/hw/acpi/ghes.h
>@@ -23,6 +23,7 @@
> #define ACPI_GHES_H
>
> #include "hw/acpi/bios-linker-loader.h"
>+#include "qapi/error.h"
> #include "qemu/notify.h"
>
> extern NotifierList generic_error_notifiers;
>@@ -78,6 +79,12 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
>
>+typedef struct AcpiGhesCper {
>+ uint8_t *guid;
>+ uint8_t *data;
>+ size_t data_len;
>+} AcpiGhesCper;
>+
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
> *
>diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
>new file mode 100644
>index 000000000000..3cc4f9f2aaa9
>--- /dev/null
>+++ b/qapi/ghes-cper.json
>@@ -0,0 +1,55 @@
>+# -*- Mode: Python -*-
>+# vim: filetype=python
>+
>+##
>+# = GHESv2 CPER Error Injection
>+#
>+# These are defined at
>+# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
>+# (GHESv2 - Type 10)
>+##
>+
>+##
>+# @CommonPlatformErrorRecord:
>+#
>+# Common Platform Error Record - CPER - as defined at the UEFI
>+# specification. See
>+# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
>+# for more details.
>+#
>+# @notification-type: pre-assigned GUID string indicating the record
>+# association with an error event notification type, as defined
>+# at https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
>+#
>+# @raw-data: Contains a base64 encoded string with the payload of
>+# the CPER.
>+#
>+# Since: 9.2
>+##
>+{ 'struct': 'CommonPlatformErrorRecord',
>+ 'data': {
>+ 'notification-type': 'str',
>+ 'raw-data': 'str'
>+ }
>+}
>+
>+##
>+# @ghes-cper:
>+#
>+# Inject ARM Processor error with data to be filled according with
>+# ACPI 6.2 GHESv2 spec.
Since ghes-cper is a generic interface, mentioning the term "ARM Processor error" here may not be appropriate.
>+#
>+# @cper: a single CPER record to be sent to the guest OS.
>+#
>+# Features:
>+#
>+# @unstable: This command is experimental.
>+#
>+# Since: 9.2
>+##
>+{ 'command': 'ghes-cper',
>+ 'data': {
>+ 'cper': 'CommonPlatformErrorRecord'
>+ },
>+ 'features': [ 'unstable' ]
>+}
>diff --git a/qapi/meson.build b/qapi/meson.build
>index e7bc54e5d047..bd13cd7d40c9 100644
>--- a/qapi/meson.build
>+++ b/qapi/meson.build
>@@ -35,6 +35,7 @@ qapi_all_modules = [
> 'dump',
> 'ebpf',
> 'error',
>+ 'ghes-cper',
> 'introspect',
> 'job',
> 'machine-common',
>diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
>index b1581988e4eb..c1a267399fe5 100644
>--- a/qapi/qapi-schema.json
>+++ b/qapi/qapi-schema.json
>@@ -75,6 +75,7 @@
> { 'include': 'misc-target.json' }
> { 'include': 'audio.json' }
> { 'include': 'acpi.json' }
>+{ 'include': 'ghes-cper.json' }
> { 'include': 'pci.json' }
> { 'include': 'stats.json' }
> { 'include': 'virtio.json' }
>--
>2.45.2
Thanks,
Shiju
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source
2024-08-06 6:09 ` Mauro Carvalho Chehab
@ 2024-08-06 9:18 ` Igor Mammedov
0 siblings, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 9:18 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Tue, 6 Aug 2024 08:09:28 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Mon, 5 Aug 2024 17:56:17 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
>
> > On Fri, 2 Aug 2024 23:43:59 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > >
> > > Add error notification to GHES v2 using the GPIO source.
> >
> > The gpio / external interrupt follows through.
>
> True. As session 18.3.2.7 of the spec says:
>
> The OSPM evaluates the control method associated with this event
> as indicated in The Event Method for Handling GPIO Signaled Events
> and The Event Method for Handling Interrupt Signaled Events.
>
> E. g. defining two methods:
> - GED GPIO;
> - GED interrupt
>
> I'm doing this rename:
>
> ACPI_HEST_SRC_ID_GPIO -> ACPI_HEST_SRC_ID_GED_INT
>
> To clearly state what it is implemented there.
>
> I'm also changing patch description to:
>
> acpi/ghes: Add support for General Purpose Event
>
> As a GED error device is now defined, add another type
> of notification.
>
> Add error notification to GHES v2 using the GPIO source.
^^^^
did you mean: GED?
>
> [mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks and
> rename HEST event to better identify GED interrupt OSPM]
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> Regards,
> Mauro
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source
2024-08-02 21:43 ` [PATCH v5 4/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
2024-08-05 16:56 ` Jonathan Cameron via
@ 2024-08-06 9:32 ` Igor Mammedov
2024-08-07 7:15 ` Mauro Carvalho Chehab
1 sibling, 1 reply; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 9:32 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:43:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>
> Add error notification to GHES v2 using the GPIO source.
>
> [mchehab: do some cleanups at ACPI_HEST_SRC_ID_* checks]
>
> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 16 ++++++++++------
> include/hw/acpi/ghes.h | 3 ++-
> 2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 8d0262e6c1aa..a745dcc7be5e 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -34,8 +34,8 @@
> /* The max size in bytes for one error block */
> #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
>
> -/* Now only support ARMv8 SEA notification type error source */
> -#define ACPI_GHES_ERROR_SOURCE_COUNT 1
> +/* Support ARMv8 SEA notification type error source and GPIO interrupt. */
> +#define ACPI_GHES_ERROR_SOURCE_COUNT 2
>
> /* Generic Hardware Error Source version 2 */
> #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
> @@ -290,6 +290,9 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> {
> uint64_t address_offset;
> +
> + assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> +
> /*
> * Type:
> * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> @@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> */
> build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
> break;
> + case ACPI_HEST_SRC_ID_GPIO:
> + build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
perhaps ACPI_GHES_NOTIFY_EXTERNAL fits better here?
> + break;
> default:
> error_report("Not support this error source");
> abort();
> @@ -370,6 +376,7 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> /* Error Source Count */
> build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
> + build_ghes_v2(table_data, ACPI_HEST_SRC_ID_GPIO, linker);
>
> acpi_table_end(linker, &table);
> }
> @@ -406,10 +413,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> start_addr = le64_to_cpu(ags->ghes_addr_le);
>
> if (physical_address) {
> -
> - if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
> - start_addr += source_id * sizeof(uint64_t);
> - }
> + start_addr += source_id * sizeof(uint64_t);
Why is the check being removed?
>
> cpu_physical_memory_read(start_addr, &error_block_addr,
> sizeof(error_block_addr));
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 6891eafff5ab..33be1eb5acf4 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -59,9 +59,10 @@ enum AcpiGhesNotifyType {
> ACPI_GHES_NOTIFY_RESERVED = 12
> };
>
> +/* Those are used as table indexes when building GHES tables */
> enum {
> ACPI_HEST_SRC_ID_SEA = 0,
> - /* future ids go here */
> + ACPI_HEST_SRC_ID_GPIO,
> ACPI_HEST_SRC_ID_RESERVED,
> };
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-02 21:44 ` [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI Mauro Carvalho Chehab
2024-08-05 17:03 ` Jonathan Cameron via
@ 2024-08-06 11:13 ` Shiju Jose via
2024-08-06 14:31 ` Igor Mammedov
2 siblings, 0 replies; 54+ messages in thread
From: Shiju Jose via @ 2024-08-06 11:13 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
Igor Mammedov, linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
qemu-devel@nongnu.org
>-----Original Message-----
>From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>Sent: 02 August 2024 22:44
>Cc: Jonathan Cameron <jonathan.cameron@huawei.com>; Shiju Jose
><shiju.jose@huawei.com>; Mauro Carvalho Chehab
><mchehab+huawei@kernel.org>; Michael S. Tsirkin <mst@redhat.com>; Ani
>Sinha <anisinha@redhat.com>; Dongjiu Geng <gengdongjiu1@gmail.com>; Igor
>Mammedov <imammedo@redhat.com>; linux-kernel@vger.kernel.org; qemu-
>arm@nongnu.org; qemu-devel@nongnu.org
>Subject: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via
>QAPI
>
>Provide a generic interface for error injection via GHESv2.
>
>This patch is co-authored:
> - original ghes logic to inject a simple ARM record by Shiju Jose;
> - generic logic to handle block addresses by Jonathan Cameron;
> - generic GHESv2 error inject by Mauro Carvalho Chehab;
>
>Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
>Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Cc: Shiju Jose <shiju.jose@huawei.com>
>Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>---
> hw/acpi/ghes.c | 159 ++++++++++++++++++++++++++++++++++++++---
> hw/acpi/ghes_cper.c | 2 +-
> include/hw/acpi/ghes.h | 3 +
> 3 files changed, 152 insertions(+), 12 deletions(-)
>
>diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
>index a745dcc7be5e..e125c9475773 100644
>--- a/hw/acpi/ghes.c
>+++ b/hw/acpi/ghes.c
>@@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
>+static uint64_t ghes_get_state_start_address(void)
>+{
>+ AcpiGedState *acpi_ged_state =
>+ ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
>+ AcpiGhesState *ags = &acpi_ged_state->ghes_state;
>+
>+ return le64_to_cpu(ags->ghes_addr_le);
>+}
>+
> int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> {
> uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
>- uint64_t start_addr;
>+ uint64_t start_addr = ghes_get_state_start_address();
> bool ret = -1;
>- AcpiGedState *acpi_ged_state;
>- AcpiGhesState *ags;
>-
> assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
>
>- acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
>- NULL));
>- g_assert(acpi_ged_state);
>- ags = &acpi_ged_state->ghes_state;
>-
>- start_addr = le64_to_cpu(ags->ghes_addr_le);
>-
> if (physical_address) {
> start_addr += source_id * sizeof(uint64_t);
>
>@@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> return ret;
> }
>
>+/*
>+ * Error register block data layout
>+ *
>+ * | +---------------------+ ges.ghes_addr_le
>+ * | |error_block_address0 |
>+ * | +---------------------+
>+ * | |error_block_address1 |
>+ * | +---------------------+ --+--
>+ * | | ............. | GHES_ADDRESS_SIZE
>+ * | +---------------------+ --+--
>+ * | |error_block_addressN |
>+ * | +---------------------+
>+ * | | read_ack0 |
>+ * | +---------------------+ --+--
>+ * | | read_ack1 | GHES_ADDRESS_SIZE
>+ * | +---------------------+ --+--
>+ * | | ............. |
>+ * | +---------------------+
>+ * | | read_ackN |
>+ * | +---------------------+ --+--
>+ * | | CPER | |
>+ * | | .... | GHES_MAX_RAW_DATA_LENGT
>+ * | | CPER | |
>+ * | +---------------------+ --+--
>+ * | | .......... |
>+ * | +---------------------+
>+ * | | CPER |
>+ * | | .... |
>+ * | | CPER |
>+ * | +---------------------+
>+ */
>+
>+/* Map from uint32_t notify to entry offset in GHES */
>+static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
>+ 0xff, 0xff, 0xff, 1, 0};
>+
>+static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
>+ uint64_t *read_ack_addr)
>+{
>+ uint64_t base;
>+
>+ if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
>+ return false;
>+ }
>+
>+ /* Find and check the source id for this new CPER */
>+ if (error_source_to_index[notify] == 0xff) {
>+ return false;
>+ }
>+
>+ base = ghes_get_state_start_address();
>+
>+ *read_ack_addr = base +
>+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
>+ error_source_to_index[notify] * sizeof(uint64_t);
>+
>+ /* Could also be read back from the error_block_address register */
>+ *error_block_addr = base +
>+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
>+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
>+ error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
>+
>+ return true;
>+}
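For reference, the address arithmetic above can be sketched standalone as
below. This is only an illustration, not code from the patch: the constants
mirror the quoted values (two sources, 8-byte entries, 1 KiB per error
block) and all names are made up. Note that the quoted table has nine
entries while the guard only rejects notify >= ACPI_GHES_NOTIFY_RESERVED
(12), so the sketch bounds the lookup by the table size instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirrored from the quoted patch; the names are illustrative. */
#define SOURCE_COUNT   2u     /* ACPI_GHES_ERROR_SOURCE_COUNT */
#define ADDRESS_SIZE   8u     /* sizeof(uint64_t) */
#define MAX_RAW_LENGTH 1024u  /* ACPI_GHES_MAX_RAW_DATA_LENGTH (1 KiB) */
#define NO_SOURCE      0xffu

/* notify type -> source index, same contents as error_source_to_index[] */
static const uint8_t notify_to_index[] = {
    NO_SOURCE, NO_SOURCE, NO_SOURCE, NO_SOURCE,
    NO_SOURCE, NO_SOURCE, NO_SOURCE, 1, 0
};

/* Returns the source index, or -1 if the notify type has no source. */
static int source_index(uint32_t notify)
{
    size_t n = sizeof(notify_to_index) / sizeof(notify_to_index[0]);

    if (notify >= n || notify_to_index[notify] == NO_SOURCE) {
        return -1;
    }
    return notify_to_index[notify];
}

/* Offset of read_ack<index>: it follows the error_block_address array. */
static uint64_t read_ack_offset(uint32_t index)
{
    assert(index < SOURCE_COUNT);
    return (uint64_t)SOURCE_COUNT * ADDRESS_SIZE
           + (uint64_t)index * ADDRESS_SIZE;
}

/* Offset of the CPER block: it follows both 8-byte arrays. */
static uint64_t error_block_offset(uint32_t index)
{
    assert(index < SOURCE_COUNT);
    return 2 * (uint64_t)SOURCE_COUNT * ADDRESS_SIZE
           + (uint64_t)index * MAX_RAW_LENGTH;
}
```

With these values, notify type 7 maps to index 1, which puts its read_ack
register at base + 24 and its error block at base + 1056.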
>+
> NotifierList generic_error_notifiers =
> NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
>
>+void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
>+ uint32_t notify)
>+{
>+ int read_ack = 0;
>+ uint32_t i;
>+ uint64_t read_ack_addr = 0;
>+ uint64_t error_block_addr = 0;
>+ uint32_t data_length;
>+ GArray *block;
>+
>+ if (!ghes_get_addr(notify, &error_block_addr, &read_ack_addr)) {
>+ error_setg(errp, "GHES: Invalid error block/ack address(es)");
>+ return;
>+ }
>+
>+ cpu_physical_memory_read(read_ack_addr,
>+ &read_ack, sizeof(uint64_t));
>+
>+ /* zero means OSPM does not acknowledge the error */
>+ if (!read_ack) {
>+ error_setg(errp,
>+ "Last CPER record was not acknowledged yet");
>+ read_ack = 1;
>+ cpu_physical_memory_write(read_ack_addr,
>+ &read_ack, sizeof(uint64_t));
>+ return;
>+ }
>+
>+ read_ack = cpu_to_le64(0);
>+ cpu_physical_memory_write(read_ack_addr,
>+ &read_ack, sizeof(uint64_t));
>+
>+ /* Build CPER record */
>+
>+ /*
>+ * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
>+ * Table 17-13 Generic Error Data Entry
>+ */
>+ QemuUUID fru_id = {};
>+
>+ block = g_array_new(false, true /* clear */, 1);
>+ data_length = ACPI_GHES_DATA_LENGTH + cper->data_len;
>+
>+ /*
>+ * It should not run out of the preallocated memory if
>+ * adding a new generic error data entry
>+ */
>+ assert((data_length + ACPI_GHES_GESB_SIZE) <=
>+ ACPI_GHES_MAX_RAW_DATA_LENGTH);
>+
>+ /* Build the new generic error status block header */
>+ acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
>+ 0, 0, data_length,
>+ ACPI_CPER_SEV_RECOVERABLE);
>+
>+ /* Build this new generic error data entry header */
>+ acpi_ghes_generic_error_data(block, cper->guid,
>+ ACPI_CPER_SEV_RECOVERABLE, 0, 0,
>+ cper->data_len, fru_id, 0);
>+
>+ /* Add CPER data */
>+ for (i = 0; i < cper->data_len; i++) {
>+ build_append_int_noprefix(block, cper->data[i], 1);
>+ }
>+
>+ /* Write the generic error data entry into guest memory */
>+ cpu_physical_memory_write(error_block_addr, block->data,
>+ block->len);
>+
>+ g_array_free(block, true);
>+
>+ notifier_list_notify(&generic_error_notifiers, NULL);
>+}
>+
> bool acpi_ghes_present(void)
> {
> AcpiGedState *acpi_ged_state;
>diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
>index 7aa7e71e90dc..d7ff7debee74 100644
>--- a/hw/acpi/ghes_cper.c
>+++ b/hw/acpi/ghes_cper.c
>@@ -39,7 +39,7 @@ void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
> return;
> }
>
>- /* TODO: call a function at ghes */
>+ ghes_record_cper_errors(&cper, errp, ACPI_GHES_NOTIFY_GPIO);
>
> g_free(cper.data);
> }
>diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
>index 06a5b8820cd5..ee6f6cd96911 100644
>--- a/include/hw/acpi/ghes.h
>+++ b/include/hw/acpi/ghes.h
>@@ -85,6 +85,9 @@ typedef struct AcpiGhesCper {
> size_t data_len;
> } AcpiGhesCper;
>
>+void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
>+ uint32_t notify);
>+
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
> *
>--
>2.45.2
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-02 21:44 ` [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-08-05 17:00 ` Jonathan Cameron via
2024-08-06 9:15 ` Shiju Jose via
@ 2024-08-06 12:51 ` Igor Mammedov
2024-08-06 12:58 ` Mauro Carvalho Chehab
2024-08-08 8:50 ` Markus Armbruster
3 siblings, 1 reply; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 12:51 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:44:00 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Creates a QMP command to be used for generic ACPI APEI hardware error
> injection (HEST) via GHESv2.
>
> The actual GHES code will be added at the followup patch.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> MAINTAINERS | 7 +++++
> hw/acpi/Kconfig | 5 ++++
> hw/acpi/ghes_cper.c | 45 ++++++++++++++++++++++++++++++++
> hw/acpi/ghes_cper_stub.c | 18 +++++++++++++
> hw/acpi/meson.build | 2 ++
> hw/arm/Kconfig | 5 ++++
> include/hw/acpi/ghes.h | 7 +++++
> qapi/ghes-cper.json | 55 ++++++++++++++++++++++++++++++++++++++++
> qapi/meson.build | 1 +
> qapi/qapi-schema.json | 1 +
> 10 files changed, 146 insertions(+)
> create mode 100644 hw/acpi/ghes_cper.c
> create mode 100644 hw/acpi/ghes_cper_stub.c
> create mode 100644 qapi/ghes-cper.json
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 98eddf7ae155..655edcb6688c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> F: include/hw/acpi/ghes.h
> F: docs/specs/acpi_hest_ghes.rst
>
> +ACPI/HEST/GHES/ARM processor CPER
> +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +S: Maintained
> +F: hw/arm/ghes_cper.c
> +F: hw/acpi/ghes_cper_stub.c
> +F: qapi/ghes-cper.json
> +
> ppc4xx
> L: qemu-ppc@nongnu.org
> S: Orphan
> diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
> index e07d3204eb36..73ffbb82c150 100644
> --- a/hw/acpi/Kconfig
> +++ b/hw/acpi/Kconfig
> @@ -51,6 +51,11 @@ config ACPI_APEI
> bool
> depends on ACPI
>
> +config GHES_CPER
> + bool
> + depends on ACPI_APEI
> + default y
> +
> config ACPI_PCI
> bool
> depends on ACPI && PCI
> diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
> new file mode 100644
> index 000000000000..7aa7e71e90dc
> --- /dev/null
> +++ b/hw/acpi/ghes_cper.c
> @@ -0,0 +1,45 @@
> +/*
> + * ARM Processor error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +
> +#include "qemu/base64.h"
> +#include "qemu/error-report.h"
> +#include "qemu/uuid.h"
> +#include "qapi/qapi-commands-ghes-cper.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
> + Error **errp)
> +{
> + int rc;
> + AcpiGhesCper cper;
> + QemuUUID be_uuid, le_uuid;
> +
> + rc = qemu_uuid_parse(qmp_cper->notification_type, &be_uuid);
> + if (rc) {
> + error_setg(errp, "GHES: Invalid UUID: %s",
> + qmp_cper->notification_type);
> + return;
> + }
> +
> + le_uuid = qemu_uuid_bswap(be_uuid);
> + cper.guid = le_uuid.data;
> +
> + cper.data = qbase64_decode(qmp_cper->raw_data, -1,
> + &cper.data_len, errp);
> + if (!cper.data) {
> + return;
> + }
> +
> + /* TODO: call a function at ghes */
> +
> + g_free(cper.data);
> +}
> diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
> new file mode 100644
> index 000000000000..7ce6ed70a265
> --- /dev/null
> +++ b/hw/acpi/ghes_cper_stub.c
> @@ -0,0 +1,18 @@
> +/*
> + * ARM Processor error injection
> + *
> + * Copyright(C) 2024 Huawei LTD.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-commands-ghes-cper.h"
> +#include "hw/acpi/ghes.h"
> +
> +void qmp_ghes_cper(CommonPlatformErrorRecord *cper, Error **errp)
> +{
> +}
> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
> index fa5c07db9068..6cbf430eb66d 100644
> --- a/hw/acpi/meson.build
> +++ b/hw/acpi/meson.build
> @@ -34,4 +34,6 @@ endif
> system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
> system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
> system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
> +system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
> +system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
> system_ss.add(files('acpi-qmp-cmds.c'))
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 1ad60da7aa2d..bed6ba27d715 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -712,3 +712,8 @@ config ARMSSE
> select UNIMP
> select SSE_COUNTER
> select SSE_TIMER
> +
> +config GHES_CPER
> + bool
> + depends on ARM
> + default y if AARCH64
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 33be1eb5acf4..06a5b8820cd5 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -23,6 +23,7 @@
> #define ACPI_GHES_H
>
> #include "hw/acpi/bios-linker-loader.h"
> +#include "qapi/error.h"
> #include "qemu/notify.h"
>
> extern NotifierList generic_error_notifiers;
> @@ -78,6 +79,12 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
>
> +typedef struct AcpiGhesCper {
> + uint8_t *guid;
> + uint8_t *data;
> + size_t data_len;
> +} AcpiGhesCper;
> +
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
> *
> diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
> new file mode 100644
> index 000000000000..3cc4f9f2aaa9
> --- /dev/null
> +++ b/qapi/ghes-cper.json
> @@ -0,0 +1,55 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +
> +##
> +# = GHESv2 CPER Error Injection
> +#
> +# These are defined at
> +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> +# (GHESv2 - Type 10)
> +##
> +
> +##
> +# @CommonPlatformErrorRecord:
> +#
> +# Common Platform Error Record - CPER - as defined at the UEFI
> +# specification. See
> +# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
> +# for more details.
> +#
> +# @notification-type: pre-assigned GUID string indicating the record
> +# association with an error event notification type, as defined
> +# at https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
> +#
> +# @raw-data: Contains a base64 encoded string with the payload of
> +# the CPER.
> +#
> +# Since: 9.2
> +##
> +{ 'struct': 'CommonPlatformErrorRecord',
> + 'data': {
> + 'notification-type': 'str',
this should be source id (type is just impl. detail of how QEMU delivers
event for given source id)
unless there is no plan to use more sources,
I'd just drop this from API to avoid confusing user.
Since the patch comes before 5/7, it's not clear how it will be used at this point.
I'd move the patch after 5/7.
> + 'raw-data': 'str'
> + }
> +}
> +
> +##
> +# @ghes-cper:
> +#
> +# Inject ARM Processor error with data to be filled according with
> +# ACPI 6.2 GHESv2 spec.
> +#
> +# @cper: a single CPER record to be sent to the guest OS.
> +#
> +# Features:
> +#
> +# @unstable: This command is experimental.
> +#
> +# Since: 9.2
> +##
> +{ 'command': 'ghes-cper',
> + 'data': {
> + 'cper': 'CommonPlatformErrorRecord'
> + },
> + 'features': [ 'unstable' ]
> +}
> diff --git a/qapi/meson.build b/qapi/meson.build
> index e7bc54e5d047..bd13cd7d40c9 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -35,6 +35,7 @@ qapi_all_modules = [
> 'dump',
> 'ebpf',
> 'error',
> + 'ghes-cper',
> 'introspect',
> 'job',
> 'machine-common',
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index b1581988e4eb..c1a267399fe5 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -75,6 +75,7 @@
> { 'include': 'misc-target.json' }
> { 'include': 'audio.json' }
> { 'include': 'acpi.json' }
> +{ 'include': 'ghes-cper.json' }
> { 'include': 'pci.json' }
> { 'include': 'stats.json' }
> { 'include': 'virtio.json' }
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-06 12:51 ` Igor Mammedov
@ 2024-08-06 12:58 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-06 12:58 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Eric Blake, Markus Armbruster, Michael Roth,
Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel
Em Tue, 6 Aug 2024 14:51:53 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > +{ 'struct': 'CommonPlatformErrorRecord',
> > + 'data': {
>
> > + 'notification-type': 'str',
>
> this should be source id (type is just impl. detail of how QEMU delivers
> event for given source id)
> unless there is no plan to use more sources,
> I'd just drop this from API to avoid confusing user.
>
> Since the patch comes before 5/7, it's not clear how it will be used at this point.
> I'd move the patch after 5/7.
As described at:
> +# @notification-type: pre-assigned GUID string indicating the record
> +# association with an error event notification type, as defined
> +# at https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
This is actually the GUID of the error to be generated. Perhaps it would be
better to change the above to:
{ 'struct': 'CommonPlatformErrorRecord',
'data': {
'guid': 'str',
'raw-data': 'str'
}
}
That would make it even clearer. In any case, this field is mandatory, as
otherwise the interface would be limited to a single error type.
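For illustration only, a minimal sketch of how a client could assemble the
QMP message for this command. The command name and the two fields come from
the schema quoted above; the helper name build_ghes_cper_command() and its
guid_key parameter are hypothetical, and the GUID shown is the ARM Processor
Error section GUID used by the script in patch 7/7. The key is left
configurable because the field name ('notification-type' vs 'guid') is still
under discussion.

```python
import base64
import json

# ARM Processor Error section GUID, as used by the script in patch 7/7.
ARM_PROCESSOR_ERROR_GUID = "e19e3d16-bc11-11e4-9caa-c2051d5d46b0"


def build_ghes_cper_command(guid, payload, guid_key="notification-type"):
    """Return the QMP message (as a dict) for the experimental ghes-cper command.

    guid_key is a hypothetical knob: the schema as posted uses
    'notification-type', while the reply above proposes 'guid'.
    """
    return {
        "execute": "ghes-cper",
        "arguments": {
            "cper": {
                guid_key: guid,
                # raw-data carries the CPER payload as a base64 string
                "raw-data": base64.b64encode(bytes(payload)).decode("ascii"),
            }
        },
    }


# Example: a dummy 40-byte payload, serialized the way it would go on the wire.
message = json.dumps(build_ghes_cper_command(ARM_PROCESSOR_ERROR_GUID, bytes(40)))
```

Passing guid_key="guid" yields the renamed form without touching the rest of
the message.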
Thanks,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-02 21:44 ` [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI Mauro Carvalho Chehab
2024-08-05 17:03 ` Jonathan Cameron via
2024-08-06 11:13 ` Shiju Jose via
@ 2024-08-06 14:31 ` Igor Mammedov
2024-08-07 7:47 ` Mauro Carvalho Chehab
` (2 more replies)
2 siblings, 3 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 14:31 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Fri, 2 Aug 2024 23:44:01 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Provide a generic interface for error injection via GHESv2.
>
> This patch is co-authored:
> - original ghes logic to inject a simple ARM record by Shiju Jose;
> - generic logic to handle block addresses by Jonathan Cameron;
> - generic GHESv2 error inject by Mauro Carvalho Chehab;
>
> Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 159 ++++++++++++++++++++++++++++++++++++++---
> hw/acpi/ghes_cper.c | 2 +-
> include/hw/acpi/ghes.h | 3 +
> 3 files changed, 152 insertions(+), 12 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index a745dcc7be5e..e125c9475773 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> +static uint64_t ghes_get_state_start_address(void)
ghes_get_hardware_errors_address() might better reflect what address it will return
> +{
> + AcpiGedState *acpi_ged_state =
> + ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> + AcpiGhesState *ags = &acpi_ged_state->ghes_state;
> +
> + return le64_to_cpu(ags->ghes_addr_le);
> +}
> +
> int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> {
> uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> - uint64_t start_addr;
> + uint64_t start_addr = ghes_get_state_start_address();
> bool ret = -1;
> - AcpiGedState *acpi_ged_state;
> - AcpiGhesState *ags;
> -
> assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
>
> - acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> - NULL));
> - g_assert(acpi_ged_state);
> - ags = &acpi_ged_state->ghes_state;
> -
> - start_addr = le64_to_cpu(ags->ghes_addr_le);
> -
> if (physical_address) {
> start_addr += source_id * sizeof(uint64_t);
above should be a separate patch
>
> @@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> return ret;
> }
>
> +/*
> + * Error register block data layout
> + *
> + * | +---------------------+ ges.ghes_addr_le
> + * | |error_block_address0 |
> + * | +---------------------+
> + * | |error_block_address1 |
> + * | +---------------------+ --+--
> + * | | ............. | GHES_ADDRESS_SIZE
> + * | +---------------------+ --+--
> + * | |error_block_addressN |
> + * | +---------------------+
> + * | | read_ack0 |
> + * | +---------------------+ --+--
> + * | | read_ack1 | GHES_ADDRESS_SIZE
> + * | +---------------------+ --+--
> + * | | ............. |
> + * | +---------------------+
> + * | | read_ackN |
> + * | +---------------------+ --+--
> + * | | CPER | |
> + * | | .... | GHES_MAX_RAW_DATA_LENGT
> + * | | CPER | |
> + * | +---------------------+ --+--
> + * | | .......... |
> + * | +---------------------+
> + * | | CPER |
> + * | | .... |
> + * | | CPER |
> + * | +---------------------+
> + */
no need to duplicate docs/specs/acpi_hest_ghes.rst,
I'd just refer to it and maybe add a short comment as to why it's mentioned.
> +/* Map from uint32_t notify to entry offset in GHES */
> +static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
> + 0xff, 0xff, 0xff, 1, 0};
> +
> +static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
> + uint64_t *read_ack_addr)
> +{
> + uint64_t base;
> +
> + if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
> + return false;
> + }
> +
> + /* Find and check the source id for this new CPER */
> + if (error_source_to_index[notify] == 0xff) {
> + return false;
> + }
> +
> + base = ghes_get_state_start_address();
> +
> + *read_ack_addr = base +
> + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> + error_source_to_index[notify] * sizeof(uint64_t);
> +
> + /* Could also be read back from the error_block_address register */
> + *error_block_addr = base +
> + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> +
> + return true;
> +}
I don't like all this pointer math, which basically reverse engineers
QEMU's actions on startup + the guest provided etc/hardware_errors address.
For one, it assumes error_source_to_index[] matches the order in which HEST
error sources were described, which is fragile.
2nd: migration-wise it's a disaster, since old/new HEST/hardware_errors tables
in RAM migrated from an older version might not match the above assumptions
of the target QEMU.
I see 2 ways to rectify it:
1st: preferred/cleanest would be to tell QEMU (via fw_cfg) the address of the HEST table
in guest RAM, like we do with etc/hardware_errors, see
build_ghes_error_table()
...
tell firmware to write hardware_errors GPA into
and then fetch from the HEST table in RAM the guest patched error/ack addresses
for the given source_id
code-wise: relatively simple once one wraps one's head around
how this whole APEI thing works in QEMU
the workflow is described in docs/specs/acpi_hest_ghes.rst,
which looks to me sufficient to grasp it.
(but my view is very biased given my prior knowledge,
aka: docs/comments/examples wrt acpi patching are good enough)
(if it's not clear how to do it, ask me for pointers)
2nd: sort of hack based on build_ghes_v2() Error Status Address/Read Ack Register
patching instructions
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
^^^^^^^^^^^^^^^^^^^^^^^^^
during build_ghes_v2() also store on the side a mapping
source_id -> error address offset : read ack address
so when you are injecting an error, you'd at least use the offsets
used at start time, to get rid of the risk that the injection code
diverges from the HEST:etc/hardware_errors layout at start time.
However to make migration safe, one would need to add a fat
comment not to change the order of GHES error sources in HEST _and_
a dedicated unit test to make sure we catch it when that happens.
bios_tables_test should be able to catch the change, but it won't
say what's wrong, hence a test case that explicitly checks the order
and complains loudly & clearly when we break the order assumptions.
downside:
* we are limiting the ways HEST could be composed/reshuffled in future
* consumption of extra CI resources
* and well, it relies on the above duct tape holding all pieces together
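To make the fragility concern concrete, here is a small sketch that mirrors
the offset arithmetic of the quoted ghes_get_addr(). It is not QEMU code: the
function name ghes_offsets() is hypothetical, and the default values (2 error
sources, 1 KiB per error block) are assumptions taken from this thread rather
than checked against the tree.

```python
ADDRESS_SIZE = 8            # sizeof(uint64_t)
ERROR_SOURCE_COUNT = 2      # assumed value of ACPI_GHES_ERROR_SOURCE_COUNT
MAX_RAW_DATA_LENGTH = 1024  # assumed value of ACPI_GHES_MAX_RAW_DATA_LENGTH


def ghes_offsets(index, source_count=ERROR_SOURCE_COUNT):
    """Return (read_ack_offset, error_block_offset) for one error source.

    Offsets are relative to the start of etc/hardware_errors and follow
    the layout: N error block addresses, N read_ack registers, N blocks.
    """
    if not 0 <= index < source_count:
        raise ValueError(f"no error source with index {index}")

    # read_ack registers start right after the error_block_address array
    read_ack = source_count * ADDRESS_SIZE + index * ADDRESS_SIZE

    # error blocks start after both 8-byte arrays
    error_block = 2 * source_count * ADDRESS_SIZE + index * MAX_RAW_DATA_LENGTH

    return read_ack, error_block
```

With two sources, index 0 maps to offsets (16, 32). If a third source were
added, the same index would map to (24, 48), so a guest whose tables were
built for the old layout would be addressed wrongly after migration.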
> NotifierList generic_error_notifiers =
> NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
>
> +void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
> + uint32_t notify)
> +{
> + int read_ack = 0;
^^^
[...]
> + cpu_physical_memory_read(read_ack_addr,
> + &read_ack, sizeof(uint64_t));
^^^^
it looks like possible stack corruption, isn't it?
> + /* zero means OSPM does not acknowledge the error */
> + if (!read_ack) {
> + error_setg(errp,
> + "Last CPER record was not acknowledged yet");
> + read_ack = 1;
> + cpu_physical_memory_write(read_ack_addr,
> + &read_ack, sizeof(uint64_t));
^^^^^
and then who knows what we are writing back here
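A quick way to see the mismatch, as a sketch: compare the size of the C
variable with the size of the memory access. On common ABIs a C int is 4
bytes while the access above is sizeof(uint64_t), i.e. 8 bytes. The helper
names are hypothetical; ctypes is only used here to query the host's C type
sizes.

```python
import ctypes
import struct


def overrun_bytes(var_type, access_size):
    """Bytes accessed beyond the end of a variable of the given C type."""
    return max(0, access_size - ctypes.sizeof(var_type))


def encode_read_ack(value):
    """Encode a read_ack value as the 8-byte little-endian register it is."""
    return struct.pack("<Q", value)


# An 8-byte access into an 'int' spills past the variable on typical ABIs;
# the same access into a uint64_t does not.
int_overrun = overrun_bytes(ctypes.c_int, 8)
u64_overrun = overrun_bytes(ctypes.c_uint64, 8)
```

Declaring the variable as uint64_t and converting explicitly with
cpu_to_le64()/le64_to_cpu() would keep the access size and the storage size
in agreement.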
> + return;
> + }
> +
> + read_ack = cpu_to_le64(0);
> + cpu_physical_memory_write(read_ack_addr,
> + &read_ack, sizeof(uint64_t));
> +
> + /* Build CPER record */
> +
> + /*
> + * Invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> + * Table 17-13 Generic Error Data Entry
> + */
> + QemuUUID fru_id = {};
> +
> + block = g_array_new(false, true /* clear */, 1);
> + data_length = ACPI_GHES_DATA_LENGTH + cper->data_len;
> +
> + /*
> + * It should not run out of the preallocated memory if
> + * adding a new generic error data entry
> + */
> + assert((data_length + ACPI_GHES_GESB_SIZE) <=
> + ACPI_GHES_MAX_RAW_DATA_LENGTH);
it's better to error out gracefully here instead of crashing
in case the script generated too long a record,
not the end of the world, but it's annoying to restart the guest
because of an external mistake.
PS:
looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
and it is the total size of an error block for an error source.
However acpi_hest_ghes.rst (3) says it should be 4K,
am I mistaken?
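As a sketch of the "error out gracefully" idea, the size check can be
expressed as a validation that reports the problem instead of asserting. The
constant values below (20-byte status block header, 72-byte data entry
header, 1 KiB block) are assumptions about ACPI_GHES_GESB_SIZE,
ACPI_GHES_DATA_LENGTH and ACPI_GHES_MAX_RAW_DATA_LENGTH, and
check_cper_fits() is a hypothetical name. In the C code the equivalent would
be an error_setg() followed by a return.

```python
GESB_SIZE = 20              # assumed ACPI_GHES_GESB_SIZE
DATA_ENTRY_HEADER = 72      # assumed ACPI_GHES_DATA_LENGTH
MAX_RAW_DATA_LENGTH = 1024  # assumed ACPI_GHES_MAX_RAW_DATA_LENGTH


def check_cper_fits(payload_len):
    """Return the total error block size, or raise if it would not fit."""
    total = GESB_SIZE + DATA_ENTRY_HEADER + payload_len
    if total > MAX_RAW_DATA_LENGTH:
        # Report the problem to the caller instead of aborting the VM
        raise ValueError(
            f"CPER too large: {total} bytes, limit is {MAX_RAW_DATA_LENGTH}"
        )
    return total
```

Under these assumed sizes the largest payload that fits is 932 bytes.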
> + /* Build the new generic error status block header */
> + acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> + 0, 0, data_length,
> + ACPI_CPER_SEV_RECOVERABLE);
> +
> + /* Build this new generic error data entry header */
> + acpi_ghes_generic_error_data(block, cper->guid,
> + ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> + cper->data_len, fru_id, 0);
> +
not that I mind, but I'd axe the above calls with their hardcoded
assumptions and make the script generate the whole error block,
it's more flexible wrt ACPI_CPER_SEV_RECOVERABLE/ACPI_GEBS_UNCORRECTABLE
and then one can drop cper->guid from the QAPI interface.
basically inject whatever the user provided via QAPI without any other assumptions.
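A sketch of what generating the whole error block on the script side could
look like. The field layout follows my reading of the ACPI Generic Error
Status Block (20 bytes) and Generic Error Data Entry, revision 0x300 (72
bytes); the default status and severity values are assumptions standing in
for ACPI_GEBS_UNCORRECTABLE and ACPI_CPER_SEV_RECOVERABLE, and
build_error_block() is a hypothetical helper, not part of the posted series.

```python
import struct
import uuid


def build_error_block(section_guid, payload, block_status=1, severity=0):
    """Build status block header + one data entry + CPER payload.

    block_status=1 and severity=0 are assumed to correspond to
    ACPI_GEBS_UNCORRECTABLE and ACPI_CPER_SEV_RECOVERABLE.
    """
    payload = bytes(payload)

    entry = (
        uuid.UUID(section_guid).bytes_le  # section type, mixed-endian GUID
        # error severity, revision, validation bits, flags, error data length
        + struct.pack("<IHBBI", severity, 0x0300, 0, 0, len(payload))
        + bytes(16)                       # FRU id (left invalid/zero)
        + bytes(20)                       # FRU text
        + struct.pack("<Q", 0)            # timestamp
        + payload
    )

    # block status, raw data offset, raw data length, data length, severity
    header = struct.pack("<IIIII", block_status, 0, 0, len(entry), severity)

    return header + entry
```

With this shape, severity and block status become script arguments instead of
constants in QEMU, and the QMP side only has to copy the bytes into the error
block of the selected source.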
> + /* Add CPER data */
> + for (i = 0; i < cper->data_len; i++) {
> + build_append_int_noprefix(block, cper->data[i], 1);
> + }
> +
> + /* Write the generic error data entry into guest memory */
> + cpu_physical_memory_write(error_block_addr, block->data, block->len);
> +
> + g_array_free(block, true);
> +
> + notifier_list_notify(&generic_error_notifiers, NULL);
> +}
> +
> bool acpi_ghes_present(void)
> {
> AcpiGedState *acpi_ged_state;
> diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
> index 7aa7e71e90dc..d7ff7debee74 100644
> --- a/hw/acpi/ghes_cper.c
> +++ b/hw/acpi/ghes_cper.c
> @@ -39,7 +39,7 @@ void qmp_ghes_cper(CommonPlatformErrorRecord *qmp_cper,
> return;
> }
>
> - /* TODO: call a function at ghes */
> + ghes_record_cper_errors(&cper, errp, ACPI_GHES_NOTIFY_GPIO);
>
> g_free(cper.data);
> }
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 06a5b8820cd5..ee6f6cd96911 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -85,6 +85,9 @@ typedef struct AcpiGhesCper {
> size_t data_len;
> } AcpiGhesCper;
>
> +void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
> + uint32_t notify);
maybe rename it to acpi_ghes_inject_error_block()
> +
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
> *
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-02 21:44 ` [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
@ 2024-08-06 14:56 ` Igor Mammedov
2024-08-08 20:58 ` John Snow
2024-08-08 21:21 ` John Snow
2 siblings, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-06 14:56 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, John Snow,
linux-kernel, qemu-devel
On Fri, 2 Aug 2024 23:44:02 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Using the QMP GHESv2 API requires preparing a raw data array
> containing a CPER record.
>
> Add a helper script with subcommands to prepare such data.
>
> Currently, only ARM Processor error CPER record is supported.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> MAINTAINERS | 3 +
> scripts/arm_processor_error.py | 352 +++++++++++++++++++++++++++++++++
> scripts/ghes_inject.py | 59 ++++++
> scripts/qmp_helper.py | 249 +++++++++++++++++++++++
> 4 files changed, 663 insertions(+)
> create mode 100644 scripts/arm_processor_error.py
> create mode 100755 scripts/ghes_inject.py
> create mode 100644 scripts/qmp_helper.py
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 655edcb6688c..e490f69da1de 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2081,6 +2081,9 @@ S: Maintained
> F: hw/arm/ghes_cper.c
> F: hw/acpi/ghes_cper_stub.c
> F: qapi/ghes-cper.json
> +F: scripts/ghes_inject.py
> +F: scripts/arm_processor_error.py
> +F: scripts/qmp_helper.py
>
> ppc4xx
> L: qemu-ppc@nongnu.org
> diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
> new file mode 100644
> index 000000000000..df4efa508790
> --- /dev/null
> +++ b/scripts/arm_processor_error.py
> @@ -0,0 +1,352 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114, R0912, R0913, R0914, R0915, W0511
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +# TODO: current implementation has dummy defaults.
> +#
> +# For a better implementation, a QMP addition/call is needed to
> +# retrieve some data for ARM Processor Error injection:
> +#
> +# - machine emulation architecture, as ARM current default is
> +# for AArch64;
> +# - ARM registers: power_state, midr, mpidr.
I'm not really reviewing the script, but here are some pointers on how to fetch properties:
start qemu with a QMP connection
./qemu-system-aarch64 -M virt -qmp unix:/tmp/s,server,nowait
use the script
./scripts/qmp/qom-get --socket /tmp/s /machine/unattached/device[0].midr
you can use ./scripts/qmp/qom-tree to explore what's there
see commit e61cc6b5c69 for how to add a property (DEFINE_PROP_UINT32 part mainly),
as long as it's prefixed with "x-" (meaning internal/unstable), it's likely
that no one would object to adding extra ones
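For a script that already talks QMP directly, the same lookup can be done
with the qom-get command. A minimal sketch of the message, assuming the
object path and property name from the example above; build_qom_get() is a
hypothetical helper and the code only builds the JSON, it does not open the
socket.

```python
import json


def build_qom_get(path, prop):
    """Return the QMP message (as a dict) that reads one QOM property."""
    return {
        "execute": "qom-get",
        "arguments": {"path": path, "property": prop},
    }


# Path and property taken from the qom-get example above; whether device[0]
# is the first CPU depends on the machine, so check with qom-tree first.
midr_query = json.dumps(build_qom_get("/machine/unattached/device[0]", "midr"))
```

The reply's "return" member would then carry the MIDR value to place into the
ARM Processor Error record instead of the dummy default.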
> +
> +import argparse
> +import json
> +
> +from qmp_helper import (qmp_command, get_choice, get_mult_array,
> + get_mult_choices, get_mult_int, bit,
> + data_add, to_guid)
> +
> +# Arm processor EINJ logic
> +#
> +ACPI_GHES_ARM_CPER_LENGTH = 40
> +ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
> +
> +# TODO: query it from emulation. Current default valid only for Aarch64
> +CONTEXT_AARCH64_EL1 = 5
> +
> +class ArmProcessorEinj:
> + """
> + Implements ARM Processor Error injection via GHES
> + """
> +
> + def __init__(self):
> + """Initialize the error injection class"""
> +
> + # Valid choice values
> + self.arm_valid_bits = {
> + "mpidr": bit(0),
> + "affinity": bit(1),
> + "running": bit(2),
> + "vendor": bit(3),
> + }
> +
> + self.pei_flags = {
> + "first": bit(0),
> + "last": bit(1),
> + "propagated": bit(2),
> + "overflow": bit(3),
> + }
> +
> + self.pei_error_types = {
> + "cache": bit(1),
> + "tlb": bit(2),
> + "bus": bit(3),
> + "micro-arch": bit(4),
> + }
> +
> + self.pei_valid_bits = {
> + "multiple-error": bit(0),
> + "flags": bit(1),
> + "error-info": bit(2),
> + "virt-addr": bit(3),
> + "phy-addr": bit(4),
> + }
> +
> + self.data = bytearray()
> +
> + def create_subparser(self, subparsers):
> + """Add a subparser to handle for the error fields"""
> +
> + parser = subparsers.add_parser("arm",
> + help="Generate an ARM processor CPER")
> +
> + arm_valid_bits = ",".join(self.arm_valid_bits.keys())
> + flags = ",".join(self.pei_flags.keys())
> + error_types = ",".join(self.pei_error_types.keys())
> + pei_valid_bits = ",".join(self.arm_valid_bits.keys())
> +
> + # UEFI N.16 ARM Validation bits
> + g_arm = parser.add_argument_group("ARM processor")
> + g_arm.add_argument("--arm", "--arm-valid",
> + help=f"ARM valid bits: {arm_valid_bits}")
> + g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level",
> + type=lambda x: int(x, 0),
> + help="Affinity level (when multiple levels apply)")
> + g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
> + help="Multiprocessor Affinity Register")
> + g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
> + help="Main ID Register")
> + g_arm.add_argument("-r", "--running",
> + action=argparse.BooleanOptionalAction,
> + default=None,
> + help="Indicates if the processor is running or not")
> + g_arm.add_argument("--psci", "--psci-state",
> + type=lambda x: int(x, 0),
> + help="Power State Coordination Interface - PSCI state")
> +
> + # TODO: Add vendor-specific support
> +
> + # UEFI N.17 bitmaps (type and flags)
> + g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
> + g_pei.add_argument("-t", "--type", nargs="+",
> + help=f"one or more error types: {error_types}")
> + g_pei.add_argument("-f", "--flags", nargs="*",
> + help=f"zero or more error flags: {flags}")
> + g_pei.add_argument("-V", "--pei-valid", "--error-valid", nargs="*",
> + help=f"zero or more PEI valid bits: {pei_valid_bits}")
> +
> + # UEFI N.17 Integer values
> + g_pei.add_argument("-m", "--multiple-error", nargs="+",
> + help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")
> + g_pei.add_argument("-e", "--error-info", nargs="+",
> + help="Error information (UEFI 2.10 tables N.18 to N.20)")
> + g_pei.add_argument("-p", "--physical-address", nargs="+",
> + help="Physical address")
> + g_pei.add_argument("-v", "--virtual-address", nargs="+",
> + help="Virtual address")
> +
> + # UEFI N.21 Context
> + g_ctx = parser.add_argument_group("Processor Context")
> + g_ctx.add_argument("--ctx-type", "--context-type", nargs="*",
> + help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")
> + g_ctx.add_argument("--ctx-size", "--context-size", nargs="*",
> + help="Minimal size of the context")
> + g_ctx.add_argument("--ctx-array", "--context-array", nargs="*",
> + help="Comma-separated arrays for each context")
> +
> + # Vendor-specific data
> + g_vendor = parser.add_argument_group("Vendor-specific data")
> + g_vendor.add_argument("--vendor", "--vendor-specific", nargs="+",
> + help="Vendor-specific byte arrays of data")
> +
> + def parse_args(self, args):
> + """Parse subcommand arguments"""
> +
> + cper = {}
> + pei = {}
> + ctx = {}
> + vendor = {}
> +
> + arg = vars(args)
> +
> + # Handle global parameters
> + if args.arm:
> + arm_valid_init = False
> + cper["valid"] = get_choice(name="valid",
> + value=args.arm,
> + choices=self.arm_valid_bits,
> + suffixes=["-error", "-err"])
> + else:
> + cper["valid"] = 0
> + arm_valid_init = True
> +
> + if "running" in arg:
> + if args.running:
> + cper["running-state"] = bit(0)
> + else:
> + cper["running-state"] = 0
> + else:
> + cper["running-state"] = 0
> +
> + if arm_valid_init:
> + if args.affinity:
> + cper["valid"] |= self.arm_valid_bits["affinity"]
> +
> + if args.mpidr:
> + cper["valid"] |= self.arm_valid_bits["mpidr"]
> +
> + if "running-state" in cper:
> + cper["valid"] |= self.arm_valid_bits["running"]
> +
> + if args.psci:
> + cper["valid"] |= self.arm_valid_bits["running"]
> +
> + # Handle PEI
> + if not args.type:
> + args.type = ["cache-error"]
> +
> + get_mult_choices(
> + pei,
> + name="valid",
> + values=args.pei_valid,
> + choices=self.pei_valid_bits,
> + suffixes=["-valid", "-info", "--information", "--addr"],
> + )
> + get_mult_choices(
> + pei,
> + name="type",
> + values=args.type,
> + choices=self.pei_error_types,
> + suffixes=["-error", "-err"],
> + )
> + get_mult_choices(
> + pei,
> + name="flags",
> + values=args.flags,
> + choices=self.pei_flags,
> + suffixes=["-error", "-cap"],
> + )
> + get_mult_int(pei, "error-info", args.error_info)
> + get_mult_int(pei, "multiple-error", args.multiple_error)
> + get_mult_int(pei, "phy-addr", args.physical_address)
> + get_mult_int(pei, "virt-addr", args.virtual_address)
> +
> + # Handle context
> + get_mult_int(ctx, "type", args.ctx_type, allow_zero=True)
> + get_mult_int(ctx, "minimal-size", args.ctx_size, allow_zero=True)
> + get_mult_array(ctx, "register", args.ctx_array, allow_zero=True)
> +
> + get_mult_array(vendor, "bytes", args.vendor, max_val=255)
> +
> + # Store PEI
> + pei_data = bytearray()
> + default_flags = self.pei_flags["first"]
> + default_flags |= self.pei_flags["last"]
> +
> + error_info_num = 0
> +
> + for i, p in pei.items(): # pylint: disable=W0612
> + error_info_num += 1
> +
> + # UEFI 2.10 doesn't define how to encode error information
> + # when multiple types are raised. So, provide a default only
> + # if a single type is there
> + if "error-info" not in p:
> + if p["type"] == bit(1):
> + p["error-info"] = 0x0091000F
> + if p["type"] == bit(2):
> + p["error-info"] = 0x0054007F
> + if p["type"] == bit(3):
> + p["error-info"] = 0x80D6460FFF
> + if p["type"] == bit(4):
> + p["error-info"] = 0x78DA03FF
> +
> + if "valid" not in p:
> + p["valid"] = 0
> + if "multiple-error" in p:
> + p["valid"] |= self.pei_valid_bits["multiple-error"]
> +
> + if "flags" in p:
> + p["valid"] |= self.pei_valid_bits["flags"]
> +
> + if "error-info" in p:
> + p["valid"] |= self.pei_valid_bits["error-info"]
> +
> + if "phy-addr" in p:
> + p["valid"] |= self.pei_valid_bits["phy-addr"]
> +
> + if "virt-addr" in p:
> + p["valid"] |= self.pei_valid_bits["virt-addr"]
> +
> + # Version
> + data_add(pei_data, 0, 1)
> +
> + data_add(pei_data, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1)
> +
> + data_add(pei_data, p["valid"], 2)
> + data_add(pei_data, p["type"], 1)
> + data_add(pei_data, p.get("multiple-error", 1), 2)
> + data_add(pei_data, p.get("flags", default_flags), 1)
> + data_add(pei_data, p.get("error-info", 0), 8)
> + data_add(pei_data, p.get("virt-addr", 0xDEADBEEF), 8)
> + data_add(pei_data, p.get("phy-addr", 0xABBA0BAD), 8)
> +
> + # Store Context
> + ctx_data = bytearray()
> + context_info_num = 0
> +
> + if ctx:
> + for k in sorted(ctx.keys()):
> + context_info_num += 1
> +
> + if "type" not in ctx:
> + ctx[k]["type"] = CONTEXT_AARCH64_EL1
> +
> + if "register" not in ctx:
> + ctx[k]["register"] = []
> +
> + reg_size = len(ctx[k]["register"])
> + size = 0
> +
> + if "minimal-size" in ctx:
> + size = ctx[k]["minimal-size"]
> +
> + size = max(size, reg_size)
> +
> + size = (size + 1) % 0xFFFE
> +
> + # Version
> + data_add(ctx_data, 0, 2)
> +
> + data_add(ctx_data, ctx[k]["type"], 2)
> +
> + data_add(ctx_data, 8 * size, 4)
> +
> + for r in ctx[k]["register"]:
> + data_add(ctx_data, r, 8)
> +
> + for i in range(reg_size, size): # pylint: disable=W0612
> + data_add(ctx_data, 0, 8)
> +
> + # Vendor-specific bytes are not grouped
> + vendor_data = bytearray()
> + if vendor:
> + for k in sorted(vendor.keys()):
> + for b in vendor[k]["bytes"]:
> + data_add(vendor_data, b, 1)
> +
> + # Encode ARM Processor Error
> + data = bytearray()
> +
> + data_add(data, cper["valid"], 4)
> +
> + data_add(data, error_info_num, 2)
> + data_add(data, context_info_num, 2)
> +
> + # Calculate the length of the CPER data
> + cper_length = ACPI_GHES_ARM_CPER_LENGTH
> + cper_length += len(pei_data)
> + cper_length += len(vendor_data)
> + cper_length += len(ctx_data)
> + data_add(data, cper_length, 4)
> +
> + data_add(data, arg.get("affinity-level", 0), 1)
> +
> + # Reserved
> + data_add(data, 0, 3)
> +
> + data_add(data, arg.get("mpidr-el1", 0), 8)
> + data_add(data, arg.get("midr-el1", 0), 8)
> + data_add(data, cper["running-state"], 4)
> + data_add(data, arg.get("psci-state", 0), 4)
> +
> + # Add PEI
> + data.extend(pei_data)
> + data.extend(ctx_data)
> + data.extend(vendor_data)
> +
> + self.data = data
> +
> + def run(self, host, port):
> + """Execute QMP commands"""
> +
> + guid = to_guid(0xE19E3D16, 0xBC11, 0x11E4,
> + [0x9C, 0xAA, 0xC2, 0x05,
> + 0x1D, 0x5D, 0x46, 0xB0])
> +
> + qmp_command(host, port, guid, self.data)
> diff --git a/scripts/ghes_inject.py b/scripts/ghes_inject.py
> new file mode 100755
> index 000000000000..8415ccbbc53d
> --- /dev/null
> +++ b/scripts/ghes_inject.py
> @@ -0,0 +1,59 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +import argparse
> +
> +from arm_processor_error import ArmProcessorEinj
> +
> +EINJ_DESCRIPTION = """
> +Handle ACPI GHESv2 error injection logic QEMU QMP interface.\n
> +
> +It allows using UEFI BIOS EINJ features to generate GHES records.
> +
> +It helps testing Linux CPER and GHES drivers and to test rasdaemon
> +error handling logic.
> +
> +Currently, it support ARM processor error injection for ARM processor
> +events, being compatible with UEFI 2.9A Errata.
> +
> +This small utility works together with those QEMU additions:
> +- https://gitlab.com/mchehab_kernel/qemu/-/tree/arm-error-inject-v2
> +"""
> +
> +def main():
> + """Main program"""
> +
> + # Main parser - handle generic args like QEMU QMP TCP socket options
> + parser = argparse.ArgumentParser(prog="ghes_inject.py",
> + formatter_class=argparse.RawDescriptionHelpFormatter,
> + usage="%(prog)s [options]",
> + description=EINJ_DESCRIPTION,
> + epilog="If a field is not defined, a default value will be applied by QEMU.")
> +
> + g_options = parser.add_argument_group("QEMU QMP socket options")
> + g_options.add_argument("-H", "--host", default="localhost", type=str,
> + help="host name")
> + g_options.add_argument("-P", "--port", default=4445, type=int,
> + help="TCP port number")
> +
> + arm_einj = ArmProcessorEinj()
> +
> + # Call subparsers
> + subparsers = parser.add_subparsers(dest='command')
> +
> + arm_einj.create_subparser(subparsers)
> +
> + args = parser.parse_args()
> +
> + # Handle subparser commands
> + if args.command == "arm":
> + arm_einj.parse_args(args)
> + arm_einj.run(args.host, args.port)
> +
> +
> +if __name__ == "__main__":
> + main()
> diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
> new file mode 100644
> index 000000000000..13fae7a7af0e
> --- /dev/null
> +++ b/scripts/qmp_helper.py
> @@ -0,0 +1,249 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114, R0912, R0913, R0915, W0511
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +import json
> +import socket
> +import sys
> +
> +from base64 import b64encode
> +
> +#
> +# Socket QMP send command
> +#
> +def qmp_command(host, port, guid, data):
> + """Send commands to QEMU through QMP TCP socket"""
> +
> + # Fill the commands to be sent
> + commands = []
> +
> + # Needed to negotiate QMP and for QEMU to accept the command
> + commands.append('{ "execute": "qmp_capabilities" } ')
> +
> + base64_data = b64encode(bytes(data)).decode('ascii')
> +
> + cmd_arg = {
> + 'cper': {
> + 'notification-type': guid,
> + "raw-data": base64_data
> + }
> + }
> +
> + command = '{ "execute": "ghes-cper", '
> + command += '"arguments": ' + json.dumps(cmd_arg) + " }"
> +
> + commands.append(command)
> +
> + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> + try:
> + s.connect((host, port))
> + except ConnectionRefusedError:
> + sys.exit(f"Can't connect to QMP host {host}:{port}")
> +
> + data = s.recv(1024)
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if "QMP" not in obj:
> + print(f"Invalid QMP answer: {data.decode('utf-8')}")
> + s.close()
> + return
> +
> + for i, command in enumerate(commands):
> + s.sendall(command.encode("utf-8"))
> + data = s.recv(1024)
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if isinstance(obj.get("return"), dict):
> + if obj["return"]:
> + print(json.dumps(obj["return"]))
> + elif i > 0:
> + print("Error injected.")
> + elif isinstance(obj.get("error"), dict):
> + error = obj["error"]
> + print(f'{error["class"]}: {error["desc"]}')
> + else:
> + print(json.dumps(obj))
> +
> + s.shutdown(socket.SHUT_WR)
> + while 1:
> + data = s.recv(1024)
> + if data == b"":
> + break
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if isinstance(obj.get("return"), dict):
> + print(json.dumps(obj["return"]))
> + elif isinstance(obj.get("error"), dict):
> + error = obj["error"]
> + print(f'{error["class"]}: {error["desc"]}')
> + else:
> + print(json.dumps(obj))
> +
> + s.close()
> +
> +
> +#
> +# Helper routines to handle multiple choice arguments
> +#
> +def get_choice(name, value, choices, suffixes=None):
> + """Produce a bitmask from a comma-separated multiple choice argument"""
> +
> + new_values = 0
> +
> + if not value:
> + return new_values
> +
> + for val in value.split(","):
> + val = val.lower()
> +
> + if suffixes:
> + for suffix in suffixes:
> + val = val.removesuffix(suffix)
> +
> + if val not in choices.keys():
> + sys.exit(f"Error on '{name}': choice {val} is invalid.")
> +
> + val = choices[val]
> +
> + new_values |= val
> +
> + return new_values
> +
> +
> +def get_mult_array(mult, name, values, allow_zero=False, max_val=None):
> + """Add numbered hashes from integer lists"""
> +
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + if not values:
> + i = 0
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = []
> + return
> +
> + i = 0
> + for value in values:
> + for val in value.split(","):
> + try:
> + val = int(val, 0)
> + except ValueError:
> + sys.exit(f"Error on '{name}': {val} is not an integer")
> +
> + if val < 0:
> + sys.exit(f"Error on '{name}': {val} is not unsigned")
> +
> + if max_val and val > max_val:
> + sys.exit(f"Error on '{name}': {val} is too big")
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + if name not in mult[i]:
> + mult[i][name] = []
> +
> + mult[i][name].append(val)
> +
> + i += 1
> +
> +
> +def get_mult_choices(mult, name, values, choices,
> + suffixes=None, allow_zero=False):
> + """Add numbered hashes from multiple choice arguments"""
> +
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + i = 0
> + for val in values:
> + new_values = get_choice(name, val, choices, suffixes)
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = new_values
> + i += 1
> +
> +
> +def get_mult_int(mult, name, values, allow_zero=False):
> + """Add numbered hashes from integer arguments"""
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + i = 0
> + for val in values:
> + try:
> + val = int(val, 0)
> + except ValueError:
> + sys.exit(f"Error on '{name}': {val} is not an integer")
> +
> + if val < 0:
> + sys.exit(f"Error on '{name}': {val} is not unsigned")
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = val
> + i += 1
> +
> +
> +#
> +# Data encode helper functions
> +#
> +def bit(b):
> + """Simple macro to define a bit on a bitmask"""
> + return 1 << b
> +
> +
> +def data_add(data, value, num_bytes):
> + """Adds bytes from value inside a bytearray"""
> +
> + data.extend(value.to_bytes(num_bytes, byteorder="little"))
> +
> +def to_guid(time_low, time_mid, time_high, nodes):
> + """Create a GUID string"""
> +
> + assert len(nodes) == 8
> +
> + clock = nodes[0] << 8 | nodes[1]
> +
> + node = 0
> + for i in range(2, len(nodes)):
> + node = node << 8 | nodes[i]
> +
> + s = f"{time_low:08x}-{time_mid:04x}-"
> + s += f"{time_high:04x}-{clock:04x}-{node:012x}"
> +
> + return s
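As a quick illustration of the two encoding helpers quoted above, here is a minimal standalone sketch. It copies data_add() and to_guid() so it can run outside the QEMU tree; the three header values are purely illustrative and are not taken from a real injection.

```python
def data_add(data, value, num_bytes):
    """Append value to a bytearray as num_bytes little-endian bytes."""
    data.extend(value.to_bytes(num_bytes, byteorder="little"))


def to_guid(time_low, time_mid, time_high, nodes):
    """Format the GUID fields as a lowercase dashed string."""
    assert len(nodes) == 8

    # The first two node bytes form the clock sequence field
    clock = nodes[0] << 8 | nodes[1]

    # The remaining six bytes form the node field
    node = 0
    for byte in nodes[2:]:
        node = node << 8 | byte

    return (f"{time_low:08x}-{time_mid:04x}-"
            f"{time_high:04x}-{clock:04x}-{node:012x}")


# Illustrative values only: a 4-byte field followed by two 2-byte counters
header = bytearray()
data_add(header, 0x1, 4)
data_add(header, 1, 2)
data_add(header, 0, 2)

# Same GUID fields as passed by run() in the quoted patch
arm_guid = to_guid(0xE19E3D16, 0xBC11, 0x11E4,
                   [0x9C, 0xAA, 0xC2, 0x05,
                    0x1D, 0x5D, 0x46, 0xB0])

print(header.hex())   # 0100000001000000
print(arm_guid)       # e19e3d16-bc11-11e4-9caa-c2051d5d46b0
```

Every multi-byte field ends up little-endian in the raw data, while the GUID is passed to QMP as a plain string.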
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 4/7] acpi/ghes: Support GPIO error source
2024-08-06 9:32 ` Igor Mammedov
@ 2024-08-07 7:15 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-07 7:15 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Tue, 6 Aug 2024 11:32:19 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > @@ -327,6 +330,9 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> > */
> > build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
> > break;
> > + case ACPI_HEST_SRC_ID_GPIO:
> > + build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_GPIO);
>
> perhaps ACPI_GHES_NOTIFY_EXTERNAL fits better here?
That symbol is already used to map the 12 possible notification types from the ACPI spec.
I did a:
sed s,ACPI_HEST_SRC_ID_GED_INT,ACPI_HEST_NOTIFY_EXTERNAL,
instead.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-06 14:31 ` Igor Mammedov
@ 2024-08-07 7:47 ` Mauro Carvalho Chehab
2024-08-07 9:34 ` Jonathan Cameron via
2024-08-07 14:25 ` Jonathan Cameron via
2024-08-08 12:11 ` Mauro Carvalho Chehab
2 siblings, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-07 7:47 UTC (permalink / raw)
To: Igor Mammedov, Jonathan Cameron
Cc: Shiju Jose, Michael S. Tsirkin, Ani Sinha, Dongjiu Geng,
linux-kernel, qemu-arm, qemu-devel
Em Tue, 6 Aug 2024 16:31:13 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> PS:
> looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> and it is the total size of a error block for a error source.
>
> However acpi_hest_ghes.rst (3) says it should be 4K,
> am I mistaken?
Maybe Jonathan knows better, but I guess the 1K was just some
arbitrary limit to prevent a too big CPER. The 4K limit described
at acpi_hest_ghes.rst could be just some limit to cope with
the current bios implementation, but I didn't check myself how
this is implemented there.
I was unable to find any limit at the specs. Yet, if you look at:
https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
The processor Error Information Structure, starting at offset
40, can go up to 255*32, meaning an offset of 8200, which is
bigger than 4K.
Going further, the processor context can have up to 65535 entries
(the spec actually says 65536, but that sounds like a typo, as the
value is stored in a uint16_t), each containing multiple register
values (the spec calls its length "P").
So, the CPER record could, in theory, have:
8200 + (65535 * P) + sizeof(vendor-specific-info)
The CPER length is stored in Section Length record, which is
uint32_t.
So, I'd say that the GHES record can theoretically be a lot
bigger than 4K.
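The arithmetic above can be checked with a short sketch. The constants come from the figures quoted in this message (40-byte fixed part, 32 bytes per error information structure, at most 255 of them, at most 65535 contexts). The value P = 264 is only an illustrative assumption, and max_section_size() is a hypothetical helper, not part of the series.

```python
ARM_CPER_FIXED = 40      # error information structures start at offset 40
ERR_INFO_SIZE = 32       # bytes per error information structure
MAX_ERR_INFO = 255       # upper bound used in the message above
MAX_CONTEXT = 65535      # largest count that fits in a uint16_t


def max_section_size(ctx_size, vendor_size=0):
    """Theoretical upper bound for an ARM processor CPER section."""
    return (ARM_CPER_FIXED
            + MAX_ERR_INFO * ERR_INFO_SIZE
            + MAX_CONTEXT * ctx_size
            + vendor_size)


# Offset right after the last possible error information structure
err_info_end = ARM_CPER_FIXED + MAX_ERR_INFO * ERR_INFO_SIZE
print(err_info_end)                 # 8200

# Illustrative assumption: every context is 264 bytes long (P = 264)
upper_bound = max_section_size(264)
print(upper_bound)                  # 17309440
print(upper_bound < 2**32)          # True: still fits the uint32_t length
```

Even this rough bound is several orders of magnitude above both the 1K and the 4K limits under discussion.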
Thanks,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-07 7:47 ` Mauro Carvalho Chehab
@ 2024-08-07 9:34 ` Jonathan Cameron via
2024-08-07 13:23 ` Mauro Carvalho Chehab
2024-08-07 13:28 ` Igor Mammedov
0 siblings, 2 replies; 54+ messages in thread
From: Jonathan Cameron via @ 2024-08-07 9:34 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 7 Aug 2024 09:47:50 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Tue, 6 Aug 2024 16:31:13 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > PS:
> > looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> > and it is the total size of a error block for a error source.
> >
> > However acpi_hest_ghes.rst (3) says it should be 4K,
> > am I mistaken?
>
> Maybe Jonathan knows better, but I guess the 1K was just some
> arbitrary limit to prevent a too big CPER. The 4K limit described
> at acpi_hest_ghes.rst could be just some limit to cope with
> the current bios implementation, but I didn't check myself how
> this is implemented there.
>
> I was unable to find any limit at the specs. Yet, if you look at:
>
> https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
I think both limits are just made up. You can in theory log huge
error records. It's just that no one does.
>
> The processor Error Information Structure, starting at offset
> 40, can go up to 255*32, meaning an offset of 8200, which is
> bigger than 4K.
>
> Going further, processor context can have up to 65535 (spec
> actually says 65536, but that sounds a typo, as the size is
> stored on an uint16_t), containing multiple register values
> there (the spec calls its length as "P").
>
> So, the CPER record could, in theory, have:
> 8200 + (65535 * P) + sizeof(vendor-specific-info)
>
> The CPER length is stored in Section Length record, which is
> uint32_t.
>
> So, I'd say that the GHES record can theoretically be a lot
> bigger than 4K.
Agreed - but I don't think we care for testing as long as it's
big enough for plausible records. Unless you really want
to fuzz the limits?
Jonathan
>
> Thanks,
> Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-07 9:34 ` Jonathan Cameron via
@ 2024-08-07 13:23 ` Mauro Carvalho Chehab
2024-08-07 13:43 ` Igor Mammedov
2024-08-07 13:28 ` Igor Mammedov
1 sibling, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-07 13:23 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Igor Mammedov, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Wed, 7 Aug 2024 10:34:36 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
> On Wed, 7 Aug 2024 09:47:50 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Em Tue, 6 Aug 2024 16:31:13 +0200
> > Igor Mammedov <imammedo@redhat.com> escreveu:
> >
> > > PS:
> > > looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> > > and it is the total size of a error block for a error source.
> > >
> > > However acpi_hest_ghes.rst (3) says it should be 4K,
> > > am I mistaken?
> >
> > Maybe Jonathan knows better, but I guess the 1K was just some
> > arbitrary limit to prevent a too big CPER. The 4K limit described
> > at acpi_hest_ghes.rst could be just some limit to cope with
> > the current bios implementation, but I didn't check myself how
> > this is implemented there.
> >
> > I was unable to find any limit at the specs. Yet, if you look at:
> >
> > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
>
> I think both limits are just made up. You can in theory log huge
> error records. It's just that no one does.
If both are made up, I would sync them, either patching the
documentation or the ghes driver.
>
> >
> > The processor Error Information Structure, starting at offset
> > 40, can go up to 255*32, meaning an offset of 8200, which is
> > bigger than 4K.
> >
> > Going further, processor context can have up to 65535 (spec
> > actually says 65536, but that sounds a typo, as the size is
> > stored on an uint16_t), containing multiple register values
> > there (the spec calls its length as "P").
> >
> > So, the CPER record could, in theory, have:
> > 8200 + (65535 * P) + sizeof(vendor-specific-info)
> >
> > The CPER length is stored in Section Length record, which is
> > uint32_t.
> >
> > So, I'd say that the GHES record can theoretically be a lot
> > bigger than 4K.
> Agreed - but I don't think we care for testing as long as it's
> big enough for plausible records. Unless you really want
> to fuzz the limits?
Fuzzing the limits could be interesting, but it is not in my
current plans.
Yet, 1K could be a little bit short for ARM CPER.
See: N.26 ARMv8 AArch64 GPRs (Type 4) has 256 bytes for
registers, plus 8 bytes for the header. So, a total size of
264 bytes for a single context register dump. I would expect
type 4 to always be reported on aarch64 in real life, on a BIOS
with context register support. Maybe other types could also be
dumped together with it (like context registers for EL1,
EL2 and/or EL3).
If just one type 4 context is encoded, it means that 1K has
space for 23 errors (out of a maximum of 255).
Just looking at the maximum number, my feeling is that 1K
might be too short to simulate some real life reports,
but that depends on how firmware is actually grouping
such events.
So, maybe this could be expanded to, let's say, 4K, thus
aligning with the ReST documentation.
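To make the size budget concrete, here is a rough sketch of how many 32-byte error information structures fit next to a set of context dumps. max_error_infos() is a hypothetical helper. Unlike the estimate above, it also counts the 40-byte fixed part of the ARM processor section, so it lands one entry lower. It ignores any GHES status block wrapping around the section, which would reduce the numbers further.

```python
ARM_CPER_FIXED = 40      # fixed part of the ARM processor error section
ERR_INFO_SIZE = 32       # bytes per error information structure
MAX_ERR_INFO = 255       # maximum number of error information structures
GPR_CONTEXT = 256 + 8    # type 4 context: 256 bytes of registers + header


def max_error_infos(budget, ctx_sizes=(GPR_CONTEXT,), vendor_size=0):
    """Number of error info structures that fit in 'budget' bytes."""
    fixed = ARM_CPER_FIXED + sum(ctx_sizes) + vendor_size
    if budget <= fixed:
        return 0
    return min(MAX_ERR_INFO, (budget - fixed) // ERR_INFO_SIZE)


print(max_error_infos(1024))    # 22
print(max_error_infos(4096))    # 118

# Three contexts (for instance GPRs plus two sets of system registers,
# all assumed to be 264 bytes long here) leave much less room in 1K
print(max_error_infos(1024, ctx_sizes=(264, 264, 264)))   # 6
```

With a 4K block, a single context still leaves room for more than a hundred error information structures.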
Regards,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-07 9:34 ` Jonathan Cameron via
2024-08-07 13:23 ` Mauro Carvalho Chehab
@ 2024-08-07 13:28 ` Igor Mammedov
1 sibling, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-07 13:28 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Mauro Carvalho Chehab, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 7 Aug 2024 10:34:36 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> On Wed, 7 Aug 2024 09:47:50 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Em Tue, 6 Aug 2024 16:31:13 +0200
> > Igor Mammedov <imammedo@redhat.com> escreveu:
> >
> > > PS:
> > > looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> > > and it is the total size of a error block for a error source.
> > >
> > > However acpi_hest_ghes.rst (3) says it should be 4K,
> > > am I mistaken?
> >
> > Maybe Jonathan knows better, but I guess the 1K was just some
> > arbitrary limit to prevent a too big CPER. The 4K limit described
> > at acpi_hest_ghes.rst could be just some limit to cope with
> > the current bios implementation, but I didn't check myself how
> > this is implemented there.
> >
> > I was unable to find any limit at the specs. Yet, if you look at:
> >
> > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
>
> I think both limits are just made up. You can in theory log huge
> error records. It's just that no one does.
What I care about is what we actually allocate vs what we promised
in docs. Given that it's harder to change actual size (we would need
a compat handling here not to break old machine types), I would vote
for syncing docs to match code.
A separate stand-alone patch for fixing it would do,
or it could be a part of series.
Also I'd like another pair of eyes to look at it to confirm actual
size we allocate, in case I'm not seeing it right.
> > The processor Error Information Structure, starting at offset
> > 40, can go up to 255*32, meaning an offset of 8200, which is
> > bigger than 4K.
> >
> > Going further, processor context can have up to 65535 (spec
> > actually says 65536, but that sounds a typo, as the size is
> > stored on an uint16_t), containing multiple register values
> > there (the spec calls its length as "P").
> >
> > So, the CPER record could, in theory, have:
> > 8200 + (65535 * P) + sizeof(vendor-specific-info)
> >
> > The CPER length is stored in Section Length record, which is
> > uint32_t.
> >
> > So, I'd say that the GHES record can theoretically be a lot
> > bigger than 4K.
> Agreed - but I don't think we care for testing as long as it's
> big enough for plausible records. Unless you really want
> to fuzz the limits?
>
> Jonathan
>
> >
> > Thanks,
> > Mauro
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-07 13:23 ` Mauro Carvalho Chehab
@ 2024-08-07 13:43 ` Igor Mammedov
0 siblings, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-07 13:43 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 7 Aug 2024 15:23:57 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Wed, 7 Aug 2024 10:34:36 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> escreveu:
>
> > On Wed, 7 Aug 2024 09:47:50 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > > Em Tue, 6 Aug 2024 16:31:13 +0200
> > > Igor Mammedov <imammedo@redhat.com> escreveu:
> > >
> > > > PS:
> > > > looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> > > > and it is the total size of a error block for a error source.
> > > >
> > > > However acpi_hest_ghes.rst (3) says it should be 4K,
> > > > am I mistaken?
> > >
> > > Maybe Jonathan knows better, but I guess the 1K was just some
> > > arbitrary limit to prevent a too big CPER. The 4K limit described
> > > at acpi_hest_ghes.rst could be just some limit to cope with
> > > the current bios implementation, but I didn't check myself how
> > > this is implemented there.
> > >
> > > I was unable to find any limit at the specs. Yet, if you look at:
> > >
> > > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
> >
> > I think both limits are just made up. You can in theory log huge
> > error records. It's just that no one does.
>
> If both are made up, I would sync them, either patching the
> documentation or the ghes driver.
>
> >
> > >
> > > The processor Error Information Structure, starting at offset
> > > 40, can go up to 255*32, meaning an offset of 8200, which is
> > > bigger than 4K.
> > >
> > > Going further, processor context can have up to 65535 (spec
> > > actually says 65536, but that sounds a typo, as the size is
> > > stored on an uint16_t), containing multiple register values
> > > there (the spec calls its length as "P").
> > >
> > > So, the CPER record could, in theory, have:
> > > 8200 + (65535 * P) + sizeof(vendor-specific-info)
> > >
> > > The CPER length is stored in Section Length record, which is
> > > uint32_t.
> > >
> > > So, I'd say that the GHES record can theoretically be a lot
> > > bigger than 4K.
> > Agreed - but I don't think we care for testing as long as it's
> > big enough for plausible records. Unless you really want
> > to fuzz the limits?
>
> Fuzz the limits could be interesting, but it is not on my
> current plans.
>
> Yet, 1K could be a little bit short for ARM CPER.
>
> See: N.26 ARMv8 AArch64 GPRs (Type 4) has 256 bytes for
> registers, plus 8 bytes for the header. So, a total size of
> 264 bytes, for a single context register dump. I would expect
> that, in real life, type 4 to always be reported on aarch64,
> on BIOS with context register support. Maybe other types could
> also be dumped altogether (like context registers for EL1,
> EL2 and/or EL3).
>
> If just one type 4 context is encoded, it means that, 1K has
> space for 23 errors (of a max limit of 255).
>
> Just looking at the maximum number, my feeling is that 1K
> might be too short to simulate some real life reports,
> but that depends on how firmware is actually grouping
> such events.
As far as I know, firmware is out of the picture here, since all
it does in the HEST case is allocate contiguous space for the
'etc/hardware_errors' blob, as QEMU told it to.
>
> So, maybe this could be expanded to, let's say, 4K, thus
> aligning with the ReST documentation.
Maybe, to get moving, first get your series in with the docs fixed
to match today's limit.
Then increase the error_block size to the desired value on top of that,
as it's really not relevant to what you are doing here.
> Regards,
> Mauro
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-06 14:31 ` Igor Mammedov
2024-08-07 7:47 ` Mauro Carvalho Chehab
@ 2024-08-07 14:25 ` Jonathan Cameron via
2024-08-08 8:11 ` Igor Mammedov
2024-08-08 12:11 ` Mauro Carvalho Chehab
2 siblings, 1 reply; 54+ messages in thread
From: Jonathan Cameron via @ 2024-08-07 14:25 UTC (permalink / raw)
To: Igor Mammedov
Cc: Mauro Carvalho Chehab, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Tue, 6 Aug 2024 16:31:13 +0200
Igor Mammedov <imammedo@redhat.com> wrote:
> On Fri, 2 Aug 2024 23:44:01 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Provide a generic interface for error injection via GHESv2.
> >
> > This patch is co-authored:
> > - original ghes logic to inject a simple ARM record by Shiju Jose;
> > - generic logic to handle block addresses by Jonathan Cameron;
> > - generic GHESv2 error inject by Mauro Carvalho Chehab;
> >
> > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Cc: Shiju Jose <shiju.jose@huawei.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> > hw/acpi/ghes.c | 159 ++++++++++++++++++++++++++++++++++++++---
> > hw/acpi/ghes_cper.c | 2 +-
> > include/hw/acpi/ghes.h | 3 +
> > 3 files changed, 152 insertions(+), 12 deletions(-)
> >
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index a745dcc7be5e..e125c9475773 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > ags->present = true;
> > }
> >
> > +static uint64_t ghes_get_state_start_address(void)
>
> ghes_get_hardware_errors_address() might better reflect what address it will return
>
> > +{
> > + AcpiGedState *acpi_ged_state =
> > + ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> > + AcpiGhesState *ags = &acpi_ged_state->ghes_state;
> > +
> > + return le64_to_cpu(ags->ghes_addr_le);
> > +}
> > +
> > int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > {
> > uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > - uint64_t start_addr;
> > + uint64_t start_addr = ghes_get_state_start_address();
> > bool ret = -1;
> > - AcpiGedState *acpi_ged_state;
> > - AcpiGhesState *ags;
> > -
> > assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> >
> > - acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > - NULL));
> > - g_assert(acpi_ged_state);
> > - ags = &acpi_ged_state->ghes_state;
> > -
> > - start_addr = le64_to_cpu(ags->ghes_addr_le);
> > -
> > if (physical_address) {
> > start_addr += source_id * sizeof(uint64_t);
>
> above should be a separate patch
>
> >
> > @@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > return ret;
> > }
> >
> > +/*
> > + * Error register block data layout
> > + *
> > + * | +---------------------+ ges.ghes_addr_le
> > + * | |error_block_address0 |
> > + * | +---------------------+
> > + * | |error_block_address1 |
> > + * | +---------------------+ --+--
> > + * | | ............. | GHES_ADDRESS_SIZE
> > + * | +---------------------+ --+--
> > + * | |error_block_addressN |
> > + * | +---------------------+
> > + * | | read_ack0 |
> > + * | +---------------------+ --+--
> > + * | | read_ack1 | GHES_ADDRESS_SIZE
> > + * | +---------------------+ --+--
> > + * | | ............. |
> > + * | +---------------------+
> > + * | | read_ackN |
> > + * | +---------------------+ --+--
> > + * | | CPER | |
> > + * | | .... | GHES_MAX_RAW_DATA_LENGT
> > + * | | CPER | |
> > + * | +---------------------+ --+--
> > + * | | .......... |
> > + * | +---------------------+
> > + * | | CPER |
> > + * | | .... |
> > + * | | CPER |
> > + * | +---------------------+
> > + */
>
> no need to duplicate docs/specs/acpi_hest_ghes.rst,
> I'd just refer to it and maybe add a short comment as to why it's mentioned.
>
> > +/* Map from uint32_t notify to entry offset in GHES */
> > +static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
> > + 0xff, 0xff, 0xff, 1, 0};
> > +
> > +static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
> > + uint64_t *read_ack_addr)
> > +{
> > + uint64_t base;
> > +
> > + if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
> > + return false;
> > + }
> > +
> > + /* Find and check the source id for this new CPER */
> > + if (error_source_to_index[notify] == 0xff) {
> > + return false;
> > + }
> > +
> > + base = ghes_get_state_start_address();
> > +
> > + *read_ack_addr = base +
> > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > + error_source_to_index[notify] * sizeof(uint64_t);
> > +
> > + /* Could also be read back from the error_block_address register */
> > + *error_block_addr = base +
> > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > +
> > + return true;
> > +}
>
> I don't like all this pointer math, which is basically a reverse engineered
> QEMU actions on startup + guest provided etc/hardware_errors address.
>
> For one, it assumes error_source_to_index[] matches the order in which HEST
> error sources were described, which is fragile.
>
> 2nd: migration-wise it's a disaster, since old/new HEST/hardware_errors tables
> in RAM migrated from an older version might not match the above assumptions
> of the target QEMU.
>
> I see 2 ways to rectify it:
> 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
> in guest RAM, like we do with etc/hardware_errors, see
> build_ghes_error_table()
> ...
> tell firmware to write hardware_errors GPA into
> and then fetch from HEST table in RAM, the guest patched error/ack addresses
> for given source_id
>
> code-wise: relatively simple once one wraps their own head over
> how this whole APEI thing works in QEMU
> workflow is described in docs/specs/acpi_hest_ghes.rst
> look to me as sufficient to grasp it.
> (but my view is very biased given my prior knowledge,
> aka: docs/comments/examples wrt acpi patching are good enough)
> (if it's not clear how to do it, ask me for pointers)
Hi Igor, I think I follow what you mean but maybe this question will reveal
otherwise. HEST is currently in ACPI_BUILD_TABLE_FILE.
Would you suggest splitting it to its own file, or using table_offsets
to get the offset in ACPI_BUILD_TABLE_FILE GPA?
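For reference, the offset arithmetic done by the quoted ghes_get_addr() can be sketched in Python as below. SOURCE_COUNT = 2 and MAX_RAW_DATA_LENGTH = 1024 are assumptions (two error sources, and the 1K limit mentioned in the review), and the base address is made up. The sketch bounds-checks against the table length, which is not exactly what the C code does.

```python
ADDR_SIZE = 8                # sizeof(uint64_t)
SOURCE_COUNT = 2             # assumed ACPI_GHES_ERROR_SOURCE_COUNT
MAX_RAW_DATA_LENGTH = 1024   # assumed 1K error block per source

# Notification type -> entry index; 0xff marks an unsupported type.
# Mirrors the quoted error_source_to_index[] table.
ERROR_SOURCE_TO_INDEX = [0xff, 0xff, 0xff, 0xff,
                         0xff, 0xff, 0xff, 1, 0]


def ghes_get_addr(base, notify):
    """Return (error_block_addr, read_ack_addr), or None if unsupported."""
    if notify >= len(ERROR_SOURCE_TO_INDEX):
        return None

    index = ERROR_SOURCE_TO_INDEX[notify]
    if index == 0xff:
        return None

    # read_ack registers follow the error_block_address array
    read_ack = base + SOURCE_COUNT * ADDR_SIZE + index * ADDR_SIZE

    # CPER blocks follow both arrays, one fixed-size block per source
    error_block = (base + 2 * SOURCE_COUNT * ADDR_SIZE
                   + index * MAX_RAW_DATA_LENGTH)

    return error_block, read_ack


base = 0x1000   # made-up guest address of etc/hardware_errors
print([hex(a) for a in ghes_get_addr(base, 8)])   # ['0x1020', '0x1010']
print([hex(a) for a in ghes_get_addr(base, 7)])   # ['0x1420', '0x1018']
print(ghes_get_addr(base, 0))                     # None
```

Every address here depends on the index table and the block size matching what was laid out at table build time, which is the fragility pointed out in the review.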
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-07 14:25 ` Jonathan Cameron via
@ 2024-08-08 8:11 ` Igor Mammedov
2024-08-08 18:19 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 54+ messages in thread
From: Igor Mammedov @ 2024-08-08 8:11 UTC (permalink / raw)
To: Jonathan Cameron
Cc: Mauro Carvalho Chehab, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Wed, 7 Aug 2024 15:25:47 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> On Tue, 6 Aug 2024 16:31:13 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
>
> > On Fri, 2 Aug 2024 23:44:01 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > > Provide a generic interface for error injection via GHESv2.
> > >
> > > This patch is co-authored:
> > > - original ghes logic to inject a simple ARM record by Shiju Jose;
> > > - generic logic to handle block addresses by Jonathan Cameron;
> > > - generic GHESv2 error inject by Mauro Carvalho Chehab;
> > >
> > > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> > > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Cc: Shiju Jose <shiju.jose@huawei.com>
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > ---
> > > hw/acpi/ghes.c | 159 ++++++++++++++++++++++++++++++++++++++---
> > > hw/acpi/ghes_cper.c | 2 +-
> > > include/hw/acpi/ghes.h | 3 +
> > > 3 files changed, 152 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > index a745dcc7be5e..e125c9475773 100644
> > > --- a/hw/acpi/ghes.c
> > > +++ b/hw/acpi/ghes.c
> > > @@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > ags->present = true;
> > > }
> > >
> > > +static uint64_t ghes_get_state_start_address(void)
> >
> > ghes_get_hardware_errors_address() might better reflect what address it will return
> >
> > > +{
> > > + AcpiGedState *acpi_ged_state =
> > > + ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> > > + AcpiGhesState *ags = &acpi_ged_state->ghes_state;
> > > +
> > > + return le64_to_cpu(ags->ghes_addr_le);
> > > +}
> > > +
> > > int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > > {
> > > uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > > - uint64_t start_addr;
> > > + uint64_t start_addr = ghes_get_state_start_address();
> > > bool ret = -1;
> > > - AcpiGedState *acpi_ged_state;
> > > - AcpiGhesState *ags;
> > > -
> > > assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> > >
> > > - acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > > - NULL));
> > > - g_assert(acpi_ged_state);
> > > - ags = &acpi_ged_state->ghes_state;
> > > -
> > > - start_addr = le64_to_cpu(ags->ghes_addr_le);
> > > -
> > > if (physical_address) {
> > > start_addr += source_id * sizeof(uint64_t);
> >
> > above should be a separate patch
> >
> > >
> > > @@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > > return ret;
> > > }
> > >
> > > +/*
> > > + * Error register block data layout
> > > + *
> > > + * | +---------------------+ ges.ghes_addr_le
> > > + * | |error_block_address0 |
> > > + * | +---------------------+
> > > + * | |error_block_address1 |
> > > + * | +---------------------+ --+--
> > > + * | | ............. | GHES_ADDRESS_SIZE
> > > + * | +---------------------+ --+--
> > > + * | |error_block_addressN |
> > > + * | +---------------------+
> > > + * | | read_ack0 |
> > > + * | +---------------------+ --+--
> > > + * | | read_ack1 | GHES_ADDRESS_SIZE
> > > + * | +---------------------+ --+--
> > > + * | | ............. |
> > > + * | +---------------------+
> > > + * | | read_ackN |
> > > + * | +---------------------+ --+--
> > > + * | | CPER | |
> > > + * | | .... | GHES_MAX_RAW_DATA_LENGT
> > > + * | | CPER | |
> > > + * | +---------------------+ --+--
> > > + * | | .......... |
> > > + * | +---------------------+
> > > + * | | CPER |
> > > + * | | .... |
> > > + * | | CPER |
> > > + * | +---------------------+
> > > + */
> >
> > no need to duplicate docs/specs/acpi_hest_ghes.rst,
> > I'd just refer to it and maybe add a short comment as to why it's mentioned.
> >
> > > +/* Map from uint32_t notify to entry offset in GHES */
> > > +static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
> > > + 0xff, 0xff, 0xff, 1, 0};
> > > +
> > > +static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
> > > + uint64_t *read_ack_addr)
> > > +{
> > > + uint64_t base;
> > > +
> > > + if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
> > > + return false;
> > > + }
> > > +
> > > + /* Find and check the source id for this new CPER */
> > > + if (error_source_to_index[notify] == 0xff) {
> > > + return false;
> > > + }
> > > +
> > > + base = ghes_get_state_start_address();
> > > +
> > > + *read_ack_addr = base +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > + error_source_to_index[notify] * sizeof(uint64_t);
> > > +
> > > + /* Could also be read back from the error_block_address register */
> > > + *error_block_addr = base +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > > +
> > > + return true;
> > > +}
> >
> > I don't like all this pointer math, which is basically a reverse engineered
> > QEMU actions on startup + guest provided etc/hardware_errors address.
> >
> > For one, it assumes error_source_to_index[] matches the order in which HEST
> > error sources were described, which is fragile.
> >
> > 2nd: migration-wise it's a disaster, since old/new HEST/hardware_errors tables
> > in RAM migrated from an older version might not match the above assumptions
> > of the target QEMU.
> >
> > I see 2 ways to rectify it:
> > 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
> > in guest RAM, like we do with etc/hardware_errors, see
> > build_ghes_error_table()
> > ...
> > tell firmware to write hardware_errors GPA into
> > and then fetch from HEST table in RAM, the guest patched error/ack addresses
> > for given source_id
> >
> > code-wise: relatively simple once one wraps their own head over
> > how this whole APEI thing works in QEMU
> > workflow is described in docs/specs/acpi_hest_ghes.rst
> > look to me as sufficient to grasp it.
> > (but my view is very biased given my prior knowledge,
> > aka: docs/comments/examples wrt acpi patching are good enough)
> > (if it's not clear how to do it, ask me for pointers)
>
> Hi Igor, I think I follow what you mean but maybe this question will reveal
> otherwise. HEST is currently in ACPI_BUILD_TABLE_FILE.
> Would you suggest splitting it to it's own file, or using table_offsets
> to get the offset in ACPI_BUILD_TABLE_FILE GPA?
Yep, use the offset taken right before HEST is created.
The doc comment for bios_linker_loader_write_pointer() explains how it works.
We need something like:
    bios_linker_loader_write_pointer(linker,
        ACPI_HEST_TABLE_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t),
        ACPI_BUILD_TABLE_FILE, hest_offset_within_ACPI_BUILD_TABLE_FILE);
To register the new file, see:
a08a64627 ACPI: Record the Generic Error Status Block address
and, to avoid a copy-paste error, maybe
136fc6aa2 ACPI: Avoid infinite recursion when dump-vmstate
This needs to be limited to new machine types, keeping
old ones without this new feature (I'd use the hw_compat_ machinery for that).
While at it, we should rename
ACPI_GHES_DATA_ADDR_FW_CFG_FILE -> ACPI_GHES_ERRORS_ADDR_FW_CFG_FILE
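
Editor's sketch of the idea above, not QEMU code: a toy model of what the
firmware is expected to do when it processes a write-pointer linker command,
namely store the guest physical address of (source file + offset) into a
writable fw_cfg file in little-endian form, so the host can read it back
later. The names toy_write_pointer() and toy_read_pointer() and the example
addresses are hypothetical; the real interface is
bios_linker_loader_write_pointer() as quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model (hypothetical): store the guest physical address of
 * (src_file + src_offset) into dst_file at dst_offset, little-endian,
 * using ptr_size bytes.
 */
static void toy_write_pointer(uint8_t *dst_file, size_t dst_len,
                              size_t dst_offset, size_t ptr_size,
                              uint64_t src_file_gpa, uint64_t src_offset)
{
    uint64_t value = src_file_gpa + src_offset;
    size_t i;

    /* The patched field must fit inside the destination file. */
    assert(ptr_size >= 1 && ptr_size <= sizeof(uint64_t));
    assert(dst_offset <= dst_len && ptr_size <= dst_len - dst_offset);

    for (i = 0; i < ptr_size; i++) {
        dst_file[dst_offset + i] = (uint8_t)(value >> (8 * i));
    }
}

/* Host side: decode the little-endian pointer written by the firmware. */
static uint64_t toy_read_pointer(const uint8_t *file, size_t ptr_size)
{
    uint64_t value = 0;
    size_t i;

    for (i = 0; i < ptr_size; i++) {
        value |= (uint64_t)file[i] << (8 * i);
    }
    return value;
}
```

With such a pointer available, the host side would know where HEST lives in
guest RAM without recomputing any layout by hand.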
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-02 21:44 ` [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
` (2 preceding siblings ...)
2024-08-06 12:51 ` Igor Mammedov
@ 2024-08-08 8:50 ` Markus Armbruster
2024-08-08 14:11 ` Mauro Carvalho Chehab
3 siblings, 1 reply; 54+ messages in thread
From: Markus Armbruster @ 2024-08-08 8:50 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> Creates a QMP command to be used for generic ACPI APEI hardware error
> injection (HEST) via GHESv2.
>
> The actual GHES code will be added at the followup patch.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> MAINTAINERS | 7 +++++
> hw/acpi/Kconfig | 5 ++++
> hw/acpi/ghes_cper.c | 45 ++++++++++++++++++++++++++++++++
> hw/acpi/ghes_cper_stub.c | 18 +++++++++++++
> hw/acpi/meson.build | 2 ++
> hw/arm/Kconfig | 5 ++++
> include/hw/acpi/ghes.h | 7 +++++
> qapi/ghes-cper.json | 55 ++++++++++++++++++++++++++++++++++++++++
> qapi/meson.build | 1 +
> qapi/qapi-schema.json | 1 +
> 10 files changed, 146 insertions(+)
> create mode 100644 hw/acpi/ghes_cper.c
> create mode 100644 hw/acpi/ghes_cper_stub.c
> create mode 100644 qapi/ghes-cper.json
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 98eddf7ae155..655edcb6688c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> F: include/hw/acpi/ghes.h
> F: docs/specs/acpi_hest_ghes.rst
>
> +ACPI/HEST/GHES/ARM processor CPER
> +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +S: Maintained
> +F: hw/arm/ghes_cper.c
> +F: hw/acpi/ghes_cper_stub.c
> +F: qapi/ghes-cper.json
> +
Here's the reason for creating a new QAPI module instead of adding to
existing module acpi.json: different maintainers.
Hypothetical question: if we didn't care for that, would this go into
qapi/acpi.json?
If yes, then should we call it acpi-ghes-cper.json or acpi-ghes.json
instead?
> ppc4xx
> L: qemu-ppc@nongnu.org
> S: Orphan
[...]
> diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
> new file mode 100644
> index 000000000000..3cc4f9f2aaa9
> --- /dev/null
> +++ b/qapi/ghes-cper.json
> @@ -0,0 +1,55 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +
> +##
> +# = GHESv2 CPER Error Injection
> +#
> +# These are defined at
> +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> +# (GHESv2 - Type 10)
> +##
Feels a bit terse. These what?
The reference could be clearer: "defined in the ACPI Specification 6.2,
section 18.3.2.8 Generic Hardware Error Source version 2". A link would
be nice, if it's stable.
> +
> +##
> +# @CommonPlatformErrorRecord:
> +#
> +# Common Platform Error Record - CPER - as defined at the UEFI
> +# specification. See
> +# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
> +# for more details.
> +#
> +# @notification-type: pre-assigned GUID string indicating the record
> +# association with an error event notification type, as defined
> +# at https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
Please indent four spaces for consistency, like this:
# @notification-type: pre-assigned GUID string indicating the record
#     association with an error event notification type, as defined at
#     https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#record-header
> +#
> +# @raw-data: Contains a base64 encoded string with the payload of
> +# the CPER.
Suggest
# @raw-data: payload of the CPER encoded in base64
Have you considered naming this @payload instead?
> +#
> +# Since: 9.2
> +##
> +{ 'struct': 'CommonPlatformErrorRecord',
> + 'data': {
> + 'notification-type': 'str',
> + 'raw-data': 'str'
> + }
> +}
> +
> +##
> +# @ghes-cper:
> +#
> +# Inject ARM Processor error with data to be filled according with
> +# ACPI 6.2 GHESv2 spec.
according to
(Beware, I'm not a native speaker)
> +#
> +# @cper: a single CPER record to be sent to the guest OS.
> +#
> +# Features:
> +#
> +# @unstable: This command is experimental.
> +#
> +# Since: 9.2
> +##
> +{ 'command': 'ghes-cper',
> + 'data': {
> + 'cper': 'CommonPlatformErrorRecord'
> + },
> + 'features': [ 'unstable' ]
> +}
> diff --git a/qapi/meson.build b/qapi/meson.build
> index e7bc54e5d047..bd13cd7d40c9 100644
> --- a/qapi/meson.build
> +++ b/qapi/meson.build
> @@ -35,6 +35,7 @@ qapi_all_modules = [
> 'dump',
> 'ebpf',
> 'error',
> + 'ghes-cper',
> 'introspect',
> 'job',
> 'machine-common',
> diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
> index b1581988e4eb..c1a267399fe5 100644
> --- a/qapi/qapi-schema.json
> +++ b/qapi/qapi-schema.json
> @@ -75,6 +75,7 @@
> { 'include': 'misc-target.json' }
> { 'include': 'audio.json' }
> { 'include': 'acpi.json' }
> +{ 'include': 'ghes-cper.json' }
> { 'include': 'pci.json' }
> { 'include': 'stats.json' }
> { 'include': 'virtio.json' }
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-06 14:31 ` Igor Mammedov
2024-08-07 7:47 ` Mauro Carvalho Chehab
2024-08-07 14:25 ` Jonathan Cameron via
@ 2024-08-08 12:11 ` Mauro Carvalho Chehab
2024-08-08 12:45 ` Igor Mammedov
2 siblings, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-08 12:11 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Tue, 6 Aug 2024 16:31:13 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > + /* Could also be read back from the error_block_address register */
> > + *error_block_addr = base +
> > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > +
> > + return true;
> > +}
>
> I don't like all this pointer math, which is basically a reverse engineered
> QEMU actions on startup + guest provided etc/hardware_errors address.
>
> For one, it assumes error_source_to_index[] matches the order in which HEST
> error sources were described, which is fragile.
>
> 2nd: migration-wise it's a disaster, since old/new HEST/hardware_errors tables
> in RAM migrated from an older version might not match the above assumptions
> of the target QEMU.
>
> I see 2 ways to rectify it:
> 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
> in guest RAM, like we do with etc/hardware_errors, see
> build_ghes_error_table()
> ...
> tell firmware to write hardware_errors GPA into
> and then fetch from HEST table in RAM, the guest patched error/ack addresses
> for given source_id
>
> code-wise: relatively simple once one wraps their own head over
> how this whole APEI thing works in QEMU
> workflow is described in docs/specs/acpi_hest_ghes.rst
> look to me as sufficient to grasp it.
> (but my view is very biased given my prior knowledge,
> aka: docs/comments/examples wrt acpi patching are good enough)
> (if it's not clear how to do it, ask me for pointers)
That sounds like a better approach, however...
> 2nd: sort of hack based on build_ghes_v2() Error Status Address/Read Ack Register
> patching instructions
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
> ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
> ^^^^^^^^^^^^^^^^^^^^^^^^^
> during build_ghes_v2() also store on a side mapping
> source_id -> error address offset : read ack address
>
> so when you are injecting error, you'd at least use offsets
> used at start time, to get rid of risk where injection code
> diverge from HEST:etc/hardware_errors layout at start time.
>
> However to make migration safe, one would need to add a fat
> comment not to change the order of GHES error sources in HEST _and_
> a dedicated unit test to make sure we catch it when that happens.
> bios_tables_test should be able to catch the change, but it won't
> say what's wrong, hence a test case that explicitly checks the order
> and complains loudly and clearly when we break order assumptions.
>
> downside:
> * we are limiting ways HEST could be composed/reshuffled in future
> * consumption of extra CI resources
> * and well, it relies on above duct tape holding all pieces together
I ended up opting for approach (2) in this changeset, as the current code
already uses bios_linker_loader_add_pointer() for GHES and relies heavily
on the block address/ack and CPER offset calculations.
To avoid trouble with this duct tape, I opted to move all the offset math
into a single function in ghes.c:
/*
 * ID numbers used to fill HEST source ID field
 */
enum AcpiHestSourceId {
    ACPI_HEST_SRC_ID_SEA,
    ACPI_HEST_SRC_ID_GED,

    /* Shall be the last one */
    ACPI_HEST_SRC_ID_COUNT
};

...

/*
 * Returns true on error (unsupported notify type), false on success.
 * Output pointers are optional: pass NULL for values that are not needed.
 */
static bool acpi_hest_address_offset(enum AcpiGhesNotifyType notify,
                                     uint64_t *error_block_offset,
                                     uint64_t *ack_offset,
                                     uint64_t *cper_offset,
                                     enum AcpiHestSourceId *source_id)
{
    enum AcpiHestSourceId source;
    uint64_t offset;

    switch (notify) {
    case ACPI_GHES_NOTIFY_SEA: /* Only on ARMv8 */
        source = ACPI_HEST_SRC_ID_SEA;
        break;
    case ACPI_GHES_NOTIFY_GPIO:
        source = ACPI_HEST_SRC_ID_GED;
        break;
    default:
        return true;
    }

    if (source_id) {
        *source_id = source;
    }

    /*
     * Please see docs/specs/acpi_hest_ghes.rst for the memory layout.
     * In summary, memory starts with error addresses, then acks and
     * finally CPER blocks.
     */

    offset = source * sizeof(uint64_t);

    if (error_block_offset) {
        *error_block_offset = offset;
    }
    if (ack_offset) {
        *ack_offset = offset + ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t);
    }
    if (cper_offset) {
        *cper_offset = 2 * ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t) +
                       source * ACPI_GHES_MAX_RAW_DATA_LENGTH;
    }

    return false;
}
I also removed the anonymous enum with the SEA/GPIO source IDs, using
only the ACPI notify type as the argument in function calls.
As there's now a single point where the offsets from
docs/specs/acpi_hest_ghes.rst are enforced, this should be less error
prone.
The code could later be changed to use approach (1), in a separate
cleanup.
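
Editor's worked example of the layout arithmetic described above, as a
minimal standalone sketch. It assumes two error sources (SEA and GED) and a
1 KiB maximum raw data length per source; both values, and all SKETCH_/sketch_
names, are illustrative assumptions and should be checked against
include/hw/acpi/ghes.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_SOURCE_COUNT        2u     /* SEA and GED */
#define SKETCH_MAX_RAW_DATA_LENGTH 1024u  /* assumed 1 KiB per CPER block */

struct sketch_offsets {
    uint64_t error_block;  /* slot holding the error block address */
    uint64_t ack;          /* slot holding the read_ack register */
    uint64_t cper;         /* start of this source's CPER storage */
};

/*
 * Offsets relative to the start of etc/hardware_errors: first all error
 * block address slots, then all read_ack slots, then the CPER blocks.
 */
static struct sketch_offsets sketch_layout(unsigned source)
{
    struct sketch_offsets o;

    assert(source < SKETCH_SOURCE_COUNT);

    o.error_block = source * sizeof(uint64_t);
    o.ack = o.error_block + SKETCH_SOURCE_COUNT * sizeof(uint64_t);
    o.cper = 2 * SKETCH_SOURCE_COUNT * sizeof(uint64_t) +
             (uint64_t)source * SKETCH_MAX_RAW_DATA_LENGTH;
    return o;
}
```

Under these assumptions, source 0 gets offsets 0, 16 and 32, while source 1
gets 8, 24 and 1056. The numbers only hold while the order of sources in HEST
matches the enum order, which is the fragility raised in the review.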
Thanks,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-08 12:11 ` Mauro Carvalho Chehab
@ 2024-08-08 12:45 ` Igor Mammedov
0 siblings, 0 replies; 54+ messages in thread
From: Igor Mammedov @ 2024-08-08 12:45 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Thu, 8 Aug 2024 14:11:14 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Tue, 6 Aug 2024 16:31:13 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > > + /* Could also be read back from the error_block_address register */
> > > + *error_block_addr = base +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > > +
> > > + return true;
> > > +}
> >
> > I don't like all this pointer math, which is basically a reverse engineered
> > QEMU actions on startup + guest provided etc/hardware_errors address.
> >
> > For one, it assumes error_source_to_index[] matches the order in which HEST
> > error sources were described, which is fragile.
> >
> > 2nd: migration-wise it's a disaster, since old/new HEST/hardware_errors tables
> > in RAM migrated from an older version might not match the above assumptions
> > of the target QEMU.
> >
> > I see 2 ways to rectify it:
> > 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
> > in guest RAM, like we do with etc/hardware_errors, see
> > build_ghes_error_table()
> > ...
> > tell firmware to write hardware_errors GPA into
> > and then fetch from HEST table in RAM, the guest patched error/ack addresses
> > for given source_id
> >
> > code-wise: relatively simple once one wraps their own head over
> > how this whole APEI thing works in QEMU
> > workflow is described in docs/specs/acpi_hest_ghes.rst
> > look to me as sufficient to grasp it.
> > (but my view is very biased given my prior knowledge,
> > aka: docs/comments/examples wrt acpi patching are good enough)
> > (if it's not clear how to do it, ask me for pointers)
>
> That sounds a better approach, however...
>
> > 2nd: sort of hack based on build_ghes_v2() Error Status Address/Read Ack Register
> > patching instructions
> > bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> > address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
> > ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > during build_ghes_v2() also store on a side mapping
> > source_id -> error address offset : read ack address
> >
> > so when you are injecting error, you'd at least use offsets
> > used at start time, to get rid of risk where injection code
> > diverge from HEST:etc/hardware_errors layout at start time.
> >
> > However to make migration safe, one would need to add a fat
> > comment not to change the order of GHES error sources in HEST _and_
> > a dedicated unit test to make sure we catch it when that happens.
> > bios_tables_test should be able to catch the change, but it won't
> > say what's wrong, hence a test case that explicitly checks the order
> > and complains loudly and clearly when we break order assumptions.
> >
> > downside:
> > * we are limiting ways HEST could be composed/reshuffled in future
> > * consumption of extra CI resources
> > * and well, it relies on above duct tape holding all pieces together
>
> I ended opting to do approach (2) on this changeset, as the current code
> is already using bios_linker_loader_add_pointer() for ghes, being deeply
> relying on the block address/ack and cper calculus.
I consider (2) a fallback in case (1) can't be done with reasonable effort.
At this point, (1) looks doable, and I'm not convinced that duct tape
is necessary or that we badly need to rush this series.
Hence I'd strongly prefer (1).
See my other reply to Jonathan: setting the write pointer is not hard.
Parsing HEST doesn't require a full table parser. As long as it
respects table types/length/revision, we can cheat by using the
documented offsets from the ACPI spec as is, for the fields we
need to access.
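
Editor's sketch of the "cheat with documented offsets" idea, assuming the
field offsets of the HEST header and of the GHES (type 9) and GHESv2
(type 10) structures as laid out in the ACPI 6.x specification. It is not
QEMU code: hest_find_ghes_v2() and the macros are hypothetical names, the
function works on a plain byte buffer rather than guest memory, and it gives
up on any error source type whose size it does not know.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets assumed from the ACPI 6.x table layouts. */
#define HEST_LENGTH_OFFSET       4   /* ACPI header: table length */
#define HEST_SOURCE_COUNT_OFFSET 36  /* Error Source Count */
#define HEST_FIRST_SOURCE_OFFSET 40
#define GHES_TYPE                9
#define GHES_V2_TYPE             10
#define GHES_SIZE                64
#define GHES_V2_SIZE             92
#define GHES_SOURCE_ID_OFFSET    2
#define GHES_ERR_STATUS_ADDR     (20 + 4)  /* GAS at 20, address at +4 */
#define GHES_V2_READ_ACK_ADDR    (64 + 4)  /* GAS at 64, address at +4 */

/* Read an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t le_read(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/*
 * Find the GHESv2 entry with the given source ID and return the
 * (firmware-patched) error status and read ack addresses.
 */
static bool hest_find_ghes_v2(const uint8_t *hest, size_t len,
                              uint16_t source_id,
                              uint64_t *err_status_addr,
                              uint64_t *read_ack_addr)
{
    size_t off = HEST_FIRST_SOURCE_OFFSET;
    uint64_t table_len, count, i;

    assert(hest && err_status_addr && read_ack_addr);

    if (len < HEST_FIRST_SOURCE_OFFSET) {
        return false;
    }
    table_len = le_read(hest + HEST_LENGTH_OFFSET, 4);
    if (table_len < HEST_FIRST_SOURCE_OFFSET || table_len > len) {
        return false;
    }
    count = le_read(hest + HEST_SOURCE_COUNT_OFFSET, 4);

    for (i = 0; i < count; i++) {
        const uint8_t *src = hest + off;
        size_t size;

        if (table_len - off < 2) {
            return false;
        }
        switch (le_read(src, 2)) {
        case GHES_TYPE:
            size = GHES_SIZE;
            break;
        case GHES_V2_TYPE:
            size = GHES_V2_SIZE;
            break;
        default:
            return false; /* unknown size: cannot skip this entry */
        }
        if (table_len - off < size) {
            return false;
        }
        if (size == GHES_V2_SIZE &&
            le_read(src + GHES_SOURCE_ID_OFFSET, 2) == source_id) {
            *err_status_addr = le_read(src + GHES_ERR_STATUS_ADDR, 8);
            *read_ack_addr = le_read(src + GHES_V2_READ_ACK_ADDR, 8);
            return true;
        }
        off += size;
    }
    return false;
}
```

Because the addresses come from the table the guest actually sees, the result
does not depend on the order in which the current QEMU would lay out the
error sources.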
> To avoid troubles on this duct tape, I opted to move all offset math
> to a single function at ghes.c:
>
> /*
> * ID numbers used to fill HEST source ID field
> */
> enum AcpiHestSourceId {
> ACPI_HEST_SRC_ID_SEA,
> ACPI_HEST_SRC_ID_GED,
>
> /* Shall be the last one */
> ACPI_HEST_SRC_ID_COUNT
> } AcpiHestSourceId;
>
> ...
>
> static bool acpi_hest_address_offset(enum AcpiGhesNotifyType notify,
> uint64_t *error_block_offset,
> uint64_t *ack_offset,
> uint64_t *cper_offset,
> enum AcpiHestSourceId *source_id)
> {
> enum AcpiHestSourceId source;
> uint64_t offset;
>
> switch (notify) {
> case ACPI_GHES_NOTIFY_SEA: /* Only on ARMv8 */
> source = ACPI_HEST_SRC_ID_SEA;
> break;
> case ACPI_GHES_NOTIFY_GPIO:
> source = ACPI_HEST_SRC_ID_GED;
> break;
> default:
> return true;
> }
>
> if (source_id) {
> *source_id = source;
> }
>
> /*
> * Please see docs/specs/acpi_hest_ghes.rst for the memory layout.
> * In summary, memory starts with error addresses, then acks and
> * finally CPER blocks.
> */
>
> offset = source * sizeof(uint64_t);
>
> if (error_block_offset) {
> *error_block_offset = offset;
> }
> if (ack_offset) {
> *ack_offset = offset + ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t);
> }
> if (cper_offset) {
> *cper_offset = 2 * ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t) +
> source * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> }
>
> return false;
> }
>
> I also removed the anonymous enum with SEA/GPIO source IDs, using
> only the ACPI notify type as arguments at the function calls.
>
> As there's now a single point where the offsets from
> docs/specs/acpi_hest_ghes.rst are enforced, this should be error
> prone.
>
> The code could later be changed to use approach (2), on a separate
> cleanup.
>
> Thanks,
> Mauro
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-08 8:50 ` Markus Armbruster
@ 2024-08-08 14:11 ` Mauro Carvalho Chehab
2024-08-08 14:22 ` Igor Mammedov
0 siblings, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-08 14:11 UTC (permalink / raw)
To: Markus Armbruster
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Eric Blake, Igor Mammedov, Michael Roth,
Paolo Bonzini, Peter Maydell, linux-kernel, qemu-arm, qemu-devel
Em Thu, 08 Aug 2024 10:50:33 +0200
Markus Armbruster <armbru@redhat.com> escreveu:
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 98eddf7ae155..655edcb6688c 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> > F: include/hw/acpi/ghes.h
> > F: docs/specs/acpi_hest_ghes.rst
> >
> > +ACPI/HEST/GHES/ARM processor CPER
> > +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > +S: Maintained
> > +F: hw/arm/ghes_cper.c
> > +F: hw/acpi/ghes_cper_stub.c
> > +F: qapi/ghes-cper.json
> > +
>
> Here's the reason for creating a new QAPI module instead of adding to
> existing module acpi.json: different maintainers.
>
> Hypothetical question: if we didn't care for that, would this go into
> qapi/acpi.json?
Independently of maintainers, GHES is part of ACPI APEI HEST, which is meant
to report hardware errors. Such hardware errors are typically handled by
the host OS, so the guest doesn't need to be aware of them[1].
So, IMO the best would be to keep APEI/HEST/GHES in a separate file.
[1] Still, I can foresee some scenarios where passing some errors to the
guest could make sense.
>
> If yes, then should we call it acpi-ghes-cper.json or acpi-ghes.json
> instead?
Naming it acpi-ghes, acpi-hest or acpi-ghes-cper would work equally well
from my side.
>
> > ppc4xx
> > L: qemu-ppc@nongnu.org
> > S: Orphan
>
> [...]
>
> > diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
> > new file mode 100644
> > index 000000000000..3cc4f9f2aaa9
> > --- /dev/null
> > +++ b/qapi/ghes-cper.json
> > @@ -0,0 +1,55 @@
> > +# -*- Mode: Python -*-
> > +# vim: filetype=python
> > +
> > +##
> > +# = GHESv2 CPER Error Injection
> > +#
> > +# These are defined at
> > +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > +# (GHESv2 - Type 10)
> > +##
>
> Feels a bit terse. These what?
>
> The reference could be clearer: "defined in the ACPI Specification 6.2,
> section 18.3.2.8 Generic Hardware Error Source version 2". A link would
> be nice, if it's stable.
I can add a link, but only newer ACPI versions are hosted in HTML format
(e.g. only versions 6.4 and 6.5 are available as HTML at uefi.org).
Can I place something like:
Defined since ACPI Specification 6.2,
section 18.3.2.8 Generic Hardware Error Source version 2. See:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#generic-hardware-error-source-version-2-ghesv2-type-10
i.e. having the link point to ACPI 6.4 or 6.5, instead of 6.2?
> # @raw-data: payload of the CPER encoded in base64
>
> Have you considered naming this @payload instead?
Works for me.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-08 14:11 ` Mauro Carvalho Chehab
@ 2024-08-08 14:22 ` Igor Mammedov
2024-08-08 14:45 ` Markus Armbruster
0 siblings, 1 reply; 54+ messages in thread
From: Igor Mammedov @ 2024-08-08 14:22 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Markus Armbruster, Jonathan Cameron, Shiju Jose,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Eric Blake,
Michael Roth, Paolo Bonzini, Peter Maydell, linux-kernel,
qemu-arm, qemu-devel
On Thu, 8 Aug 2024 16:11:41 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Thu, 08 Aug 2024 10:50:33 +0200
> Markus Armbruster <armbru@redhat.com> escreveu:
>
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 98eddf7ae155..655edcb6688c 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> > > F: include/hw/acpi/ghes.h
> > > F: docs/specs/acpi_hest_ghes.rst
> > >
> > > +ACPI/HEST/GHES/ARM processor CPER
> > > +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > +S: Maintained
> > > +F: hw/arm/ghes_cper.c
> > > +F: hw/acpi/ghes_cper_stub.c
> > > +F: qapi/ghes-cper.json
> > > +
> >
> > Here's the reason for creating a new QAPI module instead of adding to
> > existing module acpi.json: different maintainers.
> >
> > Hypothetical question: if we didn't care for that, would this go into
> > qapi/acpi.json?
>
> Independently of maintainers, GHES is part of ACPI APEI HEST, meaning
> to report hardware errors. Such hardware errors are typically handled by
> the host OS, so quest doesn't need to be aware of that[1].
>
> So, IMO the best would be to keep APEI/HEST/GHES in a separate file.
>
> [1] still, I can foresee some scenarios were passing some errors to the
> guest could make sense.
>
> >
> > If yes, then should we call it acpi-ghes-cper.json or acpi-ghes.json
> > instead?
>
> Naming it as acpi-ghes,acpi-hest or acpi-ghes-cper would equally work
> from my side.
If we are going to keep it generic, acpi-hest would do.
>
> >
> > > ppc4xx
> > > L: qemu-ppc@nongnu.org
> > > S: Orphan
> >
> > [...]
> >
> > > diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
> > > new file mode 100644
> > > index 000000000000..3cc4f9f2aaa9
> > > --- /dev/null
> > > +++ b/qapi/ghes-cper.json
> > > @@ -0,0 +1,55 @@
> > > +# -*- Mode: Python -*-
> > > +# vim: filetype=python
> > > +
> > > +##
> > > +# = GHESv2 CPER Error Injection
> > > +#
> > > +# These are defined at
> > > +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > > +# (GHESv2 - Type 10)
> > > +##
> >
> > Feels a bit terse. These what?
> >
> > The reference could be clearer: "defined in the ACPI Specification 6.2,
> > section 18.3.2.8 Generic Hardware Error Source version 2". A link would
> > be nice, if it's stable.
>
> I can add a link, but only newer ACPI versions are hosted in html format
> (e. g. only versions 6.4 and 6.5 are available as html at uefi.org).
Some years ago one could have called a link to the ACPI spec hosted
elsewhere 'stable'. That's not the case anymore after the umbrella change.
Spec name, revision and chapter worked fine for ACPI code (it's easy to find wherever the spec is hosted).
The same would probably work for QAPI. I'm not a QAPI maintainer though,
so the preferred approach here is absolutely up to you.
>
> Can I place something like:
>
> Defined since ACPI Specification 6.2,
> section 18.3.2.8 Generic Hardware Error Source version 2. See:
>
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#generic-hardware-error-source-version-2-ghesv2-type-10
>
> e. g. having the link pointing to ACPI 6.4 or 6.5, instead of 6.2?
>
> > # @raw-data: payload of the CPER encoded in base64
> >
> > Have you considered naming this @payload instead?
>
> Works for me.
>
> Thanks,
> Mauro
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-08 14:22 ` Igor Mammedov
@ 2024-08-08 14:45 ` Markus Armbruster
2024-08-09 8:42 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 54+ messages in thread
From: Markus Armbruster @ 2024-08-08 14:45 UTC (permalink / raw)
To: Igor Mammedov
Cc: Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Eric Blake,
Michael Roth, Paolo Bonzini, Peter Maydell, linux-kernel,
qemu-arm, qemu-devel
Igor Mammedov <imammedo@redhat.com> writes:
> On Thu, 8 Aug 2024 16:11:41 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
>> Em Thu, 08 Aug 2024 10:50:33 +0200
>> Markus Armbruster <armbru@redhat.com> escreveu:
>>
>> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>>
>> > > diff --git a/MAINTAINERS b/MAINTAINERS
>> > > index 98eddf7ae155..655edcb6688c 100644
>> > > --- a/MAINTAINERS
>> > > +++ b/MAINTAINERS
>> > > @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
>> > > F: include/hw/acpi/ghes.h
>> > > F: docs/specs/acpi_hest_ghes.rst
>> > >
>> > > +ACPI/HEST/GHES/ARM processor CPER
>> > > +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > > +S: Maintained
>> > > +F: hw/arm/ghes_cper.c
>> > > +F: hw/acpi/ghes_cper_stub.c
>> > > +F: qapi/ghes-cper.json
>> > > +
>> >
>> > Here's the reason for creating a new QAPI module instead of adding to
>> > existing module acpi.json: different maintainers.
>> >
>> > Hypothetical question: if we didn't care for that, would this go into
>> > qapi/acpi.json?
>>
>> Independently of maintainers, GHES is part of ACPI APEI HEST, meaning
>> to report hardware errors. Such hardware errors are typically handled by
>> the host OS, so quest doesn't need to be aware of that[1].
>>
>> So, IMO the best would be to keep APEI/HEST/GHES in a separate file.
>>
>> [1] still, I can foresee some scenarios were passing some errors to the
>> guest could make sense.
>>
>> >
>> > If yes, then should we call it acpi-ghes-cper.json or acpi-ghes.json
>> > instead?
>>
>> Naming it as acpi-ghes,acpi-hest or acpi-ghes-cper would equally work
>> from my side.
>
> if we going to keep it generic, acpi-hest would do
Works for me.
>> > > ppc4xx
>> > > L: qemu-ppc@nongnu.org
>> > > S: Orphan
>> >
>> > [...]
>> >
>> > > diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
>> > > new file mode 100644
>> > > index 000000000000..3cc4f9f2aaa9
>> > > --- /dev/null
>> > > +++ b/qapi/ghes-cper.json
>> > > @@ -0,0 +1,55 @@
>> > > +# -*- Mode: Python -*-
>> > > +# vim: filetype=python
>> > > +
>> > > +##
>> > > +# = GHESv2 CPER Error Injection
>> > > +#
>> > > +# These are defined at
>> > > +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
>> > > +# (GHESv2 - Type 10)
>> > > +##
>> >
>> > Feels a bit terse. These what?
>> >
>> > The reference could be clearer: "defined in the ACPI Specification 6.2,
>> > section 18.3.2.8 Generic Hardware Error Source version 2". A link would
>> > be nice, if it's stable.
>>
>> I can add a link, but only newer ACPI versions are hosted in html format
>> (e. g. only versions 6.4 and 6.5 are available as html at uefi.org).
>
> some years earlier it could be said 'stable link' about acpi spec hosted
> elsewhere. Not the case anymore after umbrella change.
>
> spec name, rev, chapter worked fine for acpi code (it's easy to find wherever spec is hosted).
> Probably the same would work for QAPI, I'm not QAPI maintainer though,
> so preffered approach here is absolutely up to you.
A link is strictly optional. Stable links are nice, stale links are
annoying. Mauro, you decide :)
Thanks!
[...]
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-08 8:11 ` Igor Mammedov
@ 2024-08-08 18:19 ` Mauro Carvalho Chehab
2024-08-12 9:39 ` Igor Mammedov
0 siblings, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-08 18:19 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Thu, 8 Aug 2024 10:11:07 +0200
Igor Mammedov <imammedo@redhat.com> wrote:
> On Wed, 7 Aug 2024 15:25:47 +0100
> Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
>
> > On Tue, 6 Aug 2024 16:31:13 +0200
> > Igor Mammedov <imammedo@redhat.com> wrote:
> >
> > > On Fri, 2 Aug 2024 23:44:01 +0200
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > >
> > > > Provide a generic interface for error injection via GHESv2.
> > > >
> > > > This patch is co-authored:
> > > > - original ghes logic to inject a simple ARM record by Shiju Jose;
> > > > - generic logic to handle block addresses by Jonathan Cameron;
> > > > - generic GHESv2 error inject by Mauro Carvalho Chehab;
> > > >
> > > > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> > > > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > Cc: Shiju Jose <shiju.jose@huawei.com>
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > ---
> > > > hw/acpi/ghes.c | 159 ++++++++++++++++++++++++++++++++++++++---
> > > > hw/acpi/ghes_cper.c | 2 +-
> > > > include/hw/acpi/ghes.h | 3 +
> > > > 3 files changed, 152 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > index a745dcc7be5e..e125c9475773 100644
> > > > --- a/hw/acpi/ghes.c
> > > > +++ b/hw/acpi/ghes.c
> > > > @@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > > ags->present = true;
> > > > }
> > > >
> > > > +static uint64_t ghes_get_state_start_address(void)
> > >
> > > ghes_get_hardware_errors_address() might better reflect what address it will return
> > >
> > > > +{
> > > > + AcpiGedState *acpi_ged_state =
> > > > + ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> > > > + AcpiGhesState *ags = &acpi_ged_state->ghes_state;
> > > > +
> > > > + return le64_to_cpu(ags->ghes_addr_le);
> > > > +}
> > > > +
> > > > int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > > > {
> > > > uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > > > - uint64_t start_addr;
> > > > + uint64_t start_addr = ghes_get_state_start_address();
> > > > bool ret = -1;
> > > > - AcpiGedState *acpi_ged_state;
> > > > - AcpiGhesState *ags;
> > > > -
> > > > assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> > > >
> > > > - acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > > > - NULL));
> > > > - g_assert(acpi_ged_state);
> > > > - ags = &acpi_ged_state->ghes_state;
> > > > -
> > > > - start_addr = le64_to_cpu(ags->ghes_addr_le);
> > > > -
> > > > if (physical_address) {
> > > > start_addr += source_id * sizeof(uint64_t);
> > >
> > > above should be a separate patch
> > >
> > > >
> > > > @@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > > > return ret;
> > > > }
> > > >
> > > > +/*
> > > > + * Error register block data layout
> > > > + *
> > > > + * | +---------------------+ ges.ghes_addr_le
> > > > + * | |error_block_address0 |
> > > > + * | +---------------------+
> > > > + * | |error_block_address1 |
> > > > + * | +---------------------+ --+--
> > > > + * | | ............. | GHES_ADDRESS_SIZE
> > > > + * | +---------------------+ --+--
> > > > + * | |error_block_addressN |
> > > > + * | +---------------------+
> > > > + * | | read_ack0 |
> > > > + * | +---------------------+ --+--
> > > > + * | | read_ack1 | GHES_ADDRESS_SIZE
> > > > + * | +---------------------+ --+--
> > > > + * | | ............. |
> > > > + * | +---------------------+
> > > > + * | | read_ackN |
> > > > + * | +---------------------+ --+--
> > > > + * | | CPER | |
> > > > + * | | .... | GHES_MAX_RAW_DATA_LENGT
> > > > + * | | CPER | |
> > > > + * | +---------------------+ --+--
> > > > + * | | .......... |
> > > > + * | +---------------------+
> > > > + * | | CPER |
> > > > + * | | .... |
> > > > + * | | CPER |
> > > > + * | +---------------------+
> > > > + */
> > >
> > > > no need to duplicate docs/specs/acpi_hest_ghes.rst,
> > > > I'd just refer to it and maybe add a short comment as to why it's mentioned.
> > >
> > > > +/* Map from uint32_t notify to entry offset in GHES */
> > > > +static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
> > > > + 0xff, 0xff, 0xff, 1, 0};
> > > > +
> > > > +static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
> > > > + uint64_t *read_ack_addr)
> > > > +{
> > > > + uint64_t base;
> > > > +
> > > > + if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
> > > > + return false;
> > > > + }
> > > > +
> > > > + /* Find and check the source id for this new CPER */
> > > > + if (error_source_to_index[notify] == 0xff) {
> > > > + return false;
> > > > + }
> > > > +
> > > > + base = ghes_get_state_start_address();
> > > > +
> > > > + *read_ack_addr = base +
> > > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > > + error_source_to_index[notify] * sizeof(uint64_t);
> > > > +
> > > > + /* Could also be read back from the error_block_address register */
> > > > + *error_block_addr = base +
> > > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > > > +
> > > > + return true;
> > > > +}
> > >
> > > > I don't like all this pointer math, which basically reverse engineers
> > > > QEMU's actions on startup + the guest provided etc/hardware_errors address.
> > > >
> > > > For one, it assumes error_source_to_index[] matches the order in which HEST
> > > > error sources were described, which is fragile.
> > > >
> > > > 2nd: migration-wise it's a disaster, since old/new HEST/hardware_errors tables
> > > > in RAM migrated from an older version might not match the above assumptions
> > > > of the target QEMU.
> > >
> > > I see 2 ways to rectify it:
> > > 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
> > > in guest RAM, like we do with etc/hardware_errors, see
> > > build_ghes_error_table()
> > > ...
> > > tell firmware to write hardware_errors GPA into
> > > and then fetch from HEST table in RAM, the guest patched error/ack addresses
> > > for given source_id
> > >
> > > > code-wise: relatively simple once one wraps one's head around
> > > > how this whole APEI thing works in QEMU
> > > > the workflow described in docs/specs/acpi_hest_ghes.rst
> > > > looks sufficient to me to grasp it.
> > > (but my view is very biased given my prior knowledge,
> > > aka: docs/comments/examples wrt acpi patching are good enough)
> > > (if it's not clear how to do it, ask me for pointers)
> >
> > Hi Igor, I think I follow what you mean but maybe this question will reveal
> > otherwise. HEST is currently in ACPI_BUILD_TABLE_FILE.
> > Would you suggest splitting it to its own file, or using table_offsets
> > to get the offset in ACPI_BUILD_TABLE_FILE GPA?
> yep, offset taken right before HEST is to be created
> doc comment for bios_linker_loader_write_pointer() explains how it works
>
> we need something like:
> bios_linker_loader_write_pointer(linker,
> ACPI_HEST_TABLE_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t),
> ACPI_BUILD_TABLE_FILE, hest_offset_within_ACPI_BUILD_TABLE_FILE);
>
> to register a new file see:
> a08a64627 ACPI: Record the Generic Error Status Block address
> and, to avoid a copy-paste error, maybe also
> 136fc6aa2 ACPI: Avoid infinite recursion when dump-vmstat
> This needs to be limited to new machine types, keeping
> old ones without this new feature. (I'd use the hw_compat_ machinery for that)
Not sure if I got it. The code, after this patch from my v6:
https://lore.kernel.org/qemu-devel/5710c364d7ef6cdab6b2f1e127ef191bdf84e8c2.1723119423.git.mchehab+huawei@kernel.org/T/#u
already stores two of the three address offsets via
bios_linker_loader_add_pointer(), i.e. it is similar to the
code below (I simplified the code to make the example clearer):
<snip>
/* From hw/arm/virt-acpi-build.c */
static
void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
{
...
if (vms->ras) {
build_ghes_error_table(tables->hardware_errors, tables->linker);
acpi_add_table(table_offsets, tables_blob);
/* internally, call build_ghes_v2() for SEA and GED notification sources */
acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
vms->oem_table_id);
}
...
}
/* From hw/acpi/ghes.c */
static void build_ghes_v2(GArray *table_data,
enum AcpiGhesNotifyType notify,
BIOSLinker *linker)
{
uint64_t address_offset, ack_offset, block_addr_offset, cper_offset;
enum AcpiHestSourceId source_id;
/*
* Get offsets for either SEA or GED notification - easy to extend
* to all mechanisms like MCE and SCI to better support x86
*/
assert(!acpi_hest_address_offset(notify, &block_addr_offset, &ack_offset,
&cper_offset, &source_id));
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
ACPI_GHES_ERRORS_FW_CFG_FILE,
block_addr_offset);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
ACPI_GHES_ERRORS_FW_CFG_FILE,
ack_offset);
/* Current code ignores &cper_offset when creating HEST */
}
void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
enum AcpiGhesNotifyType notify)
{
uint64_t cper_addr, read_ack_start_addr;
assert(!ghes_get_hardware_errors_address(notify, NULL, &read_ack_start_addr,
&cper_addr, NULL));
/*
* Use cpu_physical_memory_read/write() to
* - read/store at read_ack_start_addr
* - Write cper block GArray at cper_addr
*/
}
</snip>
We may also store cper_offset there via bios_linker_loader_add_pointer()
and/or use bios_linker_loader_write_pointer(), but I can't see how the
data stored there can be retrieved, nor any advantage of using it instead
of the current code, as, in the end, we'll have 3 addresses that will be
used:
- an address where a pointer to CPER record will be stored;
- an address where the ack will be stored;
- an address where the actual CPER record will be stored.
And those are calculated in a single function and are all stored in the
ACPI table files.
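To make the three addresses concrete, here is a minimal Python sketch of the
layout arithmetic done by ghes_get_addr() in the quoted patch. The constants
(two error sources, 1 KiB per CPER slot), the base address and the helper name
ghes_addresses() are assumptions for illustration only, not values read from
a running QEMU:

```python
# Sketch of the etc/hardware_errors layout arithmetic (illustrative only).
# All constants below are assumptions, not taken from a running QEMU.
GHES_ADDRESS_SIZE = 8        # sizeof(uint64_t)
ERROR_SOURCE_COUNT = 2       # assumed number of error sources
MAX_RAW_DATA_LENGTH = 1024   # assumed size of each CPER slot


def ghes_addresses(base, index):
    """Return (error_block_ptr_addr, read_ack_addr, cper_addr) for one source."""
    if not 0 <= index < ERROR_SOURCE_COUNT:
        raise ValueError(f"invalid source index {index}")

    # Size of one table of 64-bit entries (one entry per error source)
    table = ERROR_SOURCE_COUNT * GHES_ADDRESS_SIZE

    # 1) where the pointer to the CPER record is stored
    error_block_ptr_addr = base + index * GHES_ADDRESS_SIZE
    # 2) where the read ack is stored (after all error block pointers)
    read_ack_addr = base + table + index * GHES_ADDRESS_SIZE
    # 3) where the CPER record itself is stored (after both tables)
    cper_addr = base + 2 * table + index * MAX_RAW_DATA_LENGTH

    return error_block_ptr_addr, read_ack_addr, cper_addr


if __name__ == "__main__":
    base = 0x1000  # hypothetical guest physical address
    for idx in range(ERROR_SOURCE_COUNT):
        ptr, ack, cper = ghes_addresses(base, idx)
        print(f"source {idx}: ptr={ptr:#x} ack={ack:#x} cper={cper:#x}")
```

The point of the sketch is only that all three values derive from one base
address plus a source index, which is the assumption being questioned above.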
What am I missing?
Thanks,
Mauro
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-02 21:44 ` [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-08-06 14:56 ` Igor Mammedov
@ 2024-08-08 20:58 ` John Snow
2024-08-08 21:51 ` Mauro Carvalho Chehab
2024-08-08 21:21 ` John Snow
2 siblings, 1 reply; 54+ messages in thread
From: John Snow @ 2024-08-08 20:58 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
On Fri, Aug 2, 2024 at 5:44 PM Mauro Carvalho Chehab <
mchehab+huawei@kernel.org> wrote:
> Using the QMP GHESv2 API requires preparing a raw data array
> containing a CPER record.
>
> Add a helper script with subcommands to prepare such data.
>
> Currently, only ARM Processor error CPER record is supported.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> MAINTAINERS | 3 +
> scripts/arm_processor_error.py | 352 +++++++++++++++++++++++++++++++++
> scripts/ghes_inject.py | 59 ++++++
> scripts/qmp_helper.py | 249 +++++++++++++++++++++++
> 4 files changed, 663 insertions(+)
> create mode 100644 scripts/arm_processor_error.py
> create mode 100755 scripts/ghes_inject.py
> create mode 100644 scripts/qmp_helper.py
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 655edcb6688c..e490f69da1de 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2081,6 +2081,9 @@ S: Maintained
> F: hw/arm/ghes_cper.c
> F: hw/acpi/ghes_cper_stub.c
> F: qapi/ghes-cper.json
> +F: scripts/ghes_inject.py
> +F: scripts/arm_processor_error.py
> +F: scripts/qmp_helper.py
>
> ppc4xx
> L: qemu-ppc@nongnu.org
> diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
> new file mode 100644
> index 000000000000..df4efa508790
> --- /dev/null
> +++ b/scripts/arm_processor_error.py
> @@ -0,0 +1,352 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114, R0912, R0913, R0914, R0915, W0511
>
Out of curiosity, what tools are you using to delint your files and how are
you invoking them?
I don't really maintain any strict regime for python files under
qemu.git/scripts (yet), so I am mostly curious as to what regimes others
are using currently. I don't see most QEMU contributors checking in pylint
ignores etc directly into the files, so it caught my eye.
~js
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +# TODO: current implementation has dummy defaults.
> +#
> +# For a better implementation, a QMP addition/call is needed to
> +# retrieve some data for ARM Processor Error injection:
> +#
> +# - machine emulation architecture, as ARM current default is
> +# for AArch64;
> +# - ARM registers: power_state, midr, mpidr.
> +
> +import argparse
> +import json
> +
> +from qmp_helper import (qmp_command, get_choice, get_mult_array,
> + get_mult_choices, get_mult_int, bit,
> + data_add, to_guid)
> +
> +# Arm processor EINJ logic
> +#
> +ACPI_GHES_ARM_CPER_LENGTH = 40
> +ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
> +
> +# TODO: query it from emulation. Current default valid only for Aarch64
> +CONTEXT_AARCH64_EL1 = 5
> +
> +class ArmProcessorEinj:
> + """
> + Implements ARM Processor Error injection via GHES
> + """
> +
> + def __init__(self):
> + """Initialize the error injection class"""
> +
> + # Valid choice values
> + self.arm_valid_bits = {
> + "mpidr": bit(0),
> + "affinity": bit(1),
> + "running": bit(2),
> + "vendor": bit(3),
> + }
> +
> + self.pei_flags = {
> + "first": bit(0),
> + "last": bit(1),
> + "propagated": bit(2),
> + "overflow": bit(3),
> + }
> +
> + self.pei_error_types = {
> + "cache": bit(1),
> + "tlb": bit(2),
> + "bus": bit(3),
> + "micro-arch": bit(4),
> + }
> +
> + self.pei_valid_bits = {
> + "multiple-error": bit(0),
> + "flags": bit(1),
> + "error-info": bit(2),
> + "virt-addr": bit(3),
> + "phy-addr": bit(4),
> + }
> +
> + self.data = bytearray()
> +
> + def create_subparser(self, subparsers):
> + """Add a subparser to handle the error fields"""
> +
> + parser = subparsers.add_parser("arm",
> + help="Generate an ARM processor CPER")
> +
> + arm_valid_bits = ",".join(self.arm_valid_bits.keys())
> + flags = ",".join(self.pei_flags.keys())
> + error_types = ",".join(self.pei_error_types.keys())
> + pei_valid_bits = ",".join(self.pei_valid_bits.keys())
> +
> + # UEFI N.16 ARM Validation bits
> + g_arm = parser.add_argument_group("ARM processor")
> + g_arm.add_argument("--arm", "--arm-valid",
> + help=f"ARM valid bits: {arm_valid_bits}")
> + g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level",
> + type=lambda x: int(x, 0),
> + help="Affinity level (when multiple levels apply)")
> + g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
> + help="Multiprocessor Affinity Register")
> + g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
> + help="Main ID Register")
> + g_arm.add_argument("-r", "--running",
> + action=argparse.BooleanOptionalAction,
> + default=None,
> + help="Indicates if the processor is running or not")
> + g_arm.add_argument("--psci", "--psci-state",
> + type=lambda x: int(x, 0),
> + help="Power State Coordination Interface - PSCI state")
> +
> + # TODO: Add vendor-specific support
> +
> + # UEFI N.17 bitmaps (type and flags)
> + g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
> + g_pei.add_argument("-t", "--type", nargs="+",
> + help=f"one or more error types: {error_types}")
> + g_pei.add_argument("-f", "--flags", nargs="*",
> + help=f"zero or more error flags: {flags}")
> + g_pei.add_argument("-V", "--pei-valid", "--error-valid", nargs="*",
> + help=f"zero or more PEI valid bits: {pei_valid_bits}")
> +
> + # UEFI N.17 Integer values
> + g_pei.add_argument("-m", "--multiple-error", nargs="+",
> + help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")
> + g_pei.add_argument("-e", "--error-info", nargs="+",
> + help="Error information (UEFI 2.10 tables N.18 to N.20)")
> + g_pei.add_argument("-p", "--physical-address", nargs="+",
> + help="Physical address")
> + g_pei.add_argument("-v", "--virtual-address", nargs="+",
> + help="Virtual address")
> +
> + # UEFI N.21 Context
> + g_ctx = parser.add_argument_group("Processor Context")
> + g_ctx.add_argument("--ctx-type", "--context-type", nargs="*",
> + help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")
> + g_ctx.add_argument("--ctx-size", "--context-size", nargs="*",
> + help="Minimal size of the context")
> + g_ctx.add_argument("--ctx-array", "--context-array", nargs="*",
> + help="Comma-separated arrays for each context")
> +
> + # Vendor-specific data
> + g_vendor = parser.add_argument_group("Vendor-specific data")
> + g_vendor.add_argument("--vendor", "--vendor-specific", nargs="+",
> + help="Vendor-specific byte arrays of data")
> +
> + def parse_args(self, args):
> + """Parse subcommand arguments"""
> +
> + cper = {}
> + pei = {}
> + ctx = {}
> + vendor = {}
> +
> + arg = vars(args)
> +
> + # Handle global parameters
> + if args.arm:
> + arm_valid_init = False
> + cper["valid"] = get_choice(name="valid",
> + value=args.arm,
> + choices=self.arm_valid_bits,
> + suffixes=["-error", "-err"])
> + else:
> + cper["valid"] = 0
> + arm_valid_init = True
> +
> + if "running" in arg:
> + if args.running:
> + cper["running-state"] = bit(0)
> + else:
> + cper["running-state"] = 0
> + else:
> + cper["running-state"] = 0
> +
> + if arm_valid_init:
> + if args.affinity:
> + cper["valid"] |= self.arm_valid_bits["affinity"]
> +
> + if args.mpidr:
> + cper["valid"] |= self.arm_valid_bits["mpidr"]
> +
> + if "running-state" in cper:
> + cper["valid"] |= self.arm_valid_bits["running"]
> +
> + if args.psci:
> + cper["valid"] |= self.arm_valid_bits["running"]
> +
> + # Handle PEI
> + if not args.type:
> + args.type = ["cache-error"]
> +
> + get_mult_choices(
> + pei,
> + name="valid",
> + values=args.pei_valid,
> + choices=self.pei_valid_bits,
> + suffixes=["-valid", "-info", "--information", "--addr"],
> + )
> + get_mult_choices(
> + pei,
> + name="type",
> + values=args.type,
> + choices=self.pei_error_types,
> + suffixes=["-error", "-err"],
> + )
> + get_mult_choices(
> + pei,
> + name="flags",
> + values=args.flags,
> + choices=self.pei_flags,
> + suffixes=["-error", "-cap"],
> + )
> + get_mult_int(pei, "error-info", args.error_info)
> + get_mult_int(pei, "multiple-error", args.multiple_error)
> + get_mult_int(pei, "phy-addr", args.physical_address)
> + get_mult_int(pei, "virt-addr", args.virtual_address)
> +
> + # Handle context
> + get_mult_int(ctx, "type", args.ctx_type, allow_zero=True)
> + get_mult_int(ctx, "minimal-size", args.ctx_size, allow_zero=True)
> + get_mult_array(ctx, "register", args.ctx_array, allow_zero=True)
> +
> + get_mult_array(vendor, "bytes", args.vendor, max_val=255)
> +
> + # Store PEI
> + pei_data = bytearray()
> + default_flags = self.pei_flags["first"]
> + default_flags |= self.pei_flags["last"]
> +
> + error_info_num = 0
> +
> + for i, p in pei.items(): # pylint: disable=W0612
> + error_info_num += 1
> +
> + # UEFI 2.10 doesn't define how to encode error information
> + # when multiple types are raised. So, provide a default only
> + # if a single type is there
> + if "error-info" not in p:
> + if p["type"] == bit(1):
> + p["error-info"] = 0x0091000F
> + if p["type"] == bit(2):
> + p["error-info"] = 0x0054007F
> + if p["type"] == bit(3):
> + p["error-info"] = 0x80D6460FFF
> + if p["type"] == bit(4):
> + p["error-info"] = 0x78DA03FF
> +
> + if "valid" not in p:
> + p["valid"] = 0
> + if "multiple-error" in p:
> + p["valid"] |= self.pei_valid_bits["multiple-error"]
> +
> + if "flags" in p:
> + p["valid"] |= self.pei_valid_bits["flags"]
> +
> + if "error-info" in p:
> + p["valid"] |= self.pei_valid_bits["error-info"]
> +
> + if "phy-addr" in p:
> + p["valid"] |= self.pei_valid_bits["phy-addr"]
> +
> + if "virt-addr" in p:
> + p["valid"] |= self.pei_valid_bits["virt-addr"]
> +
> + # Version
> + data_add(pei_data, 0, 1)
> +
> + data_add(pei_data, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1)
> +
> + data_add(pei_data, p["valid"], 2)
> + data_add(pei_data, p["type"], 1)
> + data_add(pei_data, p.get("multiple-error", 1), 2)
> + data_add(pei_data, p.get("flags", default_flags), 1)
> + data_add(pei_data, p.get("error-info", 0), 8)
> + data_add(pei_data, p.get("virt-addr", 0xDEADBEEF), 8)
> + data_add(pei_data, p.get("phy-addr", 0xABBA0BAD), 8)
> +
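As a cross-check of the field widths used by the data_add() calls above, here
is a minimal standalone sketch that packs one Processor Error Information
entry with struct and confirms that it is 32 bytes long, matching
ACPI_GHES_ARM_CPER_PEI_LENGTH. The helper name pack_pei() and the format
string are mine (hypothetical); the example values are the defaults visible
in the quoted patch:

```python
import struct

ACPI_GHES_ARM_CPER_PEI_LENGTH = 32

# Little-endian, no padding:
#   version(1) length(1) valid(2) type(1) multiple-error(2) flags(1)
#   error-info(8) virt-addr(8) phy-addr(8)
PEI_FORMAT = "<BBHBHBQQQ"


def pack_pei(valid, err_type, multiple_error, flags, error_info,
             virt_addr, phy_addr):
    """Pack one PEI entry with the same widths as the data_add() sequence."""
    return struct.pack(PEI_FORMAT,
                       0,                              # version
                       ACPI_GHES_ARM_CPER_PEI_LENGTH,  # length
                       valid, err_type, multiple_error, flags,
                       error_info, virt_addr, phy_addr)


if __name__ == "__main__":
    entry = pack_pei(valid=1 << 2,               # only error-info is valid
                     err_type=1 << 1,            # cache error
                     multiple_error=1,
                     flags=(1 << 0) | (1 << 1),  # first | last
                     error_info=0x0091000F,
                     virt_addr=0xDEADBEEF,
                     phy_addr=0xABBA0BAD)
    assert len(entry) == ACPI_GHES_ARM_CPER_PEI_LENGTH
    print(entry.hex())
```

If a field width ever changes, the length assertion fails immediately, which
is cheaper than finding out from a malformed record in the guest.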
> + # Store Context
> + ctx_data = bytearray()
> + context_info_num = 0
> +
> + if ctx:
> + for k in sorted(ctx.keys()):
> + context_info_num += 1
> +
> + if "type" not in ctx[k]:
> + ctx[k]["type"] = CONTEXT_AARCH64_EL1
> +
> + if "register" not in ctx[k]:
> + ctx[k]["register"] = []
> +
> + reg_size = len(ctx[k]["register"])
> + size = 0
> +
> + if "minimal-size" in ctx[k]:
> + size = ctx[k]["minimal-size"]
> +
> + size = max(size, reg_size)
> +
> + size = (size + 1) % 0xFFFE
> +
> + # Version
> + data_add(ctx_data, 0, 2)
> +
> + data_add(ctx_data, ctx[k]["type"], 2)
> +
> + data_add(ctx_data, 8 * size, 4)
> +
> + for r in ctx[k]["register"]:
> + data_add(ctx_data, r, 8)
> +
> + for i in range(reg_size, size): # pylint: disable=W0612
> + data_add(ctx_data, 0, 8)
> +
> + # Vendor-specific bytes are not grouped
> + vendor_data = bytearray()
> + if vendor:
> + for k in sorted(vendor.keys()):
> + for b in vendor[k]["bytes"]:
> + data_add(vendor_data, b, 1)
> +
> + # Encode ARM Processor Error
> + data = bytearray()
> +
> + data_add(data, cper["valid"], 4)
> +
> + data_add(data, error_info_num, 2)
> + data_add(data, context_info_num, 2)
> +
> + # Calculate the length of the CPER data
> + cper_length = ACPI_GHES_ARM_CPER_LENGTH
> + cper_length += len(pei_data)
> + cper_length += len(vendor_data)
> + cper_length += len(ctx_data)
> + data_add(data, cper_length, 4)
> +
> + data_add(data, arg.get("affinity") or 0, 1)
> +
> + # Reserved
> + data_add(data, 0, 3)
> +
> + data_add(data, arg.get("mpidr") or 0, 8)
> + data_add(data, arg.get("midr") or 0, 8)
> + data_add(data, cper["running-state"], 4)
> + data_add(data, arg.get("psci") or 0, 4)
> +
> + # Add PEI
> + data.extend(pei_data)
> + data.extend(ctx_data)
> + data.extend(vendor_data)
> +
> + self.data = data
> +
> + def run(self, host, port):
> + """Execute QMP commands"""
> +
> + guid = to_guid(0xE19E3D16, 0xBC11, 0x11E4,
> + [0x9C, 0xAA, 0xC2, 0x05,
> + 0x1D, 0x5D, 0x46, 0xB0])
> +
> + qmp_command(host, port, guid, self.data)
> diff --git a/scripts/ghes_inject.py b/scripts/ghes_inject.py
> new file mode 100755
> index 000000000000..8415ccbbc53d
> --- /dev/null
> +++ b/scripts/ghes_inject.py
> @@ -0,0 +1,59 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +import argparse
> +
> +from arm_processor_error import ArmProcessorEinj
> +
> +EINJ_DESCRIPTION = """
> +Handle ACPI GHESv2 error injection logic QEMU QMP interface.\n
> +
> +It allows using UEFI BIOS EINJ features to generate GHES records.
> +
> +It helps testing Linux CPER and GHES drivers and to test rasdaemon
> +error handling logic.
> +
> +Currently, it supports ARM processor error injection, being
> +compatible with UEFI 2.9A Errata.
> +
> +This small utility works together with those QEMU additions:
> +- https://gitlab.com/mchehab_kernel/qemu/-/tree/arm-error-inject-v2
> +"""
> +
> +def main():
> + """Main program"""
> +
> + # Main parser - handle generic args like QEMU QMP TCP socket options
> + parser = argparse.ArgumentParser(prog="einj.py",
> + formatter_class=argparse.RawDescriptionHelpFormatter,
> + usage="%(prog)s [options]",
> + description=EINJ_DESCRIPTION,
> + epilog="If a field is not defined, a default value will be applied by QEMU.")
> +
> + g_options = parser.add_argument_group("QEMU QMP socket options")
> + g_options.add_argument("-H", "--host", default="localhost", type=str,
> + help="host name")
> + g_options.add_argument("-P", "--port", default=4445, type=int,
> + help="TCP port number")
> +
> + arm_einj = ArmProcessorEinj()
> +
> + # Call subparsers
> + subparsers = parser.add_subparsers(dest='command')
> +
> + arm_einj.create_subparser(subparsers)
> +
> + args = parser.parse_args()
> +
> + # Handle subparser commands
> + if args.command == "arm":
> + arm_einj.parse_args(args)
> + arm_einj.run(args.host, args.port)
> +
> +
> +if __name__ == "__main__":
> + main()
> diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
> new file mode 100644
> index 000000000000..13fae7a7af0e
> --- /dev/null
> +++ b/scripts/qmp_helper.py
> @@ -0,0 +1,249 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114, R0912, R0913, R0915, W0511
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +import json
> +import socket
> +import sys
> +
> +from base64 import b64encode
> +
> +#
> +# Socket QMP send command
> +#
> +def qmp_command(host, port, guid, data):
> + """Send commands to QEMU through a QMP TCP socket"""
> +
> + # Fill the commands to be sent
> + commands = []
> +
> + # Needed to negotiate QMP and for QEMU to accept the command
> + commands.append('{ "execute": "qmp_capabilities" } ')
> +
> + base64_data = b64encode(bytes(data)).decode('ascii')
> +
> + cmd_arg = {
> + 'cper': {
> + 'notification-type': guid,
> + "raw-data": base64_data
> + }
> + }
> +
> + command = '{ "execute": "ghes-cper", '
> + command += '"arguments": ' + json.dumps(cmd_arg) + " }"
> +
> + commands.append(command)
> +
> + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> + try:
> + s.connect((host, port))
> + except ConnectionRefusedError:
> + sys.exit(f"Can't connect to QMP host {host}:{port}")
> +
> + data = s.recv(1024)
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if "QMP" not in obj:
> + print(f"Invalid QMP answer: {data.decode('utf-8')}")
> + s.close()
> + return
> +
> + for i, command in enumerate(commands):
> + s.sendall(command.encode("utf-8"))
> + data = s.recv(1024)
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if isinstance(obj.get("return"), dict):
> + if obj["return"]:
> + print(json.dumps(obj["return"]))
> + elif i > 0:
> + print("Error injected.")
> + elif isinstance(obj.get("error"), dict):
> + error = obj["error"]
> + print(f'{error["class"]}: {error["desc"]}')
> + else:
> + print(json.dumps(obj))
> +
> + s.shutdown(socket.SHUT_WR)
> + while 1:
> + data = s.recv(1024)
> + if data == b"":
> + break
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if isinstance(obj.get("return"), dict):
> + print(json.dumps(obj["return"]))
> + elif isinstance(obj.get("error"), dict):
> + error = obj["error"]
> + print(f'{error["class"]}: {error["desc"]}')
> + else:
> + print(json.dumps(obj))
> +
> + s.close()
> +
> +
> +#
> +# Helper routines to handle multiple choice arguments
> +#
> +def get_choice(name, value, choices, suffixes=None):
> + """Produce a list from multiple choice argument"""
> +
> + new_values = 0
> +
> + if not value:
> + return new_values
> +
> + for val in value.split(","):
> + val = val.lower()
> +
> + if suffixes:
> + for suffix in suffixes:
> + val = val.removesuffix(suffix)
> +
> + if val not in choices.keys():
> + sys.exit(f"Error on '{name}': choice {val} is invalid.")
> +
> + val = choices[val]
> +
> + new_values |= val
> +
> + return new_values
> +
> +
> +def get_mult_array(mult, name, values, allow_zero=False, max_val=None):
> + """Add numbered hashes from integer lists"""
> +
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + if not values:
> + i = 0
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = []
> + return
> +
> + i = 0
> + for value in values:
> + for val in value.split(","):
> + try:
> + val = int(val, 0)
> + except ValueError:
> + sys.exit(f"Error on '{name}': {val} is not an integer")
> +
> + if val < 0:
> + sys.exit(f"Error on '{name}': {val} is not unsigned")
> +
> + if max_val and val > max_val:
> + sys.exit(f"Error on '{name}': {val} is too big")
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + if name not in mult[i]:
> + mult[i][name] = []
> +
> + mult[i][name].append(val)
> +
> + i += 1
> +
> +
> +def get_mult_choices(mult, name, values, choices,
> + suffixes=None, allow_zero=False):
> + """Add numbered hashes from multiple choice arguments"""
> +
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + i = 0
> + for val in values:
> + new_values = get_choice(name, val, choices, suffixes)
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = new_values
> + i += 1
> +
> +
> +def get_mult_int(mult, name, values, allow_zero=False):
> + """Add numbered hashes from integer arguments"""
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + i = 0
> + for val in values:
> + try:
> + val = int(val, 0)
> + except ValueError:
> + sys.exit(f"Error on '{name}': {val} is not an integer")
> +
> + if val < 0:
> + sys.exit(f"Error on '{name}': {val} is not unsigned")
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = val
> + i += 1
> +
> +
> +#
> +# Data encode helper functions
> +#
> +def bit(b):
> + """Simple macro to define a bit on a bitmask"""
> + return 1 << b
> +
> +
> +def data_add(data, value, num_bytes):
> + """Adds bytes from value inside a bytearray"""
> +
> + data.extend(value.to_bytes(num_bytes, byteorder="little"))
> +
> +def to_guid(time_low, time_mid, time_high, nodes):
> + """Create a GUID string"""
> +
> + assert(len(nodes) == 8)
> +
> + clock = nodes[0] << 8 | nodes[1]
> +
> + node = 0
> + for i in range(2, len(nodes)):
> + node = node << 8 | nodes[i]
> +
> + s = f"{time_low:08x}-{time_mid:04x}-"
> + s += f"{time_high:04x}-{clock:04x}-{node:012x}"
> +
> + return s
> --
> 2.45.2
>
>
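For anyone wanting to inspect the payload without a running QEMU, here is a
minimal sketch of the message that qmp_command() builds. It only reuses what
the quoted patch shows (the "ghes-cper" command name, the "notification-type"
and "raw-data" arguments, base64 encoding of the raw data). The helper name
build_ghes_cper_command() is hypothetical, and the command itself is only
expected to exist with this series applied:

```python
import json
from base64 import b64decode, b64encode


def to_guid(time_low, time_mid, time_high, nodes):
    """Format a GUID string the same way the helper in the patch does."""
    assert len(nodes) == 8

    clock = nodes[0] << 8 | nodes[1]

    node = 0
    for byte in nodes[2:]:
        node = node << 8 | byte

    return (f"{time_low:08x}-{time_mid:04x}-"
            f"{time_high:04x}-{clock:04x}-{node:012x}")


def build_ghes_cper_command(guid, data):
    """Return the QMP command, as a JSON string, for a raw CPER payload."""
    command = {
        "execute": "ghes-cper",
        "arguments": {
            "cper": {
                "notification-type": guid,
                "raw-data": b64encode(bytes(data)).decode("ascii"),
            }
        },
    }
    return json.dumps(command)


if __name__ == "__main__":
    # GUID values as passed to to_guid() in ArmProcessorEinj.run()
    guid = to_guid(0xE19E3D16, 0xBC11, 0x11E4,
                   [0x9C, 0xAA, 0xC2, 0x05, 0x1D, 0x5D, 0x46, 0xB0])
    payload = bytearray(b"\x00\x01\x02\x03")  # placeholder, not a real CPER
    print(build_ghes_cper_command(guid, payload))
```

Building the whole command with json.dumps(), instead of concatenating string
fragments, also avoids having to get the quoting right by hand.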
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-02 21:44 ` [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-08-06 14:56 ` Igor Mammedov
2024-08-08 20:58 ` John Snow
@ 2024-08-08 21:21 ` John Snow
2024-08-08 22:41 ` Mauro Carvalho Chehab
2 siblings, 1 reply; 54+ messages in thread
From: John Snow @ 2024-08-08 21:21 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
On Fri, Aug 2, 2024 at 5:44 PM Mauro Carvalho Chehab <
mchehab+huawei@kernel.org> wrote:
> Using the QMP GHESv2 API requires preparing a raw data array
> containing a CPER record.
>
> Add a helper script with subcommands to prepare such data.
>
> Currently, only ARM Processor error CPER record is supported.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> MAINTAINERS | 3 +
> scripts/arm_processor_error.py | 352 +++++++++++++++++++++++++++++++++
> scripts/ghes_inject.py | 59 ++++++
> scripts/qmp_helper.py | 249 +++++++++++++++++++++++
> 4 files changed, 663 insertions(+)
> create mode 100644 scripts/arm_processor_error.py
> create mode 100755 scripts/ghes_inject.py
> create mode 100644 scripts/qmp_helper.py
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 655edcb6688c..e490f69da1de 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2081,6 +2081,9 @@ S: Maintained
> F: hw/arm/ghes_cper.c
> F: hw/acpi/ghes_cper_stub.c
> F: qapi/ghes-cper.json
> +F: scripts/ghes_inject.py
> +F: scripts/arm_processor_error.py
> +F: scripts/qmp_helper.py
>
> ppc4xx
> L: qemu-ppc@nongnu.org
> diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
> new file mode 100644
> index 000000000000..df4efa508790
> --- /dev/null
> +++ b/scripts/arm_processor_error.py
> @@ -0,0 +1,352 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114, R0912, R0913, R0914, R0915, W0511
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +# TODO: current implementation has dummy defaults.
> +#
> +# For a better implementation, a QMP addition/call is needed to
> +# retrieve some data for ARM Processor Error injection:
> +#
> +# - machine emulation architecture, as ARM current default is
> +# for AArch64;
> +# - ARM registers: power_state, midr, mpidr.
> +
> +import argparse
> +import json
> +
> +from qmp_helper import (qmp_command, get_choice, get_mult_array,
> + get_mult_choices, get_mult_int, bit,
> + data_add, to_guid)
> +
> +# Arm processor EINJ logic
> +#
> +ACPI_GHES_ARM_CPER_LENGTH = 40
> +ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
> +
> +# TODO: query it from emulation. Current default valid only for Aarch64
> +CONTEXT_AARCH64_EL1 = 5
> +
> +class ArmProcessorEinj:
> + """
> + Implements ARM Processor Error injection via GHES
> + """
> +
> + def __init__(self):
> + """Initialize the error injection class"""
> +
> + # Valid choice values
> + self.arm_valid_bits = {
> + "mpidr": bit(0),
> + "affinity": bit(1),
> + "running": bit(2),
> + "vendor": bit(3),
> + }
> +
> + self.pei_flags = {
> + "first": bit(0),
> + "last": bit(1),
> + "propagated": bit(2),
> + "overflow": bit(3),
> + }
> +
> + self.pei_error_types = {
> + "cache": bit(1),
> + "tlb": bit(2),
> + "bus": bit(3),
> + "micro-arch": bit(4),
> + }
> +
> + self.pei_valid_bits = {
> + "multiple-error": bit(0),
> + "flags": bit(1),
> + "error-info": bit(2),
> + "virt-addr": bit(3),
> + "phy-addr": bit(4),
> + }
> +
> + self.data = bytearray()
> +
> + def create_subparser(self, subparsers):
> + """Add a subparser to handle the error fields"""
> +
> + parser = subparsers.add_parser("arm",
> + help="Generate an ARM processor CPER")
> +
> + arm_valid_bits = ",".join(self.arm_valid_bits.keys())
> + flags = ",".join(self.pei_flags.keys())
> + error_types = ",".join(self.pei_error_types.keys())
> + pei_valid_bits = ",".join(self.pei_valid_bits.keys())
> +
> + # UEFI N.16 ARM Validation bits
> + g_arm = parser.add_argument_group("ARM processor")
> + g_arm.add_argument("--arm", "--arm-valid",
> + help=f"ARM valid bits: {arm_valid_bits}")
> + g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level",
> + type=lambda x: int(x, 0),
> + help="Affinity level (when multiple levels apply)")
> + g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
> + help="Multiprocessor Affinity Register")
> + g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
> + help="Main ID Register")
> + g_arm.add_argument("-r", "--running",
> + action=argparse.BooleanOptionalAction,
> + default=None,
> + help="Indicates if the processor is running or not")
> + g_arm.add_argument("--psci", "--psci-state",
> + type=lambda x: int(x, 0),
> + help="Power State Coordination Interface - PSCI state")
> +
> + # TODO: Add vendor-specific support
> +
> + # UEFI N.17 bitmaps (type and flags)
> + g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
> + g_pei.add_argument("-t", "--type", nargs="+",
> + help=f"one or more error types: {error_types}")
> + g_pei.add_argument("-f", "--flags", nargs="*",
> + help=f"zero or more error flags: {flags}")
> + g_pei.add_argument("-V", "--pei-valid", "--error-valid", nargs="*",
> + help=f"zero or more PEI valid bits: {pei_valid_bits}")
> +
> + # UEFI N.17 Integer values
> + g_pei.add_argument("-m", "--multiple-error", nargs="+",
> + help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")
> + g_pei.add_argument("-e", "--error-info", nargs="+",
> + help="Error information (UEFI 2.10 tables N.18 to N.20)")
> + g_pei.add_argument("-p", "--physical-address", nargs="+",
> + help="Physical address")
> + g_pei.add_argument("-v", "--virtual-address", nargs="+",
> + help="Virtual address")
> +
> + # UEFI N.21 Context
> + g_ctx = parser.add_argument_group("Processor Context")
> + g_ctx.add_argument("--ctx-type", "--context-type", nargs="*",
> + help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")
> + g_ctx.add_argument("--ctx-size", "--context-size", nargs="*",
> + help="Minimal size of the context")
> + g_ctx.add_argument("--ctx-array", "--context-array", nargs="*",
> + help="Comma-separated arrays for each context")
> +
> + # Vendor-specific data
> + g_vendor = parser.add_argument_group("Vendor-specific data")
> + g_vendor.add_argument("--vendor", "--vendor-specific", nargs="+",
> + help="Vendor-specific byte arrays of data")
> +
> + def parse_args(self, args):
> + """Parse subcommand arguments"""
> +
> + cper = {}
> + pei = {}
> + ctx = {}
> + vendor = {}
> +
> + arg = vars(args)
> +
> + # Handle global parameters
> + if args.arm:
> + arm_valid_init = False
> + cper["valid"] = get_choice(name="valid",
> + value=args.arm,
> + choices=self.arm_valid_bits,
> + suffixes=["-error", "-err"])
> + else:
> + cper["valid"] = 0
> + arm_valid_init = True
> +
> + if "running" in arg:
> + if args.running:
> + cper["running-state"] = bit(0)
> + else:
> + cper["running-state"] = 0
> + else:
> + cper["running-state"] = 0
> +
> + if arm_valid_init:
> + if args.affinity:
> + cper["valid"] |= self.arm_valid_bits["affinity"]
> +
> + if args.mpidr:
> + cper["valid"] |= self.arm_valid_bits["mpidr"]
> +
> + if "running-state" in cper:
> + cper["valid"] |= self.arm_valid_bits["running"]
> +
> + if args.psci:
> + cper["valid"] |= self.arm_valid_bits["running"]
> +
> + # Handle PEI
> + if not args.type:
> + args.type = ["cache-error"]
> +
> + get_mult_choices(
> + pei,
> + name="valid",
> + values=args.pei_valid,
> + choices=self.pei_valid_bits,
> + suffixes=["-valid", "-info", "--information", "--addr"],
> + )
> + get_mult_choices(
> + pei,
> + name="type",
> + values=args.type,
> + choices=self.pei_error_types,
> + suffixes=["-error", "-err"],
> + )
> + get_mult_choices(
> + pei,
> + name="flags",
> + values=args.flags,
> + choices=self.pei_flags,
> + suffixes=["-error", "-cap"],
> + )
> + get_mult_int(pei, "error-info", args.error_info)
> + get_mult_int(pei, "multiple-error", args.multiple_error)
> + get_mult_int(pei, "phy-addr", args.physical_address)
> + get_mult_int(pei, "virt-addr", args.virtual_address)
> +
> + # Handle context
> + get_mult_int(ctx, "type", args.ctx_type, allow_zero=True)
> + get_mult_int(ctx, "minimal-size", args.ctx_size, allow_zero=True)
> + get_mult_array(ctx, "register", args.ctx_array, allow_zero=True)
> +
> + get_mult_array(vendor, "bytes", args.vendor, max_val=255)
> +
> + # Store PEI
> + pei_data = bytearray()
> + default_flags = self.pei_flags["first"]
> + default_flags |= self.pei_flags["last"]
> +
> + error_info_num = 0
> +
> + for i, p in pei.items(): # pylint: disable=W0612
> + error_info_num += 1
> +
> + # UEFI 2.10 doesn't define how to encode error information
> + # when multiple types are raised. So, provide a default only
> + # if a single type is there
> + if "error-info" not in p:
> + if p["type"] == bit(1):
> + p["error-info"] = 0x0091000F
> + if p["type"] == bit(2):
> + p["error-info"] = 0x0054007F
> + if p["type"] == bit(3):
> + p["error-info"] = 0x80D6460FFF
> + if p["type"] == bit(4):
> + p["error-info"] = 0x78DA03FF
> +
> + if "valid" not in p:
> + p["valid"] = 0
> + if "multiple-error" in p:
> + p["valid"] |= self.pei_valid_bits["multiple-error"]
> +
> + if "flags" in p:
> + p["valid"] |= self.pei_valid_bits["flags"]
> +
> + if "error-info" in p:
> + p["valid"] |= self.pei_valid_bits["error-info"]
> +
> + if "phy-addr" in p:
> + p["valid"] |= self.pei_valid_bits["phy-addr"]
> +
> + if "virt-addr" in p:
> + p["valid"] |= self.pei_valid_bits["virt-addr"]
> +
> + # Version
> + data_add(pei_data, 0, 1)
> +
> + data_add(pei_data, ACPI_GHES_ARM_CPER_PEI_LENGTH, 1)
> +
> + data_add(pei_data, p["valid"], 2)
> + data_add(pei_data, p["type"], 1)
> + data_add(pei_data, p.get("multiple-error", 1), 2)
> + data_add(pei_data, p.get("flags", default_flags), 1)
> + data_add(pei_data, p.get("error-info", 0), 8)
> + data_add(pei_data, p.get("virt-addr", 0xDEADBEEF), 8)
> + data_add(pei_data, p.get("phy-addr", 0xABBA0BAD), 8)
> +
> + # Store Context
> + ctx_data = bytearray()
> + context_info_num = 0
> +
> + if ctx:
> + for k in sorted(ctx.keys()):
> + context_info_num += 1
> +
> + if "type" not in ctx[k]:
> + ctx[k]["type"] = CONTEXT_AARCH64_EL1
> +
> + if "register" not in ctx[k]:
> + ctx[k]["register"] = []
> +
> + reg_size = len(ctx[k]["register"])
> + size = 0
> +
> + if "minimal-size" in ctx[k]:
> + size = ctx[k]["minimal-size"]
> +
> + size = max(size, reg_size)
> +
> + size = (size + 1) % 0xFFFE
> +
> + # Version
> + data_add(ctx_data, 0, 2)
> +
> + data_add(ctx_data, ctx[k]["type"], 2)
> +
> + data_add(ctx_data, 8 * size, 4)
> +
> + for r in ctx[k]["register"]:
> + data_add(ctx_data, r, 8)
> +
> + for i in range(reg_size, size): # pylint: disable=W0612
> + data_add(ctx_data, 0, 8)
> +
> + # Vendor-specific bytes are not grouped
> + vendor_data = bytearray()
> + if vendor:
> + for k in sorted(vendor.keys()):
> + for b in vendor[k]["bytes"]:
> + data_add(vendor_data, b, 1)
> +
> + # Encode ARM Processor Error
> + data = bytearray()
> +
> + data_add(data, cper["valid"], 4)
> +
> + data_add(data, error_info_num, 2)
> + data_add(data, context_info_num, 2)
> +
> + # Calculate the length of the CPER data
> + cper_length = ACPI_GHES_ARM_CPER_LENGTH
> + cper_length += len(pei_data)
> + cper_length += len(vendor_data)
> + cper_length += len(ctx_data)
> + data_add(data, cper_length, 4)
> +
> + data_add(data, arg.get("affinity") or 0, 1)
> +
> + # Reserved
> + data_add(data, 0, 3)
> +
> + data_add(data, arg.get("mpidr") or 0, 8)
> + data_add(data, arg.get("midr") or 0, 8)
> + data_add(data, cper["running-state"], 4)
> + data_add(data, arg.get("psci") or 0, 4)
> +
> + # Add PEI
> + data.extend(pei_data)
> + data.extend(ctx_data)
> + data.extend(vendor_data)
> +
> + self.data = data
> +
> + def run(self, host, port):
> + """Execute QMP commands"""
> +
> + guid = to_guid(0xE19E3D16, 0xBC11, 0x11E4,
> + [0x9C, 0xAA, 0xC2, 0x05,
> + 0x1D, 0x5D, 0x46, 0xB0])
> +
> + qmp_command(host, port, guid, self.data)
> diff --git a/scripts/ghes_inject.py b/scripts/ghes_inject.py
> new file mode 100755
> index 000000000000..8415ccbbc53d
> --- /dev/null
> +++ b/scripts/ghes_inject.py
> @@ -0,0 +1,59 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +import argparse
> +
> +from arm_processor_error import ArmProcessorEinj
> +
> +EINJ_DESCRIPTION = """
> +Handle ACPI GHESv2 error injection logic via the QEMU QMP interface.\n
> +
> +It allows using UEFI BIOS EINJ features to generate GHES records.
> +
> +It helps testing Linux CPER and GHES drivers and to test rasdaemon
> +error handling logic.
> +
> +Currently, it supports ARM processor error injection for ARM processor
> +events, being compatible with UEFI 2.9A Errata.
> +
> +This small utility works together with those QEMU additions:
> +- https://gitlab.com/mchehab_kernel/qemu/-/tree/arm-error-inject-v2
> +"""
> +
> +def main():
> + """Main program"""
> +
> + # Main parser - handle generic args like QEMU QMP TCP socket options
> + parser = argparse.ArgumentParser(prog="einj.py",
> + formatter_class=argparse.RawDescriptionHelpFormatter,
> + usage="%(prog)s [options]",
> + description=EINJ_DESCRIPTION,
> + epilog="If a field is not defined, a default value will be applied by QEMU.")
> +
> + g_options = parser.add_argument_group("QEMU QMP socket options")
> + g_options.add_argument("-H", "--host", default="localhost", type=str,
> + help="host name")
> + g_options.add_argument("-P", "--port", default=4445, type=int,
> + help="TCP port number")
> +
> + arm_einj = ArmProcessorEinj()
> +
> + # Call subparsers
> + subparsers = parser.add_subparsers(dest='command')
> +
> + arm_einj.create_subparser(subparsers)
> +
> + args = parser.parse_args()
> +
> + # Handle subparser commands
> + if args.command == "arm":
> + arm_einj.parse_args(args)
> + arm_einj.run(args.host, args.port)
> +
> +
> +if __name__ == "__main__":
> + main()
> diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
> new file mode 100644
> index 000000000000..13fae7a7af0e
> --- /dev/null
> +++ b/scripts/qmp_helper.py
>
I'm going to admit I only glanced at this very briefly, but -- is there a
chance you could use qemu.git/python/qemu/qmp instead of writing your own
helpers here?
If *NOT*, is there something that I need to add to our QMP library to
facilitate your script?
> @@ -0,0 +1,249 @@
> +#!/usr/bin/env python3
> +#
> +# pylint: disable=C0301, C0114, R0912, R0913, R0915, W0511
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +
> +import json
> +import socket
> +import sys
> +
> +from base64 import b64encode
> +
> +#
> +# Socket QMP send command
> +#
> +def qmp_command(host, port, guid, data):
> + """Send commands to QEMU through QMP TCP socket"""
> +
> + # Fill the commands to be sent
> + commands = []
> +
> + # Needed to negotiate QMP and for QEMU to accept the command
> + commands.append('{ "execute": "qmp_capabilities" } ')
> +
> + base64_data = b64encode(bytes(data)).decode('ascii')
> +
> + cmd_arg = {
> + 'cper': {
> + 'notification-type': guid,
> + "raw-data": base64_data
> + }
> + }
> +
> + command = '{ "execute": "ghes-cper", '
> + command += '"arguments": ' + json.dumps(cmd_arg) + " }"
> +
> + commands.append(command)
> +
> + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> + try:
> + s.connect((host, port))
> + except ConnectionRefusedError:
> + sys.exit(f"Can't connect to QMP host {host}:{port}")
>
You should be able to use e.g.
legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g.
from qemu.qmp.legacy import QEMUMonitorProtocol
qmp = QEMUMonitorProtocol((host, port))
qmp.connect(negotiate=True)
If you want to run the script w/o setting up a virtual environment or
installing the package, take a look at the hacks in scripts/qmp/ for how I
support e.g. qom-get directly from the source tree.
> +
> + data = s.recv(1024)
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if "QMP" not in obj:
> + print(f"Invalid QMP answer: {data.decode('utf-8')}")
> + s.close()
> + return
> +
> + for i, command in enumerate(commands):
>
Then here you'd use qmp.cmd (raises exception on QMPError) or qmp.cmd_raw
or qmp.cmd_obj (returns the QMP response as the return value even if it was
an error.)
More details:
https://qemu.readthedocs.io/projects/python-qemu-qmp/en/latest/qemu.qmp.legacy.html
There's also an async version, but it doesn't look like you require that
complexity, so you can ignore it.
~~js
> + s.sendall(command.encode("utf-8"))
> + data = s.recv(1024)
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if isinstance(obj.get("return"), dict):
> + if obj["return"]:
> + print(json.dumps(obj["return"]))
> + elif i > 0:
> + print("Error injected.")
> + elif isinstance(obj.get("error"), dict):
> + error = obj["error"]
> + print(f'{error["class"]}: {error["desc"]}')
> + else:
> + print(json.dumps(obj))
> +
> + s.shutdown(socket.SHUT_WR)
> + while 1:
> + data = s.recv(1024)
> + if data == b"":
> + break
> + try:
> + obj = json.loads(data.decode("utf-8"))
> + except json.JSONDecodeError as e:
> + print(f"Invalid QMP answer: {e}")
> + s.close()
> + return
> +
> + if isinstance(obj.get("return"), dict):
> + print(json.dumps(obj["return"]))
> + elif isinstance(obj.get("error"), dict):
> + error = obj["error"]
> + print(f'{error["class"]}: {error["desc"]}')
> + else:
> + print(json.dumps(obj))
> +
> + s.close()
> +
> +
> +#
> +# Helper routines to handle multiple choice arguments
> +#
> +def get_choice(name, value, choices, suffixes=None):
> + """Produce a list from multiple choice argument"""
> +
> + new_values = 0
> +
> + if not value:
> + return new_values
> +
> + for val in value.split(","):
> + val = val.lower()
> +
> + if suffixes:
> + for suffix in suffixes:
> + val = val.removesuffix(suffix)
> +
> + if val not in choices.keys():
> + sys.exit(f"Error on '{name}': choice {val} is invalid.")
> +
> + val = choices[val]
> +
> + new_values |= val
> +
> + return new_values
> +
> +
> +def get_mult_array(mult, name, values, allow_zero=False, max_val=None):
> + """Add numbered hashes from integer lists"""
> +
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + if not values:
> + i = 0
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = []
> + return
> +
> + i = 0
> + for value in values:
> + for val in value.split(","):
> + try:
> + val = int(val, 0)
> + except ValueError:
> + sys.exit(f"Error on '{name}': {val} is not an integer")
> +
> + if val < 0:
> + sys.exit(f"Error on '{name}': {val} is not unsigned")
> +
> + if max_val and val > max_val:
> + sys.exit(f"Error on '{name}': {val} is too big")
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + if name not in mult[i]:
> + mult[i][name] = []
> +
> + mult[i][name].append(val)
> +
> + i += 1
> +
> +
> +def get_mult_choices(mult, name, values, choices,
> + suffixes=None, allow_zero=False):
> + """Add numbered hashes from multiple choice arguments"""
> +
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + i = 0
> + for val in values:
> + new_values = get_choice(name, val, choices, suffixes)
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = new_values
> + i += 1
> +
> +
> +def get_mult_int(mult, name, values, allow_zero=False):
> + """Add numbered hashes from integer arguments"""
> + if not allow_zero:
> + if not values:
> + return
> + else:
> + if values is None:
> + return
> +
> + i = 0
> + for val in values:
> + try:
> + val = int(val, 0)
> + except ValueError:
> + sys.exit(f"Error on '{name}': {val} is not an integer")
> +
> + if val < 0:
> + sys.exit(f"Error on '{name}': {val} is not unsigned")
> +
> + if i not in mult:
> + mult[i] = {}
> +
> + mult[i][name] = val
> + i += 1
> +
> +
> +#
> +# Data encode helper functions
> +#
> +def bit(b):
> + """Simple macro to define a bit on a bitmask"""
> + return 1 << b
> +
> +
> +def data_add(data, value, num_bytes):
> + """Append value to a bytearray as num_bytes little-endian bytes"""
> +
> + data.extend(value.to_bytes(num_bytes, byteorder="little"))
> +
> +def to_guid(time_low, time_mid, time_high, nodes):
> + """Create a GUID string"""
> +
> + assert(len(nodes) == 8)
> +
> + clock = nodes[0] << 8 | nodes[1]
> +
> + node = 0
> + for i in range(2, len(nodes)):
> + node = node << 8 | nodes[i]
> +
> + s = f"{time_low:08x}-{time_mid:04x}-"
> + s += f"{time_high:04x}-{clock:04x}-{node:012x}"
> +
> + return s
> --
> 2.45.2
>
>
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-08 20:58 ` John Snow
@ 2024-08-08 21:51 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-08 21:51 UTC (permalink / raw)
To: John Snow
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
On Thu, 8 Aug 2024 16:58:38 -0400
John Snow <jsnow@redhat.com> wrote:
> On Fri, Aug 2, 2024 at 5:44 PM Mauro Carvalho Chehab <
> mchehab+huawei@kernel.org> wrote:
>
> > +#!/usr/bin/env python3
> > +#
> > +# pylint: disable=C0301, C0114, R0912, R0913, R0914, R0915, W0511
> >
>
> Out of curiosity, what tools are you using to delint your files
Primarily I use pylint, almost always with disable line(s), as those lint
tools have some warnings that sound too silly (like too many/too few
functions/branches/arguments...). From time to time, I review the disable
lines to keep the code as clean as desired.
Sometimes I also use pep8 (now named pycodestyle) and black, especially
when I want some autoformat hints (I manually commit the hunks that make
sense), but I prefer pylint as the primary checking tool. I'm not too
fond of black's coding style, though[1].
[1] For instance, black would do this change:
- g_arm.add_argument("--arm", "--arm-valid",
- help=f"ARM valid bits: {arm_valid_bits}")
+ g_arm.add_argument(
+ "--arm", "--arm-valid", help=f"ARM valid bits: {arm_valid_bits}"
+ )
IMO, the original coding style I wrote is a lot better than black's
suggestion - and it is closer to the C style I use in the Linux kernel ;-)
> and how are
> you invoking them?
I don't play much with such tools, though. I usually just invoke them with
the Python file name(s), without passing any parameters or creating any
configuration file.
> I don't really maintain any strict regime for python files under
> qemu.git/scripts (yet), so I am mostly curious as to what regimes others
> are using currently. I don't see most QEMU contributors checking in pylint
> ignores etc directly into the files, so it caught my eye.
Having some verification sounds interesting, as it may help prevent
some hidden bugs (like redefining a variable that was already used
globally), provided such checks are not too picky and stupid warnings can
be bypassed.
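To illustrate the kind of hidden bug meant here, a minimal sketch (not taken from the posted scripts; the names are made up for illustration). pylint reports the shadowing below as W0621 (redefined-outer-name):

```python
data = bytearray()  # module-level buffer (hypothetical)


def encode(value, num_bytes):
    """Encode value as little-endian bytes."""
    # 'data' shadows the module-level name here, so the global
    # buffer is silently left untouched (pylint W0621).
    data = bytearray()
    data.extend(value.to_bytes(num_bytes, byteorder="little"))
    return data


encoded = encode(0x1234, 2)
```

After the call, the module-level buffer is still empty, which is easy to miss when reading the code.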
Regards,
Mauro
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-08 21:21 ` John Snow
@ 2024-08-08 22:41 ` Mauro Carvalho Chehab
2024-08-08 23:33 ` John Snow
2024-08-09 6:26 ` Mauro Carvalho Chehab
0 siblings, 2 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-08 22:41 UTC (permalink / raw)
To: John Snow
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
On Thu, 8 Aug 2024 17:21:33 -0400
John Snow <jsnow@redhat.com> wrote:
> On Fri, Aug 2, 2024 at 5:44 PM Mauro Carvalho Chehab <
> mchehab+huawei@kernel.org> wrote:
>
> > diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
> > new file mode 100644
> > index 000000000000..13fae7a7af0e
> > --- /dev/null
> > +++ b/scripts/qmp_helper.py
> >
>
> I'm going to admit I only glanced at this very briefly, but -- is there a
> chance you could use qemu.git/python/qemu/qmp instead of writing your own
> helpers here?
>
> If *NOT*, is there something that I need to add to our QMP library to
> facilitate your script?
I started writing this script to be hosted outside the QEMU tree, when
we had a very different API.
I later noticed the QMP library, and even tried to write a patch for it,
but I gave up due to the asyncio complexity...
Please notice that, on this file, I actually placed three classes:
- qmp
- util
- cper_guid
I could probably make the first one derive from QEMUMonitorProtocol.
Besides the normal open/close/cmd communication, it also contains some
methods that are specific to the error injection use case:
- to generate a CPER record;
- to search for data via qom-get.
The other two classes are just common code used by ghes_inject commands.
My idea is to have multiple commands to do different kinds of GHES
error injection, each command on a different file/class.
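As a side note on the GUID helper, a minimal sketch (not part of the posted patch) of how the same string could be produced with Python's standard uuid module instead of manual shifting. The function name guid_from_fields is hypothetical; the values are the ones run() passes to to_guid() in the patch:

```python
import uuid


def guid_from_fields(time_low, time_mid, time_high, nodes):
    """Format a GUID string using uuid.UUID (illustrative helper)."""
    if len(nodes) != 8:
        raise ValueError("nodes must contain exactly 8 bytes")

    # The first two bytes form the clock sequence, the last six the node.
    node = int.from_bytes(bytes(nodes[2:]), byteorder="big")

    return str(uuid.UUID(fields=(time_low, time_mid, time_high,
                                 nodes[0], nodes[1], node)))


guid = guid_from_fields(0xE19E3D16, 0xBC11, 0x11E4,
                        [0x9C, 0xAA, 0xC2, 0x05, 0x1D, 0x5D, 0x46, 0xB0])
print(guid)
```

uuid.UUID also range-checks each field, which would replace the assert on the list length with a ValueError on bad input.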
> > + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > + try:
> > + s.connect((host, port))
> > + except ConnectionRefusedError:
> > + sys.exit(f"Can't connect to QMP host {host}:{port}")
> >
>
> You should be able to use e.g.
>
> legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g.
>
> from qemu.qmp.legacy import QEMUMonitorProtocol
>
> qmp = QEMUMonitorProtocol((host, port))
> qmp.connect(negotiate=True)
That sounds interesting! I'll give it a try.
> If you want to run the script w/o setting up a virtual environment or
> installing the package, take a look at the hacks in scripts/qmp/ for how I
> support e.g. qom-get directly from the source tree.
Yeah, I saw that already. Doing:
sys.path.append(path.join(qemu_dir, 'python'))
the same way qom-get does should do the trick.
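A minimal sketch of that path setup, assuming the script lives in <qemu>/scripts and the library in <qemu>/python (the helper name add_qemu_python_path and the example path are made up for illustration):

```python
import os
import sys


def add_qemu_python_path(script_file):
    """Make <qemu>/python importable for a script in <qemu>/scripts."""
    scripts_dir = os.path.dirname(os.path.abspath(script_file))
    python_dir = os.path.normpath(os.path.join(scripts_dir, "..", "python"))

    # Avoid adding the same entry twice on repeated calls
    if python_dir not in sys.path:
        sys.path.append(python_dir)

    return python_dir


# A real script would pass __file__ and then import from qemu.qmp;
# an explicit example path is used here instead.
python_dir = add_qemu_python_path(
    os.path.join("qemu", "scripts", "ghes_inject.py"))
```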
> > +
> > + data = s.recv(1024)
> > + try:
> > + obj = json.loads(data.decode("utf-8"))
> > + except json.JSONDecodeError as e:
> > + print(f"Invalid QMP answer: {e}")
> > + s.close()
> > + return
> > +
> > + if "QMP" not in obj:
> > + print(f"Invalid QMP answer: {data.decode("utf-8")}")
> > + s.close()
> > + return
> > +
> > + for i, command in enumerate(commands):
> >
>
> Then here you'd use qmp.cmd (raises exception on QMPError) or qmp.cmd_raw
> or qmp.cmd_obj (returns the QMP response as the return value even if it was
> an error.)
Good to know, I'll try and see what fits best.
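For reference, a minimal sketch of the ghes-cper command from the patch built as a plain dict rather than a hand-concatenated JSON string. The helper name build_ghes_cper_command and the three sample bytes are hypothetical; whether the dict is then handed to cmd_obj() or unpacked for cmd() is left open here:

```python
import json
from base64 import b64encode


def build_ghes_cper_command(guid, data):
    """Return the ghes-cper QMP command as a dict (illustrative helper)."""
    return {
        "execute": "ghes-cper",
        "arguments": {
            "cper": {
                "notification-type": guid,
                # Raw CPER bytes travel base64-encoded, as in the patch
                "raw-data": b64encode(bytes(data)).decode("ascii"),
            }
        },
    }


command = build_ghes_cper_command("e19e3d16-bc11-11e4-9caa-c2051d5d46b0",
                                  bytearray(b"\x00\x01\x02"))
print(json.dumps(command))
```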
> More details:
> https://qemu.readthedocs.io/projects/python-qemu-qmp/en/latest/qemu.qmp.legacy.html
I'll take a look. The name "legacy" is a little scary, as it might
imply that this has been deprecated. If there are no plans to deprecate it,
then it would be great to use it and simplify the code a little bit.
> There's also an async version, but it doesn't look like you require that
> complexity, so you can ignore it.
Yes, that's the case: a serialized sync send/response logic works perfectly
for this script. No need to be burdened with asyncio complexity.
Thanks,
Mauro
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-08 22:41 ` Mauro Carvalho Chehab
@ 2024-08-08 23:33 ` John Snow
2024-08-09 8:24 ` Mauro Carvalho Chehab
2024-08-09 6:26 ` Mauro Carvalho Chehab
1 sibling, 1 reply; 54+ messages in thread
From: John Snow @ 2024-08-08 23:33 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
[-- Attachment #1: Type: text/plain, Size: 6610 bytes --]
On Thu, Aug 8, 2024 at 6:41 PM Mauro Carvalho Chehab <
mchehab+huawei@kernel.org> wrote:
> On Thu, 8 Aug 2024 17:21:33 -0400
> John Snow <jsnow@redhat.com> wrote:
>
> > On Fri, Aug 2, 2024 at 5:44 PM Mauro Carvalho Chehab <
> > mchehab+huawei@kernel.org> wrote:
> >
>
> > > diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
> > > new file mode 100644
> > > index 000000000000..13fae7a7af0e
> > > --- /dev/null
> > > +++ b/scripts/qmp_helper.py
> > >
> >
> > I'm going to admit I only glanced at this very briefly, but -- is there a
> > chance you could use qemu.git/python/qemu/qmp instead of writing your own
> > helpers here?
> >
> > If *NOT*, is there something that I need to add to our QMP library to
> > facilitate your script?
>
> I started writing this script to be hosted outside qemu tree, when
> we had a very different API.
>
> I noticed later about the QMP, and even tried to write a patch for it,
> but I gave up due to asyncio complexity...
>
Sorry :)
>
> Please notice that, on this file, I actually placed three classes:
>
> - qmp
> - util
> - cper_guid
>
> I could probably make the first one to be an override of QEMUMonitorProtocol
> (besides normal open/close/cmd communication, it also contains some
> methods that are specific to error inject use case:
>
> - to generate a CPER record;
> - to search for data via qom-get.
>
> The other two classes are just common code used by ghes_inject commands.
> My idea is to have multiple commands to do different kinds of GHES
> error injection, each command on a different file/class.
>
Gotcha! Thanks for the feedback. I would *prefer* that code checked in to
qemu.git use the QMP module where possible so that I don't have to maintain
multiple copies of QMP wrangling code. I think what you want to do should
be easily possible with the existing library; and anything that isn't, I'm
more than happy to meet your needs. Reach out absolutely any time.
>
> > > + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> > > + try:
> > > + s.connect((host, port))
> > > + except ConnectionRefusedError:
> > > + sys.exit(f"Can't connect to QMP host {host}:{port}")
> > >
> >
> > You should be able to use e.g.
> >
> > legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g.
> >
> > from qemu.qmp.legacy import QEMUMonitorProtocol
> >
> > qmp = QEMUMonitorProtocol((host, port))
> > qmp.connect(negotiate=True)
>
> That sounds interesting! I give it a try.
>
> > If you want to run the script w/o setting up a virtual environment or
> > installing the package, take a look at the hacks in scripts/qmp/ for how I
> > support e.g. qom-get directly from the source tree.
>
> Yeah, I saw that already. Doing:
>
> sys.path.append(path.join(qemu_dir, 'python'))
>
> the same way qom-get does should do the trick.
>
> > > +
> > > + data = s.recv(1024)
> > > + try:
> > > + obj = json.loads(data.decode("utf-8"))
> > > + except json.JSONDecodeError as e:
> > > + print(f"Invalid QMP answer: {e}")
> > > + s.close()
> > > + return
> > > +
> > > + if "QMP" not in obj:
> > > + print(f"Invalid QMP answer: {data.decode("utf-8")}")
> > > + s.close()
> > > + return
> > > +
> > > + for i, command in enumerate(commands):
> > >
> >
> > Then here you'd use qmp.cmd (raises exception on QMPError) or qmp.cmd_raw
> > or qmp.cmd_obj (returns the QMP response as the return value even if it was
> > an error.)
>
> Good to know, I'll try and see what fits best.
>
I might *suggest* you try to use the exception-raising interface and catch
exceptions to interrogate expected errors as it aligns better with the
"idiomatic python API" - I have no plans to support an external API that
*returns* error objects except via the exception class. This approach will
be easier to port when I drop the legacy interface in the future, see below.
But, that said, whichever is easiest. We use all three interfaces in many
places in the QEMU tree. I have no grounds to require you to use a specific
one ;)
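A rough sketch of what the exception-catching style could look like. To keep it self-contained, the client is a stand-in (FakeQMP is hypothetical and only mimics a cmd() method that raises on failure); with the real library, the connected client and its QMP error class would be passed in instead:

```python
def inject(qmp, arguments, qmp_error=Exception):
    """Run ghes-cper through an exception-raising client and report."""
    try:
        qmp.cmd("ghes-cper", **arguments)
    except qmp_error as exc:
        return f"Injection failed: {exc}"
    return "Error injected."


class FakeQMP:
    """Stand-in client, used only to exercise inject()."""

    def __init__(self, fail=False):
        self.fail = fail

    def cmd(self, name, **kwds):
        """Pretend to execute a command, optionally failing."""
        if self.fail:
            raise RuntimeError(f"{name}: command not found")
        return {}


ok = inject(FakeQMP(), {"cper": {}})
bad = inject(FakeQMP(fail=True), {"cper": {}}, qmp_error=RuntimeError)
```

Catching a specific error class rather than inspecting a returned error object keeps the success path free of response parsing.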
>
> > More details:
> >
> > https://qemu.readthedocs.io/projects/python-qemu-qmp/en/latest/qemu.qmp.legacy.html
>
> I'll take a look. The name "legacy" is a little scary, as it might
> imply that this has been deprecated. If there's no plans to deprecate,
> then it would be great to use it and simplify the code a little bit.
>
I named it legacy to be scary on purpose :)
The truth is that the "legacy" module was designed to be a 1:1 drop-in
replacement for an older version of the synchronous QMP library that
powered our internal iotests. We still use this "legacy" module in
thousands of places in the QEMU tree. I do have plans to replace it with a
"proper" synchronous frontend class, eventually, someday, etc. It's been a
while and I still haven't done it, though. Oops...
When I do eventually replace it, I will convert all users inside of
qemu.git personally, and the design of the "non-legacy" API will be chosen
pretty explicitly to make that task really easy for myself and reviewers.
This would include your script inside the qemu.git tree. It should be
pretty safe to use the legacy module *in qemu.git*, but for external,
out-of-tree scripts, it may indeed disappear someday - but converting to
the new API, when I merge it, should be very, very trivial. How much of a
headache that is for you depends on how you package/distribute the script
and how awful it will be to update the code and dependencies when it
happens.
FYI: I have promised in the readme for the standalone version of qemu.qmp
that legacy.py will not be removed prior to v0.1.0. All versions before
then will still have it, guaranteed.
(Neither here nor there: One of the holdups in this replacement is figuring
out how to structure the API for event listening, which was the main
motivator of the *async* version of the class. We have many users who don't
want full async handling, but still want to listen for and catch events. I
need a proper sit and think for what the API I want to commit to supporting
and maintaining for this should look like. Not your problem, anyway!)
>
> > There's also an async version, but it doesn't look like you require that
> > complexity, so you can ignore it.
>
> Yes, that's the case: a serialized sync send/response logic works perfectly
> for this script. No need to be burdened with asyncio complexity.
>
> Thanks,
> Mauro
>
>
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-08 22:41 ` Mauro Carvalho Chehab
2024-08-08 23:33 ` John Snow
@ 2024-08-09 6:26 ` Mauro Carvalho Chehab
2024-08-09 7:37 ` Mauro Carvalho Chehab
1 sibling, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-09 6:26 UTC (permalink / raw)
To: John Snow
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
Em Fri, 9 Aug 2024 00:41:37 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:
> > You should be able to use e.g.
> >
> > legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g.
> >
> > from qemu.qmp.legacy import QEMUMonitorProtocol
> >
> > qmp = QEMUMonitorProtocol((host, port))
> > qmp.connect(negotiate=True)
>
> That sounds interesting! I give it a try.
I applied the enclosed patch at the end of my patch series, but
somehow it is not working. For whatever reason, connect() is
raising a StateError apparently due to Runstate.CONNECTING.
I tried both declaring a subclass (see enclosed patch):
class qmp(QEMUMonitorProtocol)
and using:
- super().__init__(self.host, self.port)
+ self.qmp_monitor = QEMUMonitorProtocol(self.host, self.port)
In both cases, it keeps waiting forever for a connection.
Regards,
Mauro
---
diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
index e9e9388bcb8b..62ca267cdc87 100644
--- a/scripts/qmp_helper.py
+++ b/scripts/qmp_helper.py
@@ -9,9 +9,23 @@
import socket
import sys
+from os import path
+
+try:
+ qemu_dir = path.abspath(path.dirname(path.dirname(__file__)))
+ sys.path.append(path.join(qemu_dir, 'python'))
+
+ from qemu.qmp.legacy import QEMUMonitorProtocol
+ from qemu.qmp.protocol import StateError
+
+except ModuleNotFoundError as exc:
+ print(f"Module '{exc.name}' not found.")
+ print("Try export PYTHONPATH=top-qemu-dir/python or run from top-qemu-dir")
+ sys.exit(1)
+
from base64 import b64encode
-class qmp:
+class qmp(QEMUMonitorProtocol):
"""
Opens a connection and send/receive QMP commands.
"""
@@ -21,22 +35,20 @@ def send_cmd(self, command, may_open=False,return_error=True):
if may_open:
self._connect()
- elif not self.socket:
- return None
+ elif not self.connected:
+ return False
if isinstance(command, dict):
data = json.dumps(command).encode("utf-8")
else:
data = command.encode("utf-8")
- self.socket.sendall(data)
- data = self.socket.recv(1024)
try:
- obj = json.loads(data.decode("utf-8"))
- except json.JSONDecodeError as e:
- print(f"Invalid QMP answer: {e}")
- self._close()
- return None
+ obj = self.cmd_obj(command)
+ except Exception as e:
+ print(f"Failed to inject error: {e}.")
+
+ print(obj)
if "return" in obj:
if isinstance(obj.get("return"), dict):
@@ -46,86 +58,47 @@ def send_cmd(self, command, may_open=False,return_error=True):
else:
return obj["return"]
- elif isinstance(obj.get("error"), dict):
- error = obj["error"]
- if return_error:
- print(f'{error["class"]}: {error["desc"]}')
- else:
- print(json.dumps(obj))
-
return None
def _close(self):
"""Shutdown and close the socket, if opened"""
- if not self.socket:
+ if not self.connected:
return
- self.socket.shutdown(socket.SHUT_WR)
- while 1:
- data = self.socket.recv(1024)
- if data == b"":
- break
- try:
- obj = json.loads(data.decode("utf-8"))
- except json.JSONDecodeError as e:
- print(f"Invalid QMP answer: {e}")
- self.socket.close()
- self.socket = None
- return
-
- if isinstance(obj.get("return"), dict):
- print(json.dumps(obj["return"]))
- if isinstance(obj.get("error"), dict):
- error = obj["error"]
- print(f'{error["class"]}: {error["desc"]}')
- else:
- print(json.dumps(obj))
-
- self.socket.close()
- self.socket = None
+ self.close()
+ self.connected = False
def _connect(self):
"""Connect to a QMP TCP/IP port, if not connected yet"""
- if self.socket:
+ if self.connected:
return True
- self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
- try:
- self.socket.connect((self.host, self.port))
- except ConnectionRefusedError:
- sys.exit(f"Can't connect to QMP host {self.host}:{self.port}")
-
- data = self.socket.recv(1024)
- try:
- obj = json.loads(data.decode("utf-8"))
- except json.JSONDecodeError as e:
- print(f"Invalid QMP answer: {e}")
- self._close()
- return False
-
- if "QMP" not in obj:
- print(f"Invalid QMP answer: {data.decode('utf-8')}")
- self._close()
- return False
+ is_connecting = True
+ while is_connecting:
+ try:
+ ret = self.connect(negotiate=True)
+ self.accept()
+ is_connecting = False
+ except ConnectionError:
+ sys.exit(f"Can't connect to QMP host {self.host}:{self.port}")
+ return False
+ except StateError as e:
+ print(f"StateError: {e}")
- result = self.send_cmd('{ "execute": "qmp_capabilities" }')
- if not result:
- self._close()
- return False
+ self.connected = True
return True
def __init__(self, host, port, debug=False):
"""Initialize variables used by the QMP send logic"""
- self.socket = None
+ self.connected = False
self.host = host
self.port = port
self.debug = debug
- def __del__(self):
- self._close()
+ super().__init__(self.host, self.port)
#
# Socket QMP send command
@@ -168,8 +141,12 @@ def send_cper(self, guid, data):
self._connect()
- if self.send_cmd(command):
- print("Error injected.")
+ try:
+ self.cmd_obj(command)
+ except Exception as e:
+ print(f"Failed to inject error: {e}.")
+
+ print("Error injected.")
def search_qom(self, path, prop, regex):
"""
@@ -180,8 +157,9 @@ def search_qom(self, path, prop, regex):
...
"""
- found = []
+ self._connect()
+ found = []
i = 0
while 1:
dev = f"{path}[{i}]"
@@ -192,7 +170,11 @@ def search_qom(self, path, prop, regex):
'property': prop
}
}
- ret = self.send_cmd(cmd, may_open=True, return_error=False)
+ try:
+ ret = self.cmd_obj(cmd)
+ except Exception as e:
+ print(f"Failed to inject error: {e}.")
+
if not ret:
break
Thanks,
Mauro
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-09 6:26 ` Mauro Carvalho Chehab
@ 2024-08-09 7:37 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-09 7:37 UTC (permalink / raw)
To: John Snow
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
Em Fri, 9 Aug 2024 08:26:09 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:
> Em Fri, 9 Aug 2024 00:41:37 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:
>
> > > You should be able to use e.g.
> > >
> > > legacy.py's QEMUMonitorProtocol class for synchronous connections, e.g.
> > >
> > > from qemu.qmp.legacy import QEMUMonitorProtocol
> > >
> > > qmp = QEMUMonitorProtocol((host, port))
> > > qmp.connect(negotiate=True)
> >
> > That sounds interesting! I give it a try.
>
> I applied the enclosed patch at the end of my patch series, but
> somehow it is not working. For whatever reason, connect() is
> raising a StateError apparently due to Runstate.CONNECTING.
>
> I tried both as declaring (see enclosed patch):
>
> class qmp(QEMUMonitorProtocol)
>
> and using:
>
> - super().__init__(self.host, self.port)
> + self.qmp_monitor = QEMUMonitorProtocol(self.host, self.port)
>
> On both cases, it keeps waiting forever for a connection.
Never mind, placing host/port in a tuple made it work.
The enclosed patch converts the script to use QEMUMonitorProtocol.
I'll fold it with the script for the next spin of this series.
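For reference, the message shape that the reworked send_cmd() in the patch
below builds can be sketched on its own. This is only an illustration:
build_qmp_msg() is a hypothetical helper name (the patch does this inline),
and the path used in the example call is just the device[] array mentioned
in this thread with an arbitrary index.

```python
import json


def build_qmp_msg(command, args=None):
    """Build a QMP request dict from a command name and optional arguments."""
    msg = {"execute": command}
    if args:
        # Only add "arguments" when there is something to send.
        msg["arguments"] = args
    return msg


# A command without arguments, and one with arguments.
print(json.dumps(build_qmp_msg("query-target")))
print(json.dumps(build_qmp_msg("qom-get", {
    "path": "/machine/unattached/device[0]",   # illustrative index
    "property": "midr",
})))
```

Passing the command name and its arguments separately avoids hand-assembling
JSON strings in every caller.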
Regards,
Mauro
---
[PATCH] scripts/qmp_helper.py: use QEMUMonitorProtocol class
Instead of reinventing the wheel, let's use QEMUMonitorProtocol.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
index 756935a2263c..f869f07860b8 100644
--- a/scripts/arm_processor_error.py
+++ b/scripts/arm_processor_error.py
@@ -169,14 +169,11 @@ def send_cper(self, args):
if args.mpidr:
cper["mpidr-el1"] = arg["mpidr"]
elif cpus:
- get_mpidr = {
- "execute": "qom-get",
- "arguments": {
- 'path': cpus[0],
- 'property': "x-mpidr"
- }
+ cmd_arg = {
+ 'path': cpus[0],
+ 'property': "x-mpidr"
}
- ret = qmp_cmd.send_cmd(get_mpidr, may_open=True)
+ ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True)
if isinstance(ret, int):
cper["mpidr-el1"] = ret
else:
@@ -291,8 +288,7 @@ def send_cper(self, args):
context_info_num = 0
if ctx:
- ret = qmp_cmd.send_cmd('{ "execute": "query-target" }',
- may_open=True)
+ ret = qmp_cmd.send_cmd("query-target", may_open=True)
default_ctx = self.CONTEXT_MISC_REG
@@ -363,14 +359,11 @@ def send_cper(self, args):
if "midr-el1" not in arg:
if cpus:
- get_mpidr = {
- "execute": "qom-get",
- "arguments": {
- 'path': cpus[0],
- 'property': "midr"
- }
+ cmd_arg = {
+ 'path': cpus[0],
+ 'property': "midr"
}
- ret = qmp_cmd.send_cmd(get_mpidr, may_open=True)
+ ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True)
if isinstance(ret, int):
arg["midr-el1"] = ret
diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
index 7214c15c6718..e2e0a881f6c1 100644
--- a/scripts/qmp_helper.py
+++ b/scripts/qmp_helper.py
@@ -9,6 +9,19 @@
import socket
import sys
+from os import path
+
+try:
+ qemu_dir = path.abspath(path.dirname(path.dirname(__file__)))
+ sys.path.append(path.join(qemu_dir, 'python'))
+
+ from qemu.qmp.legacy import QEMUMonitorProtocol
+
+except ModuleNotFoundError as exc:
+ print(f"Module '{exc.name}' not found.")
+ print("Try export PYTHONPATH=top-qemu-dir/python or run from top-qemu-dir")
+ sys.exit(1)
+
from base64 import b64encode
class qmp:
@@ -16,26 +29,23 @@ class qmp:
Opens a connection and send/receive QMP commands.
"""
- def send_cmd(self, command, may_open=False, return_error=True):
+ def send_cmd(self, command, args=None, may_open=False, return_error=True):
"""Send a command to QMP, optinally opening a connection"""
if may_open:
self._connect()
- elif not self.socket:
- return None
+ elif not self.connected:
+ return False
- if isinstance(command, dict):
- data = json.dumps(command).encode("utf-8")
- else:
- data = command.encode("utf-8")
+ msg = { 'execute': command }
+ if args:
+ msg['arguments'] = args
- self.socket.sendall(data)
- data = self.socket.recv(1024)
try:
- obj = json.loads(data.decode("utf-8"))
- except json.JSONDecodeError as e:
- print(f"Invalid QMP answer: {e}")
- self._close()
+ obj = self.qmp_monitor.cmd_obj(msg)
+ except Exception as e:
+ print(f"Command: {command}")
+ print(f"Failed to inject error: {e}.")
return None
if "return" in obj:
@@ -49,6 +59,7 @@ def send_cmd(self, command, may_open=False, return_error=True):
elif isinstance(obj.get("error"), dict):
error = obj["error"]
if return_error:
+ print(f"Command: {msg}")
print(f'{error["class"]}: {error["desc"]}')
else:
print(json.dumps(obj))
@@ -57,75 +68,37 @@ def send_cmd(self, command, may_open=False, return_error=True):
def _close(self):
"""Shutdown and close the socket, if opened"""
- if not self.socket:
+ if not self.connected:
return
- self.socket.shutdown(socket.SHUT_WR)
- while 1:
- data = self.socket.recv(1024)
- if data == b"":
- break
- try:
- obj = json.loads(data.decode("utf-8"))
- except json.JSONDecodeError as e:
- print(f"Invalid QMP answer: {e}")
- self.socket.close()
- self.socket = None
- return
-
- if isinstance(obj.get("return"), dict):
- print(json.dumps(obj["return"]))
- if isinstance(obj.get("error"), dict):
- error = obj["error"]
- print(f'{error["class"]}: {error["desc"]}')
- else:
- print(json.dumps(obj))
-
- self.socket.close()
- self.socket = None
+ self.qmp_monitor.close()
+ self.connected = False
def _connect(self):
"""Connect to a QMP TCP/IP port, if not connected yet"""
- if self.socket:
+ if self.connected:
return True
- self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
- self.socket.connect((self.host, self.port))
- except ConnectionRefusedError:
+ ret = self.qmp_monitor.connect(negotiate=True)
+ except ConnectionError:
sys.exit(f"Can't connect to QMP host {self.host}:{self.port}")
-
- data = self.socket.recv(1024)
- try:
- obj = json.loads(data.decode("utf-8"))
- except json.JSONDecodeError as e:
- print(f"Invalid QMP answer: {e}")
- self._close()
return False
- if "QMP" not in obj:
- print(f"Invalid QMP answer: {data.decode('utf-8')}")
- self._close()
- return False
-
- result = self.send_cmd('{ "execute": "qmp_capabilities" }')
- if not result:
- self._close()
- return False
+ self.connected = True
return True
def __init__(self, host, port, debug=False):
"""Initialize variables used by the QMP send logic"""
- self.socket = None
+ self.connected = False
self.host = host
self.port = port
self.debug = debug
- def __del__(self):
- self._close()
+ self.qmp_monitor = QEMUMonitorProtocol(address=(self.host, self.port))
#
# Socket QMP send command
@@ -142,9 +115,6 @@ def send_cper(self, guid, data):
}
}
- command = '{ "execute": "ghes-cper", '
- command += '"arguments": ' + json.dumps(cmd_arg) + " }"
-
if self.debug:
print(f"GUID: {guid}")
print("CPER:")
@@ -168,7 +138,7 @@ def send_cper(self, guid, data):
self._connect()
- if self.send_cmd(command):
+ if self.send_cmd("ghes-cper", cmd_arg):
print("Error injected.")
def search_qom(self, path, prop, regex):
@@ -185,14 +155,11 @@ def search_qom(self, path, prop, regex):
i = 0
while 1:
dev = f"{path}[{i}]"
- cmd = {
- "execute": "qom-get",
- "arguments": {
- 'path': dev,
- 'property': prop
- }
+ args = {
+ 'path': dev,
+ 'property': prop
}
- ret = self.send_cmd(cmd, may_open=True, return_error=False)
+ ret = self.send_cmd("qom-get", args, may_open=True, return_error=False)
if not ret:
break
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-08 23:33 ` John Snow
@ 2024-08-09 8:24 ` Mauro Carvalho Chehab
2024-08-09 19:26 ` John Snow
0 siblings, 1 reply; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-09 8:24 UTC (permalink / raw)
To: John Snow
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
Em Thu, 8 Aug 2024 19:33:32 -0400
John Snow <jsnow@redhat.com> escreveu:
> > > Then here you'd use qmp.cmd (raises exception on QMPError) or qmp.cmd_raw
> > > or qmp.cmd_obj (returns the QMP response as the return value even if it
> > > was an error.)
> >
> > Good to know, I'll try and see what fits best.
> >
>
> I might *suggest* you try to use the exception-raising interface and catch
> exceptions to interrogate expected errors as it aligns better with the
> "idiomatic python API" - I have no plans to support an external API that
> *returns* error objects except via the exception class. This approach will
> be easier to port when I drop the legacy interface in the future, see below.
>
> But, that said, whichever is easiest. We use all three interfaces in many
> places in the QEMU tree. I have no grounds to require you to use a specific
> one ;)
While Python-style exception handling is cool, I ended up opting to use
cmd_obj(), as the script needs to catch the end of the
/machine/unattached/device[] array, and using cmd_obj() made the conversion
easier.
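A minimal sketch of that end-of-array detection, assuming an interface that
hands back the reply dict even on errors. Nothing here talks to a real QEMU:
collect_matches() and fake_send() are hypothetical names, and the three
device type strings are made-up placeholders.

```python
import re


def collect_matches(send, path, prop, regex):
    """Probe path[0], path[1], ... until the reply is not a success."""
    found = []
    i = 0
    while True:
        dev = f"{path}[{i}]"
        reply = send({"execute": "qom-get",
                      "arguments": {"path": dev, "property": prop}})
        if "return" not in reply:
            break  # error reply: we walked past the end of the array
        if re.search(regex, str(reply["return"])):
            found.append(dev)
        i += 1
    return found


def fake_send(msg):
    """Stand-in for a QMP connection: three fake devices, then an error."""
    types = ["fake-cpu", "fake-uart", "fake-cpu"]
    path = msg["arguments"]["path"]
    index = int(re.search(r"\[(\d+)\]$", path).group(1))
    if index >= len(types):
        return {"error": {"class": "DeviceNotFound",
                          "desc": f"Device '{path}' not found"}}
    return {"return": types[index]}


print(collect_matches(fake_send, "/machine/unattached/device", "type", "cpu"))
```

The same loop could be written with try/except around an exception-raising
call; the dict-returning form simply keeps the loop exit in one place.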
One of the things I missed in the documentation is a description of the
possible exceptions that cmd() could raise.
It is probably worth documenting them and placing them in a QMP-specific
error class, but a change like that would probably be incompatible with
the existing applications. Probably something to be considered on your
TODO list to move this from legacy ;-)
Anyway, I already folded the changes into the branch I'll be using as the basis
for the next submission (be careful when using it, as I'm always rebasing it):
https://gitlab.com/mchehab_kernel/qemu/-/commit/62feb8f6037ab762a9848eb601a041fbbbe2a77a#b665bcbc1e5ae3a488f1c0f20f8c29ae640bfa63_0_17
Thanks,
Mauro
* Re: [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection
2024-08-08 14:45 ` Markus Armbruster
@ 2024-08-09 8:42 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-09 8:42 UTC (permalink / raw)
To: Markus Armbruster
Cc: Igor Mammedov, Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
Ani Sinha, Dongjiu Geng, Eric Blake, Michael Roth, Paolo Bonzini,
Peter Maydell, linux-kernel, qemu-arm, qemu-devel
Em Thu, 08 Aug 2024 16:45:51 +0200
Markus Armbruster <armbru@redhat.com> escreveu:
> Igor Mammedov <imammedo@redhat.com> writes:
>
> > On Thu, 8 Aug 2024 16:11:41 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> >> Em Thu, 08 Aug 2024 10:50:33 +0200
> >> Markus Armbruster <armbru@redhat.com> escreveu:
> >>
> >> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> >>
> >> > > diff --git a/MAINTAINERS b/MAINTAINERS
> >> > > index 98eddf7ae155..655edcb6688c 100644
> >> > > --- a/MAINTAINERS
> >> > > +++ b/MAINTAINERS
> >> > > @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> >> > > F: include/hw/acpi/ghes.h
> >> > > F: docs/specs/acpi_hest_ghes.rst
> >> > >
> >> > > +ACPI/HEST/GHES/ARM processor CPER
> >> > > +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >> > > +S: Maintained
> >> > > +F: hw/arm/ghes_cper.c
> >> > > +F: hw/acpi/ghes_cper_stub.c
> >> > > +F: qapi/ghes-cper.json
> >> > > +
> >> >
> >> > Here's the reason for creating a new QAPI module instead of adding to
> >> > existing module acpi.json: different maintainers.
> >> >
> >> > Hypothetical question: if we didn't care for that, would this go into
> >> > qapi/acpi.json?
> >>
> >> Independently of maintainers, GHES is part of ACPI APEI HEST, meaning
> >> to report hardware errors. Such hardware errors are typically handled by
> >> the host OS, so guest doesn't need to be aware of that[1].
> >>
> >> So, IMO the best would be to keep APEI/HEST/GHES in a separate file.
> >>
> >> [1] still, I can foresee some scenarios where passing some errors to the
> >> guest could make sense.
> >>
> >> >
> >> > If yes, then should we call it acpi-ghes-cper.json or acpi-ghes.json
> >> > instead?
> >>
> >> Naming it as acpi-ghes,acpi-hest or acpi-ghes-cper would equally work
> >> from my side.
> >
> > if we going to keep it generic, acpi-hest would do
>
> Works for me.
OK, I'll do the rename. With regard to the files implementing
support for it:
hw/acpi/ghes_cper.c
hw/acpi/ghes_cper_stub.c
I guess there's no need to rename them, right? IMO such names
are better than acpi/hest.c, especially since the actual implementation
for HEST is inside acpi/ghes.c.
>
> >> > > ppc4xx
> >> > > L: qemu-ppc@nongnu.org
> >> > > S: Orphan
> >> >
> >> > [...]
> >> >
> >> > > diff --git a/qapi/ghes-cper.json b/qapi/ghes-cper.json
> >> > > new file mode 100644
> >> > > index 000000000000..3cc4f9f2aaa9
> >> > > --- /dev/null
> >> > > +++ b/qapi/ghes-cper.json
> >> > > @@ -0,0 +1,55 @@
> >> > > +# -*- Mode: Python -*-
> >> > > +# vim: filetype=python
> >> > > +
> >> > > +##
> >> > > +# = GHESv2 CPER Error Injection
> >> > > +#
> >> > > +# These are defined at
> >> > > +# ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> >> > > +# (GHESv2 - Type 10)
> >> > > +##
> >> >
> >> > Feels a bit terse. These what?
> >> >
> >> > The reference could be clearer: "defined in the ACPI Specification 6.2,
> >> > section 18.3.2.8 Generic Hardware Error Source version 2". A link would
> >> > be nice, if it's stable.
> >>
> >> I can add a link, but only newer ACPI versions are hosted in html format
> >> (e. g. only versions 6.4 and 6.5 are available as html at uefi.org).
> >
> > some years earlier it could be said 'stable link' about acpi spec hosted
> > elsewhere. Not the case anymore after umbrella change.
> >
> > spec name, rev, chapter worked fine for acpi code (it's easy to find wherever spec is hosted).
> > Probably the same would work for QAPI, I'm not QAPI maintainer though,
> > so the preferred approach here is absolutely up to you.
>
> A link is strictly optional. Stable links are nice, stale links are
> annoying. Mauro, you decide :)
Well, I guess I'll add a link then, keeping it in text mode as well.
Changing umbrella is something that doesn't happen too often. Hopefully
those will stay for a long time, if not forever, under uefi.org.
If not, we can always drop the link.
Thanks,
Mauro
* Re: [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject
2024-08-09 8:24 ` Mauro Carvalho Chehab
@ 2024-08-09 19:26 ` John Snow
0 siblings, 0 replies; 54+ messages in thread
From: John Snow @ 2024-08-09 19:26 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Cleber Rosa, linux-kernel,
qemu-devel
On Fri, Aug 9, 2024, 4:24 AM Mauro Carvalho Chehab <
mchehab+huawei@kernel.org> wrote:
> Em Thu, 8 Aug 2024 19:33:32 -0400
> John Snow <jsnow@redhat.com> escreveu:
>
> > > > Then here you'd use qmp.cmd (raises exception on QMPError) or qmp.cmd_raw
> > > > or qmp.cmd_obj (returns the QMP response as the return value even if it
> > > > was an error.)
> > >
> > > Good to know, I'll try and see what fits best.
> > >
> >
> > I might *suggest* you try to use the exception-raising interface and catch
> > exceptions to interrogate expected errors as it aligns better with the
> > "idiomatic python API" - I have no plans to support an external API that
> > *returns* error objects except via the exception class. This approach will
> > be easier to port when I drop the legacy interface in the future, see below.
> >
> > But, that said, whichever is easiest. We use all three interfaces in many
> > places in the QEMU tree. I have no grounds to require you to use a specific
> > one ;)
>
> While a python-style exception handling is cool, I ended opting to use
> cmd_obj(), as the script needs to catch the end of /machine/unattached/device[]
> array, and using cmd_obj() made the conversion easier.
>
> One of the things I missed at the documentation is a description of the
> possible exceptions that cmd() could raise.
>
> It is probably worth documenting it and placing them on a QMP-specific
> error class, but a change like that would probably be incompatible with
> the existing applications. Probably something to be considered on your
> TODO list to move this from legacy ;-)
>
Good feedback, thanks! I definitely didn't spend much time polishing the
"legacy" interface. I clearly thought it'd be more temporary than it became
;)
I owe the package some updates for 3.13, I'll improve the documentation and
also consider adding some "you forgot to make the address a tuple"
protection so that part is less of a trap. (Without the tuple, I think it
likely used the address as a socket path and the port as a bool to enter
server mode. mypy would catch this, but it's a design goal to not require
or expect script writers to need such things.)
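A rough sketch of that failure mode and of one possible guard. The functions
below are simplified stand-ins modelled on the guess above, not the real
constructor: bind_args() and bind_args_checked() are hypothetical names, and
port 4445 is an arbitrary example value.

```python
def bind_args(address, server=False):
    """Stand-in showing how two positional arguments end up interpreted."""
    kind = "tcp" if isinstance(address, tuple) else "unix socket path"
    mode = "server" if server else "client"
    return kind, mode


def bind_args_checked(address, server=False):
    """Same, but reject a non-bool 'server', which hints at a missing tuple."""
    if not isinstance(server, bool):
        raise TypeError("'server' must be a bool; "
                        "did you mean address=(host, port)?")
    return bind_args(address, server)


print(bind_args(("localhost", 4445)))  # intended call: TCP client
print(bind_args("localhost", 4445))    # the trap: port is taken as 'server'

try:
    bind_args_checked("localhost", 4445)
except TypeError as exc:
    print(f"caught: {exc}")
```

A runtime check like this would catch the mistake without requiring script
writers to run a type checker.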
Thank you! :)
> Anyway, I already folded the changes at the branch I'll be using as basis
> for the next submission (be careful to use it, as I'm always rebasing it):
>
Great, I'll review the entire script more thoroughly on v2, if that's OK
with you.
Just got back from a long PTO and an illness and I'm still ramping back up
and handling backlog.
>
> https://gitlab.com/mchehab_kernel/qemu/-/commit/62feb8f6037ab762a9848eb601a041fbbbe2a77a#b665bcbc1e5ae3a488f1c0f20f8c29ae640bfa63_0_17
>
>
> Thanks,
> Mauro
>
~~js
>
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-08 18:19 ` Mauro Carvalho Chehab
@ 2024-08-12 9:39 ` Igor Mammedov
2024-08-13 18:59 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 54+ messages in thread
From: Igor Mammedov @ 2024-08-12 9:39 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Thu, 8 Aug 2024 20:19:03 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Thu, 8 Aug 2024 10:11:07 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > On Wed, 7 Aug 2024 15:25:47 +0100
> > Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:
> >
> > > On Tue, 6 Aug 2024 16:31:13 +0200
> > > Igor Mammedov <imammedo@redhat.com> wrote:
> > >
> > > > On Fri, 2 Aug 2024 23:44:01 +0200
> > > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > >
> > > > > Provide a generic interface for error injection via GHESv2.
> > > > >
> > > > > This patch is co-authored:
> > > > > - original ghes logic to inject a simple ARM record by Shiju Jose;
> > > > > - generic logic to handle block addresses by Jonathan Cameron;
> > > > > - generic GHESv2 error inject by Mauro Carvalho Chehab;
> > > > >
> > > > > Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
> > > > > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > > > Cc: Shiju Jose <shiju.jose@huawei.com>
> > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > > > > ---
> > > > > hw/acpi/ghes.c | 159 ++++++++++++++++++++++++++++++++++++++---
> > > > > hw/acpi/ghes_cper.c | 2 +-
> > > > > include/hw/acpi/ghes.h | 3 +
> > > > > 3 files changed, 152 insertions(+), 12 deletions(-)
> > > > >
> > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > > index a745dcc7be5e..e125c9475773 100644
> > > > > --- a/hw/acpi/ghes.c
> > > > > +++ b/hw/acpi/ghes.c
> > > > > @@ -395,23 +395,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > > > > ags->present = true;
> > > > > }
> > > > >
> > > > > +static uint64_t ghes_get_state_start_address(void)
> > > >
> > > > ghes_get_hardware_errors_address() might better reflect what address it will return
> > > >
> > > > > +{
> > > > > + AcpiGedState *acpi_ged_state =
> > > > > + ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
> > > > > + AcpiGhesState *ags = &acpi_ged_state->ghes_state;
> > > > > +
> > > > > + return le64_to_cpu(ags->ghes_addr_le);
> > > > > +}
> > > > > +
> > > > > int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > > > > {
> > > > > uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > > > > - uint64_t start_addr;
> > > > > + uint64_t start_addr = ghes_get_state_start_address();
> > > > > bool ret = -1;
> > > > > - AcpiGedState *acpi_ged_state;
> > > > > - AcpiGhesState *ags;
> > > > > -
> > > > > assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> > > > >
> > > > > - acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > > > > - NULL));
> > > > > - g_assert(acpi_ged_state);
> > > > > - ags = &acpi_ged_state->ghes_state;
> > > > > -
> > > > > - start_addr = le64_to_cpu(ags->ghes_addr_le);
> > > > > -
> > > > > if (physical_address) {
> > > > > start_addr += source_id * sizeof(uint64_t);
> > > >
> > > > above should be a separate patch
> > > >
> > > > >
> > > > > @@ -448,9 +447,147 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > > > > return ret;
> > > > > }
> > > > >
> > > > > +/*
> > > > > + * Error register block data layout
> > > > > + *
> > > > > + * | +---------------------+ ges.ghes_addr_le
> > > > > + * | |error_block_address0 |
> > > > > + * | +---------------------+
> > > > > + * | |error_block_address1 |
> > > > > + * | +---------------------+ --+--
> > > > > + * | | ............. | GHES_ADDRESS_SIZE
> > > > > + * | +---------------------+ --+--
> > > > > + * | |error_block_addressN |
> > > > > + * | +---------------------+
> > > > > + * | | read_ack0 |
> > > > > + * | +---------------------+ --+--
> > > > > + * | | read_ack1 | GHES_ADDRESS_SIZE
> > > > > + * | +---------------------+ --+--
> > > > > + * | | ............. |
> > > > > + * | +---------------------+
> > > > > + * | | read_ackN |
> > > > > + * | +---------------------+ --+--
> > > > > + * | | CPER | |
> > > > > + * | | .... | GHES_MAX_RAW_DATA_LENGT
> > > > > + * | | CPER | |
> > > > > + * | +---------------------+ --+--
> > > > > + * | | .......... |
> > > > > + * | +---------------------+
> > > > > + * | | CPER |
> > > > > + * | | .... |
> > > > > + * | | CPER |
> > > > > + * | +---------------------+
> > > > > + */
> > > >
> > > > no need to duplicate docs/specs/acpi_hest_ghes.rst,
> > > > I'd just refer to it and maybe add a short comment as to why it's mentioned.
> > > >
> > > > > +/* Map from uint32_t notify to entry offset in GHES */
> > > > > +static const uint8_t error_source_to_index[] = { 0xff, 0xff, 0xff, 0xff,
> > > > > + 0xff, 0xff, 0xff, 1, 0};
> > > > > +
> > > > > +static bool ghes_get_addr(uint32_t notify, uint64_t *error_block_addr,
> > > > > + uint64_t *read_ack_addr)
> > > > > +{
> > > > > + uint64_t base;
> > > > > +
> > > > > + if (notify >= ACPI_GHES_NOTIFY_RESERVED) {
> > > > > + return false;
> > > > > + }
> > > > > +
> > > > > + /* Find and check the source id for this new CPER */
> > > > > + if (error_source_to_index[notify] == 0xff) {
> > > > > + return false;
> > > > > + }
> > > > > +
> > > > > + base = ghes_get_state_start_address();
> > > > > +
> > > > > + *read_ack_addr = base +
> > > > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > > > + error_source_to_index[notify] * sizeof(uint64_t);
> > > > > +
> > > > > + /* Could also be read back from the error_block_address register */
> > > > > + *error_block_addr = base +
> > > > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > > > > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > > > > +
> > > > > + return true;
> > > > > +}
> > > >
> > > > I don't like all this pointer math, which is basically a reverse engineered
> > > > QEMU actions on startup + guest provided etc/hardware_errors address.
> > > >
> > > > For once, it assumes error_source_to_index[] matches order in which HEST
> > > > error sources were described, which is fragile.
> > > >
> > > > 2nd: migration-wise it's a disaster, since old/new HEST/hardware_errors tables
> > > > in RAM migrated from older version might not match above assumptions
> > > > of target QEMU.
> > > >
> > > > I see 2 ways to rectify it:
> > > > 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
> > > > in guest RAM, like we do with etc/hardware_errors, see
> > > > build_ghes_error_table()
> > > > ...
> > > > tell firmware to write hardware_errors GPA into
> > > > and then fetch from HEST table in RAM, the guest patched error/ack addresses
> > > > for given source_id
> > > >
> > > > code-wise: relatively simple once one wraps their own head over
> > > > how this whole APEI thing works in QEMU
> > > > workflow is described in docs/specs/acpi_hest_ghes.rst
> > > > look to me as sufficient to grasp it.
> > > > (but my view is very biased given my prior knowledge,
> > > > aka: docs/comments/examples wrt acpi patching are good enough)
> > > > (if it's not clear how to do it, ask me for pointers)
> > >
> > > Hi Igor, I think I follow what you mean but maybe this question will reveal
> > > otherwise. HEST is currently in ACPI_BUILD_TABLE_FILE.
> > > Would you suggest splitting it to its own file, or using table_offsets
> > > to get the offset in ACPI_BUILD_TABLE_FILE GPA?
> > yep, offset taken right before HEST is to be created
> > doc comment for bios_linker_loader_write_pointer() explains how it works
> >
> > we need something like:
> > bios_linker_loader_write_pointer(linker,
> > ACPI_HEST_TABLE_ADDR_FW_CFG_FILE, 0, sizeof(uint64_t),
> > ACPI_BUILD_TABLE_FILE, hest_offset_within_ACPI_BUILD_TABLE_FILE);
> >
> > to register new file see:
> > a08a64627 ACPI: Record the Generic Error Status Block address
> > and to avoid a copy-paste error maybe
> > 136fc6aa2 ACPI: Avoid infinite recursion when dump-vmstat
> > for this needs to be limited to new machine types and keep
> > old ones without this new feature. (I'd use hw_compat_ machinery for that)
>
> Not sure if I got it. The code, after this patch from my v6:
>
> https://lore.kernel.org/qemu-devel/5710c364d7ef6cdab6b2f1e127ef191bdf84e8c2.1723119423.git.mchehab+huawei@kernel.org/T/#u
>
> Already stores two of the three address offsets via
> bios_linker_loader_add_pointer(), e. g. it is similar to the
> code below (I simplified the code to make the example clearer):
>
> <snip>
> /* From hw/arm/virt-acpi-build.c */
> static
> void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> {
> ...
> if (vms->ras) {
> build_ghes_error_table(tables->hardware_errors, tables->linker);
> acpi_add_table(table_offsets, tables_blob);
> /* internally, call build_ghes_v2() for SEA and GED notification sources */
> acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
> vms->oem_table_id);
> }
> ...
> }
>
> /* From hw/acpi/ghes.c */
> static void build_ghes_v2(GArray *table_data,
> enum AcpiGhesNotifyType notify,
> BIOSLinker *linker)
> {
> uint64_t address_offset, ack_offset, block_addr_offset, cper_offset;
> enum AcpiHestSourceId source_id;
>
> /*
> * Get offsets for either SEA or GED notification - easy to extend
> * to all mechanisms like MCE and SCI to better support x86
> */
> assert(!acpi_hest_address_offset(notify, &block_addr_offset, &ack_offset,
> &cper_offset, &source_id));
>
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> address_offset + GAS_ADDR_OFFSET,
> sizeof(uint64_t),
> ACPI_GHES_ERRORS_FW_CFG_FILE,
> block_addr_offset);
>
> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> address_offset + GAS_ADDR_OFFSET,
> sizeof(uint64_t),
> ACPI_GHES_ERRORS_FW_CFG_FILE,
> ack_offset);
>
> /* Current code ignores &cper_offset when creating HEST */
> }
>
> void ghes_record_cper_errors(AcpiGhesCper *cper, Error **errp,
> enum AcpiGhesNotifyType notify)
> {
> uint64_t cper_addr, read_ack_start_addr;
>
> assert(!ghes_get_hardware_errors_address(notify, NULL, &read_ack_start_addr,
> &cper_addr, NULL));
>
> /*
> * Use cpu_physical_memory_read/write() to
> * - read/store at read_ack_start_addr
> * - Write cper block GArray at cper_addr
> */
> }
> </snip>
>
> We may also store cper_offset there via bios_linker_loader_add_pointer()
> and/or use bios_linker_loader_write_pointer(), but I can't see how the
> data stored there can be retrieved, nor any advantage of using it instead
> of the current code, as, in the end, we'll have 3 addresses that will be
> used:
>
> - an address where a pointer to CPER record will be stored;
> - an address where the ack will be stored;
> - an address where the actual CPER record will be stored.
>
> And those are calculated on a single function and are all stored at the
> ACPI table files.
>
> What am I missing?
That's basically approach (2), and it works to some degree.
Unfortunately, it's fragile once we start talking about migration
and changing the layout in the future.

Let's take as an example increasing the size of the
'Generic Error Status Block' [1], which we are considering. Old QEMU will
tell firmware to allocate a 1K buffer for it, and the calculated offsets
to [1] (that you've stored/calculated) will include this assumption.
Then, in a newer QEMU, we increase the size of [1], and all hardcoded offsets
will account for the new size. But if we migrate a guest from old QEMU to this
newer one, the layout of all HEST tables within the guest will still match
old QEMU's assumptions. As a result, newer QEMU with the larger block size
will write CPERs at the wrong address, since we are still running a guest
that was started on old QEMU.
That's just one example.

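The layout mismatch described above can be sketched in a few lines of C. This is only an illustration that mirrors the arithmetic of the ghes_get_addr() hunk quoted earlier in the thread; the struct, the function name and the 4 KiB "new" block size are assumptions made up for the example, not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical description of the etc/hardware_errors layout as one
 * QEMU version would hardcode it. Field names are illustrative only.
 */
struct ghes_layout {
    uint64_t source_count;  /* number of GHES error sources */
    uint64_t block_size;    /* bytes reserved per Generic Error Status Block */
};

/*
 * Address of the status block for source 'index', computed like the
 * quoted ghes_get_addr(): skip the error_block_address array, then the
 * read_ack array, then 'index' earlier status blocks.
 */
static uint64_t cper_block_addr(const struct ghes_layout *l,
                                uint64_t base, uint64_t index)
{
    assert(index < l->source_count);

    return base
        + l->source_count * sizeof(uint64_t)  /* error_block_address[] */
        + l->source_count * sizeof(uint64_t)  /* read_ack_register[] */
        + index * l->block_size;              /* earlier status blocks */
}
```

With two sources, the first block lands at the same address under a 1 KiB and a 4 KiB block size, but the second block moves by 3072 bytes. A destination QEMU that uses the 4 KiB arithmetic on a guest whose HEST was patched for 1 KiB blocks would therefore write the second source's CPER where the guest is not looking.
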
There are a number of ways to make it work, but the ultimate goal is to pick
the one that's the least fragile and won't snowball into a maintenance
nightmare as the number of GHES sources increases over time.

This series tries to solve the problem of mapping a GHES source to
the corresponding 'Generic Error Status Block' and related registers.
However, we are missing access to this mapping, since it only
exists in the guest-patched HEST (i.e. in guest RAM only).

The robust way to make it work would be for QEMU to get a pointer
to the whole HEST table and then enumerate GHES sources and the related
error/ack registers directly from guest RAM (sidestepping layout
change issues this way).

What I'm proposing is to use bios_linker_loader_write_pointer()
(only once) so that firmware can tell QEMU the address of the HEST table,
in which one can find a GHES source and always-correct error/ack
pointers (regardless of table[s] layout changes).
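
A minimal sketch of the lookup this proposal implies, assuming the HEST contains only GHESv2 entries and that its bytes have already been copied out of guest RAM into a host buffer. The field offsets follow the ACPI GHESv2 structure layout (92-byte entries, 12-byte Generic Address Structures with the address at byte 4); the macro and function names are hypothetical and this is not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets per the ACPI HEST/GHESv2 layout; names are illustrative. */
#define HEST_SOURCE_COUNT_OFF 36        /* follows the 36-byte ACPI header */
#define HEST_HEADER_SIZE      40
#define GHES_V2_TYPE          10
#define GHES_V2_ENTRY_SIZE    92
#define GHES_SOURCE_ID_OFF    2
#define GHES_ERR_ADDR_OFF     (20 + 4)  /* Error Status Address GAS, address field */
#define GHES_ACK_ADDR_OFF     (64 + 4)  /* Read Ack Register GAS, address field */

/* Read a little-endian integer of 'len' bytes (len <= 8). */
static uint64_t read_le(const uint8_t *p, size_t len)
{
    uint64_t v = 0;

    for (size_t i = 0; i < len; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/*
 * Find 'source_id' in a HEST image of 'len' bytes and return the
 * guest-patched Error Status Address and Read Ack Register addresses.
 * The walk stops at the first non-GHESv2 entry, because this sketch
 * does not know the size of other source types.
 */
static bool hest_find_ghes(const uint8_t *hest, size_t len, uint16_t source_id,
                           uint64_t *err_addr, uint64_t *ack_addr)
{
    assert(hest && err_addr && ack_addr);

    if (len < HEST_HEADER_SIZE) {
        return false;
    }

    uint32_t count = (uint32_t)read_le(hest + HEST_SOURCE_COUNT_OFF, 4);
    size_t off = HEST_HEADER_SIZE;

    for (uint32_t i = 0; i < count; i++, off += GHES_V2_ENTRY_SIZE) {
        if (len - off < GHES_V2_ENTRY_SIZE) {
            return false;               /* truncated table */
        }

        const uint8_t *entry = hest + off;

        if (read_le(entry, 2) != GHES_V2_TYPE) {
            return false;               /* unknown entry size, give up */
        }
        if (read_le(entry + GHES_SOURCE_ID_OFF, 2) == source_id) {
            *err_addr = read_le(entry + GHES_ERR_ADDR_OFF, 8);
            *ack_addr = read_le(entry + GHES_ACK_ADDR_OFF, 8);
            return true;
        }
    }
    return false;
}
```

Because both addresses come from the table the guest is actually using, the lookup does not depend on how the running QEMU version would lay out the error storage, which is the property that matters for migration.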
> Thanks,
> Mauro
>
* Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
2024-08-12 9:39 ` Igor Mammedov
@ 2024-08-13 18:59 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 54+ messages in thread
From: Mauro Carvalho Chehab @ 2024-08-13 18:59 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Mon, 12 Aug 2024 11:39:00 +0200
Igor Mammedov <imammedo@redhat.com> wrote:
> > We may also store cper_offset there via bios_linker_loader_add_pointer()
> > and/or use bios_linker_loader_write_pointer(), but I can't see how the
> > data stored there can be retrieved, nor any advantage of using it instead
> > of the current code, as, in the end, we'll have 3 addresses that will be
> > used:
> >
> > - an address where a pointer to CPER record will be stored;
> > - an address where the ack will be stored;
> > - an address where the actual CPER record will be stored.
> >
> > And those are calculated on a single function and are all stored at the
> > ACPI table files.
> >
> > What am I missing?
>
> That's basically approach (2), and it works to some degree.
> Unfortunately, it's fragile once we start talking about migration
> and changing the layout in the future.
>
> Let's take as an example increasing the size of the
> 'Generic Error Status Block' [1], which we are considering. Old QEMU will
> tell firmware to allocate a 1K buffer for it, and the calculated offsets
> to [1] (that you've stored/calculated) will include this assumption.
> Then, in a newer QEMU, we increase the size of [1], and all hardcoded offsets
> will account for the new size. But if we migrate a guest from old QEMU to this
> newer one, the layout of all HEST tables within the guest will still match
> old QEMU's assumptions. As a result, newer QEMU with the larger block size
> will write CPERs at the wrong address, since we are still running a guest
> that was started on old QEMU.
> That's just one example.
>
> There are a number of ways to make it work, but the ultimate goal is to pick
> the one that's the least fragile and won't snowball into a maintenance
> nightmare as the number of GHES sources increases over time.
>
> This series tries to solve the problem of mapping a GHES source to
> the corresponding 'Generic Error Status Block' and related registers.
> However, we are missing access to this mapping, since it only
> exists in the guest-patched HEST (i.e. in guest RAM only).
>
> The robust way to make it work would be for QEMU to get a pointer
> to the whole HEST table and then enumerate GHES sources and the related
> error/ack registers directly from guest RAM (sidestepping layout
> change issues this way).
>
> What I'm proposing is to use bios_linker_loader_write_pointer()
> (only once) so that firmware can tell QEMU the address of the HEST table,
> in which one can find a GHES source and always-correct error/ack
> pointers (regardless of table[s] layout changes).

Ok, got it. Such a change was not easy, but I finally figured out how
to make it actually work.

Tomorrow I'll address your comment on patch 5/10 about also using raw data
for the other parts of CPER (generic error status and generic error data).

If you want a sneak peek, I'm keeping the latest development
version here:

https://gitlab.com/mchehab_kernel/qemu/-/commits/qemu_submission?ref_type=heads

In particular, the patch changing from an etc/hardware_errors offset to
a HEST offset is at:

https://gitlab.com/mchehab_kernel/qemu/-/commit/9197d22de09df97ce3d6725cb21bd2114c2eb43c

It contains several cleanups to make the logic clearer and more robust.

Thanks,
Mauro