* [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 9:34 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
` (20 subsequent siblings)
21 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Store HEST table address at GPA, placing its content at
hest_addr_le variable.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Change from v8:
- hest_addr_lr is now pointing to the error source size and data.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 15 +++++++++++++++
include/hw/acpi/ghes.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e9511d9b8f71..529c14e3289f 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -30,6 +30,7 @@
#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
+#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
@@ -367,11 +368,22 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
acpi_table_begin(&table, table_data);
+ int hest_offset = table_data->len;
+
/* Error Source Count */
build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
acpi_table_end(linker, &table);
+
+ /*
+ * tell firmware to write into GPA the address of HEST via fw_cfg,
+ * once initialized.
+ */
+ bios_linker_loader_write_pointer(linker,
+ ACPI_HEST_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_BUILD_TABLE_FILE, hest_offset);
}
void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
@@ -385,6 +397,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
+ fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
+ NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
+
ags->present = true;
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 674f6958e905..28b956acb19a 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -63,6 +63,7 @@ enum {
};
typedef struct AcpiGhesState {
+ uint64_t hest_addr_le;
uint64_t ghes_addr_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address
2024-09-14 6:13 ` [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2024-09-17 9:34 ` Igor Mammedov
0 siblings, 0 replies; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 9:34 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Sat, 14 Sep 2024 08:13:22 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Store HEST table address at GPA, placing its content at
> hest_addr_le variable.
say here why (just short description and pointing to the next patch
that would do that)
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
PS:
I'd move this patch to later in the series,
right before the patch where you are going to use hest_addr_le
>
> ---
>
> Change from v8:
> - hest_addr_lr is now pointing to the error source size and data.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
stray Signed-off-by
> ---
> hw/acpi/ghes.c | 15 +++++++++++++++
> include/hw/acpi/ghes.h | 1 +
> 2 files changed, 16 insertions(+)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index e9511d9b8f71..529c14e3289f 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -30,6 +30,7 @@
>
> #define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
> #define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
> +#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
>
> /* The max size in bytes for one error block */
> #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
> @@ -367,11 +368,22 @@ void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
>
> acpi_table_begin(&table, table_data);
>
> + int hest_offset = table_data->len;
> +
> /* Error Source Count */
> build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
>
> acpi_table_end(linker, &table);
> +
> + /*
> + * tell firmware to write into GPA the address of HEST via fw_cfg,
> + * once initialized.
> + */
> + bios_linker_loader_write_pointer(linker,
> + ACPI_HEST_ADDR_FW_CFG_FILE, 0,
> + sizeof(uint64_t),
> + ACPI_BUILD_TABLE_FILE, hest_offset);
> }
>
> void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> @@ -385,6 +397,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
> NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
>
> + fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
> + NULL, &(ags->hest_addr_le), sizeof(ags->hest_addr_le), false);
> +
> ags->present = true;
> }
>
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 674f6958e905..28b956acb19a 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -63,6 +63,7 @@ enum {
> };
>
> typedef struct AcpiGhesState {
> + uint64_t hest_addr_le;
> uint64_t ghes_addr_le;
> bool present; /* True if GHES is present at all on this board */
> } AcpiGhesState;
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 9:19 ` Igor Mammedov
2024-09-17 12:01 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
` (19 subsequent siblings)
21 siblings, 2 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, linux-kernel, qemu-devel
The GHES migration logic at GED should now support HEST table
location too.
Increase migration version and change needed to check for both
ghes_addr_le and hest_addr_le.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/generic_event_device.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 15b4c3ebbf24..4e5e387ee2df 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -343,10 +343,11 @@ static const VMStateDescription vmstate_ged_state = {
static const VMStateDescription vmstate_ghes = {
.name = "acpi-ghes",
- .version_id = 1,
- .minimum_version_id = 1,
+ .version_id = 2,
+ .minimum_version_id = 2,
.fields = (const VMStateField[]) {
VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
+ VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
VMSTATE_END_OF_LIST()
},
};
@@ -354,13 +355,13 @@ static const VMStateDescription vmstate_ghes = {
static bool ghes_needed(void *opaque)
{
AcpiGedState *s = opaque;
- return s->ghes_state.ghes_addr_le;
+ return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
}
static const VMStateDescription vmstate_ghes_state = {
.name = "acpi-ged/ghes",
- .version_id = 1,
- .minimum_version_id = 1,
+ .version_id = 2,
+ .minimum_version_id = 2,
.needed = ghes_needed,
.fields = (const VMStateField[]) {
VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr
2024-09-14 6:13 ` [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
@ 2024-09-17 9:19 ` Igor Mammedov
2024-09-17 15:22 ` Peter Xu
2024-09-17 12:01 ` Igor Mammedov
1 sibling, 1 reply; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 9:19 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
linux-kernel, qemu-devel, Peter Xu, Fabiano Rosas
On Sat, 14 Sep 2024 08:13:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The GHES migration logic at GED should now support HEST table
> location too.
>
> Increase migration version and change needed to check for both
> ghes_addr_le and hest_addr_le
But I don't think it will work like this (but I might be easily wrong)
However I don't know enough to properly review this patch, CCing Peter & Fabiano
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/generic_event_device.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 15b4c3ebbf24..4e5e387ee2df 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -343,10 +343,11 @@ static const VMStateDescription vmstate_ged_state = {
>
> static const VMStateDescription vmstate_ghes = {
> .name = "acpi-ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .fields = (const VMStateField[]) {
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> + VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> },
> };
> @@ -354,13 +355,13 @@ static const VMStateDescription vmstate_ghes = {
> static bool ghes_needed(void *opaque)
> {
> AcpiGedState *s = opaque;
> - return s->ghes_state.ghes_addr_le;
> + return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
> }
what I would do:
add ghes_needed_v2(): return s->ghes_state.hest_addr_le;
and then instead of reusing vmstate_ghes_state would add new
vmstate_ghes_v2_state subsection that migrates only
VMSTATE_UINT64(hest_addr_le, AcpiGhesState)
field.
btw: we probably don't need ghes_addr_le for new code that
uses HEST to lookup relevant error status block.
but we should still keep it for 9.1 and older machine types
as they expect/use it. Separate subsections would work with
this req just fine.
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .needed = ghes_needed,
> .fields = (const VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr
2024-09-17 9:19 ` Igor Mammedov
@ 2024-09-17 15:22 ` Peter Xu
2024-09-25 8:04 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 40+ messages in thread
From: Peter Xu @ 2024-09-17 15:22 UTC (permalink / raw)
To: Igor Mammedov
Cc: Mauro Carvalho Chehab, Jonathan Cameron, Shiju Jose,
Michael S. Tsirkin, Ani Sinha, linux-kernel, qemu-devel,
Fabiano Rosas
On Tue, Sep 17, 2024 at 11:19:21AM +0200, Igor Mammedov wrote:
> On Sat, 14 Sep 2024 08:13:23 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > The GHES migration logic at GED should now support HEST table
> > location too.
> >
> > Increase migration version and change needed to check for both
> > ghes_addr_le and hest_addr_le
> But I don't think it will work like this (but I might be easily wrong)
> However I don't know enough to properly review this patch, CCing Peter & Fabiano
>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> > hw/acpi/generic_event_device.c | 11 ++++++-----
> > 1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> > index 15b4c3ebbf24..4e5e387ee2df 100644
> > --- a/hw/acpi/generic_event_device.c
> > +++ b/hw/acpi/generic_event_device.c
> > @@ -343,10 +343,11 @@ static const VMStateDescription vmstate_ged_state = {
> >
> > static const VMStateDescription vmstate_ghes = {
> > .name = "acpi-ghes",
> > - .version_id = 1,
> > - .minimum_version_id = 1,
> > + .version_id = 2,
> > + .minimum_version_id = 2,
> > .fields = (const VMStateField[]) {
> > VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> > + VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> > VMSTATE_END_OF_LIST()
> > },
> > };
> > @@ -354,13 +355,13 @@ static const VMStateDescription vmstate_ghes = {
> > static bool ghes_needed(void *opaque)
> > {
> > AcpiGedState *s = opaque;
> > - return s->ghes_state.ghes_addr_le;
> > + return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
> > }
>
> what I would do:
> add ghes_needed_v2(): return s->ghes_state.hest_addr_le;
>
> and then instead of reusing vmstate_ghes_state would add new
> vmstate_ghes_v2_state subsection that migrates only
> VMSTATE_UINT64(hest_addr_le, AcpiGhesState)
> field.
>
> btw: we probably don't need ghes_addr_le for new code that
> uses HEST to lookup relevant error status block.
> but we should still keep it for 9.1 and older machine types
> as they expect/use it. Separate subsections would work with
> this req just fine.
Right, if we need bi-directional migration we need above and a compat
property (or VMSTATE_UINT64_TEST() would work too, iiuc).
OTOH VMSD versioning only works for forward migration, not backward.
>
> > static const VMStateDescription vmstate_ghes_state = {
> > .name = "acpi-ged/ghes",
> > - .version_id = 1,
> > - .minimum_version_id = 1,
> > + .version_id = 2,
> > + .minimum_version_id = 2,
(and IIUC if we set min ver=2, even forward migration should fail.. better
test it with an old binary, migrating back and forth)
> > .needed = ghes_needed,
> > .fields = (const VMStateField[]) {
> > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
>
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr
2024-09-17 15:22 ` Peter Xu
@ 2024-09-25 8:04 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 8:04 UTC (permalink / raw)
To: Peter Xu
Cc: Igor Mammedov, Jonathan Cameron, Shiju Jose, Michael S. Tsirkin,
Ani Sinha, linux-kernel, qemu-devel, Fabiano Rosas
Em Tue, 17 Sep 2024 11:22:31 -0400
Peter Xu <peterx@redhat.com> escreveu:
> On Tue, Sep 17, 2024 at 11:19:21AM +0200, Igor Mammedov wrote:
> > On Sat, 14 Sep 2024 08:13:23 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > what I would do:
> > add ghes_needed_v2(): return s->ghes_state.hest_addr_le;
> >
> > and then instead of reusing vmstate_ghes_state would add new
> > vmstate_ghes_v2_state subsection that migrates only
> > VMSTATE_UINT64(hest_addr_le, AcpiGhesState)
> > field.
> >
> > btw: we probably don't need ghes_addr_le for new code that
> > uses HEST to lookup relevant error status block.
> > but we should still keep it for 9.1 and older machine types
> > as they expect/use it. Separate subsections would work with
> > this req just fine.
Ok, so, if I understood it right, the enclosed patch should do the
job, right?
> Right, if we need bi-directional migration we need above and a compat
> property (or VMSTATE_UINT64_TEST() would work too, iiuc).
>
> OTOH VMSD versioning only works for forward migration, not backward.
I don't think bi-directional migration would work. See, with
old version, we have:
- Just one Error source block structure, only for ARMv8 using synchronous
notification (SEA).
- The offsets to access the error block structure and the memory
position where the OSPM will acknowledge the error assumes a single
error source;
- such offsets come from ghes_addr_le bios pointer (we will rename it to
hw_addr_le at the cleanup patch series).
With the new versions, we'll have:
- at least two notification sources on ARMv8 (SEA and GPIO). We may
end adding more with time;
- other architectures may also have support for bios-first hardware
errors;
- the number of error block structures is now read from HEST table
(so it needs that hest_addr_le will be stored at AcpiGedState;
- the offsets to retrieve the addresses are now relative to the offset
at hest_addr_le.
The new error injection code, which uses either hest_addr_le or
ghes_addr_le (now hw_addr_le) should be able to properly generate
errors from a VM created on 9.1, as it will check if hest_addr_le
is not zero. If it is zero, it will call a backward-compatible
code:
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
if (!acpi_ged_state) {
error_setg(errp, "Can't find ACPI_GED object");
return;
}
ags = &acpi_ged_state->ghes_state;
if (!ags->hest_addr_le) {
/* Assumes just a single source_id */
get_ghes_offsets(le64_to_cpu(ags->hw_error_le),
&cper_addr, &read_ack_register_addr);
} else {
/* do a for at the HEST table seeking for an specific source_id */
get_hest_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
&cper_addr, &read_ack_register_addr, errp);
}
Now, a VM created with 9.2 will have multiple sources. The location of the
read_ack_register_addr depends on the number of sources, which can't be
retrieved without a VIOS pointer to the location of the HEST table
(e. g. ags->hest_addr_le).
So, a 9.1 QEMU with a VM created on 9.2 won't be doing the right thing
with regards to the vaule of the ack offset, thus with RAS errors not
working. I can't see any way to make it work.
> >
> > > static const VMStateDescription vmstate_ghes_state = {
> > > .name = "acpi-ged/ghes",
> > > - .version_id = 1,
> > > - .minimum_version_id = 1,
> > > + .version_id = 2,
> > > + .minimum_version_id = 2,
>
> (and IIUC if we set min ver=2, even forward migration should fail.. better
> test it with an old binary, migrating back and forth)
>
> > > .needed = ghes_needed,
> > > .fields = (const VMStateField[]) {
> > > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> >
>
> Thanks,
>
Thanks,
Mauro
---
[PATCH] acpi/generic_event_device: Update GHES migration to cover hest addr
The GHES migration logic at GED should now support HEST table
location too.
Increase migration version and change needed to check for both
ghes_addr_le and hest_addr_le.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index c5acfb204e5f..bd996d01390c 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -377,6 +377,34 @@ static const VMStateDescription vmstate_ghes_state = {
}
};
+static const VMStateDescription vmstate_hest = {
+ .name = "acpi-ghes",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (const VMStateField[]) {
+ VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
+ VMSTATE_END_OF_LIST()
+ },
+};
+
+static bool hest_needed(void *opaque)
+{
+ AcpiGedState *s = opaque;
+ return s->ghes_state.hest_addr_le;
+}
+
+static const VMStateDescription vmstate_hest_state = {
+ .name = "acpi-ged/ghes",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = hest_needed,
+ .fields = (const VMStateField[]) {
+ VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
+ vmstate_hest, AcpiGhesState),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
static const VMStateDescription vmstate_acpi_ged = {
.name = "acpi-ged",
.version_id = 1,
@@ -388,6 +416,7 @@ static const VMStateDescription vmstate_acpi_ged = {
.subsections = (const VMStateDescription * const []) {
&vmstate_memhp_state,
&vmstate_ghes_state,
+ &vmstate_hest_state,
NULL
}
};
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr
2024-09-14 6:13 ` [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
2024-09-17 9:19 ` Igor Mammedov
@ 2024-09-17 12:01 ` Igor Mammedov
2024-09-25 7:32 ` Mauro Carvalho Chehab
1 sibling, 1 reply; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 12:01 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
linux-kernel, qemu-devel
On Sat, 14 Sep 2024 08:13:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The GHES migration logic at GED should now support HEST table
> location too.
>
> Increase migration version and change needed to check for both
> ghes_addr_le and hest_addr_le.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/generic_event_device.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
> index 15b4c3ebbf24..4e5e387ee2df 100644
> --- a/hw/acpi/generic_event_device.c
> +++ b/hw/acpi/generic_event_device.c
> @@ -343,10 +343,11 @@ static const VMStateDescription vmstate_ged_state = {
>
> static const VMStateDescription vmstate_ghes = {
> .name = "acpi-ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .fields = (const VMStateField[]) {
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> + VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> },
> };
> @@ -354,13 +355,13 @@ static const VMStateDescription vmstate_ghes = {
> static bool ghes_needed(void *opaque)
> {
> AcpiGedState *s = opaque;
> - return s->ghes_state.ghes_addr_le;
^^^^^^^^^^^^
another thing, perhaps we should rename it to 'hardware_errors_addr'
to make it less confusing
> + return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
> }
>
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> - .version_id = 1,
> - .minimum_version_id = 1,
> + .version_id = 2,
> + .minimum_version_id = 2,
> .needed = ghes_needed,
> .fields = (const VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr
2024-09-17 12:01 ` Igor Mammedov
@ 2024-09-25 7:32 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 7:32 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
linux-kernel, qemu-devel
Em Tue, 17 Sep 2024 14:01:46 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > @@ -354,13 +355,13 @@ static const VMStateDescription vmstate_ghes = {
> > static bool ghes_needed(void *opaque)
> > {
> > AcpiGedState *s = opaque;
> > - return s->ghes_state.ghes_addr_le;
> ^^^^^^^^^^^^
> another thing, perhaps we should rename it to 'hardware_errors_addr'
> to make it less confusing
At the cleanups, I added a rename patch. I opted to a shorter name:
hw_error_le.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 9:22 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
` (18 subsequent siblings)
21 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
This is just duplicating ACPI_GHES_ERROR_SOURCE_COUNT, which
has a better name. So, drop the duplication.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 7 ++-----
include/hw/acpi/ghes.h | 3 ++-
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 529c14e3289f..35f793401d06 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -35,9 +35,6 @@
/* The max size in bytes for one error block */
#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
-/* Now only support ARMv8 SEA notification type error source */
-#define ACPI_GHES_ERROR_SOURCE_COUNT 1
-
/* Generic Hardware Error Source version 2 */
#define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
@@ -411,7 +408,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
- assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
+ assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
@@ -422,7 +419,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
if (physical_address) {
- if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
+ if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
start_addr += source_id * sizeof(uint64_t);
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 28b956acb19a..5421ffcbb7fa 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -59,7 +59,8 @@ enum AcpiGhesNotifyType {
enum {
ACPI_HEST_SRC_ID_SEA = 0,
/* future ids go here */
- ACPI_HEST_SRC_ID_RESERVED,
+
+ ACPI_GHES_ERROR_SOURCE_COUNT
};
typedef struct AcpiGhesState {
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
2024-09-14 6:13 ` [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
@ 2024-09-17 9:22 ` Igor Mammedov
0 siblings, 0 replies; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 9:22 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Sat, 14 Sep 2024 08:13:24 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> This is just duplicating ACPI_GHES_ERROR_SOURCE_COUNT, which
> has a better name. So, drop the duplication.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 7 ++-----
> include/hw/acpi/ghes.h | 3 ++-
> 2 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 529c14e3289f..35f793401d06 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -35,9 +35,6 @@
> /* The max size in bytes for one error block */
> #define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
>
> -/* Now only support ARMv8 SEA notification type error source */
> -#define ACPI_GHES_ERROR_SOURCE_COUNT 1
> -
> /* Generic Hardware Error Source version 2 */
> #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
>
> @@ -411,7 +408,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> AcpiGedState *acpi_ged_state;
> AcpiGhesState *ags;
>
> - assert(source_id < ACPI_HEST_SRC_ID_RESERVED);
> + assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
>
> acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> NULL));
> @@ -422,7 +419,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
>
> if (physical_address) {
>
> - if (source_id < ACPI_HEST_SRC_ID_RESERVED) {
> + if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
> start_addr += source_id * sizeof(uint64_t);
> }
>
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 28b956acb19a..5421ffcbb7fa 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -59,7 +59,8 @@ enum AcpiGhesNotifyType {
> enum {
> ACPI_HEST_SRC_ID_SEA = 0,
> /* future ids go here */
> - ACPI_HEST_SRC_ID_RESERVED,
> +
> + ACPI_GHES_ERROR_SOURCE_COUNT
> };
>
> typedef struct AcpiGhesState {
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (2 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 9:28 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 05/21] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
` (17 subsequent siblings)
21 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
if physical_address is not defined, just return with an error
set.
That reduces the ident of the function and prepares it for
the next changes.
No functional changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 56 ++++++++++++++++++++++++++------------------------
1 file changed, 29 insertions(+), 27 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 35f793401d06..17b7d9e10f3e 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -417,40 +417,42 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
start_addr = le64_to_cpu(ags->ghes_addr_le);
- if (physical_address) {
+ if (!physical_address) {
+ return -1;
+ }
- if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
- start_addr += source_id * sizeof(uint64_t);
- }
+ if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
+ start_addr += source_id * sizeof(uint64_t);
+ }
- cpu_physical_memory_read(start_addr, &error_block_addr,
- sizeof(error_block_addr));
+ cpu_physical_memory_read(start_addr, &error_block_addr,
+ sizeof(error_block_addr));
- error_block_addr = le64_to_cpu(error_block_addr);
+ error_block_addr = le64_to_cpu(error_block_addr);
- read_ack_register_addr = start_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
+ read_ack_register_addr = start_addr +
+ ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
- cpu_physical_memory_read(read_ack_register_addr,
- &read_ack_register, sizeof(read_ack_register));
+ cpu_physical_memory_read(read_ack_register_addr,
+ &read_ack_register, sizeof(read_ack_register));
- /* zero means OSPM does not acknowledge the error */
- if (!read_ack_register) {
- error_report("OSPM does not acknowledge previous error,"
- " so can not record CPER for current error anymore");
- } else if (error_block_addr) {
- read_ack_register = cpu_to_le64(0);
- /*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
- */
- cpu_physical_memory_write(read_ack_register_addr,
- &read_ack_register, sizeof(uint64_t));
+ /* zero means OSPM does not acknowledge the error */
+ if (!read_ack_register) {
+ error_report("OSPM does not acknowledge previous error,"
+ " so can not record CPER for current error anymore");
+ } else if (error_block_addr) {
+ read_ack_register = cpu_to_le64(0);
+ /*
+ * Clear the Read Ack Register, OSPM will write it to 1 when
+ * it acknowledges this error.
+ */
+ cpu_physical_memory_write(read_ack_register_addr,
+ &read_ack_register, sizeof(uint64_t));
- ret = acpi_ghes_record_mem_error(error_block_addr,
- physical_address);
- } else
- error_report("can not find Generic Error Status Block");
+ ret = acpi_ghes_record_mem_error(error_block_addr,
+ physical_address);
+ } else {
+ error_report("can not find Generic Error Status Block");
}
return ret;
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code
2024-09-14 6:13 ` [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
@ 2024-09-17 9:28 ` Igor Mammedov
0 siblings, 0 replies; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 9:28 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Sat, 14 Sep 2024 08:13:25 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> if physical_address is not defined, just return with an error
> set.
no need to mention that, as it's what does now.
>
> That reduces the ident of the function and prepares it for
> the next changes.
>
> No functional changes.
this is enough to describe the change.
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 56 ++++++++++++++++++++++++++------------------------
> 1 file changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 35f793401d06..17b7d9e10f3e 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -417,40 +417,42 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
>
> start_addr = le64_to_cpu(ags->ghes_addr_le);
>
> - if (physical_address) {
> + if (!physical_address) {
> + return -1;
> + }
>
> - if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
> - start_addr += source_id * sizeof(uint64_t);
> - }
> + if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
> + start_addr += source_id * sizeof(uint64_t);
> + }
>
> - cpu_physical_memory_read(start_addr, &error_block_addr,
> - sizeof(error_block_addr));
> + cpu_physical_memory_read(start_addr, &error_block_addr,
> + sizeof(error_block_addr));
>
> - error_block_addr = le64_to_cpu(error_block_addr);
> + error_block_addr = le64_to_cpu(error_block_addr);
>
> - read_ack_register_addr = start_addr +
> - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> + read_ack_register_addr = start_addr +
> + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
>
> - cpu_physical_memory_read(read_ack_register_addr,
> - &read_ack_register, sizeof(read_ack_register));
> + cpu_physical_memory_read(read_ack_register_addr,
> + &read_ack_register, sizeof(read_ack_register));
>
> - /* zero means OSPM does not acknowledge the error */
> - if (!read_ack_register) {
> - error_report("OSPM does not acknowledge previous error,"
> - " so can not record CPER for current error anymore");
> - } else if (error_block_addr) {
> - read_ack_register = cpu_to_le64(0);
> - /*
> - * Clear the Read Ack Register, OSPM will write it to 1 when
> - * it acknowledges this error.
> - */
> - cpu_physical_memory_write(read_ack_register_addr,
> - &read_ack_register, sizeof(uint64_t));
> + /* zero means OSPM does not acknowledge the error */
> + if (!read_ack_register) {
> + error_report("OSPM does not acknowledge previous error,"
> + " so can not record CPER for current error anymore");
> + } else if (error_block_addr) {
> + read_ack_register = cpu_to_le64(0);
> + /*
> + * Clear the Read Ack Register, OSPM will write it to 1 when
> + * it acknowledges this error.
> + */
> + cpu_physical_memory_write(read_ack_register_addr,
> + &read_ack_register, sizeof(uint64_t));
>
> - ret = acpi_ghes_record_mem_error(error_block_addr,
> - physical_address);
> - } else
> - error_report("can not find Generic Error Status Block");
> + ret = acpi_ghes_record_mem_error(error_block_addr,
> + physical_address);
> + } else {
> + error_report("can not find Generic Error Status Block");
> }
>
> return ret;
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 05/21] acpi/ghes: better handle source_id and notification
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (3 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 9:01 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
` (16 subsequent siblings)
21 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
GHES has two fields with somewhat meanings:
- notification type, which is a number defined at the ACPI spec
containing several arch-specific synchronous and assynchronous
types;
- source id, which is a HW/FW defined number, used to distinguish
between different implemented hardware report mechanisms.
The current logic is arm-specific, implementing a single source ID,
for an armv8-specific synchronous report mechanism (SEA).
Cleanup the code to make easier to add other types and make the
code portable to non-ARM.
As a collateral effect of such change, build_ghes_error_table()
function is now an internal function.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 31 +++++++++++++++----------------
hw/arm/virt-acpi-build.c | 5 ++---
include/hw/acpi/ghes.h | 6 +++---
3 files changed, 20 insertions(+), 22 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 17b7d9e10f3e..939e89723a2f 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -234,7 +234,7 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
* Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
* See docs/specs/acpi_hest_ghes.rst for blobs format.
*/
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
{
int i, error_status_block_offset;
@@ -285,9 +285,13 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
-static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
+static void build_ghes_v2(GArray *table_data,
+ BIOSLinker *linker,
+ enum AcpiGhesNotifyType notify,
+ uint16_t source_id)
{
uint64_t address_offset;
+
/*
* Type:
* Generic Hardware Error Source version 2(GHESv2 - Type 10)
@@ -317,18 +321,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
- switch (source_id) {
- case ACPI_HEST_SRC_ID_SEA:
- /*
- * Notification Structure
- * Now only enable ARMv8 SEA notification type
- */
- build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
- break;
- default:
- error_report("Not support this error source");
- abort();
- }
+ /* Notification Structure */
+ build_ghes_hw_error_notification(table_data, notify);
/* Error Status Block Length */
build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
@@ -357,19 +351,24 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
}
/* Build Hardware Error Source Table */
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
const char *oem_id, const char *oem_table_id)
{
AcpiTable table = { .sig = "HEST", .rev = 1,
.oem_id = oem_id, .oem_table_id = oem_table_id };
+ build_ghes_error_table(hardware_errors, linker);
+
acpi_table_begin(&table, table_data);
+ /* Beginning at the HEST Error Source struct count and data */
int hest_offset = table_data->len;
/* Error Source Count */
build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
- build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
+ build_ghes_v2(table_data, linker,
+ ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
acpi_table_end(linker, &table);
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f76fb117adff..bafd9a56c217 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -943,10 +943,9 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
build_dbg2(tables_blob, tables->linker, vms);
if (vms->ras) {
- build_ghes_error_table(tables->hardware_errors, tables->linker);
acpi_add_table(table_offsets, tables_blob);
- acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
- vms->oem_table_id);
+ acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+ vms->oem_id, vms->oem_table_id);
}
if (ms->numa_state->num_nodes > 0) {
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 5421ffcbb7fa..c9bbda4740e2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -69,12 +69,12 @@ typedef struct AcpiGhesState {
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
-void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
-void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
+void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
+ BIOSLinker *linker,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
+int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 05/21] acpi/ghes: better handle source_id and notification
2024-09-14 6:13 ` [PATCH v10 05/21] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
@ 2024-09-17 9:01 ` Igor Mammedov
0 siblings, 0 replies; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 9:01 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, Peter Maydell, Shannon Zhao, linux-kernel, qemu-arm,
qemu-devel
On Sat, 14 Sep 2024 08:13:26 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> GHES has two fields with somewhat meanings:
if 'somewhat' means similar, than it's not.
type is defining a way OSPM is notified about error
while source_id id fw/hw defined arbitrary number that specify
a concrete error reporting source. One can easily have several source ids
with the same notification type.
> - notification type, which is a number defined at the ACPI spec
> containing several arch-specific synchronous and assynchronous
> types;
> - source id, which is a HW/FW defined number, used to distinguish
> between different implemented hardware report mechanisms.
s/mechanisms/sources/
> The current logic is arm-specific, implementing a single source ID,
> for an armv8-specific synchronous report mechanism (SEA).
regardless, above chunk doesn't seem to be relevant to what patch does.
> Cleanup the code to make easier to add other types and make the
> code portable to non-ARM.
>
> As a collateral effect of such change, build_ghes_error_table()
> function is now an internal function.
all of above doesn't explain why it's good idea,
commit message should be self sufficient for anyone without prior
knowledge if series history to understand what & why of the patch.
The patch does 2 things
1. hide build_ghes_error_table()
2. adds notify to build_ghes_v2()
it's the best to split these and accompany them with commit messages
explaining why it's needed.
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> ---
>
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
> That should make easier to add support for 86.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 31 +++++++++++++++----------------
> hw/arm/virt-acpi-build.c | 5 ++---
> include/hw/acpi/ghes.h | 6 +++---
> 3 files changed, 20 insertions(+), 22 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 17b7d9e10f3e..939e89723a2f 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -234,7 +234,7 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> * Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
> * See docs/specs/acpi_hest_ghes.rst for blobs format.
> */
> -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> +static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> {
> int i, error_status_block_offset;
>
> @@ -285,9 +285,13 @@ void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
> }
>
> /* Build Generic Hardware Error Source version 2 (GHESv2) */
> -static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> +static void build_ghes_v2(GArray *table_data,
> + BIOSLinker *linker,
> + enum AcpiGhesNotifyType notify,
> + uint16_t source_id)
> {
> uint64_t address_offset;
> +
> /*
> * Type:
> * Generic Hardware Error Source version 2(GHESv2 - Type 10)
> @@ -317,18 +321,8 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
> ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
>
> - switch (source_id) {
> - case ACPI_HEST_SRC_ID_SEA:
> - /*
> - * Notification Structure
> - * Now only enable ARMv8 SEA notification type
> - */
> - build_ghes_hw_error_notification(table_data, ACPI_GHES_NOTIFY_SEA);
> - break;
> - default:
> - error_report("Not support this error source");
> - abort();
> - }
> + /* Notification Structure */
> + build_ghes_hw_error_notification(table_data, notify);
>
> /* Error Status Block Length */
> build_append_int_noprefix(table_data, ACPI_GHES_MAX_RAW_DATA_LENGTH, 4);
> @@ -357,19 +351,24 @@ static void build_ghes_v2(GArray *table_data, int source_id, BIOSLinker *linker)
> }
>
> /* Build Hardware Error Source Table */
> -void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> +void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> + BIOSLinker *linker,
> const char *oem_id, const char *oem_table_id)
> {
> AcpiTable table = { .sig = "HEST", .rev = 1,
> .oem_id = oem_id, .oem_table_id = oem_table_id };
>
> + build_ghes_error_table(hardware_errors, linker);
> +
> acpi_table_begin(&table, table_data);
>
> + /* Beginning at the HEST Error Source struct count and data */
> int hest_offset = table_data->len;
>
> /* Error Source Count */
> build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
> - build_ghes_v2(table_data, ACPI_HEST_SRC_ID_SEA, linker);
> + build_ghes_v2(table_data, linker,
> + ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
>
> acpi_table_end(linker, &table);
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
> index f76fb117adff..bafd9a56c217 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -943,10 +943,9 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
> build_dbg2(tables_blob, tables->linker, vms);
>
> if (vms->ras) {
> - build_ghes_error_table(tables->hardware_errors, tables->linker);
> acpi_add_table(table_offsets, tables_blob);
> - acpi_build_hest(tables_blob, tables->linker, vms->oem_id,
> - vms->oem_table_id);
> + acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
> + vms->oem_id, vms->oem_table_id);
> }
>
> if (ms->numa_state->num_nodes > 0) {
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 5421ffcbb7fa..c9bbda4740e2 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -69,12 +69,12 @@ typedef struct AcpiGhesState {
> bool present; /* True if GHES is present at all on this board */
> } AcpiGhesState;
>
> -void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker);
> -void acpi_build_hest(GArray *table_data, BIOSLinker *linker,
> +void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> + BIOSLinker *linker,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t notify, uint64_t error_physical_addr);
> +int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
>
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (4 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 05/21] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 10:39 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
` (15 subsequent siblings)
21 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
acpi_ghes_record_errors() has an assert() at the beginning
to ensure that source_id will be lower than
ACPI_GHES_ERROR_SOURCE_COUNT. Remove a duplicated check.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 939e89723a2f..36fe5f68782f 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -420,9 +420,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
return -1;
}
- if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
- start_addr += source_id * sizeof(uint64_t);
- }
+ start_addr += source_id * sizeof(uint64_t);
cpu_physical_memory_read(start_addr, &error_block_addr,
sizeof(error_block_addr));
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check
2024-09-14 6:13 ` [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
@ 2024-09-17 10:39 ` Igor Mammedov
0 siblings, 0 replies; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 10:39 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Sat, 14 Sep 2024 08:13:27 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> acpi_ghes_record_errors() has an assert() at the beginning
> to ensure that source_id will be lower than
> ACPI_GHES_ERROR_SOURCE_COUNT. Remove a duplicated check.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 939e89723a2f..36fe5f68782f 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -420,9 +420,7 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> return -1;
> }
>
> - if (source_id < ACPI_GHES_ERROR_SOURCE_COUNT) {
> - start_addr += source_id * sizeof(uint64_t);
> - }
> + start_addr += source_id * sizeof(uint64_t);
>
> cpu_physical_memory_read(start_addr, &error_block_addr,
> sizeof(error_block_addr));
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (5 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 11:59 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 08/21] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
` (14 subsequent siblings)
21 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
The current logic is based on a lot of duct tape, with
offsets calculated based on one define with the number of
source IDs and an enum.
Rewrite the logic in a way that it would be more resilient
of code changes, by moving the source ID count to an enum
and make the offset calculus more explicit.
Such change was inspired on a patch from Jonathan Cameron
splitting the logic to get the CPER address on a separate
function, as this will be needed to support generic error
injection.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Changes from v9:
- patch split on multiple patches to avoid multiple changes
at the same patch.
Changes from v8:
- Non-rename/cleanup changes merged altogether;
- source ID is now more generic, defined per guest target.
That should make easier to add support for 86.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 95 ++++++++++++++++++++++++++++++------------
include/hw/acpi/ghes.h | 6 ++-
2 files changed, 73 insertions(+), 28 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 36fe5f68782f..6e5f0e8cf0c9 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -61,6 +61,23 @@
*/
#define ACPI_GHES_GESB_SIZE 20
+/*
+ * Offsets with regards to the start of the HEST table stored at
+ * ags->hest_addr_le, according with the memory layout map at
+ * docs/specs/acpi_hest_ghes.rst.
+ */
+
+/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
+ * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
+ */
+#define HEST_GHES_V2_TABLE_SIZE 92
+#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET)
+
+/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
+ * Table 18-380: 'Error Status Address' field
+ */
+#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET)
+
/*
* Values for error_severity field
*/
@@ -401,11 +418,13 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
{
- uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
- uint64_t start_addr;
- bool ret = -1;
+ uint64_t hest_read_ack_start_addr, read_ack_start_addr;
+ uint64_t hest_addr, cper_addr, err_source_struct;
+ uint64_t hest_err_block_addr, error_block_addr;
+ uint32_t i, ret;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
+ uint64_t read_ack;
assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
@@ -414,44 +433,66 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
g_assert(acpi_ged_state);
ags = &acpi_ged_state->ghes_state;
- start_addr = le64_to_cpu(ags->ghes_addr_le);
+ hest_addr = le64_to_cpu(ags->hest_addr_le);
if (!physical_address) {
return -1;
}
- start_addr += source_id * sizeof(uint64_t);
+ err_source_struct = hest_addr + ACPI_GHES_ERROR_SOURCE_COUNT;
- cpu_physical_memory_read(start_addr, &error_block_addr,
- sizeof(error_block_addr));
+ /*
+ * Currently, HEST Error source navigates only for GHESv2 tables
+ */
+ for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ uint64_t addr = err_source_struct;
+ uint16_t type, src_id;
- error_block_addr = le64_to_cpu(error_block_addr);
+ cpu_physical_memory_read(addr, &type, sizeof(type));
- read_ack_register_addr = start_addr +
- ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
+ /* For now, we only know the size of GHESv2 table */
+ assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
- cpu_physical_memory_read(read_ack_register_addr,
- &read_ack_register, sizeof(read_ack_register));
+ /* It is GHES. Compare CPER source address */
+ addr += sizeof(type);
+ cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
- /* zero means OSPM does not acknowledge the error */
- if (!read_ack_register) {
- error_report("OSPM does not acknowledge previous error,"
- " so can not record CPER for current error anymore");
- } else if (error_block_addr) {
- read_ack_register = cpu_to_le64(0);
- /*
- * Clear the Read Ack Register, OSPM will write it to 1 when
- * it acknowledges this error.
- */
- cpu_physical_memory_write(read_ack_register_addr,
- &read_ack_register, sizeof(uint64_t));
+ if (src_id == source_id) {
+ break;
+ }
- ret = acpi_ghes_record_mem_error(error_block_addr,
- physical_address);
- } else {
+ err_source_struct += HEST_GHES_V2_TABLE_SIZE;
+ }
+ if (i == ACPI_GHES_ERROR_SOURCE_COUNT) {
error_report("can not find Generic Error Status Block");
+
+ return -1;
}
+ /* Navigate though table address pointers */
+
+ hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
+ hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
+
+ cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
+ sizeof(error_block_addr));
+
+ cpu_physical_memory_read(error_block_addr, &cper_addr,
+ sizeof(error_block_addr));
+
+ cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
+ sizeof(read_ack_start_addr));
+
+ /* Update ACK offset to notify about a new error */
+
+ cpu_physical_memory_read(read_ack_start_addr,
+ &read_ack, sizeof(read_ack));
+
+ read_ack = cpu_to_le64(0);
+ cpu_physical_memory_write(read_ack_start_addr,
+ &read_ack, sizeof(read_ack));
+
+ ret = acpi_ghes_record_mem_error(error_block_addr, physical_address);
return ret;
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index c9bbda4740e2..7485f54c28f2 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -23,6 +23,7 @@
#define ACPI_GHES_H
#include "hw/acpi/bios-linker-loader.h"
+#include "qapi/error.h"
/*
* Values for Hardware Error Notification Type field
@@ -74,7 +75,10 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
+int acpi_ghes_record_errors(uint8_t source_id,
+ uint64_t error_physical_addr);
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp);
/**
* acpi_ghes_present: Report whether ACPI GHES table is present
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID
2024-09-14 6:13 ` [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
@ 2024-09-17 11:59 ` Igor Mammedov
2024-10-01 11:57 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 11:59 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Sat, 14 Sep 2024 08:13:28 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> The current logic is based on a lot of duct tape, with
> offsets calculated based on one define with the number of
> source IDs and an enum.
>
> Rewrite the logic in a way that it would be more resilient
> of code changes, by moving the source ID count to an enum
> and make the offset calculus more explicit.
>
> Such change was inspired on a patch from Jonathan Cameron
> splitting the logic to get the CPER address on a separate
> function, as this will be needed to support generic error
> injection.
so this patch switches to using HEST to lookup error status block
by source id, though nothing in commit message mentions that.
Perhaps it's time to rewrite commit message to be more
specific/clear on what it's doing.
now, I'd split this on several patches that should also take care of
wiring needed to preserve old lookup to keep migration with 9.1 machines
working:
1. extract error status block read_ack addresses calculation/lookup
into a separate function (just current code, without HEST lookup).
2. since old lookup code handles only 1 source and won't ever see multiple
error sources, simplify that as much as possible to keep old way simple
(and mention that in commit message).
it basically should take 1st error status block/read ack addresses from
hardware_errors
3. add to AcpiGedState a callback to with signature of #1 lookup func.
(which will be used to switch between old/new code), by default set to
old lookup func
4. add to GED a bool property x-has-hardware_errors_addr = 1
and use it in acpi_ged_realize() to set callback
if x-has-hardware_errors_addr == 1
callback = old_lookup
5. add new HEST lookup func (not used yet with mentioning that follow up patch
will use it)
6. cleanup fwcfg based on x-has-hardware_errors_addr,
i.e. for 'true':
ask for write pointer to hardware_errors like it's done in current code
and don't register hest_addr write pointer
while for 'false'
do opposite of above.
7. flip x-has-hardware_errors_addr default to 0 and add to hw/core/machine.c
hw_compat_9_1[] = {
{ged type, 'x-has-hardware_errors_addr', 'true'},
}
this will make 9.1 and older machine types to use old lookup callback
while newer machines will use default new lookup
that should take care of switching between new and old ABI,
i.e. which code qemu and guest will use/see.
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
> ---
>
> Changes from v9:
> - patch split on multiple patches to avoid multiple changes
> at the same patch.
>
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
> That should make easier to add support for 86.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes.c | 95 ++++++++++++++++++++++++++++++------------
> include/hw/acpi/ghes.h | 6 ++-
> 2 files changed, 73 insertions(+), 28 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 36fe5f68782f..6e5f0e8cf0c9 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -61,6 +61,23 @@
> */
> #define ACPI_GHES_GESB_SIZE 20
>
> +/*
> + * Offsets with regards to the start of the HEST table stored at
> + * ags->hest_addr_le, according with the memory layout map at
> + * docs/specs/acpi_hest_ghes.rst.
> + */
> +
> +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> + */
> +#define HEST_GHES_V2_TABLE_SIZE 92
> +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET)
> +
> +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
> + * Table 18-380: 'Error Status Address' field
> + */
> +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET)
> +
> /*
> * Values for error_severity field
> */
> @@ -401,11 +418,13 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
>
> int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> {
> - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> - uint64_t start_addr;
> - bool ret = -1;
> + uint64_t hest_read_ack_start_addr, read_ack_start_addr;
> + uint64_t hest_addr, cper_addr, err_source_struct;
> + uint64_t hest_err_block_addr, error_block_addr;
> + uint32_t i, ret;
> AcpiGedState *acpi_ged_state;
> AcpiGhesState *ags;
> + uint64_t read_ack;
>
> assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
>
> @@ -414,44 +433,66 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> g_assert(acpi_ged_state);
> ags = &acpi_ged_state->ghes_state;
>
> - start_addr = le64_to_cpu(ags->ghes_addr_le);
> + hest_addr = le64_to_cpu(ags->hest_addr_le);
>
> if (!physical_address) {
> return -1;
> }
>
> - start_addr += source_id * sizeof(uint64_t);
> + err_source_struct = hest_addr + ACPI_GHES_ERROR_SOURCE_COUNT;
>
> - cpu_physical_memory_read(start_addr, &error_block_addr,
> - sizeof(error_block_addr));
> + /*
> + * Currently, HEST Error source navigates only for GHESv2 tables
> + */
> + for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> + uint64_t addr = err_source_struct;
> + uint16_t type, src_id;
>
> - error_block_addr = le64_to_cpu(error_block_addr);
> + cpu_physical_memory_read(addr, &type, sizeof(type));
>
> - read_ack_register_addr = start_addr +
> - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> + /* For now, we only know the size of GHESv2 table */
> + assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
>
> - cpu_physical_memory_read(read_ack_register_addr,
> - &read_ack_register, sizeof(read_ack_register));
> + /* It is GHES. Compare CPER source address */
> + addr += sizeof(type);
> + cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
>
> - /* zero means OSPM does not acknowledge the error */
> - if (!read_ack_register) {
> - error_report("OSPM does not acknowledge previous error,"
> - " so can not record CPER for current error anymore");
> - } else if (error_block_addr) {
> - read_ack_register = cpu_to_le64(0);
> - /*
> - * Clear the Read Ack Register, OSPM will write it to 1 when
> - * it acknowledges this error.
> - */
> - cpu_physical_memory_write(read_ack_register_addr,
> - &read_ack_register, sizeof(uint64_t));
> + if (src_id == source_id) {
> + break;
> + }
>
> - ret = acpi_ghes_record_mem_error(error_block_addr,
> - physical_address);
> - } else {
> + err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> + }
> + if (i == ACPI_GHES_ERROR_SOURCE_COUNT) {
> error_report("can not find Generic Error Status Block");
> +
> + return -1;
> }
>
> + /* Navigate though table address pointers */
> +
> + hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> + hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
> +
> + cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> + sizeof(error_block_addr));
> +
> + cpu_physical_memory_read(error_block_addr, &cper_addr,
> + sizeof(error_block_addr));
> +
> + cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
> + sizeof(read_ack_start_addr));
> +
> + /* Update ACK offset to notify about a new error */
> +
> + cpu_physical_memory_read(read_ack_start_addr,
> + &read_ack, sizeof(read_ack));
> +
> + read_ack = cpu_to_le64(0);
> + cpu_physical_memory_write(read_ack_start_addr,
> + &read_ack, sizeof(read_ack));
> +
> + ret = acpi_ghes_record_mem_error(error_block_addr, physical_address);
> return ret;
> }
>
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index c9bbda4740e2..7485f54c28f2 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -23,6 +23,7 @@
> #define ACPI_GHES_H
>
> #include "hw/acpi/bios-linker-loader.h"
> +#include "qapi/error.h"
>
> /*
> * Values for Hardware Error Notification Type field
> @@ -74,7 +75,10 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
> +int acpi_ghes_record_errors(uint8_t source_id,
> + uint64_t error_physical_addr);
> +void ghes_record_cper_errors(const void *cper, size_t len,
> + uint16_t source_id, Error **errp);
>
> /**
> * acpi_ghes_present: Report whether ACPI GHES table is present
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID
2024-09-17 11:59 ` Igor Mammedov
@ 2024-10-01 11:57 ` Mauro Carvalho Chehab
2024-10-03 14:51 ` Igor Mammedov
0 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-10-01 11:57 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
Em Tue, 17 Sep 2024 13:59:34 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> On Sat, 14 Sep 2024 08:13:28 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > The current logic is based on a lot of duct tape, with
> > offsets calculated based on one define with the number of
> > source IDs and an enum.
> >
> > Rewrite the logic in a way that it would be more resilient
> > of code changes, by moving the source ID count to an enum
> > and make the offset calculus more explicit.
> >
> > Such change was inspired on a patch from Jonathan Cameron
> > splitting the logic to get the CPER address on a separate
> > function, as this will be needed to support generic error
> > injection.
>
> so this patch switches to using HEST to lookup error status block
> by source id, though nothing in commit message mentions that.
> Perhaps it's time to rewrite commit message to be more
> specific/clear on what it's doing.
>
> now, I'd split this on several patches that should also take care of
> wiring needed to preserve old lookup to keep migration with 9.1 machines
> working:
>
> 1. extract error status block read_ack addresses calculation/lookup
> into a separate function (just current code, without HEST lookup).
Done. This is at the cleanup series.
> 2. since old lookup code handles only 1 source and won't ever see multiple
> error sources, simplify that as much as possible to keep old way simple
> (and mention that in commit message).
> it basically should take 1st error status block/read ack addresses from
> hardware_errors
Done there too.
>
> 3. add to AcpiGedState a callback to with signature of #1 lookup func.
> (which will be used to switch between old/new code), by default set to
> old lookup func
>
> 4. add to GED a bool property x-has-hardware_errors_addr = 1
> and use it in acpi_ged_realize() to set callback
>
> if x-has-hardware_errors_addr == 1
> callback = old_lookup
Instead of a callback, I opted to define a boolean:
DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),
That avoided the need of exporting the math logic and deal with
callbacks. It also avoided the need of touching acpi_ged_realize().
>
> 5. add new HEST lookup func (not used yet with mentioning that follow up patch
> will use it)
>
> 6. cleanup fwcfg based on x-has-hardware_errors_addr,
> i.e. for 'true':
> ask for write pointer to hardware_errors like it's done in current code
> and don't register hest_addr write pointer
> while for 'false'
> do opposite of above.
This doesn't work. without the fw_cfg logic for both, QEMU/BIOS won't boot
and/or the hardware_errors won't work, causing ghes to do nothing.
> 7. flip x-has-hardware_errors_addr default to 0 and add to hw/core/machine.c
> hw_compat_9_1[] = {
> {ged type, 'x-has-hardware_errors_addr', 'true'},
> }
> this will make 9.1 and older machine types to use old lookup callback
> while newer machines will use default new lookup
I opted to add the changes on machine.c at the same patch.
See the RFC series I posted.
> that should take care of switching between new and old ABI,
> i.e. which code qemu and guest will use/see.
With the new series, both virt-9.1 and virt-9.2 are doing the
right thing using either the legacy or the new math. I double
checked with a small debug patch, and changing the virt-9.1
code to use GPIO, as it is a way easier to inject errors than
via SEA (SEA requires the host OS to generate a bus error on
a hw address used by the VM).
>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >
> > ---
> >
> > Changes from v9:
> > - patch split on multiple patches to avoid multiple changes
> > at the same patch.
> >
> > Changes from v8:
> > - Non-rename/cleanup changes merged altogether;
> > - source ID is now more generic, defined per guest target.
> > That should make easier to add support for 86.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> > hw/acpi/ghes.c | 95 ++++++++++++++++++++++++++++++------------
> > include/hw/acpi/ghes.h | 6 ++-
> > 2 files changed, 73 insertions(+), 28 deletions(-)
> >
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 36fe5f68782f..6e5f0e8cf0c9 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -61,6 +61,23 @@
> > */
> > #define ACPI_GHES_GESB_SIZE 20
> >
> > +/*
> > + * Offsets with regards to the start of the HEST table stored at
> > + * ags->hest_addr_le, according with the memory layout map at
> > + * docs/specs/acpi_hest_ghes.rst.
> > + */
> > +
> > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> > + */
> > +#define HEST_GHES_V2_TABLE_SIZE 92
> > +#define GHES_ACK_OFFSET (64 + GAS_ADDR_OFFSET)
> > +
> > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
> > + * Table 18-380: 'Error Status Address' field
> > + */
> > +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET)
> > +
> > /*
> > * Values for error_severity field
> > */
> > @@ -401,11 +418,13 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> >
> > int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > {
> > - uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > - uint64_t start_addr;
> > - bool ret = -1;
> > + uint64_t hest_read_ack_start_addr, read_ack_start_addr;
> > + uint64_t hest_addr, cper_addr, err_source_struct;
> > + uint64_t hest_err_block_addr, error_block_addr;
> > + uint32_t i, ret;
> > AcpiGedState *acpi_ged_state;
> > AcpiGhesState *ags;
> > + uint64_t read_ack;
> >
> > assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
> >
> > @@ -414,44 +433,66 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> > g_assert(acpi_ged_state);
> > ags = &acpi_ged_state->ghes_state;
> >
> > - start_addr = le64_to_cpu(ags->ghes_addr_le);
> > + hest_addr = le64_to_cpu(ags->hest_addr_le);
> >
> > if (!physical_address) {
> > return -1;
> > }
> >
> > - start_addr += source_id * sizeof(uint64_t);
> > + err_source_struct = hest_addr + ACPI_GHES_ERROR_SOURCE_COUNT;
> >
> > - cpu_physical_memory_read(start_addr, &error_block_addr,
> > - sizeof(error_block_addr));
> > + /*
> > + * Currently, HEST Error source navigates only for GHESv2 tables
> > + */
> > + for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > + uint64_t addr = err_source_struct;
> > + uint16_t type, src_id;
> >
> > - error_block_addr = le64_to_cpu(error_block_addr);
> > + cpu_physical_memory_read(addr, &type, sizeof(type));
> >
> > - read_ack_register_addr = start_addr +
> > - ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > + /* For now, we only know the size of GHESv2 table */
> > + assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
> >
> > - cpu_physical_memory_read(read_ack_register_addr,
> > - &read_ack_register, sizeof(read_ack_register));
> > + /* It is GHES. Compare CPER source address */
> > + addr += sizeof(type);
> > + cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> >
> > - /* zero means OSPM does not acknowledge the error */
> > - if (!read_ack_register) {
> > - error_report("OSPM does not acknowledge previous error,"
> > - " so can not record CPER for current error anymore");
> > - } else if (error_block_addr) {
> > - read_ack_register = cpu_to_le64(0);
> > - /*
> > - * Clear the Read Ack Register, OSPM will write it to 1 when
> > - * it acknowledges this error.
> > - */
> > - cpu_physical_memory_write(read_ack_register_addr,
> > - &read_ack_register, sizeof(uint64_t));
> > + if (src_id == source_id) {
> > + break;
> > + }
> >
> > - ret = acpi_ghes_record_mem_error(error_block_addr,
> > - physical_address);
> > - } else {
> > + err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > + }
> > + if (i == ACPI_GHES_ERROR_SOURCE_COUNT) {
> > error_report("can not find Generic Error Status Block");
> > +
> > + return -1;
> > }
> >
> > + /* Navigate though table address pointers */
> > +
> > + hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> > + hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
> > +
> > + cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > + sizeof(error_block_addr));
> > +
> > + cpu_physical_memory_read(error_block_addr, &cper_addr,
> > + sizeof(error_block_addr));
> > +
> > + cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
> > + sizeof(read_ack_start_addr));
> > +
> > + /* Update ACK offset to notify about a new error */
> > +
> > + cpu_physical_memory_read(read_ack_start_addr,
> > + &read_ack, sizeof(read_ack));
> > +
> > + read_ack = cpu_to_le64(0);
> > + cpu_physical_memory_write(read_ack_start_addr,
> > + &read_ack, sizeof(read_ack));
> > +
> > + ret = acpi_ghes_record_mem_error(error_block_addr, physical_address);
> > return ret;
> > }
> >
> > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > index c9bbda4740e2..7485f54c28f2 100644
> > --- a/include/hw/acpi/ghes.h
> > +++ b/include/hw/acpi/ghes.h
> > @@ -23,6 +23,7 @@
> > #define ACPI_GHES_H
> >
> > #include "hw/acpi/bios-linker-loader.h"
> > +#include "qapi/error.h"
> >
> > /*
> > * Values for Hardware Error Notification Type field
> > @@ -74,7 +75,10 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > const char *oem_id, const char *oem_table_id);
> > void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> > GArray *hardware_errors);
> > -int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
> > +int acpi_ghes_record_errors(uint8_t source_id,
> > + uint64_t error_physical_addr);
> > +void ghes_record_cper_errors(const void *cper, size_t len,
> > + uint16_t source_id, Error **errp);
> >
> > /**
> > * acpi_ghes_present: Report whether ACPI GHES table is present
>
Thanks,
Mauro
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID
2024-10-01 11:57 ` Mauro Carvalho Chehab
@ 2024-10-03 14:51 ` Igor Mammedov
0 siblings, 0 replies; 40+ messages in thread
From: Igor Mammedov @ 2024-10-03 14:51 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Tue, 1 Oct 2024 13:57:59 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Tue, 17 Sep 2024 13:59:34 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > On Sat, 14 Sep 2024 08:13:28 +0200
> > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> >
> > > The current logic is based on a lot of duct tape, with
> > > offsets calculated based on one define with the number of
> > > source IDs and an enum.
> > >
> > > Rewrite the logic in a way that it would be more resilient
> > > of code changes, by moving the source ID count to an enum
> > > and make the offset calculus more explicit.
> > >
> > > Such change was inspired on a patch from Jonathan Cameron
> > > splitting the logic to get the CPER address on a separate
> > > function, as this will be needed to support generic error
> > > injection.
> >
> > so this patch switches to using HEST to lookup error status block
> > by source id, though nothing in commit message mentions that.
> > Perhaps it's time to rewrite commit message to be more
> > specific/clear on what it's doing.
> >
> > now, I'd split this on several patches that should also take care of
> > wiring needed to preserve old lookup to keep migration with 9.1 machines
> > working:
[...]
> > 6. cleanup fwcfg based on x-has-hardware_errors_addr,
> > i.e. for 'true':
> > ask for write pointer to hardware_errors like it's done in current code
> > and don't register hest_addr write pointer
> > while for 'false'
> > do opposite of above.
>
> This doesn't work. without the fw_cfg logic for both, QEMU/BIOS won't boot
> and/or the hardware_errors won't work, causing ghes to do nothing.
we should look more into it,
only 1 of them hest_addr(9.2+) or hwerror_addr(9.1) is necessary
so if it breaks, it looks like a bug somewhere to me.
>
[...]
>
>
> Thanks,
> Mauro
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 08/21] acpi/ghes: Change the type for source_id
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (6 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 12:04 ` Igor Mammedov
2024-09-14 6:13 ` [PATCH v10 09/21] acpi/ghes: Don't hardcode the number of sources on ghes Mauro Carvalho Chehab
` (13 subsequent siblings)
21 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
HEST source ID is actually a 16-bit value. Yet, make it a little
bit more generic using just an integer type.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 2 +-
include/hw/acpi/ghes.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index c315de1802d6..58a04e935142 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
{
return -1;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 6e5f0e8cf0c9..4e34b16a6ca9 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -416,7 +416,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
-int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
+int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
{
uint64_t hest_read_ack_start_addr, read_ack_start_addr;
uint64_t hest_addr, cper_addr, err_source_struct;
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 7485f54c28f2..6471934d7775 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -75,7 +75,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(uint8_t source_id,
+int acpi_ghes_record_errors(int source_id,
uint64_t error_physical_addr);
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp);
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 08/21] acpi/ghes: Change the type for source_id
2024-09-14 6:13 ` [PATCH v10 08/21] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
@ 2024-09-17 12:04 ` Igor Mammedov
0 siblings, 0 replies; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 12:04 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Dongjiu Geng, linux-kernel, qemu-arm, qemu-devel
On Sat, 14 Sep 2024 08:13:29 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> HEST source ID is actually a 16-bit value. Yet, make it a little
> bit more generic using just an integer type.
wouldn't uint16_t be better to use, to explicitly show expectations?
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
> hw/acpi/ghes-stub.c | 2 +-
> hw/acpi/ghes.c | 2 +-
> include/hw/acpi/ghes.h | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index c315de1802d6..58a04e935142 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -11,7 +11,7 @@
> #include "qemu/osdep.h"
> #include "hw/acpi/ghes.h"
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
> {
> return -1;
> }
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 6e5f0e8cf0c9..4e34b16a6ca9 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -416,7 +416,7 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> ags->present = true;
> }
>
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> +int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
> {
> uint64_t hest_read_ack_start_addr, read_ack_start_addr;
> uint64_t hest_addr, cper_addr, err_source_struct;
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index 7485f54c28f2..6471934d7775 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -75,7 +75,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t source_id,
> +int acpi_ghes_record_errors(int source_id,
> uint64_t error_physical_addr);
> void ghes_record_cper_errors(const void *cper, size_t len,
> uint16_t source_id, Error **errp);
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v10 09/21] acpi/ghes: Don't hardcode the number of sources on ghes
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (7 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 08/21] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 10/21] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
` (12 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
The number of sources is architecture-dependent. Usually,
architectures will implement one synchronous and/or one
asynchronous source.
Change the logic to better cope with such model.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 53 +++++++++++++++++++++++-----------------
hw/arm/virt-acpi-build.c | 5 ++++
include/hw/acpi/ghes.h | 21 ++++++++++------
3 files changed, 49 insertions(+), 30 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 4e34b16a6ca9..c88717fb1bef 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -251,17 +251,18 @@ static int acpi_ghes_record_mem_error(uint64_t error_block_address,
* Initialize "etc/hardware_errors" and "etc/hardware_errors_addr" fw_cfg blobs.
* See docs/specs/acpi_hest_ghes.rst for blobs format.
*/
-static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
+static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
+ int num_sources)
{
int i, error_status_block_offset;
/* Build error_block_address */
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
}
/* Build read_ack_register */
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
/*
* Initialize the value of read_ack_register to 1, so GHES can be
* writable after (re)boot.
@@ -276,13 +277,13 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
/* Reserve space for Error Status Data Block */
acpi_data_push(hardware_errors,
- ACPI_GHES_MAX_RAW_DATA_LENGTH * ACPI_GHES_ERROR_SOURCE_COUNT);
+ ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
/* Tell guest firmware to place hardware_errors blob into RAM */
bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
hardware_errors, sizeof(uint64_t), false);
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+ for (i = 0; i < num_sources; i++) {
/*
* Tell firmware to patch error_block_address entries to point to
* corresponding "Generic Error Status Block"
@@ -304,10 +305,12 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker)
/* Build Generic Hardware Error Source version 2 (GHESv2) */
static void build_ghes_v2(GArray *table_data,
BIOSLinker *linker,
- enum AcpiGhesNotifyType notify,
- uint16_t source_id)
+ const AcpiNotificationSourceId *notif_src,
+ uint16_t index, int num_sources)
{
uint64_t address_offset;
+ const uint16_t notify = notif_src->notify;
+ const uint16_t source_id = notif_src->source_id;
/*
* Type:
@@ -336,7 +339,7 @@ static void build_ghes_v2(GArray *table_data,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
+ ACPI_GHES_ERRORS_FW_CFG_FILE, index * sizeof(uint64_t));
/* Notification Structure */
build_ghes_hw_error_notification(table_data, notify);
@@ -353,9 +356,10 @@ static void build_ghes_v2(GArray *table_data,
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
- (ACPI_GHES_ERROR_SOURCE_COUNT + source_id) * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_GHES_ERRORS_FW_CFG_FILE,
+ (num_sources + index) * sizeof(uint64_t));
/*
* Read Ack Preserve field
@@ -370,12 +374,15 @@ static void build_ghes_v2(GArray *table_data,
/* Build Hardware Error Source Table */
void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
BIOSLinker *linker,
+ const AcpiNotificationSourceId * const notif_source,
+ int num_sources,
const char *oem_id, const char *oem_table_id)
{
AcpiTable table = { .sig = "HEST", .rev = 1,
.oem_id = oem_id, .oem_table_id = oem_table_id };
+ int i;
- build_ghes_error_table(hardware_errors, linker);
+ build_ghes_error_table(hardware_errors, linker, num_sources);
acpi_table_begin(&table, table_data);
@@ -383,9 +390,10 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
int hest_offset = table_data->len;
/* Error Source Count */
- build_append_int_noprefix(table_data, ACPI_GHES_ERROR_SOURCE_COUNT, 4);
- build_ghes_v2(table_data, linker,
- ACPI_GHES_NOTIFY_SEA, ACPI_HEST_SRC_ID_SEA);
+ build_append_int_noprefix(table_data, num_sources, 4);
+ for (i = 0; i < num_sources; i++) {
+ build_ghes_v2(table_data, linker, ¬if_source[i], i, num_sources);
+ }
acpi_table_end(linker, &table);
@@ -421,13 +429,11 @@ int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
uint64_t hest_read_ack_start_addr, read_ack_start_addr;
uint64_t hest_addr, cper_addr, err_source_struct;
uint64_t hest_err_block_addr, error_block_addr;
- uint32_t i, ret;
+ uint32_t num_sources, i, ret;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
uint64_t read_ack;
- assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
-
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
g_assert(acpi_ged_state);
@@ -435,16 +441,18 @@ int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
hest_addr = le64_to_cpu(ags->hest_addr_le);
+ cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
if (!physical_address) {
return -1;
}
- err_source_struct = hest_addr + ACPI_GHES_ERROR_SOURCE_COUNT;
+ err_source_struct = hest_addr + sizeof(num_sources);
/*
* Currently, HEST Error source navigates only for GHESv2 tables
*/
- for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
+
+ for (i = 0; i < num_sources; i++) {
uint64_t addr = err_source_struct;
uint16_t type, src_id;
@@ -463,9 +471,8 @@ int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
err_source_struct += HEST_GHES_V2_TABLE_SIZE;
}
- if (i == ACPI_GHES_ERROR_SOURCE_COUNT) {
- error_report("can not find Generic Error Status Block");
-
+ if (i == num_sources) {
+ error_report("HEST: Source %d not found.", source_id);
return -1;
}
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index bafd9a56c217..476c365851c4 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -890,6 +890,10 @@ static void acpi_align_size(GArray *blob, unsigned align)
g_array_set_size(blob, ROUND_UP(acpi_data_len(blob), align));
}
+static const AcpiNotificationSourceId hest_ghes_notify[] = {
+ {ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
+};
+
static
void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
{
@@ -945,6 +949,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
if (vms->ras) {
acpi_add_table(table_offsets, tables_blob);
acpi_build_hest(tables_blob, tables->hardware_errors, tables->linker,
+ hest_ghes_notify, ARRAY_SIZE(hest_ghes_notify),
vms->oem_id, vms->oem_table_id);
}
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 6471934d7775..344919f1f75c 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -57,21 +57,28 @@ enum AcpiGhesNotifyType {
ACPI_GHES_NOTIFY_RESERVED = 12
};
-enum {
- ACPI_HEST_SRC_ID_SEA = 0,
- /* future ids go here */
-
- ACPI_GHES_ERROR_SOURCE_COUNT
-};
-
typedef struct AcpiGhesState {
uint64_t hest_addr_le;
uint64_t ghes_addr_le;
bool present; /* True if GHES is present at all on this board */
} AcpiGhesState;
+/*
+ * ID numbers used to fill HEST source ID field
+ */
+enum AcpiGhesSourceID {
+ ACPI_HEST_SRC_ID_SYNC,
+};
+
+typedef struct AcpiNotificationSourceId {
+ enum AcpiGhesSourceID source_id;
+ enum AcpiGhesNotifyType notify;
+} AcpiNotificationSourceId;
+
void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
BIOSLinker *linker,
+ const AcpiNotificationSourceId * const notif_source,
+ int num_sources,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 10/21] acpi/ghes: make the GHES record generation more generic
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (8 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 09/21] acpi/ghes: Don't hardcode the number of sources on ghes Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 11/21] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
` (11 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Split the code into separate functions to allow using the
common CPER filling code by different error sources.
The generic code was moved to ghes_record_cper_errors(),
and ghes_gen_err_data_uncorrectable_recoverable() now contains
only a logic to fill GEGB part of the record.
The remaining code to generate a memory error now belongs to
acpi_ghes_record_errors() function.
A further patch will give it a better name.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 101 ++++++++++++++++++++++++++++++-------------------
1 file changed, 63 insertions(+), 38 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index c88717fb1bef..f54865423f69 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -199,51 +199,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
build_append_int_noprefix(table, 0, 7);
}
-static int acpi_ghes_record_mem_error(uint64_t error_block_address,
- uint64_t error_physical_addr)
+static void
+ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
+ const uint8_t *section_type,
+ int data_length)
{
- GArray *block;
-
- /* Memory Error Section Type */
- const uint8_t uefi_cper_mem_sec[] =
- UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
- 0xED, 0x7C, 0x83, 0xB1);
-
/* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
* Table 17-13 Generic Error Data Entry
*/
QemuUUID fru_id = {};
- uint32_t data_length;
- block = g_array_new(false, true /* clear */, 1);
-
- /* This is the length if adding a new generic error data entry*/
- data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
/*
- * It should not run out of the preallocated memory if adding a new generic
- * error data entry
+ * Calculate the size with this block. No need to check for
+ * too big CPER, as CPER size is checked at ghes_record_cper_errors()
*/
- assert((data_length + ACPI_GHES_GESB_SIZE) <=
- ACPI_GHES_MAX_RAW_DATA_LENGTH);
+ data_length += ACPI_GHES_GESB_SIZE;
/* Build the new generic error status block header */
acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
/* Build this new generic error data entry header */
- acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
+ acpi_ghes_generic_error_data(block, section_type,
ACPI_CPER_SEV_RECOVERABLE, 0, 0,
ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
-
- /* Build the memory section CPER for above new generic error data entry */
- acpi_ghes_build_append_mem_cper(block, error_physical_addr);
-
- /* Write the generic error data entry into guest memory */
- cpu_physical_memory_write(error_block_address, block->data, block->len);
-
- g_array_free(block, true);
-
- return 0;
}
/*
@@ -424,16 +403,22 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
-int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+void ghes_record_cper_errors(const void *cper, size_t len,
+ uint16_t source_id, Error **errp)
{
uint64_t hest_read_ack_start_addr, read_ack_start_addr;
uint64_t hest_addr, cper_addr, err_source_struct;
uint64_t hest_err_block_addr, error_block_addr;
- uint32_t num_sources, i, ret;
+ uint32_t num_sources, i;
AcpiGedState *acpi_ged_state;
AcpiGhesState *ags;
uint64_t read_ack;
+ if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
+ error_setg(errp, "GHES CPER record is too big: %ld", len);
+ return;
+ }
+
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
g_assert(acpi_ged_state);
@@ -442,9 +427,6 @@ int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
hest_addr = le64_to_cpu(ags->hest_addr_le);
cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
- if (!physical_address) {
- return -1;
- }
err_source_struct = hest_addr + sizeof(num_sources);
@@ -472,8 +454,8 @@ int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
err_source_struct += HEST_GHES_V2_TABLE_SIZE;
}
if (i == num_sources) {
- error_report("HEST: Source %d not found.", source_id);
- return -1;
+ error_setg(errp, "HEST: Source %d not found.", source_id);
+ return;
}
/* Navigate though table address pointers */
@@ -495,12 +477,55 @@ int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
cpu_physical_memory_read(read_ack_start_addr,
&read_ack, sizeof(read_ack));
+ /* zero means OSPM does not acknowledge the error */
+ if (!read_ack) {
+ error_setg(errp,
+ "Last CPER record was not acknowledged yet");
+ return;
+ }
+
read_ack = cpu_to_le64(0);
cpu_physical_memory_write(read_ack_start_addr,
&read_ack, sizeof(read_ack));
- ret = acpi_ghes_record_mem_error(error_block_addr, physical_address);
- return ret;
+ /* Write the generic error data entry into guest memory */
+ cpu_physical_memory_write(cper_addr, cper, len);
+}
+
+int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+{
+ /* Memory Error Section Type */
+ const uint8_t guid[] =
+ UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
+ 0xED, 0x7C, 0x83, 0xB1);
+ Error *errp = NULL;
+ GArray *block;
+
+ if (!physical_address) {
+ error_report("can not find Generic Error Status Block for source id %d",
+ source_id);
+ return -1;
+ }
+
+ block = g_array_new(false, true /* clear */, 1);
+
+ ghes_gen_err_data_uncorrectable_recoverable(block, guid,
+ ACPI_GHES_MAX_RAW_DATA_LENGTH);
+
+ /* Build the memory section CPER for above new generic error data entry */
+ acpi_ghes_build_append_mem_cper(block, physical_address);
+
+ /* Report the error */
+ ghes_record_cper_errors(block->data, block->len, source_id, &errp);
+
+ g_array_free(block, true);
+
+ if (errp) {
+ error_report_err(errp);
+ return -1;
+ }
+
+ return 0;
}
bool acpi_ghes_present(void)
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 11/21] acpi/ghes: don't crash QEMU if ghes GED is not found
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (9 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 10/21] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 12/21] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
` (10 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Instead, produce an error and continue working
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index f54865423f69..e47c0238f3c5 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -421,7 +421,10 @@ void ghes_record_cper_errors(const void *cper, size_t len,
acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
NULL));
- g_assert(acpi_ged_state);
+ if (!acpi_ged_state) {
+ error_setg(errp, "Can't find ACPI_GED object");
+ return;
+ }
ags = &acpi_ged_state->ghes_state;
hest_addr = le64_to_cpu(ags->hest_addr_le);
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 12/21] acpi/ghes: rename etc/hardware_error file macros
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (10 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 11/21] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
` (9 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Now that we have also have a file to store HEST data location,
which is part of GHES, better name the file where CPER records
are stored.
No functional changes.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index e47c0238f3c5..dc15d6a693d6 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -28,8 +28,8 @@
#include "hw/nvram/fw_cfg.h"
#include "qemu/uuid.h"
-#define ACPI_GHES_ERRORS_FW_CFG_FILE "etc/hardware_errors"
-#define ACPI_GHES_DATA_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
+#define ACPI_HW_ERROR_FW_CFG_FILE "etc/hardware_errors"
+#define ACPI_HW_ERROR_ADDR_FW_CFG_FILE "etc/hardware_errors_addr"
#define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
/* The max size in bytes for one error block */
@@ -259,7 +259,7 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
ACPI_GHES_MAX_RAW_DATA_LENGTH * num_sources);
/* Tell guest firmware to place hardware_errors blob into RAM */
- bios_linker_loader_alloc(linker, ACPI_GHES_ERRORS_FW_CFG_FILE,
+ bios_linker_loader_alloc(linker, ACPI_HW_ERROR_FW_CFG_FILE,
hardware_errors, sizeof(uint64_t), false);
for (i = 0; i < num_sources; i++) {
@@ -268,17 +268,21 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
* corresponding "Generic Error Status Block"
*/
bios_linker_loader_add_pointer(linker,
- ACPI_GHES_ERRORS_FW_CFG_FILE, sizeof(uint64_t) * i,
- sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
- error_status_block_offset + i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ sizeof(uint64_t) * i,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ error_status_block_offset +
+ i * ACPI_GHES_MAX_RAW_DATA_LENGTH);
}
/*
* tell firmware to write hardware_errors GPA into
* hardware_errors_addr fw_cfg, once the former has been initialized.
*/
- bios_linker_loader_write_pointer(linker, ACPI_GHES_DATA_ADDR_FW_CFG_FILE,
- 0, sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE, 0);
+ bios_linker_loader_write_pointer(linker, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, 0,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE, 0);
}
/* Build Generic Hardware Error Source version 2 (GHESv2) */
@@ -317,8 +321,10 @@ static void build_ghes_v2(GArray *table_data,
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
- address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE, index * sizeof(uint64_t));
+ address_offset + GAS_ADDR_OFFSET,
+ sizeof(uint64_t),
+ ACPI_HW_ERROR_FW_CFG_FILE,
+ index * sizeof(uint64_t));
/* Notification Structure */
build_ghes_hw_error_notification(table_data, notify);
@@ -337,7 +343,7 @@ static void build_ghes_v2(GArray *table_data,
bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
address_offset + GAS_ADDR_OFFSET,
sizeof(uint64_t),
- ACPI_GHES_ERRORS_FW_CFG_FILE,
+ ACPI_HW_ERROR_FW_CFG_FILE,
(num_sources + index) * sizeof(uint64_t));
/*
@@ -390,11 +396,11 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
GArray *hardware_error)
{
/* Create a read-only fw_cfg file for GHES */
- fw_cfg_add_file(s, ACPI_GHES_ERRORS_FW_CFG_FILE, hardware_error->data,
+ fw_cfg_add_file(s, ACPI_HW_ERROR_FW_CFG_FILE, hardware_error->data,
hardware_error->len);
/* Create a read-write fw_cfg file for Address */
- fw_cfg_add_file_callback(s, ACPI_GHES_DATA_ADDR_FW_CFG_FILE, NULL, NULL,
+ fw_cfg_add_file_callback(s, ACPI_HW_ERROR_ADDR_FW_CFG_FILE, NULL, NULL,
NULL, &(ags->ghes_addr_le), sizeof(ags->ghes_addr_le), false);
fw_cfg_add_file_callback(s, ACPI_HEST_ADDR_FW_CFG_FILE, NULL, NULL,
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 13/21] acpi/ghes: better name GHES memory error function
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (11 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 12/21] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 14/21] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
` (8 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Paolo Bonzini,
Peter Maydell, kvm, linux-kernel, qemu-arm, qemu-devel
The current function used to generate GHES data is specific for
memory errors. Give a better name for it, as we now have a generic
function as well.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes-stub.c | 2 +-
hw/acpi/ghes.c | 2 +-
include/hw/acpi/ghes.h | 4 ++--
target/arm/kvm.c | 3 ++-
4 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 58a04e935142..b0f053d5998f 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,7 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
{
return -1;
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index dc15d6a693d6..a8feb39c9f30 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -501,7 +501,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
cpu_physical_memory_write(cper_addr, cper, len);
}
-int acpi_ghes_record_errors(int source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
{
/* Memory Error Section Type */
const uint8_t guid[] =
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 344919f1f75c..7a7961e6078a 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -82,7 +82,7 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_record_errors(int source_id,
+int acpi_ghes_memory_errors(int source_id,
uint64_t error_physical_addr);
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp);
@@ -91,7 +91,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
* acpi_ghes_present: Report whether ACPI GHES table is present
*
* Returns: true if the system has an ACPI GHES table and it is
- * safe to call acpi_ghes_record_errors() to record a memory error.
+ * safe to call acpi_ghes_memory_errors() to record a memory error.
*/
bool acpi_ghes_present(void);
#endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 849e2e21b304..57192285fb96 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2373,7 +2373,8 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
*/
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
- if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+ if (!acpi_ghes_memory_errors(ARM_ACPI_HEST_SRC_ID_SYNC,
+ paddr)) {
kvm_inject_arm_sea(c);
} else {
error_report("failed to record the error");
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 14/21] acpi/ghes: add a notifier to notify when error data is ready
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (12 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 15/21] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
` (7 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, linux-kernel,
qemu-arm, qemu-devel
Some error injection notify methods are async, like GPIO
notify. Add a notifier to be used when the error record is
ready to be sent to the guest OS.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/acpi/ghes.c | 5 +++++
include/hw/acpi/ghes.h | 3 +++
2 files changed, 8 insertions(+)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index a8feb39c9f30..7bea265c7ef3 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -409,6 +409,9 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
ags->present = true;
}
+NotifierList acpi_generic_error_notifiers =
+ NOTIFIER_LIST_INITIALIZER(error_device_notifiers);
+
void ghes_record_cper_errors(const void *cper, size_t len,
uint16_t source_id, Error **errp)
{
@@ -499,6 +502,8 @@ void ghes_record_cper_errors(const void *cper, size_t len,
/* Write the generic error data entry into guest memory */
cpu_physical_memory_write(cper_addr, cper, len);
+
+ notifier_list_notify(&acpi_generic_error_notifiers, NULL);
}
int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 7a7961e6078a..83c912338137 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -24,6 +24,9 @@
#include "hw/acpi/bios-linker-loader.h"
#include "qapi/error.h"
+#include "qemu/notify.h"
+
+extern NotifierList acpi_generic_error_notifiers;
/*
* Values for Hardware Error Notification Type field
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 15/21] acpi/generic_event_device: add an APEI error device
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (13 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 14/21] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 16/21] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
` (6 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, linux-kernel, qemu-devel
Adds a generic error device to handle generic hardware error
events as specified at ACPI 6.5 specification at 18.3.2.7.2:
https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
using HID PNP0C33.
The PNP0C33 device is used to report hardware errors to
the guest via ACPI APEI Generic Hardware Error Source (GHES).
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
---
hw/acpi/aml-build.c | 10 ++++++++++
hw/acpi/generic_event_device.c | 8 ++++++++
include/hw/acpi/acpi_dev_interface.h | 1 +
include/hw/acpi/aml-build.h | 2 ++
include/hw/acpi/generic_event_device.h | 1 +
5 files changed, 22 insertions(+)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 6d4517cfbe3d..222a933a8760 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2520,3 +2520,13 @@ Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source)
return var;
}
+
+/* ACPI 5.0b: 18.3.2.6.2 Event Notification For Generic Error Sources */
+Aml *aml_error_device(void)
+{
+ Aml *dev = aml_device(ACPI_APEI_ERROR_DEVICE);
+ aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C33")));
+ aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+ return dev;
+}
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 4e5e387ee2df..efae0ff62c7b 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -26,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
ACPI_GED_PWR_DOWN_EVT,
ACPI_GED_NVDIMM_HOTPLUG_EVT,
ACPI_GED_CPU_HOTPLUG_EVT,
+ ACPI_GED_ERROR_EVT,
};
/*
@@ -116,6 +117,11 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
aml_int(0x80)));
break;
+ case ACPI_GED_ERROR_EVT:
+ aml_append(if_ctx,
+ aml_notify(aml_name(ACPI_APEI_ERROR_DEVICE),
+ aml_int(0x80)));
+ break;
case ACPI_GED_NVDIMM_HOTPLUG_EVT:
aml_append(if_ctx,
aml_notify(aml_name("\\_SB.NVDR"),
@@ -295,6 +301,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
sel = ACPI_GED_MEM_HOTPLUG_EVT;
} else if (ev & ACPI_POWER_DOWN_STATUS) {
sel = ACPI_GED_PWR_DOWN_EVT;
+ } else if (ev & ACPI_GENERIC_ERROR) {
+ sel = ACPI_GED_ERROR_EVT;
} else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
} else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
diff --git a/include/hw/acpi/acpi_dev_interface.h b/include/hw/acpi/acpi_dev_interface.h
index 68d9d15f50aa..8294f8f0ccca 100644
--- a/include/hw/acpi/acpi_dev_interface.h
+++ b/include/hw/acpi/acpi_dev_interface.h
@@ -13,6 +13,7 @@ typedef enum {
ACPI_NVDIMM_HOTPLUG_STATUS = 16,
ACPI_VMGENID_CHANGE_STATUS = 32,
ACPI_POWER_DOWN_STATUS = 64,
+ ACPI_GENERIC_ERROR = 128,
} AcpiEventStatusBits;
#define TYPE_ACPI_DEVICE_IF "acpi-device-interface"
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index a3784155cb33..44d1a6af0c69 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -252,6 +252,7 @@ struct CrsRangeSet {
/* Consumer/Producer */
#define AML_SERIAL_BUS_FLAG_CONSUME_ONLY (1 << 1)
+#define ACPI_APEI_ERROR_DEVICE "GEDD"
/**
* init_aml_allocator:
*
@@ -382,6 +383,7 @@ Aml *aml_dma(AmlDmaType typ, AmlDmaBusMaster bm, AmlTransferSize sz,
uint8_t channel);
Aml *aml_sleep(uint64_t msec);
Aml *aml_i2c_serial_bus_device(uint16_t address, const char *resource_source);
+Aml *aml_error_device(void);
/* Block AML object primitives */
Aml *aml_scope(const char *name_format, ...) G_GNUC_PRINTF(1, 2);
diff --git a/include/hw/acpi/generic_event_device.h b/include/hw/acpi/generic_event_device.h
index 40af3550b56d..9ace8fe70328 100644
--- a/include/hw/acpi/generic_event_device.h
+++ b/include/hw/acpi/generic_event_device.h
@@ -98,6 +98,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(AcpiGedState, ACPI_GED)
#define ACPI_GED_PWR_DOWN_EVT 0x2
#define ACPI_GED_NVDIMM_HOTPLUG_EVT 0x4
#define ACPI_GED_CPU_HOTPLUG_EVT 0x8
+#define ACPI_GED_ERROR_EVT 0x10
typedef struct GEDState {
MemoryRegion evt;
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 16/21] arm/virt: Wire up a GED error device for ACPI / GHES
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (14 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 15/21] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 17/21] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
` (5 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Peter Maydell, Shannon Zhao,
linux-kernel, qemu-arm, qemu-devel
Adds support to ARM virtualization to allow handling
generic error ACPI Event via GED & error source device.
It is aligned with Linux Kernel patch:
https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.huang@intel.com/
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
---
Changes from v8:
- Added a call to the function that produces GHES generic
records, as this is now added earlier in this series.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
hw/arm/virt-acpi-build.c | 1 +
hw/arm/virt.c | 12 +++++++++++-
include/hw/arm/virt.h | 1 +
3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 476c365851c4..b0606434c972 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -858,6 +858,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
}
acpi_dsdt_add_power_button(scope);
+ aml_append(scope, aml_error_device());
#ifdef CONFIG_TPM
acpi_dsdt_add_tpm(scope, vms);
#endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7934b2365163..d970893a3db6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -677,7 +677,7 @@ static inline DeviceState *create_acpi_ged(VirtMachineState *vms)
DeviceState *dev;
MachineState *ms = MACHINE(vms);
int irq = vms->irqmap[VIRT_ACPI_GED];
- uint32_t event = ACPI_GED_PWR_DOWN_EVT;
+ uint32_t event = ACPI_GED_PWR_DOWN_EVT | ACPI_GED_ERROR_EVT;
if (ms->ram_slots) {
event |= ACPI_GED_MEM_HOTPLUG_EVT;
@@ -1009,6 +1009,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
}
}
+static void virt_generic_error_req(Notifier *n, void *opaque)
+{
+ VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
+
+ acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
+}
+
static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
uint32_t phandle)
{
@@ -2389,6 +2396,9 @@ static void machvirt_init(MachineState *machine)
if (has_ged && aarch64 && firmware_loaded && virt_is_acpi_enabled(vms)) {
vms->acpi_dev = create_acpi_ged(vms);
+ vms->generic_error_notifier.notify = virt_generic_error_req;
+ notifier_list_add(&acpi_generic_error_notifiers,
+ &vms->generic_error_notifier);
} else {
create_gpio_devices(vms, VIRT_GPIO, sysmem);
}
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index aca4f8061b18..24ab84cd623d 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -176,6 +176,7 @@ struct VirtMachineState {
DeviceState *gic;
DeviceState *acpi_dev;
Notifier powerdown_notifier;
+ Notifier generic_error_notifier;
PCIBus *bus;
char *oem_id;
char *oem_table_id;
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 17/21] qapi/acpi-hest: add an interface to do generic CPER error injection
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (15 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 16/21] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 18/21] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
` (4 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Michael S. Tsirkin, Ani Sinha, Dongjiu Geng, Eric Blake,
Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
Shannon Zhao, linux-kernel, qemu-arm, qemu-devel
Creates a QMP command to be used for generic ACPI APEI hardware error
injection (HEST) via GHESv2, and add support for it for ARM guests.
Error injection uses ACPI_HEST_SRC_ID_QMP source ID to be platform
independent. This is mapped at arch virt bindings, depending on the
types supported by QEMU and by the BIOS. So, on ARM, this is supported
via ACPI_GHES_NOTIFY_GPIO notification type.
This patch is co-authored:
- original ghes logic to inject a simple ARM record by Shiju Jose;
- generic logic to handle block addresses by Jonathan Cameron;
- generic GHESv2 error inject by Mauro Carvalho Chehab;
Co-authored-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
Changes since v9:
- ARM source IDs renamed to reflect SYNC/ASYNC;
- command name changed to better reflect what it does;
- some improvements at JSON documentation;
- add a check for QMP source at the notification logic.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
MAINTAINERS | 7 +++++++
hw/acpi/Kconfig | 5 +++++
hw/acpi/ghes.c | 2 +-
hw/acpi/ghes_cper.c | 32 ++++++++++++++++++++++++++++++++
hw/acpi/ghes_cper_stub.c | 19 +++++++++++++++++++
hw/acpi/meson.build | 2 ++
hw/arm/virt-acpi-build.c | 1 +
hw/arm/virt.c | 7 +++++++
include/hw/acpi/ghes.h | 1 +
include/hw/arm/virt.h | 1 +
qapi/acpi-hest.json | 35 +++++++++++++++++++++++++++++++++++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
13 files changed, 113 insertions(+), 1 deletion(-)
create mode 100644 hw/acpi/ghes_cper.c
create mode 100644 hw/acpi/ghes_cper_stub.c
create mode 100644 qapi/acpi-hest.json
diff --git a/MAINTAINERS b/MAINTAINERS
index c59f7b253825..776f94efff02 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2094,6 +2094,13 @@ F: hw/acpi/ghes.c
F: include/hw/acpi/ghes.h
F: docs/specs/acpi_hest_ghes.rst
+ACPI/HEST/GHES/ARM processor CPER
+R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+S: Maintained
+F: hw/arm/ghes_cper.c
+F: hw/acpi/ghes_cper_stub.c
+F: qapi/acpi-hest.json
+
ppc4xx
L: qemu-ppc@nongnu.org
S: Orphan
diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index e07d3204eb36..73ffbb82c150 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -51,6 +51,11 @@ config ACPI_APEI
bool
depends on ACPI
+config GHES_CPER
+ bool
+ depends on ACPI_APEI
+ default y
+
config ACPI_PCI
bool
depends on ACPI && PCI
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 7bea265c7ef3..d7d147ef40f2 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -503,7 +503,7 @@ void ghes_record_cper_errors(const void *cper, size_t len,
/* Write the generic error data entry into guest memory */
cpu_physical_memory_write(cper_addr, cper, len);
- notifier_list_notify(&acpi_generic_error_notifiers, NULL);
+ notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
}
int acpi_ghes_memory_errors(int source_id, uint64_t physical_address)
diff --git a/hw/acpi/ghes_cper.c b/hw/acpi/ghes_cper.c
new file mode 100644
index 000000000000..02c47b41b990
--- /dev/null
+++ b/hw/acpi/ghes_cper.c
@@ -0,0 +1,32 @@
+/*
+ * CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu/base64.h"
+#include "qemu/error-report.h"
+#include "qemu/uuid.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_error(const char *qmp_cper, Error **errp)
+{
+
+ uint8_t *cper;
+ size_t len;
+
+ cper = qbase64_decode(qmp_cper, -1, &len, errp);
+ if (!cper) {
+ error_setg(errp, "missing GHES CPER payload");
+ return;
+ }
+
+ ghes_record_cper_errors(cper, len, ACPI_HEST_SRC_ID_QMP, errp);
+}
diff --git a/hw/acpi/ghes_cper_stub.c b/hw/acpi/ghes_cper_stub.c
new file mode 100644
index 000000000000..8782e2c02fa8
--- /dev/null
+++ b/hw/acpi/ghes_cper_stub.c
@@ -0,0 +1,19 @@
+/*
+ * Stub interface for CPER payload parser for error injection
+ *
+ * Copyright(C) 2024 Huawei LTD.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-acpi-hest.h"
+#include "hw/acpi/ghes.h"
+
+void qmp_inject_ghes_error(const char *cper, Error **errp)
+{
+ error_setg(errp, "GHES QMP error inject is not compiled in");
+}
diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
index fa5c07db9068..6cbf430eb66d 100644
--- a/hw/acpi/meson.build
+++ b/hw/acpi/meson.build
@@ -34,4 +34,6 @@ endif
system_ss.add(when: 'CONFIG_ACPI', if_false: files('acpi-stub.c', 'aml-build-stub.c', 'ghes-stub.c', 'acpi_interface.c'))
system_ss.add(when: 'CONFIG_ACPI_PCI_BRIDGE', if_false: files('pci-bridge-stub.c'))
system_ss.add_all(when: 'CONFIG_ACPI', if_true: acpi_ss)
+system_ss.add(when: 'CONFIG_GHES_CPER', if_true: files('ghes_cper.c'))
+system_ss.add(when: 'CONFIG_GHES_CPER', if_false: files('ghes_cper_stub.c'))
system_ss.add(files('acpi-qmp-cmds.c'))
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index b0606434c972..f2db8bc485f9 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -893,6 +893,7 @@ static void acpi_align_size(GArray *blob, unsigned align)
static const AcpiNotificationSourceId hest_ghes_notify[] = {
{ACPI_HEST_SRC_ID_SYNC, ACPI_GHES_NOTIFY_SEA},
+ {ACPI_HEST_SRC_ID_QMP, ACPI_GHES_NOTIFY_GPIO},
};
static
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d970893a3db6..5fdff23610d8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1011,6 +1011,13 @@ static void virt_powerdown_req(Notifier *n, void *opaque)
static void virt_generic_error_req(Notifier *n, void *opaque)
{
+ uint16_t *source_id = opaque;
+
+ /* Currently, only QMP source ID is async */
+ if (*source_id != ACPI_HEST_SRC_ID_QMP) {
+ return;
+ }
+
VirtMachineState *s = container_of(n, VirtMachineState, generic_error_notifier);
acpi_send_event(s->acpi_dev, ACPI_GENERIC_ERROR);
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 83c912338137..54490a6dd13e 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -71,6 +71,7 @@ typedef struct AcpiGhesState {
*/
enum AcpiGhesSourceID {
ACPI_HEST_SRC_ID_SYNC,
+ ACPI_HEST_SRC_ID_QMP, /* Use it only for QMP injected errors */
};
typedef struct AcpiNotificationSourceId {
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 24ab84cd623d..d96edea8a3f9 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -33,6 +33,7 @@
#include "exec/hwaddr.h"
#include "qemu/notify.h"
#include "hw/boards.h"
+#include "hw/acpi/ghes.h"
#include "hw/arm/boot.h"
#include "hw/arm/bsa.h"
#include "hw/block/flash.h"
diff --git a/qapi/acpi-hest.json b/qapi/acpi-hest.json
new file mode 100644
index 000000000000..d58fba485180
--- /dev/null
+++ b/qapi/acpi-hest.json
@@ -0,0 +1,35 @@
+# -*- Mode: Python -*-
+# vim: filetype=python
+
+##
+# == GHESv2 CPER Error Injection
+#
+# Defined since ACPI Specification 6.1,
+# section 18.3.2.8 Generic Hardware Error Source version 2. See:
+#
+# https://uefi.org/sites/default/files/resources/ACPI_6_1.pdf
+##
+
+
+##
+# @inject-ghes-error:
+#
+# Inject an error with additional ACPI 6.1 GHESv2 error information
+#
+# @cper: contains a base64 encoded string with raw data for a single
+# CPER record with Generic Error Status Block, Generic Error Data
+# Entry and generic error data payload, as described at
+# https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#format
+#
+# Features:
+#
+# @unstable: This command is experimental.
+#
+# Since: 9.2
+##
+{ 'command': 'inject-ghes-error',
+ 'data': {
+ 'cper': 'str'
+ },
+ 'features': [ 'unstable' ]
+}
diff --git a/qapi/meson.build b/qapi/meson.build
index e7bc54e5d047..35cea6147262 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -59,6 +59,7 @@ qapi_all_modules = [
if have_system
qapi_all_modules += [
'acpi',
+ 'acpi-hest',
'audio',
'cryptodev',
'qdev',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index b1581988e4eb..baf19ab73afe 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -75,6 +75,7 @@
{ 'include': 'misc-target.json' }
{ 'include': 'audio.json' }
{ 'include': 'acpi.json' }
+{ 'include': 'acpi-hest.json' }
{ 'include': 'pci.json' }
{ 'include': 'stats.json' }
{ 'include': 'virtio.json' }
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 18/21] docs: acpi_hest_ghes: fix documentation for CPER size
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (16 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 17/21] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 19/21] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
` (3 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Dongjiu Geng,
linux-kernel, qemu-arm, qemu-devel
While the spec defines a CPER size of 4KiB for each record,
currently it is set to 1KiB. Fix the documentation and add
a pointer to the macro name there, as this may help to keep
it updated.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Igor Mammedov <imammedo@redhat.com>
---
docs/specs/acpi_hest_ghes.rst | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index 68f1fbe0a4af..c3e9f8d9a702 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -67,8 +67,10 @@ Design Details
(3) The address registers table contains N Error Block Address entries
and N Read Ack Register entries. The size for each entry is 8-byte.
The Error Status Data Block table contains N Error Status Data Block
- entries. The size for each entry is 4096(0x1000) bytes. The total size
- for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+ entries. The size for each entry is defined at the source code as
+ ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
+ for the "etc/hardware_errors" fw_cfg blob is
+ (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
N is the number of the kinds of hardware error sources.
(4) QEMU generates the ACPI linker/loader script for the firmware. The
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 19/21] scripts/ghes_inject: add a script to generate GHES error inject
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (17 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 18/21] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 20/21] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
` (2 subsequent siblings)
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Cleber Rosa,
John Snow, linux-kernel, qemu-devel
Using the QMP GHESv2 API requires preparing a raw data array
containing a CPER record.
Add a helper script with subcommands to prepare such data.
Currently, only ARM Processor error CPER record is supported.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
MAINTAINERS | 3 +
scripts/arm_processor_error.py | 377 ++++++++++++++++++
scripts/ghes_inject.py | 51 +++
scripts/qmp_helper.py | 702 +++++++++++++++++++++++++++++++++
4 files changed, 1133 insertions(+)
create mode 100644 scripts/arm_processor_error.py
create mode 100755 scripts/ghes_inject.py
create mode 100644 scripts/qmp_helper.py
diff --git a/MAINTAINERS b/MAINTAINERS
index 776f94efff02..8816132d40f6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2100,6 +2100,9 @@ S: Maintained
F: hw/arm/ghes_cper.c
F: hw/acpi/ghes_cper_stub.c
F: qapi/acpi-hest.json
+F: scripts/ghes_inject.py
+F: scripts/arm_processor_error.py
+F: scripts/qmp_helper.py
ppc4xx
L: qemu-ppc@nongnu.org
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
new file mode 100644
index 000000000000..62e0c5662232
--- /dev/null
+++ b/scripts/arm_processor_error.py
@@ -0,0 +1,377 @@
+#!/usr/bin/env python3
+#
+# pylint: disable=C0301,C0114,R0903,R0912,R0913,R0914,R0915,W0511
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+# TODO: current implementation has dummy defaults.
+#
+# For a better implementation, a QMP addition/call is needed to
+# retrieve some data for ARM Processor Error injection:
+#
+# - ARM registers: power_state, mpidr.
+
+import argparse
+import re
+
+from qmp_helper import qmp, util, cper_guid
+
+class ArmProcessorEinj:
+ """
+ Implements ARM Processor Error injection via GHES
+ """
+
+ DESC = """
+ Generates an ARM processor error CPER, compatible with
+ UEFI 2.9A Errata.
+ """
+
+ ACPI_GHES_ARM_CPER_LENGTH = 40
+ ACPI_GHES_ARM_CPER_PEI_LENGTH = 32
+
+ # Context types
+ CONTEXT_AARCH32_EL1 = 1
+ CONTEXT_AARCH64_EL1 = 5
+ CONTEXT_MISC_REG = 8
+
+ def __init__(self, subparsers):
+ """Initialize the error injection class and add subparser"""
+
+ # Valid choice values
+ self.arm_valid_bits = {
+ "mpidr": util.bit(0),
+ "affinity": util.bit(1),
+ "running": util.bit(2),
+ "vendor": util.bit(3),
+ }
+
+ self.pei_flags = {
+ "first": util.bit(0),
+ "last": util.bit(1),
+ "propagated": util.bit(2),
+ "overflow": util.bit(3),
+ }
+
+ self.pei_error_types = {
+ "cache": util.bit(1),
+ "tlb": util.bit(2),
+ "bus": util.bit(3),
+ "micro-arch": util.bit(4),
+ }
+
+ self.pei_valid_bits = {
+ "multiple-error": util.bit(0),
+ "flags": util.bit(1),
+ "error-info": util.bit(2),
+ "virt-addr": util.bit(3),
+ "phy-addr": util.bit(4),
+ }
+
+ self.data = bytearray()
+
+ parser = subparsers.add_parser("arm", description=self.DESC)
+
+ arm_valid_bits = ",".join(self.arm_valid_bits.keys())
+ flags = ",".join(self.pei_flags.keys())
+ error_types = ",".join(self.pei_error_types.keys())
+ pei_valid_bits = ",".join(self.pei_valid_bits.keys())
+
+ # UEFI N.16 ARM Validation bits
+ g_arm = parser.add_argument_group("ARM processor")
+ g_arm.add_argument("--arm", "--arm-valid",
+ help=f"ARM valid bits: {arm_valid_bits}")
+ g_arm.add_argument("-a", "--affinity", "--level", "--affinity-level",
+ type=lambda x: int(x, 0),
+ help="Affinity level (when multiple levels apply)")
+ g_arm.add_argument("-l", "--mpidr", type=lambda x: int(x, 0),
+ help="Multiprocessor Affinity Register")
+ g_arm.add_argument("-i", "--midr", type=lambda x: int(x, 0),
+ help="Main ID Register")
+ g_arm.add_argument("-r", "--running",
+ action=argparse.BooleanOptionalAction,
+ default=None,
+ help="Indicates if the processor is running or not")
+ g_arm.add_argument("--psci", "--psci-state",
+ type=lambda x: int(x, 0),
+ help="Power State Coordination Interface - PSCI state")
+
+ # TODO: Add vendor-specific support
+
+ # UEFI N.17 bitmaps (type and flags)
+ g_pei = parser.add_argument_group("ARM Processor Error Info (PEI)")
+ g_pei.add_argument("-t", "--type", nargs="+",
+ help=f"one or more error types: {error_types}")
+ g_pei.add_argument("-f", "--flags", nargs="*",
+ help=f"zero or more error flags: {flags}")
+ g_pei.add_argument("-V", "--pei-valid", "--error-valid", nargs="*",
+ help=f"zero or more PEI valid bits: {pei_valid_bits}")
+
+ # UEFI N.17 Integer values
+ g_pei.add_argument("-m", "--multiple-error", nargs="+",
+ help="Number of errors: 0: Single error, 1: Multiple errors, 2-65535: Error count if known")
+ g_pei.add_argument("-e", "--error-info", nargs="+",
+ help="Error information (UEFI 2.10 tables N.18 to N.20)")
+ g_pei.add_argument("-p", "--physical-address", nargs="+",
+ help="Physical address")
+ g_pei.add_argument("-v", "--virtual-address", nargs="+",
+ help="Virtual address")
+
+ # UEFI N.21 Context
+ g_ctx = parser.add_argument_group("Processor Context")
+ g_ctx.add_argument("--ctx-type", "--context-type", nargs="*",
+ help="Type of the context (0=ARM32 GPR, 5=ARM64 EL1, other values supported)")
+ g_ctx.add_argument("--ctx-size", "--context-size", nargs="*",
+ help="Minimal size of the context")
+ g_ctx.add_argument("--ctx-array", "--context-array", nargs="*",
+ help="Comma-separated arrays for each context")
+
+ # Vendor-specific data
+ g_vendor = parser.add_argument_group("Vendor-specific data")
+ g_vendor.add_argument("--vendor", "--vendor-specific", nargs="+",
+ help="Vendor-specific byte arrays of data")
+
+ # Add arguments for Generic Error Data
+ qmp.argparse(parser)
+
+ parser.set_defaults(func=self.send_cper)
+
+ def send_cper(self, args):
+ """Parse subcommand arguments and send a CPER via QMP"""
+
+ qmp_cmd = qmp(args.host, args.port, args.debug)
+
+ # Handle Generic Error Data arguments if any
+ qmp_cmd.set_args(args)
+
+ is_cpu_type = re.compile(r"^([\w+]+\-)?arm\-cpu$")
+ cpus = qmp_cmd.search_qom("/machine/unattached/device",
+ "type", is_cpu_type)
+
+ cper = {}
+ pei = {}
+ ctx = {}
+ vendor = {}
+
+ arg = vars(args)
+
+ # Handle global parameters
+ if args.arm:
+ arm_valid_init = False
+ cper["valid"] = util.get_choice(name="valid",
+ value=args.arm,
+ choices=self.arm_valid_bits,
+ suffixes=["-error", "-err"])
+ else:
+ cper["valid"] = 0
+ arm_valid_init = True
+
+ if "running" in arg:
+ if args.running:
+ cper["running-state"] = util.bit(0)
+ else:
+ cper["running-state"] = 0
+ else:
+ cper["running-state"] = 0
+
+ if arm_valid_init:
+ if args.affinity:
+ cper["valid"] |= self.arm_valid_bits["affinity"]
+
+ if args.mpidr:
+ cper["valid"] |= self.arm_valid_bits["mpidr"]
+
+ if "running-state" in cper:
+ cper["valid"] |= self.arm_valid_bits["running"]
+
+ if args.psci:
+ cper["valid"] |= self.arm_valid_bits["running"]
+
+ # Handle PEI
+ if not args.type:
+ args.type = ["cache-error"]
+
+ util.get_mult_choices(
+ pei,
+ name="valid",
+ values=args.pei_valid,
+ choices=self.pei_valid_bits,
+ suffixes=["-valid", "--addr"],
+ )
+ util.get_mult_choices(
+ pei,
+ name="type",
+ values=args.type,
+ choices=self.pei_error_types,
+ suffixes=["-error", "-err"],
+ )
+ util.get_mult_choices(
+ pei,
+ name="flags",
+ values=args.flags,
+ choices=self.pei_flags,
+ suffixes=["-error", "-cap"],
+ )
+ util.get_mult_int(pei, "error-info", args.error_info)
+ util.get_mult_int(pei, "multiple-error", args.multiple_error)
+ util.get_mult_int(pei, "phy-addr", args.physical_address)
+ util.get_mult_int(pei, "virt-addr", args.virtual_address)
+
+ # Handle context
+ util.get_mult_int(ctx, "type", args.ctx_type, allow_zero=True)
+ util.get_mult_int(ctx, "minimal-size", args.ctx_size, allow_zero=True)
+ util.get_mult_array(ctx, "register", args.ctx_array, allow_zero=True)
+
+ util.get_mult_array(vendor, "bytes", args.vendor, max_val=255)
+
+ # Store PEI
+ pei_data = bytearray()
+ default_flags = self.pei_flags["first"]
+ default_flags |= self.pei_flags["last"]
+
+ error_info_num = 0
+
+ for i, p in pei.items(): # pylint: disable=W0612
+ error_info_num += 1
+
+ # UEFI 2.10 doesn't define how to encode error information
+ # when multiple types are raised. So, provide a default only
+ # if a single type is there
+ if "error-info" not in p:
+ if p["type"] == util.bit(1):
+ p["error-info"] = 0x0091000F
+ if p["type"] == util.bit(2):
+ p["error-info"] = 0x0054007F
+ if p["type"] == util.bit(3):
+ p["error-info"] = 0x80D6460FFF
+ if p["type"] == util.bit(4):
+ p["error-info"] = 0x78DA03FF
+
+ if "valid" not in p:
+ p["valid"] = 0
+ if "multiple-error" in p:
+ p["valid"] |= self.pei_valid_bits["multiple-error"]
+
+ if "flags" in p:
+ p["valid"] |= self.pei_valid_bits["flags"]
+
+ if "error-info" in p:
+ p["valid"] |= self.pei_valid_bits["error-info"]
+
+ if "phy-addr" in p:
+ p["valid"] |= self.pei_valid_bits["phy-addr"]
+
+ if "virt-addr" in p:
+ p["valid"] |= self.pei_valid_bits["virt-addr"]
+
+ # Version
+ util.data_add(pei_data, 0, 1)
+
+ util.data_add(pei_data,
+ self.ACPI_GHES_ARM_CPER_PEI_LENGTH, 1)
+
+ util.data_add(pei_data, p["valid"], 2)
+ util.data_add(pei_data, p["type"], 1)
+ util.data_add(pei_data, p.get("multiple-error", 1), 2)
+ util.data_add(pei_data, p.get("flags", default_flags), 1)
+ util.data_add(pei_data, p.get("error-info", 0), 8)
+ util.data_add(pei_data, p.get("virt-addr", 0xDEADBEEF), 8)
+ util.data_add(pei_data, p.get("phy-addr", 0xABBA0BAD), 8)
+
+ # Store Context
+ ctx_data = bytearray()
+ context_info_num = 0
+
+ if ctx:
+ ret = qmp_cmd.send_cmd("query-target", may_open=True)
+
+ default_ctx = self.CONTEXT_MISC_REG
+
+ if "arch" in ret:
+ if ret["arch"] == "aarch64":
+ default_ctx = self.CONTEXT_AARCH64_EL1
+ elif ret["arch"] == "arm":
+ default_ctx = self.CONTEXT_AARCH32_EL1
+
+ for k in sorted(ctx.keys()):
+ context_info_num += 1
+
+ if "type" not in ctx[k]:
+ ctx[k]["type"] = default_ctx
+
+ if "register" not in ctx[k]:
+ ctx[k]["register"] = []
+
+ reg_size = len(ctx[k]["register"])
+ size = 0
+
+ if "minimal-size" in ctx:
+ size = ctx[k]["minimal-size"]
+
+ size = max(size, reg_size)
+
+ size = (size + 1) % 0xFFFE
+
+ # Version
+ util.data_add(ctx_data, 0, 2)
+
+ util.data_add(ctx_data, ctx[k]["type"], 2)
+
+ util.data_add(ctx_data, 8 * size, 4)
+
+ for r in ctx[k]["register"]:
+ util.data_add(ctx_data, r, 8)
+
+ for i in range(reg_size, size): # pylint: disable=W0612
+ util.data_add(ctx_data, 0, 8)
+
+ # Vendor-specific bytes are not grouped
+ vendor_data = bytearray()
+ if vendor:
+ for k in sorted(vendor.keys()):
+ for b in vendor[k]["bytes"]:
+ util.data_add(vendor_data, b, 1)
+
+ # Encode ARM Processor Error
+ data = bytearray()
+
+ util.data_add(data, cper["valid"], 4)
+
+ util.data_add(data, error_info_num, 2)
+ util.data_add(data, context_info_num, 2)
+
+ # Calculate the length of the CPER data
+ cper_length = self.ACPI_GHES_ARM_CPER_LENGTH
+ cper_length += len(pei_data)
+ cper_length += len(vendor_data)
+ cper_length += len(ctx_data)
+ util.data_add(data, cper_length, 4)
+
+ util.data_add(data, arg.get("affinity-level", 0), 1)
+
+ # Reserved
+ util.data_add(data, 0, 3)
+
+ if "midr-el1" not in arg:
+ if cpus:
+ cmd_arg = {
+ 'path': cpus[0],
+ 'property': "midr"
+ }
+ ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True)
+ if isinstance(ret, int):
+ arg["midr-el1"] = ret
+
+ util.data_add(data, arg.get("mpidr-el1", 0), 8)
+ util.data_add(data, arg.get("midr-el1", 0), 8)
+ util.data_add(data, cper["running-state"], 4)
+ util.data_add(data, arg.get("psci-state", 0), 4)
+
+ # Add PEI
+ data.extend(pei_data)
+ data.extend(ctx_data)
+ data.extend(vendor_data)
+
+ self.data = data
+
+ qmp_cmd.send_cper(cper_guid.CPER_PROC_ARM, self.data)
diff --git a/scripts/ghes_inject.py b/scripts/ghes_inject.py
new file mode 100755
index 000000000000..67cb6077bec8
--- /dev/null
+++ b/scripts/ghes_inject.py
@@ -0,0 +1,51 @@
+#!/usr/bin/env python3
+#
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+"""
+Handle ACPI GHESv2 error injection logic QEMU QMP interface.
+"""
+
+import argparse
+import sys
+
+from arm_processor_error import ArmProcessorEinj
+
+EINJ_DESC = """
+Handle ACPI GHESv2 error injection logic QEMU QMP interface.
+
+It allows using UEFI BIOS EINJ features to generate GHES records.
+
+It helps testing CPER and GHES drivers at the guest OS and how
+userspace applications at the guest handle them.
+"""
+
+def main():
+ """Main program"""
+
+ # Main parser - handle generic args like QEMU QMP TCP socket options
+ parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+ usage="%(prog)s [options]",
+ description=EINJ_DESC)
+
+ g_options = parser.add_argument_group("QEMU QMP socket options")
+ g_options.add_argument("-H", "--host", default="localhost", type=str,
+ help="host name")
+ g_options.add_argument("-P", "--port", default=4445, type=int,
+ help="TCP port number")
+ g_options.add_argument('-d', '--debug', action='store_true')
+
+ subparsers = parser.add_subparsers()
+
+ ArmProcessorEinj(subparsers)
+
+ args = parser.parse_args()
+ if "func" in args:
+ args.func(args)
+ else:
+ sys.exit(f"Please specify a valid command for {sys.argv[0]}")
+
+if __name__ == "__main__":
+ main()
diff --git a/scripts/qmp_helper.py b/scripts/qmp_helper.py
new file mode 100644
index 000000000000..357ebc6e8359
--- /dev/null
+++ b/scripts/qmp_helper.py
@@ -0,0 +1,702 @@
+#!/usr/bin/env python3
+#
+# # pylint: disable=C0103,E0213,E1135,E1136,E1137,R0902,R0903,R0912,R0913
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
+
+"""
+Helper classes to be used by ghes_inject command classes.
+"""
+
+import json
+import sys
+
+from datetime import datetime
+from os import path as os_path
+
+try:
+ qemu_dir = os_path.abspath(os_path.dirname(os_path.dirname(__file__)))
+ sys.path.append(os_path.join(qemu_dir, 'python'))
+
+ from qemu.qmp.legacy import QEMUMonitorProtocol
+
+except ModuleNotFoundError as exc:
+ print(f"Module '{exc.name}' not found.")
+ print("Try export PYTHONPATH=top-qemu-dir/python or run from top-qemu-dir")
+ sys.exit(1)
+
+from base64 import b64encode
+
+class util:
+ """
+ Ancillary functions to deal with bitmaps, parse arguments,
+ generate GUID and encode data on a bytearray buffer.
+ """
+
+ #
+ # Helper routines to handle multiple choice arguments
+ #
+ def get_choice(name, value, choices, suffixes=None, bitmask=True):
+ """Produce a list from multiple choice argument"""
+
+ new_values = 0
+
+ if not value:
+ return new_values
+
+ for val in value.split(","):
+ val = val.lower()
+
+ if suffixes:
+ for suffix in suffixes:
+ val = val.removesuffix(suffix)
+
+ if val not in choices.keys():
+ if suffixes:
+ for suffix in suffixes:
+ if val + suffix in choices.keys():
+ val += suffix
+ break
+
+ if val not in choices.keys():
+ sys.exit(f"Error on '{name}': choice '{val}' is invalid.")
+
+ val = choices[val]
+
+ if bitmask:
+ new_values |= val
+ else:
+ if new_values:
+ sys.exit(f"Error on '{name}': only one value is accepted.")
+
+ new_values = val
+
+ return new_values
+
+ def get_array(name, values, max_val=None):
+ """Add numbered hashes from integer lists into an array"""
+
+ array = []
+
+ for value in values:
+ for val in value.split(","):
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if max_val and val > max_val:
+ sys.exit(f"Error on '{name}': {val} is too little")
+
+ array.append(val)
+
+ return array
+
+ def get_mult_array(mult, name, values, allow_zero=False, max_val=None):
+ """Add numbered hashes from integer lists"""
+
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ if not values:
+ i = 0
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = []
+ return
+
+ i = 0
+ for value in values:
+ for val in value.split(","):
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if max_val and val > max_val:
+ sys.exit(f"Error on '{name}': {val} is too little")
+
+ if i not in mult:
+ mult[i] = {}
+
+ if name not in mult[i]:
+ mult[i][name] = []
+
+ mult[i][name].append(val)
+
+ i += 1
+
+
+ def get_mult_choices(mult, name, values, choices,
+ suffixes=None, allow_zero=False):
+ """Add numbered hashes from multiple choice arguments"""
+
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ i = 0
+ for val in values:
+ new_values = util.get_choice(name, val, choices, suffixes)
+
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = new_values
+ i += 1
+
+
+ def get_mult_int(mult, name, values, allow_zero=False):
+ """Add numbered hashes from integer arguments"""
+ if not allow_zero:
+ if not values:
+ return
+ else:
+ if values is None:
+ return
+
+ i = 0
+ for val in values:
+ try:
+ val = int(val, 0)
+ except ValueError:
+ sys.exit(f"Error on '{name}': {val} is not an integer")
+
+ if val < 0:
+ sys.exit(f"Error on '{name}': {val} is not unsigned")
+
+ if i not in mult:
+ mult[i] = {}
+
+ mult[i][name] = val
+ i += 1
+
+
+ #
+ # Data encode helper functions
+ #
+ def bit(b):
+ """Simple macro to define a bit on a bitmask"""
+ return 1 << b
+
+
+ def data_add(data, value, num_bytes):
+ """Adds bytes from value inside a bitarray"""
+
+ data.extend(value.to_bytes(num_bytes, byteorder="little")) # pylint: disable=E1101
+
+ def dump_bytearray(name, data):
+ """Does an hexdump of a byte array, grouping in bytes"""
+
+ print(f"{name} ({len(data)} bytes):")
+
+ for ln_start in range(0, len(data), 16):
+ ln_end = min(ln_start + 16, len(data))
+ print(f" {ln_start:08x} ", end="")
+ for i in range(ln_start, ln_end):
+ print(f"{data[i]:02x} ", end="")
+ for i in range(ln_end, ln_start + 16):
+ print(" ", end="")
+ print(" ", end="")
+ for i in range(ln_start, ln_end):
+ if data[i] >= 32 and data[i] < 127:
+ print(chr(data[i]), end="")
+ else:
+ print(".", end="")
+
+ print()
+ print()
+
+ def time(string):
+ """Handle BCD timestamps used on Generic Error Data Block"""
+
+ time = None
+
+ # Formats to be used when parsing time stamps
+ formats = [
+ "%Y-%m-%d %H:%M:%S",
+ ]
+
+ if string == "now":
+ time = datetime.now()
+
+ if time is None:
+ for fmt in formats:
+ try:
+ time = datetime.strptime(string, fmt)
+ break
+ except ValueError:
+ pass
+
+ if time is None:
+ raise ValueError("Invalid time format")
+
+ return time
+
+class guid:
+ """
+ Simple class to handle GUID fields.
+ """
+
+ def __init__(self, time_low, time_mid, time_high, nodes):
+ """Initialize a GUID value"""
+
+ assert len(nodes) == 8
+
+ self.time_low = time_low
+ self.time_mid = time_mid
+ self.time_high = time_high
+ self.nodes = nodes
+
+ @classmethod
+ def UUID(cls, guid_str):
+ """Initialize a GUID using a string on its standard format"""
+
+ if len(guid_str) != 36:
+ print("Size not 36")
+ raise ValueError('Invalid GUID size')
+
+ # It is easier to parse without separators. So, drop them
+ guid_str = guid_str.replace('-', '')
+
+ if len(guid_str) != 32:
+ print("Size not 32", guid_str, len(guid_str))
+ raise ValueError('Invalid GUID hex size')
+
+ time_low = 0
+ time_mid = 0
+ time_high = 0
+ nodes = []
+
+ for i in reversed(range(16, 32, 2)):
+ h = guid_str[i:i + 2]
+ value = int(h, 16)
+ nodes.insert(0, value)
+
+ time_high = int(guid_str[12:16], 16)
+ time_mid = int(guid_str[8:12], 16)
+ time_low = int(guid_str[0:8], 16)
+
+ return cls(time_low, time_mid, time_high, nodes)
+
+ def __str__(self):
+ """Output a GUID value on its default string representation"""
+
+ clock = self.nodes[0] << 8 | self.nodes[1]
+
+ node = 0
+ for i in range(2, len(self.nodes)):
+ node = node << 8 | self.nodes[i]
+
+ s = f"{self.time_low:08x}-{self.time_mid:04x}-"
+ s += f"{self.time_high:04x}-{clock:04x}-{node:012x}"
+ return s
+
+ def to_bytes(self):
+ """Output a GUID value in bytes"""
+
+ data = bytearray()
+
+ util.data_add(data, self.time_low, 4)
+ util.data_add(data, self.time_mid, 2)
+ util.data_add(data, self.time_high, 2)
+ data.extend(bytearray(self.nodes))
+
+ return data
+
+class qmp:
+ """
+ Opens a connection and send/receive QMP commands.
+ """
+
+ def send_cmd(self, command, args=None, may_open=False, return_error=True):
+ """Send a command to QMP, optinally opening a connection"""
+
+ if may_open:
+ self._connect()
+ elif not self.connected:
+ return False
+
+ msg = { 'execute': command }
+ if args:
+ msg['arguments'] = args
+
+ try:
+ obj = self.qmp_monitor.cmd_obj(msg)
+ # Can we use some other exception class here?
+ except Exception as e: # pylint: disable=W0718
+ print(f"Command: {command}")
+ print(f"Failed to inject error: {e}.")
+ return None
+
+ if "return" in obj:
+ if isinstance(obj.get("return"), dict):
+ if obj["return"]:
+ return obj["return"]
+ return "OK"
+
+ return obj["return"]
+
+ if isinstance(obj.get("error"), dict):
+ error = obj["error"]
+ if return_error:
+ print(f"Command: {msg}")
+ print(f'{error["class"]}: {error["desc"]}')
+ else:
+ print(json.dumps(obj))
+
+ return None
+
+ def _close(self):
+ """Shutdown and close the socket, if opened"""
+ if not self.connected:
+ return
+
+ self.qmp_monitor.close()
+ self.connected = False
+
+ def _connect(self):
+ """Connect to a QMP TCP/IP port, if not connected yet"""
+
+ if self.connected:
+ return True
+
+ try:
+ self.qmp_monitor.connect(negotiate=True)
+ except ConnectionError:
+ sys.exit(f"Can't connect to QMP host {self.host}:{self.port}")
+
+ self.connected = True
+
+ return True
+
+ BLOCK_STATUS_BITS = {
+ "uncorrectable": util.bit(0),
+ "correctable": util.bit(1),
+ "multi-uncorrectable": util.bit(2),
+ "multi-correctable": util.bit(3),
+ }
+
+ ERROR_SEVERITY = {
+ "recoverable": 0,
+ "fatal": 1,
+ "corrected": 2,
+ "none": 3,
+ }
+
+ VALIDATION_BITS = {
+ "fru-id": util.bit(0),
+ "fru-text": util.bit(1),
+ "timestamp": util.bit(2),
+ }
+
+ GEDB_FLAGS_BITS = {
+ "recovered": util.bit(0),
+ "prev-error": util.bit(1),
+ "simulated": util.bit(2),
+ }
+
+ GENERIC_DATA_SIZE = 72
+
+ def argparse(parser):
+ """Prepare a parser group to query generic error data"""
+
+ block_status_bits = ",".join(qmp.BLOCK_STATUS_BITS.keys())
+ error_severity_enum = ",".join(qmp.ERROR_SEVERITY.keys())
+ validation_bits = ",".join(qmp.VALIDATION_BITS.keys())
+ gedb_flags_bits = ",".join(qmp.GEDB_FLAGS_BITS.keys())
+
+ g_gen = parser.add_argument_group("Generic Error Data") # pylint: disable=E1101
+ g_gen.add_argument("--block-status",
+ help=f"block status bits: {block_status_bits}")
+ g_gen.add_argument("--raw-data", nargs="+",
+ help="Raw data inside the Error Status Block")
+ g_gen.add_argument("--error-severity", "--severity",
+ help=f"error severity: {error_severity_enum}")
+ g_gen.add_argument("--gen-err-valid-bits",
+ "--generic-error-validation-bits",
+ help=f"validation bits: {validation_bits}")
+ g_gen.add_argument("--fru-id", type=guid.UUID,
+ help="GUID representing a physical device")
+ g_gen.add_argument("--fru-text",
+ help="ASCII string identifying the FRU hardware")
+ g_gen.add_argument("--timestamp", type=util.time,
+ help="Time when the error info was collected")
+ g_gen.add_argument("--precise", "--precise-timestamp",
+ action='store_true',
+ help="Marks the timestamp as precise if --timestamp is used")
+ g_gen.add_argument("--gedb-flags",
+ help=f"General Error Data Block flags: {gedb_flags_bits}")
+
+ def set_args(self, args):
+ """Set the arguments optionally defined via self.argparse()"""
+
+ if args.block_status:
+ self.block_status = util.get_choice(name="block-status",
+ value=args.block_status,
+ choices=self.BLOCK_STATUS_BITS,
+ bitmask=False)
+ if args.raw_data:
+ self.raw_data = util.get_array("raw-data", args.raw_data,
+ max_val=255)
+ print(self.raw_data)
+
+ if args.error_severity:
+ self.error_severity = util.get_choice(name="error-severity",
+ value=args.error_severity,
+ choices=self.ERROR_SEVERITY,
+ bitmask=False)
+
+ if args.fru_id:
+ self.fru_id = args.fru_id.to_bytes()
+ if not args.gen_err_valid_bits:
+ self.validation_bits |= self.VALIDATION_BITS["fru-id"]
+
+ if args.fru_text:
+ text = bytearray(args.fru_text.encode('ascii'))
+ if len(text) > 20:
+ sys.exit("FRU text is too big to fit")
+
+ self.fru_text = text
+ if not args.gen_err_valid_bits:
+ self.validation_bits |= self.VALIDATION_BITS["fru-text"]
+
+ if args.timestamp:
+ time = args.timestamp
+ century = int(time.year / 100)
+
+ bcd = bytearray()
+ util.data_add(bcd, (time.second // 10) << 4 | (time.second % 10), 1)
+ util.data_add(bcd, (time.minute // 10) << 4 | (time.minute % 10), 1)
+ util.data_add(bcd, (time.hour // 10) << 4 | (time.hour % 10), 1)
+
+ if args.precise:
+ util.data_add(bcd, 1, 1)
+ else:
+ util.data_add(bcd, 0, 1)
+
+ util.data_add(bcd, (time.day // 10) << 4 | (time.day % 10), 1)
+ util.data_add(bcd, (time.month // 10) << 4 | (time.month % 10), 1)
+ util.data_add(bcd,
+ ((time.year % 100) // 10) << 4 | (time.year % 10), 1)
+ util.data_add(bcd, ((century % 100) // 10) << 4 | (century % 10), 1)
+
+ self.timestamp = bcd
+ if not args.gen_err_valid_bits:
+ self.validation_bits |= self.VALIDATION_BITS["timestamp"]
+
+ if args.gen_err_valid_bits:
+ self.validation_bits = util.get_choice(name="validation",
+ value=args.gen_err_valid_bits,
+ choices=self.VALIDATION_BITS)
+
+ def __init__(self, host, port, debug=False):
+ """Initialize variables used by the QMP send logic"""
+
+ self.connected = False
+ self.host = host
+ self.port = port
+ self.debug = debug
+
+ # ACPI 6.1: 18.3.2.7.1 Generic Error Data: Generic Error Status Block
+ self.block_status = self.BLOCK_STATUS_BITS["uncorrectable"]
+ self.raw_data = []
+ self.error_severity = self.ERROR_SEVERITY["recoverable"]
+
+ # ACPI 6.1: 18.3.2.7.1 Generic Error Data: Generic Error Data Entry
+ self.validation_bits = 0
+ self.flags = 0
+ self.fru_id = bytearray(16)
+ self.fru_text = bytearray(20)
+ self.timestamp = bytearray(8)
+
+ self.qmp_monitor = QEMUMonitorProtocol(address=(self.host, self.port))
+
+ #
+ # Socket QMP send command
+ #
+ def send_cper_raw(self, cper_data):
+ """Send a raw CPER data to QEMU though QMP TCP socket"""
+
+ data = b64encode(bytes(cper_data)).decode('ascii')
+
+ cmd_arg = {
+ 'cper': data
+ }
+
+ self._connect()
+
+ if self.send_cmd("inject-ghes-error", cmd_arg):
+ print("Error injected.")
+
+ def send_cper(self, notif_type, payload):
+ """Send commands to QEMU though QMP TCP socket"""
+
+ # Fill CPER record header
+
+ # NOTE: bits 4 to 13 of block status contain the number of
+ # data entries in the data section. This is currently unsupported.
+
+ cper_length = len(payload)
+ data_length = cper_length + len(self.raw_data) + self.GENERIC_DATA_SIZE
+
+ # Generic Error Data Entry
+ gede = bytearray()
+
+ gede.extend(notif_type.to_bytes())
+ util.data_add(gede, self.error_severity, 4)
+ util.data_add(gede, 0x300, 2)
+ util.data_add(gede, self.validation_bits, 1)
+ util.data_add(gede, self.flags, 1)
+ util.data_add(gede, cper_length, 4)
+ gede.extend(self.fru_id)
+ gede.extend(self.fru_text)
+ gede.extend(self.timestamp)
+
+ # Generic Error Status Block
+ gebs = bytearray()
+
+ if self.raw_data:
+ raw_data_offset = len(gebs)
+ else:
+ raw_data_offset = 0
+
+ util.data_add(gebs, self.block_status, 4)
+ util.data_add(gebs, raw_data_offset, 4)
+ util.data_add(gebs, len(self.raw_data), 4)
+ util.data_add(gebs, data_length, 4)
+ util.data_add(gebs, self.error_severity, 4)
+
+ cper_data = bytearray()
+ cper_data.extend(gebs)
+ cper_data.extend(gede)
+ cper_data.extend(bytearray(self.raw_data))
+ cper_data.extend(bytearray(payload))
+
+ if self.debug:
+ print(f"GUID: {notif_type}")
+
+ util.dump_bytearray("Generic Error Status Block", gebs)
+ util.dump_bytearray("Generic Error Data Entry", gede)
+
+ if self.raw_data:
+ util.dump_bytearray("Raw data", bytearray(self.raw_data))
+
+ util.dump_bytearray("Payload", payload)
+
+ self.send_cper_raw(cper_data)
+
+
+ def search_qom(self, path, prop, regex):
+ """
+ Return a list of devices that match path array like:
+
+ /machine/unattached/device
+ /machine/peripheral-anon/device
+ ...
+ """
+
+ found = []
+
+ i = 0
+ while 1:
+ dev = f"{path}[{i}]"
+ args = {
+ 'path': dev,
+ 'property': prop
+ }
+ ret = self.send_cmd("qom-get", args, may_open=True, return_error=False)
+ if not ret:
+ break
+
+ if isinstance(ret, str):
+ if regex.search(ret):
+ found.append(dev)
+
+ i += 1
+ if i > 10000:
+ print("Too many objects returned by qom-get!")
+ break
+
+ return found
+
+class cper_guid:
+ """
+ Contains CPER GUID, as per:
+ https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html
+ """
+
+ CPER_PROC_GENERIC = guid(0x9876CCAD, 0x47B4, 0x4bdb,
+ [0xB6, 0x5E, 0x16, 0xF1,
+ 0x93, 0xC4, 0xF3, 0xDB])
+
+ CPER_PROC_X86 = guid(0xDC3EA0B0, 0xA144, 0x4797,
+ [0xB9, 0x5B, 0x53, 0xFA,
+ 0x24, 0x2B, 0x6E, 0x1D])
+
+ CPER_PROC_ITANIUM = guid(0xe429faf1, 0x3cb7, 0x11d4,
+ [0xbc, 0xa7, 0x00, 0x80,
+ 0xc7, 0x3c, 0x88, 0x81])
+
+ CPER_PROC_ARM = guid(0xE19E3D16, 0xBC11, 0x11E4,
+ [0x9C, 0xAA, 0xC2, 0x05,
+ 0x1D, 0x5D, 0x46, 0xB0])
+
+ CPER_PLATFORM_MEM = guid(0xA5BC1114, 0x6F64, 0x4EDE,
+ [0xB8, 0x63, 0x3E, 0x83,
+ 0xED, 0x7C, 0x83, 0xB1])
+
+ CPER_PLATFORM_MEM2 = guid(0x61EC04FC, 0x48E6, 0xD813,
+ [0x25, 0xC9, 0x8D, 0xAA,
+ 0x44, 0x75, 0x0B, 0x12])
+
+ CPER_PCIE = guid(0xD995E954, 0xBBC1, 0x430F,
+ [0xAD, 0x91, 0xB4, 0x4D,
+ 0xCB, 0x3C, 0x6F, 0x35])
+
+ CPER_PCI_BUS = guid(0xC5753963, 0x3B84, 0x4095,
+ [0xBF, 0x78, 0xED, 0xDA,
+ 0xD3, 0xF9, 0xC9, 0xDD])
+
+ CPER_PCI_DEV = guid(0xEB5E4685, 0xCA66, 0x4769,
+ [0xB6, 0xA2, 0x26, 0x06,
+ 0x8B, 0x00, 0x13, 0x26])
+
+ CPER_FW_ERROR = guid(0x81212A96, 0x09ED, 0x4996,
+ [0x94, 0x71, 0x8D, 0x72,
+ 0x9C, 0x8E, 0x69, 0xED])
+
+ CPER_DMA_GENERIC = guid(0x5B51FEF7, 0xC79D, 0x4434,
+ [0x8F, 0x1B, 0xAA, 0x62,
+ 0xDE, 0x3E, 0x2C, 0x64])
+
+ CPER_DMA_VT = guid(0x71761D37, 0x32B2, 0x45cd,
+ [0xA7, 0xD0, 0xB0, 0xFE,
+ 0xDD, 0x93, 0xE8, 0xCF])
+
+ CPER_DMA_IOMMU = guid(0x036F84E1, 0x7F37, 0x428c,
+ [0xA7, 0x9E, 0x57, 0x5F,
+ 0xDF, 0xAA, 0x84, 0xEC])
+
+ CPER_CCIX_PER = guid(0x91335EF6, 0xEBFB, 0x4478,
+ [0xA6, 0xA6, 0x88, 0xB7,
+ 0x28, 0xCF, 0x75, 0xD7])
+
+ CPER_CXL_PROT_ERR = guid(0x80B9EFB4, 0x52B5, 0x4DE3,
+ [0xA7, 0x77, 0x68, 0x78,
+ 0x4B, 0x77, 0x10, 0x48])
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 20/21] target/arm: add an experimental mpidr arm cpu property object
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (18 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 19/21] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-14 6:13 ` [PATCH v10 21/21] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab,
Peter Maydell, linux-kernel, qemu-arm, qemu-devel
Accurately injecting an ARM Processor error ACPI/APEI GHES
error record requires the value of the ARM Multiprocessor
Affinity Register (mpidr).
While ARM implements it, this is currently not visible.
Add a field at CPU storing it, and place it at arm_cpu_properties
as experimental, thus allowing it to be queried via QMP using
qom-get function.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
target/arm/cpu.c | 1 +
target/arm/cpu.h | 1 +
target/arm/helper.c | 10 ++++++++--
3 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 19191c239181..30fcf0a10f46 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2619,6 +2619,7 @@ static ObjectClass *arm_cpu_class_by_name(const char *cpu_model)
static Property arm_cpu_properties[] = {
DEFINE_PROP_UINT64("midr", ARMCPU, midr, 0),
+ DEFINE_PROP_UINT64("x-mpidr", ARMCPU, mpidr, 0),
DEFINE_PROP_UINT64("mp-affinity", ARMCPU,
mp_affinity, ARM64_AFFINITY_INVALID),
DEFINE_PROP_INT32("node-id", ARMCPU, node_id, CPU_UNSET_NUMA_NODE_ID),
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f065756c5c7d..bf8e5943af4f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1033,6 +1033,7 @@ struct ArchCPU {
uint64_t reset_pmcr_el0;
} isar;
uint64_t midr;
+ uint64_t mpidr;
uint32_t revidr;
uint32_t reset_fpsid;
uint64_t ctr;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0a582c1cd3b3..d6e7aa069489 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4690,7 +4690,7 @@ static uint64_t mpidr_read_val(CPUARMState *env)
return mpidr;
}
-static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+static uint64_t mpidr_read(CPUARMState *env)
{
unsigned int cur_el = arm_current_el(env);
@@ -4700,6 +4700,11 @@ static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
return mpidr_read_val(env);
}
+static uint64_t mpidr_read_ri(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+ return mpidr_read(env);
+}
+
static const ARMCPRegInfo lpae_cp_reginfo[] = {
/* NOP AMAIR0/1 */
{ .name = "AMAIR0", .state = ARM_CP_STATE_BOTH,
@@ -9721,7 +9726,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
{ .name = "MPIDR_EL1", .state = ARM_CP_STATE_BOTH,
.opc0 = 3, .crn = 0, .crm = 0, .opc1 = 0, .opc2 = 5,
.fgt = FGT_MPIDR_EL1,
- .access = PL1_R, .readfn = mpidr_read, .type = ARM_CP_NO_RAW },
+ .access = PL1_R, .readfn = mpidr_read_ri, .type = ARM_CP_NO_RAW },
};
#ifdef CONFIG_USER_ONLY
static const ARMCPRegUserSpaceInfo mpidr_user_cp_reginfo[] = {
@@ -9731,6 +9736,7 @@ void register_cp_regs_for_features(ARMCPU *cpu)
modify_arm_cp_regs(mpidr_cp_reginfo, mpidr_user_cp_reginfo);
#endif
define_arm_cp_regs(cpu, mpidr_cp_reginfo);
+ cpu->mpidr = mpidr_read(env);
}
if (arm_feature(env, ARM_FEATURE_AUXCR)) {
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v10 21/21] scripts/arm_processor_error.py: retrieve mpidr if not filled
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (19 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 20/21] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
@ 2024-09-14 6:13 ` Mauro Carvalho Chehab
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
21 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-14 6:13 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Mauro Carvalho Chehab, Cleber Rosa,
John Snow, linux-kernel, qemu-devel
Add support to retrieve mpidr value via qom-get.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
scripts/arm_processor_error.py | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/scripts/arm_processor_error.py b/scripts/arm_processor_error.py
index 62e0c5662232..0a16d4f0d8b1 100644
--- a/scripts/arm_processor_error.py
+++ b/scripts/arm_processor_error.py
@@ -5,12 +5,10 @@
#
# Copyright (C) 2024 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-# TODO: current implementation has dummy defaults.
-#
-# For a better implementation, a QMP addition/call is needed to
-# retrieve some data for ARM Processor Error injection:
-#
-# - ARM registers: power_state, mpidr.
+# Note: currently it lacks a method to fill the ARM Processor Error CPER
+# psci field from emulation. On a real hardware, this is filled only
+# when a CPU is not running. Implementing support for it to simulate a
+# real hardware is not trivial.
import argparse
import re
@@ -174,11 +172,24 @@ def send_cper(self, args):
else:
cper["running-state"] = 0
+ if args.mpidr:
+ cper["mpidr-el1"] = arg["mpidr"]
+ elif cpus:
+ cmd_arg = {
+ 'path': cpus[0],
+ 'property': "x-mpidr"
+ }
+ ret = qmp_cmd.send_cmd("qom-get", cmd_arg, may_open=True)
+ if isinstance(ret, int):
+ cper["mpidr-el1"] = ret
+ else:
+ cper["mpidr-el1"] = 0
+
if arm_valid_init:
if args.affinity:
cper["valid"] |= self.arm_valid_bits["affinity"]
- if args.mpidr:
+ if "mpidr-el1" in cper:
cper["valid"] |= self.arm_valid_bits["mpidr"]
if "running-state" in cper:
@@ -362,7 +373,7 @@ def send_cper(self, args):
if isinstance(ret, int):
arg["midr-el1"] = ret
- util.data_add(data, arg.get("mpidr-el1", 0), 8)
+ util.data_add(data, cper["mpidr-el1"], 8)
util.data_add(data, arg.get("midr-el1", 0), 8)
util.data_add(data, cper["running-state"], 4)
util.data_add(data, arg.get("psci-state", 0), 4)
--
2.46.0
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-14 6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
` (20 preceding siblings ...)
2024-09-14 6:13 ` [PATCH v10 21/21] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
@ 2024-09-17 12:15 ` Igor Mammedov
2024-09-24 13:00 ` Mauro Carvalho Chehab
21 siblings, 1 reply; 40+ messages in thread
From: Igor Mammedov @ 2024-09-17 12:15 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow,
Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
Shannon Zhao, kvm, linux-kernel, qemu-arm, qemu-devel
On Sat, 14 Sep 2024 08:13:21 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> This series add support for injecting generic CPER records. Such records
> are generated outside QEMU via a provided script.
>
> On this version, the patch reworking the way offsets are calculated were
> split on several other patches, to make one logical change per patch and
> make review easier.
>
> Despite the number of patches increased from 12 to 21, there is just one
> real new patch (as the other ones are a split from a big change):
>
> acpi/generic_event_device: Update GHES migration to cover hest addr
I'm done with this round of review.
Given that the series accumulated a bunch of cleanups,
I'd suggest to move all cleanups/renamings not related
to new HEST lookup and new src id mapping to the beginning
of the series, so once they reviewed they could be split up into
a separate series that could be merged while we are ironing down
the new functionality.
> ---
>
> v10:
> - Patch 1 split on several patches to make reviews easier;
> - Added a migration patch;
> - CPER QMP command was renamed;
> - Updated some comments to better reflect exact ACPI version;
> - Removed a code to reset acks when OSPM fails to read records;
> - Removed a duplicated config GHES_CPER symbol;
> - There is now an arch-independent namespace for GHES source IDs;
> - Fixed the size of hest_ghes_notify array when creating tables;
> - acpi-hest.json is now a section of ACPI;
> - QMP command renamed from @ghes-cper to inject-ghes-error.
>
> v9:
> - Patches reorganized to make easier for reviewers;
> - source ID is now guest-OS specific;
> - Some patches got a revision history since v8;
> - Several minor cleanups.
>
> v8:
> - Fix one of the BIOS links that were incorrect;
> - Changed mem error internal injection to use a common code;
> - No more hardcoded values for CPER: instead of using just the
> payload at the QAPI, it now has the full raw CPER there;
> - Error injection script now supports changing fields at the
> Generic Error Data section of the CPER;
> - Several minor cleanups.
>
> v7:
> - Change the way offsets are calculated and used on HEST table.
> Now, it is compatible with migrations as all offsets are relative
> to the HEST table;
> - GHES interface is now more generic: the entire CPER is sent via
> QMP, instead of just the payload;
> - Some code cleanups to make the code more robust;
> - The python script now uses QEMUMonitorProtocol class.
>
> v6:
> - PNP0C33 device creation moved to aml-build.c;
> - acpi_ghes record functions now use ACPI notify parameter,
> instead of source ID;
> - the number of source IDs is now automatically calculated;
> - some code cleanups and function/var renames;
> - some fixes and cleanups at the error injection script;
> - ghes cper stub now produces an error if cper JSON is not compiled;
> - Offset calculation logic for GHES was refactored;
> - Updated documentation to reflect the GHES allocated size;
> - Added a x-mpidr object for QOM usage;
> - Added a patch making usage of x-mpidr field at ARM injection
> script;
>
> v5:
> - CPER guid is now passing as string;
> - raw-data is now passed with base64 encode;
> - Removed several GPIO left-overs from arm/virt.c changes;
> - Lots of cleanups and improvements at the error injection script.
> It now better handles QMP dialog and doesn't print debug messages.
> Also, code was split on two modules, to make easier to add more
> error injection commands.
>
> v4:
> - CPER generation moved to happen outside QEMU;
> - One patch adding support for mpidr query was removed.
>
> v3:
> - patch 1 cleanups with some comment changes and adding another place where
> the poweroff GPIO define should be used. No changes on other patches (except
> due to conflict resolution).
>
> v2:
> - added a new patch using a define for GPIO power pin;
> - patch 2 changed to also use a define for generic error GPIO pin;
> - a couple cleanups at patch 2 removing uneeded else clauses.
>
> Example of generating a CPER record:
>
> $ scripts/ghes_inject.py -d arm -p 0xdeadbeef
> GUID: e19e3d16-bc11-11e4-9caa-c2051d5d46b0
> Generic Error Status Block (20 bytes):
> 00000000 01 00 00 00 00 00 00 00 00 00 00 00 90 00 00 00 ................
> 00000010 00 00 00 00 ....
>
> Generic Error Data Entry (72 bytes):
> 00000000 16 3d 9e e1 11 bc e4 11 9c aa c2 05 1d 5d 46 b0 .=...........]F.
> 00000010 00 00 00 00 00 03 00 00 48 00 00 00 00 00 00 00 ........H.......
> 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00000040 00 00 00 00 00 00 00 00 ........
>
> Payload (72 bytes):
> 00000000 05 00 00 00 01 00 00 00 48 00 00 00 00 00 00 00 ........H.......
> 00000010 00 00 00 80 00 00 00 00 10 05 0f 00 00 00 00 00 ................
> 00000020 00 00 00 00 00 00 00 00 00 20 14 00 02 01 00 03 ......... ......
> 00000030 0f 00 91 00 00 00 00 00 ef be ad de 00 00 00 00 ................
> 00000040 ef be ad de 00 00 00 00 ........
>
> Error injected.
>
> [ 9.358364] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [ 9.359027] {1}[Hardware Error]: event severity: recoverable
> [ 9.359586] {1}[Hardware Error]: Error 0, type: recoverable
> [ 9.360124] {1}[Hardware Error]: section_type: ARM processor error
> [ 9.360561] {1}[Hardware Error]: MIDR: 0x00000000000f0510
> [ 9.361160] {1}[Hardware Error]: Multiprocessor Affinity Register (MPIDR): 0x0000000080000000
> [ 9.361643] {1}[Hardware Error]: running state: 0x0
> [ 9.362142] {1}[Hardware Error]: Power State Coordination Interface state: 0
> [ 9.362682] {1}[Hardware Error]: Error info structure 0:
> [ 9.363030] {1}[Hardware Error]: num errors: 2
> [ 9.363656] {1}[Hardware Error]: error_type: 0x02: cache error
> [ 9.364163] {1}[Hardware Error]: error_info: 0x000000000091000f
> [ 9.364834] {1}[Hardware Error]: transaction type: Data Access
> [ 9.365599] {1}[Hardware Error]: cache error, operation type: Data write
> [ 9.366441] {1}[Hardware Error]: cache level: 2
> [ 9.367005] {1}[Hardware Error]: processor context not corrupted
> [ 9.367753] {1}[Hardware Error]: physical fault address: 0x00000000deadbeef
> [ 9.374267] Memory failure: 0xdeadb: recovery action for free buddy page: Recovered
>
> Such script currently supports arm processor error CPER, but can easily be
> extended to other GHES notification types.
>
>
> Mauro Carvalho Chehab (21):
> acpi/ghes: add a firmware file with HEST address
> acpi/generic_event_device: Update GHES migration to cover hest addr
> acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
> acpi/ghes: simplify acpi_ghes_record_errors() code
> acpi/ghes: better handle source_id and notification
> acpi/ghes: Remove a duplicated out of bounds check
> acpi/ghes: rework the logic to handle HEST source ID
> acpi/ghes: Change the type for source_id
> acpi/ghes: Don't hardcode the number of sources on ghes
> acpi/ghes: make the GHES record generation more generic
> acpi/ghes: don't crash QEMU if ghes GED is not found
> acpi/ghes: rename etc/hardware_error file macros
> acpi/ghes: better name GHES memory error function
> acpi/ghes: add a notifier to notify when error data is ready
> acpi/generic_event_device: add an APEI error device
> arm/virt: Wire up a GED error device for ACPI / GHES
> qapi/acpi-hest: add an interface to do generic CPER error injection
> docs: acpi_hest_ghes: fix documentation for CPER size
> scripts/ghes_inject: add a script to generate GHES error inject
> target/arm: add an experimental mpidr arm cpu property object
> scripts/arm_processor_error.py: retrieve mpidr if not filled
>
> MAINTAINERS | 10 +
> docs/specs/acpi_hest_ghes.rst | 6 +-
> hw/acpi/Kconfig | 5 +
> hw/acpi/aml-build.c | 10 +
> hw/acpi/generic_event_device.c | 19 +-
> hw/acpi/ghes-stub.c | 2 +-
> hw/acpi/ghes.c | 312 +++++++----
> hw/acpi/ghes_cper.c | 32 ++
> hw/acpi/ghes_cper_stub.c | 19 +
> hw/acpi/meson.build | 2 +
> hw/arm/virt-acpi-build.c | 12 +-
> hw/arm/virt.c | 19 +-
> include/hw/acpi/acpi_dev_interface.h | 1 +
> include/hw/acpi/aml-build.h | 2 +
> include/hw/acpi/generic_event_device.h | 1 +
> include/hw/acpi/ghes.h | 37 +-
> include/hw/arm/virt.h | 2 +
> qapi/acpi-hest.json | 35 ++
> qapi/meson.build | 1 +
> qapi/qapi-schema.json | 1 +
> scripts/arm_processor_error.py | 388 ++++++++++++++
> scripts/ghes_inject.py | 51 ++
> scripts/qmp_helper.py | 702 +++++++++++++++++++++++++
> target/arm/cpu.c | 1 +
> target/arm/cpu.h | 1 +
> target/arm/helper.c | 10 +-
> target/arm/kvm.c | 3 +-
> 27 files changed, 1552 insertions(+), 132 deletions(-)
> create mode 100644 hw/acpi/ghes_cper.c
> create mode 100644 hw/acpi/ghes_cper_stub.c
> create mode 100644 qapi/acpi-hest.json
> create mode 100644 scripts/arm_processor_error.py
> create mode 100755 scripts/ghes_inject.py
> create mode 100644 scripts/qmp_helper.py
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
@ 2024-09-24 13:00 ` Mauro Carvalho Chehab
2024-09-24 13:14 ` Igor Mammedov
0 siblings, 1 reply; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-24 13:00 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow,
Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
Shannon Zhao, kvm, linux-kernel, qemu-arm, qemu-devel
Em Tue, 17 Sep 2024 14:15:19 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> I'm done with this round of review.
>
> Given that the series accumulated a bunch of cleanups,
> I'd suggest to move all cleanups/renamings not related
> to new HEST lookup and new src id mapping to the beginning
> of the series, so once they reviewed they could be split up into
> a separate series that could be merged while we are ironing down
> the new functionality.
I've rebased the series placing the preparation stuff (cleanups
and renames) at the beginning. So, what I have now is:
1) preparation patches:
41709f0898e1 acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
5409daa41c78 acpi/ghes: simplify acpi_ghes_record_errors() code
2539f1f662b9 acpi/ghes: better handle source_id and notification
3f19400549c1 acpi/ghes: Remove a duplicated out of bounds check
f0b06ecede46 acpi/ghes: Change the type for source_id
9f08301ac195 acpi/ghes: Prepare to support multiple sources on ghes
2426cd76e868 acpi/ghes: make the GHES record generation more generic
3fb7ec864700 acpi/ghes: better name GHES memory error function
1a22dad3211e acpi/ghes: don't crash QEMU if ghes GED is not found
726968d4ee20 acpi/ghes: rename etc/hardware_error file macros
f562380da7ce docs: acpi_hest_ghes: fix documentation for CPER size
69850f550f99 acpi/generic_event_device: add an APEI error device
Patches were changed to ensure that they won't be add any new
new features. They are just code shift in order to make the diff
of the next patches smaller.
There is a small point here: the logic was simplified to only
support a single source ID (I added an assert() to enforce it) and
simplified the calculus in preparation for the HEST and migration
series.
2) add a BIOS pointer to HEST, using it. The migration stuff
will be along those:
c24f1a8708e3 acpi/ghes: add a firmware file with HEST address
853dce23ec39 acpi/ghes: Use HEST table offsets when preparing GHES records
c148716fd7c8 acpi/generic_event_device: Update GHES migration to cover hest addr
Up to that, still no new features, but the offset calculus will be
relative to HEST table and will use the bios pointers stored there;
3) Add support for generic error inject:
f5ec0d197d82 acpi/ghes: add a notifier to notify when error data is ready
f5e015537209 arm/virt: Wire up a GED error device for ACPI / GHES
3b6692dbf473 qapi/acpi-hest: add an interface to do generic CPER error injection
620a5a49f218 scripts/ghes_inject: add a script to generate GHES error inject
4) MPIDR property:
2dd6e3aae450 target/arm: add an experimental mpidr arm cpu property object
02c88cd4daa2 scripts/arm_processor_error.py: retrieve mpidr if not filled
I'm still testing if the rebase didn't cause any issues. So, the above
may still change a little bit. I also need to address your comments to the
cleanup patches and work at the migration, but just want to double check if
this is what you want.
If OK to you, my plan is to submit you the cleanup patches after I
finish testing the hole series.
The migration logic will require some time, and I don't want to bother
with the cleanup stuff while doing it. So, perhaps while I'm doing it,
you could review/merge the cleanups.
We can do the same for each of the 4 above series of patches, as it
makes review simpler as there will be less patches to look into on
each series.
Would it work for you?
Thanks,
Mauro
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-24 13:00 ` Mauro Carvalho Chehab
@ 2024-09-24 13:14 ` Igor Mammedov
2024-09-25 4:26 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 40+ messages in thread
From: Igor Mammedov @ 2024-09-24 13:14 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow,
Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
Shannon Zhao, kvm, linux-kernel, qemu-arm, qemu-devel
On Tue, 24 Sep 2024 15:00:58 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Tue, 17 Sep 2024 14:15:19 +0200
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > I'm done with this round of review.
> >
> > Given that the series accumulated a bunch of cleanups,
> > I'd suggest to move all cleanups/renamings not related
> > to new HEST lookup and new src id mapping to the beginning
> > of the series, so once they reviewed they could be split up into
> > a separate series that could be merged while we are ironing down
> > the new functionality.
>
> I've rebased the series placing the preparation stuff (cleanups
> and renames) at the beginning. So, what I have now is:
>
> 1) preparation patches:
>
> 41709f0898e1 acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED
> 5409daa41c78 acpi/ghes: simplify acpi_ghes_record_errors() code
> 2539f1f662b9 acpi/ghes: better handle source_id and notification
> 3f19400549c1 acpi/ghes: Remove a duplicated out of bounds check
> f0b06ecede46 acpi/ghes: Change the type for source_id
> 9f08301ac195 acpi/ghes: Prepare to support multiple sources on ghes
> 2426cd76e868 acpi/ghes: make the GHES record generation more generic
> 3fb7ec864700 acpi/ghes: better name GHES memory error function
> 1a22dad3211e acpi/ghes: don't crash QEMU if ghes GED is not found
> 726968d4ee20 acpi/ghes: rename etc/hardware_error file macros
> f562380da7ce docs: acpi_hest_ghes: fix documentation for CPER size
> 69850f550f99 acpi/generic_event_device: add an APEI error device
this one doesn't belong to clean ups, I think.
Lets move this to #3 part
>
> Patches were changed to ensure that they won't be add any new
> new features. They are just code shift in order to make the diff
> of the next patches smaller.
>
> There is a small point here: the logic was simplified to only
> support a single source ID (I added an assert() to enforce it) and
> simplified the calculus in preparation for the HEST and migration
> series.
>
>
> 2) add a BIOS pointer to HEST, using it. The migration stuff
> will be along those:
>
> c24f1a8708e3 acpi/ghes: add a firmware file with HEST address
> 853dce23ec39 acpi/ghes: Use HEST table offsets when preparing GHES records
> c148716fd7c8 acpi/generic_event_device: Update GHES migration to cover hest addr
>
> Up to that, still no new features, but the offset calculus will be
> relative to HEST table and will use the bios pointers stored there;
>
> 3) Add support for generic error inject:
>
> f5ec0d197d82 acpi/ghes: add a notifier to notify when error data is ready
> f5e015537209 arm/virt: Wire up a GED error device for ACPI / GHES
> 3b6692dbf473 qapi/acpi-hest: add an interface to do generic CPER error injection
> 620a5a49f218 scripts/ghes_inject: add a script to generate GHES error inject
>
> 4) MPIDR property:
> 2dd6e3aae450 target/arm: add an experimental mpidr arm cpu property object
> 02c88cd4daa2 scripts/arm_processor_error.py: retrieve mpidr if not filled
>
> I'm still testing if the rebase didn't cause any issues. So, the above
> may still change a little bit. I also need to address your comments to the
> cleanup patches and work at the migration, but just want to double check if
> this is what you want.
>
> If OK to you, my plan is to submit you the cleanup patches after I
> finish testing the hole series.
>
> The migration logic will require some time, and I don't want to bother
> with the cleanup stuff while doing it. So, perhaps while I'm doing it,
> you could review/merge the cleanups.
>
> We can do the same for each of the 4 above series of patches, as it
> makes review simpler as there will be less patches to look into on
> each series.
>
> Would it work for you?
other than nit above, LGTM
>
> Thanks,
> Mauro
>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation
2024-09-24 13:14 ` Igor Mammedov
@ 2024-09-25 4:26 ` Mauro Carvalho Chehab
0 siblings, 0 replies; 40+ messages in thread
From: Mauro Carvalho Chehab @ 2024-09-25 4:26 UTC (permalink / raw)
To: Igor Mammedov
Cc: Jonathan Cameron, Shiju Jose, Michael S. Tsirkin, Ani Sinha,
Cleber Rosa, Dongjiu Geng, Eric Blake, John Snow,
Markus Armbruster, Michael Roth, Paolo Bonzini, Peter Maydell,
Shannon Zhao, kvm, linux-kernel, qemu-arm, qemu-devel
Em Tue, 24 Sep 2024 15:14:29 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > 1) preparation patches:
...
> > 69850f550f99 acpi/generic_event_device: add an APEI error device
> this one doesn't belong to clean ups, I think.
> Lets move this to #3 part
Ok.
> > The migration logic will require some time, and I don't want to bother
> > with the cleanup stuff while doing it. So, perhaps while I'm doing it,
> > you could review/merge the cleanups.
> >
> > We can do the same for each of the 4 above series of patches, as it
> > makes review simpler as there will be less patches to look into on
> > each series.
> >
> > Would it work for you?
>
> other than nit above, LGTM
>
Ok, sent a PR with the first set (cleanups) at:
https://lore.kernel.org/qemu-devel/cover.1727236561.git.mchehab+huawei@kernel.org/
You can see the full series at:
https://gitlab.com/mchehab_kernel/qemu/-/commits/qemu_submission_v11b?ref_type=heads
It works fine, except for the migration part that I'm still working with.
For the migration, there are how two functions at ghes.c:
The one compatible with current behavior (up to version 9.1):
https://gitlab.com/mchehab_kernel/qemu/-/blob/qemu_submission_v11b/hw/acpi/ghes.c?ref_type=heads#L411
And the new one using offsets calculated from HEST (newer versions):
https://gitlab.com/mchehab_kernel/qemu/-/blob/qemu_submission_v11b/hw/acpi/ghes.c?ref_type=heads#L437
With that, the migration logic can decide what function should be
called (currently, it is just checking if hest_addr_le is zero, but
I guess I'll need to change it to match some variable added by the
migration path.
Also, in preparation for the migration tests, I created a separate
branch at:
https://gitlab.com/mchehab_kernel/qemu/-/commits/ghes_on_v9.1.0?ref_type=heads
which contains the same patches on the top of 9.1, except for
the HEST ones. It also contains a hack to use ACPI_GHES_NOTIFY_GPIO
instead of ACPI_GHES_NOTIFY_SEA.
With that, we have a way to use the same error injection logic
on both 9.1 and upstream, hopefully being enough to test if migration
works.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 40+ messages in thread