From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
Ani Sinha <anisinha@redhat.com>,
Dongjiu Geng <gengdongjiu1@gmail.com>,
linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
qemu-devel@nongnu.org
Subject: Re: [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address
Date: Sat, 14 Sep 2024 07:33:14 +0200 [thread overview]
Message-ID: <20240914073314.46368ff6@foz.lan> (raw)
In-Reply-To: <20240913152518.2f80ab1e@imammedo.users.ipa.redhat.com>
Hi Igor,
Em Fri, 13 Sep 2024 15:25:18 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:
> > > in addition to this, it needs a patch on top to make sure
> > > that we migrate hest_addr_le.
> > > See a08a64627b6b 'ACPI: Record the Generic Error Status Block address'
> > > and fixes on top of that for an example.
> >
> > Hmm... If I understood such change well, vmstate_ghes_state() will
> > use this structure as basis to do migration:
> >
> > /* ghes.h */
> > typedef struct AcpiGhesState {
> > uint64_t hest_addr_le;
> > uint64_t ghes_addr_le;
> > bool present; /* True if GHES is present at all on this board */
> > } AcpiGhesState;
> >
> > /* generic_event_device.c */
> > static const VMStateDescription vmstate_ghes_state = {
> > .name = "acpi-ged/ghes",
> > .version_id = 1,
> > .minimum_version_id = 1,
> > .needed = ghes_needed,
> > .fields = (VMStateField[]) {
> > VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> > vmstate_ghes_state, AcpiGhesState),
> > VMSTATE_END_OF_LIST()
> > }
> > };
>
> current code looks like that:
>
> static const VMStateDescription vmstate_ghes = {
> .name = "acpi-ghes",
> .version_id = 1,
> .minimum_version_id = 1,
> .fields = (const VMStateField[]) {
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState), <<===
> VMSTATE_END_OF_LIST()
> },
> };
>
> static bool ghes_needed(void *opaque)
> {
> AcpiGedState *s = opaque;
> return s->ghes_state.ghes_addr_le;
> }
>
> static const VMStateDescription vmstate_ghes_state = {
> .name = "acpi-ged/ghes",
> .version_id = 1,
> .minimum_version_id = 1,
> .needed = ghes_needed,
> .fields = (const VMStateField[]) {
> VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
> vmstate_ghes, AcpiGhesState),
> VMSTATE_END_OF_LIST()
> }
> };
>
> where
> VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
> explicitly defines field(s) within structure to be sent over wire.
>
> we need to add a conditional field for ghes_addr_le
> which will be sent only with new machine types, but not with old ones
> to avoid migration breakage.
>
> I don't know much about migration, but maybe we can get away with
> similar condition as in ghes_needed(), or enabling QMP error injection
> based on machine type version.
>
> Or maybe adding a CLI option to enable QMP error injection in which
> case the explicit option would serve as a trigger enable QMP command and
> to migrate hest_addr_le.
> It might be even better this way, since machine wouldn't need to
> carry extra error source that will be used only for testing
> and practically never in production VMs (aka reduced attack surface).
>
> You can easily test it locally:
> new-qemu: with your patches
> old-qemu: qemu-9.1
>
> and then try to do forth & back migration for following cases:
> 1. (ping-pong case with OLD firmware/ACPI tables)
> start old-qemu with 9.1 machine type ->
> migrate to file ->
> start new-qemu with 9.1 machine type -> restore from file ->
> migrate to file ->
As I never used migration, I'm a little stuck with the command line
parameters.
I guess I got the one to do the migration at the monitor:
(qemu) migrate file://tmp/migrate
But no idea how to start a machine using a saved state.
> start old-qemu with 9.1 machine type ->restore from file ->
>
> 2. (ping-pong case with NEW firmware/ACPI tables)
> do the same as #1 but starting with new-qemu binary
>
> (from upstream pov #2 is optional, but not implementing it
> is pain for downstream so it's better to have it if it's not
> too much work)
If I understood the migration documentation, every when new fields
are added, we should increment .version_id. If new version is
not backward-compatible, .minimum_version_id is also incremented.
So, for a migration-compatible code with a 9.1 VM, the code needs to
handle the case where hest_addr_le is not defined, e. g. use offsets
relative to ghes_addr_le, just like the current version, e.g.:
uint64_t cper_addr, read_ack_start_addr;
AcpiGedState *acpi_ged_state =
ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED, NULL));
AcpiGhesState *ags = &acpi_ged_state->ghes_state;
if (!ags->hest_addr_le) {
// Backward-compatible migration code
uint64_t base = le64_to_cpu(ags->ghes_addr_le);
*read_ack_start_addr = base +
ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
error_source_to_index[notify] * sizeof(uint64_t);
*cper_addr = base +
ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
} else {
// Use the new logic from ags->hest_addr_le
}
There are two problems with that:
1. On your reviews, if I understood right, the code above is not
migration safe. So, while implementing it would be technically
correct, migration still won't work;
2. With the new code, ACPI_GHES_ERROR_SOURCE_COUNT is not
defined anymore, as the size of the error source structure can
be different on different architectures, being 2 for new
VMs and 1 for old ones.
Basically the new code gets it right because it can see a
pointer to the HEST table, so it can get the number from there:
hest_addr = le64_to_cpu(ags->hest_addr_le);
cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
But, without hest_addr_le, getting num_sources is not possible.
An alternative would be to add a hacky code that works only for
arm machines (as new versions may support more archs).
Something like:
#define V1_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 1
#define V2_ARM_ACPI_GHES_ERROR_SOURCE_COUNT 2
And have a hardcoded logic that would work before/after this
changeset but may break on newer versions, if the number of
source IDs change, if we add other HEST types, etc.
Now, assuming that such hack would work, it sounds too hacky to
my taste.
So, IMO it is a lot safer to not support migrations from v1 (only
ghes_addr_le), using a patch like the enclosed one to ensure that.
Btw, checking existing migration structs, it sounds that for almost all
structures, .version_id is identical to .minimum_version_id, meaning that
migration between different versions aren't supported on most cases.
Thanks,
Mauro
---
[PATCH] acpi/generic_event_device: Update GHES migration to cover hest addr
The GHES migration logic at GED should now support HEST table
location too.
Increase migration version and change needed to check for both
ghes_addr_le and hest_addr_le.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index b4c83a089a02..efae0ff62c7b 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -351,10 +351,11 @@ static const VMStateDescription vmstate_ged_state = {
static const VMStateDescription vmstate_ghes = {
.name = "acpi-ghes",
- .version_id = 1,
- .minimum_version_id = 1,
+ .version_id = 2,
+ .minimum_version_id = 2,
.fields = (const VMStateField[]) {
VMSTATE_UINT64(ghes_addr_le, AcpiGhesState),
+ VMSTATE_UINT64(hest_addr_le, AcpiGhesState),
VMSTATE_END_OF_LIST()
},
};
@@ -362,13 +363,13 @@ static const VMStateDescription vmstate_ghes = {
static bool ghes_needed(void *opaque)
{
AcpiGedState *s = opaque;
- return s->ghes_state.ghes_addr_le;
+ return s->ghes_state.ghes_addr_le && s->ghes_state.hest_addr_le;
}
static const VMStateDescription vmstate_ghes_state = {
.name = "acpi-ged/ghes",
- .version_id = 1,
- .minimum_version_id = 1,
+ .version_id = 2,
+ .minimum_version_id = 2,
.needed = ghes_needed,
.fields = (const VMStateField[]) {
VMSTATE_STRUCT(ghes_state, AcpiGedState, 1,
next prev parent reply other threads:[~2024-09-14 5:34 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-25 3:45 [PATCH v9 00/12] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 01/12] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-09-11 13:51 ` Igor Mammedov
2024-09-13 5:44 ` Mauro Carvalho Chehab
2024-09-13 13:25 ` Igor Mammedov
2024-09-14 5:33 ` Mauro Carvalho Chehab [this message]
2024-09-16 11:05 ` Igor Mammedov
2024-10-01 8:54 ` Mauro Carvalho Chehab
2024-10-03 11:48 ` Igor Mammedov
2024-08-25 3:45 ` [PATCH v9 02/12] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
2024-09-11 15:01 ` Igor Mammedov
2024-09-13 15:14 ` Mauro Carvalho Chehab
2024-08-25 3:45 ` [PATCH v9 03/12] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
2024-09-13 13:27 ` Igor Mammedov
2024-08-25 3:45 ` [PATCH v9 04/12] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-13 13:31 ` Igor Mammedov
2024-08-25 3:46 ` [PATCH v9 05/12] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 06/12] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 07/12] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 08/12] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-08-27 11:11 ` Markus Armbruster
2024-08-25 3:46 ` [PATCH v9 09/12] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 10/12] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 11/12] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
2024-08-25 11:34 ` Peter Maydell
2024-08-26 1:53 ` Mauro Carvalho Chehab
2024-08-30 16:27 ` Peter Maydell
2024-09-01 6:40 ` Mauro Carvalho Chehab
2024-08-25 3:46 ` [PATCH v9 12/12] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240914073314.46368ff6@foz.lan \
--to=mchehab+huawei@kernel.org \
--cc=anisinha@redhat.com \
--cc=gengdongjiu1@gmail.com \
--cc=imammedo@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).