From: Jonathan Cameron via <qemu-devel@nongnu.org>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: "Igor Mammedov" <imammedo@redhat.com>,
"Michael S . Tsirkin" <mst@redhat.com>,
"Shiju Jose" <shiju.jose@huawei.com>,
qemu-arm@nongnu.org, qemu-devel@nongnu.org,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Ani Sinha" <anisinha@redhat.com>,
"Cleber Rosa" <crosa@redhat.com>,
"Dongjiu Geng" <gengdongjiu1@gmail.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Eric Blake" <eblake@redhat.com>, "John Snow" <jsnow@redhat.com>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Michael Roth" <michael.roth@amd.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Peter Maydell" <peter.maydell@linaro.org>,
"Shannon Zhao" <shannon.zhaosl@gmail.com>,
"Yanan Wang" <wangyanan55@huawei.com>,
"Zhao Liu" <zhao1.liu@intel.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
"Gavin Shan" <gshan@redhat.com>
Subject: Re: [PATCH v3 00/14] Change ghes to use HEST-based offsets and add support for error inject
Date: Fri, 21 Feb 2025 10:21:27 +0000 [thread overview]
Message-ID: <20250221102127.000059e6@huawei.com> (raw)
In-Reply-To: <20250221073823.061a1039@foz.lan>
On Fri, 21 Feb 2025 07:38:23 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> Em Mon, 3 Feb 2025 16:22:36 +0100
> Igor Mammedov <imammedo@redhat.com> escreveu:
>
> > On Mon, 3 Feb 2025 11:09:34 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >
> > > On Fri, 31 Jan 2025 18:42:41 +0100
> > > Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > >
> > > > Now that the ghes preparation patches were merged, let's add support
> > > > for error injection.
> > > >
> > > > On this series, the first 6 patches chang to the math used to calculate offsets at HEST
> > > > table and hardware_error firmware file, together with its migration code. Migration tested
> > > > with both latest QEMU released kernel and upstream, on both directions.
> > > >
> > > > The next patches add a new QAPI to allow injecting GHESv2 errors, and a script using such QAPI
> > > > to inject ARM Processor Error records.
> > > >
> > > > If I'm counting well, this is the 19th submission of my error inject patches.
> > >
> > > Looks good to me. All remaining trivial things are in the category
> > > of things to consider only if you are doing another spin. The code
> > > ends up how I'd like it at the end of the series anyway, just
> > > a question of the precise path to that state!
> >
> > if you look at series as a whole it's more or less fine (I guess you
> > and me got used to it)
> >
> > however if you take it patch by patch (as if you've never seen it)
> > ordering is messed up (the same would apply to everyone after a while
> > when it's forgotten)
> >
> > So I'd strongly suggest to restructure the series (especially 2-6/14).
> > re sum up my comments wrt ordering:
> >
> > 0 add testcase for HEST table with current HEST as expected blob
> > (currently missing), so that we can be sure that we haven't messed
> > existing tables during refactoring.
To potentially save time I think Igor is asking that before you do anything
at all you plug the existing test hole which is that we don't test HEST
at all. Even after this series I think we don't test HEST. You add
a stub hest and exclusion but then in patch 12 the HEST stub is deleted whereas
it should be replaced with the example data for the test.
That indeed doesn't address testing the error data storage which would be
a different problem.
>
> Not sure if I got this one. The HEST table is part of etc/acpi/tables,
> which is already tested, as you pointed at the previous reviews. Doing
> changes there is already detected. That's basically why we added patches
> 10 and 12:
>
> [PATCH v3 10/14] tests/acpi: virt: allow acpi table changes for a new table: HEST
> [PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and update DSDT
>
> What tests don't have is a check for etc/hardware_errors firmware inside
> tests/data/acpi/aarch64/virt/, but, IMO, we shouldn't add it there.
>
> See, hardware_errors table contains only some skeleton space to
> store:
>
> - 1 or more error block address offsets;
> - 1 or more read ack register;
> - 1 or more HEST source entries containing CPER blocks.
>
> There's nothing there to be actually checked: it is just some
> empty spaces with a variable number of fields.
>
> With the new code, the actual number of CPER blocks and their
> corresponding offsets and read ack registers can be different on
> different architectures. So, for instance, when we add x86 support,
> we'll likely start with just one error source entry, while arm will
> have two after this changeset.
>
> Also, one possibility to address the issues reported by Gavin Shan at
> https://lore.kernel.org/qemu-devel/20250214041635.608012-1-gshan@redhat.com/
> would be to have one entry per each CPU. So, the size of such firmware
> could be dependent on the number of CPUs.
>
> So, adding any validation to it would just cause pain and probably
> won't detect any problems.
If we did do this the test would use a fixed number of CPUs so
would just verify we didn't break a small number of variants. Useful
but to me a follow up to this series not something that needs to
be part of it - particularly as Gavin's work may well change that!
>
> What could be done instead is to have a different type of tests that
> would use the error injection script to check if regressions are
> introduced after QEMU 10.0. Such new kind of test would require
> this series to be merged first. It would also require the usage of
> an OSPM image with some testing tools on it. This is easier said
> than done, as besides the complexity of having an OSPM test image,
> such kind of tests would require extra logic, specially if it would
> check regressions for SEA and other notification sources.
>
Agreed that a more end to end test is even better, but those are
quite a bit more complex so definitely a follow up.
J
> Thanks,
> Mauro
>
next prev parent reply other threads:[~2025-02-21 10:22 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-31 17:42 [PATCH v3 00/14] Change ghes to use HEST-based offsets and add support for error inject Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 01/14] acpi/ghes: Prepare to support multiple sources on ghes Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 02/14] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2025-02-03 13:41 ` Igor Mammedov
2025-01-31 17:42 ` [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when preparing GHES records Mauro Carvalho Chehab
2025-02-03 10:42 ` Jonathan Cameron via
2025-02-03 14:34 ` Igor Mammedov
2025-02-21 6:02 ` Mauro Carvalho Chehab
2025-02-25 9:43 ` Igor Mammedov
2025-02-26 16:14 ` Mauro Carvalho Chehab
2025-02-27 9:22 ` Igor Mammedov
2025-02-27 10:11 ` Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 04/14] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 05/14] acpi/generic_event_device: add logic to detect if HEST addr is available Mauro Carvalho Chehab
2025-02-03 14:56 ` Igor Mammedov
2025-01-31 17:42 ` [PATCH v3 06/14] acpi/ghes: only set hw_error_le or hest_addr_le Mauro Carvalho Chehab
2025-02-03 10:48 ` Jonathan Cameron via
2025-01-31 17:42 ` [PATCH v3 07/14] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 08/14] acpi/ghes: Cleanup the code which gets ghes ged state Mauro Carvalho Chehab
2025-02-03 10:51 ` Jonathan Cameron via
2025-01-31 17:42 ` [PATCH v3 09/14] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 10/14] tests/acpi: virt: allow acpi table changes for a new table: HEST Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 11/14] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
2025-01-31 17:42 ` [PATCH v3 12/14] tests/acpi: virt: add a HEST table to aarch64 virt and update DSDT Mauro Carvalho Chehab
2025-02-03 10:53 ` Jonathan Cameron via
2025-01-31 17:42 ` [PATCH v3 13/14] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2025-02-05 8:12 ` Markus Armbruster
2025-01-31 17:42 ` [PATCH v3 14/14] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2025-02-03 10:56 ` Jonathan Cameron via
2025-02-05 8:16 ` Markus Armbruster
2025-02-21 4:57 ` Mauro Carvalho Chehab
2025-02-21 5:50 ` Markus Armbruster
2025-02-03 11:09 ` [PATCH v3 00/14] Change ghes to use HEST-based offsets and add support for " Jonathan Cameron via
2025-02-03 15:22 ` Igor Mammedov
2025-02-21 6:38 ` Mauro Carvalho Chehab
2025-02-21 10:21 ` Jonathan Cameron via [this message]
2025-02-21 12:23 ` Mauro Carvalho Chehab
2025-02-21 15:05 ` Mauro Carvalho Chehab
2025-02-25 10:01 ` Igor Mammedov
2025-02-26 9:56 ` Mauro Carvalho Chehab
2025-02-26 11:23 ` Mauro Carvalho Chehab
2025-02-26 11:31 ` Mauro Carvalho Chehab
2025-02-26 12:29 ` Igor Mammedov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250221102127.000059e6@huawei.com \
--to=qemu-devel@nongnu.org \
--cc=Jonathan.Cameron@huawei.com \
--cc=anisinha@redhat.com \
--cc=armbru@redhat.com \
--cc=crosa@redhat.com \
--cc=eblake@redhat.com \
--cc=eduardo@habkost.net \
--cc=gengdongjiu1@gmail.com \
--cc=gshan@redhat.com \
--cc=imammedo@redhat.com \
--cc=jsnow@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marcel.apfelbaum@gmail.com \
--cc=mchehab+huawei@kernel.org \
--cc=michael.roth@amd.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=philmd@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=shannon.zhaosl@gmail.com \
--cc=shiju.jose@huawei.com \
--cc=wangyanan55@huawei.com \
--cc=zhao1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).