From: Markus Armbruster <armbru@redhat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Shiju Jose <shiju.jose@huawei.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Ani Sinha <anisinha@redhat.com>,
Dongjiu Geng <gengdongjiu1@gmail.com>,
Eric Blake <eblake@redhat.com>,
Igor Mammedov <imammedo@redhat.com>,
Michael Roth <michael.roth@amd.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Peter Maydell <peter.maydell@linaro.org>,
linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
qemu-devel@nongnu.org
Subject: Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
Date: Mon, 29 Jul 2024 16:32:41 +0200
Message-ID: <87zfq0b75i.fsf@pond.sub.org>
In-Reply-To: <20240729142154.44d484c4@foz.lan> (Mauro Carvalho Chehab's message of "Mon, 29 Jul 2024 14:21:54 +0200")
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> Em Thu, 25 Jul 2024 11:48:12 +0200
> Markus Armbruster <armbru@redhat.com> escreveu:
>
>> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>>
>> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> >
>> > 1. Some GHES functions require handling addresses. Add a helper function
>> > to support it.
>> >
>> > 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
>> >
>> > Complies with the N.2.4.4 ARM Processor Error Section of UEFI 2.6 and
>> > later specs, using the error type bit encoding detailed in the UEFI 2.9A
>> > errata.
>> >
>> > Error injection examples:
>> >
>> > { "execute": "qmp_capabilities" }
>> >
>> > { "execute": "arm-inject-error",
>> > "arguments": {
>> > "errortypes": ['cache-error']
>> > }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> > "arguments": {
>> > "errortypes": ['tlb-error']
>> > }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> > "arguments": {
>> > "errortypes": ['bus-error']
>> > }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> > "arguments": {
>> > "errortypes": ['cache-error', 'tlb-error']
>> > }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> > "arguments": {
>> > "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
>> > }
>> > }
>> > ...
>> >
>> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
>> > For Add a logic to handle block addresses,
>> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > For FW first ARM processor error injection,
>> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> > ---
>> > configs/targets/aarch64-softmmu.mak | 1 +
>> > hw/acpi/ghes.c | 258 ++++++++++++++++++++++++++--
>> > hw/arm/Kconfig | 4 +
>> > hw/arm/arm_error_inject.c | 35 ++++
>> > hw/arm/arm_error_inject_stubs.c | 18 ++
>> > hw/arm/meson.build | 3 +
>> > include/hw/acpi/ghes.h | 2 +
>> > qapi/arm-error-inject.json | 49 ++++++
>> > qapi/meson.build | 1 +
>> > qapi/qapi-schema.json | 1 +
>> > 10 files changed, 361 insertions(+), 11 deletions(-)
>> > create mode 100644 hw/arm/arm_error_inject.c
>> > create mode 100644 hw/arm/arm_error_inject_stubs.c
>> > create mode 100644 qapi/arm-error-inject.json
>>
>> Since the new file is not covered in MAINTAINERS, get_maintainer.pl will
>> blame it on the QAPI maintainers alone. No good.
>
> Added myself there:
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 98eddf7ae155..713a104ef901 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
> F: include/hw/acpi/ghes.h
> F: docs/specs/acpi_hest_ghes.rst
>
> +ACPI/HEST/GHES/ARM processor CPER
> +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +S: Maintained
> +F: hw/arm/arm_error_inject.c
> +F: hw/arm/arm_error_inject_stubs.c
> +F: qapi/arm-error-inject.json
> +
> ppc4xx
> L: qemu-ppc@nongnu.org
> S: Orphan
>
>>
>> [...]
>>
>> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
>> > new file mode 100644
>> > index 000000000000..430e6cea6b60
>> > --- /dev/null
>> > +++ b/qapi/arm-error-inject.json
>> > @@ -0,0 +1,49 @@
>> > +# -*- Mode: Python -*-
>> > +# vim: filetype=python
>> > +
>> > +##
>> > +# = ARM Processor Errors
>> > +##
>> > +
>> > +##
>> > +# @ArmProcessorErrorType:
>> > +#
>> > +# Type of ARM processor error to inject
>> > +#
>> > +# @unknown-error: Unknown error
>>
>> Removed in PATCH 7, and unused until then. Why add it in the first
>> place?
>
> I folded this into patch 7, so this is gone now.
>
>>
>> > +#
>> > +# @cache-error: Cache error
>> > +#
>> > +# @tlb-error: TLB error
>> > +#
>> > +# @bus-error: Bus error.
>> > +#
>> > +# @micro-arch-error: Micro architectural error.
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'enum': 'ArmProcessorErrorType',
>> > + 'data': ['unknown-error',
>> > + 'cache-error',
>>
>> Tab in this line. Please convert to spaces.
>
> Ok.
>
>>
>> > + 'tlb-error',
>> > + 'bus-error',
>> > + 'micro-arch-error']
>> > +}
>> > +
>> > +##
>> > +# @arm-inject-error:
>> > +#
>> > +# Inject ARM Processor error.
>> > +#
>> > +# @errortypes: ARM processor error types to inject
>> > +#
>> > +# Features:
>> > +#
>> > +# @unstable: This command is experimental.
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'command': 'arm-inject-error',
>> > + 'data': { 'errortypes': ['ArmProcessorErrorType'] },
>>
>> Please separate words with dashes: 'error-types'.
>
> Done.
>
> Folding with patch 7 split it into two separate fields: error and
> type.
>
>>
>> > + 'features': [ 'unstable' ]
>> > +}
>>
>> Is this used only with TARGET_ARM?
>
> Yes, as this CPER record is defined only for arm. There are three other
> kinds of processor error info:
> - for x86;
> - for ia32;
> - for "generic cpu".
>
> They have different structures, with different fields.

A generic inject-error command feels nicer, but coding its arguments in
the schema could be more trouble than it's worth. I'm not asking you to
try.

A target-specific command like this one should be conditional. Try
this:

    { 'command': 'arm-inject-error',
      'data': { 'errortypes': ['ArmProcessorErrorType'] },
      'features': [ 'unstable' ],
      'if': 'TARGET_ARM' }

No need to provide a qmp_arm_inject_error() stub then.

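
With the condition in place, the command does not exist at all in binaries built for other targets, so a client can probe for it with query-commands before using it. A minimal Python sketch of that check, assuming the client already holds the reply text; the helper name has_command() and the sample reply are made up for illustration:

```python
import json


def has_command(reply_text, name):
    """Return True if a query-commands reply lists the command 'name'."""
    reply = json.loads(reply_text)
    # query-commands returns a list of objects, each with a "name" member.
    return any(cmd.get("name") == name for cmd in reply.get("return", []))


# Hand-written sample reply, not captured from a real QEMU.
sample = '{"return": [{"name": "query-commands"}, {"name": "arm-inject-error"}]}'
print(has_command(sample, "arm-inject-error"))
```

This only inspects the reply; opening the QMP socket and negotiating capabilities is left out on purpose.
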
>> Why is being able to inject multiple error types at once useful?
>
> The CPER ARM Processor record is defined in the UEFI spec as having from 1 to
> 255 errors, which may or may not be of the same type. The idea behind the UEFI
> spec is that a single root error may be reflected in multiple errors.
>
> It may also help to reduce BIOS interrupts to the OS by merging errors
> together, as memory errors usually happen in bursts.
>
> Because of that, a single Processor Error Information inside a CPER record
> for an ARM processor can, according to the UEFI spec, have more than one
> of the following bits set:
>
> +-----+---------------------------+
> | Bit | Meaning                   |
> +=====+===========================+
> | 1   | Cache Error               |
> | 2   | TLB Error                 |
> | 3   | Bus Error                 |
> | 4   | Micro-architectural Error |
> +-----+---------------------------+
>
> So, the spec allows, for instance, a single Processor Error
> Information (PEI) to have the micro-arch and tlb-error flags raised at the
> same time.
>
> We need the capability of testing multiple error types in order to check
> whether the OS implementation decodes them the right way. In particular, Linux
> was not doing it right, as its CPER ARM Processor record handler was
> written at the time of the UEFI 2.6 spec, while the actual encoding
> for the error type was only defined in the UEFI 2.9A errata and newer.
I see.
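
To make sure I read the table right, here is a rough Python sketch of the encoding. It assumes "bit N" means the mask 1 << N, and the mapping from the QAPI enum names to flags is my guess, not something taken from the patch:

```python
from enum import IntFlag


class ArmErrorType(IntFlag):
    # Assumption: "bit N" in the table above means the mask 1 << N.
    CACHE = 1 << 1
    TLB = 1 << 2
    BUS = 1 << 3
    MICRO_ARCH = 1 << 4


# Hypothetical mapping from the QAPI enum names to the flags.
QAPI_NAMES = {
    "cache-error": ArmErrorType.CACHE,
    "tlb-error": ArmErrorType.TLB,
    "bus-error": ArmErrorType.BUS,
    "micro-arch-error": ArmErrorType.MICRO_ARCH,
}


def encode(names):
    """Combine a list of QAPI error type names into one type value."""
    mask = 0
    for name in names:
        mask |= int(QAPI_NAMES[name])
    return mask


def decode(value):
    """List the QAPI names of all flags set in a type value."""
    return [name for name, flag in QAPI_NAMES.items() if value & flag]


print(hex(encode(["micro-arch-error", "tlb-error"])))
print(decode(0x14))
```

Under that assumption, a PEI with both micro-arch and tlb-error raised carries 0x14 in its type field, and a decoder that expects exactly one type would mishandle it.
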
>> I'd expect at least some of these errors to come with additional
>> information. For instance, I imagine a bus error is associated with
>> some address.
>
> It actually depends on the ARM and PEI valid fields: the address may or
> may not be present, depending on whether the phy/logical address valid
> bit is set.
>
>>
>> If we encode the error to inject as an enum value, adding more will
>> be hard.
>>
>> If we wrap the enum in a struct
>>
>> { 'struct': 'ArmProcessorError',
>> 'data': { 'type': 'ArmProcessorErrorType' } }
>>
>> we can later extend it like
>>
>> { 'union': 'ArmProcessorError',
>> 'base': { 'type': 'ArmProcessorErrorType' },
>> 'discriminator': 'type',
>> 'data': {
>> 'bus-error': 'ArmProcessorBusErrorData' } }
>>
>> { 'struct': 'ArmProcessorBusErrorData',
>> 'data': ... }
>
> I don't see this working as one might expect. See, the ARM error
> information data can be repeated from 1 to 255 times. It is given
> by this struct (see patch 7):
>
> { 'struct': 'ArmProcessorErrorInformation',
> 'data': { '*validation': ['ArmPeiValidationBits'],
> 'type': ['ArmProcessorErrorType'],
> '*multiple-error': 'uint16',
> '*flags': ['ArmProcessorFlags'],
> '*error-info': 'uint64',
> '*virt-addr': 'uint64',
> '*phy-addr': 'uint64'}
> }
>
> According to the UEFI spec, the type is always present.
> The other fields are marked as valid or not via the field
> "validation". So, there is one bit per field indicating whether it is
> valid in the PEI structure, e.g.:
>
> - multiple-error: multiple occurrences of the error;
> - flags;
> - error-info: error information;
> - virt-addr: virtual address;
> - phy-addr: physical address.
>
> There are also other fields that are global for the entire record,
> also marked as valid or not via another bitmask.
>
> The contents of almost all those fields are independent of the error
> type. The only field whose content is affected by the error type is
> "error-info", and the definition of that field is not fully specified.
>
> So, currently, UEFI spec only defines it when:
>
> 1. the error type has just one bit set;
> 2. the error type is either cache, TLB or bus error[1].
> If the type is a micro-arch-specific error, the spec doesn't tell how this
> field is filled.
>
> To make the API simple (yet powerful), I opted to not enforce any encoding
> for error-info: let userspace fill it as required and use some default
> that would make sense, if this is not passed via QMP.
>
> [1] See https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
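
If I understand the description correctly, a default for "validation" could be derived from which optional fields the user supplied. A minimal Python sketch of that idea; the bit positions are my assumption and need checking against the spec, and the class and field names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

# Assumed bit positions for the PEI validation field; check against the spec.
VALID_BITS = {
    "multiple_error": 1 << 0,
    "flags": 1 << 1,
    "error_info": 1 << 2,
    "virt_addr": 1 << 3,
    "phy_addr": 1 << 4,
}


@dataclass
class Pei:
    """One Processor Error Information entry; only 'type' is mandatory."""
    type: int
    multiple_error: Optional[int] = None
    flags: Optional[int] = None
    error_info: Optional[int] = None
    virt_addr: Optional[int] = None
    phy_addr: Optional[int] = None

    def validation(self):
        """Set one valid bit for each optional field that was supplied."""
        mask = 0
        for field, bit in VALID_BITS.items():
            if getattr(self, field) is not None:
                mask |= bit
        return mask


pei = Pei(type=1 << 3, phy_addr=0xDEADBEEF)
print(hex(pei.validation()))
```

An explicit "validation" argument would still have to override this, since the interface lets the user set the bits independently of the fields.
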

I asked because designing for extensibility is good practice.

It's not a hard requirement here, because feature 'unstable' gives us
license to change the interface incompatibly.

[...]