public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	 Shiju Jose <shiju.jose@huawei.com>,
	 "Michael S. Tsirkin" <mst@redhat.com>,
	 Ani Sinha <anisinha@redhat.com>,
	 Dongjiu Geng <gengdongjiu1@gmail.com>,
	 Eric Blake <eblake@redhat.com>,
	 Igor Mammedov <imammedo@redhat.com>,
	 Michael Roth <michael.roth@amd.com>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	 Peter Maydell <peter.maydell@linaro.org>,
	 linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
	 qemu-devel@nongnu.org
Subject: Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection
Date: Mon, 29 Jul 2024 16:32:41 +0200	[thread overview]
Message-ID: <87zfq0b75i.fsf@pond.sub.org> (raw)
In-Reply-To: <20240729142154.44d484c4@foz.lan> (Mauro Carvalho Chehab's message of "Mon, 29 Jul 2024 14:21:54 +0200")

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> Em Thu, 25 Jul 2024 11:48:12 +0200
> Markus Armbruster <armbru@redhat.com> escreveu:
>
>> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>> 
>> > From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> >
>> > 1. Some GHES functions require handling addresses. Add a helper function
>> >    to support it.
>> >
>> > 2. Add support for ACPI CPER (firmware-first) ARM processor error injection.
>> >
>> > Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and
>> > upper specs, using error type bit encoding as detailed at UEFI 2.9A
>> > errata.
>> >
>> > Error injection examples:
>> >
>> > { "execute": "qmp_capabilities" }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['cache-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['tlb-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['bus-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['cache-error', 'tlb-error']
>> >       }
>> > }
>> >
>> > { "execute": "arm-inject-error",
>> >       "arguments": {
>> >         "errortypes": ['cache-error', 'tlb-error', 'bus-error', 'micro-arch-error']
>> >       }
>> > }
>> > ...
>> >
>> > Co-authored-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > Co-authored-by: Shiju Jose <shiju.jose@huawei.com>
>> > For Add a logic to handle block addresses,
>> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > For FW first ARM processor error injection,
>> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>> > ---
>> >  configs/targets/aarch64-softmmu.mak |   1 +
>> >  hw/acpi/ghes.c                      | 258 ++++++++++++++++++++++++++--
>> >  hw/arm/Kconfig                      |   4 +
>> >  hw/arm/arm_error_inject.c           |  35 ++++
>> >  hw/arm/arm_error_inject_stubs.c     |  18 ++
>> >  hw/arm/meson.build                  |   3 +
>> >  include/hw/acpi/ghes.h              |   2 +
>> >  qapi/arm-error-inject.json          |  49 ++++++
>> >  qapi/meson.build                    |   1 +
>> >  qapi/qapi-schema.json               |   1 +
>> >  10 files changed, 361 insertions(+), 11 deletions(-)
>> >  create mode 100644 hw/arm/arm_error_inject.c
>> >  create mode 100644 hw/arm/arm_error_inject_stubs.c
>> >  create mode 100644 qapi/arm-error-inject.json  
>> 
>> Since the new file not covered in MAINTAINERS, get_maintainer.pl will
>> blame it on the QAPI maintainers alone.  No good.
>
> Added myself there:
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 98eddf7ae155..713a104ef901 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c
>  F: include/hw/acpi/ghes.h
>  F: docs/specs/acpi_hest_ghes.rst
>  
> +ACPI/HEST/GHES/ARM processor CPER
> +R: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> +S: Maintained
> +F: hw/arm/arm_error_inject.c
> +F: hw/arm/arm_error_inject_stubs.c
> +F: qapi/arm-error-inject.json
> +
>  ppc4xx
>  L: qemu-ppc@nongnu.org
>  S: Orphan
>
>> 
>> [...]
>> 
>> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json
>> > new file mode 100644
>> > index 000000000000..430e6cea6b60
>> > --- /dev/null
>> > +++ b/qapi/arm-error-inject.json
>> > @@ -0,0 +1,49 @@
>> > +# -*- Mode: Python -*-
>> > +# vim: filetype=python
>> > +
>> > +##
>> > +# = ARM Processor Errors
>> > +##
>> > +
>> > +##
>> > +# @ArmProcessorErrorType:
>> > +#
>> > +# Type of ARM processor error to inject
>> > +#
>> > +# @unknown-error: Unknown error  
>> 
>> Removed in PATCH 7, and unused until then.  Why add it in the first
>> place?
>
> I folded this with patch 7, so this was gone now.
>
>> 
>> > +#
>> > +# @cache-error: Cache error
>> > +#
>> > +# @tlb-error: TLB error
>> > +#
>> > +# @bus-error: Bus error.
>> > +#
>> > +# @micro-arch-error: Micro architectural error.
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'enum': 'ArmProcessorErrorType',
>> > +  'data': ['unknown-error',
>> > +	   'cache-error',  
>> 
>> Tab in this line.  Please convert to spaces.
>
> Ok.
>
>> 
>> > +           'tlb-error',
>> > +           'bus-error',
>> > +           'micro-arch-error']
>> > +}
>> > +
>> > +##
>> > +# @arm-inject-error:
>> > +#
>> > +# Inject ARM Processor error.
>> > +#
>> > +# @errortypes: ARM processor error types to inject
>> > +#
>> > +# Features:
>> > +#
>> > +# @unstable: This command is experimental.
>> > +#
>> > +# Since: 9.1
>> > +##
>> > +{ 'command': 'arm-inject-error',
>> > +  'data': { 'errortypes': ['ArmProcessorErrorType'] },  
>> 
>> Please separate words with dashes: 'error-types'.
>
> Done.
>
> Folding with patch 7 broke it on two separate fields: error and
> type.
>
>> 
>> > +  'features': [ 'unstable' ]
>> > +}  
>> 
>> Is this used only with TARGET_ARM?
>
> Yes, as this CPER record is defined only for arm. There are three other
> processor error info:
> 	- for x86;
> 	- for ia32;
> 	- for "generic cpu".
>
> They have different structures, with different fields.

A generic inject-error command feels nicer, but coding its arguments in
the schema could be more trouble than it's worth.  I'm not asking you to
try.

A target-specific command like this one should be conditional.  Try
this:

    { 'command': 'arm-inject-error',
      'data': { 'errortypes': ['ArmProcessorErrorType'] },
      'features': [ 'unstable' ],
      'if': 'TARGET_ARM' }

No need to provide a qmp_arm_inject_error() stub then.

>> Why is being able to inject multiple error types at once useful?
>
> The CPER ARM Processor record is defined at UEFI spec as having from 1 to
> 255 errors, that can be using the same type or not. The idea behind UEFI
> spec is that a single root error may be reflected on multiple errors.
>
> It may also help to reduce BIOS interrupts to OS, by merging errors
> altogether, as memory errors usually happen in bursts.
>
> Due to that, a single Processor Error Information inside a CPER record
> for ARM processor can, according with UEFI spec, contain more than one
> of the following bits set:
>
>             +-----|---------------------------+
>             | Bit | Meaning                   |
>             +=====+===========================+
>             |  1  | Cache Error               |
>             |  2  | TLB Error                 |
>             |  3  | Bus Error                 |
>             |  4  | Micro-architectural Error |
>             +-----|---------------------------+
>
> So, the spec allows, for instance, to have a single Processor Error
> Information (PEI) with micro-arch and tlb-error flags raised at the
> same time.
>
> We need the capability of testing multiple error types in order to check
> if OS implementation is decoding it the right way. In particular, Linux
> was not doing it right, as the CPER ARM Processor record handler was 
> written at the time UEFI 2.6 spec was written, while the actual encoding
> for the error type was only defined at UEFI 2.9A errata and newer.

I see.

>> I'd expect at least some of these errors to come with additional
>> information.  For instance, I imagine a bus error is associated with
>> some address.
>
> It actually depends on the ARM and PEI valid fields: the address may or 
> may not be present, depending if the phy/logical address valid field bit
> is set or not.
>
>> 
>> If we encode the the error to inject as an enum value, adding more will
>> be hard.
>> 
>> If we wrap the enum in a struct
>> 
>>     { 'struct': 'ArmProcessorError',
>>       'data': { 'type': 'ArmProcessorErrorType' } }
>> 
>> we can later extend it like
>> 
>>     { 'union': 'ArmProcessorError',
>>       'base: { 'type': 'ArmProcessorErrorType' }
>>       'data': {
>>           'bus-error': 'ArmProcessorBusErrorData' } }
>> 
>>     { 'struct': 'ArmProcessorBusErrorData',
>>       'data': ... }
>
> I don't see this working as one might expect. See, the ARM error
> information data can be repeated from 1 to 255 times. It is given 
> by this struct (see patch 7):
>
> 	{ 'struct': 'ArmProcessorErrorInformation',
> 	  'data': { '*validation': ['ArmPeiValidationBits'],
> 	            'type': ['ArmProcessorErrorType'],
> 	            '*multiple-error': 'uint16',
> 	            '*flags': ['ArmProcessorFlags'],
> 	            '*error-info': 'uint64',
> 	            '*virt-addr':  'uint64',
> 	            '*phy-addr': 'uint64'}
> 	}
>
> According with the UEFI spec, the type is always be present.
> The other fields are marked as valid or not via the field
> "validation". So, there's one bit indicating what is valid between
> the fields at the PEI structure, e. g.:
>
> 	- multiple-error: multiple occurrences of the error;
> 	- flags;
> 	- error-info: error information;
> 	- virt-addr: virtual address;
> 	- phy-addr: physical address.
>
> There are also other fields that are global for the entire record,
> also marked as valid or not via another bitmask.
>
> The contents of almost all those fields are independent of the error
> type. The only field which content is affected by the error type is
> "error-info", and the definition of such field is not fully specified.
>
> So, currently, UEFI spec only defines it when:
>
> 1. the error type has just one bit set;
> 2. the error type is either cache, TLB or bus error[1].
>    If type is micro-arch-specific error, the spec doesn't tell how this 
>    field if filled.
>
> To make the API simple (yet powerful), I opted to not enforce any encoding
> for error-info: let userspace fill it as required and use some default
> that would make sense, if this is not passed via QMP.
>
> [1] See https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information

I asked because designing for extensibility is good practice.

It's not a hard requirement here, because feature 'unstable' gives us
lincense to change the interface incompatibly.

[...]


  reply	other threads:[~2024-07-29 14:32 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1721630625.git.mchehab+huawei@kernel.org>
2024-07-22  6:45 ` [PATCH v3 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
2024-07-30  7:25   ` Igor Mammedov
2024-07-30  8:29     ` Peter Maydell
2024-07-30 11:26       ` Igor Mammedov
2024-08-01 13:15         ` Mauro Carvalho Chehab
2024-08-05 14:04           ` Igor Mammedov
2024-08-05 15:22             ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 2/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
2024-07-26 12:30   ` Jonathan Cameron
2024-07-30  8:36   ` Igor Mammedov
2024-07-31  5:17     ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 3/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
2024-07-30  8:40   ` Igor Mammedov
2024-08-01 12:56     ` Mauro Carvalho Chehab
2024-08-01 14:32       ` Jonathan Cameron
2024-07-22  6:45 ` [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses and FW first ARM processor error injection Mauro Carvalho Chehab
2024-07-25  9:48   ` Markus Armbruster
2024-07-26 12:46     ` Jonathan Cameron
2024-07-29 12:49       ` Mauro Carvalho Chehab
2024-07-29 12:21     ` Mauro Carvalho Chehab
2024-07-29 14:32       ` Markus Armbruster [this message]
2024-08-01 14:34         ` Mauro Carvalho Chehab
2024-07-26 12:44   ` Jonathan Cameron
2024-07-29 11:40     ` Mauro Carvalho Chehab
2024-07-30 11:17   ` Igor Mammedov
2024-07-31  7:11     ` Mauro Carvalho Chehab
2024-07-31  8:57       ` Jonathan Cameron
2024-07-31 10:30         ` Mauro Carvalho Chehab
2024-08-01  8:36         ` Igor Mammedov
2024-08-01 14:26           ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 5/7] target/arm: preserve mpidr value Mauro Carvalho Chehab
2024-07-26 12:50   ` Jonathan Cameron
2024-07-22  6:45 ` [PATCH v3 6/7] acpi/ghes: update comments to point to newer ACPI specs Mauro Carvalho Chehab
2024-07-30 11:24   ` Igor Mammedov
2024-07-30 11:36     ` Michael S. Tsirkin
2024-07-31  6:05       ` Mauro Carvalho Chehab
2024-07-22  6:45 ` [PATCH v3 7/7] acpi/ghes: extend arm error injection logic Mauro Carvalho Chehab
2024-07-25 10:03   ` Markus Armbruster
2024-07-29 11:18     ` Mauro Carvalho Chehab
2024-07-26 13:22   ` Jonathan Cameron
2024-07-29 11:10     ` Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zfq0b75i.fsf@pond.sub.org \
    --to=armbru@redhat.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=anisinha@redhat.com \
    --cc=eblake@redhat.com \
    --cc=gengdongjiu1@gmail.com \
    --cc=imammedo@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=michael.roth@amd.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=shiju.jose@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox