From: Jonathan Cameron via <qemu-devel@nongnu.org>
To: Gavin Shan <gshan@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>, <qemu-arm@nongnu.org>,
<qemu-devel@nongnu.org>, <mst@redhat.com>, <anisinha@redhat.com>,
<gengdongjiu1@gmail.com>, <peter.maydell@linaro.org>,
<pbonzini@redhat.com>, <shan.gavin@gmail.com>,
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Subject: Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
Date: Fri, 28 Feb 2025 09:55:29 +0800
Message-ID: <20250228095529.00007890@huawei.com>
In-Reply-To: <dafa471d-c5bb-4f6b-8483-17741e0caab1@redhat.com>
On Wed, 26 Feb 2025 14:58:46 +1000
Gavin Shan <gshan@redhat.com> wrote:
> On 2/25/25 9:19 PM, Igor Mammedov wrote:
> > On Fri, 21 Feb 2025 11:04:35 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >>
> >> Ideally I'd like whatever we choose to look like what a bare metal machine
> >> does - mostly because we are less likely to hit untested OS paths.
> >
> > Ack for that but,
> > that would need someone from hw/firmware side since error status block
> > handling is done by firmware.
> >
> > right now we are just making things up based on spec interpretation.
> >
>
> It's a good point. I think it's worthwhile to understand how the RAS event
> is processed and turned into a CPER by firmware.
>
> I didn't figure out how CPER is generated by edk2 after looking into tf-a
> (Trusted Firmware-A) and edk2 for a while. I will consult the EDK2 developers
> for help. However, there is a note in tf-a that briefly explains how a RAS
> event is handled.
>
> From tf-a/plat/arm/board/fvp/aarch64/fvp_lsp_ras_sp.c:
> (git@github.com:ARM-software/arm-trusted-firmware.git)
>
> /*
> * Note: Typical RAS error handling flow with Firmware First Handling
> *
> * Step 1: Exception resulting from a RAS error in the normal world is routed to
> * EL3.
> * Step 2: This exception is typically signaled as either a synchronous external
> * abort or SError or interrupt. TF-A (EL3 firmware) delegates the
> * control to platform specific handler built on top of the RAS helper
> * utilities.
> * Step 3: With the help of a Logical Secure Partition, TF-A sends a direct
> * message to dedicated S-EL0 (or S-EL1) RAS Partition managed by SPMC.
> * TF-A also populates a shared buffer with a data structure containing
> * enough information (such as system registers) to identify and triage
> * the RAS error.
> * Step 4: RAS SP generates the Common Platform Error Record (CPER) and shares
> * it with normal world firmware and/or OS kernel through a reserved
> * buffer memory.
> * Step 5: RAS SP responds to the direct message with information necessary for
> * TF-A to notify the OS kernel.
> * Step 6: Consequently, TF-A dispatches an SDEI event to notify the OS kernel
> * about the CPER records for further logging.
> */
>
> According to the note, RAS SP (Secure Partition) is the black box where the RAS
> event raised by tf-a is turned into a CPER. Unfortunately, I haven't found the
> source code to understand the details yet.

This is very much 'a flow' rather than 'the flow'. TF-A may not even be
involved in many systems, nor SDEI, nor EDK2 beyond passing through some
config. One option, as I understand it, is to offload the firmware handling
and building of the record to a management processor and stick to SEA
for the signalling.

I'd be rather surprised if you can find anything beyond binary blobs
for that firmware (if that!). Maybe all we can get from publicish sources
is what the HEST tables look like. I've asked our firmware folk if they
can share more on how we do it, but it might take a while.

I have confirmed we only have one GHESv2 SEA entry on at least the one random
board I looked at (and various interrupt ones). That board may not be
representative, but it seems pushing everything through one structure is an
option.
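
For illustration only, here is a rough sketch of what "one structure" plus
retry could look like: a single error source whose status block can hold one
record at a time, with a read-ack flag the guest sets once it has consumed
the record. Everything here (struct ghes_source, ghes_push(),
ghes_push_retry(), guest_ack(), the 64-byte buffer) is hypothetical and
simplified. It is not QEMU's actual code or layout, and not what real
firmware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a single GHESv2-style error source. */
struct ghes_source {
    uint64_t read_ack;          /* non-zero: guest consumed the last record */
    uint8_t  status_block[64];  /* room for exactly one pending record */
    size_t   len;
};

static void ghes_source_init(struct ghes_source *s)
{
    memset(s, 0, sizeof(*s));
    s->read_ack = 1;            /* nothing pending at start */
}

/* Write one record; fail if the previous one has not been acknowledged. */
static bool ghes_push(struct ghes_source *s, const void *cper, size_t len)
{
    if (len > sizeof(s->status_block) || !s->read_ack) {
        return false;
    }
    s->read_ack = 0;            /* record is now pending */
    memcpy(s->status_block, cper, len);
    s->len = len;
    return true;
}

/* Stand-in for the guest acknowledging the record it has just read. */
static void guest_ack(struct ghes_source *s)
{
    s->read_ack = 1;
}

/*
 * Retry up to max_tries times, calling wait() between attempts to give the
 * guest a chance to acknowledge. Returns the attempt number that succeeded,
 * or -1 if every attempt found the block still busy.
 */
static int ghes_push_retry(struct ghes_source *s, const void *cper,
                           size_t len, int max_tries,
                           void (*wait)(struct ghes_source *))
{
    assert(max_tries > 0);
    for (int i = 1; i <= max_tries; i++) {
        if (ghes_push(s, cper, len)) {
            return i;
        }
        wait(s);
    }
    return -1;
}
```

With everything funnelled through one source, a second error that arrives
before the first is acknowledged has nowhere to go, which is where a bounded
retry would come in. In this sketch the wait callback stands in for letting
the guest run.
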
Jonathan
>
> Thanks,
> Gavin
>
>