From: Gavin Shan <gshan@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, mst@redhat.com,
	anisinha@redhat.com, gengdongjiu1@gmail.com,
	peter.maydell@linaro.org, pbonzini@redhat.com,
	shan.gavin@gmail.com,
	Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Subject: Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
Date: Wed, 26 Feb 2025 14:58:46 +1000
Message-ID: <dafa471d-c5bb-4f6b-8483-17741e0caab1@redhat.com>
In-Reply-To: <20250225121939.7e0e2304@imammedo.users.ipa.redhat.com>

On 2/25/25 9:19 PM, Igor Mammedov wrote:
> On Fri, 21 Feb 2025 11:04:35 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>
>> Ideally I'd like whatever we choose to look like what a bare metal machine
>> does - mostly because we are less likely to hit untested OS paths.
> 
> Ack for that but,
> that would need someone from hw/firmware side since error status block
> handling is done by firmware.
> 
> right now we are just making things up based on spec interpretation.
> 

It's a good point. I think it's worthwhile to understand how the RAS event
is processed and turned into a CPER record by firmware.

After looking into tf-a (Trusted Firmware-A) and edk2 for a while, I still
haven't figured out how CPER is generated by edk2. I will consult the EDK2
developers for help. However, there is a note in tf-a that briefly explains
how a RAS event is handled.

   From tf-a/plat/arm/board/fvp/aarch64/fvp_lsp_ras_sp.c:
   (git@github.com:ARM-software/arm-trusted-firmware.git)

   /*
    * Note: Typical RAS error handling flow with Firmware First Handling
    *
    * Step 1: Exception resulting from a RAS error in the normal world is routed to
    *         EL3.
    * Step 2: This exception is typically signaled as either a synchronous external
    *         abort or SError or interrupt. TF-A (EL3 firmware) delegates the
    *         control to platform specific handler built on top of the RAS helper
    *         utilities.
    * Step 3: With the help of a Logical Secure Partition, TF-A sends a direct
    *         message to dedicated S-EL0 (or S-EL1) RAS Partition managed by SPMC.
    *         TF-A also populates a shared buffer with a data structure containing
    *         enough information (such as system registers) to identify and triage
    *         the RAS error.
    * Step 4: RAS SP generates the Common Platform Error Record (CPER) and shares
    *         it with normal world firmware and/or OS kernel through a reserved
    *         buffer memory.
    * Step 5: RAS SP responds to the direct message with information necessary for
    *         TF-A to notify the OS kernel.
    * Step 6: Consequently, TF-A dispatches an SDEI event to notify the OS kernel
    *         about the CPER records for further logging.
    */

According to the note, the RAS SP (Secure Partition) is the black box where the
RAS event raised by tf-a is turned into a CPER record. Unfortunately, I haven't
found its source code yet, so the details are still unclear to me.

Thanks,
Gavin



