public inbox for linux-acpi@vger.kernel.org
From: Ruidong Tian <tianruidong@linux.alibaba.com>
To: Himanshu Chauhan <hchauhan@ventanamicro.com>,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linux-efi@vger.kernel.org,
	acpica-devel@lists.linux.dev
Cc: paul.walmsley@sifive.com, palmer@dabbelt.com, lenb@kernel.org,
	james.morse@arm.com, tony.luck@intel.com, ardb@kernel.org,
	conor@kernel.org, cleger@rivosinc.com, robert.moore@intel.com,
	sunilvl@ventanamicro.com, apatel@ventanamicro.com,
	xueshuai@linux.alibaba.com
Subject: Re: [RFC PATCH v1 00/10] Add RAS support for RISC-V architecture
Date: Fri, 12 Sep 2025 15:30:41 +0800
Message-ID: <72563756-a53a-4f50-9bf4-87f6b26af036@linux.alibaba.com>
In-Reply-To: <20250227123628.2931490-1-hchauhan@ventanamicro.com>


On 2025/2/27 20:36, Himanshu Chauhan wrote:
> This series implements RAS (Reliability, Availability and Serviceability)
> support for the RISC-V architecture using the RISC-V RERI specification [1]. It
> conforms to the ACPI Platform Error Interfaces (APEI). It uses the highest-priority
> Supervisor Software Events (SSE) [2] to deliver hardware error events to the kernel.
> The SSE implementation has already been merged in OpenSBI. Clément has sent a patch
> series implementing it in the Linux kernel [5].
>
> The GHES driver framework is used as is with the following changes for RISC-V:
> 	1. Register each GHES entry with the SSE layer; the GHES notification vector is an SSE event
> 	2. Add RISC-V specific entries for processor type and ISA strings
> 	3. Add GHES SSE low- and high-priority fixmap indices to help map and read the
> 	   physical addresses present in GHES entries
> 	4. Other changes to build/configure the RAS support
>
> How to Use:
> ----------
> This RAS stack consists of Qemu [3], OpenSBI, EDK2 [4], the Linux kernel, and the devmem utility
> to inject and trigger errors. Qemu [3] can emulate RISC-V RERI. The RAS agent is implemented in
> OpenSBI, which creates CPER records. EDK2 generates the HEST table and populates it with GHES
> entries with the help of OpenSBI.
>
> Qemu Command:
> ------------
> <qemu-dir>/build/qemu-system-riscv64 \
>      -s -accel tcg -m 4096 -smp 2 \
>      -cpu rv64,smepmp=false \
>      -serial mon:stdio \
>      -d guest_errors -D ./qemu.log \
>      -bios <opensbi-dir>/build/platform/generic/firmware/fw_dynamic.bin \
>      -monitor telnet:127.0.0.1:55555,server,nowait \
>      -device virtio-gpu-pci -full-screen \
>      -device qemu-xhci \
>      -device usb-kbd \
>      -blockdev node-name=pflash0,driver=file,read-only=on,filename=<edk2-build-dir>/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT_CODE.fd \
>      -blockdev node-name=pflash1,driver=file,filename=<edk2-build-dir>/RiscVVirtQemu/RELEASE_GCC5/FV/RISCV_VIRT_VARS.fd \
>      -M virt,pflash0=pflash0,pflash1=pflash1,rpmi=true,reri=true,aia=aplic-imsic \
>      -kernel <kernel image> \
>      -initrd <rootfs image> \
>      -append "root=/dev/ram rw console=ttyS0 earlycon=uart8250,mmio,0x10000000"
>
> Error Injection & Triggering:
> ----------------------------
> devmem 0x4010040 32 0x2a1
> devmem 0x4010048 32 0x9001404
> devmem 0x4010044 8 1
>
> The above commands inject a TLB error on CPU 0.
>
> Sample Output (CPU 0):
> ---------------------
> [   34.370282] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [   34.371375] {1}[Hardware Error]: event severity: recoverable
> [   34.372149] {1}[Hardware Error]:  Error 0, type: recoverable
> [   34.372756] {1}[Hardware Error]:   section_type: general processor error
> [   34.373357] {1}[Hardware Error]:   processor_type: 3, RISCV
> [   34.373806] {1}[Hardware Error]:   processor_isa: 6, RISCV64
> [   34.374294] {1}[Hardware Error]:   error_type: 0x02
> [   34.374845] {1}[Hardware Error]:   TLB error
> [   34.375448] {1}[Hardware Error]:   operation: 1, data read
> [   34.376100] {1}[Hardware Error]:   target_address: 0x0000000000000000
>
> References:
> ----------
> [1] RERI Specification: https://github.com/riscv-non-isa/riscv-ras-eri/releases/download/v1.0/riscv-reri.pdf
> [2] SSE section in the RISC-V SBI specification v3.0: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/v3.0-rc3/riscv-sbi.pdf
> [3] Qemu source (with RERI emulation support): https://github.com/ventanamicro/qemu.git (branch: dev-upstream)
> [4] EDK2: https://github.com/ventanamicro/edk2.git (branch: dev-upstream)
> [5] SSE Kernel Patches: https://lore.kernel.org/linux-riscv/649fdead-09b0-4f94-a6ff-099fc970d890@rivosinc.com/T/

Hi,

Thanks for this series.

I'm doing some work related to your series. Besides SSE, I'm working on support
for another notification type for synchronous hardware errors (e.g., on a poison
read), called Hardware Error Exception (HEE) in Dhaval Sharma's UEFI
proposal [0] in the PRS-TG. I have a patch for HEE support, which I've sent out
separately [1].

Perhaps we could merge my work into your patchset to bring a complete RAS
solution to the RISC-V architecture? Alternatively, I'm happy to wait for your
patches to land and then continue my work on top.

Let me know what you think would be best.

Cheers,
Ruidong Tian

[0]: https://lists.riscv.org/g/tech-prs/topic/risc_v_ras_related_ecrs/113685653
[1]: https://lore.kernel.org/all/20250910093347.75822-6-tianruidong@linux.alibaba.com/

> Himanshu Chauhan (10):
>    riscv: Define ioremap_cache for RISC-V
>    riscv: Define arch_apei_get_mem_attribute for RISC-V
>    acpi: Introduce SSE in HEST notification types
>    riscv: Add fixmap indices for GHES IRQ and SSE contexts
>    riscv: conditionally compile GHES NMI spool function
>    riscv: Add functions to register ghes having SSE notification
>    riscv: Add RISC-V entries in processor type and ISA strings
>    riscv: Introduce HEST SSE notification handlers
>    riscv: Add config option to enable APEI SSE handler
>    riscv: Enable APEI and NMI safe cmpxchg options required for RAS
>
>   arch/riscv/Kconfig                 |   2 +
>   arch/riscv/include/asm/acpi.h      |  20 ++++
>   arch/riscv/include/asm/fixmap.h    |   8 ++
>   arch/riscv/include/asm/io.h        |   3 +
>   drivers/acpi/apei/Kconfig          |   5 +
>   drivers/acpi/apei/ghes.c           | 102 +++++++++++++++++---
>   drivers/firmware/efi/cper.c        |   3 +
>   drivers/firmware/riscv/riscv_sse.c | 147 +++++++++++++++++++++++++++++
>   include/acpi/actbl1.h              |   3 +-
>   include/linux/riscv_sse.h          |  15 +++
>   10 files changed, 296 insertions(+), 12 deletions(-)
>

Thread overview: 14+ messages
2025-02-27 12:36 [RFC PATCH v1 00/10] Add RAS support for RISC-V architecture Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 01/10] riscv: Define ioremap_cache for RISC-V Himanshu Chauhan
2025-05-05 12:32   ` Anup Patel
2025-02-27 12:36 ` [RFC PATCH v1 02/10] riscv: Define arch_apei_get_mem_attribute " Himanshu Chauhan
2025-02-27 12:57   ` Clément Léger
2025-02-27 12:36 ` [RFC PATCH v1 03/10] acpi: Introduce SSE in HEST notification types Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 04/10] riscv: Add fixmap indices for GHES IRQ and SSE contexts Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 05/10] riscv: conditionally compile GHES NMI spool function Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 06/10] riscv: Add functions to register ghes having SSE notification Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 07/10] riscv: Add RISC-V entries in processor type and ISA strings Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 08/10] riscv: Introduce HEST SSE notification handlers Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 09/10] riscv: Add config option to enable APEI SSE handler Himanshu Chauhan
2025-02-27 12:36 ` [RFC PATCH v1 10/10] riscv: Enable APEI and NMI safe cmpxchg options required for RAS Himanshu Chauhan
2025-09-12  7:30 ` Ruidong Tian [this message]
