Linux-RISC-V Archive on lore.kernel.org
From: Ruidong Tian <tianruidong@linux.alibaba.com>
To: pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu,
	alex@ghiti.fr, rafael@kernel.org, tony.luck@intel.com,
	bp@alien8.de, guohanjun@huawei.com, mchehab@kernel.org,
	xueshuai@linux.alibaba.com, lenb@kernel.org,
	saket.dumbre@intel.com
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, acpica-devel@lists.linux.dev,
	Ruidong Tian <tianruidong@linux.alibaba.com>
Subject: [PATCH 0/3] riscv: log Hardware Error Exception via APEI
Date: Fri,  8 May 2026 16:20:17 +0800
Message-ID: <20260508082020.3368109-1-tianruidong@linux.alibaba.com>

This series extends do_trap_hardware_error(), building on the work
in [1].

RISC-V already dispatches Hardware Error Exception (cause 19, "HEE") via
do_trap_hardware_error(), but today the trap handler has no way to learn
*what* went wrong: the offending task is killed (or the kernel panics)
with no diagnostic about the underlying hardware fault, no error record
is logged, no page is isolated, and memory_failure() is never invoked.

There are two principal ways to obtain hardware error information on
HEE:

  1. Let firmware parse platform error registers and hand the kernel a
     standardized CPER record through ACPI / APEI / GHES.
  2. Have the kernel read the error registers directly.

Option (2) is not yet viable on RISC-V: the architecture does not
define a unified, mandatory layout for hardware error status registers
across implementations, so there is nothing stable for common code to
decode. This series therefore implements option (1) and wires HEE into
the existing APEI / GHES path, mirroring how arm64 treats SEA.

Future work: option (2) is not ruled out. Once the RISC-V architecture
standardizes a common hardware error register layout (either as part
of the privileged spec or via a well-defined SBI / ACPI namespace
interface), a kernel-native decoder could be added alongside the
ACPI/APEI path. The two can then coexist and be selected per
platform through a Kconfig choice. This series keeps the door open
by routing HEE through apei_claim_hee() behind CONFIG_ACPI_APEI_HEE,
so disabling that config already restores the legacy path and does
not block a future native decoder from being wired in.
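The config gating described above is commonly expressed in kernel
headers as a real prototype paired with a static inline stub. The
fragment below is only a sketch of that idiom, not code from this
series: the apei_claim_hee() signature and the -ENOENT return value are
assumptions, chosen to resemble the convention arm64 uses for
apei_claim_sea().

```c
#include <errno.h>

struct pt_regs;	/* opaque here; the kernel defines the real layout */

#ifdef CONFIG_ACPI_APEI_HEE
/* Assumed prototype: returns 0 when firmware claimed the error. */
int apei_claim_hee(struct pt_regs *regs);
#else
/*
 * Assumed stub: with the option disabled the error is never claimed,
 * so the caller always falls through to the legacy path.
 */
static inline int apei_claim_hee(struct pt_regs *regs)
{
	(void)regs;
	return -ENOENT;
}
#endif
```

With a stub of this shape the trap handler needs no #ifdef of its own,
and a future native decoder could be tried after the call returns
non-zero.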

After this series:

  * Firmware reports RAS events to the OS as CPER records through a
    HEST GHES entry whose notification type is HEE (new value 13).
  * If CONFIG_ACPI_APEI_HEE is set, do_trap_hardware_error() calls
    apei_claim_hee() first. On success, GHES queues the record, drains
    irq_work inline, and delivers a BUS_MCEERR_AR SIGBUS to the faulting
    user task via task_work after isolating the poisoned page with
    memory_failure(MF_ACTION_REQUIRED).
  * If firmware does not claim the error, or CONFIG_ACPI_APEI_HEE is
    not set:
      - user mode falls back to SIGBUS / BUS_MCEERR_AR via do_trap_error(),
      - kernel mode tries fixup_exception() to let MC-safe copy routines
        recover; otherwise die().
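
The decision order in the list above can be modelled as a small
user-space sketch. This is not the code from patch 3/3: struct hee_ctx,
enum hee_outcome and the sketch_*() functions are hypothetical
stand-ins, and the kernel-mode behaviour after a successful claim is an
assumption, since the list only describes the user-task case.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the trap context; not a kernel type. */
struct hee_ctx {
	bool apei_enabled;	/* models CONFIG_ACPI_APEI_HEE being set */
	bool firmware_claims;	/* firmware supplied a CPER record via GHES */
	bool user_mode;		/* trap was taken from user mode */
	bool has_fixup;		/* an exception-table fixup covers the pc */
};

enum hee_outcome {
	HEE_HANDLED_BY_APEI,	/* GHES path: memory_failure() + SIGBUS */
	HEE_SIGBUS_FALLBACK,	/* do_trap_error() with BUS_MCEERR_AR */
	HEE_KERNEL_FIXUP,	/* fixup_exception() recovered */
	HEE_DIE,		/* unrecoverable kernel-mode error */
};

/* Stand-in for apei_claim_hee(): 0 on success, -ENOENT otherwise. */
static int sketch_apei_claim_hee(const struct hee_ctx *ctx)
{
	if (!ctx->apei_enabled || !ctx->firmware_claims)
		return -ENOENT;
	return 0;
}

/* Stand-in for the decision order in do_trap_hardware_error(). */
static enum hee_outcome
sketch_do_trap_hardware_error(const struct hee_ctx *ctx)
{
	/* Firmware-first: try the APEI / GHES path before anything else. */
	if (sketch_apei_claim_hee(ctx) == 0)
		return HEE_HANDLED_BY_APEI;

	/* Legacy path for user mode. */
	if (ctx->user_mode)
		return HEE_SIGBUS_FALLBACK;

	/* Kernel mode: MC-safe copy routines may have a fixup entry. */
	if (ctx->has_fixup)
		return HEE_KERNEL_FIXUP;

	return HEE_DIE;
}
```

For example, a context with apei_enabled and firmware_claims both true
yields HEE_HANDLED_BY_APEI, while clearing apei_enabled for a user-mode
trap yields HEE_SIGBUS_FALLBACK.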

References:
----------
[1] [RISC-V RAS patch]: https://lore.kernel.org/all/20260109090224.3105465-1-himanshu.chauhan@oss.qualcomm.com/

Ruidong Tian (3):
  acpi: Introduce HEE in HEST notification types
  riscv: Introduce HEST HEE notification handlers for APEI
  riscv: collect hardware error information via APEI on HEE

 arch/riscv/include/asm/acpi.h   |  2 +
 arch/riscv/include/asm/fixmap.h |  3 ++
 arch/riscv/kernel/acpi.c        | 54 ++++++++++++++++++++++++++
 arch/riscv/kernel/traps.c       | 35 ++++++++++++++++-
 drivers/acpi/apei/Kconfig       | 12 ++++++
 drivers/acpi/apei/ghes.c        | 68 ++++++++++++++++++++++++++++++++-
 include/acpi/actbl1.h           |  3 +-
 include/acpi/ghes.h             |  6 +++
 8 files changed, 178 insertions(+), 5 deletions(-)

-- 
2.51.2.612.gdc70283dfc


