From: Shiju Jose <shiju.jose@huawei.com>
To: Kai-Heng Feng <kaihengf@nvidia.com>, "ardb@kernel.org" <ardb@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
"Guohanjun (Hanjun Guo)" <guohanjun@huawei.com>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Shuai Xue <xueshuai@linux.alibaba.com>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Morduan Zang <zhangdandan@uniontech.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>
Subject: RE: [PATCH] efi/cper: Add NVIDIA CPER section support
Date: Tue, 24 Feb 2026 11:23:57 +0000
Message-ID: <786211585e2b4a1fbca58c1427102260@huawei.com>
In-Reply-To: <20260223064924.6449-1-kaihengf@nvidia.com>
>-----Original Message-----
>From: Kai-Heng Feng <kaihengf@nvidia.com>
>Sent: 23 February 2026 06:49
>To: ardb@kernel.org
>Cc: Kai-Heng Feng <kaihengf@nvidia.com>; Rafael J. Wysocki
><rafael@kernel.org>; Tony Luck <tony.luck@intel.com>; Borislav Petkov
><bp@alien8.de>; Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; Mauro
>Carvalho Chehab <mchehab@kernel.org>; Shuai Xue
><xueshuai@linux.alibaba.com>; Jonathan Cameron
><jonathan.cameron@huawei.com>; Morduan Zang
><zhangdandan@uniontech.com>; linux-kernel@vger.kernel.org;
>linux-efi@vger.kernel.org; linux-acpi@vger.kernel.org
>Subject: [PATCH] efi/cper: Add NVIDIA CPER section support
>
>Add support for decoding NVIDIA-specific error sections in UEFI CPER records.
>NVIDIA hardware generates vendor-specific CPER sections containing error
>signatures and diagnostic register dumps. This implementation decodes these
>sections and prints error details to the kernel log.
>
>The NVIDIA CPER section contains a fixed header with error metadata (signature,
>error type, severity, socket) followed by variable-length register address-value
>pairs for hardware diagnostics.
>
>This work is based on libcper [0].
>
>Example output:
>Hardware error from APEI Generic Hardware Error Source: 816
>event severity: info
> imprecise tstamp: 2025-11-17 07:57:38
> Error 0, type: info
> section_type: NVIDIA, error_data_length: 224
> signature: HSS-IDLE
> error_type: 0
> error_instance: 0
> severity: 0
> socket: 255
> number_regs: 12
> instance_base: 0x0000000000000000
> register[0]: address=0x0000000004f10008 value=0x0000000000002019
> register[1]: address=0x0000000000000000 value=0x0000000000000000
>
>[0] https://github.com/openbmc/libcper/commit/683e055061ce
>Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
>---
> drivers/firmware/efi/Kconfig | 16 ++++++
> drivers/firmware/efi/Makefile | 1 +
> drivers/firmware/efi/cper-nvidia.c | 79 ++++++++++++++++++++++++++++++
> drivers/firmware/efi/cper-nvidia.h | 33 +++++++++++++
> drivers/firmware/efi/cper.c | 3 ++
> include/linux/cper.h | 4 ++
> 6 files changed, 136 insertions(+)
> create mode 100644 drivers/firmware/efi/cper-nvidia.c
> create mode 100644 drivers/firmware/efi/cper-nvidia.h
>
>diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
>index 29e0729299f5..ed1f53b8e878 100644
>--- a/drivers/firmware/efi/Kconfig
>+++ b/drivers/firmware/efi/Kconfig
>@@ -329,6 +329,22 @@ config UEFI_CPER_X86
> depends on UEFI_CPER && X86
> default y
>
>+config UEFI_CPER_NVIDIA
>+ bool "UEFI CPER NVIDIA support"
>+ depends on UEFI_CPER
>+ help
>+ This option enables support for decoding NVIDIA-specific error
>+ sections in UEFI Common Platform Error Records (CPER). These
>+ sections contain additional diagnostic information for errors
>+ occurring in NVIDIA hardware such as GPUs, switches, and other
>+ devices.
>+
>+ The NVIDIA CPER sections include error signatures (e.g., PCIe-DPC,
>+ DCC-ECC, GPU-STATUS) and diagnostic registers that provide detailed
>+ information about hardware errors for debugging and analysis.
>+
>+ If unsure, say N.
>+
> config TEE_STMM_EFI
> tristate "TEE-based EFI runtime variable service driver"
> depends on EFI && OPTEE
>diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
>index 8efbcf699e4f..a571b6086860 100644
>--- a/drivers/firmware/efi/Makefile
>+++ b/drivers/firmware/efi/Makefile
>@@ -42,5 +42,6 @@ obj-$(CONFIG_EFI_CAPSULE_LOADER) += capsule-loader.o
> obj-$(CONFIG_EFI_EARLYCON) += earlycon.o
> obj-$(CONFIG_UEFI_CPER_ARM) += cper-arm.o
> obj-$(CONFIG_UEFI_CPER_X86) += cper-x86.o
>+obj-$(CONFIG_UEFI_CPER_NVIDIA) += cper-nvidia.o

Hi,

Is drivers/firmware/efi/cper.c the right place to log vendor-specific errors?
So far, the code in drivers/firmware/efi/ only logs CPER information that is
defined by the standards.

Vendor-specific errors are currently logged and recorded by rasdaemon:
https://github.com/mchehab/rasdaemon
https://github.com/mchehab/rasdaemon/blob/master/ras-non-standard-handler.c#L52

If some kernel-level recovery action or logging is required, we can also
register with acpi/apei/ghes using ghes_register_vendor_record_notifier()
to receive a callback:
https://elixir.bootlin.com/linux/v6.19.3/source/drivers/acpi/apei/ghes.c#L652
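
To illustrate what such a callback would have to do, here is a rough
userspace mock-up, not kernel code. It only shows the two steps a vendor
record handler needs: match the section GUID, then bounds-check and decode
the payload. All names in it (nv_sec_hdr, nv_reg, nv_handle_section,
nvidia_sec_guid) are made up for this sketch. The header layout is an
assumption inferred from the field names in the example output; the
authoritative definition is whatever cper-nvidia.h in the patch declares.
The assumed layout is at least consistent with the example: a 32-byte
header plus 12 register pairs of 16 bytes each gives 224 bytes. In the
kernel, the GUID and payload would come from the generic data entry passed
to the notifier rather than from plain function arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* CPER_SEC_NVIDIA from the patch, in the byte order GUID_INIT() stores it
 * (first three fields little-endian). */
const uint8_t nvidia_sec_guid[16] = {
	0xf2, 0x44, 0x52, 0x6d, 0x12, 0x27, 0xec, 0x11,
	0xbe, 0xa7, 0xcb, 0x3f, 0xdb, 0x95, 0xc7, 0x86,
};

/* ASSUMED fixed header layout (hypothetical struct, see note above). */
struct nv_sec_hdr {
	char     signature[16];
	uint16_t error_type;
	uint16_t error_instance;
	uint8_t  severity;
	uint8_t  socket;
	uint8_t  number_regs;
	uint8_t  reserved;
	uint64_t instance_base;
};
static_assert(sizeof(struct nv_sec_hdr) == 32, "unexpected padding");

/* One register address/value pair following the header. */
struct nv_reg {
	uint64_t address;
	uint64_t value;
};

/*
 * Returns the number of register pairs printed, or -1 if the section is
 * not an NVIDIA section or the payload is too short for what the header
 * claims. Assumes a little-endian host, as CPER data is little-endian.
 */
int nv_handle_section(const uint8_t guid[16], const void *payload, size_t len)
{
	const uint8_t *p = payload;
	struct nv_sec_hdr hdr;
	struct nv_reg reg;
	unsigned int i;

	if (memcmp(guid, nvidia_sec_guid, sizeof(nvidia_sec_guid)) != 0)
		return -1;	/* not ours, ignore */
	if (len < sizeof(hdr))
		return -1;	/* truncated header */
	memcpy(&hdr, p, sizeof(hdr));
	if (len - sizeof(hdr) < (size_t)hdr.number_regs * sizeof(reg))
		return -1;	/* register array would overrun the payload */

	printf("signature: %.16s\n", hdr.signature);
	printf("error_type: %u\n", (unsigned int)hdr.error_type);
	printf("error_instance: %u\n", (unsigned int)hdr.error_instance);
	printf("severity: %u\n", (unsigned int)hdr.severity);
	printf("socket: %u\n", (unsigned int)hdr.socket);
	printf("number_regs: %u\n", (unsigned int)hdr.number_regs);
	printf("instance_base: 0x%016llx\n",
	       (unsigned long long)hdr.instance_base);

	for (i = 0; i < hdr.number_regs; i++) {
		memcpy(&reg, p + sizeof(hdr) + i * sizeof(reg), sizeof(reg));
		printf("register[%u]: address=0x%016llx value=0x%016llx\n", i,
		       (unsigned long long)reg.address,
		       (unsigned long long)reg.value);
	}
	return (int)hdr.number_regs;
}
```

The length check before the register loop is the part I would want to see
in any in-kernel version as well, since number_regs comes from firmware.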
[...]
>+/* NVIDIA Error Section */
>+#define CPER_SEC_NVIDIA \
>+ GUID_INIT(0x6d5244f2, 0x2712, 0x11ec, 0xbe, 0xa7, 0xcb, 0x3f, \
>+ 0xdb, 0x95, 0xc7, 0x86)
>
> #define CPER_PROC_VALID_TYPE 0x0001
> #define CPER_PROC_VALID_ISA 0x0002
>--
>2.43.0
>
Thanks,
Shiju