public inbox for linux-efi@vger.kernel.org
From: Kai-Heng Feng <kaihengf@nvidia.com>
To: Shiju Jose <shiju.jose@huawei.com>, "ardb@kernel.org" <ardb@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
	"Guohanjun (Hanjun Guo)" <guohanjun@huawei.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	Shuai Xue <xueshuai@linux.alibaba.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Morduan Zang <zhangdandan@uniontech.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-efi@vger.kernel.org" <linux-efi@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>
Subject: Re: [PATCH] efi/cper: Add NVIDIA CPER section support
Date: Wed, 25 Feb 2026 12:52:46 +0800
Message-ID: <66901350-8a4e-4282-8f56-a2df844a7ff6@nvidia.com>
In-Reply-To: <786211585e2b4a1fbca58c1427102260@huawei.com>

Hi Shiju,

On 2026/2/24 7:23 PM, Shiju Jose wrote:
> 
>> -----Original Message-----
>> From: Kai-Heng Feng <kaihengf@nvidia.com>
>> Sent: 23 February 2026 06:49
>> To: ardb@kernel.org
>> Cc: Kai-Heng Feng <kaihengf@nvidia.com>; Rafael J. Wysocki
>> <rafael@kernel.org>; Tony Luck <tony.luck@intel.com>; Borislav Petkov
>> <bp@alien8.de>; Guohanjun (Hanjun Guo) <guohanjun@huawei.com>; Mauro
>> Carvalho Chehab <mchehab@kernel.org>; Shuai Xue
>> <xueshuai@linux.alibaba.com>; Jonathan Cameron
>> <jonathan.cameron@huawei.com>; Morduan Zang
>> <zhangdandan@uniontech.com>; linux-kernel@vger.kernel.org; linux-
>> efi@vger.kernel.org; linux-acpi@vger.kernel.org
>> Subject: [PATCH] efi/cper: Add NVIDIA CPER section support
>>
>> Add support for decoding NVIDIA-specific error sections in UEFI CPER records.
>> NVIDIA hardware generates vendor-specific CPER sections containing error
>> signatures and diagnostic register dumps. This implementation decodes these
>> sections and prints error details to the kernel log.
>>
>> The NVIDIA CPER section contains a fixed header with error metadata (signature,
>> error type, severity, socket) followed by variable-length register address-value
>> pairs for hardware diagnostics.
>>
>> This work is based on libcper [0].
>>
>> Example output:
>> Hardware error from APEI Generic Hardware Error Source: 816 event severity:
>> info  imprecise tstamp: 2025-11-17 07:57:38  Error 0, type: info
>>   section_type: NVIDIA, error_data_length: 224
>>   signature: HSS-IDLE
>>   error_type: 0
>>   error_instance: 0
>>   severity: 0
>>   socket: 255
>>   number_regs: 12
>>   instance_base: 0x0000000000000000
>>   register[0]: address=0x0000000004f10008 value=0x0000000000002019
>>   register[1]: address=0x0000000000000000 value=0x0000000000000000
>>
>> [0] https://github.com/openbmc/libcper/commit/683e055061ce
>> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
>> ---
>> drivers/firmware/efi/Kconfig       | 16 ++++++
>> drivers/firmware/efi/Makefile      |  1 +
>> drivers/firmware/efi/cper-nvidia.c | 79 ++++++++++++++++++++++++++++++
>> drivers/firmware/efi/cper-nvidia.h | 33 +++++++++++++
>> drivers/firmware/efi/cper.c        |  3 ++
>> include/linux/cper.h               |  4 ++
>> 6 files changed, 136 insertions(+)
>> create mode 100644 drivers/firmware/efi/cper-nvidia.c
>> create mode 100644 drivers/firmware/efi/cper-nvidia.h
>>
>> diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
>> index 29e0729299f5..ed1f53b8e878 100644
>> --- a/drivers/firmware/efi/Kconfig
>> +++ b/drivers/firmware/efi/Kconfig
>> @@ -329,6 +329,22 @@ config UEFI_CPER_X86
>>        depends on UEFI_CPER && X86
>>        default y
>>
>> +config UEFI_CPER_NVIDIA
>> +      bool "UEFI CPER NVIDIA support"
>> +      depends on UEFI_CPER
>> +      help
>> +        This option enables support for decoding NVIDIA-specific error
>> +        sections in UEFI Common Platform Error Records (CPER). These
>> +        sections contain additional diagnostic information for errors
>> +        occurring in NVIDIA hardware such as GPUs, switches, and other
>> +        devices.
>> +
>> +        The NVIDIA CPER sections include error signatures (e.g., PCIe-DPC,
>> +        DCC-ECC, GPU-STATUS) and diagnostic registers that provide detailed
>> +        information about hardware errors for debugging and analysis.
>> +
>> +        If unsure, say N.
>> +
>> config TEE_STMM_EFI
>>        tristate "TEE-based EFI runtime variable service driver"
>>        depends on EFI && OPTEE
>> diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
>> index 8efbcf699e4f..a571b6086860 100644
>> --- a/drivers/firmware/efi/Makefile
>> +++ b/drivers/firmware/efi/Makefile
>> @@ -42,5 +42,6 @@ obj-$(CONFIG_EFI_CAPSULE_LOADER)     += capsule-loader.o
>> obj-$(CONFIG_EFI_EARLYCON)            += earlycon.o
>> obj-$(CONFIG_UEFI_CPER_ARM)           += cper-arm.o
>> obj-$(CONFIG_UEFI_CPER_X86)           += cper-x86.o
>> +obj-$(CONFIG_UEFI_CPER_NVIDIA)                += cper-nvidia.o
> 
> Hi,
> 
> Is drivers/firmware/efi/cper.c the right place to log vendor-specific errors,
> given that so far drivers/firmware/efi/ only logs CPER information defined by the standards?
> Vendor-specific errors are currently logged and recorded in rasdaemon.
> https://github.com/mchehab/rasdaemon
> https://github.com/mchehab/rasdaemon/blob/master/ras-non-standard-handler.c#L52
> 
> If some kernel-level recovery action or logging is required, we can also register with
> acpi/apei/ghes using ghes_register_vendor_record_notifier() to receive a callback.
> https://elixir.bootlin.com/linux/v6.19.3/source/drivers/acpi/apei/ghes.c#L652

Thank you for the info. There is indeed an ACPI node for CPER purposes. I'll
check whether that ACPI HID can be used to implement this with
ghes_register_vendor_record_notifier().
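
For reference, the section layout described in the patch (a fixed header
followed by number_regs address-value pairs) can be sketched in user-space C
roughly as below. This is only an illustration: the field widths and offsets
are an assumption modeled on libcper, not the contents of cper-nvidia.h, and
the nv_cper_* names are hypothetical. The assumed layout is at least
consistent with the example output, where a 32-byte header plus 12 * 16 bytes
of register pairs gives the reported error_data_length of 224.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout (modeled on libcper, not taken from cper-nvidia.h). */
#define NV_SIG_LEN 16
#define NV_HDR_LEN 32 /* 16 + 2 + 2 + 1 + 1 + 1 + 1 (reserved) + 8 */
#define NV_REG_LEN 16 /* 8-byte address + 8-byte value */

struct nv_cper_hdr {
	char     signature[NV_SIG_LEN + 1]; /* NUL-terminated copy */
	uint16_t error_type;
	uint16_t error_instance;
	uint8_t  severity;
	uint8_t  socket;
	uint8_t  number_regs;
	uint64_t instance_base;
};

struct nv_cper_reg {
	uint64_t address;
	uint64_t value;
};

/* Read an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t get_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 8) | p[n];
	return v;
}

/*
 * Decode the header and up to max_regs register pairs from buf.
 * Returns the number of registers decoded, or -1 if the buffer is too
 * short for the header or for the declared register count.
 */
static int nv_cper_decode(const uint8_t *buf, size_t len,
			  struct nv_cper_hdr *hdr,
			  struct nv_cper_reg *regs, size_t max_regs)
{
	size_t i, n;

	if (len < NV_HDR_LEN)
		return -1;

	memcpy(hdr->signature, buf, NV_SIG_LEN);
	hdr->signature[NV_SIG_LEN] = '\0';
	hdr->error_type     = (uint16_t)get_le(buf + 16, 2);
	hdr->error_instance = (uint16_t)get_le(buf + 18, 2);
	hdr->severity       = buf[20];
	hdr->socket         = buf[21];
	hdr->number_regs    = buf[22];
	/* buf[23] is treated as reserved in this sketch. */
	hdr->instance_base  = get_le(buf + 24, 8);

	/* Reject sections whose declared register count overruns the data. */
	if ((len - NV_HDR_LEN) / NV_REG_LEN < hdr->number_regs)
		return -1;

	n = hdr->number_regs < max_regs ? hdr->number_regs : max_regs;
	for (i = 0; i < n; i++) {
		const uint8_t *p = buf + NV_HDR_LEN + i * NV_REG_LEN;

		regs[i].address = get_le(p, 8);
		regs[i].value   = get_le(p + 8, 8);
	}
	return (int)n;
}
```

The length check before the register loop is the important part: number_regs
comes from firmware, so a decoder should not trust it beyond what the section
length allows, wherever the decoding ends up living.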

Kai-Heng

> 
> [...]
>> +/* NVIDIA Error Section */
>> +#define CPER_SEC_NVIDIA                                               \
>> +      GUID_INIT(0x6d5244f2, 0x2712, 0x11ec, 0xbe, 0xa7, 0xcb, 0x3f,   \
>> +                0xdb, 0x95, 0xc7, 0x86)
>>
>> #define CPER_PROC_VALID_TYPE                  0x0001
>> #define CPER_PROC_VALID_ISA                   0x0002
>> --
>> 2.43.0
>>
> 
> Thanks,
> Shiju


Thread overview: 3+ messages
2026-02-23  6:49 [PATCH] efi/cper: Add NVIDIA CPER section support Kai-Heng Feng
2026-02-24 11:23 ` Shiju Jose
2026-02-25  4:52   ` Kai-Heng Feng [this message]
