From: "Kai-Heng Feng" <kaihengf@nvidia.com>
To: "Bjorn Helgaas" <helgaas@kernel.org>
Cc: <rafael@kernel.org>,
	"Jonathan Cameron" <jonathan.cameron@huawei.com>,
	"Shiju Jose" <shiju.jose@huawei.com>,
	"Tony Luck" <tony.luck@intel.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Hanjun Guo" <guohanjun@huawei.com>,
	"Mauro Carvalho Chehab" <mchehab@kernel.org>,
	"Shuai Xue" <xueshuai@linux.alibaba.com>,
	"Len Brown" <lenb@kernel.org>, "Kees Cook" <kees@kernel.org>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	"Will Deacon" <will@kernel.org>,
	"Huang Yiwei" <quic_hyiwei@quicinc.com>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Nathan Chancellor" <nathan@kernel.org>,
	"Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-hardening@vger.kernel.org>
Subject: Re: [PATCH v2 3/3] acpi/apei: Add NVIDIA GHES vendor CPER record handler
Date: Wed, 25 Mar 2026 19:34:50 +0800
Message-ID: <DHBTY4W21Z9S.1LCIGKKJ0WX3T@nvidia.com>
In-Reply-To: <20260324161533.GA1131495@bhelgaas>

On Wed Mar 25, 2026 at 12:15 AM CST, Bjorn Helgaas wrote:
> On Tue, Mar 24, 2026 at 05:33:06PM +0800, Kai-Heng Feng wrote:
>> On 2026-03-20 09:52, Bjorn Helgaas wrote:
>> > On Thu, Mar 19, 2026 at 07:13:09PM +0800, Kai-Heng Feng wrote:
>> > > Add support for decoding NVIDIA-specific CPER sections delivered via
>> > > the APEI GHES vendor record notifier chain. NVIDIA hardware generates
>> > > vendor-specific CPER sections containing error signatures and diagnostic
>> > > register dumps. This implementation registers a notifier_block with the
>> > > GHES vendor record notifier and decodes these sections, printing error
>> > > details via dev_info().
>> > >
>> > > The driver binds to ACPI device NVDA2012, present on NVIDIA server
>> > > platforms. The NVIDIA CPER section contains a fixed header with error
>> > > metadata (signature, error type, severity, socket) followed by
>> > > variable-length register address-value pairs for hardware diagnostics.
>> > >
>> > > This work is based on libcper [0].
>> > >
>> > > Example output:
>> > > nvidia-ghes NVDA2012:00: NVIDIA CPER section, error_data_length: 544
>> > > nvidia-ghes NVDA2012:00: signature: CMET-INFO
>> > > nvidia-ghes NVDA2012:00: error_type: 0
>> > > nvidia-ghes NVDA2012:00: error_instance: 0
>> > > nvidia-ghes NVDA2012:00: severity: 3
>> > > nvidia-ghes NVDA2012:00: socket: 0
>> > > nvidia-ghes NVDA2012:00: number_regs: 32
>> > > nvidia-ghes NVDA2012:00: instance_base: 0x0000000000000000
>> > > nvidia-ghes NVDA2012:00: register[0]: address=0x8000000100000000 value=0x0000000100000000
>> >
>> > Is there a convenient way to connect NVDA2012:00 with the actual
>> > device?  I assume this is typically a PCIe device?  How would we
>> > relate this with PCIe errors?
>>
>> The CPER report comes from ARM RAS firmware and is not necessarily
>> related to a PCIe device.
>
> Right, I know CPER is more general than just PCI/PCIe.
>
> But in this case, I think NVDA2012 probably *is* a PCIe device.  How
> would we figure out which one?  If we have to manually do an acpidump,
> figure out which NVDA2012 is :00, and look for an _ADR or something,
> that doesn't really seem convenient for multi-NVDA2012 situations.

It's actually just an ACPI device:
Device (CPER)
{
  Name (_HID, "NVDA2012")  // _HID: Hardware ID
  Name (_UID, 0x00)  // _UID: Unique ID
  Method (_DSM, 4, Serialized) // _DSM: Device-Specific Method
}

And that's it.
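
For reference, the section layout described in the quoted commit message (a fixed header followed by register address-value pairs) can be sketched in plain C. This is only an illustration, not the code from the patch: the field widths and offsets are assumptions inferred from the field names in the example output and from its length (a 32-byte header plus 32 registers of 16 bytes each gives 544), little-endian encoding is assumed, and the names nv_cper_hdr, nv_cper_parse_hdr() and nv_cper_reg() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes: 16 (signature) + 2 + 2 + 1 + 1 + 1 + 1 (reserved) + 8 = 32 */
#define NV_HDR_LEN 32u
/* Assumed: each register entry is a 64-bit address plus a 64-bit value */
#define NV_REG_LEN 16u

/* Hypothetical decoded view of the fixed header */
struct nv_cper_hdr {
	char signature[17];      /* 16 bytes from the section plus a NUL */
	uint16_t error_type;
	uint16_t error_instance;
	uint8_t severity;
	uint8_t socket;
	uint8_t number_regs;
	uint64_t instance_base;
};

/* Read little-endian integers byte by byte, so host endianness and
 * buffer alignment do not matter. */
static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t rd_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Decode the fixed header. Returns 0 on success, or -1 if the buffer is
 * too short for the header or for the number of registers it declares. */
int nv_cper_parse_hdr(const uint8_t *buf, size_t len, struct nv_cper_hdr *h)
{
	if (len < NV_HDR_LEN)
		return -1;

	memcpy(h->signature, buf, 16);
	h->signature[16] = '\0';
	h->error_type = rd_le16(buf + 16);
	h->error_instance = rd_le16(buf + 18);
	h->severity = buf[20];
	h->socket = buf[21];
	h->number_regs = buf[22];
	/* buf[23] is assumed to be reserved padding */
	h->instance_base = rd_le64(buf + 24);

	if (len - NV_HDR_LEN < (size_t)h->number_regs * NV_REG_LEN)
		return -1;
	return 0;
}

/* Fetch register pair idx. Returns 0 on success, -1 if out of bounds. */
int nv_cper_reg(const uint8_t *buf, size_t len, unsigned int idx,
		uint64_t *addr, uint64_t *val)
{
	size_t off = NV_HDR_LEN + (size_t)idx * NV_REG_LEN;

	if (off + NV_REG_LEN > len)
		return -1;

	*addr = rd_le64(buf + off);
	*val = rd_le64(buf + off + 8);
	return 0;
}
```

A caller would run nv_cper_parse_hdr() on the section payload first, then loop idx from 0 to number_regs - 1 with nv_cper_reg(). Both helpers check bounds before reading, because the register count comes from firmware-provided data.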


Thread overview: 17+ messages
2026-03-19 11:13 [PATCH v2 1/3] acpi/apei: Add devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
2026-03-19 11:13 ` [PATCH v2 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
2026-03-20  9:57   ` Jonathan Cameron
2026-03-19 11:13 ` [PATCH v2 3/3] acpi/apei: Add NVIDIA GHES vendor CPER record handler Kai-Heng Feng
2026-03-20 10:13   ` Jonathan Cameron
2026-03-24  9:10     ` Kai-Heng Feng
2026-03-20 14:52   ` Bjorn Helgaas
2026-03-20 15:13     ` Bjorn Helgaas
2026-03-24  9:33     ` Kai-Heng Feng
2026-03-24 16:15       ` Bjorn Helgaas
2026-03-25 11:34         ` Kai-Heng Feng [this message]
2026-03-25 15:36           ` Bjorn Helgaas
2026-03-25 17:08             ` Jonathan Cameron
2026-03-25 17:16               ` Rafael J. Wysocki
2026-03-20  9:55 ` [PATCH v2 1/3] acpi/apei: Add devm_ghes_register_vendor_record_notifier() Jonathan Cameron
2026-03-24 10:14   ` Kai-Heng Feng
2026-03-23 12:28 ` Hanjun Guo
