From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Kai-Heng Feng <kaihengf@nvidia.com>
Cc: <rafael@kernel.org>, Shiju Jose <shiju.jose@huawei.com>,
	Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
	Hanjun Guo <guohanjun@huawei.com>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	"Shuai Xue" <xueshuai@linux.alibaba.com>,
	Len Brown <lenb@kernel.org>, Kees Cook <kees@kernel.org>,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	"Will Deacon" <will@kernel.org>,
	Huang Yiwei <quic_hyiwei@quicinc.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Nathan Chancellor <nathan@kernel.org>,
	"Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-hardening@vger.kernel.org>
Subject: Re: [PATCH v2 3/3] acpi/apei: Add NVIDIA GHES vendor CPER record handler
Date: Fri, 20 Mar 2026 10:13:35 +0000
Message-ID: <20260320101335.00004026@huawei.com>
In-Reply-To: <20260319111315.87624-3-kaihengf@nvidia.com>

On Thu, 19 Mar 2026 19:13:09 +0800
Kai-Heng Feng <kaihengf@nvidia.com> wrote:

> Add support for decoding NVIDIA-specific CPER sections delivered via
> the APEI GHES vendor record notifier chain. NVIDIA hardware generates
> vendor-specific CPER sections containing error signatures and diagnostic
> register dumps. This implementation registers a notifier_block with the
> GHES vendor record notifier and decodes these sections, printing error
> details via dev_info().
> 
> The driver binds to ACPI device NVDA2012, present on NVIDIA server
> platforms. The NVIDIA CPER section contains a fixed header with error
> metadata (signature, error type, severity, socket) followed by
> variable-length register address-value pairs for hardware diagnostics.
> 
> This work is based on libcper [0].
> 
> Example output:
> nvidia-ghes NVDA2012:00: NVIDIA CPER section, error_data_length: 544
> nvidia-ghes NVDA2012:00: signature: CMET-INFO
> nvidia-ghes NVDA2012:00: error_type: 0
> nvidia-ghes NVDA2012:00: error_instance: 0
> nvidia-ghes NVDA2012:00: severity: 3
> nvidia-ghes NVDA2012:00: socket: 0
> nvidia-ghes NVDA2012:00: number_regs: 32
> nvidia-ghes NVDA2012:00: instance_base: 0x0000000000000000
> nvidia-ghes NVDA2012:00: register[0]: address=0x8000000100000000 value=0x0000000100000000
> 
> [0] https://github.com/openbmc/libcper/commit/683e055061ce
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: Shiju Jose <shiju.jose@huawei.com>
> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
The only significant issue is the use of dev_err_probe().

I'm surprised that didn't give you error messages in the log even on success.

With that fixed (the other comments are all up to you):
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


>  apei-y := apei-base.o hest.o erst.o bert.o
> diff --git a/drivers/acpi/apei/nvidia-ghes.c b/drivers/acpi/apei/nvidia-ghes.c
> new file mode 100644
> index 000000000000..aa2e3a387b49
> --- /dev/null
> +++ b/drivers/acpi/apei/nvidia-ghes.c

> +static void nvidia_ghes_print_error(struct device *dev,
> +				    const struct cper_sec_nvidia *nvidia_err,
> +				    size_t error_data_length, bool fatal)
> +{
> +	const char *level = fatal ? KERN_ERR : KERN_INFO;
> +	size_t min_size;
> +	int i;
...


> +	 * Validate that all registers fit within error_data_length.
> +	 * Each register pair is two little-endian u64s.
> +	 */
> +	min_size = struct_size(nvidia_err, regs, nvidia_err->number_regs);
> +	if (error_data_length < min_size) {
> +		dev_err(dev, "Invalid number_regs %u (section size %zu, need %zu)\n",
> +			nvidia_err->number_regs, error_data_length, min_size);
> +		return;
> +	}
> +
> +	for (i = 0; i < nvidia_err->number_regs; i++)

Trivial but I'd take advantage of it now being acceptable (in general) to do
	for (int i = 0; i < ....)

> +		dev_printk(level, dev, "register[%d]: address=0x%016llx value=0x%016llx\n",
> +			   i, le64_to_cpu(nvidia_err->regs[i].addr),
> +			   le64_to_cpu(nvidia_err->regs[i].val));
> +}

> +static int nvidia_ghes_probe(struct platform_device *pdev)
> +{
> +	struct nvidia_ghes_private *priv;
> +
> +	priv = devm_kmalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	*priv = (struct nvidia_ghes_private) {
> +		.nb.notifier_call = nvidia_ghes_notify,
> +		.dev = &pdev->dev,
> +	};
> +
> +	return dev_err_probe(&pdev->dev,
> +			     devm_ghes_register_vendor_record_notifier(&pdev->dev, &priv->nb),
That's not great for readability, and dev_err_probe() should only be called on errors.
I'm fairly sure it doesn't have special handling for 0, so it will call dev_err() or dev_warn()
and print some stuff before saying 'no error'.

	int ret;
	...

	ret = devm_ghes_register_vendor_record_notifier(&pdev->dev, &priv->nb);
	if (ret)
		return dev_err_probe(&pdev->dev, ret,
				     "Failed to register NVIDIA GHES vendor record notifier\n");

	return 0;
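
To illustrate the concern, here is a minimal userspace sketch, not kernel code.
It assumes dev_err_probe() logs for any error value other than -EPROBE_DEFER
and -ENOMEM and then returns that value unchanged; that is a reading of the
helper, not something verified in this thread. The names mock_dev_err_probe(),
mock_register(), probe_unconditional(), probe_checked(), mock_log_count and
MOCK_EPROBE_DEFER are all hypothetical stand-ins.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* EPROBE_DEFER is kernel-internal; this value is a stand-in for the sketch. */
#define MOCK_EPROBE_DEFER 517

/* Counts the messages that would have reached the log. */
static int mock_log_count;

/*
 * Hypothetical model of dev_err_probe(): stays quiet only for the two
 * special-cased errors (assumption), otherwise prints and returns err.
 * Note that err == 0 is not special-cased, so it prints as well.
 */
static int mock_dev_err_probe(const char *dev_name, int err, const char *fmt, ...)
{
	va_list args;

	if (err == -MOCK_EPROBE_DEFER || err == -ENOMEM)
		return err;

	fprintf(stderr, "%s: error %d: ", dev_name, err);
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
	mock_log_count++;

	return err;
}

/* Stand-in for the registration call; simply returns the requested result. */
static int mock_register(int result)
{
	return result;
}

/* Shape used in the patch: the helper is called even when registration succeeds. */
static int probe_unconditional(int reg_result)
{
	return mock_dev_err_probe("NVDA2012:00", mock_register(reg_result),
				  "Failed to register notifier\n");
}

/* Suggested shape: the helper is only reached on a non-zero return. */
static int probe_checked(int reg_result)
{
	int ret;

	ret = mock_register(reg_result);
	if (ret)
		return mock_dev_err_probe("NVDA2012:00", ret,
					  "Failed to register notifier\n");

	return 0;
}
```

Under that assumption, probe_unconditional(0) returns 0 but still emits a
"Failed to register" line, while probe_checked(0) stays silent and only logs
when registration actually fails.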



> +			     "Failed to register NVIDIA GHES vendor record notifier\n");
> +}
> +
> +static const struct acpi_device_id nvidia_ghes_acpi_match[] = {
> +	{ "NVDA2012" },

London Olympics :)

> +	{ }
> +};
> +MODULE_DEVICE_TABLE(acpi, nvidia_ghes_acpi_match);
> +
> +static struct platform_driver nvidia_ghes_driver = {
> +	.driver = {
> +		.name		= "nvidia-ghes",
> +		.acpi_match_table = nvidia_ghes_acpi_match,
> +	},
> +	.probe	= nvidia_ghes_probe,

I'd just not attempt to align the '=':
static struct platform_driver nvidia_ghes_driver = {
	.driver = {
		.name = "nvidia-ghes",
		.acpi_match_table = nvidia_ghes_acpi_match,
	},
	.probe = nvidia_ghes_probe,

There aren't enough of them to make it much of a readability improvement
and doing this often results in unnecessary churn as a driver evolves.
Also it's already broken!

> +};
> +module_platform_driver(nvidia_ghes_driver);
> +
> +MODULE_AUTHOR("Kai-Heng Feng <kaihengf@nvidia.com>");
> +MODULE_DESCRIPTION("NVIDIA GHES vendor CPER record handler");
> +MODULE_LICENSE("GPL");

