* [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler
@ 2026-03-30 9:41 Kai-Heng Feng
2026-03-30 9:41 ` [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Kai-Heng Feng @ 2026-03-30 9:41 UTC (permalink / raw)
To: rafael
Cc: jonathan.cameron, helgaas, guohanjun, linux-kernel, linux-acpi,
linux-pci, acpica-devel, linux-hardening, Kai-Heng Feng
NVIDIA DGX and HGX platforms provide an ACPI device (NVDA2012) alongside
the GHES error source. The device does not generate the error records
itself; its purpose is to allow the vendor CPER handler driver to load
automatically via ACPI enumeration rather than manual module loading.
The series introduces a devm_ghes_register_vendor_record_notifier() helper
and uses it in both the HiSilicon PCI driver and the new NVIDIA GHES driver.
Kai-Heng Feng (3):
ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
MAINTAINERS | 6 +
drivers/acpi/apei/Kconfig | 14 +++
drivers/acpi/apei/Makefile | 1 +
drivers/acpi/apei/ghes-nvidia.c | 149 +++++++++++++++++++++++
drivers/acpi/apei/ghes.c | 18 +++
drivers/pci/controller/pcie-hisi-error.c | 12 +-
include/acpi/ghes.h | 11 ++
7 files changed, 200 insertions(+), 11 deletions(-)
create mode 100644 drivers/acpi/apei/ghes-nvidia.c
--
2.50.1 (Apple Git-155)
* [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
2026-03-30 9:41 [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler Kai-Heng Feng
@ 2026-03-30 9:41 ` Kai-Heng Feng
2026-03-30 12:58 ` Breno Leitao
` (2 more replies)
2026-03-30 9:41 ` [PATCH v3 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
` (2 subsequent siblings)
3 siblings, 3 replies; 10+ messages in thread
From: Kai-Heng Feng @ 2026-03-30 9:41 UTC (permalink / raw)
To: rafael
Cc: jonathan.cameron, helgaas, guohanjun, linux-kernel, linux-acpi,
linux-pci, acpica-devel, linux-hardening, Kai-Heng Feng,
Shiju Jose, Tony Luck, Borislav Petkov, Mauro Carvalho Chehab,
Shuai Xue, Len Brown, Robert Moore, Fabio M. De Francesco,
Breno Leitao, Jason Tian
Add a device-managed wrapper around ghes_register_vendor_record_notifier()
so drivers can avoid manual cleanup on device removal or probe failure.
Cc: Shiju Jose <shiju.jose@huawei.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
---
v3:
- Change patch title
- Move documentation to header file.
v2:
- New patch.
drivers/acpi/apei/ghes.c | 18 ++++++++++++++++++
include/acpi/ghes.h | 11 +++++++++++
2 files changed, 29 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8acd2742bb27..3236a3ce79d6 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -689,6 +689,24 @@ void ghes_unregister_vendor_record_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL_GPL(ghes_unregister_vendor_record_notifier);
+static void ghes_vendor_record_notifier_destroy(void *nb)
+{
+ ghes_unregister_vendor_record_notifier(nb);
+}
+
+int devm_ghes_register_vendor_record_notifier(struct device *dev,
+ struct notifier_block *nb)
+{
+ int ret;
+
+ ret = ghes_register_vendor_record_notifier(nb);
+ if (ret)
+ return ret;
+
+ return devm_add_action_or_reset(dev, ghes_vendor_record_notifier_destroy, nb);
+}
+EXPORT_SYMBOL_GPL(devm_ghes_register_vendor_record_notifier);
+
static void ghes_vendor_record_work_func(struct work_struct *work)
{
struct ghes_vendor_record_entry *entry;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 7bea522c0657..8d7e5caef3f1 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -71,6 +71,17 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
*/
void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
+/**
+ * devm_ghes_register_vendor_record_notifier - device-managed vendor
+ * record notifier registration.
+ * @dev: device that owns the notifier lifetime
+ * @nb: pointer to the notifier_block structure of the vendor record handler
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int devm_ghes_register_vendor_record_notifier(struct device *dev,
+ struct notifier_block *nb);
+
struct list_head *ghes_get_devices(void);
void ghes_estatus_pool_region_free(unsigned long addr, u32 size);
--
2.50.1 (Apple Git-155)
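
Editorial note: the helper above follows the usual "register, then queue a
cleanup action, and undo the registration at once if queuing fails" shape. A
minimal userspace sketch of that control flow is shown here, assuming a toy
device with a fixed-size action list. Every toy_* name is hypothetical and
only models the behaviour; none of it is kernel API.

```c
/* Userspace model only: every toy_* name is hypothetical, not a kernel API. */
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ACTIONS 4

typedef void (*toy_action_fn)(void *data);

/* Stand-in for a device that owns a list of cleanup actions. */
struct toy_device {
	toy_action_fn fn[TOY_MAX_ACTIONS];
	void *data[TOY_MAX_ACTIONS];
	size_t count;
};

static int toy_live_notifiers;	/* number of currently registered notifiers */

static int toy_register(int *nb)
{
	*nb = 1;
	toy_live_notifiers++;
	return 0;
}

static void toy_unregister(void *nb)
{
	*(int *)nb = 0;
	toy_live_notifiers--;
}

/* Queue a cleanup action; if it cannot be queued, run it immediately. */
static int toy_add_action_or_reset(struct toy_device *dev,
				   toy_action_fn fn, void *data)
{
	if (dev->count == TOY_MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	dev->fn[dev->count] = fn;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Same shape as the helper in the patch: register first, then tie cleanup. */
static int toy_managed_register(struct toy_device *dev, int *nb)
{
	int ret = toy_register(nb);

	if (ret)
		return ret;

	return toy_add_action_or_reset(dev, toy_unregister, nb);
}

/* Models device removal: run queued actions in reverse order. */
static void toy_release_all(struct toy_device *dev)
{
	while (dev->count) {
		dev->count--;
		dev->fn[dev->count](dev->data[dev->count]);
	}
}

/* Success path: the notifier stays registered until the device is released. */
static void toy_selfcheck(void)
{
	struct toy_device dev = { .count = 0 };
	int nb = 0;

	assert(toy_managed_register(&dev, &nb) == 0);
	assert(nb == 1 && toy_live_notifiers == 1);
	toy_release_all(&dev);
	assert(nb == 0 && toy_live_notifiers == 0);
}
```

In this model the caller never unregisters by hand: either the release step
does it, or the failed queuing step already did. That is the property that
lets a driver drop its remove callback.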
* [PATCH v3 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
2026-03-30 9:41 [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler Kai-Heng Feng
2026-03-30 9:41 ` [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
@ 2026-03-30 9:41 ` Kai-Heng Feng
2026-03-31 5:34 ` Manivannan Sadhasivam
2026-03-31 14:43 ` Shiju Jose
2026-03-30 9:41 ` [PATCH v3 3/3] ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler Kai-Heng Feng
2026-04-06 14:50 ` [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler Rafael J. Wysocki
3 siblings, 2 replies; 10+ messages in thread
From: Kai-Heng Feng @ 2026-03-30 9:41 UTC (permalink / raw)
To: rafael
Cc: jonathan.cameron, helgaas, guohanjun, linux-kernel, linux-acpi,
linux-pci, acpica-devel, linux-hardening, Kai-Heng Feng,
Shiju Jose, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas
Switch to the device-managed variant so the notifier is automatically
unregistered on device removal, allowing the open-coded remove callback
to be dropped entirely.
Cc: Shiju Jose <shiju.jose@huawei.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
---
v3:
- No change.
v2:
- New patch.
drivers/pci/controller/pcie-hisi-error.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/drivers/pci/controller/pcie-hisi-error.c b/drivers/pci/controller/pcie-hisi-error.c
index aaf1ed2b6e59..36be86d827a8 100644
--- a/drivers/pci/controller/pcie-hisi-error.c
+++ b/drivers/pci/controller/pcie-hisi-error.c
@@ -287,25 +287,16 @@ static int hisi_pcie_error_handler_probe(struct platform_device *pdev)
priv->nb.notifier_call = hisi_pcie_notify_error;
priv->dev = &pdev->dev;
- ret = ghes_register_vendor_record_notifier(&priv->nb);
+ ret = devm_ghes_register_vendor_record_notifier(&pdev->dev, &priv->nb);
if (ret) {
dev_err(&pdev->dev,
"Failed to register hisi pcie controller error handler with apei\n");
return ret;
}
- platform_set_drvdata(pdev, priv);
-
return 0;
}
-static void hisi_pcie_error_handler_remove(struct platform_device *pdev)
-{
- struct hisi_pcie_error_private *priv = platform_get_drvdata(pdev);
-
- ghes_unregister_vendor_record_notifier(&priv->nb);
-}
-
static const struct acpi_device_id hisi_pcie_acpi_match[] = {
{ "HISI0361", 0 },
{ }
@@ -317,7 +308,6 @@ static struct platform_driver hisi_pcie_error_handler_driver = {
.acpi_match_table = hisi_pcie_acpi_match,
},
.probe = hisi_pcie_error_handler_probe,
- .remove = hisi_pcie_error_handler_remove,
};
module_platform_driver(hisi_pcie_error_handler_driver);
--
2.50.1 (Apple Git-155)
* [PATCH v3 3/3] ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
2026-03-30 9:41 [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler Kai-Heng Feng
2026-03-30 9:41 ` [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
2026-03-30 9:41 ` [PATCH v3 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
@ 2026-03-30 9:41 ` Kai-Heng Feng
2026-04-06 14:50 ` [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler Rafael J. Wysocki
3 siblings, 0 replies; 10+ messages in thread
From: Kai-Heng Feng @ 2026-03-30 9:41 UTC (permalink / raw)
To: rafael
Cc: jonathan.cameron, helgaas, guohanjun, linux-kernel, linux-acpi,
linux-pci, acpica-devel, linux-hardening, Kai-Heng Feng,
Shiju Jose, Tony Luck, Borislav Petkov, Mauro Carvalho Chehab,
Shuai Xue, Len Brown, Kees Cook, Gustavo A. R. Silva, Gavin Shan,
Huang Yiwei, Nathan Chancellor, Dave Jiang, Fabio M. De Francesco
Add support for decoding NVIDIA-specific CPER sections delivered via
the APEI GHES vendor record notifier chain. NVIDIA hardware generates
vendor-specific CPER sections containing error signatures and diagnostic
register dumps. This implementation registers a notifier_block with the
GHES vendor record notifier and decodes these sections, printing error
details at KERN_INFO, or at KERN_ERR for recoverable and more severe events.
The driver binds to ACPI device NVDA2012, present on NVIDIA server
platforms. The NVIDIA CPER section contains a fixed header with error
metadata (signature, error type, severity, socket) followed by
variable-length register address-value pairs for hardware diagnostics.
This work is based on libcper [0].
Example output:
nvidia-ghes NVDA2012:00: NVIDIA CPER section, error_data_length: 544
nvidia-ghes NVDA2012:00: signature: CMET-INFO
nvidia-ghes NVDA2012:00: error_type: 0
nvidia-ghes NVDA2012:00: error_instance: 0
nvidia-ghes NVDA2012:00: severity: 3
nvidia-ghes NVDA2012:00: socket: 0
nvidia-ghes NVDA2012:00: number_regs: 32
nvidia-ghes NVDA2012:00: instance_base: 0x0000000000000000
nvidia-ghes NVDA2012:00: register[0]: address=0x8000000100000000 value=0x0000000100000000
[0] https://github.com/openbmc/libcper/commit/683e055061ce
Cc: Shiju Jose <shiju.jose@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
---
v3:
- NVIDIA_GHES to GHES_NVIDIA.
- Better error handling in probe()
- "int i" in for loop.
v2:
- Use right headers.
- Use embedded struct and __counted_by.
- Drop __packed.
- Remove unnecessary casts.
- Use * in sizeof().
- Use devm_kmalloc() and struct assignment.
- Use dev_err_probe and new devm helper.
MAINTAINERS | 6 ++
drivers/acpi/apei/Kconfig | 14 +++
drivers/acpi/apei/Makefile | 1 +
drivers/acpi/apei/ghes-nvidia.c | 149 ++++++++++++++++++++++++++++++++
4 files changed, 170 insertions(+)
create mode 100644 drivers/acpi/apei/ghes-nvidia.c
diff --git a/MAINTAINERS b/MAINTAINERS
index c3fe46d7c4bc..94608f8e247e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18919,6 +18919,12 @@ S: Maintained
F: drivers/video/fbdev/nvidia/
F: drivers/video/fbdev/riva/
+NVIDIA GHES VENDOR CPER RECORD HANDLER
+M: Kai-Heng Feng <kaihengf@nvidia.com>
+L: linux-acpi@vger.kernel.org
+S: Maintained
+F: drivers/acpi/apei/ghes-nvidia.c
+
NVIDIA VRS RTC DRIVER
M: Shubhi Garg <shgarg@nvidia.com>
L: linux-tegra@vger.kernel.org
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 070c07d68dfb..428458c623f0 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -74,6 +74,20 @@ config ACPI_APEI_EINJ_CXL
If unsure say 'n'
+config ACPI_APEI_GHES_NVIDIA
+ tristate "NVIDIA GHES vendor record handler"
+ depends on ACPI_APEI_GHES
+ help
+ Support for decoding NVIDIA-specific CPER sections delivered via
+ the APEI GHES vendor record notifier chain. Registers a handler
+ for the NVIDIA section GUID and logs error signatures, severity,
+ socket, and diagnostic register address-value pairs.
+
+ Enable on NVIDIA server platforms (e.g. DGX, HGX) that expose
+ ACPI device NVDA2012 in their firmware tables.
+
+ If unsure, say N.
+
config ACPI_APEI_ERST_DEBUG
tristate "APEI Error Record Serialization Table (ERST) Debug Support"
depends on ACPI_APEI
diff --git a/drivers/acpi/apei/Makefile b/drivers/acpi/apei/Makefile
index 1a0b85923cd4..66588d6be56f 100644
--- a/drivers/acpi/apei/Makefile
+++ b/drivers/acpi/apei/Makefile
@@ -10,5 +10,6 @@ obj-$(CONFIG_ACPI_APEI_EINJ) += einj.o
einj-y := einj-core.o
einj-$(CONFIG_ACPI_APEI_EINJ_CXL) += einj-cxl.o
obj-$(CONFIG_ACPI_APEI_ERST_DEBUG) += erst-dbg.o
+obj-$(CONFIG_ACPI_APEI_GHES_NVIDIA) += ghes-nvidia.o
apei-y := apei-base.o hest.o erst.o bert.o
diff --git a/drivers/acpi/apei/ghes-nvidia.c b/drivers/acpi/apei/ghes-nvidia.c
new file mode 100644
index 000000000000..597275d81de8
--- /dev/null
+++ b/drivers/acpi/apei/ghes-nvidia.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NVIDIA GHES vendor record handler
+ *
+ * Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#include <linux/acpi.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/uuid.h>
+#include <acpi/ghes.h>
+
+static const guid_t nvidia_sec_guid =
+ GUID_INIT(0x6d5244f2, 0x2712, 0x11ec,
+ 0xbe, 0xa7, 0xcb, 0x3f, 0xdb, 0x95, 0xc7, 0x86);
+
+struct cper_sec_nvidia {
+ char signature[16];
+ __le16 error_type;
+ __le16 error_instance;
+ u8 severity;
+ u8 socket;
+ u8 number_regs;
+ u8 reserved;
+ __le64 instance_base;
+ struct {
+ __le64 addr;
+ __le64 val;
+ } regs[] __counted_by(number_regs);
+};
+
+struct nvidia_ghes_private {
+ struct notifier_block nb;
+ struct device *dev;
+};
+
+static void nvidia_ghes_print_error(struct device *dev,
+ const struct cper_sec_nvidia *nvidia_err,
+ size_t error_data_length, bool fatal)
+{
+ const char *level = fatal ? KERN_ERR : KERN_INFO;
+ size_t min_size;
+
+ dev_printk(level, dev, "signature: %.16s\n", nvidia_err->signature);
+ dev_printk(level, dev, "error_type: %u\n", le16_to_cpu(nvidia_err->error_type));
+ dev_printk(level, dev, "error_instance: %u\n", le16_to_cpu(nvidia_err->error_instance));
+ dev_printk(level, dev, "severity: %u\n", nvidia_err->severity);
+ dev_printk(level, dev, "socket: %u\n", nvidia_err->socket);
+ dev_printk(level, dev, "number_regs: %u\n", nvidia_err->number_regs);
+ dev_printk(level, dev, "instance_base: 0x%016llx\n",
+ le64_to_cpu(nvidia_err->instance_base));
+
+ if (nvidia_err->number_regs == 0)
+ return;
+
+ /*
+ * Validate that all registers fit within error_data_length.
+ * Each register pair is two little-endian u64s.
+ */
+ min_size = struct_size(nvidia_err, regs, nvidia_err->number_regs);
+ if (error_data_length < min_size) {
+ dev_err(dev, "Invalid number_regs %u (section size %zu, need %zu)\n",
+ nvidia_err->number_regs, error_data_length, min_size);
+ return;
+ }
+
+ for (int i = 0; i < nvidia_err->number_regs; i++)
+ dev_printk(level, dev, "register[%d]: address=0x%016llx value=0x%016llx\n",
+ i, le64_to_cpu(nvidia_err->regs[i].addr),
+ le64_to_cpu(nvidia_err->regs[i].val));
+}
+
+static int nvidia_ghes_notify(struct notifier_block *nb,
+ unsigned long event, void *data)
+{
+ struct acpi_hest_generic_data *gdata = data;
+ struct nvidia_ghes_private *priv;
+ const struct cper_sec_nvidia *nvidia_err;
+ guid_t sec_guid;
+
+ import_guid(&sec_guid, gdata->section_type);
+ if (!guid_equal(&sec_guid, &nvidia_sec_guid))
+ return NOTIFY_DONE;
+
+ priv = container_of(nb, struct nvidia_ghes_private, nb);
+
+ if (acpi_hest_get_error_length(gdata) < sizeof(*nvidia_err)) {
+ dev_err(priv->dev, "Section too small (%u < %zu)\n",
+ acpi_hest_get_error_length(gdata), sizeof(*nvidia_err));
+ return NOTIFY_OK;
+ }
+
+ nvidia_err = acpi_hest_get_payload(gdata);
+
+ if (event >= GHES_SEV_RECOVERABLE)
+ dev_err(priv->dev, "NVIDIA CPER section, error_data_length: %u\n",
+ acpi_hest_get_error_length(gdata));
+ else
+ dev_info(priv->dev, "NVIDIA CPER section, error_data_length: %u\n",
+ acpi_hest_get_error_length(gdata));
+
+ nvidia_ghes_print_error(priv->dev, nvidia_err, acpi_hest_get_error_length(gdata),
+ event >= GHES_SEV_RECOVERABLE);
+
+ return NOTIFY_OK;
+}
+
+static int nvidia_ghes_probe(struct platform_device *pdev)
+{
+ struct nvidia_ghes_private *priv;
+ int ret;
+
+ priv = devm_kmalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
+ *priv = (struct nvidia_ghes_private) {
+ .nb.notifier_call = nvidia_ghes_notify,
+ .dev = &pdev->dev,
+ };
+
+ ret = devm_ghes_register_vendor_record_notifier(&pdev->dev, &priv->nb);
+ if (ret)
+ return dev_err_probe(&pdev->dev, ret,
+ "Failed to register NVIDIA GHES vendor record notifier\n");
+
+ return 0;
+}
+
+static const struct acpi_device_id nvidia_ghes_acpi_match[] = {
+ { "NVDA2012" },
+ { }
+};
+MODULE_DEVICE_TABLE(acpi, nvidia_ghes_acpi_match);
+
+static struct platform_driver nvidia_ghes_driver = {
+ .driver = {
+ .name = "nvidia-ghes",
+ .acpi_match_table = nvidia_ghes_acpi_match,
+ },
+ .probe = nvidia_ghes_probe,
+};
+module_platform_driver(nvidia_ghes_driver);
+
+MODULE_AUTHOR("Kai-Heng Feng <kaihengf@nvidia.com>");
+MODULE_DESCRIPTION("NVIDIA GHES vendor CPER record handler");
+MODULE_LICENSE("GPL");
--
2.50.1 (Apple Git-155)
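
Editorial note: the length check in nvidia_ghes_print_error() can be worked
through with the numbers from the example output. A minimal userspace sketch
follows, assuming a mirror of the section layout built from fixed-width
integer types. The toy_* names are hypothetical, and the plain multiplication
stands in for the kernel's struct_size(), which additionally guards against
arithmetic overflow.

```c
/* Userspace model only: every toy_* name is hypothetical, not a kernel API. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One register entry: address and value, 8 bytes each. */
struct toy_reg {
	uint64_t addr;
	uint64_t val;
};

/* Mirror of the fixed header followed by a variable-length register array. */
struct toy_sec {
	char signature[16];
	uint16_t error_type;
	uint16_t error_instance;
	uint8_t severity;
	uint8_t socket;
	uint8_t number_regs;
	uint8_t reserved;
	uint64_t instance_base;
	struct toy_reg regs[];
};

/* Bytes needed for the header plus number_regs register entries. */
static size_t toy_needed_size(uint8_t number_regs)
{
	return offsetof(struct toy_sec, regs) +
	       (size_t)number_regs * sizeof(struct toy_reg);
}

/* Returns 1 if the header and all claimed registers fit in section_len. */
static int toy_regs_fit(const struct toy_sec *sec, size_t section_len)
{
	/* Check the header first so number_regs is only read when present. */
	if (section_len < offsetof(struct toy_sec, regs))
		return 0;

	return section_len >= toy_needed_size(sec->number_regs);
}

/* Worked example: 32-byte header + 32 * 16 bytes of registers = 544 bytes. */
static void toy_selfcheck(void)
{
	struct toy_sec hdr = { .number_regs = 32 };

	assert(offsetof(struct toy_sec, regs) == 32);
	assert(toy_needed_size(32) == 544);
	assert(toy_regs_fit(&hdr, 544));
}
```

With number_regs set to 32 the required size comes to 544 bytes, which agrees
with the error_data_length shown in the example output. A section one byte
shorter would be rejected before any register is read.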
* Re: [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
2026-03-30 9:41 ` [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
@ 2026-03-30 12:58 ` Breno Leitao
2026-03-31 14:42 ` Shiju Jose
2026-04-01 1:55 ` Shuai Xue
2 siblings, 0 replies; 10+ messages in thread
From: Breno Leitao @ 2026-03-30 12:58 UTC (permalink / raw)
To: Kai-Heng Feng
Cc: rafael, jonathan.cameron, helgaas, guohanjun, linux-kernel,
linux-acpi, linux-pci, acpica-devel, linux-hardening, Shiju Jose,
Tony Luck, Borislav Petkov, Mauro Carvalho Chehab, Shuai Xue,
Len Brown, Robert Moore, Fabio M. De Francesco, Jason Tian
On Mon, Mar 30, 2026 at 05:41:55PM +0800, Kai-Heng Feng wrote:
> Add a device-managed wrapper around ghes_register_vendor_record_notifier()
> so drivers can avoid manual cleanup on device removal or probe failure.
>
> Cc: Shiju Jose <shiju.jose@huawei.com>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
* Re: [PATCH v3 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
2026-03-30 9:41 ` [PATCH v3 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
@ 2026-03-31 5:34 ` Manivannan Sadhasivam
2026-03-31 14:43 ` Shiju Jose
1 sibling, 0 replies; 10+ messages in thread
From: Manivannan Sadhasivam @ 2026-03-31 5:34 UTC (permalink / raw)
To: Kai-Heng Feng
Cc: rafael, jonathan.cameron, helgaas, guohanjun, linux-kernel,
linux-acpi, linux-pci, acpica-devel, linux-hardening, Shiju Jose,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
Bjorn Helgaas
On Mon, Mar 30, 2026 at 05:41:56PM +0800, Kai-Heng Feng wrote:
> Switch to the device-managed variant so the notifier is automatically
> unregistered on device removal, allowing the open-coded remove callback
> to be dropped entirely.
>
> Cc: Shiju Jose <shiju.jose@huawei.com>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Since there is an API dependency, feel free to take it through ACPI tree.
- Mani
> ---
> v3:
> - No change.
> v2:
> - New patch.
>
> drivers/pci/controller/pcie-hisi-error.c | 12 +-----------
> 1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-hisi-error.c b/drivers/pci/controller/pcie-hisi-error.c
> index aaf1ed2b6e59..36be86d827a8 100644
> --- a/drivers/pci/controller/pcie-hisi-error.c
> +++ b/drivers/pci/controller/pcie-hisi-error.c
> @@ -287,25 +287,16 @@ static int hisi_pcie_error_handler_probe(struct platform_device *pdev)
>
> priv->nb.notifier_call = hisi_pcie_notify_error;
> priv->dev = &pdev->dev;
> - ret = ghes_register_vendor_record_notifier(&priv->nb);
> + ret = devm_ghes_register_vendor_record_notifier(&pdev->dev, &priv->nb);
> if (ret) {
> dev_err(&pdev->dev,
> "Failed to register hisi pcie controller error handler with apei\n");
> return ret;
> }
>
> - platform_set_drvdata(pdev, priv);
> -
> return 0;
> }
>
> -static void hisi_pcie_error_handler_remove(struct platform_device *pdev)
> -{
> - struct hisi_pcie_error_private *priv = platform_get_drvdata(pdev);
> -
> - ghes_unregister_vendor_record_notifier(&priv->nb);
> -}
> -
> static const struct acpi_device_id hisi_pcie_acpi_match[] = {
> { "HISI0361", 0 },
> { }
> @@ -317,7 +308,6 @@ static struct platform_driver hisi_pcie_error_handler_driver = {
> .acpi_match_table = hisi_pcie_acpi_match,
> },
> .probe = hisi_pcie_error_handler_probe,
> - .remove = hisi_pcie_error_handler_remove,
> };
> module_platform_driver(hisi_pcie_error_handler_driver);
>
> --
> 2.50.1 (Apple Git-155)
>
--
மணிவண்ணன் சதாசிவம்
* RE: [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
2026-03-30 9:41 ` [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
2026-03-30 12:58 ` Breno Leitao
@ 2026-03-31 14:42 ` Shiju Jose
2026-04-01 1:55 ` Shuai Xue
2 siblings, 0 replies; 10+ messages in thread
From: Shiju Jose @ 2026-03-31 14:42 UTC (permalink / raw)
To: Kai-Heng Feng, rafael@kernel.org
Cc: Jonathan Cameron, helgaas@kernel.org, Guohanjun (Hanjun Guo),
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-pci@vger.kernel.org, acpica-devel@lists.linux.dev,
linux-hardening@vger.kernel.org, Tony Luck, Borislav Petkov,
Mauro Carvalho Chehab, Shuai Xue, Len Brown, Robert Moore,
Fabio M. De Francesco, Breno Leitao, Jason Tian
>Add a device-managed wrapper around
>ghes_register_vendor_record_notifier()
>so drivers can avoid manual cleanup on device removal or probe failure.
>
>Cc: Shiju Jose <shiju.jose@huawei.com>
>Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
>Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>
* RE: [PATCH v3 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
2026-03-30 9:41 ` [PATCH v3 2/3] PCI: hisi: Use devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
2026-03-31 5:34 ` Manivannan Sadhasivam
@ 2026-03-31 14:43 ` Shiju Jose
1 sibling, 0 replies; 10+ messages in thread
From: Shiju Jose @ 2026-03-31 14:43 UTC (permalink / raw)
To: Kai-Heng Feng, rafael@kernel.org
Cc: Jonathan Cameron, helgaas@kernel.org, Guohanjun (Hanjun Guo),
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-pci@vger.kernel.org, acpica-devel@lists.linux.dev,
linux-hardening@vger.kernel.org, Lorenzo Pieralisi,
Krzysztof Wilczyński, Manivannan Sadhasivam, Rob Herring,
Bjorn Helgaas
>Switch to the device-managed variant so the notifier is automatically
>unregistered on device removal, allowing the open-coded remove callback to
>be dropped entirely.
>
>Cc: Shiju Jose <shiju.jose@huawei.com>
>Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
>Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
Reviewed-by: Shiju Jose <shiju.jose@huawei.com>
* Re: [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
2026-03-30 9:41 ` [PATCH v3 1/3] ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier() Kai-Heng Feng
2026-03-30 12:58 ` Breno Leitao
2026-03-31 14:42 ` Shiju Jose
@ 2026-04-01 1:55 ` Shuai Xue
2 siblings, 0 replies; 10+ messages in thread
From: Shuai Xue @ 2026-04-01 1:55 UTC (permalink / raw)
To: Kai-Heng Feng, rafael
Cc: jonathan.cameron, helgaas, guohanjun, linux-kernel, linux-acpi,
linux-pci, acpica-devel, linux-hardening, Shiju Jose, Tony Luck,
Borislav Petkov, Mauro Carvalho Chehab, Len Brown, Robert Moore,
Fabio M. De Francesco, Breno Leitao, Jason Tian
On 3/30/26 5:41 PM, Kai-Heng Feng wrote:
> Add a device-managed wrapper around ghes_register_vendor_record_notifier()
> so drivers can avoid manual cleanup on device removal or probe failure.
>
> Cc: Shiju Jose <shiju.jose@huawei.com>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com>
LGTM.
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Thanks.
Shuai
* Re: [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler
2026-03-30 9:41 [PATCH v3 0/3] ACPI: APEI: GHES: Add device-managed notifier helper and NVIDIA CPER handler Kai-Heng Feng
` (2 preceding siblings ...)
2026-03-30 9:41 ` [PATCH v3 3/3] ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler Kai-Heng Feng
@ 2026-04-06 14:50 ` Rafael J. Wysocki
3 siblings, 0 replies; 10+ messages in thread
From: Rafael J. Wysocki @ 2026-04-06 14:50 UTC (permalink / raw)
To: Kai-Heng Feng
Cc: rafael, jonathan.cameron, helgaas, guohanjun, linux-kernel,
linux-acpi, linux-pci, acpica-devel, linux-hardening
On Mon, Mar 30, 2026 at 11:42 AM Kai-Heng Feng <kaihengf@nvidia.com> wrote:
>
> NVIDIA DGX and HGX platforms provide an ACPI device (NVDA2012) alongside
> the GHES error source. The device does not generate the error records
> itself; its purpose is to allow the vendor CPER handler driver to load
> automatically via ACPI enumeration rather than manual module loading.
>
> The series introduces devm_ghes_register_vendor_record_notifier() helper,
> let HiSilicon PCI driver and new NVIDA GHES driver use the new helper.
>
> Kai-Heng Feng (3):
> ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
> PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
> ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
>
> MAINTAINERS | 6 +
> drivers/acpi/apei/Kconfig | 14 +++
> drivers/acpi/apei/Makefile | 1 +
> drivers/acpi/apei/ghes-nvidia.c | 149 +++++++++++++++++++++++
> drivers/acpi/apei/ghes.c | 18 +++
> drivers/pci/controller/pcie-hisi-error.c | 12 +-
> include/acpi/ghes.h | 11 ++
> 7 files changed, 200 insertions(+), 11 deletions(-)
> create mode 100644 drivers/acpi/apei/ghes-nvidia.c
>
> --
Whole series applied as 7.1 material, thanks!