* [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider
@ 2026-02-20 13:42 Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
` (11 more replies)
0 siblings, 12 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
This is v2 of the GHES refactor series. The goal is to reuse existing
GHES CPER handling for non-ACPI platforms without changing the GHES
flow or naming, and add a DT firmware-first CPER provider, while
keeping the changes mechanical and reviewable.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
Changes in v2:
- Dropped the proposed "estatus core" and kept GHES naming/flow intact
(per Borislav Petkov).
- Re-sliced the series into smaller mechanical steps (per Mauro Carvalho Chehab).
- Minor DT binding fixes based on Krzysztof Kozlowski's feedback.
- Removed fixmap slot usage from the DT FFH driver (per Will Deacon).
Series structure:
- Patches 1-8 are mechanical moves only and do not change behavior.
- Patch 9 wires the shared helpers back into GHES.
- Patches 10-11 add the DT binding and the DT firmware-first CPER buffer provider.
- "ACPI: APEI: introduce GHES helper" is internal build glue only
and does not introduce a new user-visible configuration option.
- Link to v1: https://lore.kernel.org/r/20251217112845.1814119-1-ahmed.tiba@arm.com
---
Ahmed Tiba (11):
ACPI: APEI: GHES: share macros via a private header
ACPI: APEI: GHES: add ghes_cper.o stub
ACPI: APEI: GHES: move CPER read helpers
ACPI: APEI: GHES: move GHESv2 ack and alloc helpers
ACPI: APEI: GHES: move estatus cache helpers
ACPI: APEI: GHES: move vendor record helpers
ACPI: APEI: GHES: move CXL CPER helpers
ACPI: APEI: introduce GHES helper
ACPI: APEI: share GHES CPER helpers
dt-bindings: firmware: add arm,ras-ffh
RAS: add DeviceTree firmware-first CPER provider
Documentation/admin-guide/RAS/main.rst | 18 +
.../devicetree/bindings/firmware/arm,ras-ffh.yaml | 71 ++
MAINTAINERS | 6 +
drivers/Makefile | 1 +
drivers/acpi/Kconfig | 4 +
drivers/acpi/apei/Kconfig | 1 +
drivers/acpi/apei/apei-internal.h | 10 +-
drivers/acpi/apei/ghes.c | 1024 +------------------
drivers/acpi/apei/ghes_cper.c | 1026 ++++++++++++++++++++
drivers/ras/Kconfig | 12 +
drivers/ras/Makefile | 1 +
drivers/ras/esource-dt.c | 264 +++++
include/acpi/ghes.h | 10 +-
include/acpi/ghes_cper.h | 143 +++
include/cxl/event.h | 2 +-
15 files changed, 1558 insertions(+), 1035 deletions(-)
---
base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
change-id: 20260220-topics-ahmtib01-ras_ffh_arm_internal_review-bfddc7fc7cab
Best regards,
--
Ahmed Tiba <ahmed.tiba@arm.com>
* [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-24 15:22 ` Jonathan Cameron
2026-02-26 6:44 ` Himanshu Chauhan
2026-02-20 13:42 ` [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub Ahmed Tiba
` (10 subsequent siblings)
11 siblings, 2 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
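Move the CPER helper macros, the is_hest_type_generic_v2() and
is_hest_sync_notify() inline helpers, and struct ghes_vendor_record_entry
out of ghes.c into a private header (include/acpi/ghes_cper.h) so they
can be shared with upcoming helper files. The header also declares
ghes_new(), ghes_fini() and the estatus read helpers that later patches
move. This is a mechanical change with no functional differences.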
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
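Note for readers unfamiliar with the *_LEN() / *_FROM_*() macro pairs
moved by this patch: they describe a fixed header immediately followed by
a variable-length payload in a single allocation. The following is a
minimal userspace sketch of that idiom, not kernel code. struct
demo_cache, DEMO_CACHE_LEN(), DEMO_PAYLOAD_FROM_CACHE() and
demo_cache_alloc() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ghes_estatus_cache: a fixed header. */
struct demo_cache {
	unsigned int payload_len;
	int count;
};

/* Same shape as GHES_ESTATUS_CACHE_LEN(): header plus trailing payload. */
#define DEMO_CACHE_LEN(payload_len) \
	(sizeof(struct demo_cache) + (payload_len))

/* Same shape as GHES_ESTATUS_FROM_CACHE(): payload starts after the header. */
#define DEMO_PAYLOAD_FROM_CACHE(cache) \
	((unsigned char *)((struct demo_cache *)(cache) + 1))

/* Allocate header and payload as one block and copy the payload in. */
static struct demo_cache *demo_cache_alloc(const void *payload,
					   unsigned int payload_len)
{
	struct demo_cache *cache;

	assert(payload != NULL);

	cache = malloc(DEMO_CACHE_LEN(payload_len));
	if (!cache)
		return NULL;

	cache->payload_len = payload_len;
	cache->count = 0;
	memcpy(DEMO_PAYLOAD_FROM_CACHE(cache), payload, payload_len);

	return cache;
}
```

The "+ 1" on the header pointer steps over exactly one header, so the
payload address never has to be stored separately.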
drivers/acpi/apei/ghes.c | 60 +-----------------------------
include/acpi/ghes_cper.h | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 96 insertions(+), 59 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index f96aede5d9a3..07b70bcb8342 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -49,6 +49,7 @@
#include <acpi/actbl1.h>
#include <acpi/ghes.h>
+#include <acpi/ghes_cper.h>
#include <acpi/apei.h>
#include <asm/fixmap.h>
#include <asm/tlbflush.h>
@@ -57,40 +58,6 @@
#include "apei-internal.h"
-#define GHES_PFX "GHES: "
-
-#define GHES_ESTATUS_MAX_SIZE 65536
-#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
-
-#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
-
-/* This is just an estimation for memory pool allocation */
-#define GHES_ESTATUS_CACHE_AVG_SIZE 512
-
-#define GHES_ESTATUS_CACHES_SIZE 4
-
-#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
-/* Prevent too many caches are allocated because of RCU */
-#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
-
-#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
- (sizeof(struct ghes_estatus_cache) + (estatus_len))
-#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
- ((struct acpi_hest_generic_status *) \
- ((struct ghes_estatus_cache *)(estatus_cache) + 1))
-
-#define GHES_ESTATUS_NODE_LEN(estatus_len) \
- (sizeof(struct ghes_estatus_node) + (estatus_len))
-#define GHES_ESTATUS_FROM_NODE(estatus_node) \
- ((struct acpi_hest_generic_status *) \
- ((struct ghes_estatus_node *)(estatus_node) + 1))
-
-#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
- (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
-#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
- ((struct acpi_hest_generic_data *) \
- ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
-
/*
* NMI-like notifications vary by architecture, before the compiler can prune
* unused static functions it needs a value for these enums.
@@ -102,25 +69,6 @@
static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
-static inline bool is_hest_type_generic_v2(struct ghes *ghes)
-{
- return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
-}
-
-/*
- * A platform may describe one error source for the handling of synchronous
- * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
- * or External Interrupt). On x86, the HEST notifications are always
- * asynchronous, so only SEA on ARM is delivered as a synchronous
- * notification.
- */
-static inline bool is_hest_sync_notify(struct ghes *ghes)
-{
- u8 notify_type = ghes->generic->notify.type;
-
- return notify_type == ACPI_HEST_NOTIFY_SEA;
-}
-
/*
* This driver isn't really modular, however for the time being,
* continuing to use module_param is the easiest way to remain
@@ -165,12 +113,6 @@ static DEFINE_MUTEX(ghes_devs_mutex);
*/
static DEFINE_SPINLOCK(ghes_notify_lock_irq);
-struct ghes_vendor_record_entry {
- struct work_struct work;
- int error_severity;
- char vendor_record[];
-};
-
static struct gen_pool *ghes_estatus_pool;
static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
new file mode 100644
index 000000000000..2597fbadc4f3
--- /dev/null
+++ b/include/acpi/ghes_cper.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * APEI Generic Hardware Error Source: CPER Helper
+ *
+ * Copyright (C) 2026 ARM Ltd.
+ * Author: Ahmed Tiba <ahmed.tiba@arm.com>
+ * Based on ACPI APEI GHES driver.
+ *
+ */
+
+#ifndef ACPI_APEI_GHES_CPER_H
+#define ACPI_APEI_GHES_CPER_H
+
+#include <linux/workqueue.h>
+
+#include <acpi/ghes.h>
+
+#define GHES_PFX "GHES: "
+
+#define GHES_ESTATUS_MAX_SIZE 65536
+#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
+
+#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
+
+/* This is just an estimation for memory pool allocation */
+#define GHES_ESTATUS_CACHE_AVG_SIZE 512
+
+#define GHES_ESTATUS_CACHES_SIZE 4
+
+#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
+/* Prevent too many caches are allocated because of RCU */
+#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
+
+#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
+ (sizeof(struct ghes_estatus_cache) + (estatus_len))
+#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
+ ((struct acpi_hest_generic_status *) \
+ ((struct ghes_estatus_cache *)(estatus_cache) + 1))
+
+#define GHES_ESTATUS_NODE_LEN(estatus_len) \
+ (sizeof(struct ghes_estatus_node) + (estatus_len))
+#define GHES_ESTATUS_FROM_NODE(estatus_node) \
+ ((struct acpi_hest_generic_status *) \
+ ((struct ghes_estatus_node *)(estatus_node) + 1))
+
+#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
+ (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
+#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
+ ((struct acpi_hest_generic_data *) \
+ ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
+
+static inline bool is_hest_type_generic_v2(struct ghes *ghes)
+{
+ return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
+}
+
+/*
+ * A platform may describe one error source for the handling of synchronous
+ * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
+ * or External Interrupt). On x86, the HEST notifications are always
+ * asynchronous, so only SEA on ARM is delivered as a synchronous
+ * notification.
+ */
+static inline bool is_hest_sync_notify(struct ghes *ghes)
+{
+ u8 notify_type = ghes->generic->notify.type;
+
+ return notify_type == ACPI_HEST_NOTIFY_SEA;
+}
+
+struct ghes_vendor_record_entry {
+ struct work_struct work;
+ int error_severity;
+ char vendor_record[];
+};
+
+static struct ghes *ghes_new(struct acpi_hest_generic *generic);
+static void ghes_fini(struct ghes *ghes);
+
+static int ghes_read_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus,
+ u64 *buf_paddr, enum fixed_addresses fixmap_idx);
+static void ghes_clear_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus,
+ u64 buf_paddr, enum fixed_addresses fixmap_idx);
+static int __ghes_peek_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus,
+ u64 *buf_paddr, enum fixed_addresses fixmap_idx);
+static int __ghes_check_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus);
+static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
+ u64 buf_paddr, enum fixed_addresses fixmap_idx,
+ size_t buf_len);
+
+#endif /* ACPI_APEI_GHES_CPER_H */
--
2.43.0
* [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-24 15:25 ` Jonathan Cameron
2026-02-20 13:42 ` [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
` (9 subsequent siblings)
11 siblings, 1 reply; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Add an empty ghes_cper.c translation unit and build it alongside ghes.o,
so that follow-on commits can move helpers out of ghes.c without touching
the Makefile again. The new file contains only includes for now, so there
is no functional change.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
drivers/acpi/apei/Makefile | 2 +-
drivers/acpi/apei/ghes_cper.c | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/apei/Makefile b/drivers/acpi/apei/Makefile
index 1a0b85923cd4..b3774af70883 100644
--- a/drivers/acpi/apei/Makefile
+++ b/drivers/acpi/apei/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_ACPI_APEI) += apei.o
-obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o
+obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o ghes_cper.o
# clang versions prior to 18 may blow out the stack with KASAN
ifeq ($(CONFIG_COMPILE_TEST)_$(CONFIG_CC_IS_CLANG)_$(call clang-min-version, 180000),y_y_)
KASAN_SANITIZE_ghes.o := n
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
new file mode 100644
index 000000000000..63047322a3d9
--- /dev/null
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *
+ * APEI GHES CPER helper translation unit - staging file for helper moves
+ *
+ * Copyright (C) 2026 ARM Ltd.
+ * Author: Ahmed Tiba <ahmed.tiba@arm.com>
+ * Based on ACPI APEI GHES driver.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/ratelimit.h>
+#include <linux/slab.h>
+
+#include <acpi/apei.h>
+
+#include <asm/fixmap.h>
+#include <asm/tlbflush.h>
+
+#include "apei-internal.h"
+
+/* Helper bodies will be moved here in follow-up commits. */
--
2.43.0
* [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-24 15:32 ` Jonathan Cameron
2026-02-26 5:58 ` Himanshu Chauhan
2026-02-20 13:42 ` [PATCH v2 04/11] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers Ahmed Tiba
` (8 subsequent siblings)
11 siblings, 2 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
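Relocate the CPER buffer mapping, peek, read, and clear helpers, together
with ghes_ack_error(), from ghes.c into ghes_cper.c so they can be shared
with other firmware-first providers. The estatus helpers that ghes.c still
calls, plus ghes_new() and ghes_fini(), drop 'static' so both files can
use them. This commit only moves code; behavior stays the same.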
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
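Note on the copy loop moved here: ghes_copy_tofrom_phys() maps one page
at a time through the fixmap slot, so a buffer that crosses a page
boundary is copied in several chunks. The sketch below reproduces only
the chunk arithmetic in userspace, assuming a 4 KiB page size. It is an
illustration, not the kernel implementation; demo_split_by_page() and
the DEMO_* constants are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */
#define DEMO_PAGE_MASK (~(uint64_t)(DEMO_PAGE_SIZE - 1))

/*
 * Split [paddr, paddr + len) into per-page chunks using the same
 * offset/trunk arithmetic as the loop in ghes_copy_tofrom_phys().
 * Stores at most max_chunks sizes in chunks[] and returns the number
 * of chunks the range needs.
 */
static size_t demo_split_by_page(uint64_t paddr, uint32_t len,
				 uint32_t *chunks, size_t max_chunks)
{
	size_t n = 0;

	assert(chunks != NULL);

	while (len > 0) {
		/* Offset of paddr inside its page. */
		uint64_t offset = paddr - (paddr & DEMO_PAGE_MASK);
		/* Bytes left in this page, capped by the remaining length. */
		uint32_t trunk = DEMO_PAGE_SIZE - (uint32_t)offset;

		if (trunk > len)
			trunk = len;
		if (n < max_chunks)
			chunks[n] = trunk;
		n++;

		len -= trunk;
		paddr += trunk;
	}

	return n;
}
```

For example, a 100-byte read starting 16 bytes before a page boundary
is split into a 16-byte chunk and an 84-byte chunk.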
drivers/acpi/apei/ghes.c | 170 +-----------------------------------------
drivers/acpi/apei/ghes_cper.c | 170 +++++++++++++++++++++++++++++++++++++++++-
include/acpi/ghes_cper.h | 14 ++--
3 files changed, 177 insertions(+), 177 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 07b70bcb8342..b159dbee90ac 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -118,26 +118,6 @@ static struct gen_pool *ghes_estatus_pool;
static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
static atomic_t ghes_estatus_cache_alloced;
-static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
-{
- phys_addr_t paddr;
- pgprot_t prot;
-
- paddr = PFN_PHYS(pfn);
- prot = arch_apei_get_mem_attribute(paddr);
- __set_fixmap(fixmap_idx, paddr, prot);
-
- return (void __iomem *) __fix_to_virt(fixmap_idx);
-}
-
-static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
-{
- int _idx = virt_to_fix((unsigned long)vaddr);
-
- WARN_ON_ONCE(fixmap_idx != _idx);
- clear_fixmap(fixmap_idx);
-}
-
int ghes_estatus_pool_init(unsigned int num_ghes)
{
unsigned long addr, len;
@@ -193,22 +173,7 @@ static void unmap_gen_v2(struct ghes *ghes)
apei_unmap_generic_address(&ghes->generic_v2->read_ack_register);
}
-static void ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
-{
- int rc;
- u64 val = 0;
-
- rc = apei_read(&val, &gv2->read_ack_register);
- if (rc)
- return;
-
- val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
- val |= gv2->read_ack_write << gv2->read_ack_register.bit_offset;
-
- apei_write(val, &gv2->read_ack_register);
-}
-
-static struct ghes *ghes_new(struct acpi_hest_generic *generic)
+struct ghes *ghes_new(struct acpi_hest_generic *generic)
{
struct ghes *ghes;
unsigned int error_block_length;
@@ -255,7 +220,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
return ERR_PTR(rc);
}
-static void ghes_fini(struct ghes *ghes)
+void ghes_fini(struct ghes *ghes)
{
kfree(ghes->estatus);
apei_unmap_generic_address(&ghes->generic->error_status_address);
@@ -280,137 +245,6 @@ static inline int ghes_severity(int severity)
}
}
-static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
- int from_phys,
- enum fixed_addresses fixmap_idx)
-{
- void __iomem *vaddr;
- u64 offset;
- u32 trunk;
-
- while (len > 0) {
- offset = paddr - (paddr & PAGE_MASK);
- vaddr = ghes_map(PHYS_PFN(paddr), fixmap_idx);
- trunk = PAGE_SIZE - offset;
- trunk = min(trunk, len);
- if (from_phys)
- memcpy_fromio(buffer, vaddr + offset, trunk);
- else
- memcpy_toio(vaddr + offset, buffer, trunk);
- len -= trunk;
- paddr += trunk;
- buffer += trunk;
- ghes_unmap(vaddr, fixmap_idx);
- }
-}
-
-/* Check the top-level record header has an appropriate size. */
-static int __ghes_check_estatus(struct ghes *ghes,
- struct acpi_hest_generic_status *estatus)
-{
- u32 len = cper_estatus_len(estatus);
- u32 max_len = min(ghes->generic->error_block_length,
- ghes->estatus_length);
-
- if (len < sizeof(*estatus)) {
- pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
- return -EIO;
- }
-
- if (!len || len > max_len) {
- pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
- return -EIO;
- }
-
- if (cper_estatus_check_header(estatus)) {
- pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid CPER header!\n");
- return -EIO;
- }
-
- return 0;
-}
-
-/* Read the CPER block, returning its address, and header in estatus. */
-static int __ghes_peek_estatus(struct ghes *ghes,
- struct acpi_hest_generic_status *estatus,
- u64 *buf_paddr, enum fixed_addresses fixmap_idx)
-{
- struct acpi_hest_generic *g = ghes->generic;
- int rc;
-
- rc = apei_read(buf_paddr, &g->error_status_address);
- if (rc) {
- *buf_paddr = 0;
- pr_warn_ratelimited(FW_WARN GHES_PFX
-"Failed to read error status block address for hardware error source: %d.\n",
- g->header.source_id);
- return -EIO;
- }
- if (!*buf_paddr)
- return -ENOENT;
-
- ghes_copy_tofrom_phys(estatus, *buf_paddr, sizeof(*estatus), 1,
- fixmap_idx);
- if (!estatus->block_status) {
- *buf_paddr = 0;
- return -ENOENT;
- }
-
- return 0;
-}
-
-static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
- u64 buf_paddr, enum fixed_addresses fixmap_idx,
- size_t buf_len)
-{
- ghes_copy_tofrom_phys(estatus, buf_paddr, buf_len, 1, fixmap_idx);
- if (cper_estatus_check(estatus)) {
- pr_warn_ratelimited(FW_WARN GHES_PFX
- "Failed to read error status block!\n");
- return -EIO;
- }
-
- return 0;
-}
-
-static int ghes_read_estatus(struct ghes *ghes,
- struct acpi_hest_generic_status *estatus,
- u64 *buf_paddr, enum fixed_addresses fixmap_idx)
-{
- int rc;
-
- rc = __ghes_peek_estatus(ghes, estatus, buf_paddr, fixmap_idx);
- if (rc)
- return rc;
-
- rc = __ghes_check_estatus(ghes, estatus);
- if (rc)
- return rc;
-
- return __ghes_read_estatus(estatus, *buf_paddr, fixmap_idx,
- cper_estatus_len(estatus));
-}
-
-static void ghes_clear_estatus(struct ghes *ghes,
- struct acpi_hest_generic_status *estatus,
- u64 buf_paddr, enum fixed_addresses fixmap_idx)
-{
- estatus->block_status = 0;
-
- if (!buf_paddr)
- return;
-
- ghes_copy_tofrom_phys(estatus, buf_paddr,
- sizeof(estatus->block_status), 0,
- fixmap_idx);
-
- /*
- * GHESv2 type HEST entries introduce support for error acknowledgment,
- * so only acknowledge the error if this support is present.
- */
- if (is_hest_type_generic_v2(ghes))
- ghes_ack_error(ghes->generic_v2);
-}
/**
* struct ghes_task_work - for synchronous RAS event
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
index 63047322a3d9..7e0015e960c1 100644
--- a/drivers/acpi/apei/ghes_cper.c
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/*
*
- * APEI GHES CPER helper translation unit - staging file for helper moves
+ * APEI GHES CPER helper translation unit - code mechanically moved from ghes.c
*
* Copyright (C) 2026 ARM Ltd.
* Author: Ahmed Tiba <ahmed.tiba@arm.com>
@@ -17,10 +17,176 @@
#include <linux/slab.h>
#include <acpi/apei.h>
+#include <acpi/ghes_cper.h>
#include <asm/fixmap.h>
#include <asm/tlbflush.h>
#include "apei-internal.h"
-/* Helper bodies will be moved here in follow-up commits. */
+static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
+{
+ phys_addr_t paddr;
+ pgprot_t prot;
+
+ paddr = PFN_PHYS(pfn);
+ prot = arch_apei_get_mem_attribute(paddr);
+ __set_fixmap(fixmap_idx, paddr, prot);
+
+ return (void __iomem *) __fix_to_virt(fixmap_idx);
+}
+
+static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
+{
+ int _idx = virt_to_fix((unsigned long)vaddr);
+
+ WARN_ON_ONCE(fixmap_idx != _idx);
+ clear_fixmap(fixmap_idx);
+}
+
+static void ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
+{
+ int rc;
+ u64 val = 0;
+
+ rc = apei_read(&val, &gv2->read_ack_register);
+ if (rc)
+ return;
+
+ val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
+ val |= gv2->read_ack_write << gv2->read_ack_register.bit_offset;
+
+ apei_write(val, &gv2->read_ack_register);
+}
+
+static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
+ int from_phys,
+ enum fixed_addresses fixmap_idx)
+{
+ void __iomem *vaddr;
+ u64 offset;
+ u32 trunk;
+
+ while (len > 0) {
+ offset = paddr - (paddr & PAGE_MASK);
+ vaddr = ghes_map(PHYS_PFN(paddr), fixmap_idx);
+ trunk = PAGE_SIZE - offset;
+ trunk = min(trunk, len);
+ if (from_phys)
+ memcpy_fromio(buffer, vaddr + offset, trunk);
+ else
+ memcpy_toio(vaddr + offset, buffer, trunk);
+ len -= trunk;
+ paddr += trunk;
+ buffer += trunk;
+ ghes_unmap(vaddr, fixmap_idx);
+ }
+}
+
+/* Check the top-level record header has an appropriate size. */
+int __ghes_check_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus)
+{
+ u32 len = cper_estatus_len(estatus);
+ u32 max_len = min(ghes->generic->error_block_length,
+ ghes->estatus_length);
+
+ if (len < sizeof(*estatus)) {
+ pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
+ return -EIO;
+ }
+
+ if (!len || len > max_len) {
+ pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
+ return -EIO;
+ }
+
+ if (cper_estatus_check_header(estatus)) {
+ pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid CPER header!\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+/* Read the CPER block, returning its address, and header in estatus. */
+int __ghes_peek_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus,
+ u64 *buf_paddr, enum fixed_addresses fixmap_idx)
+{
+ struct acpi_hest_generic *g = ghes->generic;
+ int rc;
+
+ rc = apei_read(buf_paddr, &g->error_status_address);
+ if (rc) {
+ *buf_paddr = 0;
+ pr_warn_ratelimited(FW_WARN GHES_PFX
+"Failed to read error status block address for hardware error source: %d.\n",
+ g->header.source_id);
+ return -EIO;
+ }
+ if (!*buf_paddr)
+ return -ENOENT;
+
+ ghes_copy_tofrom_phys(estatus, *buf_paddr, sizeof(*estatus), 1,
+ fixmap_idx);
+ if (!estatus->block_status) {
+ *buf_paddr = 0;
+ return -ENOENT;
+ }
+
+ return 0;
+}
+
+int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
+ u64 buf_paddr, enum fixed_addresses fixmap_idx,
+ size_t buf_len)
+{
+ ghes_copy_tofrom_phys(estatus, buf_paddr, buf_len, 1, fixmap_idx);
+ if (cper_estatus_check(estatus)) {
+ pr_warn_ratelimited(FW_WARN GHES_PFX
+ "Failed to read error status block!\n");
+ return -EIO;
+ }
+
+ return 0;
+}
+
+int ghes_read_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus,
+ u64 *buf_paddr, enum fixed_addresses fixmap_idx)
+{
+ int rc;
+
+ rc = __ghes_peek_estatus(ghes, estatus, buf_paddr, fixmap_idx);
+ if (rc)
+ return rc;
+
+ rc = __ghes_check_estatus(ghes, estatus);
+ if (rc)
+ return rc;
+
+ return __ghes_read_estatus(estatus, *buf_paddr, fixmap_idx,
+ cper_estatus_len(estatus));
+}
+
+void ghes_clear_estatus(struct ghes *ghes,
+ struct acpi_hest_generic_status *estatus,
+ u64 buf_paddr, enum fixed_addresses fixmap_idx)
+{
+ estatus->block_status = 0;
+
+ if (!buf_paddr)
+ return;
+
+ ghes_copy_tofrom_phys(estatus, buf_paddr,
+ sizeof(estatus->block_status), 0,
+ fixmap_idx);
+
+ /*
+ * GHESv2 type HEST entries introduce support for error acknowledgment,
+ * so only acknowledge the error if this support is present.
+ */
+ if (is_hest_type_generic_v2(ghes))
+ ghes_ack_error(ghes->generic_v2);
+}
diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
index 2597fbadc4f3..2e3919f0c3e7 100644
--- a/include/acpi/ghes_cper.h
+++ b/include/acpi/ghes_cper.h
@@ -74,21 +74,21 @@ struct ghes_vendor_record_entry {
char vendor_record[];
};
-static struct ghes *ghes_new(struct acpi_hest_generic *generic);
-static void ghes_fini(struct ghes *ghes);
+struct ghes *ghes_new(struct acpi_hest_generic *generic);
+void ghes_fini(struct ghes *ghes);
-static int ghes_read_estatus(struct ghes *ghes,
+int ghes_read_estatus(struct ghes *ghes,
struct acpi_hest_generic_status *estatus,
u64 *buf_paddr, enum fixed_addresses fixmap_idx);
-static void ghes_clear_estatus(struct ghes *ghes,
+void ghes_clear_estatus(struct ghes *ghes,
struct acpi_hest_generic_status *estatus,
u64 buf_paddr, enum fixed_addresses fixmap_idx);
-static int __ghes_peek_estatus(struct ghes *ghes,
+int __ghes_peek_estatus(struct ghes *ghes,
struct acpi_hest_generic_status *estatus,
u64 *buf_paddr, enum fixed_addresses fixmap_idx);
-static int __ghes_check_estatus(struct ghes *ghes,
+int __ghes_check_estatus(struct ghes *ghes,
struct acpi_hest_generic_status *estatus);
-static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
+int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
u64 buf_paddr, enum fixed_addresses fixmap_idx,
size_t buf_len);
--
2.43.0
* [PATCH v2 04/11] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
` (2 preceding siblings ...)
2026-02-20 13:42 ` [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 05/11] ACPI: APEI: GHES: move estatus cache helpers Ahmed Tiba
` (7 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Move the GHESv2 read-ack register map/unmap helpers (map_gen_v2(),
unmap_gen_v2()) and the error-source allocation helpers (ghes_new(),
ghes_fini()) from ghes.c into ghes_cper.c. This is a mechanical refactor
that keeps the logic unchanged while making the helpers reusable.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
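Note on the read-ack register that map_gen_v2() maps: ghes_ack_error()
(moved in patch 3) updates it with a read-modify-write built from the
read_ack_preserve and read_ack_write masks. The sketch below shows only
that mask arithmetic on plain integers, as a userspace illustration.
demo_ack_value() is a hypothetical name, and the register access itself
(apei_read()/apei_write()) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the value written back to the read-ack register, following
 * the two mask steps in ghes_ack_error(): keep the bits selected by
 * 'preserve', then OR in 'write', both shifted by 'bit_offset'.
 */
static uint64_t demo_ack_value(uint64_t current, uint64_t preserve,
			       uint64_t write, unsigned int bit_offset)
{
	assert(bit_offset < 64);	/* shifting by >= 64 is undefined */

	current &= preserve << bit_offset;
	current |= write << bit_offset;

	return current;
}
```

With preserve = 0xFFFFFFFE and write = 0x1 at bit offset 0, a register
value of 0xAA becomes 0xAB: bit 0 is set and the other low bits are kept.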
drivers/acpi/apei/ghes.c | 65 -------------------------------------------
drivers/acpi/apei/ghes_cper.c | 65 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+), 65 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b159dbee90ac..d562c98bff19 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -163,71 +163,6 @@ void ghes_estatus_pool_region_free(unsigned long addr, u32 size)
}
EXPORT_SYMBOL_GPL(ghes_estatus_pool_region_free);
-static int map_gen_v2(struct ghes *ghes)
-{
- return apei_map_generic_address(&ghes->generic_v2->read_ack_register);
-}
-
-static void unmap_gen_v2(struct ghes *ghes)
-{
- apei_unmap_generic_address(&ghes->generic_v2->read_ack_register);
-}
-
-struct ghes *ghes_new(struct acpi_hest_generic *generic)
-{
- struct ghes *ghes;
- unsigned int error_block_length;
- int rc;
-
- ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
- if (!ghes)
- return ERR_PTR(-ENOMEM);
-
- ghes->generic = generic;
- if (is_hest_type_generic_v2(ghes)) {
- rc = map_gen_v2(ghes);
- if (rc)
- goto err_free;
- }
-
- rc = apei_map_generic_address(&generic->error_status_address);
- if (rc)
- goto err_unmap_read_ack_addr;
- error_block_length = generic->error_block_length;
- if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
- pr_warn(FW_WARN GHES_PFX
- "Error status block length is too long: %u for "
- "generic hardware error source: %d.\n",
- error_block_length, generic->header.source_id);
- error_block_length = GHES_ESTATUS_MAX_SIZE;
- }
- ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
- ghes->estatus_length = error_block_length;
- if (!ghes->estatus) {
- rc = -ENOMEM;
- goto err_unmap_status_addr;
- }
-
- return ghes;
-
-err_unmap_status_addr:
- apei_unmap_generic_address(&generic->error_status_address);
-err_unmap_read_ack_addr:
- if (is_hest_type_generic_v2(ghes))
- unmap_gen_v2(ghes);
-err_free:
- kfree(ghes);
- return ERR_PTR(rc);
-}
-
-void ghes_fini(struct ghes *ghes)
-{
- kfree(ghes->estatus);
- apei_unmap_generic_address(&ghes->generic->error_status_address);
- if (is_hest_type_generic_v2(ghes))
- unmap_gen_v2(ghes);
-}
-
static inline int ghes_severity(int severity)
{
switch (severity) {
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
index 7e0015e960c1..974d5f032799 100644
--- a/drivers/acpi/apei/ghes_cper.c
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -59,6 +59,71 @@ static void ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
apei_write(val, &gv2->read_ack_register);
}
+static int map_gen_v2(struct ghes *ghes)
+{
+ return apei_map_generic_address(&ghes->generic_v2->read_ack_register);
+}
+
+static void unmap_gen_v2(struct ghes *ghes)
+{
+ apei_unmap_generic_address(&ghes->generic_v2->read_ack_register);
+}
+
+struct ghes *ghes_new(struct acpi_hest_generic *generic)
+{
+ struct ghes *ghes;
+ unsigned int error_block_length;
+ int rc;
+
+ ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
+ if (!ghes)
+ return ERR_PTR(-ENOMEM);
+
+ ghes->generic = generic;
+ if (is_hest_type_generic_v2(ghes)) {
+ rc = map_gen_v2(ghes);
+ if (rc)
+ goto err_free;
+ }
+
+ rc = apei_map_generic_address(&generic->error_status_address);
+ if (rc)
+ goto err_unmap_read_ack_addr;
+ error_block_length = generic->error_block_length;
+ if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
+ pr_warn(FW_WARN GHES_PFX
+ "Error status block length is too long: %u for "
+ "generic hardware error source: %d.\n",
+ error_block_length, generic->header.source_id);
+ error_block_length = GHES_ESTATUS_MAX_SIZE;
+ }
+ ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
+ ghes->estatus_length = error_block_length;
+ if (!ghes->estatus) {
+ rc = -ENOMEM;
+ goto err_unmap_status_addr;
+ }
+
+ return ghes;
+
+err_unmap_status_addr:
+ apei_unmap_generic_address(&generic->error_status_address);
+err_unmap_read_ack_addr:
+ if (is_hest_type_generic_v2(ghes))
+ unmap_gen_v2(ghes);
+err_free:
+ kfree(ghes);
+ return ERR_PTR(rc);
+}
+
+void ghes_fini(struct ghes *ghes)
+{
+ kfree(ghes->estatus);
+ apei_unmap_generic_address(&ghes->generic->error_status_address);
+ if (is_hest_type_generic_v2(ghes))
+ unmap_gen_v2(ghes);
+}
+
static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
int from_phys,
enum fixed_addresses fixmap_idx)
--
2.43.0
* [PATCH v2 05/11] ACPI: APEI: GHES: move estatus cache helpers
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
` (3 preceding siblings ...)
2026-02-20 13:42 ` [PATCH v2 04/11] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 06/11] ACPI: APEI: GHES: move vendor record helpers Ahmed Tiba
` (6 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Relocate the estatus cache allocation and lookup helpers from ghes.c into
ghes_cper.c. This code move keeps the logic intact while making the cache
implementation available to forthcoming users.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
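Note on the throttle logic being moved: ghes_estatus_cached() treats a
record as already reported when a cache slot holds a byte-identical
record that was inserted less than GHES_ESTATUS_IN_CACHE_MAX_NSEC ago.
The sketch below models that lookup in userspace. It is a simplification
under stated assumptions: RCU and atomics are omitted, the clock is
passed in as a parameter, and struct demo_slot and demo_cached() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS	4			/* as GHES_ESTATUS_CACHES_SIZE */
#define DEMO_MAX_AGE_NS	10000000000ULL		/* as GHES_ESTATUS_IN_CACHE_MAX_NSEC */

/* Hypothetical, simplified cache slot; data == NULL marks an empty slot. */
struct demo_slot {
	const void *data;
	uint32_t len;
	uint64_t time_in;
	unsigned int count;
};

/*
 * Return 1 if 'rec' matches a slot byte for byte and that slot is still
 * within the age window, 0 otherwise. A match always bumps the slot's
 * hit count, even when the slot has aged out.
 */
static int demo_cached(struct demo_slot slots[DEMO_SLOTS], const void *rec,
		       uint32_t len, uint64_t now)
{
	int i, cached = 0;

	assert(rec != NULL);

	for (i = 0; i < DEMO_SLOTS; i++) {
		struct demo_slot *slot = &slots[i];

		if (!slot->data)
			continue;
		if (len != slot->len)
			continue;
		if (memcmp(rec, slot->data, len))
			continue;

		slot->count++;
		if (now - slot->time_in < DEMO_MAX_AGE_NS)
			cached = 1;
		break;
	}

	return cached;
}
```

Comparing lengths before memcmp() is a cheap reject for records of
different sizes, which is the common case for unrelated errors.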
drivers/acpi/apei/ghes.c | 138 +----------------------------------------
drivers/acpi/apei/ghes_cper.c | 140 ++++++++++++++++++++++++++++++++++++++++++
include/acpi/ghes_cper.h | 6 ++
3 files changed, 147 insertions(+), 137 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d562c98bff19..8a9b4dda3748 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -113,10 +113,7 @@ static DEFINE_MUTEX(ghes_devs_mutex);
*/
static DEFINE_SPINLOCK(ghes_notify_lock_irq);
-static struct gen_pool *ghes_estatus_pool;
-
-static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
-static atomic_t ghes_estatus_cache_alloced;
+struct gen_pool *ghes_estatus_pool;
int ghes_estatus_pool_init(unsigned int num_ghes)
{
@@ -715,139 +712,6 @@ static int ghes_print_estatus(const char *pfx,
return 0;
}
-/*
- * GHES error status reporting throttle, to report more kinds of
- * errors, instead of just most frequently occurred errors.
- */
-static int ghes_estatus_cached(struct acpi_hest_generic_status *estatus)
-{
- u32 len;
- int i, cached = 0;
- unsigned long long now;
- struct ghes_estatus_cache *cache;
- struct acpi_hest_generic_status *cache_estatus;
-
- len = cper_estatus_len(estatus);
- rcu_read_lock();
- for (i = 0; i < GHES_ESTATUS_CACHES_SIZE; i++) {
- cache = rcu_dereference(ghes_estatus_caches[i]);
- if (cache == NULL)
- continue;
- if (len != cache->estatus_len)
- continue;
- cache_estatus = GHES_ESTATUS_FROM_CACHE(cache);
- if (memcmp(estatus, cache_estatus, len))
- continue;
- atomic_inc(&cache->count);
- now = sched_clock();
- if (now - cache->time_in < GHES_ESTATUS_IN_CACHE_MAX_NSEC)
- cached = 1;
- break;
- }
- rcu_read_unlock();
- return cached;
-}
-
-static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
- struct acpi_hest_generic *generic,
- struct acpi_hest_generic_status *estatus)
-{
- int alloced;
- u32 len, cache_len;
- struct ghes_estatus_cache *cache;
- struct acpi_hest_generic_status *cache_estatus;
-
- alloced = atomic_add_return(1, &ghes_estatus_cache_alloced);
- if (alloced > GHES_ESTATUS_CACHE_ALLOCED_MAX) {
- atomic_dec(&ghes_estatus_cache_alloced);
- return NULL;
- }
- len = cper_estatus_len(estatus);
- cache_len = GHES_ESTATUS_CACHE_LEN(len);
- cache = (void *)gen_pool_alloc(ghes_estatus_pool, cache_len);
- if (!cache) {
- atomic_dec(&ghes_estatus_cache_alloced);
- return NULL;
- }
- cache_estatus = GHES_ESTATUS_FROM_CACHE(cache);
- memcpy(cache_estatus, estatus, len);
- cache->estatus_len = len;
- atomic_set(&cache->count, 0);
- cache->generic = generic;
- cache->time_in = sched_clock();
- return cache;
-}
-
-static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
-{
- struct ghes_estatus_cache *cache;
- u32 len;
-
- cache = container_of(head, struct ghes_estatus_cache, rcu);
- len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
- len = GHES_ESTATUS_CACHE_LEN(len);
- gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
- atomic_dec(&ghes_estatus_cache_alloced);
-}
-
-static void
-ghes_estatus_cache_add(struct acpi_hest_generic *generic,
- struct acpi_hest_generic_status *estatus)
-{
- unsigned long long now, duration, period, max_period = 0;
- struct ghes_estatus_cache *cache, *new_cache;
- struct ghes_estatus_cache __rcu *victim;
- int i, slot = -1, count;
-
- new_cache = ghes_estatus_cache_alloc(generic, estatus);
- if (!new_cache)
- return;
-
- rcu_read_lock();
- now = sched_clock();
- for (i = 0; i < GHES_ESTATUS_CACHES_SIZE; i++) {
- cache = rcu_dereference(ghes_estatus_caches[i]);
- if (cache == NULL) {
- slot = i;
- break;
- }
- duration = now - cache->time_in;
- if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
- slot = i;
- break;
- }
- count = atomic_read(&cache->count);
- period = duration;
- do_div(period, (count + 1));
- if (period > max_period) {
- max_period = period;
- slot = i;
- }
- }
- rcu_read_unlock();
-
- if (slot != -1) {
- /*
- * Use release semantics to ensure that ghes_estatus_cached()
- * running on another CPU will see the updated cache fields if
- * it can see the new value of the pointer.
- */
- victim = xchg_release(&ghes_estatus_caches[slot],
- RCU_INITIALIZER(new_cache));
-
- /*
- * At this point, victim may point to a cached item different
- * from the one based on which we selected the slot. Instead of
- * going to the loop again to pick another slot, let's just
- * drop the other item anyway: this may cause a false cache
- * miss later on, but that won't cause any problems.
- */
- if (victim)
- call_rcu(&unrcu_pointer(victim)->rcu,
- ghes_estatus_cache_rcu_free);
- }
-}
-
static void __ghes_panic(struct ghes *ghes,
struct acpi_hest_generic_status *estatus,
u64 buf_paddr, enum fixed_addresses fixmap_idx)
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
index 974d5f032799..cb7f6f684087 100644
--- a/drivers/acpi/apei/ghes_cper.c
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -10,10 +10,14 @@
*/
#include <linux/err.h>
+#include <linux/genalloc.h>
#include <linux/io.h>
#include <linux/kernel.h>
+#include <linux/math64.h>
#include <linux/mm.h>
#include <linux/ratelimit.h>
+#include <linux/rcupdate.h>
+#include <linux/sched/clock.h>
#include <linux/slab.h>
#include <acpi/apei.h>
@@ -24,6 +28,9 @@
#include "apei-internal.h"
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static atomic_t ghes_estatus_cache_alloced;
+
static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
{
phys_addr_t paddr;
@@ -255,3 +262,136 @@ void ghes_clear_estatus(struct ghes *ghes,
if (is_hest_type_generic_v2(ghes))
ghes_ack_error(ghes->generic_v2);
}
+
+/*
+ * GHES error status reporting throttle, to report more kinds of
+ * errors, instead of just most frequently occurred errors.
+ */
+int ghes_estatus_cached(struct acpi_hest_generic_status *estatus)
+{
+ u32 len;
+ int i, cached = 0;
+ unsigned long long now;
+ struct ghes_estatus_cache *cache;
+ struct acpi_hest_generic_status *cache_estatus;
+
+ len = cper_estatus_len(estatus);
+ rcu_read_lock();
+ for (i = 0; i < GHES_ESTATUS_CACHES_SIZE; i++) {
+ cache = rcu_dereference(ghes_estatus_caches[i]);
+ if (cache == NULL)
+ continue;
+ if (len != cache->estatus_len)
+ continue;
+ cache_estatus = GHES_ESTATUS_FROM_CACHE(cache);
+ if (memcmp(estatus, cache_estatus, len))
+ continue;
+ atomic_inc(&cache->count);
+ now = sched_clock();
+ if (now - cache->time_in < GHES_ESTATUS_IN_CACHE_MAX_NSEC)
+ cached = 1;
+ break;
+ }
+ rcu_read_unlock();
+ return cached;
+}
+
+static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
+ struct acpi_hest_generic *generic,
+ struct acpi_hest_generic_status *estatus)
+{
+ int alloced;
+ u32 len, cache_len;
+ struct ghes_estatus_cache *cache;
+ struct acpi_hest_generic_status *cache_estatus;
+
+ alloced = atomic_add_return(1, &ghes_estatus_cache_alloced);
+ if (alloced > GHES_ESTATUS_CACHE_ALLOCED_MAX) {
+ atomic_dec(&ghes_estatus_cache_alloced);
+ return NULL;
+ }
+ len = cper_estatus_len(estatus);
+ cache_len = GHES_ESTATUS_CACHE_LEN(len);
+ cache = (void *)gen_pool_alloc(ghes_estatus_pool, cache_len);
+ if (!cache) {
+ atomic_dec(&ghes_estatus_cache_alloced);
+ return NULL;
+ }
+ cache_estatus = GHES_ESTATUS_FROM_CACHE(cache);
+ memcpy(cache_estatus, estatus, len);
+ cache->estatus_len = len;
+ atomic_set(&cache->count, 0);
+ cache->generic = generic;
+ cache->time_in = sched_clock();
+ return cache;
+}
+
+static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
+{
+ struct ghes_estatus_cache *cache;
+ u32 len;
+
+ cache = container_of(head, struct ghes_estatus_cache, rcu);
+ len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
+ len = GHES_ESTATUS_CACHE_LEN(len);
+ gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
+ atomic_dec(&ghes_estatus_cache_alloced);
+}
+
+void
+ghes_estatus_cache_add(struct acpi_hest_generic *generic,
+ struct acpi_hest_generic_status *estatus)
+{
+ unsigned long long now, duration, period, max_period = 0;
+ struct ghes_estatus_cache *cache, *new_cache;
+ struct ghes_estatus_cache __rcu *victim;
+ int i, slot = -1, count;
+
+ new_cache = ghes_estatus_cache_alloc(generic, estatus);
+ if (!new_cache)
+ return;
+
+ rcu_read_lock();
+ now = sched_clock();
+ for (i = 0; i < GHES_ESTATUS_CACHES_SIZE; i++) {
+ cache = rcu_dereference(ghes_estatus_caches[i]);
+ if (cache == NULL) {
+ slot = i;
+ break;
+ }
+ duration = now - cache->time_in;
+ if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
+ slot = i;
+ break;
+ }
+ count = atomic_read(&cache->count);
+ period = duration;
+ do_div(period, (count + 1));
+ if (period > max_period) {
+ max_period = period;
+ slot = i;
+ }
+ }
+ rcu_read_unlock();
+
+ if (slot != -1) {
+ /*
+ * Use release semantics to ensure that ghes_estatus_cached()
+ * running on another CPU will see the updated cache fields if
+ * it can see the new value of the pointer.
+ */
+ victim = xchg_release(&ghes_estatus_caches[slot],
+ RCU_INITIALIZER(new_cache));
+
+ /*
+ * At this point, victim may point to a cached item different
+ * from the one based on which we selected the slot. Instead of
+ * going to the loop again to pick another slot, let's just
+ * drop the other item anyway: this may cause a false cache
+ * miss later on, but that won't cause any problems.
+ */
+ if (victim)
+ call_rcu(&unrcu_pointer(victim)->rcu,
+ ghes_estatus_cache_rcu_free);
+ }
+}
diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
index 2e3919f0c3e7..1f012a23d0c6 100644
--- a/include/acpi/ghes_cper.h
+++ b/include/acpi/ghes_cper.h
@@ -11,6 +11,7 @@
#ifndef ACPI_APEI_GHES_CPER_H
#define ACPI_APEI_GHES_CPER_H
+#include <linux/atomic.h>
#include <linux/workqueue.h>
#include <acpi/ghes.h>
@@ -49,6 +50,8 @@
((struct acpi_hest_generic_data *) \
((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
+extern struct gen_pool *ghes_estatus_pool;
+
static inline bool is_hest_type_generic_v2(struct ghes *ghes)
{
return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
@@ -91,5 +94,8 @@ int __ghes_check_estatus(struct ghes *ghes,
int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
u64 buf_paddr, enum fixed_addresses fixmap_idx,
size_t buf_len);
+int ghes_estatus_cached(struct acpi_hest_generic_status *estatus);
+void ghes_estatus_cache_add(struct acpi_hest_generic *generic,
+ struct acpi_hest_generic_status *estatus);
#endif /* ACPI_APEI_GHES_CPER_H */
--
2.43.0
* [PATCH v2 06/11] ACPI: APEI: GHES: move vendor record helpers
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
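Shift the vendor record workqueue helpers into ghes_cper.c so both GHES
and future DT-based providers can use the same implementation. The change
is mechanical and keeps the notifier behavior identical; the only interface
change is that ghes_defer_non_standard_event() is no longer static and is
declared in ghes_cper.h.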
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
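The deferral path stores the copied record directly behind a small header,
which is why GHES_GDATA_FROM_VENDOR_ENTRY() is just "entry + 1". A minimal
user-space sketch of that layout follows, assuming malloc() in place of the
gen_pool allocator; entry_model, ENTRY_LEN, PAYLOAD_FROM_ENTRY and
entry_create() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct ghes_vendor_record_entry. */
struct entry_model {
	int error_severity;
	/* the copied record starts immediately after this header */
};

/* Total allocation size: header plus the record that trails it. */
#define ENTRY_LEN(payload_len)  (sizeof(struct entry_model) + (payload_len))
/* The payload lives one header-sized step past the entry pointer. */
#define PAYLOAD_FROM_ENTRY(e)   ((void *)((struct entry_model *)(e) + 1))

/* Copy a record into a freshly allocated entry; NULL on allocation failure. */
static struct entry_model *entry_create(const void *rec, size_t len, int sev)
{
	struct entry_model *e;

	assert(rec != NULL);
	e = malloc(ENTRY_LEN(len));
	if (!e)
		return NULL;
	memcpy(PAYLOAD_FROM_ENTRY(e), rec, len);
	e->error_severity = sev;
	return e;
}
```

Because the record is copied before the work item is scheduled, the worker
never touches the firmware-owned buffer; it only has to recompute the same
length to free the entry.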
drivers/acpi/apei/ghes.c | 50 --------------------------------------
drivers/acpi/apei/ghes_cper.c | 56 +++++++++++++++++++++++++++++++++++++++++++
include/acpi/ghes_cper.h | 2 ++
3 files changed, 58 insertions(+), 50 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 8a9b4dda3748..9703c602a8c2 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -383,56 +383,6 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
#endif
}
-static BLOCKING_NOTIFIER_HEAD(vendor_record_notify_list);
-
-int ghes_register_vendor_record_notifier(struct notifier_block *nb)
-{
- return blocking_notifier_chain_register(&vendor_record_notify_list, nb);
-}
-EXPORT_SYMBOL_GPL(ghes_register_vendor_record_notifier);
-
-void ghes_unregister_vendor_record_notifier(struct notifier_block *nb)
-{
- blocking_notifier_chain_unregister(&vendor_record_notify_list, nb);
-}
-EXPORT_SYMBOL_GPL(ghes_unregister_vendor_record_notifier);
-
-static void ghes_vendor_record_work_func(struct work_struct *work)
-{
- struct ghes_vendor_record_entry *entry;
- struct acpi_hest_generic_data *gdata;
- u32 len;
-
- entry = container_of(work, struct ghes_vendor_record_entry, work);
- gdata = GHES_GDATA_FROM_VENDOR_ENTRY(entry);
-
- blocking_notifier_call_chain(&vendor_record_notify_list,
- entry->error_severity, gdata);
-
- len = GHES_VENDOR_ENTRY_LEN(acpi_hest_get_record_size(gdata));
- gen_pool_free(ghes_estatus_pool, (unsigned long)entry, len);
-}
-
-static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
- int sev)
-{
- struct acpi_hest_generic_data *copied_gdata;
- struct ghes_vendor_record_entry *entry;
- u32 len;
-
- len = GHES_VENDOR_ENTRY_LEN(acpi_hest_get_record_size(gdata));
- entry = (void *)gen_pool_alloc(ghes_estatus_pool, len);
- if (!entry)
- return;
-
- copied_gdata = GHES_GDATA_FROM_VENDOR_ENTRY(entry);
- memcpy(copied_gdata, gdata, acpi_hest_get_record_size(gdata));
- entry->error_severity = sev;
-
- INIT_WORK(&entry->work, ghes_vendor_record_work_func);
- schedule_work(&entry->work);
-}
-
/* Room for 8 entries */
#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
index cb7f6f684087..627f6c712261 100644
--- a/drivers/acpi/apei/ghes_cper.c
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -11,12 +11,17 @@
#include <linux/err.h>
#include <linux/genalloc.h>
+#include <linux/irq_work.h>
#include <linux/io.h>
#include <linux/kernel.h>
+#include <linux/list.h>
#include <linux/math64.h>
#include <linux/mm.h>
+#include <linux/notifier.h>
+#include <linux/llist.h>
#include <linux/ratelimit.h>
#include <linux/rcupdate.h>
+#include <linux/rculist.h>
#include <linux/sched/clock.h>
#include <linux/slab.h>
@@ -263,6 +268,57 @@ void ghes_clear_estatus(struct ghes *ghes,
ghes_ack_error(ghes->generic_v2);
}
+
+static BLOCKING_NOTIFIER_HEAD(vendor_record_notify_list);
+
+int ghes_register_vendor_record_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&vendor_record_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_register_vendor_record_notifier);
+
+void ghes_unregister_vendor_record_notifier(struct notifier_block *nb)
+{
+ blocking_notifier_chain_unregister(&vendor_record_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(ghes_unregister_vendor_record_notifier);
+
+static void ghes_vendor_record_work_func(struct work_struct *work)
+{
+ struct ghes_vendor_record_entry *entry;
+ struct acpi_hest_generic_data *gdata;
+ u32 len;
+
+ entry = container_of(work, struct ghes_vendor_record_entry, work);
+ gdata = GHES_GDATA_FROM_VENDOR_ENTRY(entry);
+
+ blocking_notifier_call_chain(&vendor_record_notify_list,
+ entry->error_severity, gdata);
+
+ len = GHES_VENDOR_ENTRY_LEN(acpi_hest_get_record_size(gdata));
+ gen_pool_free(ghes_estatus_pool, (unsigned long)entry, len);
+}
+
+void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
+ int sev)
+{
+ struct acpi_hest_generic_data *copied_gdata;
+ struct ghes_vendor_record_entry *entry;
+ u32 len;
+
+ len = GHES_VENDOR_ENTRY_LEN(acpi_hest_get_record_size(gdata));
+ entry = (void *)gen_pool_alloc(ghes_estatus_pool, len);
+ if (!entry)
+ return;
+
+ copied_gdata = GHES_GDATA_FROM_VENDOR_ENTRY(entry);
+ memcpy(copied_gdata, gdata, acpi_hest_get_record_size(gdata));
+ entry->error_severity = sev;
+
+ INIT_WORK(&entry->work, ghes_vendor_record_work_func);
+ schedule_work(&entry->work);
+}
+
/*
* GHES error status reporting throttle, to report more kinds of
* errors, instead of just most frequently occurred errors.
diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
index 1f012a23d0c6..c5ff4c502017 100644
--- a/include/acpi/ghes_cper.h
+++ b/include/acpi/ghes_cper.h
@@ -97,5 +97,7 @@ int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
int ghes_estatus_cached(struct acpi_hest_generic_status *estatus);
void ghes_estatus_cache_add(struct acpi_hest_generic *generic,
struct acpi_hest_generic_status *estatus);
+void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
+ int sev);
#endif /* ACPI_APEI_GHES_CPER_H */
--
2.43.0
* [PATCH v2 07/11] ACPI: APEI: GHES: move CXL CPER helpers
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
` (5 preceding siblings ...)
2026-02-20 13:42 ` [PATCH v2 06/11] ACPI: APEI: GHES: move vendor record helpers Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-24 15:34 ` Jonathan Cameron
2026-02-20 13:42 ` [PATCH v2 08/11] ACPI: APEI: introduce GHES helper Ahmed Tiba
` (4 subsequent siblings)
11 siblings, 1 reply; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Move the CXL CPER handling paths out of ghes.c and into ghes_cper.c so the
helpers can be reused. The code is moved as-is, with the public
prototypes updated so GHES keeps calling into the new translation unit.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
drivers/acpi/apei/ghes.c | 132 -----------------------------------------
drivers/acpi/apei/ghes_cper.c | 135 ++++++++++++++++++++++++++++++++++++++++++
include/acpi/ghes_cper.h | 11 ++++
3 files changed, 146 insertions(+), 132 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 9703c602a8c2..136993704d52 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -383,138 +383,6 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
#endif
}
-/* Room for 8 entries */
-#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
-static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
- CXL_CPER_PROT_ERR_FIFO_DEPTH);
-
-/* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
-static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
-struct work_struct *cxl_cper_prot_err_work;
-
-static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
- int severity)
-{
-#ifdef CONFIG_ACPI_APEI_PCIEAER
- struct cxl_cper_prot_err_work_data wd;
-
- if (cxl_cper_sec_prot_err_valid(prot_err))
- return;
-
- guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
-
- if (!cxl_cper_prot_err_work)
- return;
-
- if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
- return;
-
- if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
- pr_err_ratelimited("CXL CPER kfifo overflow\n");
- return;
- }
-
- schedule_work(cxl_cper_prot_err_work);
-#endif
-}
-
-int cxl_cper_register_prot_err_work(struct work_struct *work)
-{
- if (cxl_cper_prot_err_work)
- return -EINVAL;
-
- guard(spinlock)(&cxl_cper_prot_err_work_lock);
- cxl_cper_prot_err_work = work;
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
-
-int cxl_cper_unregister_prot_err_work(struct work_struct *work)
-{
- if (cxl_cper_prot_err_work != work)
- return -EINVAL;
-
- guard(spinlock)(&cxl_cper_prot_err_work_lock);
- cxl_cper_prot_err_work = NULL;
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
-
-int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
-{
- return kfifo_get(&cxl_cper_prot_err_fifo, wd);
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
-
-/* Room for 8 entries for each of the 4 event log queues */
-#define CXL_CPER_FIFO_DEPTH 32
-DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
-
-/* Synchronize schedule_work() with cxl_cper_work changes */
-static DEFINE_SPINLOCK(cxl_cper_work_lock);
-struct work_struct *cxl_cper_work;
-
-static void cxl_cper_post_event(enum cxl_event_type event_type,
- struct cxl_cper_event_rec *rec)
-{
- struct cxl_cper_work_data wd;
-
- if (rec->hdr.length <= sizeof(rec->hdr) ||
- rec->hdr.length > sizeof(*rec)) {
- pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
- rec->hdr.length);
- return;
- }
-
- if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
- pr_err(FW_WARN "CXL CPER invalid event\n");
- return;
- }
-
- guard(spinlock_irqsave)(&cxl_cper_work_lock);
-
- if (!cxl_cper_work)
- return;
-
- wd.event_type = event_type;
- memcpy(&wd.rec, rec, sizeof(wd.rec));
-
- if (!kfifo_put(&cxl_cper_fifo, wd)) {
- pr_err_ratelimited("CXL CPER kfifo overflow\n");
- return;
- }
-
- schedule_work(cxl_cper_work);
-}
-
-int cxl_cper_register_work(struct work_struct *work)
-{
- if (cxl_cper_work)
- return -EINVAL;
-
- guard(spinlock)(&cxl_cper_work_lock);
- cxl_cper_work = work;
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_work, "CXL");
-
-int cxl_cper_unregister_work(struct work_struct *work)
-{
- if (cxl_cper_work != work)
- return -EINVAL;
-
- guard(spinlock)(&cxl_cper_work_lock);
- cxl_cper_work = NULL;
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_work, "CXL");
-
-int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
-{
- return kfifo_get(&cxl_cper_fifo, wd);
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
-
static void ghes_log_hwerr(int sev, guid_t *sec_type)
{
if (sev != CPER_SEV_RECOVERABLE)
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
index 627f6c712261..673dca208935 100644
--- a/drivers/acpi/apei/ghes_cper.c
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -9,10 +9,12 @@
*
*/
+#include <linux/aer.h>
#include <linux/err.h>
#include <linux/genalloc.h>
#include <linux/irq_work.h>
#include <linux/io.h>
+#include <linux/kfifo.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/math64.h>
@@ -319,6 +321,139 @@ void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
schedule_work(&entry->work);
}
+
+/* Room for 8 entries */
+#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
+static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
+ CXL_CPER_PROT_ERR_FIFO_DEPTH);
+
+/* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
+static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
+struct work_struct *cxl_cper_prot_err_work;
+
+void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
+ int severity)
+{
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+ struct cxl_cper_prot_err_work_data wd;
+
+ if (cxl_cper_sec_prot_err_valid(prot_err))
+ return;
+
+ guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
+
+ if (!cxl_cper_prot_err_work)
+ return;
+
+ if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
+ return;
+
+ if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
+ pr_err_ratelimited("CXL CPER kfifo overflow\n");
+ return;
+ }
+
+ schedule_work(cxl_cper_prot_err_work);
+#endif
+}
+
+int cxl_cper_register_prot_err_work(struct work_struct *work)
+{
+ if (cxl_cper_prot_err_work)
+ return -EINVAL;
+
+ guard(spinlock)(&cxl_cper_prot_err_work_lock);
+ cxl_cper_prot_err_work = work;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
+
+int cxl_cper_unregister_prot_err_work(struct work_struct *work)
+{
+ if (cxl_cper_prot_err_work != work)
+ return -EINVAL;
+
+ guard(spinlock)(&cxl_cper_prot_err_work_lock);
+ cxl_cper_prot_err_work = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
+
+int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
+{
+ return kfifo_get(&cxl_cper_prot_err_fifo, wd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
+
+/* Room for 8 entries for each of the 4 event log queues */
+#define CXL_CPER_FIFO_DEPTH 32
+DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
+
+/* Synchronize schedule_work() with cxl_cper_work changes */
+static DEFINE_SPINLOCK(cxl_cper_work_lock);
+struct work_struct *cxl_cper_work;
+
+void cxl_cper_post_event(enum cxl_event_type event_type,
+ struct cxl_cper_event_rec *rec)
+{
+ struct cxl_cper_work_data wd;
+
+ if (rec->hdr.length <= sizeof(rec->hdr) ||
+ rec->hdr.length > sizeof(*rec)) {
+ pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
+ rec->hdr.length);
+ return;
+ }
+
+ if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
+ pr_err(FW_WARN "CXL CPER invalid event\n");
+ return;
+ }
+
+ guard(spinlock_irqsave)(&cxl_cper_work_lock);
+
+ if (!cxl_cper_work)
+ return;
+
+ wd.event_type = event_type;
+ memcpy(&wd.rec, rec, sizeof(wd.rec));
+
+ if (!kfifo_put(&cxl_cper_fifo, wd)) {
+ pr_err_ratelimited("CXL CPER kfifo overflow\n");
+ return;
+ }
+
+ schedule_work(cxl_cper_work);
+}
+
+int cxl_cper_register_work(struct work_struct *work)
+{
+ if (cxl_cper_work)
+ return -EINVAL;
+
+ guard(spinlock)(&cxl_cper_work_lock);
+ cxl_cper_work = work;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_register_work, "CXL");
+
+int cxl_cper_unregister_work(struct work_struct *work)
+{
+ if (cxl_cper_work != work)
+ return -EINVAL;
+
+ guard(spinlock)(&cxl_cper_work_lock);
+ cxl_cper_work = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_work, "CXL");
+
+int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
+{
+ return kfifo_get(&cxl_cper_fifo, wd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
+
/*
* GHES error status reporting throttle, to report more kinds of
* errors, instead of just most frequently occurred errors.
diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
index c5ff4c502017..4522e8699ce0 100644
--- a/include/acpi/ghes_cper.h
+++ b/include/acpi/ghes_cper.h
@@ -15,6 +15,7 @@
#include <linux/workqueue.h>
#include <acpi/ghes.h>
+#include <cxl/event.h>
#define GHES_PFX "GHES: "
@@ -99,5 +100,15 @@ void ghes_estatus_cache_add(struct acpi_hest_generic *generic,
struct acpi_hest_generic_status *estatus);
void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
int sev);
+void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
+ int severity);
+int cxl_cper_register_prot_err_work(struct work_struct *work);
+int cxl_cper_unregister_prot_err_work(struct work_struct *work);
+int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd);
+void cxl_cper_post_event(enum cxl_event_type event_type,
+ struct cxl_cper_event_rec *rec);
+int cxl_cper_register_work(struct work_struct *work);
+int cxl_cper_unregister_work(struct work_struct *work);
+int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd);
#endif /* ACPI_APEI_GHES_CPER_H */
--
2.43.0
* [PATCH v2 08/11] ACPI: APEI: introduce GHES helper
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Add a dedicated GHES_CPER_HELPERS Kconfig entry so the shared helper code
can be built even when ACPI_APEI_GHES is disabled. The symbol is a
promptless bool that ACPI_APEI_GHES selects, so it adds no user-visible
option. Update the build glue and headers to depend on the new symbol.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
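The header change follows the usual "real prototype or inline stub"
pattern, so callers compile whether or not the helpers are built. A
stand-alone sketch of the pattern, with the config symbol left undefined to
exercise the stub path; pool_init() and provider_probe() are hypothetical
names standing in for ghes_estatus_pool_init() and a caller.

```c
#include <assert.h>
#include <errno.h>

/* Kconfig would define this symbol; it is left undefined in this sketch. */
/* #define CONFIG_GHES_CPER_HELPERS 1 */

#ifdef CONFIG_GHES_CPER_HELPERS
/* Real implementation would live in the helper translation unit. */
int pool_init(unsigned int num_sources);
#else
/* Stub: lets callers build and fail cleanly when helpers are absent. */
static inline int pool_init(unsigned int num_sources)
{
	(void)num_sources;
	return -ENODEV;
}
#endif

/* Hypothetical caller: needs no #ifdef of its own. */
static int provider_probe(unsigned int num_sources)
{
	assert(num_sources > 0);
	return pool_init(num_sources);
}
```

With the symbol undefined the probe reports -ENODEV instead of failing to
link, which is the behavior the new ghes.h stubs give a provider built
without GHES_CPER_HELPERS.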
drivers/Makefile | 1 +
drivers/acpi/Kconfig | 4 ++++
drivers/acpi/apei/Kconfig | 1 +
drivers/acpi/apei/Makefile | 2 +-
include/acpi/ghes.h | 10 ++++++----
include/cxl/event.h | 2 +-
6 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/Makefile b/drivers/Makefile
index 53fbd2e0acdd..3b98d3b44a35 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -31,6 +31,7 @@ obj-y += idle/
obj-y += char/ipmi/
obj-$(CONFIG_ACPI) += acpi/
+obj-$(CONFIG_GHES_CPER_HELPERS) += acpi/apei/ghes_cper.o
# PnP must come after ACPI since it will eventually need to check if acpi
# was used and do nothing if so
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index df0ff0764d0d..153ec8de6490 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -6,6 +6,10 @@
config ARCH_SUPPORTS_ACPI
bool
+config GHES_CPER_HELPERS
+ bool
+ select UEFI_CPER
+
menuconfig ACPI
bool "ACPI (Advanced Configuration and Power Interface) Support"
depends on ARCH_SUPPORTS_ACPI
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 070c07d68dfb..2f65070b4ba3 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -21,6 +21,7 @@ config ACPI_APEI_GHES
bool "APEI Generic Hardware Error Source"
depends on ACPI_APEI
select ACPI_HED
+ select GHES_CPER_HELPERS
select IRQ_WORK
select GENERIC_ALLOCATOR
select ARM_SDE_INTERFACE if ARM64
diff --git a/drivers/acpi/apei/Makefile b/drivers/acpi/apei/Makefile
index b3774af70883..1a0b85923cd4 100644
--- a/drivers/acpi/apei/Makefile
+++ b/drivers/acpi/apei/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_ACPI_APEI) += apei.o
-obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o ghes_cper.o
+obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o
# clang versions prior to 18 may blow out the stack with KASAN
ifeq ($(CONFIG_COMPILE_TEST)_$(CONFIG_CC_IS_CLANG)_$(call clang-min-version, 180000),y_y_)
KASAN_SANITIZE_ghes.o := n
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 7bea522c0657..fb9d53537b1e 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -72,15 +72,17 @@ int ghes_register_vendor_record_notifier(struct notifier_block *nb);
void ghes_unregister_vendor_record_notifier(struct notifier_block *nb);
struct list_head *ghes_get_devices(void);
-
-void ghes_estatus_pool_region_free(unsigned long addr, u32 size);
#else
static inline struct list_head *ghes_get_devices(void) { return NULL; }
-
-static inline void ghes_estatus_pool_region_free(unsigned long addr, u32 size) { return; }
#endif
+#ifdef CONFIG_GHES_CPER_HELPERS
int ghes_estatus_pool_init(unsigned int num_ghes);
+void ghes_estatus_pool_region_free(unsigned long addr, u32 size);
+#else
+static inline int ghes_estatus_pool_init(unsigned int num_ghes) { return -ENODEV; }
+static inline void ghes_estatus_pool_region_free(unsigned long addr, u32 size) { }
+#endif
static inline int acpi_hest_get_version(struct acpi_hest_generic_data *gdata)
{
diff --git a/include/cxl/event.h b/include/cxl/event.h
index ff97fea718d2..2ebd65b0d9d6 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -285,7 +285,7 @@ struct cxl_cper_prot_err_work_data {
int severity;
};
-#ifdef CONFIG_ACPI_APEI_GHES
+#ifdef CONFIG_GHES_CPER_HELPERS
int cxl_cper_register_work(struct work_struct *work);
int cxl_cper_unregister_work(struct work_struct *work);
int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd);
--
2.43.0
* [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
` (7 preceding siblings ...)
2026-02-20 13:42 ` [PATCH v2 08/11] ACPI: APEI: introduce GHES helper Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-20 19:19 ` kernel test robot
` (3 more replies)
2026-02-20 13:42 ` [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh Ahmed Tiba
` (2 subsequent siblings)
11 siblings, 4 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Wire GHES up to the helper routines in ghes_cper.c and remove the local
copies from ghes.c. This keeps the control flow identical while letting
the helpers be shared with other firmware-first providers.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
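Reader's note (not part of the commit): ghes_severity() loses its "static"
in this patch because the shared status handler and the print helpers all
depend on it. The user-space sketch below models that mapping. The enum
values are stand-ins copied from my reading of include/linux/cper.h and
include/acpi/ghes.h, and the model_* names are hypothetical, so treat it as
an illustration rather than the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPER severity enum; note that it is NOT ordered by badness. */
enum {
	CPER_SEV_RECOVERABLE,	/* 0 */
	CPER_SEV_FATAL,		/* 1 */
	CPER_SEV_CORRECTED,	/* 2 */
	CPER_SEV_INFORMATIONAL,	/* 3 */
};

/* Stand-in for the GHES severity enum; this one IS ordered, low to high. */
enum {
	GHES_SEV_NO		= 0x0,
	GHES_SEV_CORRECTED	= 0x1,
	GHES_SEV_RECOVERABLE	= 0x2,
	GHES_SEV_PANIC		= 0x3,
};

/* Model of ghes_severity(): translate CPER severity into the ordered scale. */
static int model_ghes_severity(int severity)
{
	switch (severity) {
	case CPER_SEV_INFORMATIONAL:
		return GHES_SEV_NO;
	case CPER_SEV_CORRECTED:
		return GHES_SEV_CORRECTED;
	case CPER_SEV_RECOVERABLE:
		return GHES_SEV_RECOVERABLE;
	case CPER_SEV_FATAL:
	default:
		/* Fatal and unknown severities both map to panic. */
		return GHES_SEV_PANIC;
	}
}

/*
 * Model of the test used by the print helpers to pick the corrected
 * ratelimit bucket and KERN_WARNING prefix. The "<=" comparison is only
 * meaningful after the translation above.
 */
static bool model_is_corrected_or_less(int cper_severity)
{
	return model_ghes_severity(cper_severity) <= GHES_SEV_CORRECTED;
}
```

Because the raw CPER values are unordered, a provider that compared them
directly with "<=" would misclassify records; routing every caller through
the one shared translation avoids that.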
drivers/acpi/apei/ghes.c | 415 +--------------------------------------
drivers/acpi/apei/ghes_cper.c | 438 +++++++++++++++++++++++++++++++++++++++++-
include/acpi/ghes_cper.h | 20 ++
3 files changed, 459 insertions(+), 414 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 136993704d52..25abd3594c89 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -67,8 +67,6 @@
#define FIX_APEI_GHES_SDEI_CRITICAL __end_of_fixed_addresses
#endif
-static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
-
/*
* This driver isn't really modular, however for the time being,
* continuing to use module_param is the easiest way to remain
@@ -113,421 +111,12 @@ static DEFINE_MUTEX(ghes_devs_mutex);
*/
static DEFINE_SPINLOCK(ghes_notify_lock_irq);
-struct gen_pool *ghes_estatus_pool;
-
-int ghes_estatus_pool_init(unsigned int num_ghes)
-{
- unsigned long addr, len;
- int rc;
-
- ghes_estatus_pool = gen_pool_create(GHES_ESTATUS_POOL_MIN_ALLOC_ORDER, -1);
- if (!ghes_estatus_pool)
- return -ENOMEM;
-
- len = GHES_ESTATUS_CACHE_AVG_SIZE * GHES_ESTATUS_CACHE_ALLOCED_MAX;
- len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
-
- addr = (unsigned long)vmalloc(PAGE_ALIGN(len));
- if (!addr)
- goto err_pool_alloc;
-
- rc = gen_pool_add(ghes_estatus_pool, addr, PAGE_ALIGN(len), -1);
- if (rc)
- goto err_pool_add;
-
- return 0;
-
-err_pool_add:
- vfree((void *)addr);
-
-err_pool_alloc:
- gen_pool_destroy(ghes_estatus_pool);
-
- return -ENOMEM;
-}
-
-/**
- * ghes_estatus_pool_region_free - free previously allocated memory
- * from the ghes_estatus_pool.
- * @addr: address of memory to free.
- * @size: size of memory to free.
- *
- * Returns none.
- */
-void ghes_estatus_pool_region_free(unsigned long addr, u32 size)
-{
- gen_pool_free(ghes_estatus_pool, addr, size);
-}
-EXPORT_SYMBOL_GPL(ghes_estatus_pool_region_free);
-
-static inline int ghes_severity(int severity)
-{
- switch (severity) {
- case CPER_SEV_INFORMATIONAL:
- return GHES_SEV_NO;
- case CPER_SEV_CORRECTED:
- return GHES_SEV_CORRECTED;
- case CPER_SEV_RECOVERABLE:
- return GHES_SEV_RECOVERABLE;
- case CPER_SEV_FATAL:
- return GHES_SEV_PANIC;
- default:
- /* Unknown, go panic */
- return GHES_SEV_PANIC;
- }
-}
-
-
-/**
- * struct ghes_task_work - for synchronous RAS event
- *
- * @twork: callback_head for task work
- * @pfn: page frame number of corrupted page
- * @flags: work control flags
- *
- * Structure to pass task work to be handled before
- * returning to user-space via task_work_add().
- */
-struct ghes_task_work {
- struct callback_head twork;
- u64 pfn;
- int flags;
-};
-
-static void memory_failure_cb(struct callback_head *twork)
-{
- struct ghes_task_work *twcb = container_of(twork, struct ghes_task_work, twork);
- int ret;
-
- ret = memory_failure(twcb->pfn, twcb->flags);
- gen_pool_free(ghes_estatus_pool, (unsigned long)twcb, sizeof(*twcb));
-
- if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP)
- return;
-
- pr_err("%#llx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
- twcb->pfn, current->comm, task_pid_nr(current));
- force_sig(SIGBUS);
-}
-
-static bool ghes_do_memory_failure(u64 physical_addr, int flags)
-{
- struct ghes_task_work *twcb;
- unsigned long pfn;
-
- if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
- return false;
-
- pfn = PHYS_PFN(physical_addr);
-
- if (flags == MF_ACTION_REQUIRED && current->mm) {
- twcb = (void *)gen_pool_alloc(ghes_estatus_pool, sizeof(*twcb));
- if (!twcb)
- return false;
-
- twcb->pfn = pfn;
- twcb->flags = flags;
- init_task_work(&twcb->twork, memory_failure_cb);
- task_work_add(current, &twcb->twork, TWA_RESUME);
- return true;
- }
-
- memory_failure_queue(pfn, flags);
- return true;
-}
-
-static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
- int sev, bool sync)
-{
- int flags = -1;
- int sec_sev = ghes_severity(gdata->error_severity);
- struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
-
- if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
- return false;
-
- /* iff following two events can be handled properly by now */
- if (sec_sev == GHES_SEV_CORRECTED &&
- (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
- flags = MF_SOFT_OFFLINE;
- if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
- flags = sync ? MF_ACTION_REQUIRED : 0;
-
- if (flags != -1)
- return ghes_do_memory_failure(mem_err->physical_addr, flags);
-
- return false;
-}
-
-static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
- int sev, bool sync)
-{
- struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
- int flags = sync ? MF_ACTION_REQUIRED : 0;
- int length = gdata->error_data_length;
- char error_type[120];
- bool queued = false;
- int sec_sev, i;
- char *p;
-
- sec_sev = ghes_severity(gdata->error_severity);
- if (length >= sizeof(*err)) {
- log_arm_hw_error(err, sec_sev);
- } else {
- pr_warn(FW_BUG "arm error length: %d\n", length);
- pr_warn(FW_BUG "length is too small\n");
- pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
- return false;
- }
-
- if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
- return false;
-
- p = (char *)(err + 1);
- length -= sizeof(err);
-
- for (i = 0; i < err->err_info_num; i++) {
- struct cper_arm_err_info *err_info;
- bool is_cache, has_pa;
-
- /* Ensure we have enough data for the error info header */
- if (length < sizeof(*err_info))
- break;
-
- err_info = (struct cper_arm_err_info *)p;
-
- /* Validate the claimed length before using it */
- length -= err_info->length;
- if (length < 0)
- break;
-
- is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
- has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
-
- /*
- * The field (err_info->error_info & BIT(26)) is fixed to set to
- * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
- * firmware won't mix corrected errors in an uncorrected section,
- * and don't filter out 'corrected' error here.
- */
- if (is_cache && has_pa) {
- queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
- p += err_info->length;
- continue;
- }
-
- cper_bits_to_str(error_type, sizeof(error_type),
- FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
- cper_proc_error_type_strs,
- ARRAY_SIZE(cper_proc_error_type_strs));
-
- pr_warn_ratelimited(FW_WARN GHES_PFX
- "Unhandled processor error type 0x%02x: %s%s\n",
- err_info->type, error_type,
- (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
- p += err_info->length;
- }
-
- return queued;
-}
-
-/*
- * PCIe AER errors need to be sent to the AER driver for reporting and
- * recovery. The GHES severities map to the following AER severities and
- * require the following handling:
- *
- * GHES_SEV_CORRECTABLE -> AER_CORRECTABLE
- * These need to be reported by the AER driver but no recovery is
- * necessary.
- * GHES_SEV_RECOVERABLE -> AER_NONFATAL
- * GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
- * These both need to be reported and recovered from by the AER driver.
- * GHES_SEV_PANIC does not make it to this handling since the kernel must
- * panic.
- */
-static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
-{
-#ifdef CONFIG_ACPI_APEI_PCIEAER
- struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
-
- if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
- pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
- unsigned int devfn;
- int aer_severity;
- u8 *aer_info;
-
- devfn = PCI_DEVFN(pcie_err->device_id.device,
- pcie_err->device_id.function);
- aer_severity = cper_severity_to_aer(gdata->error_severity);
-
- /*
- * If firmware reset the component to contain
- * the error, we must reinitialize it before
- * use, so treat it as a fatal AER error.
- */
- if (gdata->flags & CPER_SEC_RESET)
- aer_severity = AER_FATAL;
-
- aer_info = (void *)gen_pool_alloc(ghes_estatus_pool,
- sizeof(struct aer_capability_regs));
- if (!aer_info)
- return;
- memcpy(aer_info, pcie_err->aer_info, sizeof(struct aer_capability_regs));
-
- aer_recover_queue(pcie_err->device_id.segment,
- pcie_err->device_id.bus,
- devfn, aer_severity,
- (struct aer_capability_regs *)
- aer_info);
- }
-#endif
-}
-
-static void ghes_log_hwerr(int sev, guid_t *sec_type)
-{
- if (sev != CPER_SEV_RECOVERABLE)
- return;
-
- if (guid_equal(sec_type, &CPER_SEC_PROC_ARM) ||
- guid_equal(sec_type, &CPER_SEC_PROC_GENERIC) ||
- guid_equal(sec_type, &CPER_SEC_PROC_IA)) {
- hwerr_log_error_type(HWERR_RECOV_CPU);
- return;
- }
-
- if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR) ||
- guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID) ||
- guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID) ||
- guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
- hwerr_log_error_type(HWERR_RECOV_CXL);
- return;
- }
-
- if (guid_equal(sec_type, &CPER_SEC_PCIE) ||
- guid_equal(sec_type, &CPER_SEC_PCI_X_BUS)) {
- hwerr_log_error_type(HWERR_RECOV_PCI);
- return;
- }
-
- if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
- hwerr_log_error_type(HWERR_RECOV_MEMORY);
- return;
- }
-
- hwerr_log_error_type(HWERR_RECOV_OTHERS);
-}
static void ghes_do_proc(struct ghes *ghes,
const struct acpi_hest_generic_status *estatus)
{
- int sev, sec_sev;
- struct acpi_hest_generic_data *gdata;
- guid_t *sec_type;
- const guid_t *fru_id = &guid_null;
- char *fru_text = "";
- bool queued = false;
- bool sync = is_hest_sync_notify(ghes);
-
- sev = ghes_severity(estatus->error_severity);
- apei_estatus_for_each_section(estatus, gdata) {
- sec_type = (guid_t *)gdata->section_type;
- sec_sev = ghes_severity(gdata->error_severity);
- if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
- fru_id = (guid_t *)gdata->fru_id;
-
- if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
- fru_text = gdata->fru_text;
-
- ghes_log_hwerr(sev, sec_type);
- if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
- struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
-
- atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
-
- arch_apei_report_mem_error(sev, mem_err);
- queued = ghes_handle_memory_failure(gdata, sev, sync);
- } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
- ghes_handle_aer(gdata);
- } else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
- queued = ghes_handle_arm_hw_error(gdata, sev, sync);
- } else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
- struct cxl_cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
-
- cxl_cper_post_prot_err(prot_err, gdata->error_severity);
- } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
- struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
-
- cxl_cper_post_event(CXL_CPER_EVENT_GEN_MEDIA, rec);
- } else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
- struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
-
- cxl_cper_post_event(CXL_CPER_EVENT_DRAM, rec);
- } else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
- struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
-
- cxl_cper_post_event(CXL_CPER_EVENT_MEM_MODULE, rec);
- } else {
- void *err = acpi_hest_get_payload(gdata);
-
- ghes_defer_non_standard_event(gdata, sev);
- log_non_standard_event(sec_type, fru_id, fru_text,
- sec_sev, err,
- gdata->error_data_length);
- }
- }
-
- /*
- * If no memory failure work is queued for abnormal synchronous
- * errors, do a force kill.
- */
- if (sync && !queued) {
- dev_err(ghes->dev,
- HW_ERR GHES_PFX "%s:%d: synchronous unrecoverable error (SIGBUS)\n",
- current->comm, task_pid_nr(current));
- force_sig(SIGBUS);
- }
-}
-
-static void __ghes_print_estatus(const char *pfx,
- const struct acpi_hest_generic *generic,
- const struct acpi_hest_generic_status *estatus)
-{
- static atomic_t seqno;
- unsigned int curr_seqno;
- char pfx_seq[64];
-
- if (pfx == NULL) {
- if (ghes_severity(estatus->error_severity) <=
- GHES_SEV_CORRECTED)
- pfx = KERN_WARNING;
- else
- pfx = KERN_ERR;
- }
- curr_seqno = atomic_inc_return(&seqno);
- snprintf(pfx_seq, sizeof(pfx_seq), "%s{%u}" HW_ERR, pfx, curr_seqno);
- printk("%s""Hardware error from APEI Generic Hardware Error Source: %d\n",
- pfx_seq, generic->header.source_id);
- cper_estatus_print(pfx_seq, estatus);
-}
-
-static int ghes_print_estatus(const char *pfx,
- const struct acpi_hest_generic *generic,
- const struct acpi_hest_generic_status *estatus)
-{
- /* Not more than 2 messages every 5 seconds */
- static DEFINE_RATELIMIT_STATE(ratelimit_corrected, 5*HZ, 2);
- static DEFINE_RATELIMIT_STATE(ratelimit_uncorrected, 5*HZ, 2);
- struct ratelimit_state *ratelimit;
-
- if (ghes_severity(estatus->error_severity) <= GHES_SEV_CORRECTED)
- ratelimit = &ratelimit_corrected;
- else
- ratelimit = &ratelimit_uncorrected;
- if (__ratelimit(ratelimit)) {
- __ghes_print_estatus(pfx, generic, estatus);
- return 1;
- }
- return 0;
+ ghes_cper_handle_status(ghes->dev, ghes->generic,
+ estatus, is_hest_sync_notify(ghes));
}
static void __ghes_panic(struct ghes *ghes,
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
index 673dca208935..29b790160e91 100644
--- a/drivers/acpi/apei/ghes_cper.c
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -10,22 +10,31 @@
*/
#include <linux/aer.h>
+#include <linux/device.h>
#include <linux/err.h>
#include <linux/genalloc.h>
-#include <linux/irq_work.h>
#include <linux/io.h>
+#include <linux/irq_work.h>
#include <linux/kfifo.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/math64.h>
#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/uuid.h>
+#include <linux/sched/signal.h>
+#include <linux/task_work.h>
#include <linux/notifier.h>
#include <linux/llist.h>
+#include <linux/ras.h>
+#include <ras/ras_event.h>
#include <linux/ratelimit.h>
#include <linux/rcupdate.h>
#include <linux/rculist.h>
#include <linux/sched/clock.h>
#include <linux/slab.h>
+#include <linux/vmcore_info.h>
+#include <linux/vmalloc.h>
#include <acpi/apei.h>
#include <acpi/ghes_cper.h>
@@ -35,9 +44,363 @@
#include "apei-internal.h"
+ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
+
+#ifndef CONFIG_ACPI_APEI
+void __weak arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) { }
+#endif
+
static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
static atomic_t ghes_estatus_cache_alloced;
+struct gen_pool *ghes_estatus_pool;
+
+int ghes_estatus_pool_init(unsigned int num_ghes)
+{
+ unsigned long addr, len;
+ int rc;
+
+ ghes_estatus_pool = gen_pool_create(GHES_ESTATUS_POOL_MIN_ALLOC_ORDER, -1);
+ if (!ghes_estatus_pool)
+ return -ENOMEM;
+
+ len = GHES_ESTATUS_CACHE_AVG_SIZE * GHES_ESTATUS_CACHE_ALLOCED_MAX;
+ len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);
+
+ addr = (unsigned long)vmalloc(PAGE_ALIGN(len));
+ if (!addr)
+ goto err_pool_alloc;
+
+ rc = gen_pool_add(ghes_estatus_pool, addr, PAGE_ALIGN(len), -1);
+ if (rc)
+ goto err_pool_add;
+
+ return 0;
+
+err_pool_add:
+ vfree((void *)addr);
+
+err_pool_alloc:
+ gen_pool_destroy(ghes_estatus_pool);
+
+ return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(ghes_estatus_pool_init);
+
+/**
+ * ghes_estatus_pool_region_free - free previously allocated memory
+ * from the ghes_estatus_pool.
+ * @addr: address of memory to free.
+ * @size: size of memory to free.
+ *
+ * Returns none.
+ */
+void ghes_estatus_pool_region_free(unsigned long addr, u32 size)
+{
+ gen_pool_free(ghes_estatus_pool, addr, size);
+}
+EXPORT_SYMBOL_GPL(ghes_estatus_pool_region_free);
+
+int ghes_severity(int severity)
+{
+ switch (severity) {
+ case CPER_SEV_INFORMATIONAL:
+ return GHES_SEV_NO;
+ case CPER_SEV_CORRECTED:
+ return GHES_SEV_CORRECTED;
+ case CPER_SEV_RECOVERABLE:
+ return GHES_SEV_RECOVERABLE;
+ case CPER_SEV_FATAL:
+ return GHES_SEV_PANIC;
+ default:
+ /* Unknown, go panic */
+ return GHES_SEV_PANIC;
+ }
+}
+
+
+/**
+ * struct ghes_task_work - for synchronous RAS event
+ *
+ * @twork: callback_head for task work
+ * @pfn: page frame number of corrupted page
+ * @flags: work control flags
+ *
+ * Structure to pass task work to be handled before
+ * returning to user-space via task_work_add().
+ */
+struct ghes_task_work {
+ struct callback_head twork;
+ u64 pfn;
+ int flags;
+};
+
+static void memory_failure_cb(struct callback_head *twork)
+{
+ struct ghes_task_work *twcb = container_of(twork, struct ghes_task_work, twork);
+ int ret;
+
+ ret = memory_failure(twcb->pfn, twcb->flags);
+ gen_pool_free(ghes_estatus_pool, (unsigned long)twcb, sizeof(*twcb));
+
+ if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP)
+ return;
+
+ pr_err("%#llx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
+ twcb->pfn, current->comm, task_pid_nr(current));
+ force_sig(SIGBUS);
+}
+
+static bool ghes_do_memory_failure(u64 physical_addr, int flags)
+{
+ struct ghes_task_work *twcb;
+ unsigned long pfn;
+
+ if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
+ return false;
+
+ pfn = PHYS_PFN(physical_addr);
+
+ if (flags == MF_ACTION_REQUIRED && current->mm) {
+ twcb = (void *)gen_pool_alloc(ghes_estatus_pool, sizeof(*twcb));
+ if (!twcb)
+ return false;
+
+ twcb->pfn = pfn;
+ twcb->flags = flags;
+ init_task_work(&twcb->twork, memory_failure_cb);
+ task_work_add(current, &twcb->twork, TWA_RESUME);
+ return true;
+ }
+
+ memory_failure_queue(pfn, flags);
+ return true;
+}
+
+bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
+ int sev, bool sync)
+{
+ int flags = -1;
+ int sec_sev = ghes_severity(gdata->error_severity);
+ struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
+
+ if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
+ return false;
+
+ /* iff following two events can be handled properly by now */
+ if (sec_sev == GHES_SEV_CORRECTED &&
+ (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
+ flags = MF_SOFT_OFFLINE;
+ if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
+ flags = sync ? MF_ACTION_REQUIRED : 0;
+
+ if (flags != -1)
+ return ghes_do_memory_failure(mem_err->physical_addr, flags);
+
+ return false;
+}
+
+bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
+ int sev, bool sync)
+{
+ struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
+ int flags = sync ? MF_ACTION_REQUIRED : 0;
+ int length = gdata->error_data_length;
+ char error_type[120];
+ bool queued = false;
+ int sec_sev, i;
+ char *p;
+
+ sec_sev = ghes_severity(gdata->error_severity);
+ if (length >= sizeof(*err)) {
+ log_arm_hw_error(err, sec_sev);
+ } else {
+ pr_warn(FW_BUG "arm error length: %d\n", length);
+ pr_warn(FW_BUG "length is too small\n");
+ pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
+ return false;
+ }
+
+ if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
+ return false;
+
+ p = (char *)(err + 1);
+ length -= sizeof(err);
+
+ for (i = 0; i < err->err_info_num; i++) {
+ struct cper_arm_err_info *err_info;
+ bool is_cache, has_pa;
+
+ /* Ensure we have enough data for the error info header */
+ if (length < sizeof(*err_info))
+ break;
+
+ err_info = (struct cper_arm_err_info *)p;
+
+ /* Validate the claimed length before using it */
+ length -= err_info->length;
+ if (length < 0)
+ break;
+
+ is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
+ has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
+
+ /*
+ * The field (err_info->error_info & BIT(26)) is fixed to set to
+ * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
+ * firmware won't mix corrected errors in an uncorrected section,
+ * and don't filter out 'corrected' error here.
+ */
+ if (is_cache && has_pa) {
+ queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
+ p += err_info->length;
+ continue;
+ }
+
+ cper_bits_to_str(error_type, sizeof(error_type),
+ FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
+ cper_proc_error_type_strs,
+ ARRAY_SIZE(cper_proc_error_type_strs));
+
+ pr_warn_ratelimited(FW_WARN GHES_PFX
+ "Unhandled processor error type 0x%02x: %s%s\n",
+ err_info->type, error_type,
+ (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
+ p += err_info->length;
+ }
+
+ return queued;
+}
+
+/*
+ * PCIe AER errors need to be sent to the AER driver for reporting and
+ * recovery. The GHES severities map to the following AER severities and
+ * require the following handling:
+ *
+ * GHES_SEV_CORRECTABLE -> AER_CORRECTABLE
+ * These need to be reported by the AER driver but no recovery is
+ * necessary.
+ * GHES_SEV_RECOVERABLE -> AER_NONFATAL
+ * GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
+ * These both need to be reported and recovered from by the AER driver.
+ * GHES_SEV_PANIC does not make it to this handling since the kernel must
+ * panic.
+ */
+void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
+{
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+ struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+ if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+ pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
+ unsigned int devfn;
+ int aer_severity;
+ u8 *aer_info;
+
+ devfn = PCI_DEVFN(pcie_err->device_id.device,
+ pcie_err->device_id.function);
+ aer_severity = cper_severity_to_aer(gdata->error_severity);
+
+ /*
+ * If firmware reset the component to contain
+ * the error, we must reinitialize it before
+ * use, so treat it as a fatal AER error.
+ */
+ if (gdata->flags & CPER_SEC_RESET)
+ aer_severity = AER_FATAL;
+
+ aer_info = (void *)gen_pool_alloc(ghes_estatus_pool,
+ sizeof(struct aer_capability_regs));
+ if (!aer_info)
+ return;
+ memcpy(aer_info, pcie_err->aer_info, sizeof(struct aer_capability_regs));
+
+ aer_recover_queue(pcie_err->device_id.segment,
+ pcie_err->device_id.bus,
+ devfn, aer_severity,
+ (struct aer_capability_regs *)
+ aer_info);
+ }
+#endif
+}
+
+void ghes_log_hwerr(int sev, guid_t *sec_type)
+{
+ if (sev != CPER_SEV_RECOVERABLE)
+ return;
+
+ if (guid_equal(sec_type, &CPER_SEC_PROC_ARM) ||
+ guid_equal(sec_type, &CPER_SEC_PROC_GENERIC) ||
+ guid_equal(sec_type, &CPER_SEC_PROC_IA)) {
+ hwerr_log_error_type(HWERR_RECOV_CPU);
+ return;
+ }
+
+ if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR) ||
+ guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID) ||
+ guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID) ||
+ guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+ hwerr_log_error_type(HWERR_RECOV_CXL);
+ return;
+ }
+
+ if (guid_equal(sec_type, &CPER_SEC_PCIE) ||
+ guid_equal(sec_type, &CPER_SEC_PCI_X_BUS)) {
+ hwerr_log_error_type(HWERR_RECOV_PCI);
+ return;
+ }
+
+ if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
+ hwerr_log_error_type(HWERR_RECOV_MEMORY);
+ return;
+ }
+
+ hwerr_log_error_type(HWERR_RECOV_OTHERS);
+}
+
+void __ghes_print_estatus(const char *pfx,
+ const struct acpi_hest_generic *generic,
+ const struct acpi_hest_generic_status *estatus)
+{
+ static atomic_t seqno;
+ unsigned int curr_seqno;
+ char pfx_seq[64];
+
+ if (pfx == NULL) {
+ if (ghes_severity(estatus->error_severity) <=
+ GHES_SEV_CORRECTED)
+ pfx = KERN_WARNING;
+ else
+ pfx = KERN_ERR;
+ }
+ curr_seqno = atomic_inc_return(&seqno);
+ snprintf(pfx_seq, sizeof(pfx_seq), "%s{%u}" HW_ERR, pfx, curr_seqno);
+ printk("%s""Hardware error from APEI Generic Hardware Error Source: %d\n",
+ pfx_seq, generic->header.source_id);
+ cper_estatus_print(pfx_seq, estatus);
+}
+
+int ghes_print_estatus(const char *pfx,
+ const struct acpi_hest_generic *generic,
+ const struct acpi_hest_generic_status *estatus)
+{
+ /* Not more than 2 messages every 5 seconds */
+ static DEFINE_RATELIMIT_STATE(ratelimit_corrected, 5*HZ, 2);
+ static DEFINE_RATELIMIT_STATE(ratelimit_uncorrected, 5*HZ, 2);
+ struct ratelimit_state *ratelimit;
+
+ if (ghes_severity(estatus->error_severity) <= GHES_SEV_CORRECTED)
+ ratelimit = &ratelimit_corrected;
+ else
+ ratelimit = &ratelimit_uncorrected;
+ if (__ratelimit(ratelimit)) {
+ __ghes_print_estatus(pfx, generic, estatus);
+ return 1;
+ }
+ return 0;
+}
+
+#ifdef CONFIG_ACPI_APEI
static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
{
phys_addr_t paddr;
@@ -269,6 +632,7 @@ void ghes_clear_estatus(struct ghes *ghes,
if (is_hest_type_generic_v2(ghes))
ghes_ack_error(ghes->generic_v2);
}
+#endif /* CONFIG_ACPI_APEI */
static BLOCKING_NOTIFIER_HEAD(vendor_record_notify_list);
@@ -322,6 +686,78 @@ void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
}
+void ghes_cper_handle_status(struct device *dev,
+ const struct acpi_hest_generic *generic,
+ const struct acpi_hest_generic_status *estatus,
+ bool sync)
+{
+ int sev, sec_sev;
+ struct acpi_hest_generic_data *gdata;
+ guid_t *sec_type;
+ const guid_t *fru_id = &guid_null;
+ char *fru_text = "";
+ bool queued = false;
+
+ sev = ghes_severity(estatus->error_severity);
+ apei_estatus_for_each_section(estatus, gdata) {
+ sec_type = (guid_t *)gdata->section_type;
+ sec_sev = ghes_severity(gdata->error_severity);
+ if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+ fru_id = (guid_t *)gdata->fru_id;
+
+ if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+ fru_text = gdata->fru_text;
+
+ ghes_log_hwerr(sev, sec_type);
+ if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
+ struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
+
+ atomic_notifier_call_chain(&ghes_report_chain, sev, mem_err);
+
+ arch_apei_report_mem_error(sev, mem_err);
+ queued = ghes_handle_memory_failure(gdata, sev, sync);
+ } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+ ghes_handle_aer(gdata);
+ } else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
+ queued = ghes_handle_arm_hw_error(gdata, sev, sync);
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
+ struct cxl_cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_prot_err(prot_err, gdata->error_severity);
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
+ struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_event(CXL_CPER_EVENT_GEN_MEDIA, rec);
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
+ struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_event(CXL_CPER_EVENT_DRAM, rec);
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+ struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
+
+ cxl_cper_post_event(CXL_CPER_EVENT_MEM_MODULE, rec);
+ } else {
+ void *err = acpi_hest_get_payload(gdata);
+
+ ghes_defer_non_standard_event(gdata, sev);
+ log_non_standard_event(sec_type, fru_id, fru_text,
+ sec_sev, err,
+ gdata->error_data_length);
+ }
+ }
+
+ /*
+ * If no memory failure work is queued for abnormal synchronous
+ * errors, do a force kill.
+ */
+ if (sync && !queued) {
+ dev_err(dev,
+ HW_ERR GHES_PFX "%s:%d: synchronous unrecoverable error (SIGBUS)\n",
+ current->comm, task_pid_nr(current));
+ force_sig(SIGBUS);
+ }
+}
+
/* Room for 8 entries */
#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
index 4522e8699ce0..f7c9fba62585 100644
--- a/include/acpi/ghes_cper.h
+++ b/include/acpi/ghes_cper.h
@@ -12,6 +12,8 @@
#define ACPI_APEI_GHES_CPER_H
#include <linux/atomic.h>
+#include <linux/device.h>
+#include <linux/notifier.h>
#include <linux/workqueue.h>
#include <acpi/ghes.h>
@@ -52,6 +54,7 @@
((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
extern struct gen_pool *ghes_estatus_pool;
+extern struct atomic_notifier_head ghes_report_chain;
static inline bool is_hest_type_generic_v2(struct ghes *ghes)
{
@@ -100,6 +103,23 @@ void ghes_estatus_cache_add(struct acpi_hest_generic *generic,
struct acpi_hest_generic_status *estatus);
void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
int sev);
+int ghes_severity(int severity);
+bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
+ int sev, bool sync);
+bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
+ int sev, bool sync);
+void ghes_handle_aer(struct acpi_hest_generic_data *gdata);
+void ghes_log_hwerr(int sev, guid_t *sec_type);
+void __ghes_print_estatus(const char *pfx,
+ const struct acpi_hest_generic *generic,
+ const struct acpi_hest_generic_status *estatus);
+int ghes_print_estatus(const char *pfx,
+ const struct acpi_hest_generic *generic,
+ const struct acpi_hest_generic_status *estatus);
+void ghes_cper_handle_status(struct device *dev,
+ const struct acpi_hest_generic *generic,
+ const struct acpi_hest_generic_status *estatus,
+ bool sync);
void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
int severity);
int cxl_cper_register_prot_err_work(struct work_struct *work);
--
2.43.0
* [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
` (8 preceding siblings ...)
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-26 7:03 ` Himanshu Chauhan
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
2026-02-26 7:05 ` [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Himanshu Chauhan
11 siblings, 1 reply; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Describe the DeviceTree node that exposes the Arm firmware-first handler
CPER provider and hook the file into MAINTAINERS so the binding has an
owner.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
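Reader's note (not part of the commit): the binding in this patch describes
an optional 32- or 64-bit doorbell in which the OS sets bit 0 once the
status block has been consumed. The user-space sketch below models that
acknowledgement on a plain variable. The names FFH_DOORBELL_ACK and
model_ffh_ack() are hypothetical, and preserving the other bits with a
read-modify-write is my assumption; the binding text only says that bit 0
is set. A real driver would go through the kernel MMIO accessors on an
ioremapped address instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the ack bit; the binding only says "bit 0". */
#define FFH_DOORBELL_ACK 0x1u

/*
 * Model of the ack handshake: set bit 0 of a doorbell that is either
 * 4 or 8 bytes wide, leaving all other bits as they were (assumption).
 */
static void model_ffh_ack(volatile void *doorbell, size_t width)
{
	assert(width == 4 || width == 8);

	if (width == 8) {
		volatile uint64_t *db = doorbell;

		*db |= (uint64_t)FFH_DOORBELL_ACK;
	} else {
		volatile uint32_t *db = doorbell;

		*db |= FFH_DOORBELL_ACK;
	}
}
```

Passing the width explicitly mirrors the "32- or 64-bit" wording of the
binding: the size of the second reg entry would tell the OS which access
width to use.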
.../devicetree/bindings/firmware/arm,ras-ffh.yaml | 71 ++++++++++++++++++++++
MAINTAINERS | 5 ++
2 files changed, 76 insertions(+)
diff --git a/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml b/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
new file mode 100644
index 000000000000..eccbaaf45885
--- /dev/null
+++ b/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/firmware/arm,ras-ffh.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm Firmware-First Handler (FFH) CPER provider
+
+maintainers:
+  - Ahmed Tiba <ahmed.tiba@arm.com>
+
+description: |
+  Arm Reliability, Availability and Serviceability (RAS) firmware can expose
+  a firmware-first handler (FFH) that provides UEFI CPER Generic Error Status
+  blocks directly via DeviceTree. The firmware owns the CPER buffer
+  and notifies the OS through an interrupt.
+
+properties:
+  compatible:
+    const: arm,ras-ffh
+
+  reg:
+    minItems: 1
+    items:
+      - description:
+          CPER Generic Error Status block exposed by firmware
+      - description:
+          Optional 32- or 64-bit doorbell register used on platforms
+          where firmware needs an explicit "ack" handshake before overwriting
+          the CPER buffer. Firmware watches bit 0 and expects the OS to set it
+          once the current status block has been consumed.
+
+  interrupts:
+    maxItems: 1
+    description:
+      Interrupt used to signal that a new status record is ready.
+
+  memory-region:
+    $ref: /schemas/types.yaml#/definitions/phandle
+    description:
+      Optional phandle to the reserved-memory entry that backs the status
+      buffer so firmware and the OS use the same carved-out region.
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+    reserved-memory {
+        #address-cells = <2>;
+        #size-cells = <2>;
+        ras_cper_buffer: cper@fe800000 {
+            reg = <0x0 0xfe800000 0x0 0x1000>;
+            no-map;
+        };
+    };
+
+    error-handler@fe800000 {
+        compatible = "arm,ras-ffh";
+        reg = <0xfe800000 0x1000>,
+              <0xfe810000 0x4>;
+        memory-region = <&ras_cper_buffer>;
+        interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>;
+    };
+...
diff --git a/MAINTAINERS b/MAINTAINERS
index b8d8a5c41597..47db7877b485 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22027,6 +22027,11 @@ M: Alexandre Bounine <alex.bou9@gmail.com>
S: Maintained
F: drivers/rapidio/
+RAS ERROR STATUS
+M: Ahmed Tiba <ahmed.tiba@arm.com>
+S: Maintained
+F: Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
+
RAS INFRASTRUCTURE
M: Tony Luck <tony.luck@intel.com>
M: Borislav Petkov <bp@alien8.de>
--
2.43.0
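
The doorbell handshake that the binding above describes for the optional second `reg` entry amounts to a read-modify-write that sets bit 0 and leaves every other bit alone. A minimal userspace sketch follows, assuming plain memory stands in for the MMIO register. `FFH_ACK_BIT`, `ffh_ack32` and `ffh_ack64` are hypothetical names used only for illustration; a kernel driver would go through `readl()`/`writel()` or `readq()`/`writeq()` on an ioremapped address instead.

```c
#include <stdint.h>

#define FFH_ACK_BIT 0x1u /* bit 0, per the binding description (name is hypothetical) */

/* Acknowledge on a 32-bit doorbell: set bit 0, preserve all other bits. */
static void ffh_ack32(volatile uint32_t *doorbell)
{
	uint32_t val = *doorbell;

	val |= FFH_ACK_BIT;
	*doorbell = val;
}

/* Same handshake for a 64-bit doorbell. */
static void ffh_ack64(volatile uint64_t *doorbell)
{
	uint64_t val = *doorbell;

	val |= (uint64_t)FFH_ACK_BIT;
	*doorbell = val;
}
```

Which of the two widths applies would follow from the size of the second `reg` entry (4 or 8 bytes).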
* [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
` (9 preceding siblings ...)
2026-02-20 13:42 ` [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh Ahmed Tiba
@ 2026-02-20 13:42 ` Ahmed Tiba
2026-02-21 9:06 ` Krzysztof Kozlowski
` (2 more replies)
2026-02-26 7:05 ` [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Himanshu Chauhan
11 siblings, 3 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-20 13:42 UTC (permalink / raw)
To: devicetree, linux-acpi
Cc: Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp, robh, rafael,
will, conor, linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2,
tony.luck
Add a DeviceTree firmware-first CPER provider that reuses the shared
GHES helpers, wire it into the RAS Kconfig/Makefile and document it in
the admin guide. Update MAINTAINERS now that the driver exists.
Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
---
Documentation/admin-guide/RAS/main.rst | 18 +++
MAINTAINERS | 1 +
drivers/acpi/apei/apei-internal.h | 10 +-
drivers/acpi/apei/ghes_cper.c | 2 +
drivers/ras/Kconfig | 12 ++
drivers/ras/Makefile | 1 +
drivers/ras/esource-dt.c | 264 +++++++++++++++++++++++++++++++++
include/acpi/ghes_cper.h | 9 ++
8 files changed, 308 insertions(+), 9 deletions(-)
diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
index 5a45db32c49b..4ffabaaeabb1 100644
--- a/Documentation/admin-guide/RAS/main.rst
+++ b/Documentation/admin-guide/RAS/main.rst
@@ -205,6 +205,24 @@ Architecture (MCA)\ [#f3]_.
.. [#f3] For more details about the Machine Check Architecture (MCA),
please read Documentation/arch/x86/x86_64/machinecheck.rst at the Kernel tree.
+Firmware-first CPER via DeviceTree
+----------------------------------
+
+Some systems expose Common Platform Error Record (CPER) data
+via DeviceTree instead of ACPI HEST tables.
+Enable ``CONFIG_RAS_ESOURCE_DT`` to build the ``drivers/ras/esource-dt.c``
+driver and describe the CPER error source buffer with the
+``Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml`` binding.
+The driver reuses the GHES CPER helper object in
+``drivers/acpi/apei/ghes_cper.c`` so the logging, notifier chains, and
+memory failure handling match the ACPI GHES behaviour even when
+ACPI is disabled.
+
+Once a platform describes a firmware-first provider, both ACPI GHES and the
+DeviceTree driver reuse the same code paths. This keeps the behaviour
+consistent regardless of whether the error source is described via ACPI
+tables or DeviceTree.
+
EDAC - Error Detection And Correction
*************************************
diff --git a/MAINTAINERS b/MAINTAINERS
index 47db7877b485..fa6113b482b7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22031,6 +22031,7 @@ RAS ERROR STATUS
M: Ahmed Tiba <ahmed.tiba@arm.com>
S: Maintained
F: Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
+F: drivers/ras/esource-dt.c
RAS INFRASTRUCTURE
M: Tony Luck <tony.luck@intel.com>
diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
index 77c10a7a7a9f..c16ac541f15b 100644
--- a/drivers/acpi/apei/apei-internal.h
+++ b/drivers/acpi/apei/apei-internal.h
@@ -8,6 +8,7 @@
#define APEI_INTERNAL_H
#include <linux/acpi.h>
+#include <acpi/ghes_cper.h>
struct apei_exec_context;
@@ -120,15 +121,6 @@ int apei_exec_collect_resources(struct apei_exec_context *ctx,
struct dentry;
struct dentry *apei_get_debugfs_dir(void);
-static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
-{
-	if (estatus->raw_data_length)
-		return estatus->raw_data_offset + \
-			estatus->raw_data_length;
-	else
-		return sizeof(*estatus) + estatus->data_length;
-}
-
int apei_osc_setup(void);
int einj_get_available_error_type(u32 *type, int einj_action);
diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
index 29b790160e91..9b2d1b8cf9f4 100644
--- a/drivers/acpi/apei/ghes_cper.c
+++ b/drivers/acpi/apei/ghes_cper.c
@@ -42,7 +42,9 @@
#include <asm/fixmap.h>
#include <asm/tlbflush.h>
+#ifdef CONFIG_ACPI_APEI
#include "apei-internal.h"
+#endif
ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..ea6d96713020 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -34,6 +34,18 @@ if RAS
source "arch/x86/ras/Kconfig"
source "drivers/ras/amd/atl/Kconfig"
+config RAS_ESOURCE_DT
+	bool "DeviceTree firmware-first CPER error source block provider"
+	depends on OF
+	depends on ARM64
+	select GHES_CPER_HELPERS
+	help
+	  Enable support for firmware-first Common Platform Error Record (CPER)
+	  error source block providers that are described via DeviceTree
+	  instead of ACPI HEST tables. The driver reuses the existing GHES
+	  CPER helpers so the error processing matches the ACPI code paths,
+	  but it can be built even when ACPI is disabled.
+
config RAS_FMPM
tristate "FRU Memory Poison Manager"
default m
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..53558a1707b3 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_RAS) += ras.o
obj-$(CONFIG_DEBUG_FS) += debugfs.o
obj-$(CONFIG_RAS_CEC) += cec.o
+obj-$(CONFIG_RAS_ESOURCE_DT) += esource-dt.o
obj-$(CONFIG_RAS_FMPM) += amd/fmpm.o
obj-y += amd/atl/
diff --git a/drivers/ras/esource-dt.c b/drivers/ras/esource-dt.c
new file mode 100644
index 000000000000..b575a2258536
--- /dev/null
+++ b/drivers/ras/esource-dt.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * DeviceTree provider for firmware-first CPER error source block.
+ *
+ * This driver shares the GHES CPER helpers so we keep the reporting and
+ * notifier behaviour identical to ACPI GHES
+ *
+ * Copyright (C) 2025 ARM Ltd.
+ * Author: Ahmed Tiba <ahmed.tiba@arm.com>
+ */
+
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/panic.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <acpi/ghes.h>
+#include <acpi/ghes_cper.h>
+
+static atomic_t ghes_ffh_source_ids = ATOMIC_INIT(0);
+
+struct ghes_ffh_ack {
+	void __iomem *addr;
+	u64 preserve;
+	u64 set;
+	u8 width;
+	bool present;
+};
+
+struct ghes_ffh {
+	struct device *dev;
+	void __iomem *status;
+	size_t status_len;
+
+	struct ghes_ffh_ack ack;
+
+	struct acpi_hest_generic *generic;
+	struct acpi_hest_generic_status *estatus;
+
+	bool sync;
+	int irq;
+
+	/* Serializes access to the firmware-owned buffer. */
+	spinlock_t lock;
+};
+
+static int ghes_ffh_init_pool(void)
+{
+	if (ghes_estatus_pool)
+		return 0;
+
+	return ghes_estatus_pool_init(1);
+}
+
+static int ghes_ffh_copy_status(struct ghes_ffh *ctx)
+{
+	memcpy_fromio(ctx->estatus, ctx->status, ctx->status_len);
+	return 0;
+}
+
+static void ghes_ffh_ack(struct ghes_ffh *ctx)
+{
+	u64 val;
+
+	if (!ctx->ack.present)
+		return;
+
+	if (ctx->ack.width == 64) {
+		val = readq(ctx->ack.addr);
+		val &= ctx->ack.preserve;
+		val |= ctx->ack.set;
+		writeq(val, ctx->ack.addr);
+	} else {
+		val = readl(ctx->ack.addr);
+		val &= (u32)ctx->ack.preserve;
+		val |= (u32)ctx->ack.set;
+		writel(val, ctx->ack.addr);
+	}
+}
+
+static void ghes_ffh_fatal(struct ghes_ffh *ctx)
+{
+	__ghes_print_estatus(KERN_EMERG, ctx->generic, ctx->estatus);
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
+	panic("GHES: fatal firmware-first CPER record from %s\n",
+	      dev_name(ctx->dev));
+}
+
+static void ghes_ffh_process(struct ghes_ffh *ctx)
+{
+	unsigned long flags;
+	int sev;
+
+	spin_lock_irqsave(&ctx->lock, flags);
+
+	if (ghes_ffh_copy_status(ctx))
+		goto out;
+
+	sev = ghes_severity(ctx->estatus->error_severity);
+	if (sev >= GHES_SEV_PANIC)
+		ghes_ffh_fatal(ctx);
+
+	if (!ghes_estatus_cached(ctx->estatus)) {
+		if (ghes_print_estatus(NULL, ctx->generic, ctx->estatus))
+			ghes_estatus_cache_add(ctx->generic, ctx->estatus);
+	}
+
+	ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);
+
+	ghes_ffh_ack(ctx);
+
+out:
+	spin_unlock_irqrestore(&ctx->lock, flags);
+}
+
+static irqreturn_t ghes_ffh_irq(int irq, void *data)
+{
+	struct ghes_ffh *ctx = data;
+
+	ghes_ffh_process(ctx);
+
+	return IRQ_HANDLED;
+}
+
+static int ghes_ffh_init_ack(struct platform_device *pdev,
+			     struct ghes_ffh *ctx)
+{
+	struct resource *res;
+	size_t size;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	if (!res)
+		return 0;
+
+	ctx->ack.addr = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(ctx->ack.addr))
+		return PTR_ERR(ctx->ack.addr);
+
+	size = resource_size(res);
+	switch (size) {
+	case 4:
+		ctx->ack.width = 32;
+		ctx->ack.preserve = ~0U;
+		break;
+	case 8:
+		ctx->ack.width = 64;
+		ctx->ack.preserve = ~0ULL;
+		break;
+	default:
+		dev_err(&pdev->dev, "Unsupported ack resource size %zu\n", size);
+		return -EINVAL;
+	}
+
+	ctx->ack.set = BIT_ULL(0);
+	ctx->ack.present = true;
+	return 0;
+}
+
+static int ghes_ffh_probe(struct platform_device *pdev)
+{
+	struct ghes_ffh *ctx;
+	struct resource *res;
+	int rc;
+
+	ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	spin_lock_init(&ctx->lock);
+	ctx->dev = &pdev->dev;
+	ctx->sync = of_property_read_bool(pdev->dev.of_node, "arm,sea-notify");
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		dev_err(&pdev->dev, "status region missing\n");
+		return -EINVAL;
+	}
+
+	ctx->status_len = resource_size(res);
+	if (!ctx->status_len) {
+		dev_err(&pdev->dev, "Status region has zero length\n");
+		return -EINVAL;
+	}
+
+	ctx->status = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(ctx->status))
+		return PTR_ERR(ctx->status);
+
+	rc = ghes_ffh_init_ack(pdev, ctx);
+	if (rc)
+		return rc;
+
+	rc = ghes_ffh_init_pool();
+	if (rc)
+		return rc;
+
+	ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
+	if (!ctx->estatus)
+		return -ENOMEM;
+
+	ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
+	if (!ctx->generic)
+		return -ENOMEM;
+
+	ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
+	ctx->generic->header.source_id =
+		atomic_inc_return(&ghes_ffh_source_ids);
+	ctx->generic->notify.type = ctx->sync ?
+		ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
+	ctx->generic->error_block_length = ctx->status_len;
+
+	ctx->irq = platform_get_irq_optional(pdev, 0);
+	if (ctx->irq <= 0) {
+		if (ctx->irq == -EPROBE_DEFER)
+			return ctx->irq;
+		dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
+		return -EINVAL;
+	}
+
+	rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
+				       NULL, ghes_ffh_irq,
+				       IRQF_ONESHOT,
+				       dev_name(&pdev->dev), ctx);
+	if (rc)
+		return rc;
+
+	platform_set_drvdata(pdev, ctx);
+	dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");
+	return 0;
+}
+
+static void ghes_ffh_remove(struct platform_device *pdev)
+{
+}
+
+static const struct of_device_id ghes_ffh_of_match[] = {
+	{ .compatible = "arm,ras-ffh" },
+	{ /* sentinel */ }
+};
+MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
+
+static struct platform_driver ghes_ffh_driver = {
+	.driver = {
+		.name = "esource-dt",
+		.of_match_table = ghes_ffh_of_match,
+	},
+	.probe = ghes_ffh_probe,
+	.remove = ghes_ffh_remove,
+};
+
+module_platform_driver(ghes_ffh_driver);
+
+MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
+MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
+MODULE_LICENSE("GPL");
diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
index f7c9fba62585..d43185c020ee 100644
--- a/include/acpi/ghes_cper.h
+++ b/include/acpi/ghes_cper.h
@@ -75,6 +75,15 @@ static inline bool is_hest_sync_notify(struct ghes *ghes)
return notify_type == ACPI_HEST_NOTIFY_SEA;
}
+static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
+{
+	if (estatus->raw_data_length)
+		return estatus->raw_data_offset + \
+			estatus->raw_data_length;
+	else
+		return sizeof(*estatus) + estatus->data_length;
+}
+
struct ghes_vendor_record_entry {
struct work_struct work;
int error_severity;
--
2.43.0
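
The driver in this patch copies `status_len` bytes out of the firmware buffer and passes the block to the shared helpers, while `cper_estatus_len()` computes the length the block claims for itself. One property a reader may want to reason about is whether that claimed length fits inside the mapped region. The userspace sketch below illustrates such a check under stated assumptions: `struct estatus_hdr` is a hypothetical local mirror of the five 32-bit header fields of a Generic Error Status block, and `estatus_len` and `estatus_fits` are hypothetical helpers that are not part of this series. Only the length rule itself is taken from `cper_estatus_len()` above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the Generic Error Status header fields. */
struct estatus_hdr {
	uint32_t block_status;
	uint32_t raw_data_offset;
	uint32_t raw_data_length;
	uint32_t data_length;
	uint32_t error_severity;
};

/* Same rule as cper_estatus_len(), widened to 64 bits to avoid wraparound. */
static uint64_t estatus_len(const struct estatus_hdr *h)
{
	if (h->raw_data_length)
		return (uint64_t)h->raw_data_offset + h->raw_data_length;
	return sizeof(*h) + (uint64_t)h->data_length;
}

/* True when the header and the claimed length both fit in buf_len bytes. */
static bool estatus_fits(const struct estatus_hdr *h, size_t buf_len)
{
	if (buf_len < sizeof(*h))
		return false;
	return estatus_len(h) <= buf_len;
}
```

With the 0x1000-byte region from the binding example, a block claiming a raw data offset of 4000 and a raw data length of 200 would be rejected by this check, since 4200 exceeds 4096.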
* Re: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
@ 2026-02-20 19:19 ` kernel test robot
2026-02-20 19:24 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2026-02-20 19:19 UTC (permalink / raw)
To: Ahmed Tiba, devicetree, linux-acpi
Cc: oe-kbuild-all, Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp,
robh, rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
Hi Ahmed,
kernel test robot noticed the following build errors:
[auto build test ERROR on 8bf22c33e7a172fbc72464f4cc484d23a6b412ba]
url: https://github.com/intel-lab-lkp/linux/commits/Ahmed-Tiba/ACPI-APEI-GHES-share-macros-via-a-private-header/20260220-214812
base: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
patch link: https://lore.kernel.org/r/20260220-topics-ahmtib01-ras_ffh_arm_internal_review-v2-9-347fa2d7351b%40arm.com
patch subject: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
config: i386-randconfig-012-20260220 (https://download.01.org/0day-ci/archive/20260221/202602210334.nHzuTUCB-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260221/202602210334.nHzuTUCB-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602210334.nHzuTUCB-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/acpi/apei/ghes_cper.c: In function 'ghes_handle_arm_hw_error':
>> drivers/acpi/apei/ghes_cper.c:261:34: error: implicit declaration of function 'FIELD_GET' [-Wimplicit-function-declaration]
261 | FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
| ^~~~~~~~~
vim +/FIELD_GET +261 drivers/acpi/apei/ghes_cper.c
202
203 bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
204 int sev, bool sync)
205 {
206 struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
207 int flags = sync ? MF_ACTION_REQUIRED : 0;
208 int length = gdata->error_data_length;
209 char error_type[120];
210 bool queued = false;
211 int sec_sev, i;
212 char *p;
213
214 sec_sev = ghes_severity(gdata->error_severity);
215 if (length >= sizeof(*err)) {
216 log_arm_hw_error(err, sec_sev);
217 } else {
218 pr_warn(FW_BUG "arm error length: %d\n", length);
219 pr_warn(FW_BUG "length is too small\n");
220 pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
221 return false;
222 }
223
224 if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
225 return false;
226
227 p = (char *)(err + 1);
228 length -= sizeof(err);
229
230 for (i = 0; i < err->err_info_num; i++) {
231 struct cper_arm_err_info *err_info;
232 bool is_cache, has_pa;
233
234 /* Ensure we have enough data for the error info header */
235 if (length < sizeof(*err_info))
236 break;
237
238 err_info = (struct cper_arm_err_info *)p;
239
240 /* Validate the claimed length before using it */
241 length -= err_info->length;
242 if (length < 0)
243 break;
244
245 is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
246 has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
247
248 /*
249 * The field (err_info->error_info & BIT(26)) is fixed to set to
250 * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
251 * firmware won't mix corrected errors in an uncorrected section,
252 * and don't filter out 'corrected' error here.
253 */
254 if (is_cache && has_pa) {
255 queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
256 p += err_info->length;
257 continue;
258 }
259
260 cper_bits_to_str(error_type, sizeof(error_type),
> 261 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
262 cper_proc_error_type_strs,
263 ARRAY_SIZE(cper_proc_error_type_strs));
264
265 pr_warn_ratelimited(FW_WARN GHES_PFX
266 "Unhandled processor error type 0x%02x: %s%s\n",
267 err_info->type, error_type,
268 (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
269 p += err_info->length;
270 }
271
272 return queued;
273 }
274
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
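
The error in this report is what a compiler emits when `FIELD_GET()` is used without its definition in scope. In the kernel that macro is provided by `<linux/bitfield.h>`, so a missing include in `ghes_cper.c` is a likely cause, although the report itself does not say so. For readers unfamiliar with the macro, the userspace sketch below shows what a mask-and-shift field extraction computes. `field_get` is a hypothetical stand-in for the kernel macro, and the mask values used with it here are assumptions chosen for illustration, not values taken from this thread.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for FIELD_GET(): keep the bits selected by the mask,
 * then shift them down by the position of the mask's lowest set bit.
 * The mask must be non-zero.
 */
static uint32_t field_get(uint32_t mask, uint32_t reg)
{
	uint32_t lowest_bit = mask & (~mask + 1u);

	return (reg & mask) / lowest_bit;
}
```

For example, with a mask covering bits 4:1 (`0x1e`), a register value of `0x04` yields a field value of 2.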
* Re: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
2026-02-20 19:19 ` kernel test robot
@ 2026-02-20 19:24 ` kernel test robot
2026-02-20 20:37 ` kernel test robot
2026-02-20 21:16 ` kernel test robot
3 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2026-02-20 19:24 UTC (permalink / raw)
To: Ahmed Tiba, devicetree, linux-acpi
Cc: llvm, oe-kbuild-all, Ahmed Tiba, Dmitry.Lamerov, catalin.marinas,
bp, robh, rafael, will, conor, linux-arm-kernel, linux-doc,
krzk+dt, Michael.Zhao2, tony.luck
Hi Ahmed,
kernel test robot noticed the following build errors:
[auto build test ERROR on 8bf22c33e7a172fbc72464f4cc484d23a6b412ba]
url: https://github.com/intel-lab-lkp/linux/commits/Ahmed-Tiba/ACPI-APEI-GHES-share-macros-via-a-private-header/20260220-214812
base: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
patch link: https://lore.kernel.org/r/20260220-topics-ahmtib01-ras_ffh_arm_internal_review-v2-9-347fa2d7351b%40arm.com
patch subject: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260220/202602202042.cliczLhi-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260220/202602202042.cliczLhi-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602202042.cliczLhi-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/acpi/apei/ghes_cper.c:261:6: error: call to undeclared function 'FIELD_GET'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
261 | FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
| ^
1 error generated.
vim +/FIELD_GET +261 drivers/acpi/apei/ghes_cper.c
202
203 bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
204 int sev, bool sync)
205 {
206 struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
207 int flags = sync ? MF_ACTION_REQUIRED : 0;
208 int length = gdata->error_data_length;
209 char error_type[120];
210 bool queued = false;
211 int sec_sev, i;
212 char *p;
213
214 sec_sev = ghes_severity(gdata->error_severity);
215 if (length >= sizeof(*err)) {
216 log_arm_hw_error(err, sec_sev);
217 } else {
218 pr_warn(FW_BUG "arm error length: %d\n", length);
219 pr_warn(FW_BUG "length is too small\n");
220 pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
221 return false;
222 }
223
224 if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
225 return false;
226
227 p = (char *)(err + 1);
228 length -= sizeof(err);
229
230 for (i = 0; i < err->err_info_num; i++) {
231 struct cper_arm_err_info *err_info;
232 bool is_cache, has_pa;
233
234 /* Ensure we have enough data for the error info header */
235 if (length < sizeof(*err_info))
236 break;
237
238 err_info = (struct cper_arm_err_info *)p;
239
240 /* Validate the claimed length before using it */
241 length -= err_info->length;
242 if (length < 0)
243 break;
244
245 is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
246 has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
247
248 /*
249 * The field (err_info->error_info & BIT(26)) is fixed to set to
250 * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
251 * firmware won't mix corrected errors in an uncorrected section,
252 * and don't filter out 'corrected' error here.
253 */
254 if (is_cache && has_pa) {
255 queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
256 p += err_info->length;
257 continue;
258 }
259
260 cper_bits_to_str(error_type, sizeof(error_type),
> 261 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
262 cper_proc_error_type_strs,
263 ARRAY_SIZE(cper_proc_error_type_strs));
264
265 pr_warn_ratelimited(FW_WARN GHES_PFX
266 "Unhandled processor error type 0x%02x: %s%s\n",
267 err_info->type, error_type,
268 (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
269 p += err_info->length;
270 }
271
272 return queued;
273 }
274
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
2026-02-20 19:19 ` kernel test robot
2026-02-20 19:24 ` kernel test robot
@ 2026-02-20 20:37 ` kernel test robot
2026-02-20 21:16 ` kernel test robot
3 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2026-02-20 20:37 UTC (permalink / raw)
To: Ahmed Tiba, devicetree, linux-acpi
Cc: oe-kbuild-all, Ahmed Tiba, Dmitry.Lamerov, catalin.marinas, bp,
robh, rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
Hi Ahmed,
kernel test robot noticed the following build errors:
[auto build test ERROR on 8bf22c33e7a172fbc72464f4cc484d23a6b412ba]
url: https://github.com/intel-lab-lkp/linux/commits/Ahmed-Tiba/ACPI-APEI-GHES-share-macros-via-a-private-header/20260220-214812
base: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
patch link: https://lore.kernel.org/r/20260220-topics-ahmtib01-ras_ffh_arm_internal_review-v2-9-347fa2d7351b%40arm.com
patch subject: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260220/202602202148.CB8O9os9-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260220/202602202148.CB8O9os9-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602202148.CB8O9os9-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/acpi/apei/ghes_cper.c: In function 'ghes_handle_arm_hw_error':
>> drivers/acpi/apei/ghes_cper.c:261:34: error: implicit declaration of function 'FIELD_GET' [-Wimplicit-function-declaration]
261 | FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
| ^~~~~~~~~
vim +/FIELD_GET +261 drivers/acpi/apei/ghes_cper.c
202
203 bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
204 int sev, bool sync)
205 {
206 struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
207 int flags = sync ? MF_ACTION_REQUIRED : 0;
208 int length = gdata->error_data_length;
209 char error_type[120];
210 bool queued = false;
211 int sec_sev, i;
212 char *p;
213
214 sec_sev = ghes_severity(gdata->error_severity);
215 if (length >= sizeof(*err)) {
216 log_arm_hw_error(err, sec_sev);
217 } else {
218 pr_warn(FW_BUG "arm error length: %d\n", length);
219 pr_warn(FW_BUG "length is too small\n");
220 pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
221 return false;
222 }
223
224 if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
225 return false;
226
227 p = (char *)(err + 1);
228 length -= sizeof(err);
229
230 for (i = 0; i < err->err_info_num; i++) {
231 struct cper_arm_err_info *err_info;
232 bool is_cache, has_pa;
233
234 /* Ensure we have enough data for the error info header */
235 if (length < sizeof(*err_info))
236 break;
237
238 err_info = (struct cper_arm_err_info *)p;
239
240 /* Validate the claimed length before using it */
241 length -= err_info->length;
242 if (length < 0)
243 break;
244
245 is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
246 has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
247
248 /*
249 * The field (err_info->error_info & BIT(26)) is fixed to set to
250 * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
251 * firmware won't mix corrected errors in an uncorrected section,
252 * and don't filter out 'corrected' error here.
253 */
254 if (is_cache && has_pa) {
255 queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
256 p += err_info->length;
257 continue;
258 }
259
260 cper_bits_to_str(error_type, sizeof(error_type),
> 261 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
262 cper_proc_error_type_strs,
263 ARRAY_SIZE(cper_proc_error_type_strs));
264
265 pr_warn_ratelimited(FW_WARN GHES_PFX
266 "Unhandled processor error type 0x%02x: %s%s\n",
267 err_info->type, error_type,
268 (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
269 p += err_info->length;
270 }
271
272 return queued;
273 }
274
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
` (2 preceding siblings ...)
2026-02-20 20:37 ` kernel test robot
@ 2026-02-20 21:16 ` kernel test robot
3 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2026-02-20 21:16 UTC (permalink / raw)
To: Ahmed Tiba, devicetree, linux-acpi
Cc: llvm, oe-kbuild-all, Ahmed Tiba, Dmitry.Lamerov, catalin.marinas,
bp, robh, rafael, will, conor, linux-arm-kernel, linux-doc,
krzk+dt, Michael.Zhao2, tony.luck
Hi Ahmed,
kernel test robot noticed the following build errors:
[auto build test ERROR on 8bf22c33e7a172fbc72464f4cc484d23a6b412ba]
url: https://github.com/intel-lab-lkp/linux/commits/Ahmed-Tiba/ACPI-APEI-GHES-share-macros-via-a-private-header/20260220-214812
base: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
patch link: https://lore.kernel.org/r/20260220-topics-ahmtib01-ras_ffh_arm_internal_review-v2-9-347fa2d7351b%40arm.com
patch subject: [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers
config: x86_64-randconfig-004-20260220 (https://download.01.org/0day-ci/archive/20260221/202602210530.ukbF5fjB-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260221/202602210530.ukbF5fjB-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602210530.ukbF5fjB-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/acpi/apei/ghes_cper.c:261:6: error: call to undeclared function 'FIELD_GET'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
261 | FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
| ^
1 error generated.
vim +/FIELD_GET +261 drivers/acpi/apei/ghes_cper.c
202
203 bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
204 int sev, bool sync)
205 {
206 struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
207 int flags = sync ? MF_ACTION_REQUIRED : 0;
208 int length = gdata->error_data_length;
209 char error_type[120];
210 bool queued = false;
211 int sec_sev, i;
212 char *p;
213
214 sec_sev = ghes_severity(gdata->error_severity);
215 if (length >= sizeof(*err)) {
216 log_arm_hw_error(err, sec_sev);
217 } else {
218 pr_warn(FW_BUG "arm error length: %d\n", length);
219 pr_warn(FW_BUG "length is too small\n");
220 pr_warn(FW_BUG "firmware-generated error record is incorrect\n");
221 return false;
222 }
223
224 if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
225 return false;
226
227 p = (char *)(err + 1);
228 length -= sizeof(err);
229
230 for (i = 0; i < err->err_info_num; i++) {
231 struct cper_arm_err_info *err_info;
232 bool is_cache, has_pa;
233
234 /* Ensure we have enough data for the error info header */
235 if (length < sizeof(*err_info))
236 break;
237
238 err_info = (struct cper_arm_err_info *)p;
239
240 /* Validate the claimed length before using it */
241 length -= err_info->length;
242 if (length < 0)
243 break;
244
245 is_cache = err_info->type & CPER_ARM_CACHE_ERROR;
246 has_pa = (err_info->validation_bits & CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
247
248 /*
249 * The field (err_info->error_info & BIT(26)) is fixed to set to
250 * 1 in some old firmware of HiSilicon Kunpeng920. We assume that
251 * firmware won't mix corrected errors in an uncorrected section,
252 * and don't filter out 'corrected' error here.
253 */
254 if (is_cache && has_pa) {
255 queued = ghes_do_memory_failure(err_info->physical_fault_addr, flags);
256 p += err_info->length;
257 continue;
258 }
259
260 cper_bits_to_str(error_type, sizeof(error_type),
> 261 FIELD_GET(CPER_ARM_ERR_TYPE_MASK, err_info->type),
262 cper_proc_error_type_strs,
263 ARRAY_SIZE(cper_proc_error_type_strs));
264
265 pr_warn_ratelimited(FW_WARN GHES_PFX
266 "Unhandled processor error type 0x%02x: %s%s\n",
267 err_info->type, error_type,
268 (err_info->type & ~CPER_ARM_ERR_TYPE_MASK) ? " with reserved bit(s)" : "");
269 p += err_info->length;
270 }
271
272 return queued;
273 }
274
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
@ 2026-02-21 9:06 ` Krzysztof Kozlowski
2026-02-23 19:10 ` Ahmed Tiba
2026-02-24 15:55 ` Jonathan Cameron
2026-02-26 7:01 ` Himanshu Chauhan
2 siblings, 1 reply; 39+ messages in thread
From: Krzysztof Kozlowski @ 2026-02-21 9:06 UTC (permalink / raw)
To: Ahmed Tiba, devicetree, linux-acpi
Cc: Dmitry.Lamerov, catalin.marinas, bp, robh, rafael, will, conor,
linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2, tony.luck
On 20/02/2026 14:42, Ahmed Tiba wrote:
> +	rc = ghes_ffh_init_ack(pdev, ctx);
> +	if (rc)
> +		return rc;
> +
> +	rc = ghes_ffh_init_pool();
> +	if (rc)
> +		return rc;
> +
> +	ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
> +	if (!ctx->estatus)
> +		return -ENOMEM;
> +
> +	ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
> +	if (!ctx->generic)
> +		return -ENOMEM;
> +
> +	ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
> +	ctx->generic->header.source_id =
> +		atomic_inc_return(&ghes_ffh_source_ids);
> +	ctx->generic->notify.type = ctx->sync ?
> +		ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
> +	ctx->generic->error_block_length = ctx->status_len;
> +
> +	ctx->irq = platform_get_irq_optional(pdev, 0);

Please read the kerneldoc - wrong check in if.

> +	if (ctx->irq <= 0) {
> +		if (ctx->irq == -EPROBE_DEFER)
> +			return ctx->irq;
> +		dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
> +		return -EINVAL;
> +	}
> +
> +	rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
> +				       NULL, ghes_ffh_irq,
> +				       IRQF_ONESHOT,
> +				       dev_name(&pdev->dev), ctx);
> +	if (rc)
> +		return rc;
> +
> +	platform_set_drvdata(pdev, ctx);
> +	dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");

This does not look like useful printk message. Drivers should be silent
on success:
https://elixir.bootlin.com/linux/v6.15-rc7/source/Documentation/process/coding-style.rst#L913
https://elixir.bootlin.com/linux/v6.15-rc7/source/Documentation/process/debugging/driver_development_debugging_guide.rst#L79

> +	return 0;
> +}
> +
> +static void ghes_ffh_remove(struct platform_device *pdev)
> +{
> +}
> +
> +static const struct of_device_id ghes_ffh_of_match[] = {
> +	{ .compatible = "arm,ras-ffh" },
> +	{ /* sentinel */ }
> +};
> +MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
> +
> +static struct platform_driver ghes_ffh_driver = {
> +	.driver = {
> +		.name = "esource-dt",
> +		.of_match_table = ghes_ffh_of_match,
> +	},
> +	.probe = ghes_ffh_probe,
> +	.remove = ghes_ffh_remove,
> +};
> +
> +module_platform_driver(ghes_ffh_driver);
> +
> +MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
> +MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
> +MODULE_LICENSE("GPL");

Best regards,
Krzysztof
* Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
2026-02-21 9:06 ` Krzysztof Kozlowski
@ 2026-02-23 19:10 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-02-23 19:10 UTC (permalink / raw)
To: Krzysztof Kozlowski, devicetree, linux-acpi
Cc: Dmitry.Lamerov, catalin.marinas, bp, robh, rafael, will, conor,
linux-arm-kernel, linux-doc, krzk+dt, Michael.Zhao2, tony.luck
On 21/02/2026 09:06, Krzysztof Kozlowski wrote:
> On 20/02/2026 14:42, Ahmed Tiba wrote:
>> + rc = ghes_ffh_init_ack(pdev, ctx);
>> + if (rc)
>> + return rc;
>> +
>> + rc = ghes_ffh_init_pool();
>> + if (rc)
>> + return rc;
>> +
>> + ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
>> + if (!ctx->estatus)
>> + return -ENOMEM;
>> +
>> + ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
>> + if (!ctx->generic)
>> + return -ENOMEM;
>> +
>> + ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
>> + ctx->generic->header.source_id =
>> + atomic_inc_return(&ghes_ffh_source_ids);
>> + ctx->generic->notify.type = ctx->sync ?
>> + ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
>> + ctx->generic->error_block_length = ctx->status_len;
>> +
>> + ctx->irq = platform_get_irq_optional(pdev, 0);
>
> Please read the kerneldoc - wrong check in if.
Got it. I’ll follow the kerneldoc: use `if (irq < 0) return irq;`.
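For reference, a minimal userspace sketch of the agreed check. It is not the driver code: fake_get_irq_optional() and ffh_get_irq() are hypothetical stand-ins so the control flow can be exercised outside the kernel. It assumes the documented contract that the call returns a non-zero IRQ number on success and a negative error number on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for platform_get_irq_optional(): it just
 * hands back the value the caller asks it to simulate (a positive IRQ
 * number or a negative errno).
 */
static int fake_get_irq_optional(int simulated)
{
	return simulated;
}

/* Store the IRQ on success; pass any negative error code through unchanged. */
static int ffh_get_irq(int simulated, int *irq_out)
{
	int irq;

	assert(irq_out != NULL);

	irq = fake_get_irq_optional(simulated);
	if (irq < 0)
		return irq;

	*irq_out = irq;
	return 0;
}
```

Returning the error as-is means deferral and "not found" codes reach the caller unchanged, so no special case for -EPROBE_DEFER is needed.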
>> + if (ctx->irq <= 0) {
>> + if (ctx->irq == -EPROBE_DEFER)
>> + return ctx->irq;
>> + dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
>> + return -EINVAL;
>> + }
>> +
>> + rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
>> + NULL, ghes_ffh_irq,
>> + IRQF_ONESHOT,
>> + dev_name(&pdev->dev), ctx);
>> + if (rc)
>> + return rc;
>> +
>> + platform_set_drvdata(pdev, ctx);
>> + dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");
>
> This does not look like useful printk message. Drivers should be silent
> on success:
> https://elixir.bootlin.com/linux/v6.15-rc7/source/Documentation/process/coding-style.rst#L913
> https://elixir.bootlin.com/linux/v6.15-rc7/source/Documentation/process/debugging/driver_development_debugging_guide.rst#L79
I will drop it.
>
>> + return 0;
>> +}
>> +
>> +static void ghes_ffh_remove(struct platform_device *pdev)
>> +{
>> +}
>> +
>> +static const struct of_device_id ghes_ffh_of_match[] = {
>> + { .compatible = "arm,ras-ffh" },
>> + { /* sentinel */ }
>> +};
>> +MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
>> +
>> +static struct platform_driver ghes_ffh_driver = {
>> + .driver = {
>> + .name = "esource-dt",
>> + .of_match_table = ghes_ffh_of_match,
>> + },
>> + .probe = ghes_ffh_probe,
>> + .remove = ghes_ffh_remove,
>> +};
>> +
>> +module_platform_driver(ghes_ffh_driver);
>> +
>> +MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
>> +MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
>> +MODULE_LICENSE("GPL");
Best regards,
Tiba
* Re: [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
@ 2026-02-24 15:22 ` Jonathan Cameron
2026-03-11 11:39 ` Ahmed Tiba
2026-02-26 6:44 ` Himanshu Chauhan
1 sibling, 1 reply; 39+ messages in thread
From: Jonathan Cameron @ 2026-02-24 15:22 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck, Mauro Carvalho Chehab
On Fri, 20 Feb 2026 13:42:19 +0000
Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> Carve the CPER helper macros out of ghes.c and place them in a private
> header so they can be shared with upcoming helper files. This is a
> mechanical include change with no functional differences.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
+CC Mauro as he's been doing a lot of work on error injection recently so
can probably review the use of the various structures much more easily
than I can!
My main comment is on the naming of the new header.
Jonathan
> ---
> drivers/acpi/apei/ghes.c | 60 +-----------------------------
> include/acpi/ghes_cper.h | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 96 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index f96aede5d9a3..07b70bcb8342 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
>
> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
> new file mode 100644
> index 000000000000..2597fbadc4f3
> --- /dev/null
> +++ b/include/acpi/ghes_cper.h
> @@ -0,0 +1,95 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * APEI Generic Hardware Error Source: CPER Helper
There is other stuff in here, such as the GHES acks etc.
in ghes_clear_estatus(). So I think this intro text
needs a bit more thought. The boundary is already rather
blurred though as for example cper_estatus_len() is only
tangentially connected to cper.
> + *
> + * Copyright (C) 2026 ARM Ltd.
Doesn't make sense to add this copyright in this patch as so far
it's cut and paste of code from a file that you didn't write (at least
not in 2026!)
Might make sense after a few patches, in which case add the copyright
when it does.
> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> + * Based on ACPI APEI GHES driver.
> + *
> + */
> +
> +#ifndef ACPI_APEI_GHES_CPER_H
> +#define ACPI_APEI_GHES_CPER_H
> +
> +#include <linux/workqueue.h>
> +
> +#include <acpi/ghes.h>
> +
> +#define GHES_PFX "GHES: "
> +
> +#define GHES_ESTATUS_MAX_SIZE 65536
> +#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
> +
> +#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
> +
> +/* This is just an estimation for memory pool allocation */
> +#define GHES_ESTATUS_CACHE_AVG_SIZE 512
> +
> +#define GHES_ESTATUS_CACHES_SIZE 4
> +
> +#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
> +/* Prevent too many caches are allocated because of RCU */
> +#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
> +
> +#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
> + (sizeof(struct ghes_estatus_cache) + (estatus_len))
> +#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
> + ((struct acpi_hest_generic_status *) \
> + ((struct ghes_estatus_cache *)(estatus_cache) + 1))
> +
> +#define GHES_ESTATUS_NODE_LEN(estatus_len) \
> + (sizeof(struct ghes_estatus_node) + (estatus_len))
> +#define GHES_ESTATUS_FROM_NODE(estatus_node) \
> + ((struct acpi_hest_generic_status *) \
> + ((struct ghes_estatus_node *)(estatus_node) + 1))
> +
> +#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
> + (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
> +#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
> + ((struct acpi_hest_generic_data *) \
> + ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
> +
> +static inline bool is_hest_type_generic_v2(struct ghes *ghes)
> +{
> + return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
> +}
> +
> +/*
> + * A platform may describe one error source for the handling of synchronous
> + * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
> + * or External Interrupt). On x86, the HEST notifications are always
> + * asynchronous, so only SEA on ARM is delivered as a synchronous
> + * notification.
> + */
> +static inline bool is_hest_sync_notify(struct ghes *ghes)
> +{
> + u8 notify_type = ghes->generic->notify.type;
> +
> + return notify_type == ACPI_HEST_NOTIFY_SEA;
> +}
> +
> +struct ghes_vendor_record_entry {
> + struct work_struct work;
> + int error_severity;
> + char vendor_record[];
> +};
> +
> +static struct ghes *ghes_new(struct acpi_hest_generic *generic);
> +static void ghes_fini(struct ghes *ghes);
> +
> +static int ghes_read_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> +static void ghes_clear_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 buf_paddr, enum fixed_addresses fixmap_idx);
I'm not sure some of this makes sense in a file named ghes_cper.h
Maybe we just need a different intro comment though.
> +static int __ghes_peek_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> +static int __ghes_check_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus);
> +static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> + u64 buf_paddr, enum fixed_addresses fixmap_idx,
> + size_t buf_len);
> +
> +#endif /* ACPI_APEI_GHES_CPER_H */
>
* Re: [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub
2026-02-20 13:42 ` [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub Ahmed Tiba
@ 2026-02-24 15:25 ` Jonathan Cameron
2026-03-11 12:19 ` Ahmed Tiba
0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Cameron @ 2026-02-24 15:25 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, 20 Feb 2026 13:42:20 +0000
Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> Introduce a dedicated ghes_cper translation unit so that follow-on commits
> can move helpers out of ghes.c without touching the build logic twice.
> This keeps the object in the tree while remaining functionally identical.
I'd probably do this with the first move patch not as a separate patch.
That would resolve the question of headers etc below.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> ---
> drivers/acpi/apei/Makefile | 2 +-
> drivers/acpi/apei/ghes_cper.c | 26 ++++++++++++++++++++++++++
> 2 files changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/apei/Makefile b/drivers/acpi/apei/Makefile
> index 1a0b85923cd4..b3774af70883 100644
> --- a/drivers/acpi/apei/Makefile
> +++ b/drivers/acpi/apei/Makefile
> @@ -1,6 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0
> obj-$(CONFIG_ACPI_APEI) += apei.o
> -obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o
> +obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o ghes_cper.o
> # clang versions prior to 18 may blow out the stack with KASAN
> ifeq ($(CONFIG_COMPILE_TEST)_$(CONFIG_CC_IS_CLANG)_$(call clang-min-version, 180000),y_y_)
> KASAN_SANITIZE_ghes.o := n
> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
> new file mode 100644
> index 000000000000..63047322a3d9
> --- /dev/null
> +++ b/drivers/acpi/apei/ghes_cper.c
> @@ -0,0 +1,26 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *
As below.
> + * APEI GHES CPER helper translation unit - staging file for helper moves
> + *
> + * Copyright (C) 2026 ARM Ltd.
As before. If there isn't significant new content copyright doesn't make sense yet.
> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> + * Based on ACPI APEI GHES driver.
> + *
No obvious benefit in this blank line so I'd drop it.
> + */
> +
> +#include <linux/err.h>
> +#include <linux/io.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/ratelimit.h>
> +#include <linux/slab.h>
Build includes up as they become relevant. That way we can see whether
they are needed or not. Right now none of them are.
> +
> +#include <acpi/apei.h>
> +
> +#include <asm/fixmap.h>
> +#include <asm/tlbflush.h>
> +
> +#include "apei-internal.h"
> +
> +/* Helper bodies will be moved here in follow-up commits. */
>
* Re: [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers
2026-02-20 13:42 ` [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
@ 2026-02-24 15:32 ` Jonathan Cameron
2026-03-11 12:38 ` Ahmed Tiba
2026-02-26 5:58 ` Himanshu Chauhan
1 sibling, 1 reply; 39+ messages in thread
From: Jonathan Cameron @ 2026-02-24 15:32 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, 20 Feb 2026 13:42:21 +0000
Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> Relocate the CPER buffer mapping, peek, and clear helpers from ghes.c into
> ghes_cper.c so they can be shared with other firmware-first providers.
> This commit only shuffles code; behavior stays the same.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
Hi Ahmed,
Most of the comments in here are about changing the patch break up.
The basic suggested approach is to move stuff as it is needed, not in advance of
that need. So when you move the function to the c file, only then add what
it needs to the includes / header.
Jonathan
> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
> index 63047322a3d9..7e0015e960c1 100644
> --- a/drivers/acpi/apei/ghes_cper.c
> +++ b/drivers/acpi/apei/ghes_cper.c
> @@ -1,7 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> /*
> *
> - * APEI GHES CPER helper translation unit - staging file for helper moves
> + * APEI GHES CPER helper translation unit - code mechanically moved from ghes.c
In the long run, no interest in where it came from. People can
look at the git history for that.
> *
> * Copyright (C) 2026 ARM Ltd.
> * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> @@ -17,10 +17,176 @@
> #include <linux/slab.h>
>
> #include <acpi/apei.h>
> +#include <acpi/ghes_cper.h>
>
> #include <asm/fixmap.h>
> #include <asm/tlbflush.h>
>
> #include "apei-internal.h"
>
> -/* Helper bodies will be moved here in follow-up commits. */
If you just do the file creation with this first move, then we don't get churn of
comments like this one.
> +/* Read the CPER block, returning its address, and header in estatus. */
> +int __ghes_peek_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 *buf_paddr, enum fixed_addresses fixmap_idx)
> +{
> + struct acpi_hest_generic *g = ghes->generic;
> + int rc;
> +
> + rc = apei_read(buf_paddr, &g->error_status_address);
> + if (rc) {
> + *buf_paddr = 0;
> + pr_warn_ratelimited(FW_WARN GHES_PFX
> +"Failed to read error status block address for hardware error source: %d.\n",
Unusual indenting. I'd just fix that whilst you are here. Don't worry about long line.
> + g->header.source_id);
> + return -EIO;
> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
> index 2597fbadc4f3..2e3919f0c3e7 100644
> --- a/include/acpi/ghes_cper.h
> +++ b/include/acpi/ghes_cper.h
> @@ -74,21 +74,21 @@ struct ghes_vendor_record_entry {
> char vendor_record[];
> };
>
> -static struct ghes *ghes_new(struct acpi_hest_generic *generic);
Huh. Static forward declarations in a header? That never made sense. Fix it in the
earlier patch and remove the statics from the declarations.
Actually no, just bring them into the header only when you need to. So as part
of the patch that moves the caller or the function.
> -static void ghes_fini(struct ghes *ghes);
> +struct ghes *ghes_new(struct acpi_hest_generic *generic);
> +void ghes_fini(struct ghes *ghes);
>
> -static int ghes_read_estatus(struct ghes *ghes,
> +int ghes_read_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus,
> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> -static void ghes_clear_estatus(struct ghes *ghes,
> +void ghes_clear_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus,
> u64 buf_paddr, enum fixed_addresses fixmap_idx);
> -static int __ghes_peek_estatus(struct ghes *ghes,
> +int __ghes_peek_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus,
> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> -static int __ghes_check_estatus(struct ghes *ghes,
> +int __ghes_check_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus);
> -static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> +int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> u64 buf_paddr, enum fixed_addresses fixmap_idx,
> size_t buf_len);
>
>
* Re: [PATCH v2 07/11] ACPI: APEI: GHES: move CXL CPER helpers
2026-02-20 13:42 ` [PATCH v2 07/11] ACPI: APEI: GHES: move CXL CPER helpers Ahmed Tiba
@ 2026-02-24 15:34 ` Jonathan Cameron
0 siblings, 0 replies; 39+ messages in thread
From: Jonathan Cameron @ 2026-02-24 15:34 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck, linux-cxl
On Fri, 20 Feb 2026 13:42:25 +0000
Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> Move the CXL CPER handling paths out of ghes.c and into ghes_cper.c so the
> helpers can be reused. The code is moved as-is, with the public
> prototypes updated so GHES keeps calling into the new translation unit.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
+CC linux-cxl.
I haven't looked closely but suspect the same stuff on code movement and patch
break up applies here.
Thanks,
Jonathan
> ---
> drivers/acpi/apei/ghes.c | 132 -----------------------------------------
> drivers/acpi/apei/ghes_cper.c | 135 ++++++++++++++++++++++++++++++++++++++++++
> include/acpi/ghes_cper.h | 11 ++++
> 3 files changed, 146 insertions(+), 132 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 9703c602a8c2..136993704d52 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -383,138 +383,6 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
> #endif
> }
>
> -/* Room for 8 entries */
> -#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
> -static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
> - CXL_CPER_PROT_ERR_FIFO_DEPTH);
> -
> -/* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
> -static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
> -struct work_struct *cxl_cper_prot_err_work;
> -
> -static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> - int severity)
> -{
> -#ifdef CONFIG_ACPI_APEI_PCIEAER
> - struct cxl_cper_prot_err_work_data wd;
> -
> - if (cxl_cper_sec_prot_err_valid(prot_err))
> - return;
> -
> - guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
> -
> - if (!cxl_cper_prot_err_work)
> - return;
> -
> - if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
> - return;
> -
> - if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
> - pr_err_ratelimited("CXL CPER kfifo overflow\n");
> - return;
> - }
> -
> - schedule_work(cxl_cper_prot_err_work);
> -#endif
> -}
> -
> -int cxl_cper_register_prot_err_work(struct work_struct *work)
> -{
> - if (cxl_cper_prot_err_work)
> - return -EINVAL;
> -
> - guard(spinlock)(&cxl_cper_prot_err_work_lock);
> - cxl_cper_prot_err_work = work;
> - return 0;
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
> -
> -int cxl_cper_unregister_prot_err_work(struct work_struct *work)
> -{
> - if (cxl_cper_prot_err_work != work)
> - return -EINVAL;
> -
> - guard(spinlock)(&cxl_cper_prot_err_work_lock);
> - cxl_cper_prot_err_work = NULL;
> - return 0;
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
> -
> -int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
> -{
> - return kfifo_get(&cxl_cper_prot_err_fifo, wd);
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
> -
> -/* Room for 8 entries for each of the 4 event log queues */
> -#define CXL_CPER_FIFO_DEPTH 32
> -DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
> -
> -/* Synchronize schedule_work() with cxl_cper_work changes */
> -static DEFINE_SPINLOCK(cxl_cper_work_lock);
> -struct work_struct *cxl_cper_work;
> -
> -static void cxl_cper_post_event(enum cxl_event_type event_type,
> - struct cxl_cper_event_rec *rec)
> -{
> - struct cxl_cper_work_data wd;
> -
> - if (rec->hdr.length <= sizeof(rec->hdr) ||
> - rec->hdr.length > sizeof(*rec)) {
> - pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> - rec->hdr.length);
> - return;
> - }
> -
> - if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> - pr_err(FW_WARN "CXL CPER invalid event\n");
> - return;
> - }
> -
> - guard(spinlock_irqsave)(&cxl_cper_work_lock);
> -
> - if (!cxl_cper_work)
> - return;
> -
> - wd.event_type = event_type;
> - memcpy(&wd.rec, rec, sizeof(wd.rec));
> -
> - if (!kfifo_put(&cxl_cper_fifo, wd)) {
> - pr_err_ratelimited("CXL CPER kfifo overflow\n");
> - return;
> - }
> -
> - schedule_work(cxl_cper_work);
> -}
> -
> -int cxl_cper_register_work(struct work_struct *work)
> -{
> - if (cxl_cper_work)
> - return -EINVAL;
> -
> - guard(spinlock)(&cxl_cper_work_lock);
> - cxl_cper_work = work;
> - return 0;
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cper_register_work, "CXL");
> -
> -int cxl_cper_unregister_work(struct work_struct *work)
> -{
> - if (cxl_cper_work != work)
> - return -EINVAL;
> -
> - guard(spinlock)(&cxl_cper_work_lock);
> - cxl_cper_work = NULL;
> - return 0;
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_work, "CXL");
> -
> -int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
> -{
> - return kfifo_get(&cxl_cper_fifo, wd);
> -}
> -EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
> -
> static void ghes_log_hwerr(int sev, guid_t *sec_type)
> {
> if (sev != CPER_SEV_RECOVERABLE)
> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
> index 627f6c712261..673dca208935 100644
> --- a/drivers/acpi/apei/ghes_cper.c
> +++ b/drivers/acpi/apei/ghes_cper.c
> @@ -9,10 +9,12 @@
> *
> */
>
> +#include <linux/aer.h>
> #include <linux/err.h>
> #include <linux/genalloc.h>
> #include <linux/irq_work.h>
> #include <linux/io.h>
> +#include <linux/kfifo.h>
> #include <linux/kernel.h>
> #include <linux/list.h>
> #include <linux/math64.h>
> @@ -319,6 +321,139 @@ void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
> schedule_work(&entry->work);
> }
>
> +
> +/* Room for 8 entries */
> +#define CXL_CPER_PROT_ERR_FIFO_DEPTH 8
> +static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
> + CXL_CPER_PROT_ERR_FIFO_DEPTH);
> +
> +/* Synchronize schedule_work() with cxl_cper_prot_err_work changes */
> +static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
> +struct work_struct *cxl_cper_prot_err_work;
> +
> +void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> + int severity)
> +{
> +#ifdef CONFIG_ACPI_APEI_PCIEAER
> + struct cxl_cper_prot_err_work_data wd;
> +
> + if (cxl_cper_sec_prot_err_valid(prot_err))
> + return;
> +
> + guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
> +
> + if (!cxl_cper_prot_err_work)
> + return;
> +
> + if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
> + return;
> +
> + if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
> + pr_err_ratelimited("CXL CPER kfifo overflow\n");
> + return;
> + }
> +
> + schedule_work(cxl_cper_prot_err_work);
> +#endif
> +}
> +
> +int cxl_cper_register_prot_err_work(struct work_struct *work)
> +{
> + if (cxl_cper_prot_err_work)
> + return -EINVAL;
> +
> + guard(spinlock)(&cxl_cper_prot_err_work_lock);
> + cxl_cper_prot_err_work = work;
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
> +
> +int cxl_cper_unregister_prot_err_work(struct work_struct *work)
> +{
> + if (cxl_cper_prot_err_work != work)
> + return -EINVAL;
> +
> + guard(spinlock)(&cxl_cper_prot_err_work_lock);
> + cxl_cper_prot_err_work = NULL;
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
> +
> +int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
> +{
> + return kfifo_get(&cxl_cper_prot_err_fifo, wd);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
> +
> +/* Room for 8 entries for each of the 4 event log queues */
> +#define CXL_CPER_FIFO_DEPTH 32
> +DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
> +
> +/* Synchronize schedule_work() with cxl_cper_work changes */
> +static DEFINE_SPINLOCK(cxl_cper_work_lock);
> +struct work_struct *cxl_cper_work;
> +
> +void cxl_cper_post_event(enum cxl_event_type event_type,
> + struct cxl_cper_event_rec *rec)
> +{
> + struct cxl_cper_work_data wd;
> +
> + if (rec->hdr.length <= sizeof(rec->hdr) ||
> + rec->hdr.length > sizeof(*rec)) {
> + pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> + rec->hdr.length);
> + return;
> + }
> +
> + if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
> + pr_err(FW_WARN "CXL CPER invalid event\n");
> + return;
> + }
> +
> + guard(spinlock_irqsave)(&cxl_cper_work_lock);
> +
> + if (!cxl_cper_work)
> + return;
> +
> + wd.event_type = event_type;
> + memcpy(&wd.rec, rec, sizeof(wd.rec));
> +
> + if (!kfifo_put(&cxl_cper_fifo, wd)) {
> + pr_err_ratelimited("CXL CPER kfifo overflow\n");
> + return;
> + }
> +
> + schedule_work(cxl_cper_work);
> +}
> +
> +int cxl_cper_register_work(struct work_struct *work)
> +{
> + if (cxl_cper_work)
> + return -EINVAL;
> +
> + guard(spinlock)(&cxl_cper_work_lock);
> + cxl_cper_work = work;
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_register_work, "CXL");
> +
> +int cxl_cper_unregister_work(struct work_struct *work)
> +{
> + if (cxl_cper_work != work)
> + return -EINVAL;
> +
> + guard(spinlock)(&cxl_cper_work_lock);
> + cxl_cper_work = NULL;
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_work, "CXL");
> +
> +int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
> +{
> + return kfifo_get(&cxl_cper_fifo, wd);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
> +
> /*
> * GHES error status reporting throttle, to report more kinds of
> * errors, instead of just most frequently occurred errors.
> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
> index c5ff4c502017..4522e8699ce0 100644
> --- a/include/acpi/ghes_cper.h
> +++ b/include/acpi/ghes_cper.h
> @@ -15,6 +15,7 @@
> #include <linux/workqueue.h>
>
> #include <acpi/ghes.h>
> +#include <cxl/event.h>
>
> #define GHES_PFX "GHES: "
>
> @@ -99,5 +100,15 @@ void ghes_estatus_cache_add(struct acpi_hest_generic *generic,
> struct acpi_hest_generic_status *estatus);
> void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
> int sev);
> +void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> + int severity);
> +int cxl_cper_register_prot_err_work(struct work_struct *work);
> +int cxl_cper_unregister_prot_err_work(struct work_struct *work);
> +int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd);
> +void cxl_cper_post_event(enum cxl_event_type event_type,
> + struct cxl_cper_event_rec *rec);
> +int cxl_cper_register_work(struct work_struct *work);
> +int cxl_cper_unregister_work(struct work_struct *work);
> +int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd);
>
> #endif /* ACPI_APEI_GHES_CPER_H */
>
* Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
2026-02-21 9:06 ` Krzysztof Kozlowski
@ 2026-02-24 15:55 ` Jonathan Cameron
2026-03-12 12:23 ` Ahmed Tiba
2026-02-26 7:01 ` Himanshu Chauhan
2 siblings, 1 reply; 39+ messages in thread
From: Jonathan Cameron @ 2026-02-24 15:55 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, 20 Feb 2026 13:42:29 +0000
Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> Add a DeviceTree firmware-first CPER provider that reuses the shared
> GHES helpers, wire it into the RAS Kconfig/Makefile and document it in
> the admin guide. Update MAINTAINERS now that the driver exists.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
Hi Ahmed,
Various comments inline.
Jonathan
> ---
> Documentation/admin-guide/RAS/main.rst | 18 +++
> MAINTAINERS | 1 +
> drivers/acpi/apei/apei-internal.h | 10 +-
> drivers/acpi/apei/ghes_cper.c | 2 +
> drivers/ras/Kconfig | 12 ++
> drivers/ras/Makefile | 1 +
> drivers/ras/esource-dt.c | 264 +++++++++++++++++++++++++++++++++
> include/acpi/ghes_cper.h | 9 ++
> 8 files changed, 308 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
> index 5a45db32c49b..4ffabaaeabb1 100644
> --- a/Documentation/admin-guide/RAS/main.rst
> +++ b/Documentation/admin-guide/RAS/main.rst
> @@ -205,6 +205,24 @@ Architecture (MCA)\ [#f3]_.
> .. [#f3] For more details about the Machine Check Architecture (MCA),
> please read Documentation/arch/x86/x86_64/machinecheck.rst at the Kernel tree.
>
> +Firmware-first CPER via DeviceTree
> +----------------------------------
> +
> +Some systems expose Common Platform Error Record (CPER) data
> +via DeviceTree instead of ACPI HEST tables.
I'd argue this isn't really DT specific, it's just not ACPI table.
You could for instance use PRP0001 and wire this up on ACPI with only
one trivial change to generic property.h accessor for the boolean.
Or use another firmware information source entirely.
> +Enable ``CONFIG_RAS_ESOURCE_DT`` to build the ``drivers/ras/esource-dt.c``
> +driver and describe the CPER error source buffer with the
> +``Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml`` binding.
> +The driver reuses the GHES CPER helper object in
> +``drivers/acpi/apei/ghes_cper.c`` so the logging, notifier chains, and
> +memory failure handling match the ACPI GHES behaviour even when
> +ACPI is disabled.
> +
> +Once a platform describes a firmware-first provider, both ACPI GHES and the
> +DeviceTree driver reuse the same code paths. This keeps the behaviour
> +consistent regardless of whether the error source is described via ACPI
> +tables or DeviceTree.
> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c..ea6d96713020 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -34,6 +34,18 @@ if RAS
> source "arch/x86/ras/Kconfig"
> source "drivers/ras/amd/atl/Kconfig"
>
> +config RAS_ESOURCE_DT
> + bool "DeviceTree firmware-first CPER error source block provider"
It isn't really DT specific other than one call that I've suggested you
replace with a generic firmware accessor.
> + depends on OF
Generally we don't gate on OF unless there are OF specific calls. Here there
aren't so you are just reducing build coverage. || COMPILE_TEST
maybe.
> + depends on ARM64
Likewise, nothing in here is arm64 specific that I can spot.
> + select GHES_CPER_HELPERS
> + help
> + Enable support for firmware-first Common Platform Error Record (CPER)
> + error source block providers that are described via DeviceTree
> + instead of ACPI HEST tables. The driver reuses the existing GHES
> + CPER helpers so the error processing matches the ACPI code paths,
> + but it can be built even when ACPI is disabled.
> +
> diff --git a/drivers/ras/esource-dt.c b/drivers/ras/esource-dt.c
> new file mode 100644
> index 000000000000..b575a2258536
> --- /dev/null
> +++ b/drivers/ras/esource-dt.c
> @@ -0,0 +1,264 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * DeviceTree provider for firmware-first CPER error source block.
> + *
> + * This driver shares the GHES CPER helpers so we keep the reporting and
> + * notifier behaviour identical to ACPI GHES
> + *
> + * Copyright (C) 2025 ARM Ltd.
> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
Used?
> +#include <linux/module.h>
mod_devicetable.h for of_device_id definition.
> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
Generally very little reason to include these. Not sure why you need
them here.
> +#include <linux/panic.h>
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +#include <acpi/ghes.h>
> +#include <acpi/ghes_cper.h>
> +
> +static atomic_t ghes_ffh_source_ids = ATOMIC_INIT(0);
I'd normally expect an IDA or similar. If nothing else it clearly
indicates we only want a unique ID.
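For illustration only, a minimal userspace sketch of the property an IDA gives you (in the kernel this would be ida_alloc() / ida_free()). The names toy_ida, toy_ida_alloc() and toy_ida_free() are hypothetical and not part of the patch; the point is that an ID can be handed back and reused, which a bare incrementing atomic counter cannot do.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IDA_MAX 64

/* Hypothetical toy allocator: one bit per ID, set while the ID is in use. */
struct toy_ida {
	uint64_t used;
};

/* Return the lowest free ID, or -1 when all TOY_IDA_MAX IDs are taken. */
static int toy_ida_alloc(struct toy_ida *ida)
{
	for (int id = 0; id < TOY_IDA_MAX; id++) {
		if (!(ida->used & (UINT64_C(1) << id))) {
			ida->used |= UINT64_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Release an ID so that a later allocation can reuse it. */
static void toy_ida_free(struct toy_ida *ida, int id)
{
	assert(id >= 0 && id < TOY_IDA_MAX);
	ida->used &= ~(UINT64_C(1) << id);
}
```

With a scheme like this, an unbind/rebind cycle gets its old source ID back instead of growing the counter forever.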
> +
> +struct ghes_ffh_ack {
> + void __iomem *addr;
> + u64 preserve;
> + u64 set;
> + u8 width;
> + bool present;
> +};
> +
> +struct ghes_ffh {
> + struct device *dev;
> + void __iomem *status;
> + size_t status_len;
> +
> + struct ghes_ffh_ack ack;
> +
> + struct acpi_hest_generic *generic;
> + struct acpi_hest_generic_status *estatus;
> +
> + bool sync;
> + int irq;
> +
> + /* Serializes access to the firmware-owned buffer. */
If we are serializing it, in what sense is it owned by the firmware?
> + spinlock_t lock;
> +};
> +
> +static void ghes_ffh_process(struct ghes_ffh *ctx)
> +{
> + unsigned long flags;
> + int sev;
> +
> + spin_lock_irqsave(&ctx->lock, flags);
guard() + include cleanup.h. Then can do returns in error paths.
> +
> + if (ghes_ffh_copy_status(ctx))
> + goto out;
Like here to give simpler flow.
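In the driver that would be guard(spinlock_irqsave)(&ctx->lock); at the top of the function. To show why the early return is then safe, here is a rough userspace analogue built on the same compiler cleanup attribute (GCC/Clang). fake_lock, FAKE_GUARD() and process() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock: only tracks held / not held. */
struct fake_lock {
	bool held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = true;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static void fake_lock_release(struct fake_lock **l)
{
	(*l)->held = false;
}

/* Take the lock now; release it automatically when the scope is left. */
#define FAKE_GUARD(lockp)						\
	struct fake_lock *guard_					\
	__attribute__((cleanup(fake_lock_release), unused)) =		\
		(fake_lock_acquire(lockp), (lockp))

static int process(struct fake_lock *lock, bool copy_fails)
{
	FAKE_GUARD(lock);

	if (copy_fails)
		return -1;	/* no goto out: the lock is dropped here */

	return 0;		/* and here */
}
```

Both exits leave the lock released, so the out: label and the explicit unlock go away.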
> +
> + sev = ghes_severity(ctx->estatus->error_severity);
> + if (sev >= GHES_SEV_PANIC)
> + ghes_ffh_fatal(ctx);
> +
> + if (!ghes_estatus_cached(ctx->estatus)) {
> + if (ghes_print_estatus(NULL, ctx->generic, ctx->estatus))
Combine the two if statements with &&
> + ghes_estatus_cache_add(ctx->generic, ctx->estatus);
> + }
> +
> + ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);
> +
> + ghes_ffh_ack(ctx);
> +
> +out:
> + spin_unlock_irqrestore(&ctx->lock, flags);
> +}
> +
> +static irqreturn_t ghes_ffh_irq(int irq, void *data)
> +{
> + struct ghes_ffh *ctx = data;
> +
> + ghes_ffh_process(ctx);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int ghes_ffh_init_ack(struct platform_device *pdev,
> + struct ghes_ffh *ctx)
> +{
> + struct resource *res;
> + size_t size;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> + if (!res)
> + return 0;
> +
> + ctx->ack.addr = devm_ioremap_resource(&pdev->dev, res);
Why not devm_platform_get_and_ioremap_resource()?
> + if (IS_ERR(ctx->ack.addr))
> + return PTR_ERR(ctx->ack.addr);
> +
> + size = resource_size(res);
> + switch (size) {
> + case 4:
> + ctx->ack.width = 32;
> + ctx->ack.preserve = ~0U;
> + break;
> + case 8:
> + ctx->ack.width = 64;
> + ctx->ack.preserve = ~0ULL;
> + break;
> + default:
> + dev_err(&pdev->dev, "Unsupported ack resource size %zu\n", size);
> + return -EINVAL;
> + }
> +
> + ctx->ack.set = BIT_ULL(0);
> + ctx->ack.present = true;
> + return 0;
> +}
> +
> +static int ghes_ffh_probe(struct platform_device *pdev)
Consider using a
struct device *dev = &pdev->dev;
given there is only one device around and it will shorten a bunch of
lines a little.
> +{
> + struct ghes_ffh *ctx;
> + struct resource *res;
> + int rc;
> +
> + ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + spin_lock_init(&ctx->lock);
> + ctx->dev = &pdev->dev;
> + ctx->sync = of_property_read_bool(pdev->dev.of_node, "arm,sea-notify");
Hmm. I'd allow for other firmware types with
device_property_read_bool() instead.
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res) {
> + dev_err(&pdev->dev, "status region missing\n");
In probe you can always use dev_err_probe. It pretty prints the return value etc and
saves lines of code.
return dev_err_probe(&pdev->dev, -EINVAL, "status region missing\n");
Don't worry about slightly long line.
> + return -EINVAL;
> + }
> +
> + ctx->status_len = resource_size(res);
> + if (!ctx->status_len) {
> + dev_err(&pdev->dev, "Status region has zero length\n");
As above, use dev_err_probe()
> + return -EINVAL;
> + }
> +
> + ctx->status = devm_ioremap_resource(&pdev->dev, res);
I'd be tempted to use devm_platform_get_and_ioremap_resource() and just
not worry about mapping and unmapping that will unnecessarily occur in the
case of error.
> + if (IS_ERR(ctx->status))
> + return PTR_ERR(ctx->status);
> +
> + rc = ghes_ffh_init_ack(pdev, ctx);
> + if (rc)
> + return rc;
> +
> + rc = ghes_ffh_init_pool();
> + if (rc)
> + return rc;
> +
> + ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
> + if (!ctx->estatus)
> + return -ENOMEM;
> +
> + ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
> + if (!ctx->generic)
> + return -ENOMEM;
> +
> + ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
> + ctx->generic->header.source_id =
> + atomic_inc_return(&ghes_ffh_source_ids);
> + ctx->generic->notify.type = ctx->sync ?
> + ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
> + ctx->generic->error_block_length = ctx->status_len;
> +
> + ctx->irq = platform_get_irq_optional(pdev, 0);
> + if (ctx->irq <= 0) {
> + if (ctx->irq == -EPROBE_DEFER)
> + return ctx->irq;
> + dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
If it's required, why call get_irq_optional?
That only serves to suppress the error message inside the call. Use
the non optional version and drop this.
> + return -EINVAL;
> + }
> +
> + rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
> + NULL, ghes_ffh_irq,
> + IRQF_ONESHOT,
> + dev_name(&pdev->dev), ctx);
> + if (rc)
> + return rc;
> +
> + platform_set_drvdata(pdev, ctx);
I can't immediately spot where this is used. If it isn't don't set it as that
will mislead people into thinking it's needed.
> + dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");
Krzysztof already commented on this one.
> + return 0;
> +}
> +
> +static void ghes_ffh_remove(struct platform_device *pdev)
> +{
If nothing to do, platform drivers don't need a remove so get rid of it.
> +}
> +
> +static const struct of_device_id ghes_ffh_of_match[] = {
> + { .compatible = "arm,ras-ffh" },
> + { /* sentinel */ }
> +};
> +MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
> +
> +static struct platform_driver ghes_ffh_driver = {
> + .driver = {
> + .name = "esource-dt",
> + .of_match_table = ghes_ffh_of_match,
> + },
> + .probe = ghes_ffh_probe,
> + .remove = ghes_ffh_remove,
> +};
> +
Common convention is to keep this tightly coupled with the
struct platform_driver by not having a blank line here.
> +module_platform_driver(ghes_ffh_driver);
> +
> +MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
> +MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
> +MODULE_LICENSE("GPL");
* Re: [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers
2026-02-20 13:42 ` [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
2026-02-24 15:32 ` Jonathan Cameron
@ 2026-02-26 5:58 ` Himanshu Chauhan
2026-03-11 13:18 ` Ahmed Tiba
1 sibling, 1 reply; 39+ messages in thread
From: Himanshu Chauhan @ 2026-02-26 5:58 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, Feb 20, 2026 at 7:13 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
> Relocate the CPER buffer mapping, peek, and clear helpers from ghes.c into
> ghes_cper.c so they can be shared with other firmware-first providers.
> This commit only shuffles code; behavior stays the same.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> ---
> drivers/acpi/apei/ghes.c | 170 +-----------------------------------------
> drivers/acpi/apei/ghes_cper.c | 170 +++++++++++++++++++++++++++++++++++++++++-
> include/acpi/ghes_cper.h | 14 ++--
> 3 files changed, 177 insertions(+), 177 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 07b70bcb8342..b159dbee90ac 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -118,26 +118,6 @@ static struct gen_pool *ghes_estatus_pool;
> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> static atomic_t ghes_estatus_cache_alloced;
>
> -static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
> -{
> - phys_addr_t paddr;
> - pgprot_t prot;
> -
> - paddr = PFN_PHYS(pfn);
> - prot = arch_apei_get_mem_attribute(paddr);
> - __set_fixmap(fixmap_idx, paddr, prot);
> -
> - return (void __iomem *) __fix_to_virt(fixmap_idx);
> -}
> -
> -static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
> -{
> - int _idx = virt_to_fix((unsigned long)vaddr);
> -
> - WARN_ON_ONCE(fixmap_idx != _idx);
> - clear_fixmap(fixmap_idx);
> -}
> -
> int ghes_estatus_pool_init(unsigned int num_ghes)
> {
> unsigned long addr, len;
> @@ -193,22 +173,7 @@ static void unmap_gen_v2(struct ghes *ghes)
> apei_unmap_generic_address(&ghes->generic_v2->read_ack_register);
> }
>
> -static void ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
> -{
> - int rc;
> - u64 val = 0;
> -
> - rc = apei_read(&val, &gv2->read_ack_register);
> - if (rc)
> - return;
> -
> - val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
> - val |= gv2->read_ack_write << gv2->read_ack_register.bit_offset;
> -
> - apei_write(val, &gv2->read_ack_register);
> -}
> -
> -static struct ghes *ghes_new(struct acpi_hest_generic *generic)
> +struct ghes *ghes_new(struct acpi_hest_generic *generic)
> {
> struct ghes *ghes;
> unsigned int error_block_length;
> @@ -255,7 +220,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
> return ERR_PTR(rc);
> }
>
> -static void ghes_fini(struct ghes *ghes)
> +void ghes_fini(struct ghes *ghes)
> {
> kfree(ghes->estatus);
> apei_unmap_generic_address(&ghes->generic->error_status_address);
> @@ -280,137 +245,6 @@ static inline int ghes_severity(int severity)
> }
> }
Can it be "ghes_finish"? We already have "creat" without 'e'.
>
> -static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
> - int from_phys,
> - enum fixed_addresses fixmap_idx)
> -{
> - void __iomem *vaddr;
> - u64 offset;
> - u32 trunk;
> -
> - while (len > 0) {
> - offset = paddr - (paddr & PAGE_MASK);
> - vaddr = ghes_map(PHYS_PFN(paddr), fixmap_idx);
> - trunk = PAGE_SIZE - offset;
> - trunk = min(trunk, len);
> - if (from_phys)
> - memcpy_fromio(buffer, vaddr + offset, trunk);
> - else
> - memcpy_toio(vaddr + offset, buffer, trunk);
> - len -= trunk;
> - paddr += trunk;
> - buffer += trunk;
> - ghes_unmap(vaddr, fixmap_idx);
> - }
> -}
> -
> -/* Check the top-level record header has an appropriate size. */
> -static int __ghes_check_estatus(struct ghes *ghes,
> - struct acpi_hest_generic_status *estatus)
> -{
> - u32 len = cper_estatus_len(estatus);
> - u32 max_len = min(ghes->generic->error_block_length,
> - ghes->estatus_length);
> -
> - if (len < sizeof(*estatus)) {
> - pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
> - return -EIO;
> - }
> -
> - if (!len || len > max_len) {
> - pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
> - return -EIO;
> - }
> -
> - if (cper_estatus_check_header(estatus)) {
> - pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid CPER header!\n");
> - return -EIO;
> - }
> -
> - return 0;
> -}
> -
> -/* Read the CPER block, returning its address, and header in estatus. */
> -static int __ghes_peek_estatus(struct ghes *ghes,
> - struct acpi_hest_generic_status *estatus,
> - u64 *buf_paddr, enum fixed_addresses fixmap_idx)
> -{
> - struct acpi_hest_generic *g = ghes->generic;
> - int rc;
> -
> - rc = apei_read(buf_paddr, &g->error_status_address);
> - if (rc) {
> - *buf_paddr = 0;
> - pr_warn_ratelimited(FW_WARN GHES_PFX
> -"Failed to read error status block address for hardware error source: %d.\n",
> - g->header.source_id);
> - return -EIO;
> - }
> - if (!*buf_paddr)
> - return -ENOENT;
> -
> - ghes_copy_tofrom_phys(estatus, *buf_paddr, sizeof(*estatus), 1,
> - fixmap_idx);
> - if (!estatus->block_status) {
> - *buf_paddr = 0;
> - return -ENOENT;
> - }
> -
> - return 0;
> -}
> -
> -static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> - u64 buf_paddr, enum fixed_addresses fixmap_idx,
> - size_t buf_len)
> -{
> - ghes_copy_tofrom_phys(estatus, buf_paddr, buf_len, 1, fixmap_idx);
> - if (cper_estatus_check(estatus)) {
> - pr_warn_ratelimited(FW_WARN GHES_PFX
> - "Failed to read error status block!\n");
> - return -EIO;
> - }
> -
> - return 0;
> -}
> -
> -static int ghes_read_estatus(struct ghes *ghes,
> - struct acpi_hest_generic_status *estatus,
> - u64 *buf_paddr, enum fixed_addresses fixmap_idx)
> -{
> - int rc;
> -
> - rc = __ghes_peek_estatus(ghes, estatus, buf_paddr, fixmap_idx);
> - if (rc)
> - return rc;
> -
> - rc = __ghes_check_estatus(ghes, estatus);
> - if (rc)
> - return rc;
> -
> - return __ghes_read_estatus(estatus, *buf_paddr, fixmap_idx,
> - cper_estatus_len(estatus));
> -}
> -
> -static void ghes_clear_estatus(struct ghes *ghes,
> - struct acpi_hest_generic_status *estatus,
> - u64 buf_paddr, enum fixed_addresses fixmap_idx)
> -{
> - estatus->block_status = 0;
> -
> - if (!buf_paddr)
> - return;
> -
> - ghes_copy_tofrom_phys(estatus, buf_paddr,
> - sizeof(estatus->block_status), 0,
> - fixmap_idx);
> -
> - /*
> - * GHESv2 type HEST entries introduce support for error acknowledgment,
> - * so only acknowledge the error if this support is present.
> - */
> - if (is_hest_type_generic_v2(ghes))
> - ghes_ack_error(ghes->generic_v2);
> -}
>
> /**
> * struct ghes_task_work - for synchronous RAS event
> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
> index 63047322a3d9..7e0015e960c1 100644
> --- a/drivers/acpi/apei/ghes_cper.c
> +++ b/drivers/acpi/apei/ghes_cper.c
IMO, just "cper.c" would be fine.
> @@ -1,7 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> /*
> *
> - * APEI GHES CPER helper translation unit - staging file for helper moves
> + * APEI GHES CPER helper translation unit - code mechanically moved from ghes.c
> *
> * Copyright (C) 2026 ARM Ltd.
> * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> @@ -17,10 +17,176 @@
> #include <linux/slab.h>
>
> #include <acpi/apei.h>
> +#include <acpi/ghes_cper.h>
>
> #include <asm/fixmap.h>
> #include <asm/tlbflush.h>
>
> #include "apei-internal.h"
>
> -/* Helper bodies will be moved here in follow-up commits. */
> +static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
> +{
> + phys_addr_t paddr;
> + pgprot_t prot;
> +
> + paddr = PFN_PHYS(pfn);
> + prot = arch_apei_get_mem_attribute(paddr);
> + __set_fixmap(fixmap_idx, paddr, prot);
> +
> + return (void __iomem *) __fix_to_virt(fixmap_idx);
> +}
> +
> +static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
> +{
> + int _idx = virt_to_fix((unsigned long)vaddr);
> +
> + WARN_ON_ONCE(fixmap_idx != _idx);
> + clear_fixmap(fixmap_idx);
> +}
> +
> +static void ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
> +{
> + int rc;
> + u64 val = 0;
> +
> + rc = apei_read(&val, &gv2->read_ack_register);
> + if (rc)
> + return;
> +
> + val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
> + val |= gv2->read_ack_write << gv2->read_ack_register.bit_offset;
> +
> + apei_write(val, &gv2->read_ack_register);
> +}
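For anyone reading along, the read-modify-write in ghes_ack_error() can be modelled in userspace as below. This is only a sketch of the arithmetic; ack_value() is a hypothetical name and the real code goes through apei_read() / apei_write().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the value written back to the ack register: keep the bits
 * selected by 'preserve', then OR in 'write', both shifted up to the
 * register's bit offset.
 */
static uint64_t ack_value(uint64_t cur, uint64_t preserve, uint64_t write,
			  unsigned int bit_offset)
{
	assert(bit_offset < 64);

	cur &= preserve << bit_offset;
	cur |= write << bit_offset;

	return cur;
}
```

Note that with a non-zero bit offset the shifted preserve mask has zeros in its low bits, so anything below the offset is cleared as well.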
> +
> +static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
> + int from_phys,
> + enum fixed_addresses fixmap_idx)
> +{
> + void __iomem *vaddr;
> + u64 offset;
> + u32 trunk;
> +
> + while (len > 0) {
> + offset = paddr - (paddr & PAGE_MASK);
> + vaddr = ghes_map(PHYS_PFN(paddr), fixmap_idx);
> + trunk = PAGE_SIZE - offset;
> + trunk = min(trunk, len);
> + if (from_phys)
> + memcpy_fromio(buffer, vaddr + offset, trunk);
> + else
> + memcpy_toio(vaddr + offset, buffer, trunk);
> + len -= trunk;
> + paddr += trunk;
> + buffer += trunk;
> + ghes_unmap(vaddr, fixmap_idx);
> + }
> +}
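The loop above maps one page at a time, so a copy that crosses a page boundary is split into per-page chunks. A small userspace sketch of just that splitting logic follows, assuming 4 KiB pages; split_into_page_chunks() and the SKETCH_* macros are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/*
 * Split [paddr, paddr + len) into per-page chunks and store each chunk
 * length in out[]. Returns the number of chunks, or -1 if out[] is too
 * small to hold them all.
 */
static int split_into_page_chunks(uint64_t paddr, uint32_t len,
				  uint32_t *out, size_t out_cap)
{
	size_t n = 0;

	assert(out != NULL);

	while (len > 0) {
		/* Offset of paddr within its page. */
		uint64_t offset = paddr - (paddr & SKETCH_PAGE_MASK);
		/* Bytes left in this page, capped by what remains to copy. */
		uint32_t chunk = SKETCH_PAGE_SIZE - (uint32_t)offset;

		if (chunk > len)
			chunk = len;
		if (n == out_cap)
			return -1;

		out[n++] = chunk;
		len -= chunk;
		paddr += chunk;
	}

	return (int)n;
}
```

In the kernel version each chunk corresponds to one ghes_map() / ghes_unmap() pair on the given fixmap slot.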
> +
> +/* Check the top-level record header has an appropriate size. */
> +int __ghes_check_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus)
> +{
> + u32 len = cper_estatus_len(estatus);
> + u32 max_len = min(ghes->generic->error_block_length,
> + ghes->estatus_length);
> +
> + if (len < sizeof(*estatus)) {
> + pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
> + return -EIO;
> + }
> +
> + if (!len || len > max_len) {
> + pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
> + return -EIO;
> + }
> +
> + if (cper_estatus_check_header(estatus)) {
> + pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid CPER header!\n");
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +/* Read the CPER block, returning its address, and header in estatus. */
> +int __ghes_peek_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 *buf_paddr, enum fixed_addresses fixmap_idx)
> +{
> + struct acpi_hest_generic *g = ghes->generic;
> + int rc;
> +
> + rc = apei_read(buf_paddr, &g->error_status_address);
> + if (rc) {
> + *buf_paddr = 0;
> + pr_warn_ratelimited(FW_WARN GHES_PFX
> +"Failed to read error status block address for hardware error source: %d.\n",
> + g->header.source_id);
> + return -EIO;
> + }
> + if (!*buf_paddr)
> + return -ENOENT;
> +
> + ghes_copy_tofrom_phys(estatus, *buf_paddr, sizeof(*estatus), 1,
> + fixmap_idx);
> + if (!estatus->block_status) {
> + *buf_paddr = 0;
> + return -ENOENT;
> + }
> +
> + return 0;
> +}
> +
> +int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> + u64 buf_paddr, enum fixed_addresses fixmap_idx,
> + size_t buf_len)
> +{
> + ghes_copy_tofrom_phys(estatus, buf_paddr, buf_len, 1, fixmap_idx);
> + if (cper_estatus_check(estatus)) {
> + pr_warn_ratelimited(FW_WARN GHES_PFX
> + "Failed to read error status block!\n");
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +int ghes_read_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 *buf_paddr, enum fixed_addresses fixmap_idx)
> +{
> + int rc;
> +
> + rc = __ghes_peek_estatus(ghes, estatus, buf_paddr, fixmap_idx);
> + if (rc)
> + return rc;
> +
> + rc = __ghes_check_estatus(ghes, estatus);
> + if (rc)
> + return rc;
> +
> + return __ghes_read_estatus(estatus, *buf_paddr, fixmap_idx,
> + cper_estatus_len(estatus));
> +}
> +
> +void ghes_clear_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 buf_paddr, enum fixed_addresses fixmap_idx)
> +{
> + estatus->block_status = 0;
> +
> + if (!buf_paddr)
> + return;
> +
> + ghes_copy_tofrom_phys(estatus, buf_paddr,
> + sizeof(estatus->block_status), 0,
> + fixmap_idx);
> +
> + /*
> + * GHESv2 type HEST entries introduce support for error acknowledgment,
> + * so only acknowledge the error if this support is present.
> + */
> + if (is_hest_type_generic_v2(ghes))
> + ghes_ack_error(ghes->generic_v2);
> +}
> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
> index 2597fbadc4f3..2e3919f0c3e7 100644
> --- a/include/acpi/ghes_cper.h
> +++ b/include/acpi/ghes_cper.h
> @@ -74,21 +74,21 @@ struct ghes_vendor_record_entry {
> char vendor_record[];
> };
>
ditto. "include/acpi/cper.h"
> -static struct ghes *ghes_new(struct acpi_hest_generic *generic);
> -static void ghes_fini(struct ghes *ghes);
> +struct ghes *ghes_new(struct acpi_hest_generic *generic);
> +void ghes_fini(struct ghes *ghes);
>
> -static int ghes_read_estatus(struct ghes *ghes,
> +int ghes_read_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus,
> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> -static void ghes_clear_estatus(struct ghes *ghes,
> +void ghes_clear_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus,
> u64 buf_paddr, enum fixed_addresses fixmap_idx);
> -static int __ghes_peek_estatus(struct ghes *ghes,
> +int __ghes_peek_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus,
> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> -static int __ghes_check_estatus(struct ghes *ghes,
> +int __ghes_check_estatus(struct ghes *ghes,
> struct acpi_hest_generic_status *estatus);
> -static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> +int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> u64 buf_paddr, enum fixed_addresses fixmap_idx,
> size_t buf_len);
>
>
> --
> 2.43.0
>
>
* Re: [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
2026-02-24 15:22 ` Jonathan Cameron
@ 2026-02-26 6:44 ` Himanshu Chauhan
2026-03-11 11:55 ` Ahmed Tiba
1 sibling, 1 reply; 39+ messages in thread
From: Himanshu Chauhan @ 2026-02-26 6:44 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, Feb 20, 2026 at 7:13 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
> Carve the CPER helper macros out of ghes.c and place them in a private
> header so they can be shared with upcoming helper files. This is a
> mechanical include change with no functional differences.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> ---
> drivers/acpi/apei/ghes.c | 60 +-----------------------------
> include/acpi/ghes_cper.h | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 96 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index f96aede5d9a3..07b70bcb8342 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -49,6 +49,7 @@
>
> #include <acpi/actbl1.h>
> #include <acpi/ghes.h>
> +#include <acpi/ghes_cper.h>
> #include <acpi/apei.h>
> #include <asm/fixmap.h>
> #include <asm/tlbflush.h>
> @@ -57,40 +58,6 @@
>
> #include "apei-internal.h"
>
> -#define GHES_PFX "GHES: "
> -
> -#define GHES_ESTATUS_MAX_SIZE 65536
> -#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
> -
> -#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
> -
> -/* This is just an estimation for memory pool allocation */
> -#define GHES_ESTATUS_CACHE_AVG_SIZE 512
> -
> -#define GHES_ESTATUS_CACHES_SIZE 4
> -
> -#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
> -/* Prevent too many caches are allocated because of RCU */
> -#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
> -
> -#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
> - (sizeof(struct ghes_estatus_cache) + (estatus_len))
> -#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
> - ((struct acpi_hest_generic_status *) \
> - ((struct ghes_estatus_cache *)(estatus_cache) + 1))
> -
> -#define GHES_ESTATUS_NODE_LEN(estatus_len) \
> - (sizeof(struct ghes_estatus_node) + (estatus_len))
> -#define GHES_ESTATUS_FROM_NODE(estatus_node) \
> - ((struct acpi_hest_generic_status *) \
> - ((struct ghes_estatus_node *)(estatus_node) + 1))
> -
> -#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
> - (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
> -#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
> - ((struct acpi_hest_generic_data *) \
> - ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
> -
> /*
> * NMI-like notifications vary by architecture, before the compiler can prune
> * unused static functions it needs a value for these enums.
> @@ -102,25 +69,6 @@
>
> static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
>
> -static inline bool is_hest_type_generic_v2(struct ghes *ghes)
> -{
> - return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
> -}
> -
> -/*
> - * A platform may describe one error source for the handling of synchronous
> - * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
> - * or External Interrupt). On x86, the HEST notifications are always
> - * asynchronous, so only SEA on ARM is delivered as a synchronous
> - * notification.
> - */
> -static inline bool is_hest_sync_notify(struct ghes *ghes)
> -{
> - u8 notify_type = ghes->generic->notify.type;
> -
> - return notify_type == ACPI_HEST_NOTIFY_SEA;
> -}
None of this has anything to do with CPER, which is defined in UEFI. All of
this is part of the GHES structure defined in ACPI. Why are these
being moved to ghes_cper.h?
It blurs the demarcation. If you are carving out CPER
helpers, please don't move GHES helpers. The better place to move
these helpers is ghes.h; otherwise they are good where they are.
> -
> /*
> * This driver isn't really modular, however for the time being,
> * continuing to use module_param is the easiest way to remain
> @@ -165,12 +113,6 @@ static DEFINE_MUTEX(ghes_devs_mutex);
> */
> static DEFINE_SPINLOCK(ghes_notify_lock_irq);
>
> -struct ghes_vendor_record_entry {
> - struct work_struct work;
> - int error_severity;
> - char vendor_record[];
> -};
> -
> static struct gen_pool *ghes_estatus_pool;
>
> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
> new file mode 100644
> index 000000000000..2597fbadc4f3
> --- /dev/null
> +++ b/include/acpi/ghes_cper.h
> @@ -0,0 +1,95 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * APEI Generic Hardware Error Source: CPER Helper
> + *
> + * Copyright (C) 2026 ARM Ltd.
> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> + * Based on ACPI APEI GHES driver.
> + *
> + */
> +
> +#ifndef ACPI_APEI_GHES_CPER_H
> +#define ACPI_APEI_GHES_CPER_H
> +
> +#include <linux/workqueue.h>
> +
> +#include <acpi/ghes.h>
> +
> +#define GHES_PFX "GHES: "
> +
> +#define GHES_ESTATUS_MAX_SIZE 65536
> +#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
> +
> +#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
> +
> +/* This is just an estimation for memory pool allocation */
> +#define GHES_ESTATUS_CACHE_AVG_SIZE 512
> +
> +#define GHES_ESTATUS_CACHES_SIZE 4
> +
> +#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
> +/* Prevent too many caches are allocated because of RCU */
> +#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
> +
> +#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
> + (sizeof(struct ghes_estatus_cache) + (estatus_len))
> +#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
> + ((struct acpi_hest_generic_status *) \
> + ((struct ghes_estatus_cache *)(estatus_cache) + 1))
> +
> +#define GHES_ESTATUS_NODE_LEN(estatus_len) \
> + (sizeof(struct ghes_estatus_node) + (estatus_len))
> +#define GHES_ESTATUS_FROM_NODE(estatus_node) \
> + ((struct acpi_hest_generic_status *) \
> + ((struct ghes_estatus_node *)(estatus_node) + 1))
> +
> +#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
> + (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
> +#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
> + ((struct acpi_hest_generic_data *) \
> + ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
> +
> +static inline bool is_hest_type_generic_v2(struct ghes *ghes)
> +{
> + return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
> +}
> +
> +/*
> + * A platform may describe one error source for the handling of synchronous
> + * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
> + * or External Interrupt). On x86, the HEST notifications are always
> + * asynchronous, so only SEA on ARM is delivered as a synchronous
> + * notification.
> + */
> +static inline bool is_hest_sync_notify(struct ghes *ghes)
> +{
> + u8 notify_type = ghes->generic->notify.type;
> +
> + return notify_type == ACPI_HEST_NOTIFY_SEA;
> +}
> +
> +struct ghes_vendor_record_entry {
> + struct work_struct work;
> + int error_severity;
> + char vendor_record[];
> +};
> +
> +static struct ghes *ghes_new(struct acpi_hest_generic *generic);
> +static void ghes_fini(struct ghes *ghes);
> +
> +static int ghes_read_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> +static void ghes_clear_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 buf_paddr, enum fixed_addresses fixmap_idx);
> +static int __ghes_peek_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus,
> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
> +static int __ghes_check_estatus(struct ghes *ghes,
> + struct acpi_hest_generic_status *estatus);
> +static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
> + u64 buf_paddr, enum fixed_addresses fixmap_idx,
> + size_t buf_len);
> +
> +#endif /* ACPI_APEI_GHES_CPER_H */
>
> --
> 2.43.0
>
>
* Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
2026-02-21 9:06 ` Krzysztof Kozlowski
2026-02-24 15:55 ` Jonathan Cameron
@ 2026-02-26 7:01 ` Himanshu Chauhan
2 siblings, 0 replies; 39+ messages in thread
From: Himanshu Chauhan @ 2026-02-26 7:01 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, Feb 20, 2026 at 7:13 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
> Add a DeviceTree firmware-first CPER provider that reuses the shared
> GHES helpers, wire it into the RAS Kconfig/Makefile and document it in
> the admin guide. Update MAINTAINERS now that the driver exists.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> ---
> Documentation/admin-guide/RAS/main.rst | 18 +++
> MAINTAINERS | 1 +
> drivers/acpi/apei/apei-internal.h | 10 +-
> drivers/acpi/apei/ghes_cper.c | 2 +
> drivers/ras/Kconfig | 12 ++
> drivers/ras/Makefile | 1 +
> drivers/ras/esource-dt.c | 264 +++++++++++++++++++++++++++++++++
> include/acpi/ghes_cper.h | 9 ++
> 8 files changed, 308 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
> index 5a45db32c49b..4ffabaaeabb1 100644
> --- a/Documentation/admin-guide/RAS/main.rst
> +++ b/Documentation/admin-guide/RAS/main.rst
> @@ -205,6 +205,24 @@ Architecture (MCA)\ [#f3]_.
> .. [#f3] For more details about the Machine Check Architecture (MCA),
> please read Documentation/arch/x86/x86_64/machinecheck.rst at the Kernel tree.
>
> +Firmware-first CPER via DeviceTree
> +----------------------------------
> +
> +Some systems expose Common Platform Error Record (CPER) data
> +via DeviceTree instead of ACPI HEST tables.
> +Enable ``CONFIG_RAS_ESOURCE_DT`` to build the ``drivers/ras/esource-dt.c``
> +driver and describe the CPER error source buffer with the
> +``Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml`` binding.
> +The driver reuses the GHES CPER helper object in
> +``drivers/acpi/apei/ghes_cper.c`` so the logging, notifier chains, and
> +memory failure handling match the ACPI GHES behaviour even when
> +ACPI is disabled.
> +
> +Once a platform describes a firmware-first provider, both ACPI GHES and the
> +DeviceTree driver reuse the same code paths. This keeps the behaviour
> +consistent regardless of whether the error source is described via ACPI
> +tables or DeviceTree.
> +
> EDAC - Error Detection And Correction
> *************************************
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 47db7877b485..fa6113b482b7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22031,6 +22031,7 @@ RAS ERROR STATUS
> M: Ahmed Tiba <ahmed.tiba@arm.com>
> S: Maintained
> F: Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
> +F: drivers/ras/esource-dt.c
>
> RAS INFRASTRUCTURE
> M: Tony Luck <tony.luck@intel.com>
> diff --git a/drivers/acpi/apei/apei-internal.h b/drivers/acpi/apei/apei-internal.h
> index 77c10a7a7a9f..c16ac541f15b 100644
> --- a/drivers/acpi/apei/apei-internal.h
> +++ b/drivers/acpi/apei/apei-internal.h
> @@ -8,6 +8,7 @@
> #define APEI_INTERNAL_H
>
> #include <linux/acpi.h>
> +#include <acpi/ghes_cper.h>
>
> struct apei_exec_context;
>
> @@ -120,15 +121,6 @@ int apei_exec_collect_resources(struct apei_exec_context *ctx,
> struct dentry;
> struct dentry *apei_get_debugfs_dir(void);
>
> -static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
> -{
> - if (estatus->raw_data_length)
> - return estatus->raw_data_offset + \
> - estatus->raw_data_length;
> - else
> - return sizeof(*estatus) + estatus->data_length;
> -}
> -
> int apei_osc_setup(void);
>
> int einj_get_available_error_type(u32 *type, int einj_action);
> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
> index 29b790160e91..9b2d1b8cf9f4 100644
> --- a/drivers/acpi/apei/ghes_cper.c
> +++ b/drivers/acpi/apei/ghes_cper.c
> @@ -42,7 +42,9 @@
> #include <asm/fixmap.h>
> #include <asm/tlbflush.h>
>
> +#ifdef CONFIG_ACPI_APEI
> #include "apei-internal.h"
> +#endif
>
> ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
>
> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c..ea6d96713020 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -34,6 +34,18 @@ if RAS
> source "arch/x86/ras/Kconfig"
> source "drivers/ras/amd/atl/Kconfig"
>
> +config RAS_ESOURCE_DT
> + bool "DeviceTree firmware-first CPER error source block provider"
> + depends on OF
> + depends on ARM64
> + select GHES_CPER_HELPERS
> + help
> + Enable support for firmware-first Common Platform Error Record (CPER)
> + error source block providers that are described via DeviceTree
> + instead of ACPI HEST tables. The driver reuses the existing GHES
> + CPER helpers so the error processing matches the ACPI code paths,
> + but it can be built even when ACPI is disabled.
> +
> config RAS_FMPM
> tristate "FRU Memory Poison Manager"
> default m
> diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
> index 11f95d59d397..53558a1707b3 100644
> --- a/drivers/ras/Makefile
> +++ b/drivers/ras/Makefile
> @@ -2,6 +2,7 @@
> obj-$(CONFIG_RAS) += ras.o
> obj-$(CONFIG_DEBUG_FS) += debugfs.o
> obj-$(CONFIG_RAS_CEC) += cec.o
> +obj-$(CONFIG_RAS_ESOURCE_DT) += esource-dt.o
>
> obj-$(CONFIG_RAS_FMPM) += amd/fmpm.o
> obj-y += amd/atl/
> diff --git a/drivers/ras/esource-dt.c b/drivers/ras/esource-dt.c
> new file mode 100644
> index 000000000000..b575a2258536
> --- /dev/null
> +++ b/drivers/ras/esource-dt.c
> @@ -0,0 +1,264 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * DeviceTree provider for firmware-first CPER error source block.
> + *
> + * This driver shares the GHES CPER helpers so we keep the reporting and
> + * notifier behaviour identical to ACPI GHES
> + *
> + * Copyright (C) 2025 ARM Ltd.
> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/module.h>
> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
> +#include <linux/panic.h>
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +#include <acpi/ghes.h>
> +#include <acpi/ghes_cper.h>
> +
> +static atomic_t ghes_ffh_source_ids = ATOMIC_INIT(0);
> +
> +struct ghes_ffh_ack {
> + void __iomem *addr;
> + u64 preserve;
> + u64 set;
> + u8 width;
> + bool present;
> +};
Please don't use "ffh" here. In ACPI, FFH stands for Functional Fixed
Hardware, so this name is confusing. Per the ACPI specification, FFH can
be used for register reads/writes while handling errors.
I am starting to feel that all this churn should be avoided. All of the
GHES code is also being moved in the name of CPER helpers.
> +
> +struct ghes_ffh {
> + struct device *dev;
> + void __iomem *status;
> + size_t status_len;
> +
> + struct ghes_ffh_ack ack;
> +
> + struct acpi_hest_generic *generic;
> + struct acpi_hest_generic_status *estatus;
> +
> + bool sync;
> + int irq;
> +
> + /* Serializes access to the firmware-owned buffer. */
> + spinlock_t lock;
> +};
> +
> +static int ghes_ffh_init_pool(void)
> +{
> + if (ghes_estatus_pool)
> + return 0;
> +
> + return ghes_estatus_pool_init(1);
> +}
> +
> +static int ghes_ffh_copy_status(struct ghes_ffh *ctx)
> +{
> + memcpy_fromio(ctx->estatus, ctx->status, ctx->status_len);
> + return 0;
> +}
> +
> +static void ghes_ffh_ack(struct ghes_ffh *ctx)
> +{
> + u64 val;
> +
> + if (!ctx->ack.present)
> + return;
> +
> + if (ctx->ack.width == 64) {
> + val = readq(ctx->ack.addr);
> + val &= ctx->ack.preserve;
> + val |= ctx->ack.set;
> + writeq(val, ctx->ack.addr);
> + } else {
> + val = readl(ctx->ack.addr);
> + val &= (u32)ctx->ack.preserve;
> + val |= (u32)ctx->ack.set;
> + writel(val, ctx->ack.addr);
> + }
> +}
> +
> +static void ghes_ffh_fatal(struct ghes_ffh *ctx)
> +{
> + __ghes_print_estatus(KERN_EMERG, ctx->generic, ctx->estatus);
> + add_taint(TAINT_MACHINE_CHECK, LOCKDEP_STILL_OK);
> + panic("GHES: fatal firmware-first CPER record from %s\n",
> + dev_name(ctx->dev));
> +}
> +
> +static void ghes_ffh_process(struct ghes_ffh *ctx)
> +{
> + unsigned long flags;
> + int sev;
> +
> + spin_lock_irqsave(&ctx->lock, flags);
> +
> + if (ghes_ffh_copy_status(ctx))
> + goto out;
> +
> + sev = ghes_severity(ctx->estatus->error_severity);
> + if (sev >= GHES_SEV_PANIC)
> + ghes_ffh_fatal(ctx);
> +
> + if (!ghes_estatus_cached(ctx->estatus)) {
> + if (ghes_print_estatus(NULL, ctx->generic, ctx->estatus))
> + ghes_estatus_cache_add(ctx->generic, ctx->estatus);
> + }
> +
> + ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);
> +
> + ghes_ffh_ack(ctx);
> +
> +out:
> + spin_unlock_irqrestore(&ctx->lock, flags);
> +}
> +
> +static irqreturn_t ghes_ffh_irq(int irq, void *data)
> +{
> + struct ghes_ffh *ctx = data;
> +
> + ghes_ffh_process(ctx);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int ghes_ffh_init_ack(struct platform_device *pdev,
> + struct ghes_ffh *ctx)
> +{
> + struct resource *res;
> + size_t size;
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> + if (!res)
> + return 0;
> +
> + ctx->ack.addr = devm_ioremap_resource(&pdev->dev, res);
> + if (IS_ERR(ctx->ack.addr))
> + return PTR_ERR(ctx->ack.addr);
> +
> + size = resource_size(res);
> + switch (size) {
> + case 4:
> + ctx->ack.width = 32;
> + ctx->ack.preserve = ~0U;
> + break;
> + case 8:
> + ctx->ack.width = 64;
> + ctx->ack.preserve = ~0ULL;
> + break;
> + default:
> + dev_err(&pdev->dev, "Unsupported ack resource size %zu\n", size);
> + return -EINVAL;
> + }
> +
> + ctx->ack.set = BIT_ULL(0);
> + ctx->ack.present = true;
> + return 0;
> +}
> +
> +static int ghes_ffh_probe(struct platform_device *pdev)
> +{
> + struct ghes_ffh *ctx;
> + struct resource *res;
> + int rc;
> +
> + ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
> + if (!ctx)
> + return -ENOMEM;
> +
> + spin_lock_init(&ctx->lock);
> + ctx->dev = &pdev->dev;
> + ctx->sync = of_property_read_bool(pdev->dev.of_node, "arm,sea-notify");
> +
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + if (!res) {
> + dev_err(&pdev->dev, "status region missing\n");
> + return -EINVAL;
> + }
> +
> + ctx->status_len = resource_size(res);
> + if (!ctx->status_len) {
> + dev_err(&pdev->dev, "Status region has zero length\n");
> + return -EINVAL;
> + }
> +
> + ctx->status = devm_ioremap_resource(&pdev->dev, res);
> + if (IS_ERR(ctx->status))
> + return PTR_ERR(ctx->status);
> +
> + rc = ghes_ffh_init_ack(pdev, ctx);
> + if (rc)
> + return rc;
> +
> + rc = ghes_ffh_init_pool();
> + if (rc)
> + return rc;
> +
> + ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
> + if (!ctx->estatus)
> + return -ENOMEM;
> +
> + ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
> + if (!ctx->generic)
> + return -ENOMEM;
> +
> + ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
> + ctx->generic->header.source_id =
> + atomic_inc_return(&ghes_ffh_source_ids);
> + ctx->generic->notify.type = ctx->sync ?
> + ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
> + ctx->generic->error_block_length = ctx->status_len;
> +
> + ctx->irq = platform_get_irq_optional(pdev, 0);
> + if (ctx->irq <= 0) {
> + if (ctx->irq == -EPROBE_DEFER)
> + return ctx->irq;
> + dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
> + return -EINVAL;
> + }
> +
> + rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
> + NULL, ghes_ffh_irq,
> + IRQF_ONESHOT,
> + dev_name(&pdev->dev), ctx);
> + if (rc)
> + return rc;
> +
> + platform_set_drvdata(pdev, ctx);
> + dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");
> + return 0;
> +}
> +
> +static void ghes_ffh_remove(struct platform_device *pdev)
> +{
> +}
> +
> +static const struct of_device_id ghes_ffh_of_match[] = {
> + { .compatible = "arm,ras-ffh" },
> + { /* sentinel */ }
> +};
> +MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
> +
> +static struct platform_driver ghes_ffh_driver = {
> + .driver = {
> + .name = "esource-dt",
> + .of_match_table = ghes_ffh_of_match,
> + },
> + .probe = ghes_ffh_probe,
> + .remove = ghes_ffh_remove,
> +};
> +
> +module_platform_driver(ghes_ffh_driver);
> +
> +MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
> +MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
> +MODULE_LICENSE("GPL");
> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
> index f7c9fba62585..d43185c020ee 100644
> --- a/include/acpi/ghes_cper.h
> +++ b/include/acpi/ghes_cper.h
> @@ -75,6 +75,15 @@ static inline bool is_hest_sync_notify(struct ghes *ghes)
> return notify_type == ACPI_HEST_NOTIFY_SEA;
> }
>
> +static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
> +{
> + if (estatus->raw_data_length)
> + return estatus->raw_data_offset + \
> + estatus->raw_data_length;
> + else
> + return sizeof(*estatus) + estatus->data_length;
> +}
> +
> struct ghes_vendor_record_entry {
> struct work_struct work;
> int error_severity;
>
> --
> 2.43.0
>
>
* Re: [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh
2026-02-20 13:42 ` [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh Ahmed Tiba
@ 2026-02-26 7:03 ` Himanshu Chauhan
2026-03-11 13:41 ` Ahmed Tiba
0 siblings, 1 reply; 39+ messages in thread
From: Himanshu Chauhan @ 2026-02-26 7:03 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, Feb 20, 2026 at 7:15 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
> Describe the DeviceTree node that exposes the Arm firmware-first handler
> CPER provider and hook the file into MAINTAINERS so the binding has an
> owner.
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> ---
> .../devicetree/bindings/firmware/arm,ras-ffh.yaml | 71 ++++++++++++++++++++++
> MAINTAINERS | 5 ++
> 2 files changed, 76 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml b/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
> new file mode 100644
> index 000000000000..eccbaaf45885
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
> @@ -0,0 +1,71 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/firmware/arm,ras-ffh.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Arm Firmware-First Handler (FFH) CPER provider
Please don't call it FFH. In ACPI, FFH stands for Functional Fixed
Hardware, and ACPI uses it in multiple places, so the name causes
confusion.
> +
> +maintainers:
> + - Ahmed Tiba <ahmed.tiba@arm.com>
> +
> +description: |
> + Arm Reliability, Availability and Serviceability (RAS) firmware can expose
> + a firmware-first handler (FFH) that provides UEFI CPER Generic Error Status
> + blocks directly via DeviceTree. The firmware owns the CPER buffer
> + and notifies the OS through an interrupt.
> +
> +properties:
> + compatible:
> + const: arm,ras-ffh
> +
> + reg:
> + minItems: 1
> + items:
> + - description:
> + CPER Generic Error Status block exposed by firmware
> + - description:
> + Optional 32- or 64-bit doorbell register used on platforms
> + where firmware needs an explicit "ack" handshake before overwriting
> + the CPER buffer. Firmware watches bit 0 and expects the OS to set it
> + once the current status block has been consumed.
> +
> + interrupts:
> + maxItems: 1
> + description:
> + Interrupt used to signal that a new status record is ready.
> +
> + memory-region:
> + $ref: /schemas/types.yaml#/definitions/phandle
> + description:
> + Optional phandle to the reserved-memory entry that backs the status
> + buffer so firmware and the OS use the same carved-out region.
> +
> +required:
> + - compatible
> + - reg
> + - interrupts
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + #include <dt-bindings/interrupt-controller/arm-gic.h>
> +
> + reserved-memory {
> + #address-cells = <2>;
> + #size-cells = <2>;
> + ras_cper_buffer: cper@fe800000 {
> + reg = <0x0 0xfe800000 0x0 0x1000>;
> + no-map;
> + };
> + };
> +
> + error-handler@fe800000 {
> + compatible = "arm,ras-ffh";
> + reg = <0xfe800000 0x1000>,
> + <0xfe810000 0x4>;
> + memory-region = <&ras_cper_buffer>;
> + interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>;
> + };
> +...
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b8d8a5c41597..47db7877b485 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22027,6 +22027,11 @@ M: Alexandre Bounine <alex.bou9@gmail.com>
> S: Maintained
> F: drivers/rapidio/
>
> +RAS ERROR STATUS
> +M: Ahmed Tiba <ahmed.tiba@arm.com>
> +S: Maintained
> +F: Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
> +
> RAS INFRASTRUCTURE
> M: Tony Luck <tony.luck@intel.com>
> M: Borislav Petkov <bp@alien8.de>
>
> --
> 2.43.0
>
>
* Re: [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
` (10 preceding siblings ...)
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
@ 2026-02-26 7:05 ` Himanshu Chauhan
2026-03-11 10:44 ` Ahmed Tiba
11 siblings, 1 reply; 39+ messages in thread
From: Himanshu Chauhan @ 2026-02-26 7:05 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On Fri, Feb 20, 2026 at 7:14 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
> This is v2 of the GHES refactor series. The goal is to reuse existing
> GHES CPER handling for non-ACPI platforms without changing the GHES
> flow or naming, and add a DT firmware-first CPER provider, while
> keeping the changes mechanical and reviewable.
It seems that almost all of the code is being moved from ghes.c to
ghes_cper.c across multiple patches. This does not make sense to me and
looks like unnecessary churn.
What is it that can't be handled in a separate file for non-ACPI platforms?
>
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> ---
> Changes in v2:
> - Dropped the proposed "estatus core" and kept GHES naming/flow intact
> (per Borislav Petkov).
> - Re-sliced the series into smaller mechanical steps (per Mauro Carvalho Chehab).
> - Minor DT binding fixes based on Krzysztof Kozlowski's feedback.
> - Removed fixmap slot usage from the DT FFH driver (per Will Deacon).
>
> Series structure:
> - Patches 1-8 are mechanical moves only and do not change behavior.
> - Patch 9 wires the shared helpers back into GHES.
> - The DT firmware-first CPER buffer provider is added in the final patches.
> - "ACPI: APEI: introduce GHES helper" is internal build glue only
> and does not introduce a new user-visible configuration option.
>
> - Link to v1: https://lore.kernel.org/r/20251217112845.1814119-1-ahmed.tiba@arm.com
>
> ---
> Ahmed Tiba (11):
> ACPI: APEI: GHES: share macros via a private header
> ACPI: APEI: GHES: add ghes_cper.o stub
> ACPI: APEI: GHES: move CPER read helpers
> ACPI: APEI: GHES: move GHESv2 ack and alloc helpers
> ACPI: APEI: GHES: move estatus cache helpers
> ACPI: APEI: GHES: move vendor record helpers
> ACPI: APEI: GHES: move CXL CPER helpers
> ACPI: APEI: introduce GHES helper
> ACPI: APEI: share GHES CPER helpers
> dt-bindings: firmware: add arm,ras-ffh
> RAS: add DeviceTree firmware-first CPER provider
>
> Documentation/admin-guide/RAS/main.rst | 18 +
> .../devicetree/bindings/firmware/arm,ras-ffh.yaml | 71 ++
> MAINTAINERS | 6 +
> drivers/Makefile | 1 +
> drivers/acpi/Kconfig | 4 +
> drivers/acpi/apei/Kconfig | 1 +
> drivers/acpi/apei/apei-internal.h | 10 +-
> drivers/acpi/apei/ghes.c | 1024 +------------------
> drivers/acpi/apei/ghes_cper.c | 1026 ++++++++++++++++++++
> drivers/ras/Kconfig | 12 +
> drivers/ras/Makefile | 1 +
> drivers/ras/esource-dt.c | 264 +++++
> include/acpi/ghes.h | 10 +-
> include/acpi/ghes_cper.h | 143 +++
> include/cxl/event.h | 2 +-
> 15 files changed, 1558 insertions(+), 1035 deletions(-)
> ---
> base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
> change-id: 20260220-topics-ahmtib01-ras_ffh_arm_internal_review-bfddc7fc7cab
>
> Best regards,
> --
> Ahmed Tiba <ahmed.tiba@arm.com>
>
>
* Re: [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider
2026-02-26 7:05 ` [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Himanshu Chauhan
@ 2026-03-11 10:44 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 10:44 UTC (permalink / raw)
To: Himanshu Chauhan
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On 26/02/2026 07:05, Himanshu Chauhan wrote:
> On Fri, Feb 20, 2026 at 7:14 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>>
>> This is v2 of the GHES refactor series. The goal is to reuse existing
>> GHES CPER handling for non-ACPI platforms without changing the GHES
>> flow or naming, and add a DT firmware-first CPER provider, while
>> keeping the changes mechanical and reviewable.
>
> It seems almost all the code is being moved from ghes.c to ghes_cper.c
> in multiple patches. It is not making sense and looks like an
> unnecessary churn.
> What is that which can't be handled in a separate file for non-ACPI platforms?
The intent is to reuse the existing GHES CPER parsing and reporting
logic for non-ACPI platforms without duplicating it. That requires
moving the shared CPER handling into a common helper file so that both
GHES and the DT provider call the same code.
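To make the intended shape concrete, here is a rough user-space sketch
(not kernel code). All model_* names are hypothetical stand-ins: in the
series the shared entry point is ghes_cper_handle_status(), and the two
front ends are the ACPI GHES notification paths and the DT interrupt
handler. The sketch only shows that both providers hand a status block
to one handler, so the interpretation logic exists once:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CPER generic error status block. */
struct model_estatus {
	uint32_t block_status;   /* non-zero when firmware has posted a record */
	uint32_t error_severity; /* model severity, see enum below */
};

enum model_severity {
	MODEL_SEV_CORRECTED,
	MODEL_SEV_RECOVERABLE,
	MODEL_SEV_FATAL,
};

/* Counters so the effect of the shared handler can be observed. */
struct model_stats {
	unsigned int handled;
	unsigned int fatal;
};

/* Shared handler: the only place that interprets a status block. */
static bool model_handle_status(struct model_stats *st,
				const struct model_estatus *es)
{
	if (!es->block_status)
		return false;	/* nothing posted */

	st->handled++;
	if (es->error_severity == MODEL_SEV_FATAL)
		st->fatal++;
	return true;
}

/* ACPI-style front end: the block was located via a HEST entry. */
static bool model_acpi_notify(struct model_stats *st,
			      const struct model_estatus *es)
{
	return model_handle_status(st, es);
}

/* DT-style front end: the block was located via a "reg" region. */
static bool model_dt_irq(struct model_stats *st,
			 const struct model_estatus *es)
{
	return model_handle_status(st, es);
}
```

The front ends differ only in how the buffer is discovered and how the
notification arrives. A DT-only copy of the handler would have to track
every future change to the GHES one, which is what the move avoids.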
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
>> ---
>> Changes in v2:
>> - Dropped the proposed "estatus core" and kept GHES naming/flow intact
>> (per Borislav Petkov).
>> - Re-sliced the series into smaller mechanical steps (per Mauro Carvalho Chehab).
>> - Minor DT binding fixes based on Krzysztof Kozlowski's feedback.
>> - Removed fixmap slot usage from the DT FFH driver (per Will Deacon).
>>
>> Series structure:
>> - Patches 1-8 are mechanical moves only and do not change behavior.
>> - Patch 9 wires the shared helpers back into GHES.
>> - The DT firmware-first CPER buffer provider is added in the final patches.
>> - "ACPI: APEI: introduce GHES helper" is internal build glue only
>> and does not introduce a new user-visible configuration option.
>>
>> - Link to v1: https://lore.kernel.org/r/20251217112845.1814119-1-ahmed.tiba@arm.com
>>
>> ---
>> Ahmed Tiba (11):
>> ACPI: APEI: GHES: share macros via a private header
>> ACPI: APEI: GHES: add ghes_cper.o stub
>> ACPI: APEI: GHES: move CPER read helpers
>> ACPI: APEI: GHES: move GHESv2 ack and alloc helpers
>> ACPI: APEI: GHES: move estatus cache helpers
>> ACPI: APEI: GHES: move vendor record helpers
>> ACPI: APEI: GHES: move CXL CPER helpers
>> ACPI: APEI: introduce GHES helper
>> ACPI: APEI: share GHES CPER helpers
>> dt-bindings: firmware: add arm,ras-ffh
>> RAS: add DeviceTree firmware-first CPER provider
>>
>> Documentation/admin-guide/RAS/main.rst | 18 +
>> .../devicetree/bindings/firmware/arm,ras-ffh.yaml | 71 ++
>> MAINTAINERS | 6 +
>> drivers/Makefile | 1 +
>> drivers/acpi/Kconfig | 4 +
>> drivers/acpi/apei/Kconfig | 1 +
>> drivers/acpi/apei/apei-internal.h | 10 +-
>> drivers/acpi/apei/ghes.c | 1024 +------------------
>> drivers/acpi/apei/ghes_cper.c | 1026 ++++++++++++++++++++
>> drivers/ras/Kconfig | 12 +
>> drivers/ras/Makefile | 1 +
>> drivers/ras/esource-dt.c | 264 +++++
>> include/acpi/ghes.h | 10 +-
>> include/acpi/ghes_cper.h | 143 +++
>> include/cxl/event.h | 2 +-
>> 15 files changed, 1558 insertions(+), 1035 deletions(-)
>> ---
>> base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
>> change-id: 20260220-topics-ahmtib01-ras_ffh_arm_internal_review-bfddc7fc7cab
>>
>> Best regards,
>> --
>> Ahmed Tiba <ahmed.tiba@arm.com>
>>
>>
* Re: [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header
2026-02-24 15:22 ` Jonathan Cameron
@ 2026-03-11 11:39 ` Ahmed Tiba
2026-03-11 12:39 ` Jonathan Cameron
0 siblings, 1 reply; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 11:39 UTC (permalink / raw)
To: Jonathan Cameron
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck, Mauro Carvalho Chehab
On 24/02/2026 15:22, Jonathan Cameron wrote:
> On Fri, 20 Feb 2026 13:42:19 +0000
> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
>> Carve the CPER helper macros out of ghes.c and place them in a private
>> header so they can be shared with upcoming helper files. This is a
>> mechanical include change with no functional differences.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> +CC Mauro as he's been doing a lot of work on error injection recently so
> can probably review the use of the various structures much more easily
> than I can!
>
> My main comment is on the naming of the new header.
>
> Jonathan
The content is intentionally GHES‑specific CPER handling,
not generic UEFI CPER. It's the GHES view of CPER parsing/handling
and is used by the shared GHES/DT path, so keeping it in ghes_cper.h
documents that boundary better than moving it to ghes.h (which also
contains non‑CPER GHES logic). The helpers moved there are the ones
needed by the shared CPER handling path.
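As an illustration of what "the GHES view of CPER" means here,
cper_estatus_len() only looks at the Generic Error Status block header
and never parses a CPER section. Below is a user-space restatement of
its logic, offered as a sketch: model_generic_status is a hypothetical
mirror of the five 32-bit fields of struct acpi_hest_generic_status, and
model_estatus_len is not a kernel symbol.

```c
#include <stdint.h>

/*
 * User-space mirror of the five 32-bit fields of
 * struct acpi_hest_generic_status. The model_ prefix marks it as
 * illustrative only.
 */
struct model_generic_status {
	uint32_t block_status;
	uint32_t raw_data_offset;
	uint32_t raw_data_length;
	uint32_t data_length;
	uint32_t error_severity;
};

/* Same logic as cper_estatus_len(), restated on the mirror struct. */
static uint32_t model_estatus_len(const struct model_generic_status *es)
{
	/* With raw data present, the block is taken to end where it ends. */
	if (es->raw_data_length)
		return es->raw_data_offset + es->raw_data_length;

	/* Otherwise the block is the header followed by the data sections. */
	return (uint32_t)sizeof(*es) + es->data_length;
}
```

With a 20-byte header, a block carrying 72 bytes of section data and no
raw data has a length of 92 bytes. Any provider that copies a status
block out of a firmware buffer needs this calculation, whether the
buffer was described by HEST or by DeviceTree.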
>> ---
>> drivers/acpi/apei/ghes.c | 60 +-----------------------------
>> include/acpi/ghes_cper.h | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 96 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index f96aede5d9a3..07b70bcb8342 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>
>>
>> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
>> new file mode 100644
>> index 000000000000..2597fbadc4f3
>> --- /dev/null
>> +++ b/include/acpi/ghes_cper.h
>> @@ -0,0 +1,95 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * APEI Generic Hardware Error Source: CPER Helper
>
> There is other stuff in here, such as the GHES acks etc.
> in ghes_clear_estatus(). So I think this intro text
> needs a bit more thought. The boundary is already rather
> blurred though as for example cper_estatus_len() is only
> tangentially connected to cper.
>
>> + *
>> + * Copyright (C) 2026 ARM Ltd.
>
> Doesn't make sense to add this copyright in this patch as so far
> it's cut and paste of code from a file that you didn't write (at least
> not in 2026!)
>
> Might make sense after a few patches, in which case add the copyright
> when it does.
The file is new and maintained by Arm as part of this refactor,
so I kept the header consistent with other newly introduced files.
>> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>> + * Based on ACPI APEI GHES driver.
>> + *
>> + */
>> +
>> +#ifndef ACPI_APEI_GHES_CPER_H
>> +#define ACPI_APEI_GHES_CPER_H
>> +
>> +#include <linux/workqueue.h>
>> +
>> +#include <acpi/ghes.h>
>> +
>> +#define GHES_PFX "GHES: "
>> +
>> +#define GHES_ESTATUS_MAX_SIZE 65536
>> +#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
>> +
>> +#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
>> +
>> +/* This is just an estimation for memory pool allocation */
>> +#define GHES_ESTATUS_CACHE_AVG_SIZE 512
>> +
>> +#define GHES_ESTATUS_CACHES_SIZE 4
>> +
>> +#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
>> +/* Prevent too many caches are allocated because of RCU */
>> +#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
>> +
>> +#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
>> + (sizeof(struct ghes_estatus_cache) + (estatus_len))
>> +#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
>> + ((struct acpi_hest_generic_status *) \
>> + ((struct ghes_estatus_cache *)(estatus_cache) + 1))
>> +
>> +#define GHES_ESTATUS_NODE_LEN(estatus_len) \
>> + (sizeof(struct ghes_estatus_node) + (estatus_len))
>> +#define GHES_ESTATUS_FROM_NODE(estatus_node) \
>> + ((struct acpi_hest_generic_status *) \
>> + ((struct ghes_estatus_node *)(estatus_node) + 1))
>> +
>> +#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
>> + (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
>> +#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
>> + ((struct acpi_hest_generic_data *) \
>> + ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
>> +
>> +static inline bool is_hest_type_generic_v2(struct ghes *ghes)
>> +{
>> + return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
>> +}
>> +
>> +/*
>> + * A platform may describe one error source for the handling of synchronous
>> + * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
>> + * or External Interrupt). On x86, the HEST notifications are always
>> + * asynchronous, so only SEA on ARM is delivered as a synchronous
>> + * notification.
>> + */
>> +static inline bool is_hest_sync_notify(struct ghes *ghes)
>> +{
>> + u8 notify_type = ghes->generic->notify.type;
>> +
>> + return notify_type == ACPI_HEST_NOTIFY_SEA;
>> +}
>> +
>> +struct ghes_vendor_record_entry {
>> + struct work_struct work;
>> + int error_severity;
>> + char vendor_record[];
>> +};
>> +
>> +static struct ghes *ghes_new(struct acpi_hest_generic *generic);
>> +static void ghes_fini(struct ghes *ghes);
>> +
>> +static int ghes_read_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> +static void ghes_clear_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 buf_paddr, enum fixed_addresses fixmap_idx);
>
> I'm not sure some of this makes sense in a file named ghes_cper.h
> Maybe we just need a different intro comment though.
>
>> +static int __ghes_peek_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> +static int __ghes_check_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus);
>> +static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> + u64 buf_paddr, enum fixed_addresses fixmap_idx,
>> + size_t buf_len);
>> +
>> +#endif /* ACPI_APEI_GHES_CPER_H */
>>
>
* Re: [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header
2026-02-26 6:44 ` Himanshu Chauhan
@ 2026-03-11 11:55 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 11:55 UTC (permalink / raw)
To: Himanshu Chauhan
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On 26/02/2026 06:44, Himanshu Chauhan wrote:
> On Fri, Feb 20, 2026 at 7:13 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>>
>> Carve the CPER helper macros out of ghes.c and place them in a private
>> header so they can be shared with upcoming helper files. This is a
>> mechanical include change with no functional differences.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
>> ---
>> drivers/acpi/apei/ghes.c | 60 +-----------------------------
>> include/acpi/ghes_cper.h | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 96 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index f96aede5d9a3..07b70bcb8342 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -49,6 +49,7 @@
>>
>> #include <acpi/actbl1.h>
>> #include <acpi/ghes.h>
>> +#include <acpi/ghes_cper.h>
>> #include <acpi/apei.h>
>> #include <asm/fixmap.h>
>> #include <asm/tlbflush.h>
>> @@ -57,40 +58,6 @@
>>
>> #include "apei-internal.h"
>>
>> -#define GHES_PFX "GHES: "
>> -
>> -#define GHES_ESTATUS_MAX_SIZE 65536
>> -#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
>> -
>> -#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
>> -
>> -/* This is just an estimation for memory pool allocation */
>> -#define GHES_ESTATUS_CACHE_AVG_SIZE 512
>> -
>> -#define GHES_ESTATUS_CACHES_SIZE 4
>> -
>> -#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
>> -/* Prevent too many caches are allocated because of RCU */
>> -#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
>> -
>> -#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
>> - (sizeof(struct ghes_estatus_cache) + (estatus_len))
>> -#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
>> - ((struct acpi_hest_generic_status *) \
>> - ((struct ghes_estatus_cache *)(estatus_cache) + 1))
>> -
>> -#define GHES_ESTATUS_NODE_LEN(estatus_len) \
>> - (sizeof(struct ghes_estatus_node) + (estatus_len))
>> -#define GHES_ESTATUS_FROM_NODE(estatus_node) \
>> - ((struct acpi_hest_generic_status *) \
>> - ((struct ghes_estatus_node *)(estatus_node) + 1))
>> -
>> -#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
>> - (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
>> -#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
>> - ((struct acpi_hest_generic_data *) \
>> - ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
>> -
>> /*
>> * NMI-like notifications vary by architecture, before the compiler can prune
>> * unused static functions it needs a value for these enums.
>> @@ -102,25 +69,6 @@
>>
>> static ATOMIC_NOTIFIER_HEAD(ghes_report_chain);
>>
>> -static inline bool is_hest_type_generic_v2(struct ghes *ghes)
>> -{
>> - return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
>> -}
>> -
>> -/*
>> - * A platform may describe one error source for the handling of synchronous
>> - * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
>> - * or External Interrupt). On x86, the HEST notifications are always
>> - * asynchronous, so only SEA on ARM is delivered as a synchronous
>> - * notification.
>> - */
>> -static inline bool is_hest_sync_notify(struct ghes *ghes)
>> -{
>> - u8 notify_type = ghes->generic->notify.type;
>> -
>> - return notify_type == ACPI_HEST_NOTIFY_SEA;
>> -}
>
> None of this has anything to do with CPER, which is defined in UEFI. All of
> this is part of the GHES structure defined in ACPI. Why are these
> being moved to ghes_cper.h?
> It blurs the demarcation. If you are carving out CPER
> helpers, please don't move GHES helpers. A better place for
> these helpers is ghes.h; otherwise they are fine where they are.
These helpers are part of the GHES CPER handling path,
not generic UEFI CPER. They sit at the boundary where GHES consumes CPER
(read/clear/ack/notify), and that boundary is exactly what the DT
provider must share to keep behavior identical. Putting them in ghes.h
would expand the GHES public surface and mix non‑CPER GHES internals
with the shared CPER path. ghes_cper.h keeps the shared GHES‑CPER
boundary explicit and avoids duplicating the pipeline in a DT‑only file.
That’s why they’re moved there.
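To illustrate the read side of that boundary, here is a minimal userspace
sketch of the peek, check, read sequence that ghes_read_estatus() follows.
It is not code from the series: the demo_* names and both structs are
hypothetical stand-ins, and the fixmap mapping and GHESv2 ack steps are
left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the header of struct acpi_hest_generic_status. */
struct demo_estatus {
	uint32_t block_status;	/* non-zero once firmware has published a record */
	uint32_t data_length;	/* payload bytes that follow this header */
};

/* Hypothetical provider: where the record lives and how large it may be. */
struct demo_source {
	const uint8_t *buf;	/* firmware-owned buffer, NULL if none */
	size_t max_len;		/* largest record the provider accepts */
};

/* Step 1: copy only the header and see whether anything is pending. */
static int demo_peek(const struct demo_source *src, struct demo_estatus *hdr)
{
	assert(src && hdr);
	if (!src->buf)
		return -ENOENT;
	memcpy(hdr, src->buf, sizeof(*hdr));
	return hdr->block_status ? 0 : -ENOENT;
}

/* Step 2: validate the advertised length before trusting it. */
static int demo_check(const struct demo_source *src,
		      const struct demo_estatus *hdr)
{
	size_t len = sizeof(*hdr) + hdr->data_length;

	return len > src->max_len ? -EIO : 0;
}

/* Step 3: copy the whole record once the length is known to be sane. */
static int demo_read(const struct demo_source *src, void *out, size_t out_len)
{
	struct demo_estatus hdr;
	size_t len;
	int rc;

	rc = demo_peek(src, &hdr);
	if (rc)
		return rc;
	rc = demo_check(src, &hdr);
	if (rc)
		return rc;

	len = sizeof(hdr) + hdr.data_length;
	if (len > out_len)
		return -EIO;
	memcpy(out, src->buf, len);
	return 0;
}
```

In this sketch only the buffer lookup in step 1 depends on where the
record comes from; the length check and the full copy do not.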
>> -
>> /*
>> * This driver isn't really modular, however for the time being,
>> * continuing to use module_param is the easiest way to remain
>> @@ -165,12 +113,6 @@ static DEFINE_MUTEX(ghes_devs_mutex);
>> */
>> static DEFINE_SPINLOCK(ghes_notify_lock_irq);
>>
>> -struct ghes_vendor_record_entry {
>> - struct work_struct work;
>> - int error_severity;
>> - char vendor_record[];
>> -};
>> -
>> static struct gen_pool *ghes_estatus_pool;
>>
>> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
>> new file mode 100644
>> index 000000000000..2597fbadc4f3
>> --- /dev/null
>> +++ b/include/acpi/ghes_cper.h
>> @@ -0,0 +1,95 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * APEI Generic Hardware Error Source: CPER Helper
>> + *
>> + * Copyright (C) 2026 ARM Ltd.
>> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>> + * Based on ACPI APEI GHES driver.
>> + *
>> + */
>> +
>> +#ifndef ACPI_APEI_GHES_CPER_H
>> +#define ACPI_APEI_GHES_CPER_H
>> +
>> +#include <linux/workqueue.h>
>> +
>> +#include <acpi/ghes.h>
>> +
>> +#define GHES_PFX "GHES: "
>> +
>> +#define GHES_ESTATUS_MAX_SIZE 65536
>> +#define GHES_ESOURCE_PREALLOC_MAX_SIZE 65536
>> +
>> +#define GHES_ESTATUS_POOL_MIN_ALLOC_ORDER 3
>> +
>> +/* This is just an estimation for memory pool allocation */
>> +#define GHES_ESTATUS_CACHE_AVG_SIZE 512
>> +
>> +#define GHES_ESTATUS_CACHES_SIZE 4
>> +
>> +#define GHES_ESTATUS_IN_CACHE_MAX_NSEC 10000000000ULL
>> +/* Prevent too many caches are allocated because of RCU */
>> +#define GHES_ESTATUS_CACHE_ALLOCED_MAX (GHES_ESTATUS_CACHES_SIZE * 3 / 2)
>> +
>> +#define GHES_ESTATUS_CACHE_LEN(estatus_len) \
>> + (sizeof(struct ghes_estatus_cache) + (estatus_len))
>> +#define GHES_ESTATUS_FROM_CACHE(estatus_cache) \
>> + ((struct acpi_hest_generic_status *) \
>> + ((struct ghes_estatus_cache *)(estatus_cache) + 1))
>> +
>> +#define GHES_ESTATUS_NODE_LEN(estatus_len) \
>> + (sizeof(struct ghes_estatus_node) + (estatus_len))
>> +#define GHES_ESTATUS_FROM_NODE(estatus_node) \
>> + ((struct acpi_hest_generic_status *) \
>> + ((struct ghes_estatus_node *)(estatus_node) + 1))
>> +
>> +#define GHES_VENDOR_ENTRY_LEN(gdata_len) \
>> + (sizeof(struct ghes_vendor_record_entry) + (gdata_len))
>> +#define GHES_GDATA_FROM_VENDOR_ENTRY(vendor_entry) \
>> + ((struct acpi_hest_generic_data *) \
>> + ((struct ghes_vendor_record_entry *)(vendor_entry) + 1))
>> +
>> +static inline bool is_hest_type_generic_v2(struct ghes *ghes)
>> +{
>> + return ghes->generic->header.type == ACPI_HEST_TYPE_GENERIC_ERROR_V2;
>> +}
>> +
>> +/*
>> + * A platform may describe one error source for the handling of synchronous
>> + * errors (e.g. MCE or SEA), or for handling asynchronous errors (e.g. SCI
>> + * or External Interrupt). On x86, the HEST notifications are always
>> + * asynchronous, so only SEA on ARM is delivered as a synchronous
>> + * notification.
>> + */
>> +static inline bool is_hest_sync_notify(struct ghes *ghes)
>> +{
>> + u8 notify_type = ghes->generic->notify.type;
>> +
>> + return notify_type == ACPI_HEST_NOTIFY_SEA;
>> +}
>> +
>> +struct ghes_vendor_record_entry {
>> + struct work_struct work;
>> + int error_severity;
>> + char vendor_record[];
>> +};
>> +
>> +static struct ghes *ghes_new(struct acpi_hest_generic *generic);
>> +static void ghes_fini(struct ghes *ghes);
>> +
>> +static int ghes_read_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> +static void ghes_clear_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 buf_paddr, enum fixed_addresses fixmap_idx);
>> +static int __ghes_peek_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> +static int __ghes_check_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus);
>> +static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> + u64 buf_paddr, enum fixed_addresses fixmap_idx,
>> + size_t buf_len);
>> +
>> +#endif /* ACPI_APEI_GHES_CPER_H */
>>
>> --
>> 2.43.0
>>
>>
* Re: [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub
2026-02-24 15:25 ` Jonathan Cameron
@ 2026-03-11 12:19 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 12:19 UTC (permalink / raw)
To: Jonathan Cameron
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On 24/02/2026 15:25, Jonathan Cameron wrote:
> On Fri, 20 Feb 2026 13:42:20 +0000
> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
>> Introduce a dedicated ghes_cper translation unit so that follow-on commits
>> can move helpers out of ghes.c without touching the build logic twice.
>> This keeps the object in the tree while remaining functionally identical.
>
> I'd probably do this with the first move patch not as a separate patch.
> That would resolve the question of headers etc below.
I kept the stub as a separate patch intentionally. It isolates the build
system change and the new translation unit so all subsequent patches are
pure mechanical moves, which makes review and bisection straightforward.
If I fold the stub into the first move, the first functional patch
ends up mixing build plumbing and code movement, which is exactly what
I’m trying to avoid.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
>> ---
>> drivers/acpi/apei/Makefile | 2 +-
>> drivers/acpi/apei/ghes_cper.c | 26 ++++++++++++++++++++++++++
>> 2 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/acpi/apei/Makefile b/drivers/acpi/apei/Makefile
>> index 1a0b85923cd4..b3774af70883 100644
>> --- a/drivers/acpi/apei/Makefile
>> +++ b/drivers/acpi/apei/Makefile
>> @@ -1,6 +1,6 @@
>> # SPDX-License-Identifier: GPL-2.0
>> obj-$(CONFIG_ACPI_APEI) += apei.o
>> -obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o
>> +obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o ghes_cper.o
>> # clang versions prior to 18 may blow out the stack with KASAN
>> ifeq ($(CONFIG_COMPILE_TEST)_$(CONFIG_CC_IS_CLANG)_$(call clang-min-version, 180000),y_y_)
>> KASAN_SANITIZE_ghes.o := n
>> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
>> new file mode 100644
>> index 000000000000..63047322a3d9
>> --- /dev/null
>> +++ b/drivers/acpi/apei/ghes_cper.c
>> @@ -0,0 +1,26 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + *
>
> As below.
>
>> + * APEI GHES CPER helper translation unit - staging file for helper moves
>> + *
>> + * Copyright (C) 2026 ARM Ltd.
>
> As before. If there isn't significant new content copyright doesn't make sense yet.
I can defer the copyright line until there’s more new content.
>> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>> + * Based on ACPI APEI GHES driver.
>> + *
>
> No obvious benefit in this blank line so I'd drop it.
I'll drop it.
>> + */
>> +
>> +#include <linux/err.h>
>> +#include <linux/io.h>
>> +#include <linux/kernel.h>
>> +#include <linux/mm.h>
>> +#include <linux/ratelimit.h>
>> +#include <linux/slab.h>
> Build includes up as they become relevant. That way we can see whether
> they are needed or not. Right now none of them are.
I’m front‑loading the includes that the subsequent mechanical moves
will need so those patches remain strict cut‑and‑paste with no extra
edit noise. That keeps the movement obvious and reviewable.
>> +
>> +#include <acpi/apei.h>
>> +
>> +#include <asm/fixmap.h>
>> +#include <asm/tlbflush.h>
>> +
>> +#include "apei-internal.h"
>> +
>> +/* Helper bodies will be moved here in follow-up commits. */
>>
>
* Re: [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers
2026-02-24 15:32 ` Jonathan Cameron
@ 2026-03-11 12:38 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 12:38 UTC (permalink / raw)
To: Jonathan Cameron
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On 24/02/2026 15:32, Jonathan Cameron wrote:
> On Fri, 20 Feb 2026 13:42:21 +0000
> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
>> Relocate the CPER buffer mapping, peek, and clear helpers from ghes.c into
>> ghes_cper.c so they can be shared with other firmware-first providers.
>> This commit only shuffles code; behavior stays the same.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> Hi Ahmed,
>
> Most of the comments in here are about changing how the patches are split up.
> The basic suggested approach is to move stuff as it is needed, not in advance
> of that need. So only when you move a function to the .c file do you add what
> it needs to the includes / header.
>
> Jonathan
Thanks Jonathan.
I’m keeping each patch as a strict mechanical relocation so reviewers
can diff‑verify the move. That’s why includes and header prototypes are
introduced alongside the moved helpers, not later.
>> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
>> index 63047322a3d9..7e0015e960c1 100644
>> --- a/drivers/acpi/apei/ghes_cper.c
>> +++ b/drivers/acpi/apei/ghes_cper.c
>> @@ -1,7 +1,7 @@
>> // SPDX-License-Identifier: GPL-2.0
>> /*
>> *
>> - * APEI GHES CPER helper translation unit - staging file for helper moves
>> + * APEI GHES CPER helper translation unit - code mechanically moved from ghes.c
>
> In the long run, no interest in where it came from. People can
> look at the git history for that.
>
I'll drop it.
>> *
>> * Copyright (C) 2026 ARM Ltd.
>> * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>> @@ -17,10 +17,176 @@
>> #include <linux/slab.h>
>>
>> #include <acpi/apei.h>
>> +#include <acpi/ghes_cper.h>
>>
>> #include <asm/fixmap.h>
>> #include <asm/tlbflush.h>
>>
>> #include "apei-internal.h"
>>
>> -/* Helper bodies will be moved here in follow-up commits. */
>
> If you just do the file creation with this first move, then we don't get churn of
> comments like this one.
As above to avoid churny commentary.
>> +/* Read the CPER block, returning its address, and header in estatus. */
>> +int __ghes_peek_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 *buf_paddr, enum fixed_addresses fixmap_idx)
>> +{
>> + struct acpi_hest_generic *g = ghes->generic;
>> + int rc;
>> +
>> + rc = apei_read(buf_paddr, &g->error_status_address);
>> + if (rc) {
>> + *buf_paddr = 0;
>> + pr_warn_ratelimited(FW_WARN GHES_PFX
>> +"Failed to read error status block address for hardware error source: %d.\n",
>
> Unusual indenting. I'd just fix that whilst you are here. Don't worry about long line.
I'll fix the odd indentation.
>> + g->header.source_id);
>> + return -EIO;
>
>> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
>> index 2597fbadc4f3..2e3919f0c3e7 100644
>> --- a/include/acpi/ghes_cper.h
>> +++ b/include/acpi/ghes_cper.h
>> @@ -74,21 +74,21 @@ struct ghes_vendor_record_entry {
>> char vendor_record[];
>> };
>>
>> -static struct ghes *ghes_new(struct acpi_hest_generic *generic);
> Huh. Static forward declarations in a header? That never made sense. Fix it in the
> earlier patch and remove the statics from the declarations.
>
> Actually no, just bring them into the header only when you need to. So as part
> of the patch that moves the caller or the function.
I’m keeping the GHES‑CPER interface explicit in one place.
Moving those declarations later would scatter that interface
and make the mechanical diffs harder to review.
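For context on the linkage point raised above, a small sketch of why a
shared header normally carries plain (non-static) prototypes. This is an
illustration under assumptions, not code from the series: the file names
in the comments and the demo_* functions are hypothetical, and the three
"files" are shown in one block only so that it compiles on its own.

```c
#include <assert.h>

/* ---- demo_shared.h (hypothetical shared header) ---- */
/*
 * A plain prototype has external linkage: one definition in one .c file
 * satisfies every file that includes the header.
 *
 * Writing "static int demo_shared_len(int payload);" here instead would
 * give the name internal linkage in each including file, so each of them
 * would need its own private definition and nothing would be shared.
 */
int demo_shared_len(int payload);

/* ---- demo_shared.c (the single definition) ---- */
int demo_shared_len(int payload)
{
	assert(payload >= 0);
	return 8 + payload;	/* 8 stands in for a fixed header size */
}

/* ---- demo_user.c (a second user of the same helper) ---- */
int demo_user_total(int a, int b)
{
	return demo_shared_len(a) + demo_shared_len(b);
}
```

With real separate files, demo_user.c would only include the header and
the linker would resolve the call to the definition in demo_shared.c.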
>> -static void ghes_fini(struct ghes *ghes);
>> +struct ghes *ghes_new(struct acpi_hest_generic *generic);
>> +void ghes_fini(struct ghes *ghes);
>>
>> -static int ghes_read_estatus(struct ghes *ghes,
>> +int ghes_read_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus,
>> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> -static void ghes_clear_estatus(struct ghes *ghes,
>> +void ghes_clear_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus,
>> u64 buf_paddr, enum fixed_addresses fixmap_idx);
>> -static int __ghes_peek_estatus(struct ghes *ghes,
>> +int __ghes_peek_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus,
>> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> -static int __ghes_check_estatus(struct ghes *ghes,
>> +int __ghes_check_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus);
>> -static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> +int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> u64 buf_paddr, enum fixed_addresses fixmap_idx,
>> size_t buf_len);
>>
>>
>
* Re: [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header
2026-03-11 11:39 ` Ahmed Tiba
@ 2026-03-11 12:39 ` Jonathan Cameron
2026-03-11 12:56 ` Ahmed Tiba
0 siblings, 1 reply; 39+ messages in thread
From: Jonathan Cameron @ 2026-03-11 12:39 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck, Mauro Carvalho Chehab
On Wed, 11 Mar 2026 11:39:38 +0000
Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> On 24/02/2026 15:22, Jonathan Cameron wrote:
> > On Fri, 20 Feb 2026 13:42:19 +0000
> > Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> >
> >> Carve the CPER helper macros out of ghes.c and place them in a private
> >> header so they can be shared with upcoming helper files. This is a
> >> mechanical include change with no functional differences.
> >>
> >> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> > +CC Mauro as he's been doing a lot of work on error injection recently so
> > can probably review the use of the various structures much more easily
> > than I can!
> >
> > My main comment is on the naming of the new header.
> >
> > Jonathan
>
> The content is intentionally GHES‑specific CPER handling,
> not generic UEFI CPER. It's the GHES view of CPER parsing/handling
> and is used by the shared GHES/DT path, so keeping it in ghes_cper.h
> documents that boundary better than moving it to ghes.h (which also
> contains non‑CPER GHES logic). The helpers moved there are the ones
> needed by the shared CPER handling path.
Ok. So the intended meaning here is GHES and CPER, not stuff specific
to the CPER aspects of GHES. Maybe, though I'm not sure why you
don't just name it ghes.h in that case, as GHES always incorporates CPER.
I guess because that file already exists and covers some ACPI specific parts
and HEST bits that aren't of use to you.
Ah well, one for the ACPI maintainers to perhaps suggest what makes
most sense to them.
>
> >> ---
> >> drivers/acpi/apei/ghes.c | 60 +-----------------------------
> >> include/acpi/ghes_cper.h | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
> >> 2 files changed, 96 insertions(+), 59 deletions(-)
> >>
> >> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> >> index f96aede5d9a3..07b70bcb8342 100644
> >> --- a/drivers/acpi/apei/ghes.c
> >> +++ b/drivers/acpi/apei/ghes.c
> >
> >>
> >> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
> >> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
> >> new file mode 100644
> >> index 000000000000..2597fbadc4f3
> >> --- /dev/null
> >> +++ b/include/acpi/ghes_cper.h
> >> @@ -0,0 +1,95 @@
> >> +/* SPDX-License-Identifier: GPL-2.0-only */
> >> +/*
> >> + * APEI Generic Hardware Error Source: CPER Helper
> >
> > There is other stuff in here such as the GHES acks etc
> > in ghes_clear_estatus(). So I think this intro text
> > needs a bit more thought. The boundary is already rather
> > blurred though as for example cper_estatus_len() is only
> > tangentially connected to cper.
> >
> >> + *
> >> + * Copyright (C) 2026 ARM Ltd.
> >
> > Doesn't make sense to add this copyright in this patch as so far
> > it's cut and paste of code from a file that you didn't write (at least
> > not in 2026!)
> >
> > Might make sense after a few patches, in which case add the copyright
> > when it does.
>
> The file is new and maintained by Arm as part of this refactor,
> so I kept the header consistent with other newly introduced files.
It's code moved from elsewhere, so you need to at least also list
the copyright of the original file alongside the new Arm one.
Just moving it and dropping that copyright is inconsistent with
the license.
>
> >> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> >> + * Based on ACPI APEI GHES driver.
> >> + *
> >> + */
> >> +
> >> +#ifndef ACPI_APEI_GHES_CPER_H
> >> +#define ACPI_APEI_GHES_CPER_H
* Re: [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header
2026-03-11 12:39 ` Jonathan Cameron
@ 2026-03-11 12:56 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 12:56 UTC (permalink / raw)
To: Jonathan Cameron
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck, Mauro Carvalho Chehab
On 11/03/2026 12:39, Jonathan Cameron wrote:
> On Wed, 11 Mar 2026 11:39:38 +0000
> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
>> On 24/02/2026 15:22, Jonathan Cameron wrote:
>>> On Fri, 20 Feb 2026 13:42:19 +0000
>>> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>>>
>>>> Carve the CPER helper macros out of ghes.c and place them in a private
>>>> header so they can be shared with upcoming helper files. This is a
>>>> mechanical include change with no functional differences.
>>>>
>>>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
>>> +CC Mauro as he's been doing a lot of work on error injection recently so
>>> can probably review the use of the various structures much more easily
>>> than I can!
>>>
>>> My main comment is on the naming of the new header.
>>>
>>> Jonathan
>>
>> The content is intentionally GHES‑specific CPER handling,
>> not generic UEFI CPER. It's the GHES view of CPER parsing/handling
>> and is used by the shared GHES/DT path, so keeping it in ghes_cper.h
>> documents that boundary better than moving it to ghes.h (which also
>> contains non‑CPER GHES logic). The helpers moved there are the ones
>> needed by the shared CPER handling path.
>
> Ok. So the intended meaning here is GHES and CPER, not stuff specific
> to the CPER aspects of GHES. Maybe, though I'm not sure why you
> don't just name it ghes.h in that case, as GHES always incorporates CPER.
> I guess because that file already exists and covers some ACPI specific parts
> and HEST bits that aren't of use to you.
>
> Ah well, one for the ACPI maintainers to perhaps suggest what makes
> most sense to them.
Ok. I'll keep it GHES-scoped for now to avoid implying a generic UEFI
CPER API, but I'll defer to the ACPI maintainers if they prefer ghes.h
or another location.
>>
>>>> ---
>>>> drivers/acpi/apei/ghes.c | 60 +-----------------------------
>>>> include/acpi/ghes_cper.h | 95 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 96 insertions(+), 59 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>> index f96aede5d9a3..07b70bcb8342 100644
>>>> --- a/drivers/acpi/apei/ghes.c
>>>> +++ b/drivers/acpi/apei/ghes.c
>>>
>>>>
>>>> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>>>> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
>>>> new file mode 100644
>>>> index 000000000000..2597fbadc4f3
>>>> --- /dev/null
>>>> +++ b/include/acpi/ghes_cper.h
>>>> @@ -0,0 +1,95 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>> +/*
>>>> + * APEI Generic Hardware Error Source: CPER Helper
>>>
>>> There is other stuff in here such as the GHES acks etc
>>> in ghes_clear_estatus(). So I think this intro text
>>> needs a bit more thought. The boundary is already rather
>>> blurred though as for example cper_estatus_len() is only
>>> tangentially connected to cper.
>>>
>>>> + *
>>>> + * Copyright (C) 2026 ARM Ltd.
>>>
>>> Doesn't make sense to add this copyright in this patch as so far
>>> it's cut and paste of code from a file that you didn't write (at least
>>> not in 2026!)
>>>
>>> Might make sense after a few patches, in which case add the copyright
>>> when it does.
>>
>> The file is new and maintained by Arm as part of this refactor,
>> so I kept the header consistent with other newly introduced files.
>
> It's code moved from elsewhere, so you need to at least also list
> the copyright of the original file alongside the new Arm one.
> Just moving it and dropping that copyright is inconsistent with
> the license.
Agreed. This is moved from ghes.c, so I'll carry over the original
ghes.c copyright into the new header and won't add a new Arm copyright
for a pure move.
>>
>>>> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>>>> + * Based on ACPI APEI GHES driver.
>>>> + *
>>>> + */
>>>> +
>>>> +#ifndef ACPI_APEI_GHES_CPER_H
>>>> +#define ACPI_APEI_GHES_CPER_H
>
* Re: [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers
2026-02-26 5:58 ` Himanshu Chauhan
@ 2026-03-11 13:18 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 13:18 UTC (permalink / raw)
To: Himanshu Chauhan
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On 26/02/2026 05:58, Himanshu Chauhan wrote:
> On Fri, Feb 20, 2026 at 7:13 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>>
>> Relocate the CPER buffer mapping, peek, and clear helpers from ghes.c into
>> ghes_cper.c so they can be shared with other firmware-first providers.
>> This commit only shuffles code; behavior stays the same.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
>> ---
>> drivers/acpi/apei/ghes.c | 170 +-----------------------------------------
>> drivers/acpi/apei/ghes_cper.c | 170 +++++++++++++++++++++++++++++++++++++++++-
>> include/acpi/ghes_cper.h | 14 ++--
>> 3 files changed, 177 insertions(+), 177 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 07b70bcb8342..b159dbee90ac 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -118,26 +118,6 @@ static struct gen_pool *ghes_estatus_pool;
>> static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>> static atomic_t ghes_estatus_cache_alloced;
>>
>> -static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
>> -{
>> - phys_addr_t paddr;
>> - pgprot_t prot;
>> -
>> - paddr = PFN_PHYS(pfn);
>> - prot = arch_apei_get_mem_attribute(paddr);
>> - __set_fixmap(fixmap_idx, paddr, prot);
>> -
>> - return (void __iomem *) __fix_to_virt(fixmap_idx);
>> -}
>> -
>> -static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
>> -{
>> - int _idx = virt_to_fix((unsigned long)vaddr);
>> -
>> - WARN_ON_ONCE(fixmap_idx != _idx);
>> - clear_fixmap(fixmap_idx);
>> -}
>> -
>> int ghes_estatus_pool_init(unsigned int num_ghes)
>> {
>> unsigned long addr, len;
>> @@ -193,22 +173,7 @@ static void unmap_gen_v2(struct ghes *ghes)
>> apei_unmap_generic_address(&ghes->generic_v2->read_ack_register);
>> }
>>
>> -static void ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
>> -{
>> - int rc;
>> - u64 val = 0;
>> -
>> - rc = apei_read(&val, &gv2->read_ack_register);
>> - if (rc)
>> - return;
>> -
>> - val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
>> - val |= gv2->read_ack_write << gv2->read_ack_register.bit_offset;
>> -
>> - apei_write(val, &gv2->read_ack_register);
>> -}
>> -
>> -static struct ghes *ghes_new(struct acpi_hest_generic *generic)
>> +struct ghes *ghes_new(struct acpi_hest_generic *generic)
>> {
>> struct ghes *ghes;
>> unsigned int error_block_length;
>> @@ -255,7 +220,7 @@ static struct ghes *ghes_new(struct acpi_hest_generic *generic)
>> return ERR_PTR(rc);
>> }
>>
>> -static void ghes_fini(struct ghes *ghes)
>> +void ghes_fini(struct ghes *ghes)
>> {
>> kfree(ghes->estatus);
>> apei_unmap_generic_address(&ghes->generic->error_status_address);
>> @@ -280,137 +245,6 @@ static inline int ghes_severity(int severity)
>> }
>> }
>
> Can it be "ghes_finish"? We already have "creat" without 'e'.
This is a pure mechanical move from ghes.c. I’m keeping the original
name here to avoid churn. If we want a rename, I can do that separately
with justification.
>>
>> -static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
>> - int from_phys,
>> - enum fixed_addresses fixmap_idx)
>> -{
>> - void __iomem *vaddr;
>> - u64 offset;
>> - u32 trunk;
>> -
>> - while (len > 0) {
>> - offset = paddr - (paddr & PAGE_MASK);
>> - vaddr = ghes_map(PHYS_PFN(paddr), fixmap_idx);
>> - trunk = PAGE_SIZE - offset;
>> - trunk = min(trunk, len);
>> - if (from_phys)
>> - memcpy_fromio(buffer, vaddr + offset, trunk);
>> - else
>> - memcpy_toio(vaddr + offset, buffer, trunk);
>> - len -= trunk;
>> - paddr += trunk;
>> - buffer += trunk;
>> - ghes_unmap(vaddr, fixmap_idx);
>> - }
>> -}
>> -
>> -/* Check the top-level record header has an appropriate size. */
>> -static int __ghes_check_estatus(struct ghes *ghes,
>> - struct acpi_hest_generic_status *estatus)
>> -{
>> - u32 len = cper_estatus_len(estatus);
>> - u32 max_len = min(ghes->generic->error_block_length,
>> - ghes->estatus_length);
>> -
>> - if (len < sizeof(*estatus)) {
>> - pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
>> - return -EIO;
>> - }
>> -
>> - if (!len || len > max_len) {
>> - pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
>> - return -EIO;
>> - }
>> -
>> - if (cper_estatus_check_header(estatus)) {
>> - pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid CPER header!\n");
>> - return -EIO;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> -/* Read the CPER block, returning its address, and header in estatus. */
>> -static int __ghes_peek_estatus(struct ghes *ghes,
>> - struct acpi_hest_generic_status *estatus,
>> - u64 *buf_paddr, enum fixed_addresses fixmap_idx)
>> -{
>> - struct acpi_hest_generic *g = ghes->generic;
>> - int rc;
>> -
>> - rc = apei_read(buf_paddr, &g->error_status_address);
>> - if (rc) {
>> - *buf_paddr = 0;
>> - pr_warn_ratelimited(FW_WARN GHES_PFX
>> -"Failed to read error status block address for hardware error source: %d.\n",
>> - g->header.source_id);
>> - return -EIO;
>> - }
>> - if (!*buf_paddr)
>> - return -ENOENT;
>> -
>> - ghes_copy_tofrom_phys(estatus, *buf_paddr, sizeof(*estatus), 1,
>> - fixmap_idx);
>> - if (!estatus->block_status) {
>> - *buf_paddr = 0;
>> - return -ENOENT;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> -static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> - u64 buf_paddr, enum fixed_addresses fixmap_idx,
>> - size_t buf_len)
>> -{
>> - ghes_copy_tofrom_phys(estatus, buf_paddr, buf_len, 1, fixmap_idx);
>> - if (cper_estatus_check(estatus)) {
>> - pr_warn_ratelimited(FW_WARN GHES_PFX
>> - "Failed to read error status block!\n");
>> - return -EIO;
>> - }
>> -
>> - return 0;
>> -}
>> -
>> -static int ghes_read_estatus(struct ghes *ghes,
>> - struct acpi_hest_generic_status *estatus,
>> - u64 *buf_paddr, enum fixed_addresses fixmap_idx)
>> -{
>> - int rc;
>> -
>> - rc = __ghes_peek_estatus(ghes, estatus, buf_paddr, fixmap_idx);
>> - if (rc)
>> - return rc;
>> -
>> - rc = __ghes_check_estatus(ghes, estatus);
>> - if (rc)
>> - return rc;
>> -
>> - return __ghes_read_estatus(estatus, *buf_paddr, fixmap_idx,
>> - cper_estatus_len(estatus));
>> -}
>> -
>> -static void ghes_clear_estatus(struct ghes *ghes,
>> - struct acpi_hest_generic_status *estatus,
>> - u64 buf_paddr, enum fixed_addresses fixmap_idx)
>> -{
>> - estatus->block_status = 0;
>> -
>> - if (!buf_paddr)
>> - return;
>> -
>> - ghes_copy_tofrom_phys(estatus, buf_paddr,
>> - sizeof(estatus->block_status), 0,
>> - fixmap_idx);
>> -
>> - /*
>> - * GHESv2 type HEST entries introduce support for error acknowledgment,
>> - * so only acknowledge the error if this support is present.
>> - */
>> - if (is_hest_type_generic_v2(ghes))
>> - ghes_ack_error(ghes->generic_v2);
>> -}
>>
>> /**
>> * struct ghes_task_work - for synchronous RAS event
>> diff --git a/drivers/acpi/apei/ghes_cper.c b/drivers/acpi/apei/ghes_cper.c
>> index 63047322a3d9..7e0015e960c1 100644
>> --- a/drivers/acpi/apei/ghes_cper.c
>> +++ b/drivers/acpi/apei/ghes_cper.c
>
> IMO, just "cper.c" would be fine.
"cper.c" and "include/acpi/cper.h" already exist under EFI. This code is
GHES‑specific CPER handling (the GHES view of CPER), not a generic UEFI
CPER API, so I’m keeping the GHES‑scoped naming to avoid ambiguity.
>> @@ -1,7 +1,7 @@
>> // SPDX-License-Identifier: GPL-2.0
>> /*
>> *
>> - * APEI GHES CPER helper translation unit - staging file for helper moves
>> + * APEI GHES CPER helper translation unit - code mechanically moved from ghes.c
>> *
>> * Copyright (C) 2026 ARM Ltd.
>> * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>> @@ -17,10 +17,176 @@
>> #include <linux/slab.h>
>>
>> #include <acpi/apei.h>
>> +#include <acpi/ghes_cper.h>
>>
>> #include <asm/fixmap.h>
>> #include <asm/tlbflush.h>
>>
>> #include "apei-internal.h"
>>
>> -/* Helper bodies will be moved here in follow-up commits. */
>> +static void __iomem *ghes_map(u64 pfn, enum fixed_addresses fixmap_idx)
>> +{
>> + phys_addr_t paddr;
>> + pgprot_t prot;
>> +
>> + paddr = PFN_PHYS(pfn);
>> + prot = arch_apei_get_mem_attribute(paddr);
>> + __set_fixmap(fixmap_idx, paddr, prot);
>> +
>> + return (void __iomem *) __fix_to_virt(fixmap_idx);
>> +}
>> +
>> +static void ghes_unmap(void __iomem *vaddr, enum fixed_addresses fixmap_idx)
>> +{
>> + int _idx = virt_to_fix((unsigned long)vaddr);
>> +
>> + WARN_ON_ONCE(fixmap_idx != _idx);
>> + clear_fixmap(fixmap_idx);
>> +}
>> +
>> +static void ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
>> +{
>> + int rc;
>> + u64 val = 0;
>> +
>> + rc = apei_read(&val, &gv2->read_ack_register);
>> + if (rc)
>> + return;
>> +
>> + val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
>> + val |= gv2->read_ack_write << gv2->read_ack_register.bit_offset;
>> +
>> + apei_write(val, &gv2->read_ack_register);
>> +}
>> +
>> +static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len,
>> + int from_phys,
>> + enum fixed_addresses fixmap_idx)
>> +{
>> + void __iomem *vaddr;
>> + u64 offset;
>> + u32 trunk;
>> +
>> + while (len > 0) {
>> + offset = paddr - (paddr & PAGE_MASK);
>> + vaddr = ghes_map(PHYS_PFN(paddr), fixmap_idx);
>> + trunk = PAGE_SIZE - offset;
>> + trunk = min(trunk, len);
>> + if (from_phys)
>> + memcpy_fromio(buffer, vaddr + offset, trunk);
>> + else
>> + memcpy_toio(vaddr + offset, buffer, trunk);
>> + len -= trunk;
>> + paddr += trunk;
>> + buffer += trunk;
>> + ghes_unmap(vaddr, fixmap_idx);
>> + }
>> +}
>> +
>> +/* Check the top-level record header has an appropriate size. */
>> +int __ghes_check_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus)
>> +{
>> + u32 len = cper_estatus_len(estatus);
>> + u32 max_len = min(ghes->generic->error_block_length,
>> + ghes->estatus_length);
>> +
>> + if (len < sizeof(*estatus)) {
>> + pr_warn_ratelimited(FW_WARN GHES_PFX "Truncated error status block!\n");
>> + return -EIO;
>> + }
>> +
>> + if (!len || len > max_len) {
>> + pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid error status block length!\n");
>> + return -EIO;
>> + }
>> +
>> + if (cper_estatus_check_header(estatus)) {
>> + pr_warn_ratelimited(FW_WARN GHES_PFX "Invalid CPER header!\n");
>> + return -EIO;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* Read the CPER block, returning its address, and header in estatus. */
>> +int __ghes_peek_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 *buf_paddr, enum fixed_addresses fixmap_idx)
>> +{
>> + struct acpi_hest_generic *g = ghes->generic;
>> + int rc;
>> +
>> + rc = apei_read(buf_paddr, &g->error_status_address);
>> + if (rc) {
>> + *buf_paddr = 0;
>> + pr_warn_ratelimited(FW_WARN GHES_PFX
>> +"Failed to read error status block address for hardware error source: %d.\n",
>> + g->header.source_id);
>> + return -EIO;
>> + }
>> + if (!*buf_paddr)
>> + return -ENOENT;
>> +
>> + ghes_copy_tofrom_phys(estatus, *buf_paddr, sizeof(*estatus), 1,
>> + fixmap_idx);
>> + if (!estatus->block_status) {
>> + *buf_paddr = 0;
>> + return -ENOENT;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> + u64 buf_paddr, enum fixed_addresses fixmap_idx,
>> + size_t buf_len)
>> +{
>> + ghes_copy_tofrom_phys(estatus, buf_paddr, buf_len, 1, fixmap_idx);
>> + if (cper_estatus_check(estatus)) {
>> + pr_warn_ratelimited(FW_WARN GHES_PFX
>> + "Failed to read error status block!\n");
>> + return -EIO;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +int ghes_read_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 *buf_paddr, enum fixed_addresses fixmap_idx)
>> +{
>> + int rc;
>> +
>> + rc = __ghes_peek_estatus(ghes, estatus, buf_paddr, fixmap_idx);
>> + if (rc)
>> + return rc;
>> +
>> + rc = __ghes_check_estatus(ghes, estatus);
>> + if (rc)
>> + return rc;
>> +
>> + return __ghes_read_estatus(estatus, *buf_paddr, fixmap_idx,
>> + cper_estatus_len(estatus));
>> +}
>> +
>> +void ghes_clear_estatus(struct ghes *ghes,
>> + struct acpi_hest_generic_status *estatus,
>> + u64 buf_paddr, enum fixed_addresses fixmap_idx)
>> +{
>> + estatus->block_status = 0;
>> +
>> + if (!buf_paddr)
>> + return;
>> +
>> + ghes_copy_tofrom_phys(estatus, buf_paddr,
>> + sizeof(estatus->block_status), 0,
>> + fixmap_idx);
>> +
>> + /*
>> + * GHESv2 type HEST entries introduce support for error acknowledgment,
>> + * so only acknowledge the error if this support is present.
>> + */
>> + if (is_hest_type_generic_v2(ghes))
>> + ghes_ack_error(ghes->generic_v2);
>> +}
>> diff --git a/include/acpi/ghes_cper.h b/include/acpi/ghes_cper.h
>> index 2597fbadc4f3..2e3919f0c3e7 100644
>> --- a/include/acpi/ghes_cper.h
>> +++ b/include/acpi/ghes_cper.h
>> @@ -74,21 +74,21 @@ struct ghes_vendor_record_entry {
>> char vendor_record[];
>> };
>>
>
> ditto. "include/acpi/cper.h"
As above.
>> -static struct ghes *ghes_new(struct acpi_hest_generic *generic);
>> -static void ghes_fini(struct ghes *ghes);
>> +struct ghes *ghes_new(struct acpi_hest_generic *generic);
>> +void ghes_fini(struct ghes *ghes);
>>
>> -static int ghes_read_estatus(struct ghes *ghes,
>> +int ghes_read_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus,
>> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> -static void ghes_clear_estatus(struct ghes *ghes,
>> +void ghes_clear_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus,
>> u64 buf_paddr, enum fixed_addresses fixmap_idx);
>> -static int __ghes_peek_estatus(struct ghes *ghes,
>> +int __ghes_peek_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus,
>> u64 *buf_paddr, enum fixed_addresses fixmap_idx);
>> -static int __ghes_check_estatus(struct ghes *ghes,
>> +int __ghes_check_estatus(struct ghes *ghes,
>> struct acpi_hest_generic_status *estatus);
>> -static int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> +int __ghes_read_estatus(struct acpi_hest_generic_status *estatus,
>> u64 buf_paddr, enum fixed_addresses fixmap_idx,
>> size_t buf_len);
>>
>>
>> --
>> 2.43.0
>>
>>
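For readers following the moved helpers: the loop in ghes_copy_tofrom_phys() maps one page at a time, so every copy is cut at page boundaries. Below is a minimal userspace sketch of that splitting only. It assumes a 4 KiB page, and all `sketch_*` names are hypothetical; the real helper also maps and unmaps the fixmap slot and performs the I/O copy.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the illustration */

/* One page-bounded piece of a larger physical range. */
struct sketch_chunk {
	uint64_t paddr;		/* start address of this piece */
	uint32_t offset;	/* offset of paddr inside its page */
	uint32_t len;		/* bytes that fit before the page ends */
};

/*
 * Split [paddr, paddr + len) into pieces that never cross a page boundary,
 * following the arithmetic of the loop in ghes_copy_tofrom_phys().
 * Returns the number of pieces the range needs; at most max of them are
 * stored in out.
 */
static size_t sketch_split_by_page(uint64_t paddr, uint32_t len,
				   struct sketch_chunk *out, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		uint32_t offset = (uint32_t)(paddr & (SKETCH_PAGE_SIZE - 1));
		uint32_t trunk = SKETCH_PAGE_SIZE - offset;

		if (trunk > len)
			trunk = len;
		if (n < max) {
			out[n].paddr = paddr;
			out[n].offset = offset;
			out[n].len = trunk;
		}
		n++;
		len -= trunk;
		paddr += trunk;
	}
	return n;
}
```

A 0x30-byte read starting 0x10 bytes before a page end therefore becomes two pieces: 0x10 bytes from the first page and 0x20 bytes from the next.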
* Re: [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh
2026-02-26 7:03 ` Himanshu Chauhan
@ 2026-03-11 13:41 ` Ahmed Tiba
0 siblings, 0 replies; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-11 13:41 UTC (permalink / raw)
To: Himanshu Chauhan
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On 26/02/2026 07:03, Himanshu Chauhan wrote:
> On Fri, Feb 20, 2026 at 7:15 PM Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>>
>> Describe the DeviceTree node that exposes the Arm firmware-first handler
>> CPER provider and hook the file into MAINTAINERS so the binding has an
>> owner.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
>> ---
>> .../devicetree/bindings/firmware/arm,ras-ffh.yaml | 71 ++++++++++++++++++++++
>> MAINTAINERS | 5 ++
>> 2 files changed, 76 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml b/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
>> new file mode 100644
>> index 000000000000..eccbaaf45885
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
>> @@ -0,0 +1,71 @@
>> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/firmware/arm,ras-ffh.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Arm Firmware-First Handler (FFH) CPER provider
>
> Please don't call it FFH. In ACPI, FFH stands for Functional Fixed
> Hardware and is used in multiple places, so the name causes confusion.
Agreed. I can drop "ffh" and rename it to "arm,ras-cper".
>> +
>> +maintainers:
>> + - Ahmed Tiba <ahmed.tiba@arm.com>
>> +
>> +description: |
>> + Arm Reliability, Availability and Serviceability (RAS) firmware can expose
>> + a firmware-first handler (FFH) that provides UEFI CPER Generic Error Status
>> + blocks directly via DeviceTree. The firmware owns the CPER buffer
>> + and notifies the OS through an interrupt.
>> +
>> +properties:
>> + compatible:
>> + const: arm,ras-ffh
>> +
>> + reg:
>> + minItems: 1
>> + items:
>> + - description:
>> + CPER Generic Error Status block exposed by firmware
>> + - description:
>> + Optional 32- or 64-bit doorbell register used on platforms
>> + where firmware needs an explicit "ack" handshake before overwriting
>> + the CPER buffer. Firmware watches bit 0 and expects the OS to set it
>> + once the current status block has been consumed.
>> +
>> + interrupts:
>> + maxItems: 1
>> + description:
>> + Interrupt used to signal that a new status record is ready.
>> +
>> + memory-region:
>> + $ref: /schemas/types.yaml#/definitions/phandle
>> + description:
>> + Optional phandle to the reserved-memory entry that backs the status
>> + buffer so firmware and the OS use the same carved-out region.
>> +
>> +required:
>> + - compatible
>> + - reg
>> + - interrupts
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> + - |
>> + #include <dt-bindings/interrupt-controller/arm-gic.h>
>> +
>> + reserved-memory {
>> + #address-cells = <2>;
>> + #size-cells = <2>;
>> + ras_cper_buffer: cper@fe800000 {
>> + reg = <0x0 0xfe800000 0x0 0x1000>;
>> + no-map;
>> + };
>> + };
>> +
>> + error-handler@fe800000 {
>> + compatible = "arm,ras-ffh";
>> + reg = <0xfe800000 0x1000>,
>> + <0xfe810000 0x4>;
>> + memory-region = <&ras_cper_buffer>;
>> + interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>;
>> + };
>> +...
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index b8d8a5c41597..47db7877b485 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -22027,6 +22027,11 @@ M: Alexandre Bounine <alex.bou9@gmail.com>
>> S: Maintained
>> F: drivers/rapidio/
>>
>> +RAS ERROR STATUS
>> +M: Ahmed Tiba <ahmed.tiba@arm.com>
>> +S: Maintained
>> +F: Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml
>> +
>> RAS INFRASTRUCTURE
>> M: Tony Luck <tony.luck@intel.com>
>> M: Borislav Petkov <bp@alien8.de>
>>
>> --
>> 2.43.0
>>
>>
* Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
2026-02-24 15:55 ` Jonathan Cameron
@ 2026-03-12 12:23 ` Ahmed Tiba
2026-03-12 14:50 ` Jonathan Cameron
0 siblings, 1 reply; 39+ messages in thread
From: Ahmed Tiba @ 2026-03-12 12:23 UTC (permalink / raw)
To: Jonathan Cameron
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
On 24/02/2026 15:55, Jonathan Cameron wrote:
> On Fri, 20 Feb 2026 13:42:29 +0000
> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
>
>> Add a DeviceTree firmware-first CPER provider that reuses the shared
>> GHES helpers, wire it into the RAS Kconfig/Makefile and document it in
>> the admin guide. Update MAINTAINERS now that the driver exists.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> Hi Ahmed,
>
> Various comments inline.
>
> Jonathan
>
>> ---
>> Documentation/admin-guide/RAS/main.rst | 18 +++
>> MAINTAINERS | 1 +
>> drivers/acpi/apei/apei-internal.h | 10 +-
>> drivers/acpi/apei/ghes_cper.c | 2 +
>> drivers/ras/Kconfig | 12 ++
>> drivers/ras/Makefile | 1 +
>> drivers/ras/esource-dt.c | 264 +++++++++++++++++++++++++++++++++
>> include/acpi/ghes_cper.h | 9 ++
>> 8 files changed, 308 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
>> index 5a45db32c49b..4ffabaaeabb1 100644
>> --- a/Documentation/admin-guide/RAS/main.rst
>> +++ b/Documentation/admin-guide/RAS/main.rst
>> @@ -205,6 +205,24 @@ Architecture (MCA)\ [#f3]_.
>> .. [#f3] For more details about the Machine Check Architecture (MCA),
>> please read Documentation/arch/x86/x86_64/machinecheck.rst at the Kernel tree.
>>
>> +Firmware-first CPER via DeviceTree
>> +----------------------------------
>> +
>> +Some systems expose Common Platform Error Record (CPER) data
>> +via DeviceTree instead of ACPI HEST tables.
>
> I'd argue this isn't really DT specific, it's just not an ACPI table.
> You could for instance use PRP0001 and wire this up on ACPI with only
> one trivial change to generic property.h accessor for the boolean.
>
> Or use another firmware information source entirely.
I'm intentionally keeping the scope DT-only for this series,
so I'll keep the wording DT-focused.
>> +Enable ``CONFIG_RAS_ESOURCE_DT`` to build the ``drivers/ras/esource-dt.c``
>> +driver and describe the CPER error source buffer with the
>> +``Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml`` binding.
>> +The driver reuses the GHES CPER helper object in
>> +``drivers/acpi/apei/ghes_cper.c`` so the logging, notifier chains, and
>> +memory failure handling match the ACPI GHES behaviour even when
>> +ACPI is disabled.
>> +
>> +Once a platform describes a firmware-first provider, both ACPI GHES and the
>> +DeviceTree driver reuse the same code paths. This keeps the behaviour
>> +consistent regardless of whether the error source is described via ACPI
>> +tables or DeviceTree.
>
>> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
>> index fc4f4bb94a4c..ea6d96713020 100644
>> --- a/drivers/ras/Kconfig
>> +++ b/drivers/ras/Kconfig
>> @@ -34,6 +34,18 @@ if RAS
>> source "arch/x86/ras/Kconfig"
>> source "drivers/ras/amd/atl/Kconfig"
>>
>> +config RAS_ESOURCE_DT
>> + bool "DeviceTree firmware-first CPER error source block provider"
> It isn't really DT specific other than one call that I've suggested you
> replace with a generic firmware accessor.
>
I'll keep it DT-specific for this series.
>> + depends on OF
>
> Generally we don't gate on OF unless there are OF specific calls. Here there
> aren't so you are just reducing build coverage. || COMPILE_TEST
> maybe.
>
Agreed. I'll drop OF and add COMPILE_TEST.
>> + depends on ARM64
>
> Likewise, nothing in here is arm64 specific that I can spot.
>
Agreed. I'll drop ARM64.
>> + select GHES_CPER_HELPERS
>> + help
>> + Enable support for firmware-first Common Platform Error Record (CPER)
>> + error source block providers that are described via DeviceTree
>> + instead of ACPI HEST tables. The driver reuses the existing GHES
>> + CPER helpers so the error processing matches the ACPI code paths,
>> + but it can be built even when ACPI is disabled.
>> +
>
>> diff --git a/drivers/ras/esource-dt.c b/drivers/ras/esource-dt.c
>> new file mode 100644
>> index 000000000000..b575a2258536
>> --- /dev/null
>> +++ b/drivers/ras/esource-dt.c
>> @@ -0,0 +1,264 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * DeviceTree provider for firmware-first CPER error source block.
>> + *
>> + * This driver shares the GHES CPER helpers so we keep the reporting and
>> + * notifier behaviour identical to ACPI GHES
>> + *
>> + * Copyright (C) 2025 ARM Ltd.
>> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>> + */
>> +
>> +#include <linux/atomic.h>
>> +#include <linux/bitops.h>
>> +#include <linux/device.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/io.h>
>> +#include <linux/io-64-nonatomic-lo-hi.h>
> Used?
No, I'll drop it.
>> +#include <linux/module.h>
> mod_devicetable.h for of_device_id definition.
>
Ack. I'll add <linux/mod_devicetable.h> and keep module.h.
>> +#include <linux/of_address.h>
>> +#include <linux/of_irq.h>
> Generally very little reason to include these. Not sure why you need
> them here.
>
Agreed, I'll drop both.
>> +#include <linux/panic.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/slab.h>
>> +#include <linux/spinlock.h>
>> +
>> +#include <acpi/ghes.h>
>> +#include <acpi/ghes_cper.h>
>> +
>> +static atomic_t ghes_ffh_source_ids = ATOMIC_INIT(0);
> I'd normally expect an IDA or similar. If nothing else it clearly
> indicates we only want a unique ID.
I'll keep the atomic for now; it's just a monotonic unique ID with no
lifetime tracking. If you strongly prefer IDA I can switch.
>> +
>> +struct ghes_ffh_ack {
>> + void __iomem *addr;
>> + u64 preserve;
>> + u64 set;
>> + u8 width;
>> + bool present;
>> +};
>> +
>> +struct ghes_ffh {
>> + struct device *dev;
>> + void __iomem *status;
>> + size_t status_len;
>> +
>> + struct ghes_ffh_ack ack;
>> +
>> + struct acpi_hest_generic *generic;
>> + struct acpi_hest_generic_status *estatus;
>> +
>> + bool sync;
>> + int irq;
>> +
>> + /* Serializes access to the firmware-owned buffer. */
> If we are serializing it, in what sense is it owned by the firmware?
>
I'll clarify the comment:
firmware owns the buffer contents and the OS only serializes access.
>> + spinlock_t lock;
>> +};
>
>
>> +
>> +static void ghes_ffh_process(struct ghes_ffh *ctx)
>> +{
>> + unsigned long flags;
>> + int sev;
>> +
>> + spin_lock_irqsave(&ctx->lock, flags);
>
> guard() + include cleanup.h. Then can do returns in error paths.
Agreed. I'll switch to guard() and include <linux/cleanup.h>.
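To illustrate why the scope-based form removes the `goto out` path, here is a minimal userspace sketch built on the compiler's `cleanup` attribute, which is the mechanism the kernel's guard() helpers rely on. Everything named `sketch_*` / `SKETCH_*` is hypothetical, and the "lock" is only a flag standing in for the spinlock. In the driver itself this would presumably be written as `guard(spinlock_irqsave)(&ctx->lock);`.

```c
/* Toy lock: a flag standing in for the driver's spinlock. */
struct sketch_lock {
	int held;
};

static struct sketch_lock *sketch_lock_acquire(struct sketch_lock *l)
{
	l->held = 1;
	return l;
}

/* Called automatically when the guard variable goes out of scope. */
static void sketch_lock_release(struct sketch_lock **l)
{
	(*l)->held = 0;
}

/* Take the lock now and release it on every exit from the enclosing scope. */
#define SKETCH_GUARD(l)							\
	struct sketch_lock *sketch_guard_				\
		__attribute__((cleanup(sketch_lock_release), unused)) =	\
		sketch_lock_acquire(l)

struct sketch_ctx {
	struct sketch_lock lock;
	int have_status;	/* stands in for a successful status copy */
	int processed;		/* counts records that reached the handler */
};

/* Returns 0 when a record was handled, -1 when there was nothing to do. */
static int sketch_process(struct sketch_ctx *ctx)
{
	SKETCH_GUARD(&ctx->lock);

	if (!ctx->have_status)
		return -1;	/* early return, lock released automatically */

	ctx->processed++;
	return 0;		/* normal return, lock released here too */
}
```

Both return paths leave `ctx->lock.held` at 0 without an explicit unlock label.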
>> +
>> + if (ghes_ffh_copy_status(ctx))
>> + goto out;
> Like here, to give a simpler flow.
>
>
>> +
>> + sev = ghes_severity(ctx->estatus->error_severity);
>> + if (sev >= GHES_SEV_PANIC)
>> + ghes_ffh_fatal(ctx);
>> +
>> + if (!ghes_estatus_cached(ctx->estatus)) {
>> + if (ghes_print_estatus(NULL, ctx->generic, ctx->estatus))
>
> Combine the two if statements with &&
>
Will do.
>> + ghes_estatus_cache_add(ctx->generic, ctx->estatus);
>> + }
>> +
>> + ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);
>> +
>> + ghes_ffh_ack(ctx);
>> +
>> +out:
>> + spin_unlock_irqrestore(&ctx->lock, flags);
>> +}
>> +
>> +static irqreturn_t ghes_ffh_irq(int irq, void *data)
>> +{
>> + struct ghes_ffh *ctx = data;
>> +
>> + ghes_ffh_process(ctx);
>> +
>> + return IRQ_HANDLED;
>> +}
>> +
>> +static int ghes_ffh_init_ack(struct platform_device *pdev,
>> + struct ghes_ffh *ctx)
>> +{
>> + struct resource *res;
>> + size_t size;
>> +
>> + res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>> + if (!res)
>> + return 0;
>> +
>> + ctx->ack.addr = devm_ioremap_resource(&pdev->dev, res);
> Why not devm_platform_get_and_ioremap_resource()?
Will switch to devm_platform_get_and_ioremap_resource().
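For context on what this function sets up: the binding describes the doorbell as a 32- or 64-bit register in which the OS sets bit 0 once the status block has been consumed. A minimal userspace sketch of that read-modify-write follows. The `sketch_*` names are hypothetical and only loosely follow struct ghes_ffh_ack; the body of ghes_ffh_ack() is not quoted in this mail, so the write-back logic here is an assumption, not the driver's code.

```c
#include <stdint.h>

/* Hypothetical doorbell description, loosely after struct ghes_ffh_ack. */
struct sketch_ack {
	uint64_t preserve;	/* bits of the old value to keep */
	uint64_t set;		/* bits to set when acknowledging */
	unsigned int width;	/* register width in bits: 32 or 64 */
};

/* Fill in the masks from the size of the doorbell resource in bytes. */
static int sketch_ack_init(struct sketch_ack *ack, uint64_t size)
{
	switch (size) {
	case 4:
		ack->width = 32;
		ack->preserve = UINT32_MAX;
		break;
	case 8:
		ack->width = 64;
		ack->preserve = UINT64_MAX;
		break;
	default:
		return -1;	/* unsupported doorbell size */
	}
	ack->set = UINT64_C(1);	/* bit 0, per the binding text */
	return 0;
}

/* Compute the value to write back, given the value just read. */
static uint64_t sketch_ack_value(const struct sketch_ack *ack, uint64_t old)
{
	uint64_t val = (old & ack->preserve) | ack->set;

	if (ack->width == 32)
		val &= UINT32_MAX;	/* a 32-bit write cannot carry upper bits */
	return val;
}
```

With a 4-byte doorbell, a value of 0xabcd0000 read back from the register would be written as 0xabcd0001.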
>> + if (IS_ERR(ctx->ack.addr))
>> + return PTR_ERR(ctx->ack.addr);
>> +
>> + size = resource_size(res);
>> + switch (size) {
>> + case 4:
>> + ctx->ack.width = 32;
>> + ctx->ack.preserve = ~0U;
>> + break;
>> + case 8:
>> + ctx->ack.width = 64;
>> + ctx->ack.preserve = ~0ULL;
>> + break;
>> + default:
>> + dev_err(&pdev->dev, "Unsupported ack resource size %zu\n", size);
>> + return -EINVAL;
>> + }
>> +
>> + ctx->ack.set = BIT_ULL(0);
>> + ctx->ack.present = true;
>> + return 0;
>> +}
>> +
>> +static int ghes_ffh_probe(struct platform_device *pdev)
>
> Consider using a
> struct device *dev = &pdev->dev;
> given there is only one device around and it will shorten a bunch of
> lines a little.
I'll use a local dev pointer.
>> +{
>> + struct ghes_ffh *ctx;
>> + struct resource *res;
>> + int rc;
>> +
>> + ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
>> + if (!ctx)
>> + return -ENOMEM;
>> +
>> + spin_lock_init(&ctx->lock);
>> + ctx->dev = &pdev->dev;
>> + ctx->sync = of_property_read_bool(pdev->dev.of_node, "arm,sea-notify");
> Hmm. I'd allow for other firmware types with
> device_property_read_bool() instead.
Given DT-only scope, I'll keep of_property_read_bool() here.
>> +
>> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> + if (!res) {
>> + dev_err(&pdev->dev, "status region missing\n");
> In probe you can always use dev_err_probe. It pretty prints the return value etc and
> saves lines of code.
> return dev_err_probe(&pdev->dev, -EINVAL, "status region missing\n");
Agreed. I'll use dev_err_probe() here and for zero length.
> Don't worry about slightly long line.
>
>> + return -EINVAL;
>> + }
>> +
>> + ctx->status_len = resource_size(res);
>> + if (!ctx->status_len) {
>> + dev_err(&pdev->dev, "Status region has zero length\n");
> As above, use dev_err_probe()
>
>> + return -EINVAL;
>> + }
>> +
>> + ctx->status = devm_ioremap_resource(&pdev->dev, res);
> I'd be tempted to use devm_platform_get_and_ioremap_resource() and just
> not worry about mapping and unmapping that will unnecessarily occur in the
> case of error.
Will do (as above).
>> + if (IS_ERR(ctx->status))
>> + return PTR_ERR(ctx->status);
>> +
>> + rc = ghes_ffh_init_ack(pdev, ctx);
>> + if (rc)
>> + return rc;
>> +
>> + rc = ghes_ffh_init_pool();
>> + if (rc)
>> + return rc;
>> +
>> + ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
>> + if (!ctx->estatus)
>> + return -ENOMEM;
>> +
>> + ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
>> + if (!ctx->generic)
>> + return -ENOMEM;
>> +
>> + ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
>> + ctx->generic->header.source_id =
>> + atomic_inc_return(&ghes_ffh_source_ids);
>> + ctx->generic->notify.type = ctx->sync ?
>> + ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
>> + ctx->generic->error_block_length = ctx->status_len;
>> +
>> + ctx->irq = platform_get_irq_optional(pdev, 0);
>> + if (ctx->irq <= 0) {
>> + if (ctx->irq == -EPROBE_DEFER)
>> + return ctx->irq;
>> + dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
> If it's required, why call get_irq_optional?
> That only serves to suppress the error message inside the call. Use
> the non optional version and drop this.
I'll use platform_get_irq().
>> + return -EINVAL;
>> + }
>> +
>> + rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
>> + NULL, ghes_ffh_irq,
>> + IRQF_ONESHOT,
>> + dev_name(&pdev->dev), ctx);
>> + if (rc)
>> + return rc;
>> +
>> + platform_set_drvdata(pdev, ctx);
>
> I can't immediately spot where this is used. If it isn't don't set it as that
> will mislead people into thinking it's needed.
Agreed. I'll drop it.
>> + dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");
>
> Krzysztof already commented on this one.
>
>> + return 0;
>> +}
>> +
>> +static void ghes_ffh_remove(struct platform_device *pdev)
>> +{
>
> If nothing to do, platform drivers don't need a remove so get rid of it.
Agreed. I'll remove it.
>> +}
>> +
>> +static const struct of_device_id ghes_ffh_of_match[] = {
>> + { .compatible = "arm,ras-ffh" },
>> + { /* sentinel */ }
>> +};
>> +MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
>> +
>> +static struct platform_driver ghes_ffh_driver = {
>> + .driver = {
>> + .name = "esource-dt",
>> + .of_match_table = ghes_ffh_of_match,
>> + },
>> + .probe = ghes_ffh_probe,
>> + .remove = ghes_ffh_remove,
>> +};
>> +
> Common convention is to keep this tightly coupled with the
> struct platform_driver by not having a blank line here.
I'll drop the blank line.
>> +module_platform_driver(ghes_ffh_driver);
>> +
>> +MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
>> +MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
>> +MODULE_LICENSE("GPL");
>
>
* Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
2026-03-12 12:23 ` Ahmed Tiba
@ 2026-03-12 14:50 ` Jonathan Cameron
0 siblings, 0 replies; 39+ messages in thread
From: Jonathan Cameron @ 2026-03-12 14:50 UTC (permalink / raw)
To: Ahmed Tiba
Cc: devicetree, linux-acpi, Dmitry.Lamerov, catalin.marinas, bp, robh,
rafael, will, conor, linux-arm-kernel, linux-doc, krzk+dt,
Michael.Zhao2, tony.luck
> >> +Firmware-first CPER via DeviceTree
> >> +----------------------------------
> >> +
> >> +Some systems expose Common Platform Error Record (CPER) data
> >> +via DeviceTree instead of ACPI HEST tables.
> >
> > I'd argue this isn't really DT specific, it's just not an ACPI table.
> > You could for instance use PRP0001 and wire this up on ACPI with only
> > one trivial change to generic property.h accessor for the boolean.
> >
> > Or use another firmware information source entirely.
>
> I'm intentionally keeping the scope DT-only for this series,
> so I'll keep the wording DT-focused.
Why? Generally when the support works fine with generic firmware
accessors that's preferred because there are no real disadvantages.
> >> +#include <acpi/ghes.h>
> >> +#include <acpi/ghes_cper.h>
> >> +
> >> +static atomic_t ghes_ffh_source_ids = ATOMIC_INIT(0);
> > I'd normally expect an IDA or similar. If nothing else it clearly
> > indicates we only want a unique ID.
>
> I'll keep the atomic for now; it's just a monotonic unique ID with no
> lifetime tracking. If you strongly prefer IDA I can switch.
If it doesn't 'need' to be monotonic due to some design issue then
yes I'd prefer an IDA.
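The practical difference between the two approaches is that a counter only ever moves forward, while an ID allocator tracks which IDs are in use and can hand a released one out again. A minimal userspace sketch of the allocator side, assuming a pool of 64 IDs; the `sketch_ida_*` names are hypothetical and this is not the kernel IDA implementation, which is far more general.

```c
#include <stdint.h>

/* Hypothetical 64-entry ID pool. */
struct sketch_ida {
	uint64_t used;	/* bit n set means ID n is allocated */
};

/* Returns the lowest free ID, or -1 if all 64 are taken. */
static int sketch_ida_alloc(struct sketch_ida *ida)
{
	for (int id = 0; id < 64; id++) {
		if (!(ida->used & (UINT64_C(1) << id))) {
			ida->used |= UINT64_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Releases an ID so that a later allocation can reuse it. */
static void sketch_ida_free(struct sketch_ida *ida, int id)
{
	if (id >= 0 && id < 64)
		ida->used &= ~(UINT64_C(1) << id);
}
```

After allocating 0 and 1 and freeing 0, the next allocation returns 0 again, whereas an atomic counter would have moved on. For a source ID that only needs to be unique, that reuse is harmless and the intent is stated more clearly.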
> >> + spinlock_t lock;
> >> +};
> >
> >
> >> +
> >> +static void ghes_ffh_process(struct ghes_ffh *ctx)
> >> +{
> >> + unsigned long flags;
> >> + int sev;
> >> +
> >> + spin_lock_irqsave(&ctx->lock, flags);
> >
> > guard() + include cleanup.h. Then can do returns in error paths.
>
> Agreed. I'll switch to guard() and include <linux/cleanup.h>.
A general process thing. If you agree with a suggestion, just
do it and crop that section of the email thread out.
Replying that you agree tends not to benefit anyone!
>
> >> + ghes_estatus_cache_add(ctx->generic, ctx->estatus);
> >> + }
> >> +
> >> + ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);
> >> +
> >> + ghes_ffh_ack(ctx);
> >> +
> >> +out:
> >> + spin_unlock_irqrestore(&ctx->lock, flags);
> >> +}
...
> >> +{
> >> + struct ghes_ffh *ctx;
> >> + struct resource *res;
> >> + int rc;
> >> +
> >> + ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
> >> + if (!ctx)
> >> + return -ENOMEM;
> >> +
> >> + spin_lock_init(&ctx->lock);
> >> + ctx->dev = &pdev->dev;
> >> + ctx->sync = of_property_read_bool(pdev->dev.of_node, "arm,sea-notify");
> > Hmm. I'd allow for other firmware types with
> > device_property_read_bool() instead.
>
> Given DT-only scope, I'll keep of_property_read_bool() here.
>
> >> +
> >> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> >> + if (!res) {
> >> + dev_err(&pdev->dev, "status region missing\n");
Jonathan
Thread overview: 39+ messages
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
2026-02-24 15:22 ` Jonathan Cameron
2026-03-11 11:39 ` Ahmed Tiba
2026-03-11 12:39 ` Jonathan Cameron
2026-03-11 12:56 ` Ahmed Tiba
2026-02-26 6:44 ` Himanshu Chauhan
2026-03-11 11:55 ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub Ahmed Tiba
2026-02-24 15:25 ` Jonathan Cameron
2026-03-11 12:19 ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
2026-02-24 15:32 ` Jonathan Cameron
2026-03-11 12:38 ` Ahmed Tiba
2026-02-26 5:58 ` Himanshu Chauhan
2026-03-11 13:18 ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 04/11] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 05/11] ACPI: APEI: GHES: move estatus cache helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 06/11] ACPI: APEI: GHES: move vendor record helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 07/11] ACPI: APEI: GHES: move CXL CPER helpers Ahmed Tiba
2026-02-24 15:34 ` Jonathan Cameron
2026-02-20 13:42 ` [PATCH v2 08/11] ACPI: APEI: introduce GHES helper Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
2026-02-20 19:19 ` kernel test robot
2026-02-20 19:24 ` kernel test robot
2026-02-20 20:37 ` kernel test robot
2026-02-20 21:16 ` kernel test robot
2026-02-20 13:42 ` [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh Ahmed Tiba
2026-02-26 7:03 ` Himanshu Chauhan
2026-03-11 13:41 ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
2026-02-21 9:06 ` Krzysztof Kozlowski
2026-02-23 19:10 ` Ahmed Tiba
2026-02-24 15:55 ` Jonathan Cameron
2026-03-12 12:23 ` Ahmed Tiba
2026-03-12 14:50 ` Jonathan Cameron
2026-02-26 7:01 ` Himanshu Chauhan
2026-02-26 7:05 ` [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Himanshu Chauhan
2026-03-11 10:44 ` Ahmed Tiba