* [PATCH 0/5 v8] Make ELOG and GHES log and trace consistently
From: Fabio M. De Francesco @ 2025-12-19 12:39 UTC (permalink / raw)
To: linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Fabio M. De Francesco
When Firmware First is enabled, the BIOS handles errors first and then
makes them available to the kernel via Common Platform Error Record
(CPER) sections (UEFI 2.10 Appendix N). Linux parses the CPER sections
via one of two similar paths, either ELOG or GHES.
Currently, ELOG and GHES are inconsistent in how they print to the
kernel log and in how they report to userspace via trace events.
Make the two paths behave consistently with respect to logging and
tracing.
--- Changes for v8 ---
- Don't make GHES dependent on PCI and drop patch 3/6;
incidentally, this resolves the issues that the kernel test robot
(KTR) found with v7 (Jonathan, Hanjun)
- Don't make EXTLOG dependent on CXL_BUS; move the new helpers
to a new file and link it to ghes.c only if ACPI_APEI_PCIEAER is
selected. Placing the new helpers in their own translation unit seems
to be a more flexible and safer solution than messing with Kconfig or
with conditional compilation macros within ghes.c. PCI may not be an
option on embedded platforms
--- Changes for v7 ---
- Reference UEFI v2.11 (Sathyanarayanan)
- Substitute !(A || B) with !(A && B) in an 'if' statement to
convey the intended logic (Jonathan)
- Make ACPI_APEI_GHES explicitly select PCIAER because the needed
ACPI_APEI_PCIEAER doesn't recursively select that prerequisite (Jonathan)
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202510232204.7aYBpl7h-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202510232204.XIXgPWD7-lkp@intel.com/
- Don't add the unnecessary cxl_cper_ras_handle_prot_err() wrapper
for cxl_cper_handle_prot_err() (Jonathan)
- Make ACPI_EXTLOG explicitly select PCIAER && ACPI_APEI because
the needed ACPI_APEI_PCIEAER doesn't recursively select the
prerequisites
- Make ACPI_EXTLOG select CXL_BUS
--- Changes for v6 ---
- Rename the helper that copies the CPER CXL protocol error
information to work struct (Dave)
- Return -EOPNOTSUPP (instead of -EINVAL) from the two helpers if
ACPI_APEI_PCIEAER is not defined (Dave)
--- Changes for v5 ---
- Add 3/6 to select ACPI_APEI_PCIEAER for GHES
- Add 4,5/6 to move common code between ELOG and GHES out to new
helpers and use them in 6/6 (Jonathan).
--- Changes for v4 ---
- Re-base on top of recent changes of the AER error logging and
drop obsoleted 2/4 (Sathyanarayanan)
- Log with pr_warn_ratelimited() (Dave)
- Collect tags
--- Changes for v3 ---
1/4, 2/4:
- collect tags; no functional changes
3/4:
- Invert logic of checks (Yazen)
- Select CONFIG_ACPI_APEI_PCIEAER (Yazen)
4/4:
- Check serial number only for CXL devices (Yazen)
- Replace "invalid" with "unknown" in the output of a pr_err()
(Yazen)
--- Changes for v2 ---
- Add a patch to pass log levels to pci_print_aer() (Dan)
- Add a patch to trace CPER CXL Protocol Errors
- Rework commit messages (Dan)
- Use log_non_standard_event() (Bjorn)
--- Changes for v1 ---
- Drop the RFC prefix and restart from PATCH v1
- Drop patch 3/3 because a discussion on it has not yet been
settled
- Drop namespacing in export of pci_print_aer() (Dan)
- Don't use '#ifdef' in *.c files (Dan)
- Drop a reference on pdev after operation is complete (Dan)
- Don't log an error message if pdev is NULL (Dan)
Fabio M. De Francesco (5):
ACPI: extlog: Trace CPER Non-standard Section Body
ACPI: extlog: Trace CPER PCI Express Error Section
acpi/ghes: Add helper for CPER CXL protocol errors checks
acpi/ghes: Add helper to copy CPER CXL protocol error info to work
struct
ACPI: extlog: Trace CPER CXL Protocol Error Section
drivers/acpi/Kconfig | 2 +
drivers/acpi/acpi_extlog.c | 64 +++++++++++++++++++++++++++++++
drivers/acpi/apei/Makefile | 1 +
drivers/acpi/apei/ghes.c | 40 +-------------------
drivers/acpi/apei/ghes_helpers.c | 65 ++++++++++++++++++++++++++++++++
drivers/cxl/core/ras.c | 3 +-
drivers/pci/pcie/aer.c | 2 +-
include/cxl/event.h | 22 +++++++++++
8 files changed, 159 insertions(+), 40 deletions(-)
create mode 100644 drivers/acpi/apei/ghes_helpers.c
base-commit: ea1013c1539270e372fc99854bc6e4d94eaeff66
--
2.52.0
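
The v6 changelog entry about returning -EOPNOTSUPP when ACPI_APEI_PCIEAER is not set relies on the common stub-header pattern. The following is a minimal userspace sketch of that pattern, not code from the series: DEMO_HAVE_PCIEAER is a hypothetical macro standing in for the Kconfig symbol, and every demo_* name is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPER CXL protocol error section. */
struct demo_section {
	unsigned long long valid_bits;
};

#ifdef DEMO_HAVE_PCIEAER
/* Real implementation, which would live in its own translation unit. */
int demo_section_valid(const struct demo_section *sec);
#else
/* Stub: callers build unchanged and get a clear "not supported" error. */
static inline int demo_section_valid(const struct demo_section *sec)
{
	(void)sec;
	return -EOPNOTSUPP;
}
#endif

/* The caller needs no #ifdef: any non-zero return means "skip the record". */
static int demo_handle(const struct demo_section *sec)
{
	assert(sec != NULL);
	if (demo_section_valid(sec))
		return 0;	/* record skipped */
	return 1;		/* record processed */
}
```

With the macro left undefined, demo_handle() always skips the record, which mirrors how a caller would behave when the helpers are compiled out.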
* [PATCH 1/5 v8] ACPI: extlog: Trace CPER Non-standard Section Body
From: Fabio M. De Francesco @ 2025-12-19 12:39 UTC (permalink / raw)
To: linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Fabio M. De Francesco, Jonathan Cameron,
Kuppuswamy Sathyanarayanan, Qiuxu Zhuo
ghes_do_proc() has a catch-all for unknown or unhandled CPER formats
(UEFI v2.11 Appendix N.2.3), but extlog_print() does not. This gap was
noticed by a RAS test that injected CXL protocol errors which were
notified to extlog_print() via the IOMCA (I/O Machine Check
Architecture) mechanism. Bring parity to the extlog_print() path by
including a similar log_non_standard_event().
Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
drivers/acpi/acpi_extlog.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index f6b9562779de0..47d11cb5c9120 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -183,6 +183,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
if (gdata->error_data_length >= sizeof(*mem))
trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
(u8)gdata->error_severity);
+ } else {
+ void *err = acpi_hest_get_payload(gdata);
+
+ log_non_standard_event(sec_type, fru_id, fru_text,
+ gdata->error_severity, err,
+ gdata->error_data_length);
}
}
--
2.52.0
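
The dispatch shape this patch completes (known section types get a dedicated branch, everything else reaches a catch-all) can be sketched in userspace as follows. This is only an illustration under stated assumptions: the demo_* names and the GUID byte values are invented and are not the real CPER section GUIDs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte section type identifier (a GUID in real CPER). */
struct demo_guid {
	unsigned char b[16];
};

/* Illustrative values only; not the real CPER section GUIDs. */
static const struct demo_guid DEMO_SEC_MEM  = { { 0x01 } };
static const struct demo_guid DEMO_SEC_PCIE = { { 0x02 } };

enum demo_route {
	DEMO_ROUTE_MEM,
	DEMO_ROUTE_PCIE,
	DEMO_ROUTE_NON_STANDARD,
};

static int demo_guid_equal(const struct demo_guid *a,
			   const struct demo_guid *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/*
 * Known section types get a dedicated route; anything else falls
 * through to the catch-all so that no record is silently dropped.
 */
static enum demo_route demo_route_section(const struct demo_guid *sec_type)
{
	assert(sec_type != NULL);
	if (demo_guid_equal(sec_type, &DEMO_SEC_MEM))
		return DEMO_ROUTE_MEM;
	if (demo_guid_equal(sec_type, &DEMO_SEC_PCIE))
		return DEMO_ROUTE_PCIE;
	return DEMO_ROUTE_NON_STANDARD;
}
```

Adding support for a new section type then amounts to inserting one more branch ahead of the final return.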
* [PATCH 2/5 v8] ACPI: extlog: Trace CPER PCI Express Error Section
From: Fabio M. De Francesco @ 2025-12-19 12:39 UTC (permalink / raw)
To: linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Fabio M. De Francesco
I/O Machine Check Architecture events may signal failing PCIe components
or links. The AER event contains details on what was happening on the wire
when the error was signaled.
Trace the CPER PCIe Error section (UEFI v2.11, Appendix N.2.7) reported
by the I/O MCA.
Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
drivers/acpi/acpi_extlog.c | 34 ++++++++++++++++++++++++++++++++++
drivers/pci/pcie/aer.c | 2 +-
2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 47d11cb5c9120..88a2237772c26 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -132,6 +132,36 @@ static int print_extlog_rcd(const char *pfx,
return 1;
}
+static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
+ int severity)
+{
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+ struct aer_capability_regs *aer;
+ struct pci_dev *pdev;
+ unsigned int devfn;
+ unsigned int bus;
+ int aer_severity;
+ int domain;
+
+ if (!(pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+ pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO))
+ return;
+
+ aer_severity = cper_severity_to_aer(severity);
+ aer = (struct aer_capability_regs *)pcie_err->aer_info;
+ domain = pcie_err->device_id.segment;
+ bus = pcie_err->device_id.bus;
+ devfn = PCI_DEVFN(pcie_err->device_id.device,
+ pcie_err->device_id.function);
+ pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+ if (!pdev)
+ return;
+
+ pci_print_aer(pdev, aer_severity, aer);
+ pci_dev_put(pdev);
+#endif
+}
+
static int extlog_print(struct notifier_block *nb, unsigned long val,
void *data)
{
@@ -183,6 +213,10 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
if (gdata->error_data_length >= sizeof(*mem))
trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
(u8)gdata->error_severity);
+ } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+ struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+ extlog_print_pcie(pcie_err, gdata->error_severity);
} else {
void *err = acpi_hest_get_payload(gdata);
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e0bcaa896803c..71ee4f5064ded 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -973,7 +973,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
pcie_print_tlp_log(dev, &aer->header_log, info.level,
dev_fmt(" "));
}
-EXPORT_SYMBOL_NS_GPL(pci_print_aer, "CXL");
+EXPORT_SYMBOL_GPL(pci_print_aer);
/**
* add_error_device - list device to be handled
--
2.52.0
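
The early-return condition in extlog_print_pcie() requires both the device ID and the AER info to be marked valid. A small userspace sketch of that check, next to the earlier !(A || B) form mentioned in the v7 changelog, is shown below. The bit positions and all demo_* names are assumptions made for the example and may not match the kernel's CPER_PCIE_VALID_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions; the kernel's values may differ. */
#define DEMO_VALID_DEVICE_ID	(UINT64_C(1) << 3)
#define DEMO_VALID_AER_INFO	(UINT64_C(1) << 7)

/* !(A && B): usable only when both fields are marked valid. */
static int demo_pcie_section_usable(uint64_t validation_bits)
{
	if (!((validation_bits & DEMO_VALID_DEVICE_ID) &&
	      (validation_bits & DEMO_VALID_AER_INFO)))
		return 0;
	return 1;
}

/*
 * !(A || B), kept for comparison: this rejects the section only when
 * neither bit is set, so a record with one missing field slips through.
 */
static int demo_pcie_section_usable_or(uint64_t validation_bits)
{
	if (!((validation_bits & DEMO_VALID_DEVICE_ID) ||
	      (validation_bits & DEMO_VALID_AER_INFO)))
		return 0;
	return 1;
}
```

A record that carries a valid device ID but no AER info is rejected by the first helper and accepted by the second, which is the difference the v7 change was meant to address.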
* [PATCH 3/5 v8] acpi/ghes: Add helper for CPER CXL protocol errors checks
From: Fabio M. De Francesco @ 2025-12-19 12:39 UTC (permalink / raw)
To: linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Fabio M. De Francesco
Move the CPER CXL protocol error validity checks out of
cxl_cper_post_prot_err() into the new cxl_cper_sec_prot_err_valid() and
limit the serial number check to CXL agents that are CXL devices (UEFI
v2.10, Appendix N.2.13).
Export the new symbol for reuse by ELOG.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
drivers/acpi/apei/Makefile | 1 +
drivers/acpi/apei/ghes.c | 18 +-----------------
drivers/acpi/apei/ghes_helpers.c | 32 ++++++++++++++++++++++++++++++++
include/cxl/event.h | 10 ++++++++++
4 files changed, 44 insertions(+), 17 deletions(-)
create mode 100644 drivers/acpi/apei/ghes_helpers.c
diff --git a/drivers/acpi/apei/Makefile b/drivers/acpi/apei/Makefile
index 2c474e6477e12..5db61dfb46915 100644
--- a/drivers/acpi/apei/Makefile
+++ b/drivers/acpi/apei/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_ACPI_APEI) += apei.o
obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o
+obj-$(CONFIG_ACPI_APEI_PCIEAER) += ghes_helpers.o
obj-$(CONFIG_ACPI_APEI_EINJ) += einj.o
einj-y := einj-core.o
einj-$(CONFIG_ACPI_APEI_EINJ_CXL) += einj-cxl.o
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 0dc767392a6c6..cc4cc7ee8422d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -713,24 +713,8 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
struct cxl_cper_prot_err_work_data wd;
u8 *dvsec_start, *cap_start;
- if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
- pr_err_ratelimited("CXL CPER invalid agent type\n");
+ if (cxl_cper_sec_prot_err_valid(prot_err))
return;
- }
-
- if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
- pr_err_ratelimited("CXL CPER invalid protocol error log\n");
- return;
- }
-
- if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
- pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
- prot_err->err_len);
- return;
- }
-
- if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
- pr_warn(FW_WARN "CXL CPER no device serial number\n");
guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
diff --git a/drivers/acpi/apei/ghes_helpers.c b/drivers/acpi/apei/ghes_helpers.c
new file mode 100644
index 0000000000000..e5f65f57d9ec7
--- /dev/null
+++ b/drivers/acpi/apei/ghes_helpers.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright(c) 2025 Intel Corporation. All rights reserved
+
+#include <cxl/event.h>
+
+int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
+{
+ if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
+ pr_err_ratelimited("CXL CPER invalid agent type\n");
+ return -EINVAL;
+ }
+
+ if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
+ pr_err_ratelimited("CXL CPER invalid protocol error log\n");
+ return -EINVAL;
+ }
+
+ if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
+ pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
+ prot_err->err_len);
+ return -EINVAL;
+ }
+
+ if ((prot_err->agent_type == RCD || prot_err->agent_type == DEVICE ||
+ prot_err->agent_type == LD || prot_err->agent_type == FMLD) &&
+ !(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
+ pr_warn_ratelimited(FW_WARN
+ "CXL CPER no device serial number\n");
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_cper_sec_prot_err_valid);
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 6fd90f9cc2034..4d7d1036ea9cb 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -320,4 +320,14 @@ static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data
}
#endif
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err);
+#else
+static inline int
+cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
#endif /* _LINUX_CXL_EVENT_H */
--
2.52.0
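
The validation order in cxl_cper_sec_prot_err_valid() (three hard failures, then a serial number warning that applies only to device-type agents) can be sketched in userspace as follows. All demo_* names, the bit values, and the RAS capability size are assumptions for the example; the warning is reported through an output flag here instead of a ratelimited printk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical agent types; names and order are illustrative. */
enum demo_agent {
	DEMO_RCD, DEMO_RCH_DP, DEMO_DEVICE, DEMO_LD,
	DEMO_FMLD, DEMO_RP, DEMO_DSP, DEMO_USP,
};

/* Illustrative valid bits and an assumed RAS capability size. */
#define DEMO_VALID_AGENT_ADDRESS	(1u << 0)
#define DEMO_VALID_SERIAL_NUMBER	(1u << 1)
#define DEMO_VALID_ERROR_LOG		(1u << 2)
#define DEMO_RAS_CAP_SIZE		64u

struct demo_prot_err {
	uint64_t valid_bits;
	enum demo_agent agent_type;
	uint16_t err_len;
};

/* Only these agent types are expected to carry a device serial number. */
static bool demo_agent_is_device(enum demo_agent t)
{
	return t == DEMO_RCD || t == DEMO_DEVICE ||
	       t == DEMO_LD || t == DEMO_FMLD;
}

/*
 * Returns 0 when the section is usable, -EINVAL otherwise.
 * *warn_serial is set when a device-type agent lacks a serial number;
 * that case is a warning only and does not fail validation.
 */
static int demo_prot_err_valid(const struct demo_prot_err *e,
			       bool *warn_serial)
{
	assert(e != NULL && warn_serial != NULL);
	*warn_serial = false;

	if (!(e->valid_bits & DEMO_VALID_AGENT_ADDRESS))
		return -EINVAL;
	if (!(e->valid_bits & DEMO_VALID_ERROR_LOG))
		return -EINVAL;
	if (e->err_len != DEMO_RAS_CAP_SIZE)
		return -EINVAL;

	if (demo_agent_is_device(e->agent_type) &&
	    !(e->valid_bits & DEMO_VALID_SERIAL_NUMBER))
		*warn_serial = true;

	return 0;
}
```

A root port without a serial number passes silently, while an endpoint device without one passes with the warning flag set.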
* [PATCH 4/5 v8] acpi/ghes: Add helper to copy CPER CXL protocol error info to work struct
From: Fabio M. De Francesco @ 2025-12-19 12:39 UTC (permalink / raw)
To: linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Fabio M. De Francesco
Make a helper out of the part of cxl_cper_post_prot_err() that checks
the CXL agent type and copies the CPER CXL protocol error information
to a work data structure.
Export the new symbol for reuse by ELOG.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
drivers/acpi/apei/ghes.c | 22 +--------------------
drivers/acpi/apei/ghes_helpers.c | 33 ++++++++++++++++++++++++++++++++
include/cxl/event.h | 10 ++++++++++
3 files changed, 44 insertions(+), 21 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index cc4cc7ee8422d..79755587871fa 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -711,7 +711,6 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
{
#ifdef CONFIG_ACPI_APEI_PCIEAER
struct cxl_cper_prot_err_work_data wd;
- u8 *dvsec_start, *cap_start;
if (cxl_cper_sec_prot_err_valid(prot_err))
return;
@@ -721,27 +720,8 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
if (!cxl_cper_prot_err_work)
return;
- switch (prot_err->agent_type) {
- case RCD:
- case DEVICE:
- case LD:
- case FMLD:
- case RP:
- case DSP:
- case USP:
- memcpy(&wd.prot_err, prot_err, sizeof(wd.prot_err));
-
- dvsec_start = (u8 *)(prot_err + 1);
- cap_start = dvsec_start + prot_err->dvsec_len;
-
- memcpy(&wd.ras_cap, cap_start, sizeof(wd.ras_cap));
- wd.severity = cper_severity_to_aer(severity);
- break;
- default:
- pr_err_ratelimited("CXL CPER invalid agent type: %d\n",
- prot_err->agent_type);
+ if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
return;
- }
if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
pr_err_ratelimited("CXL CPER kfifo overflow\n");
diff --git a/drivers/acpi/apei/ghes_helpers.c b/drivers/acpi/apei/ghes_helpers.c
index e5f65f57d9ec7..8b7f330c97b29 100644
--- a/drivers/acpi/apei/ghes_helpers.c
+++ b/drivers/acpi/apei/ghes_helpers.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
// Copyright(c) 2025 Intel Corporation. All rights reserved
+#include <linux/aer.h>
#include <cxl/event.h>
int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
@@ -30,3 +31,35 @@ int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
return 0;
}
EXPORT_SYMBOL_GPL(cxl_cper_sec_prot_err_valid);
+
+int cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
+ struct cxl_cper_sec_prot_err *prot_err,
+ int severity)
+{
+ u8 *dvsec_start, *cap_start;
+
+ switch (prot_err->agent_type) {
+ case RCD:
+ case DEVICE:
+ case LD:
+ case FMLD:
+ case RP:
+ case DSP:
+ case USP:
+ memcpy(&wd->prot_err, prot_err, sizeof(wd->prot_err));
+
+ dvsec_start = (u8 *)(prot_err + 1);
+ cap_start = dvsec_start + prot_err->dvsec_len;
+
+ memcpy(&wd->ras_cap, cap_start, sizeof(wd->ras_cap));
+ wd->severity = cper_severity_to_aer(severity);
+ break;
+ default:
+ pr_err_ratelimited("CXL CPER invalid agent type: %d\n",
+ prot_err->agent_type);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_cper_setup_prot_err_work_data);
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 4d7d1036ea9cb..94081aec597ae 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -322,12 +322,22 @@ static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data
#ifdef CONFIG_ACPI_APEI_PCIEAER
int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err);
+int cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
+ struct cxl_cper_sec_prot_err *prot_err,
+ int severity);
#else
static inline int
cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
{
return -EOPNOTSUPP;
}
+static inline int
+cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
+ struct cxl_cper_sec_prot_err *prot_err,
+ int severity)
+{
+ return -EOPNOTSUPP;
+}
#endif
#endif /* _LINUX_CXL_EVENT_H */
--
2.52.0
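
The helper locates the RAS capability by stepping over the fixed-size section header and then over dvsec_len bytes of DVSEC data. A userspace sketch of that pointer arithmetic follows. The header is a simplified, hypothetical one (the real struct cxl_cper_sec_prot_err has more fields), all demo_* names are invented, and the buffer length checks are an addition of this sketch that the patch itself does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical section header. */
struct demo_hdr {
	uint16_t dvsec_len;	/* bytes of DVSEC data after the header */
	uint16_t err_len;	/* bytes of RAS capability after the DVSEC */
};

#define DEMO_RAS_CAP_SIZE 8u	/* assumed size for the example */

struct demo_work_data {
	struct demo_hdr hdr;
	uint8_t ras_cap[DEMO_RAS_CAP_SIZE];
};

/* Assumed layout: [header][DVSEC: dvsec_len bytes][RAS cap: err_len bytes]. */
static int demo_setup_work_data(struct demo_work_data *wd,
				const uint8_t *buf, size_t buf_len)
{
	const uint8_t *dvsec_start, *cap_start;
	struct demo_hdr hdr;

	assert(wd != NULL && buf != NULL);
	if (buf_len < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));	/* copy avoids unaligned access */

	if (hdr.err_len != DEMO_RAS_CAP_SIZE)
		return -1;
	if (buf_len - sizeof(hdr) < (size_t)hdr.dvsec_len + hdr.err_len)
		return -1;

	dvsec_start = buf + sizeof(hdr);		/* just past the header */
	cap_start = dvsec_start + hdr.dvsec_len;	/* just past the DVSEC */

	wd->hdr = hdr;
	memcpy(wd->ras_cap, cap_start, sizeof(wd->ras_cap));
	return 0;
}
```

Copying into a private work structure, as the patch does, means the consumer never needs to touch the firmware-provided buffer again.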
* [PATCH 5/5 v8] ACPI: extlog: Trace CPER CXL Protocol Error Section
From: Fabio M. De Francesco @ 2025-12-19 12:39 UTC (permalink / raw)
To: linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Fabio M. De Francesco,
Kuppuswamy Sathyanarayanan
When Firmware First is enabled, BIOS handles errors first and then it
makes them available to the kernel via the Common Platform Error Record
(CPER) sections (UEFI 2.11 Appendix N.2.13). Linux parses the CPER
sections via one of two similar paths, either ELOG or GHES. The errors
managed by ELOG are signaled to the BIOS by the I/O Machine Check
Architecture (I/O MCA).
Currently, ELOG and GHES show some inconsistencies in how they report to
userspace via trace events.
Therefore, make the two mentioned paths act similarly by tracing the CPER
CXL Protocol Error Section.
Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
drivers/acpi/Kconfig | 2 ++
drivers/acpi/acpi_extlog.c | 24 ++++++++++++++++++++++++
drivers/cxl/core/ras.c | 3 ++-
include/cxl/event.h | 2 ++
4 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ca00a5dbcf750..df0ff0764d0d5 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -494,6 +494,8 @@ config ACPI_EXTLOG
tristate "Extended Error Log support"
depends on X86_MCE && X86_LOCAL_APIC && EDAC
select UEFI_CPER
+ select ACPI_APEI
+ select ACPI_APEI_GHES
help
Certain usages such as Predictive Failure Analysis (PFA) require
more information about the error than what can be described in
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 88a2237772c26..7ad3b36013cc6 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -12,6 +12,7 @@
#include <linux/ratelimit.h>
#include <linux/edac.h>
#include <linux/ras.h>
+#include <cxl/event.h>
#include <acpi/ghes.h>
#include <asm/cpu.h>
#include <asm/mce.h>
@@ -162,6 +163,23 @@ static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
#endif
}
+static void
+extlog_cxl_cper_handle_prot_err(struct cxl_cper_sec_prot_err *prot_err,
+ int severity)
+{
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+ struct cxl_cper_prot_err_work_data wd;
+
+ if (cxl_cper_sec_prot_err_valid(prot_err))
+ return;
+
+ if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
+ return;
+
+ cxl_cper_handle_prot_err(&wd);
+#endif
+}
+
static int extlog_print(struct notifier_block *nb, unsigned long val,
void *data)
{
@@ -213,6 +231,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
if (gdata->error_data_length >= sizeof(*mem))
trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
(u8)gdata->error_severity);
+ } else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
+ struct cxl_cper_sec_prot_err *prot_err =
+ acpi_hest_get_payload(gdata);
+
+ extlog_cxl_cper_handle_prot_err(prot_err,
+ gdata->error_severity);
} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 2731ba3a07993..a90480d07c878 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -63,7 +63,7 @@ static int match_memdev_by_parent(struct device *dev, const void *uport)
return 0;
}
-static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
+void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
{
unsigned int devfn = PCI_DEVFN(data->prot_err.agent_addr.device,
data->prot_err.agent_addr.function);
@@ -104,6 +104,7 @@ static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
else
cxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);
}
+EXPORT_SYMBOL_GPL(cxl_cper_handle_prot_err);
static void cxl_cper_prot_err_work_fn(struct work_struct *work)
{
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 94081aec597ae..ff97fea718d2c 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -340,4 +340,6 @@ cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
}
#endif
+void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *wd);
+
#endif /* _LINUX_CXL_EVENT_H */
--
2.52.0
* Re: [PATCH 3/5 v8] acpi/ghes: Add helper for CPER CXL protocol errors checks
From: kernel test robot @ 2025-12-23 23:58 UTC (permalink / raw)
To: Fabio M. De Francesco, linux-cxl
Cc: oe-kbuild-all, Rafael J Wysocki, Len Brown, Tony Luck,
Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, linux-media,
Shuai Xue, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Alison Schofield, Vishal Verma, Ira Weiny, Dan Williams,
Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
linux-kernel, linux-acpi, linuxppc-dev, linux-pci,
Fabio M. De Francesco
Hi Fabio,
kernel test robot noticed the following build errors:
[auto build test ERROR on ea1013c1539270e372fc99854bc6e4d94eaeff66]
url: https://github.com/intel-lab-lkp/linux/commits/Fabio-M-De-Francesco/ACPI-extlog-Trace-CPER-Non-standard-Section-Body/20251219-204338
base: ea1013c1539270e372fc99854bc6e4d94eaeff66
patch link: https://lore.kernel.org/r/20251219124042.3759749-4-fabio.m.de.francesco%40linux.intel.com
patch subject: [PATCH 3/5 v8] acpi/ghes: Add helper for CPER CXL protocol errors checks
config: arm64-defconfig (https://download.01.org/0day-ci/archive/20251224/202512240711.Iv57ik8I-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251224/202512240711.Iv57ik8I-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512240711.Iv57ik8I-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/acpi/apei/ghes_helpers.c: In function 'cxl_cper_sec_prot_err_valid':
>> drivers/acpi/apei/ghes_helpers.c:9:17: error: implicit declaration of function 'pr_err_ratelimited' [-Wimplicit-function-declaration]
9 | pr_err_ratelimited("CXL CPER invalid agent type\n");
| ^~~~~~~~~~~~~~~~~~
>> drivers/acpi/apei/ghes_helpers.c:27:17: error: implicit declaration of function 'pr_warn_ratelimited' [-Wimplicit-function-declaration]
27 | pr_warn_ratelimited(FW_WARN
| ^~~~~~~~~~~~~~~~~~~
>> drivers/acpi/apei/ghes_helpers.c:27:37: error: 'FW_WARN' undeclared (first use in this function)
27 | pr_warn_ratelimited(FW_WARN
| ^~~~~~~
drivers/acpi/apei/ghes_helpers.c:27:37: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/acpi/apei/ghes_helpers.c:27:44: error: expected ')' before string constant
27 | pr_warn_ratelimited(FW_WARN
| ~ ^
| )
28 | "CXL CPER no device serial number\n");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/pr_err_ratelimited +9 drivers/acpi/apei/ghes_helpers.c
5
6 int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
7 {
8 if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
> 9 pr_err_ratelimited("CXL CPER invalid agent type\n");
10 return -EINVAL;
11 }
12
13 if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
14 pr_err_ratelimited("CXL CPER invalid protocol error log\n");
15 return -EINVAL;
16 }
17
18 if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
19 pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
20 prot_err->err_len);
21 return -EINVAL;
22 }
23
24 if ((prot_err->agent_type == RCD || prot_err->agent_type == DEVICE ||
25 prot_err->agent_type == LD || prot_err->agent_type == FMLD) &&
26 !(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
> 27 pr_warn_ratelimited(FW_WARN
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 1/5 v8] ACPI: extlog: Trace CPER Non-standard Section Body
From: Shuai Xue @ 2025-12-24 1:58 UTC (permalink / raw)
To: Fabio M. De Francesco, linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Kuppuswamy Sathyanarayanan, Qiuxu Zhuo
On 12/19/25 8:39 PM, Fabio M. De Francesco wrote:
> ghes_do_proc() has a catch-all for unknown or unhandled CPER formats
> (UEFI v2.11 Appendix N.2.3), but extlog_print() does not. This gap was
> noticed by a RAS test that injected CXL protocol errors which were
> notified to extlog_print() via the IOMCA (I/O Machine Check
> Architecture) mechanism. Bring parity to the extlog_print() path by
> including a similar log_non_standard_event().
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
> drivers/acpi/acpi_extlog.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index f6b9562779de0..47d11cb5c9120 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -183,6 +183,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
> if (gdata->error_data_length >= sizeof(*mem))
> trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
> (u8)gdata->error_severity);
> + } else {
> + void *err = acpi_hest_get_payload(gdata);
> +
> + log_non_standard_event(sec_type, fru_id, fru_text,
> + gdata->error_severity, err,
> + gdata->error_data_length);
> }
> }
>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Thanks.
Shuai
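The quoted hunk adds a catch-all `else` branch so that a section type extlog_print() does not recognize is still logged rather than silently dropped. A minimal user-space sketch of that dispatch shape follows. It is a model, not the kernel code: `guid_model_t`, `SEC_PLATFORM_MEM_MODEL`, `classify_section()` and the `HANDLER_*` names are hypothetical, and the GUID bytes are placeholders rather than the real CPER values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's guid_t (16 raw bytes). */
typedef struct { uint8_t b[16]; } guid_model_t;

/* Which path a section would take in this model. */
enum handler { HANDLER_MEM, HANDLER_NON_STANDARD };

/* Placeholder value, NOT the real platform memory section GUID. */
static const guid_model_t SEC_PLATFORM_MEM_MODEL = { { 0x01 } };

/* Byte-wise comparison, in the spirit of the kernel's guid_equal(). */
static int guid_model_equal(const guid_model_t *a, const guid_model_t *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/*
 * Known section types get a dedicated handler; everything else falls
 * through to the catch-all, which is the behavior the patch adds.
 */
static enum handler classify_section(const guid_model_t *sec_type)
{
	assert(sec_type != NULL);

	if (guid_model_equal(sec_type, &SEC_PLATFORM_MEM_MODEL))
		return HANDLER_MEM;

	return HANDLER_NON_STANDARD;
}
```

In this model, any GUID other than the placeholder memory GUID yields `HANDLER_NON_STANDARD`, which corresponds to the point where the patch calls log_non_standard_event().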
* Re: [PATCH 2/5 v8] ACPI: extlog: Trace CPER PCI Express Error Section
2025-12-19 12:39 ` [PATCH 2/5 v8] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
@ 2025-12-24 2:20 ` Shuai Xue
0 siblings, 0 replies; 9+ messages in thread
From: Shuai Xue @ 2025-12-24 2:20 UTC (permalink / raw)
To: Fabio M. De Francesco, linux-cxl
Cc: Rafael J Wysocki, Len Brown, Tony Luck, Borislav Petkov,
Hanjun Guo, Mauro Carvalho Chehab, Davidlohr Bueso,
Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci
On 12/19/25 8:39 PM, Fabio M. De Francesco wrote:
> I/O Machine Check Architecture events may signal failing PCIe components
> or links. The AER event contains details on what was happening on the wire
> when the error was signaled.
>
> Trace the CPER PCIe Error section (UEFI v2.11, Appendix N.2.7) reported
> by the I/O MCA.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
> drivers/acpi/acpi_extlog.c | 34 ++++++++++++++++++++++++++++++++++
> drivers/pci/pcie/aer.c | 2 +-
> 2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index 47d11cb5c9120..88a2237772c26 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -132,6 +132,36 @@ static int print_extlog_rcd(const char *pfx,
> return 1;
> }
>
> +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> + int severity)
> +{
> +#ifdef CONFIG_ACPI_APEI_PCIEAER
> + struct aer_capability_regs *aer;
> + struct pci_dev *pdev;
> + unsigned int devfn;
> + unsigned int bus;
> + int aer_severity;
> + int domain;
> +
> + if (!(pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO))
> + return;
> +
> + aer_severity = cper_severity_to_aer(severity);
> + aer = (struct aer_capability_regs *)pcie_err->aer_info;
> + domain = pcie_err->device_id.segment;
> + bus = pcie_err->device_id.bus;
> + devfn = PCI_DEVFN(pcie_err->device_id.device,
> + pcie_err->device_id.function);
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> + if (!pdev)
> + return;
> +
> + pci_print_aer(pdev, aer_severity, aer);
> + pci_dev_put(pdev);
> +#endif
> +}
> +
> static int extlog_print(struct notifier_block *nb, unsigned long val,
> void *data)
> {
> @@ -183,6 +213,10 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
> if (gdata->error_data_length >= sizeof(*mem))
> trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
> (u8)gdata->error_severity);
> + } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
> + struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
> +
> + extlog_print_pcie(pcie_err, gdata->error_severity);
Hi Fabio,
If PCIe errors are signaled via IOMCA, do we also need to queue work to
recover from the error, as ghes_handle_aer() does?
Thanks.
Shuai
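The guard at the top of the quoted extlog_print_pcie() returns early unless both the device ID and the AER info are marked valid, and the device/function pair is then packed into a devfn. A small user-space sketch of those two steps is below. The `MODEL_*` bit values are assumptions made for illustration (the authoritative definitions are the CPER_PCIE_VALID_* macros in the kernel headers), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define MODEL_VALID_DEVICE_ID	(1u << 3)
#define MODEL_VALID_AER_INFO	(1u << 7)

/*
 * Proceed only when BOTH fields are valid. The patch writes the
 * early return as !(A && B), i.e. bail out if either one is missing.
 */
static int has_required_fields(uint64_t validation_bits)
{
	return (validation_bits & MODEL_VALID_DEVICE_ID) &&
	       (validation_bits & MODEL_VALID_AER_INFO);
}

/*
 * Same packing as the kernel's PCI_DEVFN(): a 5-bit device (slot)
 * number in bits 7..3 and a 3-bit function number in bits 2..0.
 */
static unsigned int model_pci_devfn(unsigned int slot, unsigned int func)
{
	assert(slot < 32 && func < 8);
	return ((slot & 0x1f) << 3) | (func & 0x07);
}
```

With these definitions, a record that carries only the device ID bit is rejected, while device 3, function 1 packs to devfn 25.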
Thread overview: 9+ messages
2025-12-19 12:39 [PATCH 0/5 v8] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
2025-12-19 12:39 ` [PATCH 1/5 v8] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2025-12-24 1:58 ` Shuai Xue
2025-12-19 12:39 ` [PATCH 2/5 v8] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2025-12-24 2:20 ` Shuai Xue
2025-12-19 12:39 ` [PATCH 3/5 v8] acpi/ghes: Add helper for CPER CXL protocol errors checks Fabio M. De Francesco
2025-12-23 23:58 ` kernel test robot
2025-12-19 12:39 ` [PATCH 4/5 v8] acpi/ghes: Add helper to copy CPER CXL protocol error info to work struct Fabio M. De Francesco
2025-12-19 12:39 ` [PATCH 5/5 v8] ACPI: extlog: Trace CPER CXL Protocol Error Section Fabio M. De Francesco