linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently
@ 2025-10-23 12:25 Fabio M. De Francesco
  2025-10-23 12:25 ` [PATCH 1/6 v6] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-23 12:25 UTC (permalink / raw)
  To: linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Fabio M . De Francesco, Peter Zijlstra,
	Ingo Molnar, Guo Weikang, Xin Li, Will Deacon, Huang Yiwei,
	Gavin Shan, Smita Koralahalli, Uwe Kleine-König, Li Ming,
	Ilpo Järvinen, Kuppuswamy Sathyanarayanan, Karolina Stolarek,
	Jon Pan-Doh, Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi,
	linuxppc-dev, linux-pci

When Firmware First is enabled, BIOS handles errors first and then it
makes them available to the kernel via the Common Platform Error Record
(CPER) sections (UEFI 2.10 Appendix N). Linux parses the CPER sections
via one of two similar paths, either ELOG or GHES.

Currently, ELOG and GHES show some inconsistencies in how they print to
the kernel log as well as in how they report to userspace via trace
events.

Make the two paths behave consistently with respect to logging and
tracing.

--- Changes for v6 ---

	- Rename the helper that copies the CPER CXL protocol error
	  information to work struct (Dave)
	- Return -EOPNOTSUPP (instead of -EINVAL) from the two helpers if
	  ACPI_APEI_PCIEAER is not defined (Dave)

--- Changes for v5 ---

	- Add 3/6 to select ACPI_APEI_PCIEAER for GHES
	- Add 4,5/6 to move common code between ELOG and GHES out to new
	  helpers and use them in 6/6 (Jonathan).

--- Changes for v4 ---

	- Re-base on top of recent changes of the AER error logging and
	  drop obsoleted 2/4 (Sathyanarayanan)
	- Log with pr_warn_ratelimited() (Dave)
	- Collect tags

--- Changes for v3 ---

    1/4, 2/4:
	- collect tags; no functional changes
    3/4:
	- Invert logic of checks (Yazen)
	- Select CONFIG_ACPI_APEI_PCIEAER (Yazen)
    4/4:
	- Check serial number only for CXL devices (Yazen)
	- Replace "invalid" with "unknown" in the output of a pr_err()
	  (Yazen)
	
--- Changes for v2 ---

	- Add a patch to pass log levels to pci_print_aer() (Dan)
	- Add a patch to trace CPER CXL Protocol Errors
	- Rework commit messages (Dan)
	- Use log_non_standard_event() (Bjorn)

--- Changes for v1 ---

	- Drop the RFC prefix and restart from PATCH v1
	- Drop patch 3/3 because a discussion on it has not yet been
	  settled
	- Drop namespacing in export of pci_print_aer() (Dan)
	- Don't use '#ifdef' in *.c files (Dan)
	- Drop a reference on pdev after operation is complete (Dan)
	- Don't log an error message if pdev is NULL (Dan)

Fabio M. De Francesco (6):
  ACPI: extlog: Trace CPER Non-standard Section Body
  ACPI: extlog: Trace CPER PCI Express Error Section
  acpi/ghes: Make GHES select ACPI_APEI_PCIEAER
  acpi/ghes: Add helper for CPER CXL protocol errors validity checks
  acpi/ghes: Add helper to copy CPER CXL protocol error information to
    work struct
  ACPI: extlog: Trace CPER CXL Protocol Error Section

 drivers/acpi/Kconfig       |  1 +
 drivers/acpi/acpi_extlog.c | 60 ++++++++++++++++++++++++++++++++++++
 drivers/acpi/apei/Kconfig  |  1 +
 drivers/acpi/apei/ghes.c   | 62 +++++++++++++++++++++++++-------------
 drivers/cxl/core/ras.c     |  6 ++++
 drivers/pci/pcie/aer.c     |  2 +-
 include/cxl/event.h        | 22 ++++++++++++++
 7 files changed, 132 insertions(+), 22 deletions(-)


base-commit: 552c50713f273b494ac6c77052032a49bc9255e2
-- 
2.51.0




* [PATCH 1/6 v6] ACPI: extlog: Trace CPER Non-standard Section Body
  2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
@ 2025-10-23 12:25 ` Fabio M. De Francesco
  2025-10-28 16:19   ` Kuppuswamy Sathyanarayanan
  2025-10-23 12:25 ` [PATCH 2/6 v6] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-23 12:25 UTC (permalink / raw)
  To: linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Fabio M . De Francesco, Peter Zijlstra,
	Ingo Molnar, Guo Weikang, Xin Li, Will Deacon, Huang Yiwei,
	Gavin Shan, Smita Koralahalli, Uwe Kleine-König, Li Ming,
	Ilpo Järvinen, Kuppuswamy Sathyanarayanan, Karolina Stolarek,
	Jon Pan-Doh, Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi,
	linuxppc-dev, linux-pci, Jonathan Cameron, Qiuxu Zhuo

ghes_do_proc() has a catch-all for unknown or unhandled CPER formats
(UEFI v2.10, Appendix N.2.3); extlog_print() does not. This gap was
noticed by a RAS test that injected CXL protocol errors which were
notified to extlog_print() via the IOMCA (I/O Machine Check
Architecture) mechanism. Bring parity to the extlog_print() path by
including a similar log_non_standard_event().

Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
 drivers/acpi/acpi_extlog.c | 6 ++++++
 1 file changed, 6 insertions(+)
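
The catch-all described above can be pictured with a small userspace
sketch. It is not kernel code: ID_MEM, enum handler and
dispatch_section() are hypothetical names, and the 16-byte ID is a
made-up stand-in for a CPER section type GUID. The only point is that
the final branch reports anything the earlier branches did not match.

```c
#include <string.h>

#define ID_LEN 16

/* Made-up 16-byte type ID standing in for a CPER section GUID. */
static const unsigned char ID_MEM[ID_LEN] = { 0x01 };

enum handler { HANDLED_MEM, HANDLED_NON_STANDARD };

/* Pick a handler for one section; the last return is the catch-all. */
static enum handler dispatch_section(const unsigned char *type)
{
	if (!memcmp(type, ID_MEM, ID_LEN))
		return HANDLED_MEM;

	/* Unrecognized types are still reported instead of being dropped. */
	return HANDLED_NON_STANDARD;
}
```

Passing ID_MEM selects the memory handler; any other 16-byte ID falls
through to the non-standard path.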

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index f6b9562779de..47d11cb5c912 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -183,6 +183,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 			if (gdata->error_data_length >= sizeof(*mem))
 				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
 						       (u8)gdata->error_severity);
+		} else {
+			void *err = acpi_hest_get_payload(gdata);
+
+			log_non_standard_event(sec_type, fru_id, fru_text,
+					       gdata->error_severity, err,
+					       gdata->error_data_length);
 		}
 	}
 
-- 
2.51.0




* [PATCH 2/6 v6] ACPI: extlog: Trace CPER PCI Express Error Section
  2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
  2025-10-23 12:25 ` [PATCH 1/6 v6] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
@ 2025-10-23 12:25 ` Fabio M. De Francesco
  2025-10-28 14:48   ` Jonathan Cameron
  2025-10-23 12:25 ` [PATCH 3/6 v6] acpi/ghes: Make GHES select ACPI_APEI_PCIEAER Fabio M. De Francesco
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-23 12:25 UTC (permalink / raw)
  To: linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Fabio M . De Francesco, Peter Zijlstra,
	Ingo Molnar, Guo Weikang, Xin Li, Will Deacon, Huang Yiwei,
	Gavin Shan, Smita Koralahalli, Uwe Kleine-König, Li Ming,
	Ilpo Järvinen, Kuppuswamy Sathyanarayanan, Karolina Stolarek,
	Jon Pan-Doh, Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi,
	linuxppc-dev, linux-pci

I/O Machine Check Architecture events may signal failing PCIe components
or links. The AER event contains details on what was happening on the wire
when the error was signaled.

Trace the CPER PCIe Error section (UEFI v2.10, Appendix N.2.7) reported
by the I/O MCA.

Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
 drivers/acpi/Kconfig       |  1 +
 drivers/acpi/acpi_extlog.c | 32 ++++++++++++++++++++++++++++++++
 drivers/pci/pcie/aer.c     |  2 +-
 3 files changed, 34 insertions(+), 1 deletion(-)
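
As a rough illustration of the two steps the new helper performs before
looking up the device, here is a userspace sketch. It assumes that both
the device ID and the AER registers must be marked valid, as
ghes_handle_aer() requires. The flag values and the names
make_devfn() and pcie_section_usable() are illustrative assumptions;
only the devfn packing (5-bit slot, 3-bit function) mirrors what
PCI_DEVFN() does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; treat them as assumptions, not kernel ABI. */
#define VALID_DEVICE_ID (1u << 3)
#define VALID_AER_INFO  (1u << 7)

/* Same packing as PCI_DEVFN(): 5-bit slot number, 3-bit function number. */
static unsigned int make_devfn(unsigned int slot, unsigned int func)
{
	return ((slot & 0x1f) << 3) | (func & 0x07);
}

/* True only when both the device ID and the AER registers are valid. */
static bool pcie_section_usable(uint64_t validation_bits)
{
	uint64_t need = VALID_DEVICE_ID | VALID_AER_INFO;

	return (validation_bits & need) == need;
}
```

For example, slot 3 function 1 packs to devfn 25, and a section that
only carries a valid device ID is rejected.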

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ca00a5dbcf75..f8a97db075fc 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -494,6 +494,7 @@ config ACPI_EXTLOG
 	tristate "Extended Error Log support"
 	depends on X86_MCE && X86_LOCAL_APIC && EDAC
 	select UEFI_CPER
+	select ACPI_APEI_PCIEAER
 	help
 	  Certain usages such as Predictive Failure Analysis (PFA) require
 	  more information about the error than what can be described in
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 47d11cb5c912..cefe8d2d8aff 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -132,6 +132,34 @@ static int print_extlog_rcd(const char *pfx,
 	return 1;
 }
 
+static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
+			      int severity)
+{
+	struct aer_capability_regs *aer;
+	struct pci_dev *pdev;
+	unsigned int devfn;
+	unsigned int bus;
+	int aer_severity;
+	int domain;
+
+	if (!(pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+	      pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO))
+		return;
+
+	aer_severity = cper_severity_to_aer(severity);
+	aer = (struct aer_capability_regs *)pcie_err->aer_info;
+	domain = pcie_err->device_id.segment;
+	bus = pcie_err->device_id.bus;
+	devfn = PCI_DEVFN(pcie_err->device_id.device,
+			  pcie_err->device_id.function);
+	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+	if (!pdev)
+		return;
+
+	pci_print_aer(pdev, aer_severity, aer);
+	pci_dev_put(pdev);
+}
+
 static int extlog_print(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
@@ -183,6 +211,10 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 			if (gdata->error_data_length >= sizeof(*mem))
 				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
 						       (u8)gdata->error_severity);
+		} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+			struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+			extlog_print_pcie(pcie_err, gdata->error_severity);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 0b5ed4722ac3..1b903e0644d6 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -971,7 +971,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
 		pcie_print_tlp_log(dev, &aer->header_log, info.level,
 				   dev_fmt("  "));
 }
-EXPORT_SYMBOL_NS_GPL(pci_print_aer, "CXL");
+EXPORT_SYMBOL_GPL(pci_print_aer);
 
 /**
  * add_error_device - list device to be handled
-- 
2.51.0




* [PATCH 3/6 v6] acpi/ghes: Make GHES select ACPI_APEI_PCIEAER
  2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
  2025-10-23 12:25 ` [PATCH 1/6 v6] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
  2025-10-23 12:25 ` [PATCH 2/6 v6] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
@ 2025-10-23 12:25 ` Fabio M. De Francesco
  2025-10-23 12:25 ` [PATCH 4/6 v6] acpi/ghes: Add helper for CXL protocol errors checks Fabio M. De Francesco
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-23 12:25 UTC (permalink / raw)
  To: linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Fabio M . De Francesco, Peter Zijlstra,
	Ingo Molnar, Guo Weikang, Xin Li, Will Deacon, Huang Yiwei,
	Gavin Shan, Smita Koralahalli, Uwe Kleine-König, Li Ming,
	Ilpo Järvinen, Kuppuswamy Sathyanarayanan, Karolina Stolarek,
	Jon Pan-Doh, Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi,
	linuxppc-dev, linux-pci

GHES handles the PCI Express Error Section and also the Compute Express
Link (CXL) Protocol Error Section. Two of its functions depend on the
APEI PCIe AER logging/recovering support (ACPI_APEI_PCIEAER).

Make GHES select ACPI_APEI_PCIEAER and remove the conditional
compilation from the body of two static functions that handle the CPER
Error Sections mentioned above.

Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
 drivers/acpi/apei/Kconfig | 1 +
 drivers/acpi/apei/ghes.c  | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 070c07d68dfb..c265b54d810d 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -23,6 +23,7 @@ config ACPI_APEI_GHES
 	select ACPI_HED
 	select IRQ_WORK
 	select GENERIC_ALLOCATOR
+	select ACPI_APEI_PCIEAER
 	select ARM_SDE_INTERFACE if ARM64
 	help
 	  Generic Hardware Error Source provides a way to report
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 97ee19f2cae0..d6fe5f020e96 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -613,7 +613,6 @@ static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata,
  */
 static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 {
-#ifdef CONFIG_ACPI_APEI_PCIEAER
 	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
 
 	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
@@ -646,7 +645,6 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 				  (struct aer_capability_regs *)
 				  aer_info);
 	}
-#endif
 }
 
 static BLOCKING_NOTIFIER_HEAD(vendor_record_notify_list);
@@ -711,7 +709,6 @@ struct work_struct *cxl_cper_prot_err_work;
 static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
 				   int severity)
 {
-#ifdef CONFIG_ACPI_APEI_PCIEAER
 	struct cxl_cper_prot_err_work_data wd;
 	u8 *dvsec_start, *cap_start;
 
@@ -767,7 +764,6 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
 	}
 
 	schedule_work(cxl_cper_prot_err_work);
-#endif
 }
 
 int cxl_cper_register_prot_err_work(struct work_struct *work)
-- 
2.51.0




* [PATCH 4/6 v6] acpi/ghes: Add helper for CXL protocol errors checks
  2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
                   ` (2 preceding siblings ...)
  2025-10-23 12:25 ` [PATCH 3/6 v6] acpi/ghes: Make GHES select ACPI_APEI_PCIEAER Fabio M. De Francesco
@ 2025-10-23 12:25 ` Fabio M. De Francesco
  2025-10-28 14:54   ` Jonathan Cameron
  2025-10-23 12:25 ` [PATCH 5/6 v6] acpi/ghes: Add helper to copy CXL protocol error info to work struct Fabio M. De Francesco
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-23 12:25 UTC (permalink / raw)
  To: linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Fabio M . De Francesco, Peter Zijlstra,
	Ingo Molnar, Guo Weikang, Xin Li, Will Deacon, Huang Yiwei,
	Gavin Shan, Smita Koralahalli, Uwe Kleine-König, Li Ming,
	Ilpo Järvinen, Kuppuswamy Sathyanarayanan, Karolina Stolarek,
	Jon Pan-Doh, Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi,
	linuxppc-dev, linux-pci

Move the CPER CXL protocol error validity checks out of
cxl_cper_post_prot_err() to cxl_cper_sec_prot_err_valid() and limit the
serial number check to CXL agents that are CXL devices (UEFI v2.10,
Appendix N.2.13).

Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
 drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++----------
 include/cxl/event.h      | 10 ++++++++++
 2 files changed, 32 insertions(+), 10 deletions(-)
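
The split between hard failures and the device-only serial number
warning can be sketched in userspace as follows. The enum, the flag
bit positions, struct prot_err and prot_err_check() are all
hypothetical simplifications named after the patch, not copies of the
kernel definitions; the RAS capability size check is left out for
brevity.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical agent types and flag bits; values are assumptions. */
enum agent_type {
	AGENT_RCD, AGENT_RCH_DP, AGENT_DEVICE, AGENT_LD,
	AGENT_FMLD, AGENT_RP, AGENT_DSP, AGENT_USP,
};

#define VALID_AGENT_ADDRESS (1u << 1)
#define VALID_ERROR_LOG     (1u << 2)
#define VALID_SERIAL_NUMBER (1u << 3)

struct prot_err {
	uint64_t valid_bits;
	enum agent_type agent_type;
};

/* Only these agents are CXL devices that should carry a serial number. */
static bool agent_is_device(enum agent_type t)
{
	return t == AGENT_RCD || t == AGENT_DEVICE ||
	       t == AGENT_LD || t == AGENT_FMLD;
}

/*
 * Returns 0 or -EINVAL. A missing serial number is not fatal: it only
 * sets *warn, and only when the agent is a device.
 */
static int prot_err_check(const struct prot_err *e, bool *warn)
{
	*warn = false;

	if (!(e->valid_bits & VALID_AGENT_ADDRESS))
		return -EINVAL;
	if (!(e->valid_bits & VALID_ERROR_LOG))
		return -EINVAL;

	if (agent_is_device(e->agent_type) &&
	    !(e->valid_bits & VALID_SERIAL_NUMBER))
		*warn = true;

	return 0;
}
```

With this shape, a root port without a serial number passes silently,
while a device without one passes with a warning.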

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d6fe5f020e96..e69ae864f43d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -706,30 +706,42 @@ static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
 static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
 struct work_struct *cxl_cper_prot_err_work;
 
-static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
-				   int severity)
+int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
 {
-	struct cxl_cper_prot_err_work_data wd;
-	u8 *dvsec_start, *cap_start;
-
 	if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
 		pr_err_ratelimited("CXL CPER invalid agent type\n");
-		return;
+		return -EINVAL;
 	}
 
 	if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
 		pr_err_ratelimited("CXL CPER invalid protocol error log\n");
-		return;
+		return -EINVAL;
 	}
 
 	if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
 		pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
 				   prot_err->err_len);
-		return;
+		return -EINVAL;
 	}
 
-	if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
-		pr_warn(FW_WARN "CXL CPER no device serial number\n");
+	if ((prot_err->agent_type == RCD || prot_err->agent_type == DEVICE ||
+	     prot_err->agent_type == LD || prot_err->agent_type == FMLD) &&
+	    !(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
+		pr_warn_ratelimited(FW_WARN
+				    "CXL CPER no device serial number\n");
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_cper_sec_prot_err_valid);
+
+static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
+				   int severity)
+{
+	struct cxl_cper_prot_err_work_data wd;
+	u8 *dvsec_start, *cap_start;
+
+	if (cxl_cper_sec_prot_err_valid(prot_err))
+		return;
 
 	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
 
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 6fd90f9cc203..4d7d1036ea9c 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -320,4 +320,14 @@ static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data
 }
 #endif
 
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err);
+#else
+static inline int
+cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 #endif /* _LINUX_CXL_EVENT_H */
-- 
2.51.0




* [PATCH 5/6 v6] acpi/ghes: Add helper to copy CXL protocol error info to work struct
  2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
                   ` (3 preceding siblings ...)
  2025-10-23 12:25 ` [PATCH 4/6 v6] acpi/ghes: Add helper for CXL protocol errors checks Fabio M. De Francesco
@ 2025-10-23 12:25 ` Fabio M. De Francesco
  2025-10-28 14:59   ` Jonathan Cameron
  2025-10-23 12:25 ` [PATCH 6/6 v6] ACPI: extlog: Trace CPER CXL Protocol Error Section Fabio M. De Francesco
  2025-10-27 19:40 ` [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Rafael J. Wysocki
  6 siblings, 1 reply; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-23 12:25 UTC (permalink / raw)
  To: linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Fabio M . De Francesco, Peter Zijlstra,
	Ingo Molnar, Guo Weikang, Xin Li, Will Deacon, Huang Yiwei,
	Gavin Shan, Smita Koralahalli, Uwe Kleine-König, Li Ming,
	Ilpo Järvinen, Kuppuswamy Sathyanarayanan, Karolina Stolarek,
	Jon Pan-Doh, Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi,
	linuxppc-dev, linux-pci

Make a helper out of cxl_cper_post_prot_err() that checks the CXL agent
type and copies the CPER CXL protocol error information to a work data
structure.

Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
 drivers/acpi/apei/ghes.c | 42 ++++++++++++++++++++++++++--------------
 include/cxl/event.h      | 10 ++++++++++
 2 files changed, 37 insertions(+), 15 deletions(-)
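
The copy relies on the section layout: a fixed header, then the DVSEC
data, then the RAS capability registers. A userspace sketch of that
offset arithmetic follows. struct hdr, struct ras_cap and
copy_ras_cap() are hypothetical and much smaller than the real CPER
structures, and the explicit bounds checks are an addition of this
sketch rather than something the helper does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced layouts; not the real CPER structures. */
struct hdr {
	uint16_t dvsec_len; /* bytes of DVSEC data following the header */
	uint16_t err_len;   /* bytes of RAS capability data after the DVSEC */
};

struct ras_cap {
	uint32_t uncor_status;
	uint32_t cor_status;
};

/* Copy the trailing RAS capability out of a header + DVSEC + RAS buffer. */
static int copy_ras_cap(const uint8_t *buf, size_t buf_len,
			struct ras_cap *out)
{
	struct hdr h;
	size_t off;

	if (buf_len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));

	/* Skip the header first, then the variable-length DVSEC. */
	off = sizeof(h) + h.dvsec_len;

	if (h.err_len != sizeof(*out) || off > buf_len ||
	    buf_len - off < sizeof(*out))
		return -1;

	memcpy(out, buf + off, sizeof(*out));
	return 0;
}
```

A caller would hand in the raw section bytes and get the register
snapshot back, or -1 if the buffer is too short for the advertised
lengths.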

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index e69ae864f43d..2f4632d9855a 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -734,20 +734,12 @@ int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
 }
 EXPORT_SYMBOL_GPL(cxl_cper_sec_prot_err_valid);
 
-static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
-				   int severity)
+int cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
+				      struct cxl_cper_sec_prot_err *prot_err,
+				      int severity)
 {
-	struct cxl_cper_prot_err_work_data wd;
 	u8 *dvsec_start, *cap_start;
 
-	if (cxl_cper_sec_prot_err_valid(prot_err))
-		return;
-
-	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
-
-	if (!cxl_cper_prot_err_work)
-		return;
-
 	switch (prot_err->agent_type) {
 	case RCD:
 	case DEVICE:
@@ -756,20 +748,40 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
 	case RP:
 	case DSP:
 	case USP:
-		memcpy(&wd.prot_err, prot_err, sizeof(wd.prot_err));
+		memcpy(&wd->prot_err, prot_err, sizeof(wd->prot_err));
 
 		dvsec_start = (u8 *)(prot_err + 1);
 		cap_start = dvsec_start + prot_err->dvsec_len;
 
-		memcpy(&wd.ras_cap, cap_start, sizeof(wd.ras_cap));
-		wd.severity = cper_severity_to_aer(severity);
+		memcpy(&wd->ras_cap, cap_start, sizeof(wd->ras_cap));
+		wd->severity = cper_severity_to_aer(severity);
 		break;
 	default:
 		pr_err_ratelimited("CXL CPER invalid agent type: %d\n",
 				   prot_err->agent_type);
-		return;
+		return -EINVAL;
 	}
 
+	return 0;
+}
+EXPORT_SYMBOL_GPL(cxl_cper_setup_prot_err_work_data);
+
+static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
+				   int severity)
+{
+	struct cxl_cper_prot_err_work_data wd;
+
+	if (cxl_cper_sec_prot_err_valid(prot_err))
+		return;
+
+	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
+
+	if (!cxl_cper_prot_err_work)
+		return;
+
+	if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
+		return;
+
 	if (!kfifo_put(&cxl_cper_prot_err_fifo, wd)) {
 		pr_err_ratelimited("CXL CPER kfifo overflow\n");
 		return;
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 4d7d1036ea9c..94081aec597a 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -322,12 +322,22 @@ static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data
 
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err);
+int cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
+				      struct cxl_cper_sec_prot_err *prot_err,
+				      int severity);
 #else
 static inline int
 cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
+				  struct cxl_cper_sec_prot_err *prot_err,
+				  int severity)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #endif /* _LINUX_CXL_EVENT_H */
-- 
2.51.0




* [PATCH 6/6 v6] ACPI: extlog: Trace CPER CXL Protocol Error Section
  2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
                   ` (4 preceding siblings ...)
  2025-10-23 12:25 ` [PATCH 5/6 v6] acpi/ghes: Add helper to copy CXL protocol error info to work struct Fabio M. De Francesco
@ 2025-10-23 12:25 ` Fabio M. De Francesco
  2025-10-28 15:06   ` Jonathan Cameron
  2025-10-27 19:40 ` [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Rafael J. Wysocki
  6 siblings, 1 reply; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-23 12:25 UTC (permalink / raw)
  To: linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Fabio M . De Francesco, Peter Zijlstra,
	Ingo Molnar, Guo Weikang, Xin Li, Will Deacon, Huang Yiwei,
	Gavin Shan, Smita Koralahalli, Uwe Kleine-König, Li Ming,
	Ilpo Järvinen, Kuppuswamy Sathyanarayanan, Karolina Stolarek,
	Jon Pan-Doh, Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi,
	linuxppc-dev, linux-pci

When Firmware First is enabled, BIOS handles errors first and then it makes
them available to the kernel via the Common Platform Error Record (CPER)
sections (UEFI 2.10 Appendix N). Linux parses the CPER sections via one of
two similar paths, either ELOG or GHES. The errors managed by ELOG are
signaled to the BIOS by the I/O Machine Check Architecture (I/O MCA).

Currently, ELOG and GHES show some inconsistencies in how they report to
userspace via trace events.

Therefore, make the two mentioned paths act similarly by tracing the CPER
CXL Protocol Error Section (UEFI v2.10, Appendix N.2.13).

Cc: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
 drivers/acpi/acpi_extlog.c | 22 ++++++++++++++++++++++
 drivers/cxl/core/ras.c     |  6 ++++++
 include/cxl/event.h        |  2 ++
 3 files changed, 30 insertions(+)

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index cefe8d2d8aff..b005918517d1 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -12,6 +12,7 @@
 #include <linux/ratelimit.h>
 #include <linux/edac.h>
 #include <linux/ras.h>
+#include <cxl/event.h>
 #include <acpi/ghes.h>
 #include <asm/cpu.h>
 #include <asm/mce.h>
@@ -160,6 +161,21 @@ static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
 	pci_dev_put(pdev);
 }
 
+static void
+extlog_cxl_cper_handle_prot_err(struct cxl_cper_sec_prot_err *prot_err,
+				int severity)
+{
+	struct cxl_cper_prot_err_work_data wd;
+
+	if (cxl_cper_sec_prot_err_valid(prot_err))
+		return;
+
+	if (cxl_cper_setup_prot_err_work_data(&wd, prot_err, severity))
+		return;
+
+	cxl_cper_ras_handle_prot_err(&wd);
+}
+
 static int extlog_print(struct notifier_block *nb, unsigned long val,
 			void *data)
 {
@@ -211,6 +227,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
 			if (gdata->error_data_length >= sizeof(*mem))
 				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
 						       (u8)gdata->error_severity);
+		} else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
+			struct cxl_cper_sec_prot_err *prot_err =
+				acpi_hest_get_payload(gdata);
+
+			extlog_cxl_cper_handle_prot_err(prot_err,
+							gdata->error_severity);
 		} else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
 			struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
 
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 2731ba3a0799..3f527b0c6509 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -105,6 +105,12 @@ static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
 		cxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);
 }
 
+void cxl_cper_ras_handle_prot_err(struct cxl_cper_prot_err_work_data *wd)
+{
+	cxl_cper_handle_prot_err(wd);
+}
+EXPORT_SYMBOL_GPL(cxl_cper_ras_handle_prot_err);
+
 static void cxl_cper_prot_err_work_fn(struct work_struct *work)
 {
 	struct cxl_cper_prot_err_work_data wd;
diff --git a/include/cxl/event.h b/include/cxl/event.h
index 94081aec597a..a37eef112411 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -340,4 +340,6 @@ cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
 }
 #endif
 
+void cxl_cper_ras_handle_prot_err(struct cxl_cper_prot_err_work_data *wd);
+
 #endif /* _LINUX_CXL_EVENT_H */
-- 
2.51.0




* Re: [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently
  2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
                   ` (5 preceding siblings ...)
  2025-10-23 12:25 ` [PATCH 6/6 v6] ACPI: extlog: Trace CPER CXL Protocol Error Section Fabio M. De Francesco
@ 2025-10-27 19:40 ` Rafael J. Wysocki
  2025-10-27 20:15   ` Luck, Tony
  6 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2025-10-27 19:40 UTC (permalink / raw)
  To: Fabio M. De Francesco, Tony Luck
  Cc: linux-cxl, Len Brown, Borislav Petkov, Hanjun Guo,
	Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Thu, Oct 23, 2025 at 2:26 PM Fabio M. De Francesco
<fabio.m.de.francesco@linux.intel.com> wrote:
>
> When Firmware First is enabled, BIOS handles errors first and then it
> makes them available to the kernel via the Common Platform Error Record
> (CPER) sections (UEFI 2.10 Appendix N). Linux parses the CPER sections
> via one of two similar paths, either ELOG or GHES.
>
> Currently, ELOG and GHES show some inconsistencies in how they print to
> the kernel log as well as in how they report to userspace via trace
> events.
>
> Make the two paths behave consistently with respect to logging and
> tracing.
>
> --- Changes for v6 ---
>
>         - Rename the helper that copies the CPER CXL protocol error
>           information to work struct (Dave)
>         - Return -EOPNOTSUPP (instead of -EINVAL) from the two helpers if
>           ACPI_APEI_PCIEAER is not defined (Dave)
>
> --- Changes for v5 ---
>
>         - Add 3/6 to select ACPI_APEI_PCIEAER for GHES
>         - Add 4,5/6 to move common code between ELOG and GHES out to new
>           helpers and use them in 6/6 (Jonathan).
>
> --- Changes for v4 ---
>
>         - Re-base on top of recent changes of the AER error logging and
>           drop obsoleted 2/4 (Sathyanarayanan)
>         - Log with pr_warn_ratelimited() (Dave)
>         - Collect tags
>
> --- Changes for v3 ---
>
>     1/4, 2/4:
>         - collect tags; no functional changes
>     3/4:
>         - Invert logic of checks (Yazen)
>         - Select CONFIG_ACPI_APEI_PCIEAER (Yazen)
>     4/4:
>         - Check serial number only for CXL devices (Yazen)
>         - Replace "invalid" with "unknown" in the output of a pr_err()
>           (Yazen)
>
> --- Changes for v2 ---
>
>         - Add a patch to pass log levels to pci_print_aer() (Dan)
>         - Add a patch to trace CPER CXL Protocol Errors
>         - Rework commit messages (Dan)
>         - Use log_non_standard_event() (Bjorn)
>
> --- Changes for v1 ---
>
>         - Drop the RFC prefix and restart from PATCH v1
>         - Drop patch 3/3 because a discussion on it has not yet been
>           settled
>         - Drop namespacing in export of pci_print_aer while() (Dan)
>         - Don't use '#ifdef' in *.c files (Dan)
>         - Drop a reference on pdev after operation is complete (Dan)
>         - Don't log an error message if pdev is NULL (Dan)
>
> Fabio M. De Francesco (6):
>   ACPI: extlog: Trace CPER Non-standard Section Body
>   ACPI: extlog: Trace CPER PCI Express Error Section
>   acpi/ghes: Make GHES select ACPI_APEI_PCIEAER
>   acpi/ghes: Add helper for CPER CXL protocol errors validity checks
>   acpi/ghes: Add helper to copy CPER CXL protocol error information to
>     work struct
>   ACPI: extlog: Trace CPER CXL Protocol Error Section
>
>  drivers/acpi/Kconfig       |  1 +
>  drivers/acpi/acpi_extlog.c | 60 ++++++++++++++++++++++++++++++++++++
>  drivers/acpi/apei/Kconfig  |  1 +
>  drivers/acpi/apei/ghes.c   | 62 +++++++++++++++++++++++++-------------
>  drivers/cxl/core/ras.c     |  6 ++++
>  drivers/pci/pcie/aer.c     |  2 +-
>  include/cxl/event.h        | 22 ++++++++++++++
>  7 files changed, 132 insertions(+), 22 deletions(-)
>
>
> base-commit: 552c50713f273b494ac6c77052032a49bc9255e2
> --

I need ACKs or equivalent for patches [3-5/6] from the designated APEI
reviewers.  Tony?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently
  2025-10-27 19:40 ` [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Rafael J. Wysocki
@ 2025-10-27 20:15   ` Luck, Tony
  0 siblings, 0 replies; 17+ messages in thread
From: Luck, Tony @ 2025-10-27 20:15 UTC (permalink / raw)
  To: Rafael J. Wysocki, Fabio M. De Francesco
  Cc: linux-cxl@vger.kernel.org, Len Brown, Borislav Petkov, Hanjun Guo,
	Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Jiang, Dave, Schofield, Alison, Verma, Vishal L,
	Weiny, Ira, Williams, Dan J, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-pci@vger.kernel.org

> I need ACKs or equivalent for patches [3-5/6] from the designated APEI
> reviewers.  Tony?

There's an LKP complaint against patch 3 (perhaps for a crazy randconfig, but an indication that Kconfig dependencies aren't right).

The APEI bits look ok to me. But I think 3-6 need some CXL acks too.

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6 v6] ACPI: extlog: Trace CPER PCI Express Error Section
  2025-10-23 12:25 ` [PATCH 2/6 v6] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
@ 2025-10-28 14:48   ` Jonathan Cameron
  2025-10-31 10:18     ` Fabio M. De Francesco
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron @ 2025-10-28 14:48 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: linux-cxl, Rafael J . Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Thu, 23 Oct 2025 14:25:37 +0200
"Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com> wrote:

> I/O Machine Check Architecture events may signal failing PCIe components
> or links. The AER event contains details on what was happening on the wire
> when the error was signaled.
> 
> Trace the CPER PCIe Error section (UEFI v2.10, Appendix N.2.7) reported
> by the I/O MCA.
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Hi Fabio,

Was taking a fresh look at this as a precursor to looking at later
patches in series and spotted something that I'm doubtful about.

> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index 47d11cb5c912..cefe8d2d8aff 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -132,6 +132,34 @@ static int print_extlog_rcd(const char *pfx,
>  	return 1;
>  }
>  
> +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> +			      int severity)
> +{
> +	struct aer_capability_regs *aer;
> +	struct pci_dev *pdev;
> +	unsigned int devfn;
> +	unsigned int bus;
> +	int aer_severity;
> +	int domain;
> +
> +	if (!(pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID ||
> +	      pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO))

Looking again, I'm not sure this is as intended.  Is the aim to
allow for either one of these two?  Or to check that they are both present?
That is, should it be !(A && B) rather than !(A || B)?


> +		return;
> +
> +	aer_severity = cper_severity_to_aer(severity);
> +	aer = (struct aer_capability_regs *)pcie_err->aer_info;
> +	domain = pcie_err->device_id.segment;
> +	bus = pcie_err->device_id.bus;
> +	devfn = PCI_DEVFN(pcie_err->device_id.device,
> +			  pcie_err->device_id.function);
> +	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> +	if (!pdev)
> +		return;
> +
> +	pci_print_aer(pdev, aer_severity, aer);
> +	pci_dev_put(pdev);
> +}



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6 v6] acpi/ghes: Add helper for CXL protocol errors checks
  2025-10-23 12:25 ` [PATCH 4/6 v6] acpi/ghes: Add helper for CXL protocol errors checks Fabio M. De Francesco
@ 2025-10-28 14:54   ` Jonathan Cameron
  2025-11-04 17:41     ` Fabio M. De Francesco
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron @ 2025-10-28 14:54 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: linux-cxl, Rafael J . Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Thu, 23 Oct 2025 14:25:39 +0200
"Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com> wrote:

> Move the CPER CXL protocol errors validity out of

validity check

> cxl_cper_post_prot_err() to cxl_cper_sec_prot_err_valid() and limit the

to new cxl_cper_sec_prot_err_valid() 

as otherwise it sounds like it already exists.

> serial number check only to CXL agents that are CXL devices (UEFI v2.10,
> Appendix N.2.13).

Perhaps a little more here on why.  I assume because you are going to have
a second user for it, but good to say that. Also serves to justify the
export.

> 
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
>  drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++----------
>  include/cxl/event.h      | 10 ++++++++++
>  2 files changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d6fe5f020e96..e69ae864f43d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -706,30 +706,42 @@ static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
>  static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
>  struct work_struct *cxl_cper_prot_err_work;
>  
> -static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> -				   int severity)
> +int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)

Useful to return an error number?  Or would a bool be better given it is either
valid or not?

Otherwise looks good to me,

Jonathan

>  {
> -	struct cxl_cper_prot_err_work_data wd;
> -	u8 *dvsec_start, *cap_start;
> -
>  	if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
>  		pr_err_ratelimited("CXL CPER invalid agent type\n");
> -		return;
> +		return -EINVAL;
>  	}
>  
>  	if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
>  		pr_err_ratelimited("CXL CPER invalid protocol error log\n");
> -		return;
> +		return -EINVAL;
>  	}
>  
>  	if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
>  		pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
>  				   prot_err->err_len);
> -		return;
> +		return -EINVAL;
>  	}
>  
> -	if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
> -		pr_warn(FW_WARN "CXL CPER no device serial number\n");
> +	if ((prot_err->agent_type == RCD || prot_err->agent_type == DEVICE ||
> +	     prot_err->agent_type == LD || prot_err->agent_type == FMLD) &&
> +	    !(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
> +		pr_warn_ratelimited(FW_WARN
> +				    "CXL CPER no device serial number\n");
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(cxl_cper_sec_prot_err_valid);
> +
> +static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> +				   int severity)
> +{
> +	struct cxl_cper_prot_err_work_data wd;
> +	u8 *dvsec_start, *cap_start;
> +
> +	if (cxl_cper_sec_prot_err_valid(prot_err))
> +		return;
>  
>  	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
>  




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/6 v6] acpi/ghes: Add helper to copy CXL protocol error info to work struct
  2025-10-23 12:25 ` [PATCH 5/6 v6] acpi/ghes: Add helper to copy CXL protocol error info to work struct Fabio M. De Francesco
@ 2025-10-28 14:59   ` Jonathan Cameron
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron @ 2025-10-28 14:59 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: linux-cxl, Rafael J . Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Thu, 23 Oct 2025 14:25:40 +0200
"Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com> wrote:

> Make a helper out of cxl_cper_post_prot_err() that checks the CXL agent
> type and copy the CPER CXL protocol errors information to a work data
> structure.
> 
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
LGTM
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6 v6] ACPI: extlog: Trace CPER CXL Protocol Error Section
  2025-10-23 12:25 ` [PATCH 6/6 v6] ACPI: extlog: Trace CPER CXL Protocol Error Section Fabio M. De Francesco
@ 2025-10-28 15:06   ` Jonathan Cameron
  2025-11-04 16:53     ` Fabio M. De Francesco
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron @ 2025-10-28 15:06 UTC (permalink / raw)
  To: Fabio M. De Francesco
  Cc: linux-cxl, Rafael J . Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Thu, 23 Oct 2025 14:25:41 +0200
"Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com> wrote:

> When Firmware First is enabled, BIOS handles errors first and then it makes
> them available to the kernel via the Common Platform Error Record (CPER)
> sections (UEFI 2.10 Appendix N). Linux parses the CPER sections via one of
> two similar paths, either ELOG or GHES. The errors managed by ELOG are
> signaled to the BIOS by the I/O Machine Check Architecture (I/O MCA).
> 
> Currently, ELOG and GHES show some inconsistencies in how they report to
> userspace via trace events.
> 
> Therefore, make the two mentioned paths act similarly by tracing the CPER
> CXL Protocol Error Section (UEFI v2.10, Appendix N.2.13).
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>

Just one small question.   With that addressed, 
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 2731ba3a0799..3f527b0c6509 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -105,6 +105,12 @@ static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
>  		cxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);
>  }
>  
> +void cxl_cper_ras_handle_prot_err(struct cxl_cper_prot_err_work_data *wd)

Why do we need this wrapper?  The name is a bit more general, so if you
do need it, then why not instead just rename cxl_cper_handle_prot_err()

> +{
> +	cxl_cper_handle_prot_err(wd);
> +}
> +EXPORT_SYMBOL_GPL(cxl_cper_ras_handle_prot_err);
> +
>  static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>  {
>  	struct cxl_cper_prot_err_work_data wd;
> diff --git a/include/cxl/event.h b/include/cxl/event.h
> index 94081aec597a..a37eef112411 100644
> --- a/include/cxl/event.h
> +++ b/include/cxl/event.h
> @@ -340,4 +340,6 @@ cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
>  }
>  #endif
>  
> +void cxl_cper_ras_handle_prot_err(struct cxl_cper_prot_err_work_data *wd);
> +
>  #endif /* _LINUX_CXL_EVENT_H */



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/6 v6] ACPI: extlog: Trace CPER Non-standard Section Body
  2025-10-23 12:25 ` [PATCH 1/6 v6] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
@ 2025-10-28 16:19   ` Kuppuswamy Sathyanarayanan
  0 siblings, 0 replies; 17+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2025-10-28 16:19 UTC (permalink / raw)
  To: Fabio M. De Francesco, linux-cxl
  Cc: Rafael J . Wysocki, Len Brown, Tony Luck, Borislav Petkov,
	Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue, Davidlohr Bueso,
	Jonathan Cameron, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Karolina Stolarek, Jon Pan-Doh, Lukas Wunner, Shiju Jose,
	linux-kernel, linux-acpi, linuxppc-dev, linux-pci, Qiuxu Zhuo



On 10/23/2025 5:25 AM, Fabio M. De Francesco wrote:
> ghes_do_proc() has a catch-all for unknown or unhandled CPER formats
> (UEFI v2.10 Appendix N 2.3), extlog_print() does not. This gap was

Latest is v2.11, right? Why not use it for reference?

> noticed by a RAS test that injected CXL protocol errors which were
> notified to extlog_print() via the IOMCA (I/O Machine Check
> Architecture) mechanism. Bring parity to the extlog_print() path by
> including a similar log_non_standard_event().
> 
> Cc: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


>  drivers/acpi/acpi_extlog.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index f6b9562779de..47d11cb5c912 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -183,6 +183,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
>  			if (gdata->error_data_length >= sizeof(*mem))
>  				trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
>  						       (u8)gdata->error_severity);
> +		} else {
> +			void *err = acpi_hest_get_payload(gdata);
> +
> +			log_non_standard_event(sec_type, fru_id, fru_text,
> +					       gdata->error_severity, err,
> +					       gdata->error_data_length);
>  		}
>  	}
>  

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/6 v6] ACPI: extlog: Trace CPER PCI Express Error Section
  2025-10-28 14:48   ` Jonathan Cameron
@ 2025-10-31 10:18     ` Fabio M. De Francesco
  0 siblings, 0 replies; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-10-31 10:18 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, Rafael J . Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Tuesday, October 28, 2025 3:48:16 PM Central European Standard Time Jonathan Cameron wrote:
> On Thu, 23 Oct 2025 14:25:37 +0200
> "Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com> wrote:
> 
> > I/O Machine Check Architecture events may signal failing PCIe components
> > or links. The AER event contains details on what was happening on the wire
> > when the error was signaled.
> > 
> > Trace the CPER PCIe Error section (UEFI v2.10, Appendix N.2.7) reported
> > by the I/O MCA.
> > 
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> Hi Fabio,
> 
> Was taking a fresh look at this as a precursor to looking at later
> patches in series and spotted something that I'm doubtful about.
> 
> > diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> > index 47d11cb5c912..cefe8d2d8aff 100644
> > --- a/drivers/acpi/acpi_extlog.c
> > +++ b/drivers/acpi/acpi_extlog.c
> > @@ -132,6 +132,34 @@ static int print_extlog_rcd(const char *pfx,
> >  	return 1;
> >  }
> >  
> > +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> > +			      int severity)
> > +{
> > +	struct aer_capability_regs *aer;
> > +	struct pci_dev *pdev;
> > +	unsigned int devfn;
> > +	unsigned int bus;
> > +	int aer_severity;
> > +	int domain;
> > +
> > +	if (!(pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID ||
> > +	      pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO))
> 
> Looking again, I'm not sure this is as intended.  Is the aim to
> allow for either one of these two?  Or to check that they are both present?
> That is, should it be !(A && B) rather than !(A || B)?
> 
Hi Jonathan,

You're right. We need to check that both bits are set and return if they
are not, so the condition has to be !(A && B).
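
To make the difference concrete, here is a minimal userspace sketch of the
two conditions. The EX_VALID_* masks and both function names are
hypothetical and chosen only for illustration; the real
CPER_PCIE_VALID_DEVICE_ID and CPER_PCIE_VALID_AER_INFO values come from the
kernel headers and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define EX_VALID_DEVICE_ID (1u << 0)
#define EX_VALID_AER_INFO  (1u << 1)

/* Condition as posted: returns early only when NEITHER bit is set. */
static bool skip_as_posted(uint64_t bits)
{
	return !(bits & EX_VALID_DEVICE_ID || bits & EX_VALID_AER_INFO);
}

/* Condition as agreed: returns early unless BOTH bits are set. */
static bool skip_unless_both(uint64_t bits)
{
	return !((bits & EX_VALID_DEVICE_ID) && (bits & EX_VALID_AER_INFO));
}
```

With only the device ID bit set, skip_as_posted() returns false, so the
posted code would go on to read AER info that firmware never marked valid,
while skip_unless_both() returns true and bails out.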

Thank you,

Fabio 
> 
> > +		return;
> > +
> > +	aer_severity = cper_severity_to_aer(severity);
> > +	aer = (struct aer_capability_regs *)pcie_err->aer_info;
> > +	domain = pcie_err->device_id.segment;
> > +	bus = pcie_err->device_id.bus;
> > +	devfn = PCI_DEVFN(pcie_err->device_id.device,
> > +			  pcie_err->device_id.function);
> > +	pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> > +	if (!pdev)
> > +		return;
> > +
> > +	pci_print_aer(pdev, aer_severity, aer);
> > +	pci_dev_put(pdev);
> > +}
> 
> 






^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6/6 v6] ACPI: extlog: Trace CPER CXL Protocol Error Section
  2025-10-28 15:06   ` Jonathan Cameron
@ 2025-11-04 16:53     ` Fabio M. De Francesco
  0 siblings, 0 replies; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-11-04 16:53 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, Rafael J . Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Tuesday, October 28, 2025 4:06:09 PM Central European Standard Time Jonathan Cameron wrote:
> On Thu, 23 Oct 2025 14:25:41 +0200
> "Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com> wrote:
> 
> > When Firmware First is enabled, BIOS handles errors first and then it makes
> > them available to the kernel via the Common Platform Error Record (CPER)
> > sections (UEFI 2.10 Appendix N). Linux parses the CPER sections via one of
> > two similar paths, either ELOG or GHES. The errors managed by ELOG are
> > signaled to the BIOS by the I/O Machine Check Architecture (I/O MCA).
> > 
> > Currently, ELOG and GHES show some inconsistencies in how they report to
> > userspace via trace events.
> > 
> > Therefore, make the two mentioned paths act similarly by tracing the CPER
> > CXL Protocol Error Section (UEFI v2.10, Appendix N.2.13).
> > 
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> 
> Just one small question.   With that addressed, 
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> 
> > diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> > index 2731ba3a0799..3f527b0c6509 100644
> > --- a/drivers/cxl/core/ras.c
> > +++ b/drivers/cxl/core/ras.c
> > @@ -105,6 +105,12 @@ static void cxl_cper_handle_prot_err(struct cxl_cper_prot_err_work_data *data)
> >  		cxl_cper_trace_uncorr_prot_err(cxlmd, data->ras_cap);
> >  }
> >  
> > +void cxl_cper_ras_handle_prot_err(struct cxl_cper_prot_err_work_data *wd)
> 
> Why do we need this wrapper?  The name is a bit more general, so if you
> do need it, then why not instead just rename cxl_cper_handle_prot_err()
> 
Actually, on second thought I believe that we need neither this wrapper
nor a rename of cxl_cper_handle_prot_err(). I'll export the latter as it
is.

Fabio
> > +{
> > +	cxl_cper_handle_prot_err(wd);
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_cper_ras_handle_prot_err);
> > +
> >  static void cxl_cper_prot_err_work_fn(struct work_struct *work)
> >  {
> >  	struct cxl_cper_prot_err_work_data wd;
> > diff --git a/include/cxl/event.h b/include/cxl/event.h
> > index 94081aec597a..a37eef112411 100644
> > --- a/include/cxl/event.h
> > +++ b/include/cxl/event.h
> > @@ -340,4 +340,6 @@ cxl_cper_setup_prot_err_work_data(struct cxl_cper_prot_err_work_data *wd,
> >  }
> >  #endif
> >  
> > +void cxl_cper_ras_handle_prot_err(struct cxl_cper_prot_err_work_data *wd);
> > +
> >  #endif /* _LINUX_CXL_EVENT_H */
> 
> 






^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/6 v6] acpi/ghes: Add helper for CXL protocol errors checks
  2025-10-28 14:54   ` Jonathan Cameron
@ 2025-11-04 17:41     ` Fabio M. De Francesco
  0 siblings, 0 replies; 17+ messages in thread
From: Fabio M. De Francesco @ 2025-11-04 17:41 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-cxl, Rafael J . Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Hanjun Guo, Mauro Carvalho Chehab, Shuai Xue,
	Davidlohr Bueso, Dave Jiang, Alison Schofield, Vishal Verma,
	Ira Weiny, Dan Williams, Mahesh J Salgaonkar,
	Oliver O'Halloran, Bjorn Helgaas, Sunil V L, Xiaofei Tan,
	Mario Limonciello, Huacai Chen, Heinrich Schuchardt,
	Arnd Bergmann, Peter Zijlstra, Ingo Molnar, Guo Weikang, Xin Li,
	Will Deacon, Huang Yiwei, Gavin Shan, Smita Koralahalli,
	Uwe Kleine-König, Li Ming, Ilpo Järvinen,
	Kuppuswamy Sathyanarayanan, Karolina Stolarek, Jon Pan-Doh,
	Lukas Wunner, Shiju Jose, linux-kernel, linux-acpi, linuxppc-dev,
	linux-pci

On Tuesday, October 28, 2025 3:54:15 PM Central European Standard Time Jonathan Cameron wrote:
> On Thu, 23 Oct 2025 14:25:39 +0200
> "Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com> wrote:
> 
> > Move the CPER CXL protocol errors validity out of
> 
> validity check
> 
> > cxl_cper_post_prot_err() to cxl_cper_sec_prot_err_valid() and limit the
> 
> to new cxl_cper_sec_prot_err_valid() 
> 
> as otherwise it sounds like it already exists.
> 
> > serial number check only to CXL agents that are CXL devices (UEFI v2.10,
> > Appendix N.2.13).
> 
> Perhaps a little more here on why.  I assume because you are going to have
> a second user for it, but good to say that. Also serves to justify the
> export.
> 
Hi Jonathan,

All the corrections you made will be applied to the next version.
> > 
> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> > ---
> >  drivers/acpi/apei/ghes.c | 32 ++++++++++++++++++++++----------
> >  include/cxl/event.h      | 10 ++++++++++
> >  2 files changed, 32 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> > index d6fe5f020e96..e69ae864f43d 100644
> > --- a/drivers/acpi/apei/ghes.c
> > +++ b/drivers/acpi/apei/ghes.c
> > @@ -706,30 +706,42 @@ static DEFINE_KFIFO(cxl_cper_prot_err_fifo, struct cxl_cper_prot_err_work_data,
> >  static DEFINE_SPINLOCK(cxl_cper_prot_err_work_lock);
> >  struct work_struct *cxl_cper_prot_err_work;
> >  
> > -static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> > -				   int severity)
> > +int cxl_cper_sec_prot_err_valid(struct cxl_cper_sec_prot_err *prot_err)
> 
> Useful to return an error number?  Or would a bool be better given it is either
> valid or not?
> 
I prefer to return more information when reasonable and leave the callers free
to use or ignore the specific error number.
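
As a rough userspace sketch of what I mean (not the kernel code): the
struct, the EX_* masks, EX_RAS_CAP_LEN and both function names below are
hypothetical stand-ins. A caller that only cares about valid/invalid can
test the int result as a boolean, while another caller remains free to
inspect the error number.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPER section fields and masks. */
#define EX_VALID_AGENT_ADDRESS (1u << 0)
#define EX_VALID_ERROR_LOG     (1u << 1)
#define EX_RAS_CAP_LEN         64u

struct ex_prot_err {
	uint64_t valid_bits;
	uint16_t err_len;
};

/* Returns 0 when the section looks usable, -EINVAL otherwise. */
static int ex_prot_err_valid(const struct ex_prot_err *e)
{
	if (!(e->valid_bits & EX_VALID_AGENT_ADDRESS))
		return -EINVAL;
	if (!(e->valid_bits & EX_VALID_ERROR_LOG))
		return -EINVAL;
	if (e->err_len != EX_RAS_CAP_LEN)
		return -EINVAL;
	return 0;
}

/* A caller that ignores the specific errno: 1 if queued, 0 if dropped. */
static int ex_post(const struct ex_prot_err *e)
{
	if (ex_prot_err_valid(e))
		return 0;
	return 1;
}
```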

Fabio
>
> Otherwise looks good to me,
> 
> Jonathan
> 
> >  {
> > -	struct cxl_cper_prot_err_work_data wd;
> > -	u8 *dvsec_start, *cap_start;
> > -
> >  	if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
> >  		pr_err_ratelimited("CXL CPER invalid agent type\n");
> > -		return;
> > +		return -EINVAL;
> >  	}
> >  
> >  	if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
> >  		pr_err_ratelimited("CXL CPER invalid protocol error log\n");
> > -		return;
> > +		return -EINVAL;
> >  	}
> >  
> >  	if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
> >  		pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
> >  				   prot_err->err_len);
> > -		return;
> > +		return -EINVAL;
> >  	}
> >  
> > -	if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
> > -		pr_warn(FW_WARN "CXL CPER no device serial number\n");
> > +	if ((prot_err->agent_type == RCD || prot_err->agent_type == DEVICE ||
> > +	     prot_err->agent_type == LD || prot_err->agent_type == FMLD) &&
> > +	    !(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
> > +		pr_warn_ratelimited(FW_WARN
> > +				    "CXL CPER no device serial number\n");
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(cxl_cper_sec_prot_err_valid);
> > +
> > +static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> > +				   int severity)
> > +{
> > +	struct cxl_cper_prot_err_work_data wd;
> > +	u8 *dvsec_start, *cap_start;
> > +
> > +	if (cxl_cper_sec_prot_err_valid(prot_err))
> > +		return;
> >  
> >  	guard(spinlock_irqsave)(&cxl_cper_prot_err_work_lock);
> >  
> 
> 
> 






^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-11-04 17:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz, follow: Atom feed)
-- links below jump to the message on this page --
2025-10-23 12:25 [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Fabio M. De Francesco
2025-10-23 12:25 ` [PATCH 1/6 v6] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2025-10-28 16:19   ` Kuppuswamy Sathyanarayanan
2025-10-23 12:25 ` [PATCH 2/6 v6] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2025-10-28 14:48   ` Jonathan Cameron
2025-10-31 10:18     ` Fabio M. De Francesco
2025-10-23 12:25 ` [PATCH 3/6 v6] acpi/ghes: Make GHES select ACPI_APEI_PCIEAER Fabio M. De Francesco
2025-10-23 12:25 ` [PATCH 4/6 v6] acpi/ghes: Add helper for CXL protocol errors checks Fabio M. De Francesco
2025-10-28 14:54   ` Jonathan Cameron
2025-11-04 17:41     ` Fabio M. De Francesco
2025-10-23 12:25 ` [PATCH 5/6 v6] acpi/ghes: Add helper to copy CXL protocol error info to work struct Fabio M. De Francesco
2025-10-28 14:59   ` Jonathan Cameron
2025-10-23 12:25 ` [PATCH 6/6 v6] ACPI: extlog: Trace CPER CXL Protocol Error Section Fabio M. De Francesco
2025-10-28 15:06   ` Jonathan Cameron
2025-11-04 16:53     ` Fabio M. De Francesco
2025-10-27 19:40 ` [PATCH 0/6 v6] Make ELOG and GHES log and trace consistently Rafael J. Wysocki
2025-10-27 20:15   ` Luck, Tony
