public inbox for linux-kernel@vger.kernel.org
From: Terry Bowman <terry.bowman@amd.com>
To: <dave@stgolabs.net>, <jic23@kernel.org>, <dave.jiang@intel.com>,
	<alison.schofield@intel.com>, <djbw@kernel.org>,
	<bhelgaas@google.com>, <shiju.jose@huawei.com>,
	<ming.li@zohomail.com>, <Smita.KoralahalliChannabasappa@amd.com>,
	<rrichter@amd.com>, <dan.carpenter@linaro.org>,
	<PradeepVineshReddy.Kodamati@amd.com>, <lukas@wunner.de>,
	<Benjamin.Cheatham@amd.com>,
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	<vishal.l.verma@intel.com>, <alucerop@amd.com>,
	<ira.weiny@intel.com>, <corbet@lwn.net>, <rafael@kernel.org>,
	<xueshuai@linux.alibaba.com>, <linux-cxl@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
	<linux-acpi@vger.kernel.org>, <linux-doc@vger.kernel.org>,
	<terry.bowman@amd.com>
Subject: [PATCH v17 07/11] PCI/CXL: Add RCH support to CXL handlers
Date: Tue, 5 May 2026 12:30:25 -0500
Message-ID: <20260505173029.2718246-8-terry.bowman@amd.com>
In-Reply-To: <20260505173029.2718246-1-terry.bowman@amd.com>

Restricted CXL Host (RCH) error handling is a separate path from the
new CXL Port error handling flow. Fold RCH error handling into the
Port flow so both share a common entry point.

Update cxl_rch_handle_error_iter() to forward RCH protocol errors
through the AER-CXL kfifo.
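
As a rough model of the forwarding step, the AER side now enqueues an error record and the CXL side later dequeues and dispatches it, instead of the AER code calling cxl_pci's err_handler directly. The sketch below is a single-threaded userspace stand-in, not the kernel <linux/kfifo.h> API: struct err_rec, err_fifo_put() and err_fifo_get() are hypothetical names, the record layout is an assumption, and the locking or memory barriers a real producer/consumer pair needs are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical severities standing in for AER_CORRECTABLE/NONFATAL/FATAL. */
enum sev { SEV_CORRECTABLE, SEV_NONFATAL, SEV_FATAL };

/* Hypothetical queued record; the real AER-CXL kfifo entry may differ. */
struct err_rec {
	uint16_t bdf;      /* bus/device/function of the reporting device */
	enum sev severity;
};

#define FIFO_SIZE 8u
static_assert((FIFO_SIZE & (FIFO_SIZE - 1)) == 0,
	      "FIFO_SIZE must be a power of two so the index mask works");

struct err_fifo {
	struct err_rec buf[FIFO_SIZE];
	unsigned int in;   /* free-running producer index */
	unsigned int out;  /* free-running consumer index */
};

/* Producer (AER side): enqueue a record; drop it if the FIFO is full. */
static bool err_fifo_put(struct err_fifo *f, struct err_rec r)
{
	if (f->in - f->out == FIFO_SIZE)
		return false;
	f->buf[f->in & (FIFO_SIZE - 1)] = r;
	f->in++;
	return true;
}

/* Consumer (CXL side): dequeue the oldest record, if any. */
static bool err_fifo_get(struct err_fifo *f, struct err_rec *r)
{
	if (f->in == f->out)
		return false;
	*r = f->buf[f->out & (FIFO_SIZE - 1)];
	f->out++;
	return true;
}
```

Like a kfifo, the sketch keeps free-running in/out indices and masks them with a power-of-two size, so the fill level is simply in - out even after the unsigned counters wrap.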

Update cxl_handle_proto_error() to dispatch RCH errors via
cxl_handle_rdport_errors(). cxl_handle_rdport_errors() handles both
correctable and uncorrectable RCH protocol errors.
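
The dispatch order described above, reduced to a sketch: an RC_END device (an RCD) first has its RCH Downstream Port checked, then its own error is handled by severity. The enum values, handler names and call log are hypothetical. In the patch the RCH step is cxl_handle_rdport_errors(), which reads the Downstream Port's own AER severity rather than reusing the RCD's; the uncorrectable branch is assumed here to lead to cxl_do_recovery().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PCI_EXP_TYPE_* and AER_* constants. */
enum pcie_type { TYPE_ENDPOINT, TYPE_ROOT_PORT, TYPE_RC_END };
enum sev { SEV_CORRECTABLE, SEV_NONFATAL, SEV_FATAL };

/* Call log used to observe dispatch order: one letter per handler call. */
static char trace[16];
static size_t trace_len;

static void trace_reset(void)
{
	trace_len = 0;
	trace[0] = '\0';
}

static void record(char c)
{
	if (trace_len + 1 >= sizeof(trace))
		return;                 /* log full: drop further entries */
	trace[trace_len++] = c;
	trace[trace_len] = '\0';
}

/* Hypothetical handlers: 'R' RCH Downstream Port, 'C' correctable, 'U' uncorrectable. */
static void handle_rdport_errors(void) { record('R'); }
static void handle_cor_ras(void) { record('C'); }
static void handle_uncor(void) { record('U'); }

/* Same shape as the patched cxl_handle_proto_error(). */
static void handle_proto_error(enum pcie_type type, enum sev severity)
{
	assert(severity <= SEV_FATAL);

	/* An RCD shares its AER interrupt with the RCH Downstream Port. */
	if (type == TYPE_RC_END)
		handle_rdport_errors();

	if (severity == SEV_CORRECTABLE)
		handle_cor_ras();
	else
		handle_uncor();
}
```

For a non-RCD device the RCH step is skipped entirely, so only the device's own correctable or uncorrectable path runs.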

Behavior change: an uncorrectable CXL RAS error on an RCD now panics via
cxl_do_recovery(). Previously, the RCH path returned
PCI_ERS_RESULT_NEED_RESET through cxl_pci's err_handler. The new behavior
matches the panic policy of the common CXL Port protocol error flow:
software cannot safely recover CXL.cachemem traffic after an
uncorrectable protocol error.

Change cxl_handle_rdport_errors() to take a struct pci_dev instead of a
struct cxl_dev_state, matching the new caller context. As a result, the
error trace events emitted from this path now report device=<PCI BDF>
instead of device=<memN>, consistent with the rest of the unified CXL trace
events. Userspace consumers keyed on the memdev name must map the PCI BDF
back to a memdev.
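
For userspace, one way to map the reported BDF back to a memdev is to resolve each /sys/bus/cxl/devices/memN link and take its parent directory name. The helper below assumes the memdev's sysfs directory sits directly under its PCI device's directory; memdev_path_to_bdf() is a hypothetical name and the example path in the comment is illustrative, not taken from real hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Copy the PCI BDF that parents a memdev into @bdf, given the memdev's
 * resolved sysfs path. ASSUMPTION: the memdev directory sits directly under
 * its PCI device directory, e.g. (illustrative)
 * "/sys/devices/pci0000:34/0000:34:00.0/0000:35:00.0/mem0".
 * Returns false if there is no parent component or @bdf is too small.
 */
static bool memdev_path_to_bdf(const char *path, char *bdf, size_t len)
{
	const char *last, *parent;
	size_t n;

	assert(path && bdf && len > 0);

	last = strrchr(path, '/');      /* slash before "memN" */
	if (!last || last == path)
		return false;

	/* Walk back from that slash to the start of the parent component. */
	parent = last;
	while (parent > path && parent[-1] != '/')
		parent--;

	n = (size_t)(last - parent);
	if (n == 0 || n >= len)
		return false;

	memcpy(bdf, parent, n);
	bdf[n] = '\0';
	return true;
}
```

A caller would pass the output of realpath(3) on /sys/bus/cxl/devices/memN and compare the result with the device= field of the trace event.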

Include the RCD Endpoint serial number in RCH log messages so the RCH
can be associated with its RCD.

Remove the cxlds->rcd check from cxl_cor_error_detected() and
cxl_error_detected(). RCH errors are now forwarded by
cxl_rch_handle_error_iter() through the AER-CXL kfifo to
cxl_handle_proto_error(), so cxl_pci's err_handler no longer sees
them.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>

---

Changes in v16->v17:
- Drop now-dead cxlds->rcd branches from cxl_{cor_,}error_detected().
- Drop duplicate subject line from commit body.
- Document panic-on-uncorrectable behavior change for RCD path.
- Document trace event device-name change (memN -> PCI BDF) for RCH path.
- Rewrite cxl_handle_proto_error() RC_END comment to clarify the RCD/RCH
  shared interrupt relationship.
- Rewrite commit message.

Changes in v16:
- New commit
---
 drivers/cxl/core/core.h        |  4 ++--
 drivers/cxl/core/ras.c         | 14 +++++++++-----
 drivers/cxl/core/ras_rch.c     |  8 +++-----
 drivers/pci/pcie/aer_cxl_rch.c | 17 +----------------
 4 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index bc36cd1575a4..2c7387506dfb 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -188,7 +188,7 @@ void cxl_handle_cor_ras(struct device *dev, u64 serial,
 			void __iomem *ras_base);
 void cxl_dport_map_rch_aer(struct cxl_dport *dport);
 void cxl_disable_rch_root_ints(struct cxl_dport *dport);
-void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds);
+void cxl_handle_rdport_errors(struct pci_dev *pdev);
 void devm_cxl_dport_ras_setup(struct cxl_dport *dport);
 #else
 static inline int cxl_ras_init(void)
@@ -205,7 +205,7 @@ static inline void cxl_handle_cor_ras(struct device *dev, u64 serial,
 				      void __iomem *ras_base) { }
 static inline void cxl_dport_map_rch_aer(struct cxl_dport *dport) { }
 static inline void cxl_disable_rch_root_ints(struct cxl_dport *dport) { }
-static inline void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
+static inline void cxl_handle_rdport_errors(struct pci_dev *pdev) { }
 static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { }
 #endif /* CONFIG_CXL_RAS */
 
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 0a552d5a236e..1f1dd20623f6 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -267,9 +267,6 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
 			return;
 		}
 
-		if (cxlds->rcd)
-			cxl_handle_rdport_errors(cxlds);
-
 		cxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),
 				   cxlmd->endpoint->regs.ras);
 	}
@@ -292,8 +289,6 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
 			return PCI_ERS_RESULT_DISCONNECT;
 		}
 
-		if (cxlds->rcd)
-			cxl_handle_rdport_errors(cxlds);
 		/*
 		 * A frozen channel indicates an impending reset which is fatal to
 		 * CXL.mem operation, and will likely crash the system. On the off
@@ -329,6 +324,15 @@ EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
 static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
 				   struct cxl_dport *dport, int severity)
 {
+	/*
+	 * An RC_END device is an RCD (Restricted CXL Device). Its AER
+	 * interrupt is shared with the RCH Downstream Port, so handle RCH
+	 * Downstream Port protocol errors first before processing the RCD's
+	 * own errors. See CXL spec r3.1 s12.2.
+	 */
+	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END)
+		cxl_handle_rdport_errors(pdev);
+
 	if (severity == AER_CORRECTABLE) {
 		cxl_handle_cor_ras(&pdev->dev, pci_get_dsn(pdev),
 				   to_ras_base(port, dport));
diff --git a/drivers/cxl/core/ras_rch.c b/drivers/cxl/core/ras_rch.c
index 61835fbafc0f..cbd02cabefbc 100644
--- a/drivers/cxl/core/ras_rch.c
+++ b/drivers/cxl/core/ras_rch.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright(c) 2025 AMD Corporation. All rights reserved. */
 
-#include <linux/types.h>
 #include <linux/aer.h>
 #include "cxl.h"
 #include "core.h"
@@ -95,9 +94,8 @@ static bool cxl_rch_get_aer_severity(struct aer_capability_regs *aer_regs,
 	return false;
 }
 
-void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
+void cxl_handle_rdport_errors(struct pci_dev *pdev)
 {
-	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
 	struct aer_capability_regs aer_regs;
 	struct cxl_dport *dport;
 	int severity;
@@ -115,9 +113,9 @@ void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
 
 	pci_print_aer(pdev, severity, &aer_regs);
 	if (severity == AER_CORRECTABLE)
-		cxl_handle_cor_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),
+		cxl_handle_cor_ras(&pdev->dev, pci_get_dsn(pdev),
 				   dport->regs.ras);
 	else
-		cxl_handle_ras(&cxlds->cxlmd->dev, pci_get_dsn(pdev),
+		cxl_handle_ras(&pdev->dev, pci_get_dsn(pdev),
 			       dport->regs.ras);
 }
diff --git a/drivers/pci/pcie/aer_cxl_rch.c b/drivers/pci/pcie/aer_cxl_rch.c
index e471eefec9c4..83142eac0cab 100644
--- a/drivers/pci/pcie/aer_cxl_rch.c
+++ b/drivers/pci/pcie/aer_cxl_rch.c
@@ -37,26 +37,11 @@ static bool cxl_error_is_native(struct pci_dev *dev)
 static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
 {
 	struct aer_err_info *info = (struct aer_err_info *)data;
-	const struct pci_error_handlers *err_handler;
 
 	if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
 		return 0;
 
-	guard(device)(&dev->dev);
-
-	err_handler = dev->driver ? dev->driver->err_handler : NULL;
-	if (!err_handler)
-		return 0;
-
-	if (info->severity == AER_CORRECTABLE) {
-		if (err_handler->cor_error_detected)
-			err_handler->cor_error_detected(dev);
-	} else if (err_handler->error_detected) {
-		if (info->severity == AER_NONFATAL)
-			err_handler->error_detected(dev, pci_channel_io_normal);
-		else if (info->severity == AER_FATAL)
-			err_handler->error_detected(dev, pci_channel_io_frozen);
-	}
+	cxl_forward_error(dev, info);
 	return 0;
 }
 
-- 
2.34.1



Thread overview: 21+ messages
2026-05-05 17:30 [PATCH v17 00/11] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-05-05 17:30 ` [PATCH v17 01/11] PCI/AER: Introduce AER-CXL Kfifo Terry Bowman
2026-05-05 21:17   ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 02/11] cxl/ras: Unify Endpoint and Port AER trace events Terry Bowman
2026-05-05 21:46   ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 03/11] cxl: Use common CPER handling for all CXL devices Terry Bowman
2026-05-05 22:02   ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 04/11] cxl: Rename find_cxl_port() to find_cxl_port_by_dport() Terry Bowman
2026-05-05 22:06   ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 05/11] cxl: Limit CXL-CPER kfifo registration functions scope Terry Bowman
2026-05-05 22:16   ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 06/11] PCI: Establish common CXL Port protocol error flow Terry Bowman
2026-05-05 17:30 ` Terry Bowman [this message]
2026-05-05 23:59   ` [PATCH v17 07/11] PCI/CXL: Add RCH support to CXL handlers Dave Jiang
2026-05-05 17:30 ` [PATCH v17 08/11] cxl: Remove Endpoint AER correctable handler Terry Bowman
2026-05-05 17:30 ` [PATCH v17 09/11] cxl: Update Endpoint AER uncorrectable handler Terry Bowman
2026-05-06 17:43   ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 10/11] PCI/CXL: Mask/Unmask CXL protocol errors Terry Bowman
2026-05-06 18:00   ` Dave Jiang
2026-05-05 17:30 ` [PATCH v17 11/11] Documentation: cxl: Document CXL protocol error handling Terry Bowman
2026-05-06 18:34   ` Dave Jiang
