Linux CXL
* [PATCH 1/1] cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
@ 2024-01-25  8:14 Li Ming
  2024-01-26  6:37 ` Dan Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Li Ming @ 2024-01-25  8:14 UTC
  To: linux-cxl
  Cc: dan.j.williams, terry.bowman, rrichter, Jonathan.Cameron,
	dave.jiang, Li Ming

CXL.mem protocol errors are logged in the CXL RAS capability. Once the
CXL.mem device is unbound from the CXL.mem driver, no CXL.mem protocol
errors are expected on the endpoint or on the dport connected to the
endpoint. Give up on handling such unexpected errors so that the error
handler does not access an unmapped RCH dport RAS capability.

The error handler of the CXL PCI device also handles RAS errors that
occur on the RCH dport. The host of the RCH dport's RAS capability
mapping is the CXL.mem device, so without this check the error handler
accesses the unmapped RCH dport RAS capability after the CXL.mem device
is unbound from the CXL.mem driver.

Fixes: 6ac07883dbb5 ("cxl/pci: Add RCH downstream port error logging")
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Ming <ming4.li@intel.com>
---
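Note: the pattern the patch relies on (take the device lock, check that
a driver is still bound, and only then touch the driver-owned mapping)
can be modelled in userspace. The sketch below is an illustration only
and is not kernel code: struct model_dev, model_unbind() and
model_error_detected() are hypothetical names, and a pthread mutex
stands in for the device lock that scoped_guard(device, dev) takes in
the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device plus the RAS mapping it hosts. */
struct model_dev {
	pthread_mutex_t lock;   /* plays the role of the device lock */
	const void *driver;     /* NULL once the driver is unbound */
	unsigned int *ras_regs; /* mapping owned by the bound driver */
};

enum model_result { MODEL_HANDLED, MODEL_DISCONNECT };

/* Unbind drops the mapping under the same lock the error handler takes. */
static void model_unbind(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->driver = NULL;
	dev->ras_regs = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * Error handler: read the RAS status only while the lock is held and a
 * driver is bound. Otherwise give up without touching ras_regs.
 */
static enum model_result model_error_detected(struct model_dev *dev,
					      unsigned int *status)
{
	enum model_result res = MODEL_DISCONNECT;

	pthread_mutex_lock(&dev->lock);
	if (dev->driver) {
		/* Invariant in this model: bound implies mapped. */
		assert(dev->ras_regs != NULL);
		*status = *dev->ras_regs;
		res = MODEL_HANDLED;
	}
	pthread_mutex_unlock(&dev->lock);

	return res;
}
```

Because both paths serialize on the same lock, the handler in this model
either sees a bound driver with a valid mapping or gives up; it cannot
see a bound driver with a mapping that was dropped halfway through.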
 drivers/cxl/core/pci.c | 43 ++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 6c9c8d92f8f7..480489f5644e 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -932,11 +932,21 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
 void cxl_cor_error_detected(struct pci_dev *pdev)
 {
 	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+	struct device *dev = &cxlds->cxlmd->dev;
+
+	scoped_guard(device, dev) {
+		if (!dev->driver) {
+			dev_warn(&pdev->dev,
+				 "%s: memdev disabled, abort error handling\n",
+				 dev_name(dev));
+			return;
+		}
 
-	if (cxlds->rcd)
-		cxl_handle_rdport_errors(cxlds);
+		if (cxlds->rcd)
+			cxl_handle_rdport_errors(cxlds);
 
-	cxl_handle_endpoint_cor_ras(cxlds);
+		cxl_handle_endpoint_cor_ras(cxlds);
+	}
 }
 EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, CXL);
 
@@ -948,16 +958,25 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
 	struct device *dev = &cxlmd->dev;
 	bool ue;
 
-	if (cxlds->rcd)
-		cxl_handle_rdport_errors(cxlds);
+	scoped_guard(device, dev) {
+		if (!dev->driver) {
+			dev_warn(&pdev->dev,
+				 "%s: memdev disabled, abort error handling\n",
+				 dev_name(dev));
+			return PCI_ERS_RESULT_DISCONNECT;
+		}
+
+		if (cxlds->rcd)
+			cxl_handle_rdport_errors(cxlds);
+		/*
+		 * A frozen channel indicates an impending reset which is fatal to
+		 * CXL.mem operation, and will likely crash the system. On the off
+		 * chance the situation is recoverable dump the status of the RAS
+		 * capability registers and bounce the active state of the memdev.
+		 */
+		ue = cxl_handle_endpoint_ras(cxlds);
+	}
 
-	/*
-	 * A frozen channel indicates an impending reset which is fatal to
-	 * CXL.mem operation, and will likely crash the system. On the off
-	 * chance the situation is recoverable dump the status of the RAS
-	 * capability registers and bounce the active state of the memdev.
-	 */
-	ue = cxl_handle_endpoint_ras(cxlds);
 
 	switch (state) {
 	case pci_channel_io_normal:
-- 
2.40.1



end of thread, other threads:[~2024-01-27  3:05 UTC | newest]

Thread overview: 4+ messages
2024-01-25  8:14 [PATCH 1/1] cxl/pci: Skip to handle RAS errors if CXL.mem device is detached Li Ming
2024-01-26  6:37 ` Dan Williams
2024-01-26 14:04   ` Bowman, Terry
2024-01-27  3:05     ` Dan Williams
