From: Terry Bowman <terry.bowman@amd.com>
To: <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-pci@vger.kernel.org>, <nifan.cxl@gmail.com>,
<dave@stgolabs.net>, <jonathan.cameron@huawei.com>,
<dave.jiang@intel.com>, <alison.schofield@intel.com>,
<vishal.l.verma@intel.com>, <dan.j.williams@intel.com>,
<bhelgaas@google.com>, <mahesh@linux.ibm.com>,
<ira.weiny@intel.com>, <oohall@gmail.com>,
<Benjamin.Cheatham@amd.com>, <rrichter@amd.com>,
<nathan.fontenot@amd.com>, <terry.bowman@amd.com>,
<Smita.KoralahalliChannabasappa@amd.com>, <lukas@wunner.de>,
<ming.li@zohomail.com>, <PradeepVineshReddy.Kodamati@amd.com>
Subject: [PATCH v6 17/17] cxl/pci: Handle CXL Endpoint and RCH protocol errors separately from PCIe errors
Date: Fri, 7 Feb 2025 18:29:41 -0600 [thread overview]
Message-ID: <20250208002941.4135321-18-terry.bowman@amd.com> (raw)
In-Reply-To: <20250208002941.4135321-1-terry.bowman@amd.com>
CXL Endpoint and Restricted CXL host (RCH) Downstream Port Protocol Errors
are currently treated as PCIe errors, which does not properly process CXL
uncorrectable (UCE) errors. When a CXL device encounters an uncorrectable
protocol error, the system should panic to prevent potential CXL memory
corruption.
Treat CXL Endpoint protocol errors as CXL errors. This requires updates in
the CXL and AER drivers.
Update the CXL Endpoint driver with a new declaration for struct
cxl_error_handlers named cxl_ep_error_handlers. Move the existing CE and
UCE handler assignments from cxl_error_handlers to the new
cxl_ep_error_handlers. Remove the 'state' parameter from the UCE handler
interface because it is not used in CXL recovery.
Update the AER driver to associate CXL Protocol errors with CXL error
handling. Change detection in handles_cxl_errors() from using
pcie_is_cxl_port() to instead use pcie_is_cxl().
Update AER driver to use CXL handlers for RCH handling.
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
drivers/cxl/core/pci.c | 26 +++++---------------------
drivers/cxl/cxlpci.h | 3 +--
drivers/cxl/pci.c | 10 +++++++---
drivers/pci/pcie/aer.c | 11 ++++++-----
4 files changed, 19 insertions(+), 31 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 36e686a31045..18d47a14959e 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1075,8 +1075,7 @@ void cxl_cor_error_detected(struct pci_dev *pdev)
}
EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
+pci_ers_result_t cxl_error_detected(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
struct cxl_memdev *cxlmd = cxlds->cxlmd;
@@ -1088,7 +1087,7 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
dev_warn(&pdev->dev,
"%s: memdev disabled, abort error handling\n",
dev_name(dev));
- return PCI_ERS_RESULT_DISCONNECT;
+ return PCI_ERS_RESULT_PANIC;
}
if (cxlds->rcd)
@@ -1102,26 +1101,11 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
ue = cxl_handle_endpoint_ras(cxlds);
}
-
- switch (state) {
- case pci_channel_io_normal:
- if (ue) {
- device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- }
- return PCI_ERS_RESULT_CAN_RECOVER;
- case pci_channel_io_frozen:
- dev_warn(&pdev->dev,
- "%s: frozen state error detected, disable CXL.mem\n",
- dev_name(dev));
+ if (ue) {
device_release_driver(dev);
- return PCI_ERS_RESULT_NEED_RESET;
- case pci_channel_io_perm_failure:
- dev_warn(&pdev->dev,
- "failure state error detected, request disconnect\n");
- return PCI_ERS_RESULT_DISCONNECT;
+ return PCI_ERS_RESULT_PANIC;
}
- return PCI_ERS_RESULT_NEED_RESET;
+ return PCI_ERS_RESULT_CAN_RECOVER;
}
EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 54e219b0049e..4b8910d934d5 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -133,6 +133,5 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
-pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
- pci_channel_state_t state);
+pci_ers_result_t cxl_error_detected(struct pci_dev *pdev);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index b2c943a4de0a..520570741402 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1104,11 +1104,14 @@ static void cxl_reset_done(struct pci_dev *pdev)
}
}
-static const struct pci_error_handlers cxl_error_handlers = {
+static const struct cxl_error_handlers cxl_ep_error_handlers = {
.error_detected = cxl_error_detected,
+ .cor_error_detected = cxl_cor_error_detected,
+};
+
+static const struct pci_error_handlers pcie_ep_error_handlers = {
.slot_reset = cxl_slot_reset,
.resume = cxl_error_resume,
- .cor_error_detected = cxl_cor_error_detected,
.reset_done = cxl_reset_done,
};
@@ -1116,7 +1119,8 @@ static struct pci_driver cxl_pci_driver = {
.name = KBUILD_MODNAME,
.id_table = cxl_mem_pci_tbl,
.probe = cxl_pci_probe,
- .err_handler = &cxl_error_handlers,
+ .err_handler = &pcie_ep_error_handlers,
+ .cxl_err_handler = &cxl_ep_error_handlers,
.dev_groups = cxl_rcd_groups,
.driver = {
.probe_type = PROBE_PREFER_ASYNCHRONOUS,
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 8e3a60411610..07c888fd4c08 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -993,7 +993,7 @@ static bool cxl_error_is_native(struct pci_dev *dev)
static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
{
struct aer_err_info *info = (struct aer_err_info *)data;
- const struct pci_error_handlers *err_handler;
+ const struct cxl_error_handlers *err_handler;
if (!is_cxl_mem_dev(dev) || !cxl_error_is_native(dev))
return 0;
@@ -1001,7 +1001,8 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
/* protect dev->driver */
device_lock(&dev->dev);
- err_handler = dev->driver ? dev->driver->err_handler : NULL;
+ err_handler = dev->driver ? dev->driver->cxl_err_handler : NULL;
+
if (!err_handler)
goto out;
@@ -1010,9 +1011,9 @@ static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
err_handler->cor_error_detected(dev);
} else if (err_handler->error_detected) {
if (info->severity == AER_NONFATAL)
- err_handler->error_detected(dev, pci_channel_io_normal);
+ err_handler->error_detected(dev);
else if (info->severity == AER_FATAL)
- err_handler->error_detected(dev, pci_channel_io_frozen);
+ err_handler->error_detected(dev);
cxl_do_recovery(dev);
}
@@ -1070,7 +1071,7 @@ static bool handles_cxl_errors(struct pci_dev *dev)
if (pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC)
pcie_walk_rcec(dev, handles_cxl_error_iter, &handles_cxl);
else
- handles_cxl = pcie_is_cxl_port(dev);
+ handles_cxl = pcie_is_cxl(dev);
return handles_cxl;
}
--
2.34.1
prev parent reply other threads:[~2025-02-08 0:33 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-08 0:29 [PATCH v6 00/17] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2025-02-08 0:29 ` [PATCH v6 01/17] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver' Terry Bowman
2025-02-08 0:29 ` [PATCH v6 02/17] PCI/AER: Rename AER driver's interfaces to also indicate CXL PCIe Port support Terry Bowman
2025-02-08 0:29 ` [PATCH v6 03/17] CXL/PCI: Introduce PCIe helper functions pcie_is_cxl() and pcie_is_cxl_port() Terry Bowman
2025-02-08 0:29 ` [PATCH v6 04/17] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
2025-02-08 0:29 ` [PATCH v6 05/17] PCI/AER: Add CXL PCIe Port correctable error support in AER service driver Terry Bowman
2025-02-08 0:29 ` [PATCH v6 06/17] PCI/AER: Add CXL PCIe Port uncorrectable error recovery " Terry Bowman
2025-02-10 18:43 ` Gregory Price
2025-02-10 20:13 ` Gregory Price
2025-02-08 0:29 ` [PATCH v6 07/17] cxl/pci: Map CXL PCIe Root Port and Downstream Switch Port RAS registers Terry Bowman
2025-02-08 0:29 ` [PATCH v6 08/17] cxl/pci: Map CXL PCIe Upstream " Terry Bowman
2025-02-08 0:29 ` [PATCH v6 09/17] cxl/pci: Update RAS handler interfaces to also support CXL PCIe Ports Terry Bowman
2025-02-10 20:16 ` Gregory Price
2025-02-10 20:36 ` Bowman, Terry
2025-02-08 0:29 ` [PATCH v6 10/17] cxl/pci: Add log message for unmapped registers in existing RAS handlers Terry Bowman
2025-02-08 0:29 ` [PATCH v6 11/17] cxl/pci: Change find_cxl_port() to non-static Terry Bowman
2025-02-08 0:29 ` [PATCH v6 12/17] cxl/pci: Add error handler for CXL PCIe Port RAS errors Terry Bowman
2025-02-08 0:29 ` [PATCH v6 13/17] cxl/pci: Add trace logging " Terry Bowman
2025-02-08 0:29 ` [PATCH v6 14/17] cxl/pci: Update CXL Port RAS logging to also display PCIe SBDF Terry Bowman
2025-02-08 0:29 ` [PATCH v6 15/17] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers Terry Bowman
2025-02-08 0:29 ` [PATCH v6 16/17] PCI/AER: Enable internal errors for CXL Upstream and Downstream Switch Ports Terry Bowman
2025-02-08 0:29 ` Terry Bowman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250208002941.4135321-18-terry.bowman@amd.com \
--to=terry.bowman@amd.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=ming.li@zohomail.com \
--cc=nathan.fontenot@amd.com \
--cc=nifan.cxl@gmail.com \
--cc=oohall@gmail.com \
--cc=rrichter@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox