From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-pci@vger.kernel.org>, <nifan.cxl@gmail.com>,
<dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
<dan.j.williams@intel.com>, <bhelgaas@google.com>,
<mahesh@linux.ibm.com>, <ira.weiny@intel.com>, <oohall@gmail.com>,
<Benjamin.Cheatham@amd.com>, <rrichter@amd.com>,
<nathan.fontenot@amd.com>,
<Smita.KoralahalliChannabasappa@amd.com>, <lukas@wunner.de>,
<ming.li@zohomail.com>, <PradeepVineshReddy.Kodamati@amd.com>
Subject: Re: [PATCH v8 06/16] CXL/PCI: Introduce CXL uncorrectable protocol error 'recovery'
Date: Wed, 23 Apr 2025 17:35:40 +0100 [thread overview]
Message-ID: <20250423173540.000034b3@huawei.com> (raw)
In-Reply-To: <20250327014717.2988633-7-terry.bowman@amd.com>
On Wed, 26 Mar 2025 20:47:07 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> Create cxl_do_recovery() to provide uncorrectable protocol error (UCE)
> handling. Follow similar design as found in PCIe error driver,
> pcie_do_recovery(). One difference is that cxl_do_recovery() will treat all
> UCEs as fatal with a kernel panic. This is to prevent corruption on CXL
> memory.
>
> Copy the PCIe error handlers merge_result(). Introduce PCI_ERS_RESULT_PANIC
> and add support in the merge_result() routine.
>
> Copy pci_walk_bridge() to cxl_walk_bridge(). Make a change to walk the
> first device in all cases.
>
> Copy report_error_detected() to cxl_report_error_detected(). Update this
> function to populate the CXL error information structure, 'struct
> cxl_prot_error_info', before calling the device error handler.
>
> Call panic() to halt the system in the case of uncorrectable errors (UCE)
> in cxl_do_recovery(). Export pci_aer_clear_fatal_status() for CXL to use
> if a UCE is not found. In this case the AER status must be cleared and
> uses pci_aer_clear_fatal_status().
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
> drivers/cxl/core/ras.c | 92 +++++++++++++++++++++++++++++++++++++++++-
> drivers/pci/pci.h | 2 -
> include/linux/pci.h | 5 +++
> 3 files changed, 96 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index eca8f11a05d9..1f94fc08e72b 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -141,7 +141,97 @@ int cxl_create_prot_err_info(struct pci_dev *_pdev, int severity,
> }
> EXPORT_SYMBOL_NS_GPL(cxl_create_prot_err_info, "CXL");
>
> -static void cxl_do_recovery(struct pci_dev *pdev) { }
> +
> +static pci_ers_result_t merge_result(enum pci_ers_result orig,
Rename perhaps to avoid confusion / grep clashed...
> + enum pci_ers_result new)
> +{
> + if (new == PCI_ERS_RESULT_PANIC)
> + return PCI_ERS_RESULT_PANIC;
> +
> + if (new == PCI_ERS_RESULT_NO_AER_DRIVER)
> + return PCI_ERS_RESULT_NO_AER_DRIVER;
> +
> + if (new == PCI_ERS_RESULT_NONE)
> + return orig;
> +
> + switch (orig) {
> + case PCI_ERS_RESULT_CAN_RECOVER:
> + case PCI_ERS_RESULT_RECOVERED:
> + orig = new;
> + break;
> + case PCI_ERS_RESULT_DISCONNECT:
> + if (new == PCI_ERS_RESULT_NEED_RESET)
> + orig = PCI_ERS_RESULT_NEED_RESET;
> + break;
> + default:
> + break;
> + }
> +
> + return orig;
> +}
> +
> +static void cxl_walk_bridge(struct pci_dev *bridge,
> + int (*cb)(struct pci_dev *, void *),
> + void *userdata)
> +{
> + if (cb(bridge, userdata))
> + return;
> +
> + if (bridge->subordinate)
> + pci_walk_bus(bridge->subordinate, cb, userdata);
> +}
> +
Trivial but seems there are two blank lines where one will do.
> +
> +static int cxl_report_error_detected(struct pci_dev *pdev, void *data)
> +{
> + struct cxl_driver *pdrv;
> + pci_ers_result_t vote, *result = data;
> + struct cxl_prot_error_info err_info = { 0 };
> + const struct cxl_error_handlers *cxl_err_handler;
> +
> + if (cxl_create_prot_err_info(pdev, AER_FATAL, &err_info))
> + return 0;
> +
> + struct device *dev __free(put_device) = get_device(err_info.dev);
> + if (!dev)
> + return 0;
> +
> + pdrv = to_cxl_drv(dev->driver);
> + if (!pdrv || !pdrv->err_handler ||
> + !pdrv->err_handler->error_detected)
> + return 0;
> +
> + cxl_err_handler = pdrv->err_handler;
> + vote = cxl_err_handler->error_detected(dev, &err_info);
> +
> + *result = merge_result(*result, vote);
> +
> + return 0;
> +}
> +
> +static void cxl_do_recovery(struct pci_dev *pdev)
> +{
> + struct pci_host_bridge *host = pci_find_host_bridge(pdev->bus);
> + pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER;
> +
> + cxl_walk_bridge(pdev, cxl_report_error_detected, &status);
> + if (status == PCI_ERS_RESULT_PANIC)
> + panic("CXL cachemem error.");
> +
> + /*
> + * If we have native control of AER, clear error status in the device
> + * that detected the error. If the platform retained control of AER,
> + * it is responsible for clearing this status. In that case, the
> + * signaling device may not even be visible to the OS.
> + */
> + if (host->native_aer) {
> + pcie_clear_device_status(pdev);
> + pci_aer_clear_nonfatal_status(pdev);
> + pci_aer_clear_fatal_status(pdev);
> + }
> +
> + pci_info(pdev, "CXL uncorrectable error.\n");
> +}
>
> static int cxl_rch_handle_error_iter(struct pci_dev *pdev, void *data)
> {
next prev parent reply other threads:[~2025-04-23 16:35 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-27 1:47 [PATCH v8 00/16] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2025-03-27 1:47 ` [PATCH v8 01/16] PCI/CXL: Introduce PCIe helper function pcie_is_cxl() Terry Bowman
2025-03-27 15:11 ` Ira Weiny
2025-03-27 15:30 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 02/16] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
2025-03-27 16:48 ` Bjorn Helgaas
2025-03-27 17:15 ` Bowman, Terry
2025-03-27 17:49 ` Bjorn Helgaas
2025-03-27 16:58 ` Ira Weiny
2025-03-27 17:17 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 03/16] CXL/AER: Introduce Kfifo for forwarding CXL errors Terry Bowman
2025-03-27 17:08 ` Bjorn Helgaas
2025-03-27 18:12 ` Bowman, Terry
2025-03-28 17:02 ` Bjorn Helgaas
2025-03-28 17:36 ` Bowman, Terry
2025-03-28 17:01 ` Ira Weiny
2025-04-07 13:43 ` Bowman, Terry
2025-04-04 16:53 ` Jonathan Cameron
2025-04-23 14:33 ` Jonathan Cameron
2025-04-23 15:04 ` Jonathan Cameron
2025-04-23 22:12 ` Gregory Price
2025-03-27 1:47 ` [PATCH v8 04/16] cxl/aer: AER service driver forwards CXL error to CXL driver Terry Bowman
2025-03-27 17:13 ` Bjorn Helgaas
2025-04-07 14:00 ` Bowman, Terry
2025-04-23 15:04 ` Jonathan Cameron
2025-04-24 14:17 ` Bowman, Terry
2025-04-25 13:18 ` Jonathan Cameron
2025-04-25 21:03 ` Bowman, Terry
2025-05-15 21:52 ` Bowman, Terry
2025-05-20 11:04 ` Jonathan Cameron
2025-05-20 13:21 ` Bowman, Terry
2025-05-21 18:34 ` Jonathan Cameron
2025-05-21 23:30 ` Bowman, Terry
2025-04-23 22:21 ` Gregory Price
2025-03-27 1:47 ` [PATCH v8 05/16] PCI/AER: CXL driver dequeues CXL error forwarded from AER service driver Terry Bowman
2025-03-27 4:43 ` kernel test robot
2025-04-23 16:28 ` Jonathan Cameron
2025-04-24 15:03 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 06/16] CXL/PCI: Introduce CXL uncorrectable protocol error 'recovery' Terry Bowman
2025-03-27 3:37 ` kernel test robot
2025-03-27 4:19 ` kernel test robot
2025-04-23 16:35 ` Jonathan Cameron [this message]
2025-04-24 14:22 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 07/16] cxl/pci: Move existing CXL RAS initialization to CXL's cxl_port driver Terry Bowman
2025-04-17 10:18 ` Jonathan Cameron
2025-04-24 14:25 ` Bowman, Terry
2025-05-12 14:47 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 08/16] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2025-03-27 1:47 ` [PATCH v8 09/16] cxl/pci: Update RAS handler interfaces to also support CXL PCIe Ports Terry Bowman
2025-03-27 1:47 ` [PATCH v8 10/16] cxl/pci: Add log message if RAS registers are not mapped Terry Bowman
2025-04-23 16:41 ` Jonathan Cameron
2025-04-24 14:30 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 11/16] cxl/pci: Unifi CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
2025-04-23 16:44 ` Jonathan Cameron
2025-05-07 16:28 ` Shiju Jose
2025-05-07 18:30 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 12/16] cxl/pci: Assign CXL Port protocol error handlers Terry Bowman
2025-04-23 16:47 ` Jonathan Cameron
2025-03-27 1:47 ` [PATCH v8 13/16] cxl/pci: Assign CXL Endpoint " Terry Bowman
2025-03-27 19:46 ` kernel test robot
2025-04-23 16:49 ` Jonathan Cameron
2025-03-27 1:47 ` [PATCH v8 14/16] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2025-04-17 17:22 ` Jonathan Cameron
2025-03-27 1:47 ` [PATCH v8 15/16] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
2025-04-04 17:05 ` Jonathan Cameron
2025-04-07 14:34 ` Bowman, Terry
2025-03-27 1:47 ` [PATCH v8 16/16] CXL/PCI: Disable CXL protocol errors during CXL Port cleanup Terry Bowman
2025-03-28 1:18 ` kernel test robot
2025-04-04 17:04 ` Jonathan Cameron
2025-04-07 14:25 ` Bowman, Terry
2025-04-17 10:13 ` Jonathan Cameron
2025-04-24 16:37 ` Bowman, Terry
2025-03-27 17:16 ` [PATCH v8 00/16] Enable CXL PCIe port protocol error handling and logging Bjorn Helgaas
2025-03-27 22:04 ` Bowman, Terry
2025-05-06 23:06 ` Gregory Price
2025-05-07 18:28 ` Bowman, Terry
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250423173540.000034b3@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=ming.li@zohomail.com \
--cc=nathan.fontenot@amd.com \
--cc=nifan.cxl@gmail.com \
--cc=oohall@gmail.com \
--cc=rrichter@amd.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.